A survey of commercial object-oriented database management systems
NASA Technical Reports Server (NTRS)
Atkins, John
1992-01-01
The object-oriented data model is the culmination of over thirty years of database research. Initially, database research focused on the need to provide information in a consistent and efficient manner to the business community. Early data models such as the hierarchical model and the network model met the goal of consistent and efficient access to data and were substantial improvements over simple file mechanisms for storing and accessing data. However, these models required highly skilled programmers to provide access to the data. Consequently, in the early 70's E.F. Codd, an IBM research computer scientists, proposed a new data model based on the simple mathematical notion of the relation. This model is known as the Relational Model. In the relational model, data is represented in flat tables (or relations) which have no physical or internal links between them. The simplicity of this model fostered the development of powerful but relatively simple query languages that now made data directly accessible to the general database user. Except for large, multi-user database systems, a database professional was in general no longer necessary. Database professionals found that traditional data in the form of character data, dates, and numeric data were easily represented and managed via the relational model. Commercial relational database management systems proliferated and performance of relational databases improved dramatically. However, there was a growing community of potential database users whose needs were not met by the relational model. These users needed to store data with data types not available in the relational model and who required a far richer modelling environment than that provided by the relational model. Indeed, the complexity of the objects to be represented in the model mandated a new approach to database technology. The Object-Oriented Model was the result.
SORTEZ: a relational translator for NCBI's ASN.1 database.
Hart, K W; Searls, D B; Overton, G C
1994-07-01
The National Center for Biotechnology Information (NCBI) has created a database collection that includes several protein and nucleic acid sequence databases, a biosequence-specific subset of MEDLINE, as well as value-added information such as links between similar sequences. Information in the NCBI database is modeled in Abstract Syntax Notation 1 (ASN.1) an Open Systems Interconnection protocol designed for the purpose of exchanging structured data between software applications rather than as a data model for database systems. While the NCBI database is distributed with an easy-to-use information retrieval system, ENTREZ, the ASN.1 data model currently lacks an ad hoc query language for general-purpose data access. For that reason, we have developed a software package, SORTEZ, that transforms the ASN.1 database (or other databases with nested data structures) to a relational data model and subsequently to a relational database management system (Sybase) where information can be accessed through the relational query language, SQL. Because the need to transform data from one data model and schema to another arises naturally in several important contexts, including efficient execution of specific applications, access to multiple databases and adaptation to database evolution this work also serves as a practical study of the issues involved in the various stages of database transformation. We show that transformation from the ASN.1 data model to a relational data model can be largely automated, but that schema transformation and data conversion require considerable domain expertise and would greatly benefit from additional support tools.
System, method and apparatus for conducting a phrase search
NASA Technical Reports Server (NTRS)
McGreevy, Michael W. (Inventor)
2004-01-01
A phrase search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more sequences of terms. Next, a relational model of the query is created. The relational model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.
System, method and apparatus for generating phrases from a database
NASA Technical Reports Server (NTRS)
McGreevy, Michael W. (Inventor)
2004-01-01
A phrase generation is a method of generating sequences of terms, such as phrases, that may occur within a database of subsets containing sequences of terms, such as text. A database is provided and a relational model of the database is created. A query is then input. The query includes a term or a sequence of terms or multiple individual terms or multiple sequences of terms or combinations thereof. Next, several sequences of terms that are contextually related to the query are assembled from contextual relations in the model of the database. The sequences of terms are then sorted and output. Phrase generation can also be an iterative process used to produce sequences of terms from a relational model of a database.
Levy, C.; Beauchamp, C.
1996-01-01
This poster describes the methods used and working prototype that was developed from an abstraction of the relational model from the VA's hierarchical DHCP database. Overlaying the relational model on DHCP permits multiple user views of the physical data structure, enhances access to the database by providing a link to commercial (SQL based) software, and supports a conceptual managed care data model based on primary and longitudinal patient care. The goal of this work was to create a relational abstraction of the existing hierarchical database; to construct, using SQL data definition language, user views of the database which reflect the clinical conceptual view of DHCP, and to allow the user to work directly with the logical view of the data using GUI based commercial software of their choosing. The workstation is intended to serve as a platform from which a managed care information model could be implemented and evaluated.
The PMDB Protein Model Database
Castrignanò, Tiziana; De Meo, Paolo D'Onorio; Cozzetto, Domenico; Talamo, Ivano Giuseppe; Tramontano, Anna
2006-01-01
The Protein Model Database (PMDB) is a public resource aimed at storing manually built 3D models of proteins. The database is designed to provide access to models published in the scientific literature, together with validating experimental data. It is a relational database and it currently contains >74 000 models for ∼240 proteins. The system is accessible at and allows predictors to submit models along with related supporting evidence and users to download them through a simple and intuitive interface. Users can navigate in the database and retrieve models referring to the same target protein or to different regions of the same protein. Each model is assigned a unique identifier that allows interested users to directly access the data. PMID:16381873
Component, Context and Manufacturing Model Library (C2M2L)
2013-03-01
Penn State team were stored in a relational database for easy access, storage and maintainability. The relational database consisted of a PostGres ...file into a format that can be imported into the PostGres database. This same custom application was used to generate Microsoft Excel templates...Press Break Forming Equipment 4.14 Manufacturing Model Library Database Structure The data storage mechanism for the ARL PSU MML was a PostGres database
Fuzzy queries above relational database
NASA Astrophysics Data System (ADS)
Smolka, Pavel; Bradac, Vladimir
2017-11-01
The aim of the theme is to introduce a possibility of fuzzy queries implemented in relational databases. The issue is described on a model which identifies the appropriate part of the problem domain for fuzzy approach. The model is demonstrated on a database of wines focused on searching in it. The construction of the database complies with the Law of the Czech Republic.
System, method and apparatus for conducting a keyterm search
NASA Technical Reports Server (NTRS)
McGreevy, Michael W. (Inventor)
2004-01-01
A keyterm search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more keyterms. Next, a gleaning model of the query is created. The gleaning model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.
[The future of clinical laboratory database management system].
Kambe, M; Imidy, D; Matsubara, A; Sugimoto, Y
1999-09-01
To assess the present status of the clinical laboratory database management system, the difference between the Clinical Laboratory Information System and Clinical Laboratory System was explained in this study. Although three kinds of database management systems (DBMS) were shown including the relational model, tree model and network model, the relational model was found to be the best DBMS for the clinical laboratory database based on our experience and developments of some clinical laboratory expert systems. As a future clinical laboratory database management system, the IC card system connected to an automatic chemical analyzer was proposed for personal health data management and a microscope/video system was proposed for dynamic data management of leukocytes or bacteria.
A probabilistic NF2 relational algebra for integrated information retrieval and database systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fuhr, N.; Roelleke, T.
The integration of information retrieval (IR) and database systems requires a data model which allows for modelling documents as entities, representing uncertainty and vagueness and performing uncertain inference. For this purpose, we present a probabilistic data model based on relations in non-first-normal-form (NF2). Here, tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Thus, the set of weighted index terms of a document are represented as a probabilistic subrelation. In a similar way, imprecise attribute values are modelled as a set-valued attribute. We redefine the relational operators for this type of relations such thatmore » the result of each operator is again a probabilistic NF2 relation, where the weight of a tuple gives the probability that this tuple belongs to the result. By ordering the tuples according to decreasing probabilities, the model yields a ranking of answers like in most IR models. This effect also can be used for typical database queries involving imprecise attribute values as well as for combinations of database and IR queries.« less
Building a generalized distributed system model
NASA Technical Reports Server (NTRS)
Mukkamala, Ravi
1991-01-01
A number of topics related to building a generalized distributed system model are discussed. The effects of distributed database modeling on evaluation of transaction rollbacks, the measurement of effects of distributed database models on transaction availability measures, and a performance analysis of static locking in replicated distributed database systems are covered.
SQL/NF Translator for the Triton Nested Relational Database System
1990-12-01
18as., Ohio .. 9~~ ~~ 1 4- AFIT/GCE/ENG/90D-05 SQL/Nk1 TRANSLATOR FOR THE TRITON NESTED RELATIONAL DATABASE SYSTEM THESIS Craig William Schnepf Captain...FOR THE TRITON NESTED RELATIONAL DATABASE SYSTEM THESIS Presented to the Faculty of the School of Engineering of the Air Force Institute of Technnlogy... systems . The SQL/NF query language used for the nested relationil model is an extension of the popular relational model query language SQL. The query
Class dependency of fuzzy relational database using relational calculus and conditional probability
NASA Astrophysics Data System (ADS)
Deni Akbar, Mohammad; Mizoguchi, Yoshihiro; Adiwijaya
2018-03-01
In this paper, we propose a design of fuzzy relational database to deal with a conditional probability relation using fuzzy relational calculus. In the previous, there are several researches about equivalence class in fuzzy database using similarity or approximate relation. It is an interesting topic to investigate the fuzzy dependency using equivalence classes. Our goal is to introduce a formulation of a fuzzy relational database model using the relational calculus on the category of fuzzy relations. We also introduce general formulas of the relational calculus for the notion of database operations such as ’projection’, ’selection’, ’injection’ and ’natural join’. Using the fuzzy relational calculus and conditional probabilities, we introduce notions of equivalence class, redundant, and dependency in the theory fuzzy relational database.
Combining computational models, semantic annotations and simulation experiments in a graph database
Henkel, Ron; Wolkenhauer, Olaf; Waltemath, Dagmar
2015-01-01
Model repositories such as the BioModels Database, the CellML Model Repository or JWS Online are frequently accessed to retrieve computational models of biological systems. However, their storage concepts support only restricted types of queries and not all data inside the repositories can be retrieved. In this article we present a storage concept that meets this challenge. It grounds on a graph database, reflects the models’ structure, incorporates semantic annotations and simulation descriptions and ultimately connects different types of model-related data. The connections between heterogeneous model-related data and bio-ontologies enable efficient search via biological facts and grant access to new model features. The introduced concept notably improves the access of computational models and associated simulations in a model repository. This has positive effects on tasks such as model search, retrieval, ranking, matching and filtering. Furthermore, our work for the first time enables CellML- and Systems Biology Markup Language-encoded models to be effectively maintained in one database. We show how these models can be linked via annotations and queried. Database URL: https://sems.uni-rostock.de/projects/masymos/ PMID:25754863
Guide on Data Models in the Selection and Use of Database Management Systems. Final Report.
ERIC Educational Resources Information Center
Gallagher, Leonard J.; Draper, Jesse M.
A tutorial introduction to data models in general is provided, with particular emphasis on the relational and network models defined by the two proposed ANSI (American National Standards Institute) database language standards. Examples based on the network and relational models include specific syntax and semantics, while examples from the other…
An Object-Relational Ifc Storage Model Based on Oracle Database
NASA Astrophysics Data System (ADS)
Li, Hang; Liu, Hua; Liu, Yong; Wang, Yuan
2016-06-01
With the building models are getting increasingly complicated, the levels of collaboration across professionals attract more attention in the architecture, engineering and construction (AEC) industry. In order to adapt the change, buildingSMART developed Industry Foundation Classes (IFC) to facilitate the interoperability between software platforms. However, IFC data are currently shared in the form of text file, which is defective. In this paper, considering the object-based inheritance hierarchy of IFC and the storage features of different database management systems (DBMS), we propose a novel object-relational storage model that uses Oracle database to store IFC data. Firstly, establish the mapping rules between data types in IFC specification and Oracle database. Secondly, design the IFC database according to the relationships among IFC entities. Thirdly, parse the IFC file and extract IFC data. And lastly, store IFC data into corresponding tables in IFC database. In experiment, three different building models are selected to demonstrate the effectiveness of our storage model. The comparison of experimental statistics proves that IFC data are lossless during data exchange.
An effective model for store and retrieve big health data in cloud computing.
Goli-Malekabadi, Zohreh; Sargolzaei-Javan, Morteza; Akbari, Mohammad Kazem
2016-08-01
The volume of healthcare data including different and variable text types, sounds, and images is increasing day to day. Therefore, the storage and processing of these data is a necessary and challenging issue. Generally, relational databases are used for storing health data which are not able to handle the massive and diverse nature of them. This study aimed at presenting the model based on NoSQL databases for the storage of healthcare data. Despite different types of NoSQL databases, document-based DBs were selected by a survey on the nature of health data. The presented model was implemented in the Cloud environment for accessing to the distribution properties. Then, the data were distributed on the database by applying the Shard property. The efficiency of the model was evaluated in comparison with the previous data model, Relational Database, considering query time, data preparation, flexibility, and extensibility parameters. The results showed that the presented model approximately performed the same as SQL Server for "read" query while it acted more efficiently than SQL Server for "write" query. Also, the performance of the presented model was better than SQL Server in the case of flexibility, data preparation and extensibility. Based on these observations, the proposed model was more effective than Relational Databases for handling health data. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Maluf, David A.; Tran, Peter B.
2003-01-01
Object-Relational database management system is an integrated hybrid cooperative approach to combine the best practices of both the relational model utilizing SQL queries and the object-oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information on demand database framework, called NETMARK, is introduced. NETMARK takes advantages of the Oracle 8i object-relational database using physical addresses data types for very efficient keyword search of records spanning across both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to solve the vast amounts of unstructured and semi-structured documents existing within NASA enterprises. Today, NETMARK is a flexible, high-throughput open database framework for managing, storing, and searching unstructured or semi-structured arbitrary hierarchal models, such as XML and HTML.
The relational database model and multiple multicenter clinical trials.
Blumenstein, B A
1989-12-01
The Southwest Oncology Group (SWOG) chose to use a relational database management system (RDBMS) for the management of data from multiple clinical trials because of the underlying relational model's inherent flexibility and the natural way multiple entity types (patients, studies, and participants) can be accommodated. The tradeoffs to using the relational model as compared to using the hierarchical model include added computing cycles due to deferred data linkages and added procedural complexity due to the necessity of implementing protections against referential integrity violations. The SWOG uses its RDBMS as a platform on which to build data operations software. This data operations software, which is written in a compiled computer language, allows multiple users to simultaneously update the database and is interactive with respect to the detection of conditions requiring action and the presentation of options for dealing with those conditions. The relational model facilitates the development and maintenance of data operations software.
Characterizing the genetic structure of a forensic DNA database using a latent variable approach.
Kruijver, Maarten
2016-07-01
Several problems in forensic genetics require a representative model of a forensic DNA database. Obtaining an accurate representation of the offender database can be difficult, since databases typically contain groups of persons with unregistered ethnic origins in unknown proportions. We propose to estimate the allele frequencies of the subpopulations comprising the offender database and their proportions from the database itself using a latent variable approach. We present a model for which parameters can be estimated using the expectation maximization (EM) algorithm. This approach does not rely on relatively small and possibly unrepresentative population surveys, but is driven by the actual genetic composition of the database only. We fit the model to a snapshot of the Dutch offender database (2014), which contains close to 180,000 profiles, and find that three subpopulations suffice to describe a large fraction of the heterogeneity in the database. We demonstrate the utility and reliability of the approach with three applications. First, we use the model to predict the number of false leads obtained in database searches. We assess how well the model predicts the number of false leads obtained in mock searches in the Dutch offender database, both for the case of familial searching for first degree relatives of a donor and searching for contributors to three-person mixtures. Second, we study the degree of partial matching between all pairs of profiles in the Dutch database and compare this to what is predicted using the latent variable approach. Third, we use the model to provide evidence to support that the Dutch practice of estimating match probabilities using the Balding-Nichols formula with a native Dutch reference database and θ=0.03 is conservative. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Ezra Tsur, Elishai
2017-01-01
Databases are imperative for research in bioinformatics and computational biology. Current challenges in database design include data heterogeneity and context-dependent interconnections between data entities. These challenges drove the development of unified data interfaces and specialized databases. The curation of specialized databases is an ever-growing challenge due to the introduction of new data sources and the emergence of new relational connections between established datasets. Here, an open-source framework for the curation of specialized databases is proposed. The framework supports user-designed models of data encapsulation, objects persistency and structured interfaces to local and external data sources such as MalaCards, Biomodels and the National Centre for Biotechnology Information (NCBI) databases. The proposed framework was implemented using Java as the development environment, EclipseLink as the data persistency agent and Apache Derby as the database manager. Syntactic analysis was based on J3D, jsoup, Apache Commons and w3c.dom open libraries. Finally, a construction of a specialized database for aneurysms associated vascular diseases is demonstrated. This database contains 3-dimensional geometries of aneurysms, patient's clinical information, articles, biological models, related diseases and our recently published model of aneurysms' risk of rapture. Framework is available in: http://nbel-lab.com.
Linking Multiple Databases: Term Project Using "Sentences" DBMS.
ERIC Educational Resources Information Center
King, Ronald S.; Rainwater, Stephen B.
This paper describes a methodology for use in teaching an introductory Database Management System (DBMS) course. Students master basic database concepts through the use of a multiple component project implemented in both relational and associative data models. The associative data model is a new approach for designing multi-user, Web-enabled…
NASA Technical Reports Server (NTRS)
Wang, Yi; Pant, Kapil; Brenner, Martin J.; Ouellette, Jeffrey A.
2018-01-01
This paper presents a data analysis and modeling framework to tailor and develop linear parameter-varying (LPV) aeroservoelastic (ASE) model database for flexible aircrafts in broad 2D flight parameter space. The Kriging surrogate model is constructed using ASE models at a fraction of grid points within the original model database, and then the ASE model at any flight condition can be obtained simply through surrogate model interpolation. The greedy sampling algorithm is developed to select the next sample point that carries the worst relative error between the surrogate model prediction and the benchmark model in the frequency domain among all input-output channels. The process is iterated to incrementally improve surrogate model accuracy till a pre-determined tolerance or iteration budget is met. The methodology is applied to the ASE model database of a flexible aircraft currently being tested at NASA/AFRC for flutter suppression and gust load alleviation. Our studies indicate that the proposed method can reduce the number of models in the original database by 67%. Even so the ASE models obtained through Kriging interpolation match the model in the original database constructed directly from the physics-based tool with the worst relative error far below 1%. The interpolated ASE model exhibits continuously-varying gains along a set of prescribed flight conditions. More importantly, the selected grid points are distributed non-uniformly in the parameter space, a) capturing the distinctly different dynamic behavior and its dependence on flight parameters, and b) reiterating the need and utility for adaptive space sampling techniques for ASE model database compaction. The present framework is directly extendible to high-dimensional flight parameter space, and can be used to guide the ASE model development, model order reduction, robust control synthesis and novel vehicle design of flexible aircraft.
Vieira, A.
2010-01-01
Background: In relation to pharmacognosy, an objective of many ethnobotanical studies is to identify plant species to be further investigated, for example, tested in disease models related to the ethnomedicinal application. To further warrant such testing, research evidence for medicinal applications of these plants (or of their major phytochemical constituents and metabolic derivatives) is typically analyzed in biomedical databases. Methods: As a model of this process, the current report presents novel information regarding traditional anti-inflammation and anti-infection medicinal plant use. This information was obtained from an interview-based ethnobotanical study; and was compared with current biomedical evidence using the Medline® database. Results: Of the 8 anti-infection plant species identified in the ethnobotanical study, 7 have related activities reported in the database; and of the 6 anti-inflammation plants, 4 have related activities in the database. Conclusion: Based on novel and complimentary results from the ethnobotanical and biomedical database analyses, it is suggested that some of these plants warrant additional investigation of potential anti-inflammatory or anti-infection activities in related disease models, and also additional studies in other population groups. PMID:21589754
NASA Technical Reports Server (NTRS)
Maluf, David A.; Tran, Peter B.
2003-01-01
Object-Relational database management system is an integrated hybrid cooperative approach to combine the best practices of both the relational model utilizing SQL queries and the object-oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information on demand database framework, called NETMARK, is introduced. NETMARK takes advantages of the Oracle 8i object-relational database using physical addresses data types for very efficient keyword search of records spanning across both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to solve the vast amounts of unstructured and semistructured documents existing within NASA enterprises. Today, NETMARK is a flexible, high-throughput open database framework for managing, storing, and searching unstructured or semi-structured arbitrary hierarchal models, such as XML and HTML.
An Extensible Schema-less Database Framework for Managing High-throughput Semi-Structured Documents
NASA Technical Reports Server (NTRS)
Maluf, David A.; Tran, Peter B.; La, Tracy; Clancy, Daniel (Technical Monitor)
2002-01-01
Object-Relational database management system is an integrated hybrid cooperative approach to combine the best practices of both the relational model utilizing SQL queries and the object oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information on demand database framework, called NETMARK is introduced. NETMARK takes advantages of the Oracle 8i object-relational database using physical addresses data types for very efficient keyword searches of records for both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to solve the vast amounts of unstructured and semi-structured documents existing within NASA enterprises. Today, NETMARK is a flexible, high throughput open database framework for managing, storing, and searching unstructured or semi structured arbitrary hierarchal models, XML and HTML.
MODBASE, a database of annotated comparative protein structure models
Pieper, Ursula; Eswar, Narayanan; Stuart, Ashley C.; Ilyin, Valentin A.; Sali, Andrej
2002-01-01
MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10–4) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server. PMID:11752309
Compartmental and Data-Based Modeling of Cerebral Hemodynamics: Linear Analysis.
Henley, B C; Shin, D C; Zhang, R; Marmarelis, V Z
Compartmental and data-based modeling of cerebral hemodynamics are alternative approaches that utilize distinct model forms and have been employed in the quantitative study of cerebral hemodynamics. This paper examines the relation between a compartmental equivalent-circuit and a data-based input-output model of dynamic cerebral autoregulation (DCA) and CO2-vasomotor reactivity (DVR). The compartmental model is constructed as an equivalent-circuit utilizing putative first principles and previously proposed hypothesis-based models. The linear input-output dynamics of this compartmental model are compared with data-based estimates of the DCA-DVR process. This comparative study indicates that there are some qualitative similarities between the two-input compartmental model and experimental results.
Adding Hierarchical Objects to Relational Database General-Purpose XML-Based Information Managements
NASA Technical Reports Server (NTRS)
Lin, Shu-Chun; Knight, Chris; La, Tracy; Maluf, David; Bell, David; Tran, Khai Peter; Gawdiak, Yuri
2006-01-01
NETMARK is a flexible, high-throughput software system for managing, storing, and rapid searching of unstructured and semi-structured documents. NETMARK transforms such documents from their original highly complex, constantly changing, heterogeneous data formats into well-structured, common data formats in using Hypertext Markup Language (HTML) and/or Extensible Markup Language (XML). The software implements an object-relational database system that combines the best practices of the relational model utilizing Structured Query Language (SQL) with those of the object-oriented, semantic database model for creating complex data. In particular, NETMARK takes advantage of the Oracle 8i object-relational database model using physical-address data types for very efficient keyword searches of records across both context and content. NETMARK also supports multiple international standards such as WEBDAV for drag-and-drop file management and SOAP for integrated information management using Web services. The document-organization and -searching capabilities afforded by NETMARK are likely to make this software attractive for use in disciplines as diverse as science, auditing, and law enforcement.
Corwin, John; Silberschatz, Avi; Miller, Perry L; Marenco, Luis
2007-01-01
Data sparsity and schema evolution issues affecting clinical informatics and bioinformatics communities have led to the adoption of vertical or object-attribute-value-based database schemas to overcome limitations posed when using conventional relational database technology. This paper explores these issues and discusses why biomedical data are difficult to model using conventional relational techniques. The authors propose a solution to these obstacles based on a relational database engine using a sparse, column-store architecture. The authors provide benchmarks comparing the performance of queries and schema-modification operations using three different strategies: (1) the standard conventional relational design; (2) past approaches used by biomedical informatics researchers; and (3) their sparse, column-store architecture. The performance results show that their architecture is a promising technique for storing and processing many types of data that are not handled well by the other two semantic data models.
NASA Astrophysics Data System (ADS)
Dziedzic, Adam; Mulawka, Jan
2014-11-01
NoSQL is a new approach to data storage and manipulation. The aim of this paper is to gain more insight into NoSQL databases, as we are still in the early stages of understanding when to use them and how to use them in an appropriate way. In this submission descriptions of selected NoSQL databases are presented. Each of the databases is analysed with primary focus on its data model, data access, architecture and practical usage in real applications. Furthemore, the NoSQL databases are compared in fields of data references. The relational databases offer foreign keys, whereas NoSQL databases provide us with limited references. An intermediate model between graph theory and relational algebra which can address the problem should be created. Finally, the proposal of a new approach to the problem of inconsistent references in Big Data storage systems is introduced.
Crasto, Chiquito J.; Marenco, Luis N.; Liu, Nian; Morse, Thomas M.; Cheung, Kei-Hoi; Lai, Peter C.; Bahl, Gautam; Masiar, Peter; Lam, Hugo Y.K.; Lim, Ernest; Chen, Huajin; Nadkarni, Prakash; Migliore, Michele; Miller, Perry L.; Shepherd, Gordon M.
2009-01-01
This article presents the latest developments in neuroscience information dissemination through the SenseLab suite of databases: NeuronDB, CellPropDB, ORDB, OdorDB, OdorMapDB, ModelDB and BrainPharm. These databases include information related to: (i) neuronal membrane properties and neuronal models, and (ii) genetics, genomics, proteomics and imaging studies of the olfactory system. We describe here: the new features for each database, the evolution of SenseLab’s unifying database architecture and instances of SenseLab database interoperation with other neuroscience online resources. PMID:17510162
Environmental modeling and recognition for an autonomous land vehicle
NASA Technical Reports Server (NTRS)
Lawton, D. T.; Levitt, T. S.; Mcconnell, C. C.; Nelson, P. C.
1987-01-01
An architecture for object modeling and recognition for an autonomous land vehicle is presented. Examples of objects of interest include terrain features, fields, roads, horizon features, trees, etc. The architecture is organized around a set of data bases for generic object models and perceptual structures, temporary memory for the instantiation of object and relational hypotheses, and a long term memory for storing stable hypotheses that are affixed to the terrain representation. Multiple inference processes operate over these databases. Researchers describe these particular components: the perceptual structure database, the grouping processes that operate over this, schemas, and the long term terrain database. A processing example that matches predictions from the long term terrain model to imagery, extracts significant perceptual structures for consideration as potential landmarks, and extracts a relational structure to update the long term terrain database is given.
NASA Technical Reports Server (NTRS)
Saile, Lynn; Lopez, Vilma; Bickham, Grandin; FreiredeCarvalho, Mary; Kerstman, Eric; Byrne, Vicky; Butler, Douglas; Myers, Jerry; Walton, Marlei
2011-01-01
This slide presentation reviews the Integrated Medical Model (IMM) database, which is an organized evidence base for assessing in-flight crew health risk. The database is a relational database accessible to many people. The database quantifies the model inputs by a ranking based on the highest value of the data as Level of Evidence (LOE) and the quality of evidence (QOE) score that provides an assessment of the evidence base for each medical condition. The IMM evidence base has already been able to provide invaluable information for designers, and for other uses.
Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency.
Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio
2015-01-01
Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.
The Use of a Relational Database in Qualitative Research on Educational Computing.
ERIC Educational Resources Information Center
Winer, Laura R.; Carriere, Mario
1990-01-01
Discusses the use of a relational database as a data management and analysis tool for nonexperimental qualitative research, and describes the use of the Reflex Plus database in the Vitrine 2001 project in Quebec to study computer-based learning environments. Information systems are also discussed, and the use of a conceptual model is explained.…
BIOSPIDA: A Relational Database Translator for NCBI.
Hagen, Matthew S; Lee, Eva K
2010-11-13
As the volume and availability of biological databases continue widespread growth, it has become increasingly difficult for research scientists to identify all relevant information for biological entities of interest. Details of nucleotide sequences, gene expression, molecular interactions, and three-dimensional structures are maintained across many different databases. To retrieve all necessary information requires an integrated system that can query multiple databases with minimized overhead. This paper introduces a universal parser and relational schema translator that can be utilized for all NCBI databases in Abstract Syntax Notation (ASN.1). The data models for OMIM, Entrez-Gene, Pubmed, MMDB and GenBank have been successfully converted into relational databases and all are easily linkable helping to answer complex biological questions. These tools facilitate research scientists to locally integrate databases from NCBI without significant workload or development time.
Validation of a common data model for active safety surveillance research
Ryan, Patrick B; Reich, Christian G; Hartzema, Abraham G; Stang, Paul E
2011-01-01
Objective Systematic analysis of observational medical databases for active safety surveillance is hindered by the variation in data models and coding systems. Data analysts often find robust clinical data models difficult to understand and ill suited to support their analytic approaches. Further, some models do not facilitate the computations required for systematic analysis across many interventions and outcomes for large datasets. Translating the data from these idiosyncratic data models to a common data model (CDM) could facilitate both the analysts' understanding and the suitability for large-scale systematic analysis. In addition to facilitating analysis, a suitable CDM has to faithfully represent the source observational database. Before beginning to use the Observational Medical Outcomes Partnership (OMOP) CDM and a related dictionary of standardized terminologies for a study of large-scale systematic active safety surveillance, the authors validated the model's suitability for this use by example. Validation by example To validate the OMOP CDM, the model was instantiated into a relational database, data from 10 different observational healthcare databases were loaded into separate instances, a comprehensive array of analytic methods that operate on the data model was created, and these methods were executed against the databases to measure performance. Conclusion There was acceptable representation of the data from 10 observational databases in the OMOP CDM using the standardized terminologies selected, and a range of analytic methods was developed and executed with sufficient performance to be useful for active safety surveillance. PMID:22037893
Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency
Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio
2015-01-01
Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254
Protein Simulation Data in the Relational Model.
Simms, Andrew M; Daggett, Valerie
2012-10-01
High performance computing is leading to unprecedented volumes of data. Relational databases offer a robust and scalable model for storing and analyzing scientific data. However, these features do not come without a cost-significant design effort is required to build a functional and efficient repository. Modeling protein simulation data in a relational database presents several challenges: the data captured from individual simulations are large, multi-dimensional, and must integrate with both simulation software and external data sites. Here we present the dimensional design and relational implementation of a comprehensive data warehouse for storing and analyzing molecular dynamics simulations using SQL Server.
Protein Simulation Data in the Relational Model
Simms, Andrew M.; Daggett, Valerie
2011-01-01
High performance computing is leading to unprecedented volumes of data. Relational databases offer a robust and scalable model for storing and analyzing scientific data. However, these features do not come without a cost—significant design effort is required to build a functional and efficient repository. Modeling protein simulation data in a relational database presents several challenges: the data captured from individual simulations are large, multi-dimensional, and must integrate with both simulation software and external data sites. Here we present the dimensional design and relational implementation of a comprehensive data warehouse for storing and analyzing molecular dynamics simulations using SQL Server. PMID:23204646
Relational Database Technology: An Overview.
ERIC Educational Resources Information Center
Melander, Nicole
1987-01-01
Describes the development of relational database technology as it applies to educational settings. Discusses some of the new tools and models being implemented in an effort to provide educators with technologically advanced ways of answering questions about education programs and data. (TW)
NASA Astrophysics Data System (ADS)
Gentry, Jeffery D.
2000-05-01
A relational database is a powerful tool for collecting and analyzing the vast amounts of inner-related data associated with the manufacture of composite materials. A relational database contains many individual database tables that store data that are related in some fashion. Manufacturing process variables as well as quality assurance measurements can be collected and stored in database tables indexed according to lot numbers, part type or individual serial numbers. Relationships between manufacturing process and product quality can then be correlated over a wide range of product types and process variations. This paper presents details on how relational databases are used to collect, store, and analyze process variables and quality assurance data associated with the manufacture of advanced composite materials. Important considerations are covered including how the various types of data are organized and how relationships between the data are defined. Employing relational database techniques to establish correlative relationships between process variables and quality assurance measurements is then explored. Finally, the benefits of database techniques such as data warehousing, data mining and web based client/server architectures are discussed in the context of composite material manufacturing.
A Relational Encoding of a Conceptual Model with Multiple Temporal Dimensions
NASA Astrophysics Data System (ADS)
Gubiani, Donatella; Montanari, Angelo
The theoretical interest and the practical relevance of a systematic treatment of multiple temporal dimensions is widely recognized in the database and information system communities. Nevertheless, most relational databases have no temporal support at all. A few of them provide a limited support, in terms of temporal data types and predicates, constructors, and functions for the management of time values (borrowed from the SQL standard). One (resp., two) temporal dimensions are supported by historical and transaction-time (resp., bitemporal) databases only. In this paper, we provide a relational encoding of a conceptual model featuring four temporal dimensions, namely, the classical valid and transaction times, plus the event and availability times. We focus our attention on the distinctive technical features of the proposed temporal extension of the relation model. In the last part of the paper, we briefly show how to implement it in a standard DBMS.
BIOSPIDA: A Relational Database Translator for NCBI
Hagen, Matthew S.; Lee, Eva K.
2010-01-01
As the volume and availability of biological databases continue widespread growth, it has become increasingly difficult for research scientists to identify all relevant information for biological entities of interest. Details of nucleotide sequences, gene expression, molecular interactions, and three-dimensional structures are maintained across many different databases. To retrieve all necessary information requires an integrated system that can query multiple databases with minimized overhead. This paper introduces a universal parser and relational schema translator that can be utilized for all NCBI databases in Abstract Syntax Notation (ASN.1). The data models for OMIM, Entrez-Gene, Pubmed, MMDB and GenBank have been successfully converted into relational databases and all are easily linkable helping to answer complex biological questions. These tools facilitate research scientists to locally integrate databases from NCBI without significant workload or development time. PMID:21347013
Lee, Ken Ka-Yin; Tang, Wai-Choi; Choi, Kup-Sze
2013-04-01
Clinical data are dynamic in nature, often arranged hierarchically and stored as free text and numbers. Effective management of clinical data and the transformation of the data into structured format for data analysis are therefore challenging issues in electronic health records development. Despite the popularity of relational databases, the scalability of the NoSQL database model and the document-centric data structure of XML databases appear to be promising features for effective clinical data management. In this paper, three database approaches--NoSQL, XML-enabled and native XML--are investigated to evaluate their suitability for structured clinical data. The database query performance is reported, together with our experience in the databases development. The results show that NoSQL database is the best choice for query speed, whereas XML databases are advantageous in terms of scalability, flexibility and extensibility, which are essential to cope with the characteristics of clinical data. While NoSQL and XML technologies are relatively new compared to the conventional relational database, both of them demonstrate potential to become a key database technology for clinical data management as the technology further advances. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Evaluation of relational and NoSQL database architectures to manage genomic annotations.
Schulz, Wade L; Nelson, Brent G; Felker, Donn K; Durant, Thomas J S; Torres, Richard
2016-12-01
While the adoption of next generation sequencing has rapidly expanded, the informatics infrastructure used to manage the data generated by this technology has not kept pace. Historically, relational databases have provided much of the framework for data storage and retrieval. Newer technologies based on NoSQL architectures may provide significant advantages in storage and query efficiency, thereby reducing the cost of data management. But their relative advantage when applied to biomedical data sets, such as genetic data, has not been characterized. To this end, we compared the storage, indexing, and query efficiency of a common relational database (MySQL), a document-oriented NoSQL database (MongoDB), and a relational database with NoSQL support (PostgreSQL). When used to store genomic annotations from the dbSNP database, we found the NoSQL architectures to outperform traditional, relational models for speed of data storage, indexing, and query retrieval in nearly every operation. These findings strongly support the use of novel database technologies to improve the efficiency of data management within the biological sciences. Copyright © 2016 Elsevier Inc. All rights reserved.
GraQL: A Query Language for High-Performance Attributed Graph Databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chavarría-Miranda, Daniel; Castellana, Vito G.; Morari, Alessandro
Graph databases have gained increasing interest in the last few years due to the emergence of data sources which are not easily analyzable in traditional relational models or for which a graph data model is the natural representation. In order to understand the design and implementation choices for an attributed graph database backend and query language, we have started to design our infrastructure for attributed graph databases. In this paper, we describe the design considerations of our in-memory attributed graph database system with a particular focus on the data definition and query language components.
NASA Technical Reports Server (NTRS)
Campbell, William J.; Short, Nicholas M., Jr.; Roelofs, Larry H.; Dorfman, Erik
1991-01-01
A methodology for optimizing organization of data obtained by NASA earth and space missions is discussed. The methodology uses a concept based on semantic data modeling techniques implemented in a hierarchical storage model. The modeling is used to organize objects in mass storage devices, relational database systems, and object-oriented databases. The semantic data modeling at the metadata record level is examined, including the simulation of a knowledge base and semantic metadata storage issues. The semantic data model hierarchy and its application for efficient data storage is addressed, as is the mapping of the application structure to the mass storage.
2010-06-11
MODELING WITH IMPLEMENTED GBI AND MD DATA (STEADY STATE GB MIGRATION) PAGE 48 5. FORMATION AND ANALYSIS OF GB PROPERTIES DATABASE PAGE 53 5.1...Relative GB energy for specified GBM averaged on possible GBIs PAGE 53 5.2. Database validation on available experimental data PAGE 56 5.3. Comparison...PAGE 70 Fig. 6.11. MC Potts Rex. and GG software: (a) modeling volume analysis; (b) searching for GB energy value within included database . PAGE
A case study for a digital seabed database: Bohai Sea engineering geology database
NASA Astrophysics Data System (ADS)
Tianyun, Su; Shikui, Zhai; Baohua, Liu; Ruicai, Liang; Yanpeng, Zheng; Yong, Wang
2006-07-01
This paper discusses the designing plan of ORACLE-based Bohai Sea engineering geology database structure from requisition analysis, conceptual structure analysis, logical structure analysis, physical structure analysis and security designing. In the study, we used the object-oriented Unified Modeling Language (UML) to model the conceptual structure of the database and used the powerful function of data management which the object-oriented and relational database ORACLE provides to organize and manage the storage space and improve its security performance. By this means, the database can provide rapid and highly effective performance in data storage, maintenance and query to satisfy the application requisition of the Bohai Sea Oilfield Paradigm Area Information System.
The methodology of database design in organization management systems
NASA Astrophysics Data System (ADS)
Chudinov, I. L.; Osipova, V. V.; Bobrova, Y. V.
2017-01-01
The paper describes the unified methodology of database design for management information systems. Designing the conceptual information model for the domain area is the most important and labor-intensive stage in database design. Basing on the proposed integrated approach to design, the conceptual information model, the main principles of developing the relation databases are provided and user’s information needs are considered. According to the methodology, the process of designing the conceptual information model includes three basic stages, which are defined in detail. Finally, the article describes the process of performing the results of analyzing user’s information needs and the rationale for use of classifiers.
Moskalev, Alexey; Chernyagina, Elizaveta; de Magalhães, João Pedro; Barardo, Diogo; Thoppil, Harikrishnan; Shaposhnikov, Mikhail; Budovsky, Arie; Fraifeld, Vadim E; Garazha, Andrew; Tsvetkov, Vasily; Bronovitsky, Evgeny; Bogomolov, Vladislav; Scerbacov, Alexei; Kuryan, Oleg; Gurinovich, Roman; Jellen, Leslie C; Kennedy, Brian; Mamoshina, Polina; Dobrovolskaya, Evgeniya; Aliper, Alex; Kaminsky, Dmitry; Zhavoronkov, Alex
2015-09-01
As the level of interest in aging research increases, there is a growing number of geroprotectors, or therapeutic interventions that aim to extend the healthy lifespan and repair or reduce aging-related damage in model organisms and, eventually, in humans. There is a clear need for a manually-curated database of geroprotectors to compile and index their effects on aging and age-related diseases and link these effects to relevant studies and multiple biochemical and drug databases. Here, we introduce the first such resource, Geroprotectors (http://geroprotectors.org). Geroprotectors is a public, rapidly explorable database that catalogs over 250 experiments involving over 200 known or candidate geroprotectors that extend lifespan in model organisms. Each compound has a comprehensive profile complete with biochemistry, mechanisms, and lifespan effects in various model organisms, along with information ranging from chemical structure, side effects, and toxicity to FDA drug status. These are presented in a visually intuitive, efficient framework fit for casual browsing or in-depth research alike. Data are linked to the source studies or databases, providing quick and convenient access to original data. The Geroprotectors database facilitates cross-study, cross-organism, and cross-discipline analysis and saves countless hours of inefficient literature and web searching. Geroprotectors is a one-stop, knowledge-sharing, time-saving resource for researchers seeking healthy aging solutions.
Moskalev, Alexey; Chernyagina, Elizaveta; de Magalhães, João Pedro; Barardo, Diogo; Thoppil, Harikrishnan; Shaposhnikov, Mikhail; Budovsky, Arie; Fraifeld, Vadim E.; Garazha, Andrew; Tsvetkov, Vasily; Bronovitsky, Evgeny; Bogomolov, Vladislav; Scerbacov, Alexei; Kuryan, Oleg; Gurinovich, Roman; Jellen, Leslie C.; Kennedy, Brian; Mamoshina, Polina; Dobrovolskaya, Evgeniya; Aliper, Alex; Kaminsky, Dmitry; Zhavoronkov, Alex
2015-01-01
As the level of interest in aging research increases, there is a growing number of geroprotectors, or therapeutic interventions that aim to extend the healthy lifespan and repair or reduce aging-related damage in model organisms and, eventually, in humans. There is a clear need for a manually-curated database of geroprotectors to compile and index their effects on aging and age-related diseases and link these effects to relevant studies and multiple biochemical and drug databases. Here, we introduce the first such resource, Geroprotectors (http://geroprotectors.org). Geroprotectors is a public, rapidly explorable database that catalogs over 250 experiments involving over 200 known or candidate geroprotectors that extend lifespan in model organisms. Each compound has a comprehensive profile complete with biochemistry, mechanisms, and lifespan effects in various model organisms, along with information ranging from chemical structure, side effects, and toxicity to FDA drug status. These are presented in a visually intuitive, efficient framework fit for casual browsing or in-depth research alike. Data are linked to the source studies or databases, providing quick and convenient access to original data. The Geroprotectors database facilitates cross-study, cross-organism, and cross-discipline analysis and saves countless hours of inefficient literature and web searching. Geroprotectors is a one-stop, knowledge-sharing, time-saving resource for researchers seeking healthy aging solutions. PMID:26342919
Chen, Yi- Ping Phoebe; Hanan, Jim
2002-01-01
Models of plant architecture allow us to explore how genotype environment interactions effect the development of plant phenotypes. Such models generate masses of data organised in complex hierarchies. This paper presents a generic system for creating and automatically populating a relational database from data generated by the widely used L-system approach to modelling plant morphogenesis. Techniques from compiler technology are applied to generate attributes (new fields) in the database, to simplify query development for the recursively-structured branching relationship. Use of biological terminology in an interactive query builder contributes towards making the system biologist-friendly.
Architecture Knowledge for Evaluating Scalable Databases
2015-01-16
problems, arising from the proliferation of new data models and distributed technologies for building scalable, available data stores . Architects must...longer are relational databases the de facto standard for building data repositories. Highly distributed, scalable “ NoSQL ” databases [11] have emerged...This is especially challenging at the data storage layer. The multitude of competing NoSQL database technologies creates a complex and rapidly
The CEBAF Element Database and Related Operational Software
DOE Office of Scientific and Technical Information (OSTI.GOV)
Larrieu, Theodore; Slominski, Christopher; Keesee, Marie
The newly commissioned 12GeV CEBAF accelerator relies on a flexible, scalable and comprehensive database to define the accelerator. This database delivers the configuration for CEBAF operational tools, including hardware checkout, the downloadable optics model, control screens, and much more. The presentation will describe the flexible design of the CEBAF Element Database (CED), its features and assorted use case examples.
Legehar, Ashenafi; Xhaard, Henri; Ghemtio, Leo
2016-01-01
The disposition of a pharmaceutical compound within an organism, i.e. its Absorption, Distribution, Metabolism, Excretion, Toxicity (ADMET) properties and adverse effects, critically affects late stage failure of drug candidates and has led to the withdrawal of approved drugs. Computational methods are effective approaches to reduce the number of safety issues by analyzing possible links between chemical structures and ADMET or adverse effects, but this is limited by the size, quality, and heterogeneity of the data available from individual sources. Thus, large, clean and integrated databases of approved drug data, associated with fast and efficient predictive tools are desirable early in the drug discovery process. We have built a relational database (IDAAPM) to integrate available approved drug data such as drug approval information, ADMET and adverse effects, chemical structures and molecular descriptors, targets, bioactivity and related references. The database has been coupled with a searchable web interface and modern data analytics platform (KNIME) to allow data access, data transformation, initial analysis and further predictive modeling. Data were extracted from FDA resources and supplemented from other publicly available databases. Currently, the database contains information regarding about 19,226 FDA approval applications for 31,815 products (small molecules and biologics) with their approval history, 2505 active ingredients, together with as many ADMET properties, 1629 molecular structures, 2.5 million adverse effects and 36,963 experimental drug-target bioactivity data. IDAAPM is a unique resource that, in a single relational database, provides detailed information on FDA approved drugs including their ADMET properties and adverse effects, the corresponding targets with bioactivity data, coupled with a data analytics platform. It can be used to perform basic to complex drug-target ADMET or adverse effects analysis and predictive modeling. IDAAPM is freely accessible at http://idaapm.helsinki.fi and can be exploited through a KNIME workflow connected to the database.Graphical abstractFDA approved drug data integration for predictive modeling.
Charoute, Hicham; Nahili, Halima; Abidi, Omar; Gabi, Khalid; Rouba, Hassan; Fakiri, Malika; Barakat, Abdelhamid
2014-03-01
National and ethnic mutation databases provide comprehensive information about genetic variations reported in a population or an ethnic group. In this paper, we present the Moroccan Genetic Disease Database (MGDD), a catalogue of genetic data related to diseases identified in the Moroccan population. We used the PubMed, Web of Science and Google Scholar databases to identify available articles published until April 2013. The Database is designed and implemented on a three-tier model using Mysql relational database and the PHP programming language. To date, the database contains 425 mutations and 208 polymorphisms found in 301 genes and 259 diseases. Most Mendelian diseases in the Moroccan population follow autosomal recessive mode of inheritance (74.17%) and affect endocrine, nutritional and metabolic physiology. The MGDD database provides reference information for researchers, clinicians and health professionals through a user-friendly Web interface. Its content should be useful to improve researches in human molecular genetics, disease diagnoses and design of association studies. MGDD can be publicly accessed at http://mgdd.pasteur.ma.
Re-thinking organisms: The impact of databases on model organism biology.
Leonelli, Sabina; Ankeny, Rachel A
2012-03-01
Community databases have become crucial to the collection, ordering and retrieval of data gathered on model organisms, as well as to the ways in which these data are interpreted and used across a range of research contexts. This paper analyses the impact of community databases on research practices in model organism biology by focusing on the history and current use of four community databases: FlyBase, Mouse Genome Informatics, WormBase and The Arabidopsis Information Resource. We discuss the standards used by the curators of these databases for what counts as reliable evidence, acceptable terminology, appropriate experimental set-ups and adequate materials (e.g., specimens). On the one hand, these choices are informed by the collaborative research ethos characterising most model organism communities. On the other hand, the deployment of these standards in databases reinforces this ethos and gives it concrete and precise instantiations by shaping the skills, practices, values and background knowledge required of the database users. We conclude that the increasing reliance on community databases as vehicles to circulate data is having a major impact on how researchers conduct and communicate their research, which affects how they understand the biology of model organisms and its relation to the biology of other species. Copyright © 2011 Elsevier Ltd. All rights reserved.
1987-12-01
Application Programs Intelligent Disk Database Controller Manangement System Operating System Host .1’ I% Figure 2. Intelligent Disk Controller Application...8217. /- - • Database Control -% Manangement System Disk Data Controller Application Programs Operating Host I"" Figure 5. Processor-Per- Head data. Therefore, the...However. these ad- ditional properties have been proven in classical set and relation theory [75]. These additional properties are described here
Zhang, Y J; Zhou, D H; Bai, Z P; Xue, F X
2018-02-10
Objective: To quantitatively analyze the current status and development trends regarding the land use regression (LUR) models on ambient air pollution studies. Methods: Relevant literature from the PubMed database before June 30, 2017 was analyzed, using the Bibliographic Items Co-occurrence Matrix Builder (BICOMB 2.0). Keywords co-occurrence networks, cluster mapping and timeline mapping were generated, using the CiteSpace 5.1.R5 software. Relevant literature identified in three Chinese databases was also reviewed. Results: Four hundred sixty four relevant papers were retrieved from the PubMed database. The number of papers published showed an annual increase, in line with the growing trend of the index. Most papers were published in the journal of Environmental Health Perspectives . Results from the Co-word cluster analysis identified five clusters: cluster#0 consisted of birth cohort studies related to the health effects of prenatal exposure to air pollution; cluster#1 referred to land use regression modeling and exposure assessment; cluster#2 was related to the epidemiology on traffic exposure; cluster#3 dealt with the exposure to ultrafine particles and related health effects; cluster#4 described the exposure to black carbon and related health effects. Data from Timeline mapping indicated that cluster#0 and#1 were the main research areas while cluster#3 and#4 were the up-coming hot areas of research. Ninety four relevant papers were retrieved from the Chinese databases with most of them related to studies on modeling. Conclusion: In order to better assess the health-related risks of ambient air pollution, and to best inform preventative public health intervention policies, application of LUR models to environmental epidemiology studies in China should be encouraged.
A data model and database for high-resolution pathology analytical image informatics.
Wang, Fusheng; Kong, Jun; Cooper, Lee; Pan, Tony; Kurc, Tahsin; Chen, Wenjin; Sharma, Ashish; Niedermayr, Cristobal; Oh, Tae W; Brat, Daniel; Farris, Alton B; Foran, David J; Saltz, Joel
2011-01-01
The systematic analysis of imaged pathology specimens often results in a vast amount of morphological information at both the cellular and sub-cellular scales. While microscopy scanners and computerized analysis are capable of capturing and analyzing data rapidly, microscopy image data remain underutilized in research and clinical settings. One major obstacle which tends to reduce wider adoption of these new technologies throughout the clinical and scientific communities is the challenge of managing, querying, and integrating the vast amounts of data resulting from the analysis of large digital pathology datasets. This paper presents a data model, which addresses these challenges, and demonstrates its implementation in a relational database system. This paper describes a data model, referred to as Pathology Analytic Imaging Standards (PAIS), and a database implementation, which are designed to support the data management and query requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines on whole-slide images and tissue microarrays (TMAs). (1) Development of a data model capable of efficiently representing and storing virtual slide related image, annotation, markup, and feature information. (2) Development of a database, based on the data model, capable of supporting queries for data retrieval based on analysis and image metadata, queries for comparison of results from different analyses, and spatial queries on segmented regions, features, and classified objects. The work described in this paper is motivated by the challenges associated with characterization of micro-scale features for comparative and correlative analyses involving whole-slides tissue images and TMAs. Technologies for digitizing tissues have advanced significantly in the past decade. Slide scanners are capable of producing high-magnification, high-resolution images from whole slides and TMAs within several minutes. Hence, it is becoming increasingly feasible for basic, clinical, and translational research studies to produce thousands of whole-slide images. Systematic analysis of these large datasets requires efficient data management support for representing and indexing results from hundreds of interrelated analyses generating very large volumes of quantifications such as shape and texture and of classifications of the quantified features. We have designed a data model and a database to address the data management requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines. The data model represents virtual slide related image, annotation, markup and feature information. The database supports a wide range of metadata and spatial queries on images, annotations, markups, and features. We currently have three databases running on a Dell PowerEdge T410 server with CentOS 5.5 Linux operating system. The database server is IBM DB2 Enterprise Edition 9.7.2. The set of databases consists of 1) a TMA database containing image analysis results from 4740 cases of breast cancer, with 641 MB storage size; 2) an algorithm validation database, which stores markups and annotations from two segmentation algorithms and two parameter sets on 18 selected slides, with 66 GB storage size; and 3) an in silico brain tumor study database comprising results from 307 TCGA slides, with 365 GB storage size. The latter two databases also contain human-generated annotations and markups for regions and nuclei. Modeling and managing pathology image analysis results in a database provide immediate benefits on the value and usability of data in a research study. The database provides powerful query capabilities, which are otherwise difficult or cumbersome to support by other approaches such as programming languages. Standardized, semantic annotated data representation and interfaces also make it possible to more efficiently share image data and analysis results.
Performance related issues in distributed database systems
NASA Technical Reports Server (NTRS)
Mukkamala, Ravi
1991-01-01
The key elements of research performed during the year long effort of this project are: Investigate the effects of heterogeneity in distributed real time systems; Study the requirements to TRAC towards building a heterogeneous database system; Study the effects of performance modeling on distributed database performance; and Experiment with an ORACLE based heterogeneous system.
Report on Approaches to Database Translation. Final Report.
ERIC Educational Resources Information Center
Gallagher, Leonard; Salazar, Sandra
This report describes approaches to database translation (i.e., transferring data and data definitions from a source, either a database management system (DBMS) or a batch file, to a target DBMS), and recommends a method for representing the data structures of newly-proposed network and relational data models in a form suitable for database…
A Community Data Model for Hydrologic Observations
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Horsburgh, J. S.; Zaslavsky, I.; Maidment, D. R.; Valentine, D.; Jennings, B.
2006-12-01
The CUAHSI Hydrologic Information System project is developing information technology infrastructure to support hydrologic science. Hydrologic information science involves the description of hydrologic environments in a consistent way, using data models for information integration. This includes a hydrologic observations data model for the storage and retrieval of hydrologic observations in a relational database designed to facilitate data retrieval for integrated analysis of information collected by multiple investigators. It is intended to provide a standard format to facilitate the effective sharing of information between investigators and to facilitate analysis of information within a single study area or hydrologic observatory, or across hydrologic observatories and regions. The observations data model is designed to store hydrologic observations and sufficient ancillary information (metadata) about the observations to allow them to be unambiguously interpreted and used and provide traceable heritage from raw measurements to usable information. The design is based on the premise that a relational database at the single observation level is most effective for providing querying capability and cross dimension data retrieval and analysis. This premise is being tested through the implementation of a prototype hydrologic observations database, and the development of web services for the retrieval of data from and ingestion of data into the database. These web services hosted by the San Diego Supercomputer center make data in the database accessible both through a Hydrologic Data Access System portal and directly from applications software such as Excel, Matlab and ArcGIS that have Standard Object Access Protocol (SOAP) capability. This paper will (1) describe the data model; (2) demonstrate the capability for representing diverse data in the same database; (3) demonstrate the use of the database from applications software for the performance of hydrologic analysis across different observation types.
Migration of legacy mumps applications to relational database servers.
O'Kane, K C
2001-07-01
An extended implementation of the Mumps language is described that facilitates vendor neutral migration of legacy Mumps applications to SQL-based relational database servers. Implemented as a compiler, this system translates Mumps programs to operating system independent, standard C code for subsequent compilation to fully stand-alone, binary executables. Added built-in functions and support modules extend the native hierarchical Mumps database with access to industry standard, networked, relational database management servers (RDBMS) thus freeing Mumps applications from dependence upon vendor specific, proprietary, unstandardized database models. Unlike Mumps systems that have added captive, proprietary RDMBS access, the programs generated by this development environment can be used with any RDBMS system that supports common network access protocols. Additional features include a built-in web server interface and the ability to interoperate directly with programs and functions written in other languages.
The risk of paradoxical embolism (RoPE) study: initial description of the completed database.
Thaler, David E; Di Angelantonio, Emanuele; Di Tullio, Marco R; Donovan, Jennifer S; Griffith, John; Homma, Shunichi; Jaigobin, Cheryl; Mas, Jean-Louis; Mattle, Heinrich P; Michel, Patrik; Mono, Marie-Luise; Nedeltchev, Krassen; Papetti, Federica; Ruthazer, Robin; Serena, Joaquín; Weimar, Christian; Elkind, Mitchell S V; Kent, David M
2013-12-01
Detecting a benefit from closure of patent foramen ovale in patients with cryptogenic stroke is hampered by low rates of stroke recurrence and uncertainty about the causal role of patent foramen ovale in the index event. A method to predict patent foramen ovale-attributable recurrence risk is needed. However, individual databases generally have too few stroke recurrences to support risk modeling. Prior studies of this population have been limited by low statistical power for examining factors related to recurrence. The aim of this study was to develop a database to support modeling of patent foramen ovale-attributable recurrence risk by combining extant data sets. We identified investigators with extant databases including subjects with cryptogenic stroke investigated for patent foramen ovale, determined the availability and characteristics of data in each database, collaboratively specified the variables to be included in the Risk of Paradoxical Embolism database, harmonized the variables across databases, and collected new primary data when necessary and feasible. The Risk of Paradoxical Embolism database has individual clinical, radiologic, and echocardiographic data from 12 component databases, including subjects with cryptogenic stroke both with (n = 1925) and without (n = 1749) patent foramen ovale. In the patent foramen ovale subjects, a total of 381 outcomes (stroke, transient ischemic attack, death) occurred (median follow-up 2·2 years). While there were substantial variations in data collection between studies, there was sufficient overlap to define a common set of variables suitable for risk modeling. While individual studies are inadequate for modeling patent foramen ovale-attributable recurrence risk, collaboration between investigators has yielded a database with sufficient power to identify those patients at highest risk for a patent foramen ovale-related stroke recurrence who may have the greatest potential benefit from patent foramen ovale closure. © 2012 The Authors. International Journal of Stroke © 2012 World Stroke Organization.
2012-09-01
relative performance of several conventional SQL and NoSQL databases with a set of one billion file block hashes. Digital Forensics, Sector Hashing, Full... NoSQL databases with a set of one billion file block hashes. v THIS PAGE INTENTIONALLY LEFT BLANK vi Table of Contents List of Acronyms and...Operating System NOOP No Operation assembly instruction NoSQL “Not only SQL” model for non-relational database management NSRL National Software
Freire, Sergio Miranda; Teodoro, Douglas; Wei-Kleiner, Fang; Sundvall, Erik; Karlsson, Daniel; Lambrix, Patrick
2016-01-01
This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest. PMID:26958859
Freire, Sergio Miranda; Teodoro, Douglas; Wei-Kleiner, Fang; Sundvall, Erik; Karlsson, Daniel; Lambrix, Patrick
2016-01-01
This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Quock, D. E. R.; Cianciarulo, M. B.; APS Engineering Support Division
2007-01-01
The Integrated Relational Model of Installed Systems (IRMIS) is a relational database tool that has been implemented at the Advanced Photon Source to maintain an updated account of approximately 600 control system software applications, 400,000 process variables, and 30,000 control system hardware components. To effectively display this large amount of control system information to operators and engineers, IRMIS was initially built with nine Web-based viewers: Applications Organizing Index, IOC, PLC, Component Type, Installed Components, Network, Controls Spares, Process Variables, and Cables. However, since each viewer is designed to provide details from only one major category of the control system, themore » necessity for a one-stop global search tool for the entire database became apparent. The user requirements for extremely fast database search time and ease of navigation through search results led to the choice of Asynchronous JavaScript and XML (AJAX) technology in the implementation of the IRMIS global search tool. Unique features of the global search tool include a two-tier level of displayed search results, and a database data integrity validation and reporting mechanism.« less
Singh, Vinay Kumar; Ambwani, Sonu; Marla, Soma; Kumar, Anil
2009-10-23
We describe the development of a user friendly tool that would assist in the retrieval of information relating to Cry genes in transgenic crops. The tool also helps in detection of transformed Cry genes from Bacillus thuringiensis present in transgenic plants by providing suitable designed primers for PCR identification of these genes. The tool designed based on relational database model enables easy retrieval of information from the database with simple user queries. The tool also enables users to access related information about Cry genes present in various databases by interacting with different sources (nucleotide sequences, protein sequence, sequence comparison tools, published literature, conserved domains, evolutionary and structural data). http://insilicogenomics.in/Cry-btIdentifier/welcome.html.
PS1-41: Just Add Data: Implementing an Event-Based Data Model for Clinical Trial Tracking
Fuller, Sharon; Carrell, David; Pardee, Roy
2012-01-01
Background/Aims Clinical research trials often have similar fundamental tracking needs, despite being quite variable in their specific logic and activities. A model tracking database that can be quickly adapted by a variety of studies has the potential to achieve significant efficiencies in database development and maintenance. Methods Over the course of several different clinical trials, we have developed a database model that is highly adaptable to a variety of projects. Rather than hard-coding each specific event that might occur in a trial, along with its logical consequences, this model considers each event and its parameters to be a data record in its own right. Each event may have related variables (metadata) describing its prerequisites, subsequent events due, associated mailings, or events that it overrides. The metadata for each event is stored in the same record with the event name. When changes are made to the study protocol, no structural changes to the database are needed. One has only to add or edit events and their metadata. Changes in the event metadata automatically determine any related logic changes. In addition to streamlining application code, this model simplifies communication between the programmer and other team members. Database requirements can be phrased as changes to the underlying data, rather than to the application code. The project team can review a single report of events and metadata and easily see where changes might be needed. In addition to benefitting from streamlined code, the front end database application can also implement useful standard features such as automated mail merges and to do lists. Results The event-based data model has proven itself to be robust, adaptable and user-friendly in a variety of study contexts. We have chosen to implement it as a SQL Server back end and distributed Access front end. Interested readers may request a copy of the Access front end and scripts for creating the back end database. Discussion An event-based database with a consistent, robust set of features has the potential to significantly reduce development time and maintenance expense for clinical trial tracking databases.
A spatial-temporal system for dynamic cadastral management.
Nan, Liu; Renyi, Liu; Guangliang, Zhu; Jiong, Xie
2006-03-01
A practical spatio-temporal database (STDB) technique for dynamic urban land management is presented. One of the STDB models, the expanded model of Base State with Amendments (BSA), is selected as the basis for developing the dynamic cadastral management technique. Two approaches, the Section Fast Indexing (SFI) and the Storage Factors of Variable Granularity (SFVG), are used to improve the efficiency of the BSA model. Both spatial graphic data and attribute data, through a succinct engine, are stored in standard relational database management systems (RDBMS) for the actual implementation of the BSA model. The spatio-temporal database is divided into three interdependent sub-databases: present DB, history DB and the procedures-tracing DB. The efficiency of database operation is improved by the database connection in the bottom layer of the Microsoft SQL Server. The spatio-temporal system can be provided at a low-cost while satisfying the basic needs of urban land management in China. The approaches presented in this paper may also be of significance to countries where land patterns change frequently or to agencies where financial resources are limited.
NASA Astrophysics Data System (ADS)
Thakore, Arun K.; Sauer, Frank
1994-05-01
The organization of modern medical care environments into disease-related clusters, such as a cancer center, a diabetes clinic, etc., has the side-effect of introducing multiple heterogeneous databases, often containing similar information, within the same organization. This heterogeneity fosters incompatibility and prevents the effective sharing of data amongst applications at different sites. Although integration of heterogeneous databases is now feasible, in the medical arena this is often an ad hoc process, not founded on proven database technology or formal methods. In this paper we illustrate the use of a high-level object- oriented semantic association method to model information found in different databases into an integrated conceptual global model that integrates the databases. We provide examples from the medical domain to illustrate an integration approach resulting in a consistent global view, without attacking the autonomy of the underlying databases.
Efficient hemodynamic event detection utilizing relational databases and wavelet analysis
NASA Technical Reports Server (NTRS)
Saeed, M.; Mark, R. G.
2001-01-01
Development of a temporal query framework for time-oriented medical databases has hitherto been a challenging problem. We describe a novel method for the detection of hemodynamic events in multiparameter trends utilizing wavelet coefficients in a MySQL relational database. Storage of the wavelet coefficients allowed for a compact representation of the trends, and provided robust descriptors for the dynamics of the parameter time series. A data model was developed to allow for simplified queries along several dimensions and time scales. Of particular importance, the data model and wavelet framework allowed for queries to be processed with minimal table-join operations. A web-based search engine was developed to allow for user-defined queries. Typical queries required between 0.01 and 0.02 seconds, with at least two orders of magnitude improvement in speed over conventional queries. This powerful and innovative structure will facilitate research on large-scale time-oriented medical databases.
NASA Astrophysics Data System (ADS)
Michel-Sendis, Franco; Martinez-González, Jesus; Gauld, Ian
2017-09-01
SFCOMPO-2.0 is a database of experimental isotopic concentrations measured in destructive radiochemical analysis of spent nuclear fuel (SNF) samples. The database includes corresponding design description of the fuel rods and assemblies, relevant operating conditions and characteristics of the host reactors necessary for modelling and simulation. Aimed at establishing a thorough, reliable, and publicly available resource for code and data validation of safety-related applications, SFCOMPO-2.0 is developed and maintained by the OECD Nuclear Energy Agency (NEA). The SFCOMPO-2.0 database is a Java application which is downloadable from the NEA website.
Goverman, Jeremy; Mathews, Katie; Holavanahalli, Radha K; Vardanian, Andrew; Herndon, David N; Meyer, Walter J; Kowalske, Karen; Fauerbach, Jim; Gibran, Nicole S; Carrougher, Gretchen J; Amtmann, Dagmar; Schneider, Jeffrey C; Ryan, Colleen M
The National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR) established the Burn Model System (BMS) in 1993 to improve the lives of burn survivors. The BMS program includes 1) a multicenter longitudinal database describing the functional and psychosocial recovery of burn survivors; 2) site-specific burn-related research; and 3) a knowledge dissemination component directed toward patients and providers. Output from each BMS component was analyzed. Database structure, content, and access procedures are described. Publications using the database were identified and categorized to illustrate the content area of the work. Unused areas of the database were identified for future study. Publications related to site-specific projects were cataloged. The most frequently cited articles are summarized to illustrate the scope of these projects. The effectiveness of dissemination activities was measured by quantifying website hits and information downloads. There were 25 NIDILRR-supported publications that utilized the database. These articles covered topics related to psychological outcomes, functional outcomes, community reintegration, and burn demographics. There were 172 site-specific publications; highly cited articles demonstrate a wide scope of study. For information dissemination, visits to the BMS website quadrupled between 2013 and 2014, with 124,063 downloads of educational material in 2014. The NIDILRR BMS program has played a major role in defining the course of burn recovery, and making that information accessible to the general public. The accumulating information in the database serves as a rich resource to the burn community for future study. The BMS is a model for collaborative research that is multidisciplinary and outcome focused.
Kang, Young Gon; Suh, Eunkyung; Lee, Jae-woo; Kim, Dong Wook; Cho, Kyung Hee; Bae, Chul-Young
2018-01-01
Purpose A comprehensive health index is needed to measure an individual’s overall health and aging status and predict the risk of death and age-related disease incidence, and evaluate the effect of a health management program. The purpose of this study is to demonstrate the validity of estimated biological age (BA) in relation to all-cause mortality and age-related disease incidence based on National Sample Cohort database. Patients and methods This study was based on National Sample Cohort database of the National Health Insurance Service – Eligibility database and the National Health Insurance Service – Medical and Health Examination database of the year 2002 through 2013. BA model was developed based on the National Health Insurance Service – National Sample Cohort (NHIS – NSC) database and Cox proportional hazard analysis was done for mortality and major age-related disease incidence. Results For every 1 year increase of the calculated BA and chronological age difference, the hazard ratio for mortality significantly increased by 1.6% (1.5% in men and 2.0% in women) and also for hypertension, diabetes mellitus, heart disease, stroke, and cancer incidence by 2.5%, 4.2%, 1.3%, 1.6%, and 0.4%, respectively (p<0.001). Conclusion Estimated BA by the developed BA model based on NHIS – NSC database is expected to be used not only as an index for assessing health and aging status and predicting mortality and major age-related disease incidence, but can also be applied to various health care fields. PMID:29593385
Human Exposure Modeling - Databases to Support Exposure Modeling
Human exposure modeling relates pollutant concentrations in the larger environmental media to pollutant concentrations in the immediate exposure media. The models described here are available on other EPA websites.
Supporting user-defined granularities in a spatiotemporal conceptual model
Khatri, V.; Ram, S.; Snodgrass, R.T.; O'Brien, G. M.
2002-01-01
Granularities are integral to spatial and temporal data. A large number of applications require storage of facts along with their temporal and spatial context, which needs to be expressed in terms of appropriate granularities. For many real-world applications, a single granularity in the database is insufficient. In order to support any type of spatial or temporal reasoning, the semantics related to granularities needs to be embedded in the database. Specifying granularities related to facts is an important part of conceptual database design because under-specifying the granularity can restrict an application, affect the relative ordering of events and impact the topological relationships. Closely related to granularities is indeterminacy, i.e., an occurrence time or location associated with a fact that is not known exactly. In this paper, we present an ontology for spatial granularities that is a natural analog of temporal granularities. We propose an upward-compatible, annotation-based spatiotemporal conceptual model that can comprehensively capture the semantics related to spatial and temporal granularities, and indeterminacy without requiring new spatiotemporal constructs. We specify the formal semantics of this spatiotemporal conceptual model via translation to a conventional conceptual model. To underscore the practical focus of our approach, we describe an on-going case study. We apply our approach to a hydrogeologic application at the United States Geologic Survey and demonstrate that our proposed granularity-based spatiotemporal conceptual model is straightforward to use and is comprehensive.
Orris, Greta J.; Cocker, Mark D.; Dunlap, Pamela; Wynn, Jeff C.; Spanski, Gregory T.; Briggs, Deborah A.; Gass, Leila; Bliss, James D.; Bolm, Karen S.; Yang, Chao; Lipin, Bruce R.; Ludington, Stephen; Miller, Robert J.; Słowakiewicz, Mirosław
2014-01-01
This report describes a global, evaporite-related potash deposits and occurrences database and a potash tracts database. Chapter 1 summarizes potash resource history and use. Chapter 2 describes a global potash deposits and occurrences database, which contains more than 900 site records. Chapter 3 describes a potash tracts database, which contains 84 tracts with geology permissive for the presence of evaporite-hosted potash resources, including areas with active evaporite-related potash production, areas with known mineralization that has not been quantified or exploited, and areas with potential for undiscovered potash resources. Chapter 4 describes geographic information system (GIS) data files that include (1) potash deposits and occurrences data, (2) potash tract data, (3) reference databases for potash deposit and tract data, and (4) representative graphics of geologic features related to potash tracts and deposits. Summary descriptive models for stratabound potash-bearing salt and halokinetic potash-bearing salt are included in appendixes A and B, respectively. A glossary of salt- and potash-related terms is contained in appendix C and a list of database abbreviations is given in appendix D. Appendix E describes GIS data files, and appendix F is a guide to using the geodatabase.
Search extension transforms Wiki into a relational system: a case for flavonoid metabolite database.
Arita, Masanori; Suwa, Kazuhiro
2008-09-17
In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. On the other hand, in the area of biology the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases may describe conflicts or ambiguities (e.g. a protein pair do and do not interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL. To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a database of flavonoid with 6902 molecular structures from over 1687 plant species was implemented on MediaWiki, the background system of Wikipedia. Registered users can describe any information in an arbitrary format. Structured part is subject to text-string searches to realize relational operations. The system was written in PHP language as the extension of MediaWiki. All modifications are open-source and publicly available. This scheme benefits from both the free-formatted Wiki style and the concise and structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost for database maintenance is alleviated.
Search extension transforms Wiki into a relational system: A case for flavonoid metabolite database
Arita, Masanori; Suwa, Kazuhiro
2008-01-01
Background In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. On the other hand, in the area of biology the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases may describe conflicts or ambiguities (e.g. a protein pair do and do not interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL. Results To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a database of flavonoid with 6902 molecular structures from over 1687 plant species was implemented on MediaWiki, the background system of Wikipedia. Registered users can describe any information in an arbitrary format. Structured part is subject to text-string searches to realize relational operations. The system was written in PHP language as the extension of MediaWiki. All modifications are open-source and publicly available. Conclusion This scheme benefits from both the free-formatted Wiki style and the concise and structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost for database maintenance is alleviated. PMID:18822113
Global Distribution of Outbreaks of Water-Associated Infectious Diseases
Yang, Kun; LeJeune, Jeffrey; Alsdorf, Doug; Lu, Bo; Shum, C. K.; Liang, Song
2012-01-01
Background Water plays an important role in the transmission of many infectious diseases, which pose a great burden on global public health. However, the global distribution of these water-associated infectious diseases and underlying factors remain largely unexplored. Methods and Findings Based on the Global Infectious Disease and Epidemiology Network (GIDEON), a global database including water-associated pathogens and diseases was developed. In this study, reported outbreak events associated with corresponding water-associated infectious diseases from 1991 to 2008 were extracted from the database. The location of each reported outbreak event was identified and geocoded into a GIS database. Also collected in the GIS database included geo-referenced socio-environmental information including population density (2000), annual accumulated temperature, surface water area, and average annual precipitation. Poisson models with Bayesian inference were developed to explore the association between these socio-environmental factors and distribution of the reported outbreak events. Based on model predictions a global relative risk map was generated. A total of 1,428 reported outbreak events were retrieved from the database. The analysis suggested that outbreaks of water-associated diseases are significantly correlated with socio-environmental factors. Population density is a significant risk factor for all categories of reported outbreaks of water-associated diseases; water-related diseases (e.g., vector-borne diseases) are associated with accumulated temperature; water-washed diseases (e.g., conjunctivitis) are inversely related to surface water area; both water-borne and water-related diseases are inversely related to average annual rainfall. Based on the model predictions, “hotspots” of risks for all categories of water-associated diseases were explored. Conclusions At the global scale, water-associated infectious diseases are significantly correlated with socio-environmental factors, impacting all regions which are affected disproportionately by different categories of water-associated infectious diseases. PMID:22348158
A new relational database structure and online interface for the HITRAN database
NASA Astrophysics Data System (ADS)
Hill, Christian; Gordon, Iouli E.; Rothman, Laurence S.; Tennyson, Jonathan
2013-11-01
A new format for the HITRAN database is proposed. By storing the line-transition data in a number of linked tables described by a relational database schema, it is possible to overcome the limitations of the existing format, which have become increasingly apparent over the last few years as new and more varied data are being used by radiative-transfer models. Although the database in the new format can be searched using the well-established Structured Query Language (SQL), a web service, HITRANonline, has been deployed to allow users to make most common queries of the database using a graphical user interface in a web page. The advantages of the relational form of the database to ensuring data integrity and consistency are explored, and the compatibility of the online interface with the emerging standards of the Virtual Atomic and Molecular Data Centre (VAMDC) project is discussed. In particular, the ability to access HITRAN data using a standard query language from other websites, command line tools and from within computer programs is described.
WholeCellSimDB: a hybrid relational/HDF database for whole-cell model predictions
Karr, Jonathan R.; Phillips, Nolan C.; Covert, Markus W.
2014-01-01
Mechanistic ‘whole-cell’ models are needed to develop a complete understanding of cell physiology. However, extracting biological insights from whole-cell models requires running and analyzing large numbers of simulations. We developed WholeCellSimDB, a database for organizing whole-cell simulations. WholeCellSimDB was designed to enable researchers to search simulation metadata to identify simulations for further analysis, and quickly slice and aggregate simulation results data. In addition, WholeCellSimDB enables users to share simulations with the broader research community. The database uses a hybrid relational/hierarchical data format architecture to efficiently store and retrieve both simulation setup metadata and results data. WholeCellSimDB provides a graphical Web-based interface to search, browse, plot and export simulations; a JavaScript Object Notation (JSON) Web service to retrieve data for Web-based visualizations; a command-line interface to deposit simulations; and a Python API to retrieve data for advanced analysis. Overall, we believe WholeCellSimDB will help researchers use whole-cell models to advance basic biological science and bioengineering. Database URL: http://www.wholecellsimdb.org Source code repository URL: http://github.com/CovertLab/WholeCellSimDB PMID:25231498
NASA Technical Reports Server (NTRS)
Petropulos, Dolores; Bittner, David; Murawski, Robert; Golden, Bert
2015-01-01
The SmallSat has an unrealized potential in both the private industry and in the federal government. Currently over 70 companies, 50 universities and 17 governmental agencies are involved in SmallSat research and development. In 1994, the U.S. Army Missile and Defense mapped the moon using smallSat imagery. Since then Smart Phones have introduced this imagery to the people of the world as diverse industries watched this trend. The deployment cost of smallSats is also greatly reduced compared to traditional satellites due to the fact that multiple units can be deployed in a single mission. Imaging payloads have become more sophisticated, smaller and lighter. In addition, the growth of small technology obtained from private industries has led to the more widespread use of smallSats. This includes greater revisit rates in imagery, significantly lower costs, the ability to update technology more frequently and the ability to decrease vulnerability of enemy attacks. The popularity of smallSats show a changing mentality in this fast paced world of tomorrow. What impact has this created on the NASA communication networks now and in future years? In this project, we are developing the SmallSat Relational Database which can support a simulation of smallSats within the NASA SCaN Compatability Environment for Networks and Integrated Communications (SCENIC) Modeling and Simulation Lab. The NASA Space Communications and Networks (SCaN) Program can use this modeling to project required network support needs in the next 10 to 15 years. The SmallSat Rational Database could model smallSats just as the other SCaN databases model the more traditional larger satellites, with a few exceptions. One being that the smallSat Database is designed to be built-to-order. The SmallSat database holds various hardware configurations that can be used to model a smallSat. It will require significant effort to develop as the research material can only be populated by hand to obtain the unique data required. When completed it will interface with the SCENIC environment to allow modeling of smallSats. The SmallSat Relational Database can also be integrated with the SCENIC Simulation modeling system that is currently in development. The SmallSat Relational Database simulation will be of great significance in assisting the NASA SCaN group to understand the impact the smallSats have made which have populated the lower orbit around our mother earth. What I have created and worked on this summer session 2015, is the basis for a tool that will be of value to the NASA SCaN SCENIC Simulation Environment for years to come.
Building an Integrated Environment for Multimedia
NASA Technical Reports Server (NTRS)
1997-01-01
Multimedia courseware on the solar system and earth science suitable for use in elementary, middle, and high schools was developed under this grant. The courseware runs on Silicon Graphics, Incorporated (SGI) workstations and personal computers (PCs). There is also a version of the courseware accessible via the World Wide Web. Accompanying multimedia database systems were also developed to enhance the multimedia courseware. The database systems accompanying the PC software are based on the relational model, while the database systems accompanying the SGI software are based on the object-oriented model.
Predicted Hematologic and Plasma Volume Responses Following Rapid Ascent to Progressive Altitudes
2014-06-01
of these changes, and define baseline demographics and physiologic descriptors that are important in predicting these changes. The overall impact of... physiologic descriptors that are important in predicting these changes. Using general linear mixed models and a comprehensive relational database...accomplished using a comprehensive relational database containing individual ascent profiles, demographics, and physiologic subject descriptors as well as
The representation of manipulable solid objects in a relational database
NASA Technical Reports Server (NTRS)
Bahler, D.
1984-01-01
This project is concerned with the interface between database management and solid geometric modeling. The desirability of integrating computer-aided design, manufacture, testing, and management into a coherent system is by now well recognized. One proposed configuration for such a system uses a relational database management system as the central focus; the various other functions are linked through their use of a common data repesentation in the data manager, rather than communicating pairwise to integrate a geometric modeling capability with a generic relational data managemet system in such a way that well-formed questions can be posed and answered about the performance of the system as a whole. One necessary feature of any such system is simplification for purposes of anaysis; this and system performance considerations meant that a paramount goal therefore was that of unity and simplicity of the data structures used.
Domain fusion analysis by applying relational algebra to protein sequence and domain databases
Truong, Kevin; Ikura, Mitsuhiko
2003-01-01
Background Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages on these efforts will become increasingly powerful. Results This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypothesis. Results can be viewed at . Conclusion As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time. PMID:12734020
Data structures and organisation: Special problems in scientific applications
NASA Astrophysics Data System (ADS)
Read, Brian J.
1989-12-01
In this paper we discuss and offer answers to the following questions: What, really, are the benifits of databases in physics? Are scientific databases essentially different from conventional ones? What are the drawbacks of a commercial database management system for use with scientific data? Do they outweigh the advantages? Do databases systems have adequate graphics facilities, or is a separate graphics package necessary? SQL as a standard language has deficiencies, but what are they for scientific data in particular? Indeed, is the relational model appropriate anyway? Or, should we turn to object oriented databases?
ARCPHdb: A comprehensive protein database for SF1 and SF2 helicase from archaea.
Moukhtar, Mirna; Chaar, Wafi; Abdel-Razzak, Ziad; Khalil, Mohamad; Taha, Samir; Chamieh, Hala
2017-01-01
Superfamily 1 and Superfamily 2 helicases, two of the largest helicase protein families, play vital roles in many biological processes including replication, transcription and translation. Study of helicase proteins in the model microorganisms of archaea have largely contributed to the understanding of their function, architecture and assembly. Based on a large phylogenomics approach, we have identified and classified all SF1 and SF2 protein families in ninety five sequenced archaea genomes. Here we developed an online webserver linked to a specialized protein database named ARCPHdb to provide access for SF1 and SF2 helicase families from archaea. ARCPHdb was implemented using MySQL relational database. Web interfaces were developed using Netbeans. Data were stored according to UniProt accession numbers, NCBI Ref Seq ID, PDB IDs and Entrez Databases. A user-friendly interactive web interface has been developed to browse, search and download archaeal helicase protein sequences, their available 3D structure models, and related documentation available in the literature provided by ARCPHdb. The database provides direct links to matching external databases. The ARCPHdb is the first online database to compile all protein information on SF1 and SF2 helicase from archaea in one platform. This database provides essential resource information for all researchers interested in the field. Copyright © 2016 Elsevier Ltd. All rights reserved.
Modeling biology using relational databases.
Peitzsch, Robert M
2003-02-01
There are several different methodologies that can be used for designing a database schema; no one is the best for all occasions. This unit demonstrates two different techniques for designing relational tables and discusses when each should be used. These two techniques presented are (1) traditional Entity-Relationship (E-R) modeling and (2) a hybrid method that combines aspects of data warehousing and E-R modeling. The method of choice depends on (1) how well the information and all its inherent relationships are understood, (2) what types of questions will be asked, (3) how many different types of data will be included, and (4) how much data exists.
Dynamic taxonomies applied to a web-based relational database for geo-hydrological risk mitigation
NASA Astrophysics Data System (ADS)
Sacco, G. M.; Nigrelli, G.; Bosio, A.; Chiarle, M.; Luino, F.
2012-02-01
In its 40 years of activity, the Research Institute for Geo-hydrological Protection of the Italian National Research Council has amassed a vast and varied collection of historical documentation on landslides, muddy-debris flows, and floods in northern Italy from 1600 to the present. Since 2008, the archive resources have been maintained through a relational database management system. The database is used for routine study and research purposes as well as for providing support during geo-hydrological emergencies, when data need to be quickly and accurately retrieved. Retrieval speed and accuracy are the main objectives of an implementation based on a dynamic taxonomies model. Dynamic taxonomies are a general knowledge management model for configuring complex, heterogeneous information bases that support exploratory searching. At each stage of the process, the user can explore or browse the database in a guided yet unconstrained way by selecting the alternatives suggested for further refining the search. Dynamic taxonomies have been successfully applied to such diverse and apparently unrelated domains as e-commerce and medical diagnosis. Here, we describe the application of dynamic taxonomies to our database and compare it to traditional relational database query methods. The dynamic taxonomy interface, essentially a point-and-click interface, is considerably faster and less error-prone than traditional form-based query interfaces that require the user to remember and type in the "right" search keywords. Finally, dynamic taxonomy users have confirmed that one of the principal benefits of this approach is the confidence of having considered all the relevant information. Dynamic taxonomies and relational databases work in synergy to provide fast and precise searching: one of the most important factors in timely response to emergencies.
Artificial intelligence techniques for modeling database user behavior
NASA Technical Reports Server (NTRS)
Tanner, Steve; Graves, Sara J.
1990-01-01
The design and development of the adaptive modeling system is described. This system models how a user accesses a relational database management system in order to improve its performance by discovering use access patterns. In the current system, these patterns are used to improve the user interface and may be used to speed data retrieval, support query optimization and support a more flexible data representation. The system models both syntactic and semantic information about the user's access and employs both procedural and rule-based logic to manipulate the model.
Quantify spatial relations to discover handwritten graphical symbols
NASA Astrophysics Data System (ADS)
Li, Jinpeng; Mouchère, Harold; Viard-Gaudin, Christian
2012-01-01
To model a handwritten graphical language, spatial relations describe how the strokes are positioned in the 2-dimensional space. Most of existing handwriting recognition systems make use of some predefined spatial relations. However, considering a complex graphical language, it is hard to express manually all the spatial relations. Another possibility would be to use a clustering technique to discover the spatial relations. In this paper, we discuss how to create a relational graph between strokes (nodes) labeled with graphemes in a graphical language. Then we vectorize spatial relations (edges) for clustering and quantization. As the targeted application, we extract the repetitive sub-graphs (graphical symbols) composed of graphemes and learned spatial relations. On two handwriting databases, a simple mathematical expression database and a complex flowchart database, the unsupervised spatial relations outperform the predefined spatial relations. In addition, we visualize the frequent patterns on two text-lines containing Chinese characters.
Chen, R S; Nadkarni, P; Marenco, L; Levin, F; Erdos, J; Miller, P L
2000-01-01
The entity-attribute-value representation with classes and relationships (EAV/CR) provides a flexible and simple database schema to store heterogeneous biomedical data. In certain circumstances, however, the EAV/CR model is known to retrieve data less efficiently than conventionally based database schemas. To perform a pilot study that systematically quantifies performance differences for database queries directed at real-world microbiology data modeled with EAV/CR and conventional representations, and to explore the relative merits of different EAV/CR query implementation strategies. Clinical microbiology data obtained over a ten-year period were stored using both database models. Query execution times were compared for four clinically oriented attribute-centered and entity-centered queries operating under varying conditions of database size and system memory. The performance characteristics of three different EAV/CR query strategies were also examined. Performance was similar for entity-centered queries in the two database models. Performance in the EAV/CR model was approximately three to five times less efficient than its conventional counterpart for attribute-centered queries. The differences in query efficiency became slightly greater as database size increased, although they were reduced with the addition of system memory. The authors found that EAV/CR queries formulated using multiple, simple SQL statements executed in batch were more efficient than single, large SQL statements. This paper describes a pilot project to explore issues in and compare query performance for EAV/CR and conventional database representations. Although attribute-centered queries were less efficient in the EAV/CR model, these inefficiencies may be addressable, at least in part, by the use of more powerful hardware or more memory, or both.
NoSQL data model for semi-automatic integration of ethnomedicinal plant data from multiple sources.
Ningthoujam, Sanjoy Singh; Choudhury, Manabendra Dutta; Potsangbam, Kumar Singh; Chetia, Pankaj; Nahar, Lutfun; Sarker, Satyajit D; Basar, Norazah; Das Talukdar, Anupam
2014-01-01
Sharing traditional knowledge with the scientific community could refine scientific approaches to phytochemical investigation and conservation of ethnomedicinal plants. As such, integration of traditional knowledge with scientific data using a single platform for sharing is greatly needed. However, ethnomedicinal data are available in heterogeneous formats, which depend on cultural aspects, survey methodology and focus of the study. Phytochemical and bioassay data are also available from many open sources in various standards and customised formats. To design a flexible data model that could integrate both primary and curated ethnomedicinal plant data from multiple sources. The current model is based on MongoDB, one of the Not only Structured Query Language (NoSQL) databases. Although it does not contain schema, modifications were made so that the model could incorporate both standard and customised ethnomedicinal plant data format from different sources. The model presented can integrate both primary and secondary data related to ethnomedicinal plants. Accommodation of disparate data was accomplished by a feature of this database that supported a different set of fields for each document. It also allowed storage of similar data having different properties. The model presented is scalable to a highly complex level with continuing maturation of the database, and is applicable for storing, retrieving and sharing ethnomedicinal plant data. It can also serve as a flexible alternative to a relational and normalised database. Copyright © 2014 John Wiley & Sons, Ltd.
Guidelines for the Effective Use of Entity-Attribute-Value Modeling for Biomedical Databases
Dinu, Valentin; Nadkarni, Prakash
2007-01-01
Purpose To introduce the goals of EAV database modeling, to describe the situations where Entity-Attribute-Value (EAV) modeling is a useful alternative to conventional relational methods of database modeling, and to describe the fine points of implementation in production systems. Methods We analyze the following circumstances: 1) data are sparse and have a large number of applicable attributes, but only a small fraction will apply to a given entity; 2) numerous classes of data need to be represented, each class has a limited number of attributes, but the number of instances of each class is very small. We also consider situations calling for a mixed approach where both conventional and EAV design are used for appropriate data classes. Results and Conclusions In robust production systems, EAV-modeled databases trade a modest data sub-schema for a complex metadata sub-schema. The need to design the metadata effectively makes EAV design potentially more challenging than conventional design. PMID:17098467
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kogalovskii, M.R.
This paper presents a review of problems related to statistical database systems, which are wide-spread in various fields of activity. Statistical databases (SDB) are referred to as databases that consist of data and are used for statistical analysis. Topics under consideration are: SDB peculiarities, properties of data models adequate for SDB requirements, metadata functions, null-value problems, SDB compromise protection problems, stored data compression techniques, and statistical data representation means. Also examined is whether the present Database Management Systems (DBMS) satisfy the SDB requirements. Some actual research directions in SDB systems are considered.
Ontological interpretation of biomedical database content.
Santana da Silva, Filipe; Jansen, Ludger; Freitas, Fred; Schulz, Stefan
2017-06-26
Biological databases store data about laboratory experiments, together with semantic annotations, in order to support data aggregation and retrieval. The exact meaning of such annotations in the context of a database record is often ambiguous. We address this problem by grounding implicit and explicit database content in a formal-ontological framework. By using a typical extract from the databases UniProt and Ensembl, annotated with content from GO, PR, ChEBI and NCBI Taxonomy, we created four ontological models (in OWL), which generate explicit, distinct interpretations under the BioTopLite2 (BTL2) upper-level ontology. The first three models interpret database entries as individuals (IND), defined classes (SUBC), and classes with dispositions (DISP), respectively; the fourth model (HYBR) is a combination of SUBC and DISP. For the evaluation of these four models, we consider (i) database content retrieval, using ontologies as query vocabulary; (ii) information completeness; and, (iii) DL complexity and decidability. The models were tested under these criteria against four competency questions (CQs). IND does not raise any ontological claim, besides asserting the existence of sample individuals and relations among them. Modelling patterns have to be created for each type of annotation referent. SUBC is interpreted regarding maximally fine-grained defined subclasses under the classes referred to by the data. DISP attempts to extract truly ontological statements from the database records, claiming the existence of dispositions. HYBR is a hybrid of SUBC and DISP and is more parsimonious regarding expressiveness and query answering complexity. For each of the four models, the four CQs were submitted as DL queries. This shows the ability to retrieve individuals with IND, and classes in SUBC and HYBR. DISP does not retrieve anything because the axioms with disposition are embedded in General Class Inclusion (GCI) statements. Ambiguity of biological database content is addressed by a method that identifies implicit knowledge behind semantic annotations in biological databases and grounds it in an expressive upper-level ontology. The result is a seamless representation of database structure, content and annotations as OWL models.
[A basic research to share Fourier transform near-infrared spectrum information resource].
Zhang, Lu-Da; Li, Jun-Hui; Zhao, Long-Lian; Zhao, Li-Li; Qin, Fang-Li; Yan, Yan-Lu
2004-08-01
A method to share the information resource in the database of Fourier transform near-infrared(FTNIR) spectrum information of agricultural products and utilize the spectrum information sufficiently is explored in this paper. Mapping spectrum information from one instrument to another is studied to express the spectrum information accurately between the instruments. Then mapping spectrum information is used to establish a mathematical model of quantitative analysis without including standard samples. The analysis result is that the relative coefficient r is 0.941 and the relative error is 3.28% between the model estimate values and the Kjeldahl's value for the protein content of twenty-two wheat samples, while the relative coefficient r is 0.963 and the relative error is 2.4% for the other model, which is established by using standard samples. It is shown that the spectrum information can be shared by using the mapping spectrum information. So it can be concluded that the spectrum information in one FTNIR spectrum information database can be transformed to another instrument's mapping spectrum information, which makes full use of the information resource in the database of FTNIR spectrum information to realize the resource sharing between different instruments.
High-quality unsaturated zone hydraulic property data for hydrologic applications
Perkins, Kimberlie; Nimmo, John R.
2009-01-01
In hydrologic studies, especially those using dynamic unsaturated zone moisture modeling, calculations based on property transfer models informed by hydraulic property databases are often used in lieu of measured data from the site of interest. Reliance on database-informed predicted values has become increasingly common with the use of neural networks. High-quality data are needed for databases used in this way and for theoretical and property transfer model development and testing. Hydraulic properties predicted on the basis of existing databases may be adequate in some applications but not others. An obvious problem occurs when the available database has few or no data for samples that are closely related to the medium of interest. The data set presented in this paper includes saturated and unsaturated hydraulic conductivity, water retention, particle-size distributions, and bulk properties. All samples are minimally disturbed, all measurements were performed using the same state of the art techniques and the environments represented are diverse.
Financing a future for public biological data.
Ellis, L B; Kalumbi, D
1999-09-01
The public web-based biological database infrastructure is a source of both wonder and worry. Users delight in the ever increasing amounts of information available; database administrators and curators worry about long-term financial support. An earlier study of 153 biological databases (Ellis and Kalumbi, Nature Biotechnol., 16, 1323-1324, 1998) determined that near future (1-5 year) funding for over two-thirds of them was uncertain. More detailed data are required to determine the magnitude of the problem and offer possible solutions. This study examines the finances and use statistics of a few of these organizations in more depth, and reviews several economic models that may help sustain them. Six organizations were studied. Their administrative overhead is fairly low; non-administrative personnel and computer-related costs account for 77% of expenses. One smaller, more specialized US database, in 1997, had 60% of total access from US domains; a majority (56%) of its US accesses came from commercial domains, although only 2% of the 153 databases originally studied received any industrial support. The most popular model used to gain industrial support is asymmetric pricing: preferentially charging the commercial users of a database. At least five biological databases have recently begun using this model. Advertising is another model which may be useful for the more general, more heavily used sites. Microcommerce has promise, especially for databases that do not attract advertisers, but needs further testing. The least income reported for any of the databases studied was $50,000/year; applying this rate to 400 biological databases (a lower limit of the number of such databases, many of which require far larger resources) would mean annual support need of at least $20 million. To obtain this level of support is challenging, yet failure to accept the challenge could be catastrophic. lynda@tc.umn. edu
Bravo, Carlos; Suarez, Carlos; González, Carolina; López, Diego; Blobel, Bernd
2014-01-01
Healthcare information is distributed through multiple heterogeneous and autonomous systems. Access to, and sharing of, distributed information sources are a challenging task. To contribute to meeting this challenge, this paper presents a formal, complete and semi-automatic transformation service from Relational Databases to Web Ontology Language. The proposed service makes use of an algorithm that allows to transform several data models of different domains by deploying mainly inheritance rules. The paper emphasizes the relevance of integrating the proposed approach into an ontology-based interoperability service to achieve semantic interoperability.
Human Thermal Model Evaluation Using the JSC Human Thermal Database
NASA Technical Reports Server (NTRS)
Bue, Grant; Makinen, Janice; Cognata, Thomas
2012-01-01
Human thermal modeling has considerable long term utility to human space flight. Such models provide a tool to predict crew survivability in support of vehicle design and to evaluate crew response in untested space environments. It is to the benefit of any such model not only to collect relevant experimental data to correlate it against, but also to maintain an experimental standard or benchmark for future development in a readily and rapidly searchable and software accessible format. The Human thermal database project is intended to do just so; to collect relevant data from literature and experimentation and to store the data in a database structure for immediate and future use as a benchmark to judge human thermal models against, in identifying model strengths and weakness, to support model development and improve correlation, and to statistically quantify a model s predictive quality. The human thermal database developed at the Johnson Space Center (JSC) is intended to evaluate a set of widely used human thermal models. This set includes the Wissler human thermal model, a model that has been widely used to predict the human thermoregulatory response to a variety of cold and hot environments. These models are statistically compared to the current database, which contains experiments of human subjects primarily in air from a literature survey ranging between 1953 and 2004 and from a suited experiment recently performed by the authors, for a quantitative study of relative strength and predictive quality of the models.
Knowlton, Michelle N; Li, Tongbin; Ren, Yongliang; Bill, Brent R; Ellis, Lynda Bm; Ekker, Stephen C
2008-01-07
The zebrafish is a powerful model vertebrate amenable to high throughput in vivo genetic analyses. Examples include reverse genetic screens using morpholino knockdown, expression-based screening using enhancer trapping and forward genetic screening using transposon insertional mutagenesis. We have created a database to facilitate web-based distribution of data from such genetic studies. The MOrpholino DataBase is a MySQL relational database with an online, PHP interface. Multiple quality control levels allow differential access to data in raw and finished formats. MODBv1 includes sequence information relating to almost 800 morpholinos and their targets and phenotypic data regarding the dose effect of each morpholino (mortality, toxicity and defects). To improve the searchability of this database, we have incorporated a fixed-vocabulary defect ontology that allows for the organization of morpholino affects based on anatomical structure affected and defect produced. This also allows comparison between species utilizing Phenotypic Attribute Trait Ontology (PATO) designated terminology. MODB is also cross-linked with ZFIN, allowing full searches between the two databases. MODB offers users the ability to retrieve morpholino data by sequence of morpholino or target, name of target, anatomical structure affected and defect produced. MODB data can be used for functional genomic analysis of morpholino design to maximize efficacy and minimize toxicity. MODB also serves as a template for future sequence-based functional genetic screen databases, and it is currently being used as a model for the creation of a mutagenic insertional transposon database.
Human Thermal Model Evaluation Using the JSC Human Thermal Database
NASA Technical Reports Server (NTRS)
Cognata, T.; Bue, G.; Makinen, J.
2011-01-01
The human thermal database developed at the Johnson Space Center (JSC) is used to evaluate a set of widely used human thermal models. This database will facilitate a more accurate evaluation of human thermoregulatory response using in a variety of situations, including those situations that might otherwise prove too dangerous for actual testing--such as extreme hot or cold splashdown conditions. This set includes the Wissler human thermal model, a model that has been widely used to predict the human thermoregulatory response to a variety of cold and hot environments. These models are statistically compared to the current database, which contains experiments of human subjects primarily in air from a literature survey ranging between 1953 and 2004 and from a suited experiment recently performed by the authors, for a quantitative study of relative strength and predictive quality of the models. Human thermal modeling has considerable long term utility to human space flight. Such models provide a tool to predict crew survivability in support of vehicle design and to evaluate crew response in untested environments. It is to the benefit of any such model not only to collect relevant experimental data to correlate it against, but also to maintain an experimental standard or benchmark for future development in a readily and rapidly searchable and software accessible format. The Human thermal database project is intended to do just so; to collect relevant data from literature and experimentation and to store the data in a database structure for immediate and future use as a benchmark to judge human thermal models against, in identifying model strengths and weakness, to support model development and improve correlation, and to statistically quantify a model s predictive quality.
Domain fusion analysis by applying relational algebra to protein sequence and domain databases.
Truong, Kevin; Ikura, Mitsuhiko
2003-05-06
Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages on these efforts will become increasingly powerful. This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypothesis. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.
A Methodology for Benchmarking Relational Database Machines,
1984-01-01
user benchmarks is to compare the multiple users to the best-case performance The data for each query classification coll and the performance...called a benchmark. The term benchmark originates from the markers used by sur - veyors in establishing common reference points for their measure...formatted databases. In order to further simplify the problem, we restrict our study to those DBMs which support the relational model. A sur - vey
Engineering the object-relation database model in O-Raid
NASA Technical Reports Server (NTRS)
Dewan, Prasun; Vikram, Ashish; Bhargava, Bharat
1989-01-01
Raid is a distributed database system based on the relational model. O-raid is an extension of the Raid system and will support complex data objects. The design of O-Raid is evolutionary and retains all features of relational data base systems and those of a general purpose object-oriented programming language. O-Raid has several novel properties. Objects, classes, and inheritance are supported together with a predicate-base relational query language. O-Raid objects are compatible with C++ objects and may be read and manipulated by a C++ program without any 'impedance mismatch'. Relations and columns within relations may themselves be treated as objects with associated variables and methods. Relations may contain heterogeneous objects, that is, objects of more than one class in a certain column, which can individually evolve by being reclassified. Special facilities are provided to reduce the data search in a relation containing complex objects.
Recent Progress in the Development of Metabolome Databases for Plant Systems Biology
Fukushima, Atsushi; Kusano, Miyako
2013-01-01
Metabolomics has grown greatly as a functional genomics tool, and has become an invaluable diagnostic tool for biochemical phenotyping of biological systems. Over the past decades, a number of databases involving information related to mass spectra, compound names and structures, statistical/mathematical models and metabolic pathways, and metabolite profile data have been developed. Such databases complement each other and support efficient growth in this area, although the data resources remain scattered across the World Wide Web. Here, we review available metabolome databases and summarize the present status of development of related tools, particularly focusing on the plant metabolome. Data sharing discussed here will pave way for the robust interpretation of metabolomic data and advances in plant systems biology. PMID:23577015
An integrated approach to reservoir modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Donaldson, K.
1993-08-01
The purpose of this research is to evaluate the usefulness of the following procedural and analytical methods in investigating the heterogeneity of the oil reserve for the Mississipian Big Injun Sandstone of the Granny Creek field, Clay and Roane counties, West Virginia: (1) relational database, (2) two-dimensional cross sections, (3) true three-dimensional modeling, (4) geohistory analysis, (5) a rule-based expert system, and (6) geographical information systems. The large data set could not be effectively integrated and interpreted without this approach. A relational database was designed to fully integrate three- and four-dimensional data. The database provides an effective means for maintainingmore » and manipulating the data. A two-dimensional cross section program was designed to correlate stratigraphy, depositional environments, porosity, permeability, and petrographic data. This flexible design allows for additional four-dimensional data. Dynamic Graphics[sup [trademark
van Baal, Sjozef; Kaimakis, Polynikis; Phommarinh, Manyphong; Koumbi, Daphne; Cuppens, Harry; Riccardino, Francesca; Macek, Milan; Scriver, Charles R; Patrinos, George P
2007-01-01
Frequency of INherited Disorders database (FINDbase) (http://www.findbase.org) is a relational database, derived from the ETHNOS software, recording frequencies of causative mutations leading to inherited disorders worldwide. Database records include the population and ethnic group, the disorder name and the related gene, accompanied by links to any corresponding locus-specific mutation database, to the respective Online Mendelian Inheritance in Man entries and the mutation together with its frequency in that population. The initial information is derived from the published literature, locus-specific databases and genetic disease consortia. FINDbase offers a user-friendly query interface, providing instant access to the list and frequencies of the different mutations. Query outputs can be either in a table or graphical format, accompanied by reference(s) on the data source. Registered users from three different groups, namely administrator, national coordinator and curator, are responsible for database curation and/or data entry/correction online via a password-protected interface. Databaseaccess is free of charge and there are no registration requirements for data querying. FINDbase provides a simple, web-based system for population-based mutation data collection and retrieval and can serve not only as a valuable online tool for molecular genetic testing of inherited disorders but also as a non-profit model for sustainable database funding, in the form of a 'database-journal'.
Monitoring of services with non-relational databases and map-reduce framework
NASA Astrophysics Data System (ADS)
Babik, M.; Souto, F.
2012-12-01
Service Availability Monitoring (SAM) is a well-established monitoring framework that performs regular measurements of the core site services and reports the corresponding availability and reliability of the Worldwide LHC Computing Grid (WLCG) infrastructure. One of the existing extensions of SAM is Site Wide Area Testing (SWAT), which gathers monitoring information from the worker nodes via instrumented jobs. This generates quite a lot of monitoring data to process, as there are several data points for every job and several million jobs are executed every day. The recent uptake of non-relational databases opens a new paradigm in the large-scale storage and distributed processing of systems with heavy read-write workloads. For SAM this brings new possibilities to improve its model, from performing aggregation of measurements to storing raw data and subsequent re-processing. Both SAM and SWAT are currently tuned to run at top performance, reaching some of the limits in storage and processing power of their existing Oracle relational database. We investigated the usability and performance of non-relational storage together with its distributed data processing capabilities. For this, several popular systems have been compared. In this contribution we describe our investigation of the existing non-relational databases suited for monitoring systems covering Cassandra, HBase and MongoDB. Further, we present our experiences in data modeling and prototyping map-reduce algorithms focusing on the extension of the already existing availability and reliability computations. Finally, possible future directions in this area are discussed, analyzing the current deficiencies of the existing Grid monitoring systems and proposing solutions to leverage the benefits of the non-relational databases to get more scalable and flexible frameworks.
Database and Related Activities in Japan
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murakami, Izumi; Kato, Daiji; Kato, Masatoshi
2011-05-11
We have constructed and made available atomic and molecular (AM) numerical databases on collision processes such as electron-impact excitation and ionization, recombination and charge transfer of atoms and molecules relevant for plasma physics, fusion research, astrophysics, applied-science plasma, and other related areas. The retrievable data is freely accessible via the internet. We also work on atomic data evaluation and constructing collisional-radiative models for spectroscopic plasma diagnostics. Recently we have worked on Fe ions and W ions theoretically and experimentally. The atomic data and collisional-radiative models for these ions are examined and applied to laboratory plasmas. A visible M1 transition ofmore » W{sup 26+} ion is identified at 389.41 nm by EBIT experiments and theoretical calculations. We have small non-retrievable databases in addition to our main database. Recently we evaluated photo-absorption cross sections for 9 atoms and 23 molecules and we present them as a new database. We established a new association ''Forum of Atomic and Molecular Data and Their Applications'' to exchange information among AM data producers, data providers and data users in Japan and we hope this will help to encourage AM data activities in Japan.« less
Database and Related Activities in Japan
NASA Astrophysics Data System (ADS)
Murakami, Izumi; Kato, Daiji; Kato, Masatoshi; Sakaue, Hiroyuki A.; Kato, Takako; Ding, Xiaobin; Morita, Shigeru; Kitajima, Masashi; Koike, Fumihiro; Nakamura, Nobuyuki; Sakamoto, Naoki; Sasaki, Akira; Skobelev, Igor; Tsuchida, Hidetsugu; Ulantsev, Artemiy; Watanabe, Tetsuya; Yamamoto, Norimasa
2011-05-01
We have constructed and made available atomic and molecular (AM) numerical databases on collision processes such as electron-impact excitation and ionization, recombination and charge transfer of atoms and molecules relevant for plasma physics, fusion research, astrophysics, applied-science plasma, and other related areas. The retrievable data is freely accessible via the internet. We also work on atomic data evaluation and constructing collisional-radiative models for spectroscopic plasma diagnostics. Recently we have worked on Fe ions and W ions theoretically and experimentally. The atomic data and collisional-radiative models for these ions are examined and applied to laboratory plasmas. A visible M1 transition of W26+ ion is identified at 389.41 nm by EBIT experiments and theoretical calculations. We have small non-retrievable databases in addition to our main database. Recently we evaluated photo-absorption cross sections for 9 atoms and 23 molecules and we present them as a new database. We established a new association "Forum of Atomic and Molecular Data and Their Applications" to exchange information among AM data producers, data providers and data users in Japan and we hope this will help to encourage AM data activities in Japan.
Vernetti, Lawrence; Bergenthal, Luke; Shun, Tong Ying; Taylor, D. Lansing
2016-01-01
Abstract Microfluidic human organ models, microphysiology systems (MPS), are currently being developed as predictive models of drug safety and efficacy in humans. To design and validate MPS as predictive of human safety liabilities requires safety data for a reference set of compounds, combined with in vitro data from the human organ models. To address this need, we have developed an internet database, the MPS database (MPS-Db), as a powerful platform for experimental design, data management, and analysis, and to combine experimental data with reference data, to enable computational modeling. The present study demonstrates the capability of the MPS-Db in early safety testing using a human liver MPS to relate the effects of tolcapone and entacapone in the in vitro model to human in vivo effects. These two compounds were chosen to be evaluated as a representative pair of marketed drugs because they are structurally similar, have the same target, and were found safe or had an acceptable risk in preclinical and clinical trials, yet tolcapone induced unacceptable levels of hepatotoxicity while entacapone was found to be safe. Results demonstrate the utility of the MPS-Db as an essential resource for relating in vitro organ model data to the multiple biochemical, preclinical, and clinical data sources on in vivo drug effects. PMID:28781990
The Efficacy of Multidimensional Constraint Keys in Database Query Performance
ERIC Educational Resources Information Center
Cardwell, Leslie K.
2012-01-01
This work is intended to introduce a database design method to resolve the two-dimensional complexities inherent in the relational data model and its resulting performance challenges through abstract multidimensional constructs. A multidimensional constraint is derived and utilized to implement an indexed Multidimensional Key (MK) to abstract a…
Human Ageing Genomic Resources: new and updated databases
Tacutu, Robi; Thornton, Daniel; Johnson, Emily; Budovsky, Arie; Barardo, Diogo; Craig, Thomas; Diana, Eugene; Lehmann, Gilad; Toren, Dmitri; Wang, Jingwei; Fraifeld, Vadim E
2018-01-01
Abstract In spite of a growing body of research and data, human ageing remains a poorly understood process. Over 10 years ago we developed the Human Ageing Genomic Resources (HAGR), a collection of databases and tools for studying the biology and genetics of ageing. Here, we present HAGR’s main functionalities, highlighting new additions and improvements. HAGR consists of six core databases: (i) the GenAge database of ageing-related genes, in turn composed of a dataset of >300 human ageing-related genes and a dataset with >2000 genes associated with ageing or longevity in model organisms; (ii) the AnAge database of animal ageing and longevity, featuring >4000 species; (iii) the GenDR database with >200 genes associated with the life-extending effects of dietary restriction; (iv) the LongevityMap database of human genetic association studies of longevity with >500 entries; (v) the DrugAge database with >400 ageing or longevity-associated drugs or compounds; (vi) the CellAge database with >200 genes associated with cell senescence. All our databases are manually curated by experts and regularly updated to ensure a high quality data. Cross-links across our databases and to external resources help researchers locate and integrate relevant information. HAGR is freely available online (http://genomics.senescence.info/). PMID:29121237
1983-12-16
management system (DBMS) is to record and maintain information used by an organization in the organization’s decision-making process. Some advantages of a...independence. Database Management Systems are classified into three major models; relational, network, and hierarchical. Each model uses a software...feeling impedes the overall effectiveness of the 4-" Acquisition Management Information System (AMIS), which currently uses S2k. The size of the AMIS
High dimensional biological data retrieval optimization with NoSQL technology.
Wang, Shicai; Pandis, Ioannis; Wu, Chao; He, Sijin; Johnson, David; Emam, Ibrahim; Guitton, Florian; Guo, Yike
2014-01-01
High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, when querying relational databases for hundreds of different patient gene expression records queries are slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase on query performance on MongoDB. The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data.
High dimensional biological data retrieval optimization with NoSQL technology
2014-01-01
Background High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, when querying relational databases for hundreds of different patient gene expression records queries are slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. Results In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase on query performance on MongoDB. Conclusions The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data. PMID:25435347
Relax with CouchDB - Into the non-relational DBMS era of Bioinformatics
Manyam, Ganiraju; Payton, Michelle A.; Roth, Jack A.; Abruzzo, Lynne V.; Coombes, Kevin R.
2012-01-01
With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. PMID:22609849
Zhang, Qingzhou; Yang, Bo; Chen, Xujiao; Xu, Jing; Mei, Changlin; Mao, Zhiguo
2014-01-01
We present a bioinformatics database named Renal Gene Expression Database (RGED), which contains comprehensive gene expression data sets from renal disease research. The web-based interface of RGED allows users to query the gene expression profiles in various kidney-related samples, including renal cell lines, human kidney tissues and murine model kidneys. Researchers can explore certain gene profiles, the relationships between genes of interests and identify biomarkers or even drug targets in kidney diseases. The aim of this work is to provide a user-friendly utility for the renal disease research community to query expression profiles of genes of their own interest without the requirement of advanced computational skills. Availability and implementation: Website is implemented in PHP, R, MySQL and Nginx and freely available from http://rged.wall-eva.net. Database URL: http://rged.wall-eva.net PMID:25252782
Gene annotation from scientific literature using mappings between keyword systems.
Pérez, Antonio J; Perez-Iratxeta, Carolina; Bork, Peer; Thode, Guillermo; Andrade, Miguel A
2004-09-01
The description of genes in databases by keywords helps the non-specialist to quickly grasp the properties of a gene and increases the efficiency of computational tools that are applied to gene data (e.g. searching a gene database for sequences related to a particular biological process). However, the association of keywords to genes or protein sequences is a difficult process that ultimately implies examination of the literature related to a gene. To support this task, we present a procedure to derive keywords from the set of scientific abstracts related to a gene. Our system is based on the automated extraction of mappings between related terms from different databases using a model of fuzzy associations that can be applied with all generality to any pair of linked databases. We tested the system by annotating genes of the SWISS-PROT database with keywords derived from the abstracts linked to their entries (stored in the MEDLINE database of scientific references). The performance of the annotation procedure was much better for SWISS-PROT keywords (recall of 47%, precision of 68%) than for Gene Ontology terms (recall of 8%, precision of 67%). The algorithm can be publicly accessed and used for the annotation of sequences through a web server at http://www.bork.embl.de/kat
The eNanoMapper database for nanomaterial safety information
Chomenidis, Charalampos; Doganis, Philip; Fadeel, Bengt; Grafström, Roland; Hardy, Barry; Hastings, Janna; Hegi, Markus; Jeliazkov, Vedrin; Kochev, Nikolay; Kohonen, Pekka; Munteanu, Cristian R; Sarimveis, Haralambos; Smeets, Bart; Sopasakis, Pantelis; Tsiliki, Georgia; Vorgrimmler, David; Willighagen, Egon
2015-01-01
Summary Background: The NanoSafety Cluster, a cluster of projects funded by the European Commision, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment for ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs. Results: The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components and web services. We have recently described the design of the eNanoMapper prototype database along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user friendly data preparation and data upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms. Conclusion: We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the “representational state transfer” (REST) API enables building user friendly interfaces and graphical summaries of the data, and how these resources facilitate the modelling of reproducible quantitative structure–activity relationships for nanomaterials (NanoQSAR). PMID:26425413
The eNanoMapper database for nanomaterial safety information.
Jeliazkova, Nina; Chomenidis, Charalampos; Doganis, Philip; Fadeel, Bengt; Grafström, Roland; Hardy, Barry; Hastings, Janna; Hegi, Markus; Jeliazkov, Vedrin; Kochev, Nikolay; Kohonen, Pekka; Munteanu, Cristian R; Sarimveis, Haralambos; Smeets, Bart; Sopasakis, Pantelis; Tsiliki, Georgia; Vorgrimmler, David; Willighagen, Egon
2015-01-01
The NanoSafety Cluster, a cluster of projects funded by the European Commision, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment for ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs. The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components and web services. We have recently described the design of the eNanoMapper prototype database along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user friendly data preparation and data upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms. We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the "representational state transfer" (REST) API enables building user friendly interfaces and graphical summaries of the data, and how these resources facilitate the modelling of reproducible quantitative structure-activity relationships for nanomaterials (NanoQSAR).
NASA Astrophysics Data System (ADS)
Gebhardt, Steffen; Wehrmann, Thilo; Klinger, Verena; Schettler, Ingo; Huth, Juliane; Künzer, Claudia; Dech, Stefan
2010-10-01
The German-Vietnamese water-related information system for the Mekong Delta (WISDOM) project supports business processes in Integrated Water Resources Management in Vietnam. Multiple disciplines bring together earth and ground based observation themes, such as environmental monitoring, water management, demographics, economy, information technology, and infrastructural systems. This paper introduces the components of the web-based WISDOM system including data, logic and presentation tier. It focuses on the data models upon which the database management system is built, including techniques for tagging or linking metadata with the stored information. The model also uses ordered groupings of spatial, thematic and temporal reference objects to semantically tag datasets to enable fast data retrieval, such as finding all data in a specific administrative unit belonging to a specific theme. A spatial database extension is employed by the PostgreSQL database. This object-oriented database was chosen over a relational database to tag spatial objects to tabular data, improving the retrieval of census and observational data at regional, provincial, and local areas. While the spatial database hinders processing raster data, a "work-around" was built into WISDOM to permit efficient management of both raster and vector data. The data model also incorporates styling aspects of the spatial datasets through styled layer descriptions (SLD) and web mapping service (WMS) layer specifications, allowing retrieval of rendered maps. Metadata elements of the spatial data are based on the ISO19115 standard. XML structured information of the SLD and metadata are stored in an XML database. The data models and the data management system are robust for managing the large quantity of spatial objects, sensor observations, census and document data. The operational WISDOM information system prototype contains modules for data management, automatic data integration, and web services for data retrieval, analysis, and distribution. The graphical user interfaces facilitate metadata cataloguing, data warehousing, web sensor data analysis and thematic mapping.
A case Study of Applying Object-Relational Persistence in Astronomy Data Archiving
NASA Astrophysics Data System (ADS)
Yao, S. S.; Hiriart, R.; Barg, I.; Warner, P.; Gasson, D.
2005-12-01
The NOAO Science Archive (NSA) team is developing a comprehensive domain model to capture the science data in the archive. Java and an object model derived from the domain model weil address the application layer of the archive system. However, since RDBMS is the best proven technology for data management, the challenge is the paradigm mismatch between the object and the relational models. Transparent object-relational mapping (ORM) persistence is a successful solution to this challenge. In the data modeling and persistence implementation of NSA, we are using Hibernate, a well-accepted ORM tool, to bridge the object model in the business tier and the relational model in the database tier. Thus, the database is isolated from the Java application. The application queries directly on objects using a DBMS-independent object-oriented query API, which frees the application developers from the low level JDBC and SQL so that they can focus on the domain logic. We present the detailed design of the NSA R3 (Release 3) data model and object-relational persistence, including mapping, retrieving and caching. Persistence layer optimization and performance tuning will be analyzed. The system is being built on J2EE, so the integration of Hibernate into the EJB container and the transaction management are also explored.
GALT protein database: querying structural and functional features of GALT enzyme.
d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna
2014-09-01
Knowledge of the impact of variations on protein structure can enhance the comprehension of the mechanisms of genetic diseases related to that protein. Here, we present a new version of GALT Protein Database, a Web-accessible data repository for the storage and interrogation of structural effects of variations of the enzyme galactose-1-phosphate uridylyltransferase (GALT), the impairment of which leads to classic Galactosemia, a rare genetic disease. This new version of this database now contains the models of 201 missense variants of GALT enzyme, including heterozygous variants, and it allows users not only to retrieve information about the missense variations affecting this protein, but also to investigate their impact on substrate binding, intersubunit interactions, stability, and other structural features. In addition, it allows the interactive visualization of the models of variants collected into the database. We have developed additional tools to improve the use of the database by nonspecialized users. This Web-accessible database (http://bioinformatica.isa.cnr.it/GALT/GALT2.0) represents a model of tools potentially suitable for application to other proteins that are involved in human pathologies and that are subjected to genetic variations. © 2014 WILEY PERIODICALS, INC.
Database assessment of CMIP5 and hydrological models to determine flood risk areas
NASA Astrophysics Data System (ADS)
Limlahapun, Ponthip; Fukui, Hiromichi
2016-11-01
Solutions for water-related disasters may not be solved with a single scientific method. Based on this premise, we involved logic conceptions, associate sequential result amongst models, and database applications attempting to analyse historical and future scenarios in the context of flooding. The three main models used in this study are (1) the fifth phase of the Coupled Model Intercomparison Project (CMIP5) to derive precipitation; (2) the Integrated Flood Analysis System (IFAS) to extract amount of discharge; and (3) the Hydrologic Engineering Center (HEC) model to generate inundated areas. This research notably focused on integrating data regardless of system-design complexity, and database approaches are significantly flexible, manageable, and well-supported for system data transfer, which makes them suitable for monitoring a flood. The outcome of flood map together with real-time stream data can help local communities identify areas at-risk of flooding in advance.
PDXliver: a database of liver cancer patient derived xenograft mouse models.
He, Sheng; Hu, Bo; Li, Chao; Lin, Ping; Tang, Wei-Guo; Sun, Yun-Fan; Feng, Fang-You-Min; Guo, Wei; Li, Jia; Xu, Yang; Yao, Qian-Lan; Zhang, Xin; Qiu, Shuang-Jian; Zhou, Jian; Fan, Jia; Li, Yi-Xue; Li, Hong; Yang, Xin-Rong
2018-05-09
Liver cancer is the second leading cause of cancer-related deaths and characterized by heterogeneity and drug resistance. Patient-derived xenograft (PDX) models have been widely used in cancer research because they reproduce the characteristics of original tumors. However, the current studies of liver cancer PDX mice are scattered and the number of available PDX models are too small to represent the heterogeneity of liver cancer patients. To improve this situation and to complement available PDX models related resources, here we constructed a comprehensive database, PDXliver, to integrate and analyze liver cancer PDX models. Currently, PDXliver contains 116 PDX models from Chinese liver cancer patients, 51 of them were established by the in-house PDX platform and others were curated from the public literatures. These models are annotated with complete information, including clinical characteristics of patients, genome-wide expression profiles, germline variations, somatic mutations and copy number alterations. Analysis of expression subtypes and mutated genes show that PDXliver represents the diversity of human patients. Another feature of PDXliver is storing drug response data of PDX mice, which makes it possible to explore the association between molecular profiles and drug sensitivity. All data can be accessed via the Browse and Search pages. Additionally, two tools are provided to interactively visualize the omics data of selected PDXs or to compare two groups of PDXs. As far as we known, PDXliver is the first public database of liver cancer PDX models. We hope that this comprehensive resource will accelerate the utility of PDX models and facilitate liver cancer research. The PDXliver database is freely available online at: http://www.picb.ac.cn/PDXliver/.
Mungall, Christopher J; Emmert, David B
2007-07-01
A few years ago, FlyBase undertook to design a new database schema to store Drosophila data. It would fully integrate genomic sequence and annotation data with bibliographic, genetic, phenotypic and molecular data from the literature representing a distillation of the first 100 years of research on this major animal model system. In developing this new integrated schema, FlyBase also made a commitment to ensure that its design was generic, extensible and available as open source, so that it could be employed as the core schema of any model organism data repository, thereby avoiding redundant software development and potentially increasing interoperability. Our question was whether we could create a relational database schema that would be successfully reused. Chado is a relational database schema now being used to manage biological knowledge for a wide variety of organisms, from human to pathogens, especially the classes of information that directly or indirectly can be associated with genome sequences or the primary RNA and protein products encoded by a genome. Biological databases that conform to this schema can interoperate with one another, and with application software from the Generic Model Organism Database (GMOD) toolkit. Chado is distinctive because its design is driven by ontologies. The use of ontologies (or controlled vocabularies) is ubiquitous across the schema, as they are used as a means of typing entities. The Chado schema is partitioned into integrated subschemas (modules), each encapsulating a different biological domain, and each described using representations in appropriate ontologies. To illustrate this methodology, we describe here the Chado modules used for describing genomic sequences. GMOD is a collaboration of several model organism database groups, including FlyBase, to develop a set of open-source software for managing model organism data. The Chado schema is freely distributed under the terms of the Artistic License (http://www.opensource.org/licenses/artistic-license.php) from GMOD (www.gmod.org).
NASA Astrophysics Data System (ADS)
Ehlmann, Bryon K.
Current scientific experiments are often characterized by massive amounts of very complex data and the need for complex data analysis software. Object-oriented database (OODB) systems have the potential of improving the description of the structure and semantics of this data and of integrating the analysis software with the data. This dissertation results from research to enhance OODB functionality and methodology to support scientific databases (SDBs) and, more specifically, to support a nuclear physics experiments database for the Continuous Electron Beam Accelerator Facility (CEBAF). This research to date has identified a number of problems related to the practical application of OODB technology to the conceptual design of the CEBAF experiments database and other SDBs: the lack of a generally accepted OODB design methodology, the lack of a standard OODB model, the lack of a clear conceptual level in existing OODB models, and the limited support in existing OODB systems for many common object relationships inherent in SDBs. To address these problems, the dissertation describes an Object-Relationship Diagram (ORD) and an Object-oriented Database Definition Language (ODDL) that provide tools that allow SDB design and development to proceed systematically and independently of existing OODB systems. These tools define multi-level, conceptual data models for SDB design, which incorporate a simple notation for describing common types of relationships that occur in SDBs. ODDL allows these relationships and other desirable SDB capabilities to be supported by an extended OODB system. A conceptual model of the CEBAF experiments database is presented in terms of ORDs and the ODDL to demonstrate their functionality and use and provide a foundation for future development of experimental nuclear physics software using an OODB approach.
A COSTAR interface using WWW technology.
Rabbani, U.; Morgan, M.; Barnett, O.
1998-01-01
The concentration of industry on modern relational databases has left many nonrelational and proprietary databases without support for integration with new technologies. Emerging interface tools and data-access methodologies can be applied with difficulty to medical record systems which have proprietary data representation. Users of such medical record systems usually must access the clinical content of such record systems with keyboard-intensive and time-consuming interfaces. COSTAR is a legacy ambulatory medical record system developed over 25 years ago that is still popular and extensively used at the Massachusetts General Hospital. We define a model for using middle layer services to extract and cache data from non-relational databases, and present an intuitive World-Wide Web interface to COSTAR. This model has been implemented and successfully piloted in the Internal Medicine Associates at Massachusetts General Hospital. Images Figure 1 Figure 2 Figure 3 Figure 4 PMID:9929310
NASA Astrophysics Data System (ADS)
Ifimov, Gabriela; Pigeau, Grace; Arroyo-Mora, J. Pablo; Soffer, Raymond; Leblanc, George
2017-10-01
In this study the development and implementation of a geospatial database model for the management of multiscale datasets encompassing airborne imagery and associated metadata is presented. To develop the multi-source geospatial database we have used a Relational Database Management System (RDBMS) on a Structure Query Language (SQL) server which was then integrated into ArcGIS and implemented as a geodatabase. The acquired datasets were compiled, standardized, and integrated into the RDBMS, where logical associations between different types of information were linked (e.g. location, date, and instrument). Airborne data, at different processing levels (digital numbers through geocorrected reflectance), were implemented in the geospatial database where the datasets are linked spatially and temporally. An example dataset consisting of airborne hyperspectral imagery, collected for inter and intra-annual vegetation characterization and detection of potential hydrocarbon seepage events over pipeline areas, is presented. Our work provides a model for the management of airborne imagery, which is a challenging aspect of data management in remote sensing, especially when large volumes of data are collected.
Performance Evaluation of NoSQL Databases: A Case Study
2015-02-01
a centralized relational database. The customer decided to consider NoSQL technologies for two specific uses, namely: the primary data store for...17 custom specific 6. FU NoSQL availab data mo arking of data g a specific wo sin benchmark f hmark for tran le workload de o publish meas their...The choice of a particular NoSQL database imposes a specific distributed software architecture and data model, and is a major determinant of the
Recent advances in the compilation of holocene relative Sea-level database in North America
NASA Astrophysics Data System (ADS)
Horton, B.; Vacchi, M.; Engelhart, S. E.; Nikitina, D.
2015-12-01
Reconstruction of relative sea level (RSL) has implications for investigation of crustal movements, calibration of earth rheology models and the reconstruction of ice sheets. In recent years, efforts were made to create RSL databases following a standardized methodology. These regional databases provided a framework for developing our understanding of the primary mechanisms of RSL change since the Last Glacial Maximum and a long-term baseline against which to gauge changes in sea-level during the 20th century and forecasts for the 21st. Here we present two quality-controlled Holocene RSL database compiled for North America. Along the Pacific coast of North America (British Columbia, Canada to California, USA), our re-evaluation of sea-level indicators from geological and archaeological investigations yield 841 RSL data-points mainly from salt and freshwater wetlands or adjacent estuarine sediment as well as from isolation basin. Along the Atlantic coast of North America (Hudson Bay, Canada to South Carolina, USA), we are currently compiling a database including more than 2000 RSL data-points from isolation basin, salt and freshwater wetlands, beach ridges and intratidal deposits. We outline the difficulties and solutions we made to compile databases in such different depostional environment. We address complex tectonics and the framework to compare such large variability of RSL data-point. We discuss the implications of our results for the glacio-isostatic adjustment (GIA) models in the two studied regions.
Liao, Stephen Shaoyi; Wang, Huai Qing; Li, Qiu Dan; Liu, Wei Yi
2006-06-01
This paper presents a new method for learning Bayesian networks from functional dependencies (FD) and third normal form (3NF) tables in relational databases. The method sets up a linkage between the theory of relational databases and probabilistic reasoning models, which is interesting and useful especially when data are incomplete and inaccurate. The effectiveness and practicability of the proposed method is demonstrated by its implementation in a mobile commerce system.
A Hybrid EAV-Relational Model for Consistent and Scalable Capture of Clinical Research Data.
Khan, Omar; Lim Choi Keung, Sarah N; Zhao, Lei; Arvanitis, Theodoros N
2014-01-01
Many clinical research databases are built for specific purposes and their design is often guided by the requirements of their particular setting. Not only does this lead to issues of interoperability and reusability between research groups in the wider community but, within the project itself, changes and additions to the system could be implemented using an ad hoc approach, which may make the system difficult to maintain and even more difficult to share. In this paper, we outline a hybrid Entity-Attribute-Value and relational model approach for modelling data, in light of frequently changing requirements, which enables the back-end database schema to remain static, improving the extensibility and scalability of an application. The model also facilitates data reuse. The methods used build on the modular architecture previously introduced in the CURe project.
A Comparison of a Relational and Nested-Relational IDEF0 Data Model
1990-03-01
develop, some of the problems inherent iu the hierarchical 5 model were circumvented by the more sophisticated network model. Like the hierarchical model...uetwork database consists of a collection of records connected via links. Unlike the hierarchical model, the network model allows arbitrary graphs as...opposed to trees. Thus, each node may have everal owners and may, in turn, own any number of other records. The network model provides a lchanism by
Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database.
Carver, Tim; Berriman, Matthew; Tivey, Adrian; Patel, Chinmay; Böhme, Ulrike; Barrell, Barclay G; Parkhill, Julian; Rajandream, Marie-Adèle
2008-12-01
Artemis and Artemis Comparison Tool (ACT) have become mainstream tools for viewing and annotating sequence data, particularly for microbial genomes. Since its first release, Artemis has been continuously developed and supported with additional functionality for editing and analysing sequences based on feedback from an active user community of laboratory biologists and professional annotators. Nevertheless, its utility has been somewhat restricted by its limitation to reading and writing from flat files. Therefore, a new version of Artemis has been developed, which reads from and writes to a relational database schema, and allows users to annotate more complex, often large and fragmented, genome sequences. Artemis and ACT have now been extended to read and write directly to the Generic Model Organism Database (GMOD, http://www.gmod.org) Chado relational database schema. In addition, a Gene Builder tool has been developed to provide structured forms and tables to edit coordinates of gene models and edit functional annotation, based on standard ontologies, controlled vocabularies and free text. Artemis and ACT are freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites: http://www.sanger.ac.uk/Software/Artemis/ http://www.sanger.ac.uk/Software/ACT/
Enabling comparative modeling of closely related genomes: Example genus Brucella
Faria, José P.; Edirisinghe, Janaka N.; Davis, James J.; ...
2014-03-08
For many scientific applications, it is highly desirable to be able to compare metabolic models of closely related genomes. In this study, we attempt to raise awareness to the fact that taking annotated genomes from public repositories and using them for metabolic model reconstructions is far from being trivial due to annotation inconsistencies. We are proposing a protocol for comparative analysis of metabolic models on closely related genomes, using fifteen strains of genus Brucella, which contains pathogens of both humans and livestock. This study lead to the identification and subsequent correction of inconsistent annotations in the SEED database, as wellmore » as the identification of 31 biochemical reactions that are common to Brucella, which are not originally identified by automated metabolic reconstructions. We are currently implementing this protocol for improving automated annotations within the SEED database and these improvements have been propagated into PATRIC, Model-SEED, KBase and RAST. This method is an enabling step for the future creation of consistent annotation systems and high-quality model reconstructions that will support in predicting accurate phenotypes such as pathogenicity, media requirements or type of respiration.« less
Enabling comparative modeling of closely related genomes: Example genus Brucella
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faria, José P.; Edirisinghe, Janaka N.; Davis, James J.
For many scientific applications, it is highly desirable to be able to compare metabolic models of closely related genomes. In this study, we attempt to raise awareness to the fact that taking annotated genomes from public repositories and using them for metabolic model reconstructions is far from being trivial due to annotation inconsistencies. We are proposing a protocol for comparative analysis of metabolic models on closely related genomes, using fifteen strains of genus Brucella, which contains pathogens of both humans and livestock. This study lead to the identification and subsequent correction of inconsistent annotations in the SEED database, as wellmore » as the identification of 31 biochemical reactions that are common to Brucella, which are not originally identified by automated metabolic reconstructions. We are currently implementing this protocol for improving automated annotations within the SEED database and these improvements have been propagated into PATRIC, Model-SEED, KBase and RAST. This method is an enabling step for the future creation of consistent annotation systems and high-quality model reconstructions that will support in predicting accurate phenotypes such as pathogenicity, media requirements or type of respiration.« less
Zhang, Liming; Yu, Dongsheng; Shi, Xuezheng; Xu, Shengxiang; Xing, Shihe; Zhao, Yongcong
2014-01-01
Soil organic carbon (SOC) models were often applied to regions with high heterogeneity, but limited spatially differentiated soil information and simulation unit resolution. This study, carried out in the Tai-Lake region of China, defined the uncertainty derived from application of the DeNitrification-DeComposition (DNDC) biogeochemical model in an area with heterogeneous soil properties and different simulation units. Three different resolution soil attribute databases, a polygonal capture of mapping units at 1∶50,000 (P5), a county-based database of 1∶50,000 (C5) and county-based database of 1∶14,000,000 (C14), were used as inputs for regional DNDC simulation. The P5 and C5 databases were combined with the 1∶50,000 digital soil map, which is the most detailed soil database for the Tai-Lake region. The C14 database was combined with 1∶14,000,000 digital soil map, which is a coarse database and is often used for modeling at a national or regional scale in China. The soil polygons of P5 database and county boundaries of C5 and C14 databases were used as basic simulation units. Results project that from 1982 to 2000, total SOC change in the top layer (0–30 cm) of the 2.3 M ha of paddy soil in the Tai-Lake region was +1.48 Tg C, −3.99 Tg C and −15.38 Tg C based on P5, C5 and C14 databases, respectively. With the total SOC change as modeled with P5 inputs as the baseline, which is the advantages of using detailed, polygon-based soil dataset, the relative deviation of C5 and C14 were 368% and 1126%, respectively. The comparison illustrates that DNDC simulation is strongly influenced by choice of fundamental geographic resolution as well as input soil attribute detail. The results also indicate that improving the framework of DNDC is essential in creating accurate models of the soil carbon cycle. PMID:24523922
Yang, Xiaohuan; Huang, Yaohuan; Dong, Pinliang; Jiang, Dong; Liu, Honghui
2009-01-01
The spatial distribution of population is closely related to land use and land cover (LULC) patterns on both regional and global scales. Population can be redistributed onto geo-referenced square grids according to this relation. In the past decades, various approaches to monitoring LULC using remote sensing and Geographic Information Systems (GIS) have been developed, which makes it possible for efficient updating of geo-referenced population data. A Spatial Population Updating System (SPUS) is developed for updating the gridded population database of China based on remote sensing, GIS and spatial database technologies, with a spatial resolution of 1 km by 1 km. The SPUS can process standard Moderate Resolution Imaging Spectroradiometer (MODIS L1B) data integrated with a Pattern Decomposition Method (PDM) and an LULC-Conversion Model to obtain patterns of land use and land cover, and provide input parameters for a Population Spatialization Model (PSM). The PSM embedded in SPUS is used for generating 1 km by 1 km gridded population data in each population distribution region based on natural and socio-economic variables. Validation results from finer township-level census data of Yishui County suggest that the gridded population database produced by the SPUS is reliable.
Cammarota, M; Huppes, V; Gaia, S; Degoulet, P
1998-01-01
The development of Health Information Systems is widely determined by the establishment of the underlying information models. An Object-Oriented Matrix Model (OOMM) is described which target is to facilitate the integration of the overall health system. The model is based on information modules named micro-databases that are structured in a three-dimensional network: planning, health structures and information systems. The modelling tool has been developed as a layer on top of a relational database system. A visual browser facilitates the development and maintenance of the information model. The modelling approach has been applied to the Brasilia University Hospital since 1991. The extension of the modelling approach to the Brasilia regional health system is considered.
Optical components damage parameters database system
NASA Astrophysics Data System (ADS)
Tao, Yizheng; Li, Xinglan; Jin, Yuquan; Xie, Dongmei; Tang, Dingyong
2012-10-01
Optical component is the key to large-scale laser device developed by one of its load capacity is directly related to the device output capacity indicators, load capacity depends on many factors. Through the optical components will damage parameters database load capacity factors of various digital, information technology, for the load capacity of optical components to provide a scientific basis for data support; use of business processes and model-driven approach, the establishment of component damage parameter information model and database systems, system application results that meet the injury test optical components business processes and data management requirements of damage parameters, component parameters of flexible, configurable system is simple, easy to use, improve the efficiency of the optical component damage test.
Object-oriented structures supporting remote sensing databases
NASA Technical Reports Server (NTRS)
Wichmann, Keith; Cromp, Robert F.
1995-01-01
Object-oriented databases show promise for modeling the complex interrelationships pervasive in scientific domains. To examine the utility of this approach, we have developed an Intelligent Information Fusion System based on this technology, and applied it to the problem of managing an active repository of remotely-sensed satellite scenes. The design and implementation of the system is compared and contrasted with conventional relational database techniques, followed by a presentation of the underlying object-oriented data structures used to enable fast indexing into the data holdings.
Modeling Real-Time Applications with Reusable Design Patterns
NASA Astrophysics Data System (ADS)
Rekhis, Saoussen; Bouassida, Nadia; Bouaziz, Rafik
Real-Time (RT) applications, which manipulate important volumes of data, need to be managed with RT databases that deal with time-constrained data and time-constrained transactions. In spite of their numerous advantages, RT databases development remains a complex task, since developers must study many design issues related to the RT domain. In this paper, we tackle this problem by proposing RT design patterns that allow the modeling of structural and behavioral aspects of RT databases. We show how RT design patterns can provide design assistance through architecture reuse of reoccurring design problems. In addition, we present an UML profile that represents patterns and facilitates further their reuse. This profile proposes, on one hand, UML extensions allowing to model the variability of patterns in the RT context and, on another hand, extensions inspired from the MARTE (Modeling and Analysis of Real-Time Embedded systems) profile.
Value of shared preclinical safety studies - The eTOX database.
Briggs, Katharine; Barber, Chris; Cases, Montserrat; Marc, Philippe; Steger-Hartmann, Thomas
2015-01-01
A first analysis of a database of shared preclinical safety data for 1214 small molecule drugs and drug candidates extracted from 3970 reports donated by thirteen pharmaceutical companies for the eTOX project (www.etoxproject.eu) is presented. Species, duration of exposure and administration route data were analysed to assess if large enough subsets of homogenous data are available for building in silico predictive models. Prevalence of treatment related effects for the different types of findings recorded were analysed. The eTOX ontology was used to determine the most common treatment-related clinical chemistry and histopathology findings reported in the database. The data were then mined to evaluate sensitivity of established in vivo biomarkers for liver toxicity risk assessment. The value of the database to inform other drug development projects during early drug development is illustrated by a case study.
Database resources of the National Center for Biotechnology Information
Wheeler, David L.; Barrett, Tanya; Benson, Dennis A.; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Kenton, David L.; Khovayko, Oleg; Lipman, David J.; Madden, Thomas L.; Maglott, Donna R.; Ostell, James; Pruitt, Kim D.; Schuler, Gregory D.; Schriml, Lynn M.; Sequeira, Edwin; Sherry, Stephen T.; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Suzek, Tugba O.; Tatusov, Roman; Tatusova, Tatiana A.; Wagner, Lukas; Yaschenko, Eugene
2006-01-01
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Retroviral Genotyping Tools, HIV-1, Human Protein Interaction Database, SAGEmap, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of the resources can be accessed through the NCBI home page at: . PMID:16381840
Database resources of the National Center for Biotechnology Information.
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; Dicuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Krasnov, Sergey; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Karsch-Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian
2012-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
Interactive Scene Analysis Module - A sensor-database fusion system for telerobotic environments
NASA Technical Reports Server (NTRS)
Cooper, Eric G.; Vazquez, Sixto L.; Goode, Plesent W.
1992-01-01
Accomplishing a task with telerobotics typically involves a combination of operator control/supervision and a 'script' of preprogrammed commands. These commands usually assume that the location of various objects in the task space conform to some internal representation (database) of that task space. The ability to quickly and accurately verify the task environment against the internal database would improve the robustness of these preprogrammed commands. In addition, the on-line initialization and maintenance of a task space database is difficult for operators using Cartesian coordinates alone. This paper describes the Interactive Scene' Analysis Module (ISAM) developed to provide taskspace database initialization and verification utilizing 3-D graphic overlay modelling, video imaging, and laser radar based range imaging. Through the fusion of taskspace database information and image sensor data, a verifiable taskspace model is generated providing location and orientation data for objects in a task space. This paper also describes applications of the ISAM in the Intelligent Systems Research Laboratory (ISRL) at NASA Langley Research Center, and discusses its performance relative to representation accuracy and operator interface efficiency.
Database resources of the National Center for Biotechnology
Wheeler, David L.; Church, Deanna M.; Federhen, Scott; Lash, Alex E.; Madden, Thomas L.; Pontius, Joan U.; Schuler, Gregory D.; Schriml, Lynn M.; Sequeira, Edwin; Tatusova, Tatiana A.; Wagner, Lukas
2003-01-01
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, PubMed, PubMed Central (PMC), LocusLink, the NCBITaxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR (e-PCR), Open Reading Frame (ORF) Finder, References Sequence (RefSeq), UniGene, HomoloGene, ProtEST, Database of Single Nucleotide Polymorphisms (dbSNP), Human/Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker (MM), Evidence Viewer (EV), Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov. PMID:12519941
Relax with CouchDB--into the non-relational DBMS era of bioinformatics.
Manyam, Ganiraju; Payton, Michelle A; Roth, Jack A; Abruzzo, Lynne V; Coombes, Kevin R
2012-07-01
With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. Copyright © 2012 Elsevier Inc. All rights reserved.
Assessing the quality of life history information in publicly available databases.
Thorson, James T; Cope, Jason M; Patrick, Wesley S
2014-01-01
Single-species life history parameters are central to ecological research and management, including the fields of macro-ecology, fisheries science, and ecosystem modeling. However, there has been little independent evaluation of the precision and accuracy of the life history values in global and publicly available databases. We therefore develop a novel method based on a Bayesian errors-in-variables model that compares database entries with estimates from local experts, and we illustrate this process by assessing the accuracy and precision of entries in FishBase, one of the largest and oldest life history databases. This model distinguishes biases among seven life history parameters, two types of information available in FishBase (i.e., published values and those estimated from other parameters), and two taxa (i.e., bony and cartilaginous fishes) relative to values from regional experts in the United States, while accounting for additional variance caused by sex- and region-specific life history traits. For published values in FishBase, the model identifies a small positive bias in natural mortality and negative bias in maximum age, perhaps caused by unacknowledged mortality caused by fishing. For life history values calculated by FishBase, the model identified large and inconsistent biases. The model also demonstrates greatest precision for body size parameters, decreased precision for values derived from geographically distant populations, and greatest between-sex differences in age at maturity. We recommend that our bias and precision estimates be used in future errors-in-variables models as a prior on measurement errors. This approach is broadly applicable to global databases of life history traits and, if used, will encourage further development and improvements in these databases.
Hudson, Lawrence N; Newbold, Tim; Contu, Sara; Hill, Samantha L L; Lysenko, Igor; De Palma, Adriana; Phillips, Helen R P; Alhusseini, Tamera I; Bedford, Felicity E; Bennett, Dominic J; Booth, Hollie; Burton, Victoria J; Chng, Charlotte W T; Choimes, Argyrios; Correia, David L P; Day, Julie; Echeverría-Londoño, Susy; Emerson, Susan R; Gao, Di; Garon, Morgan; Harrison, Michelle L K; Ingram, Daniel J; Jung, Martin; Kemp, Victoria; Kirkpatrick, Lucinda; Martin, Callum D; Pan, Yuan; Pask-Hale, Gwilym D; Pynegar, Edwin L; Robinson, Alexandra N; Sanchez-Ortiz, Katia; Senior, Rebecca A; Simmons, Benno I; White, Hannah J; Zhang, Hanbin; Aben, Job; Abrahamczyk, Stefan; Adum, Gilbert B; Aguilar-Barquero, Virginia; Aizen, Marcelo A; Albertos, Belén; Alcala, E L; Del Mar Alguacil, Maria; Alignier, Audrey; Ancrenaz, Marc; Andersen, Alan N; Arbeláez-Cortés, Enrique; Armbrecht, Inge; Arroyo-Rodríguez, Víctor; Aumann, Tom; Axmacher, Jan C; Azhar, Badrul; Azpiroz, Adrián B; Baeten, Lander; Bakayoko, Adama; Báldi, András; Banks, John E; Baral, Sharad K; Barlow, Jos; Barratt, Barbara I P; Barrico, Lurdes; Bartolommei, Paola; Barton, Diane M; Basset, Yves; Batáry, Péter; Bates, Adam J; Baur, Bruno; Bayne, Erin M; Beja, Pedro; Benedick, Suzan; Berg, Åke; Bernard, Henry; Berry, Nicholas J; Bhatt, Dinesh; Bicknell, Jake E; Bihn, Jochen H; Blake, Robin J; Bobo, Kadiri S; Bóçon, Roberto; Boekhout, Teun; Böhning-Gaese, Katrin; Bonham, Kevin J; Borges, Paulo A V; Borges, Sérgio H; Boutin, Céline; Bouyer, Jérémy; Bragagnolo, Cibele; Brandt, Jodi S; Brearley, Francis Q; Brito, Isabel; Bros, Vicenç; Brunet, Jörg; Buczkowski, Grzegorz; Buddle, Christopher M; Bugter, Rob; Buscardo, Erika; Buse, Jörn; Cabra-García, Jimmy; Cáceres, Nilton C; Cagle, Nicolette L; Calviño-Cancela, María; Cameron, Sydney A; Cancello, Eliana M; Caparrós, Rut; Cardoso, Pedro; Carpenter, Dan; Carrijo, Tiago F; Carvalho, Anelena L; Cassano, Camila R; Castro, Helena; Castro-Luna, Alejandro A; Rolando, Cerda B; Cerezo, Alexis; Chapman, Kim Alan; Chauvat, Matthieu; Christensen, Morten; Clarke, Francis M; Cleary, Daniel F R; Colombo, Giorgio; Connop, Stuart P; Craig, Michael D; Cruz-López, Leopoldo; Cunningham, Saul A; D'Aniello, Biagio; D'Cruze, Neil; da Silva, Pedro Giovâni; Dallimer, Martin; Danquah, Emmanuel; Darvill, Ben; Dauber, Jens; Davis, Adrian L V; Dawson, Jeff; de Sassi, Claudio; de Thoisy, Benoit; Deheuvels, Olivier; Dejean, Alain; Devineau, Jean-Louis; Diekötter, Tim; Dolia, Jignasu V; Domínguez, Erwin; Dominguez-Haydar, Yamileth; Dorn, Silvia; Draper, Isabel; Dreber, Niels; Dumont, Bertrand; Dures, Simon G; Dynesius, Mats; Edenius, Lars; Eggleton, Paul; Eigenbrod, Felix; Elek, Zoltán; Entling, Martin H; Esler, Karen J; de Lima, Ricardo F; Faruk, Aisyah; Farwig, Nina; Fayle, Tom M; Felicioli, Antonio; Felton, Annika M; Fensham, Roderick J; Fernandez, Ignacio C; Ferreira, Catarina C; Ficetola, Gentile F; Fiera, Cristina; Filgueiras, Bruno K C; Fırıncıoğlu, Hüseyin K; Flaspohler, David; Floren, Andreas; Fonte, Steven J; Fournier, Anne; Fowler, Robert E; Franzén, Markus; Fraser, Lauchlan H; Fredriksson, Gabriella M; Freire, Geraldo B; Frizzo, Tiago L M; Fukuda, Daisuke; Furlani, Dario; Gaigher, René; Ganzhorn, Jörg U; García, Karla P; Garcia-R, Juan C; Garden, Jenni G; Garilleti, Ricardo; Ge, Bao-Ming; Gendreau-Berthiaume, Benoit; Gerard, Philippa J; Gheler-Costa, Carla; Gilbert, Benjamin; Giordani, Paolo; Giordano, Simonetta; Golodets, Carly; Gomes, Laurens G L; Gould, Rachelle K; Goulson, Dave; Gove, Aaron D; Granjon, Laurent; Grass, Ingo; Gray, Claudia L; Grogan, James; Gu, Weibin; Guardiola, Moisès; Gunawardene, Nihara R; Gutierrez, Alvaro G; Gutiérrez-Lamus, Doris L; Haarmeyer, Daniela H; Hanley, Mick E; Hanson, Thor; Hashim, Nor R; Hassan, Shombe N; Hatfield, Richard G; Hawes, Joseph E; Hayward, Matt W; Hébert, Christian; Helden, Alvin J; Henden, John-André; Henschel, Philipp; Hernández, Lionel; Herrera, James P; Herrmann, Farina; Herzog, Felix; Higuera-Diaz, Diego; Hilje, Branko; Höfer, Hubert; Hoffmann, Anke; Horgan, Finbarr G; Hornung, Elisabeth; Horváth, Roland; Hylander, Kristoffer; Isaacs-Cubides, Paola; Ishida, Hiroaki; Ishitani, Masahiro; Jacobs, Carmen T; Jaramillo, Víctor J; Jauker, Birgit; Hernández, F Jiménez; Johnson, McKenzie F; Jolli, Virat; Jonsell, Mats; Juliani, S Nur; Jung, Thomas S; Kapoor, Vena; Kappes, Heike; Kati, Vassiliki; Katovai, Eric; Kellner, Klaus; Kessler, Michael; Kirby, Kathryn R; Kittle, Andrew M; Knight, Mairi E; Knop, Eva; Kohler, Florian; Koivula, Matti; Kolb, Annette; Kone, Mouhamadou; Kőrösi, Ádám; Krauss, Jochen; Kumar, Ajith; Kumar, Raman; Kurz, David J; Kutt, Alex S; Lachat, Thibault; Lantschner, Victoria; Lara, Francisco; Lasky, Jesse R; Latta, Steven C; Laurance, William F; Lavelle, Patrick; Le Féon, Violette; LeBuhn, Gretchen; Légaré, Jean-Philippe; Lehouck, Valérie; Lencinas, María V; Lentini, Pia E; Letcher, Susan G; Li, Qi; Litchwark, Simon A; Littlewood, Nick A; Liu, Yunhui; Lo-Man-Hung, Nancy; López-Quintero, Carlos A; Louhaichi, Mounir; Lövei, Gabor L; Lucas-Borja, Manuel Esteban; Luja, Victor H; Luskin, Matthew S; MacSwiney G, M Cristina; Maeto, Kaoru; Magura, Tibor; Mallari, Neil Aldrin; Malone, Louise A; Malonza, Patrick K; Malumbres-Olarte, Jagoba; Mandujano, Salvador; Måren, Inger E; Marin-Spiotta, Erika; Marsh, Charles J; Marshall, E J P; Martínez, Eliana; Martínez Pastur, Guillermo; Moreno Mateos, David; Mayfield, Margaret M; Mazimpaka, Vicente; McCarthy, Jennifer L; McCarthy, Kyle P; McFrederick, Quinn S; McNamara, Sean; Medina, Nagore G; Medina, Rafael; Mena, Jose L; Mico, Estefania; Mikusinski, Grzegorz; Milder, Jeffrey C; Miller, James R; Miranda-Esquivel, Daniel R; Moir, Melinda L; Morales, Carolina L; Muchane, Mary N; Muchane, Muchai; Mudri-Stojnic, Sonja; Munira, A Nur; Muoñz-Alonso, Antonio; Munyekenye, B F; Naidoo, Robin; Naithani, A; Nakagawa, Michiko; Nakamura, Akihiro; Nakashima, Yoshihiro; Naoe, Shoji; Nates-Parra, Guiomar; Navarrete Gutierrez, Dario A; Navarro-Iriarte, Luis; Ndang'ang'a, Paul K; Neuschulz, Eike L; Ngai, Jacqueline T; Nicolas, Violaine; Nilsson, Sven G; Noreika, Norbertas; Norfolk, Olivia; Noriega, Jorge Ari; Norton, David A; Nöske, Nicole M; Nowakowski, A Justin; Numa, Catherine; O'Dea, Niall; O'Farrell, Patrick J; Oduro, William; Oertli, Sabine; Ofori-Boateng, Caleb; Oke, Christopher Omamoke; Oostra, Vicencio; Osgathorpe, Lynne M; Otavo, Samuel Eduardo; Page, Navendu V; Paritsis, Juan; Parra-H, Alejandro; Parry, Luke; Pe'er, Guy; Pearman, Peter B; Pelegrin, Nicolás; Pélissier, Raphaël; Peres, Carlos A; Peri, Pablo L; Persson, Anna S; Petanidou, Theodora; Peters, Marcell K; Pethiyagoda, Rohan S; Phalan, Ben; Philips, T Keith; Pillsbury, Finn C; Pincheira-Ulbrich, Jimmy; Pineda, Eduardo; Pino, Joan; Pizarro-Araya, Jaime; Plumptre, A J; Poggio, Santiago L; Politi, Natalia; Pons, Pere; Poveda, Katja; Power, Eileen F; Presley, Steven J; Proença, Vânia; Quaranta, Marino; Quintero, Carolina; Rader, Romina; Ramesh, B R; Ramirez-Pinilla, Martha P; Ranganathan, Jai; Rasmussen, Claus; Redpath-Downing, Nicola A; Reid, J Leighton; Reis, Yana T; Rey Benayas, José M; Rey-Velasco, Juan Carlos; Reynolds, Chevonne; Ribeiro, Danilo Bandini; Richards, Miriam H; Richardson, Barbara A; Richardson, Michael J; Ríos, Rodrigo Macip; Robinson, Richard; Robles, Carolina A; Römbke, Jörg; Romero-Duque, Luz Piedad; Rös, Matthias; Rosselli, Loreta; Rossiter, Stephen J; Roth, Dana S; Roulston, T'ai H; Rousseau, Laurent; Rubio, André V; Ruel, Jean-Claude; Sadler, Jonathan P; Sáfián, Szabolcs; Saldaña-Vázquez, Romeo A; Sam, Katerina; Samnegård, Ulrika; Santana, Joana; Santos, Xavier; Savage, Jade; Schellhorn, Nancy A; Schilthuizen, Menno; Schmiedel, Ute; Schmitt, Christine B; Schon, Nicole L; Schüepp, Christof; Schumann, Katharina; Schweiger, Oliver; Scott, Dawn M; Scott, Kenneth A; Sedlock, Jodi L; Seefeldt, Steven S; Shahabuddin, Ghazala; Shannon, Graeme; Sheil, Douglas; Sheldon, Frederick H; Shochat, Eyal; Siebert, Stefan J; Silva, Fernando A B; Simonetti, Javier A; Slade, Eleanor M; Smith, Jo; Smith-Pardo, Allan H; Sodhi, Navjot S; Somarriba, Eduardo J; Sosa, Ramón A; Soto Quiroga, Grimaldo; St-Laurent, Martin-Hugues; Starzomski, Brian M; Stefanescu, Constanti; Steffan-Dewenter, Ingolf; Stouffer, Philip C; Stout, Jane C; Strauch, Ayron M; Struebig, Matthew J; Su, Zhimin; Suarez-Rubio, Marcela; Sugiura, Shinji; Summerville, Keith S; Sung, Yik-Hei; Sutrisno, Hari; Svenning, Jens-Christian; Teder, Tiit; Threlfall, Caragh G; Tiitsaar, Anu; Todd, Jacqui H; Tonietto, Rebecca K; Torre, Ignasi; Tóthmérész, Béla; Tscharntke, Teja; Turner, Edgar C; Tylianakis, Jason M; Uehara-Prado, Marcio; Urbina-Cardona, Nicolas; Vallan, Denis; Vanbergen, Adam J; Vasconcelos, Heraldo L; Vassilev, Kiril; Verboven, Hans A F; Verdasca, Maria João; Verdú, José R; Vergara, Carlos H; Vergara, Pablo M; Verhulst, Jort; Virgilio, Massimiliano; Vu, Lien Van; Waite, Edward M; Walker, Tony R; Wang, Hua-Feng; Wang, Yanping; Watling, James I; Weller, Britta; Wells, Konstans; Westphal, Catrin; Wiafe, Edward D; Williams, Christopher D; Willig, Michael R; Woinarski, John C Z; Wolf, Jan H D; Wolters, Volkmar; Woodcock, Ben A; Wu, Jihua; Wunderle, Joseph M; Yamaura, Yuichi; Yoshikura, Satoko; Yu, Douglas W; Zaitsev, Andrey S; Zeidler, Juliane; Zou, Fasheng; Collen, Ben; Ewers, Rob M; Mace, Georgina M; Purves, Drew W; Scharlemann, Jörn P W; Purvis, Andy
2017-01-01
The PREDICTS project-Projecting Responses of Ecological Diversity In Changing Terrestrial Systems (www.predicts.org.uk)-has collated from published studies a large, reasonably representative database of comparable samples of biodiversity from multiple sites that differ in the nature or intensity of human impacts relating to land use. We have used this evidence base to develop global and regional statistical models of how local biodiversity responds to these measures. We describe and make freely available this 2016 release of the database, containing more than 3.2 million records sampled at over 26,000 locations and representing over 47,000 species. We outline how the database can help in answering a range of questions in ecology and conservation biology. To our knowledge, this is the largest and most geographically and taxonomically representative database of spatial comparisons of biodiversity that has been collated to date; it will be useful to researchers and international efforts wishing to model and understand the global status of biodiversity.
Database resources of the National Center for Biotechnology Information
Wheeler, David L.; Barrett, Tanya; Benson, Dennis A.; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Feolo, Michael; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Khovayko, Oleg; Landsman, David; Lipman, David J.; Madden, Thomas L.; Maglott, Donna R.; Miller, Vadim; Ostell, James; Pruitt, Kim D.; Schuler, Gregory D.; Shumway, Martin; Sequeira, Edwin; Sherry, Steven T.; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusov, Roman L.; Tatusova, Tatiana A.; Wagner, Lukas; Yaschenko, Eugene
2008-01-01
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data available through NCBI's web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace, Assembly, and Short Read Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Database of Genotype and Phenotype, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting the web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:18045790
NASA Technical Reports Server (NTRS)
Morelli, Eugene A.; Cunningham, Kevin; Hill, Melissa A.
2013-01-01
Flight test and modeling techniques were developed for efficiently identifying global aerodynamic models that can be used to accurately simulate stall, upset, and recovery on large transport airplanes. The techniques were developed and validated in a high-fidelity fixed-base flight simulator using a wind-tunnel aerodynamic database, realistic sensor characteristics, and a realistic flight deck representative of a large transport aircraft. Results demonstrated that aerodynamic models for stall, upset, and recovery can be identified rapidly and accurately using relatively simple piloted flight test maneuvers. Stall maneuver predictions and comparisons of identified aerodynamic models with data from the underlying simulation aerodynamic database were used to validate the techniques.
Le, Hoang-Quynh; Tran, Mai-Vu; Dang, Thanh Hai; Ha, Quang-Thuy; Collier, Nigel
2016-07-01
The BioCreative V chemical-disease relation (CDR) track was proposed to accelerate the progress of text mining in facilitating integrative understanding of chemicals, diseases and their relations. In this article, we describe an extension of our system (namely UET-CAM) that participated in the BioCreative V CDR. The original UET-CAM system's performance was ranked fourth among 18 participating systems by the BioCreative CDR track committee. In the Disease Named Entity Recognition and Normalization (DNER) phase, our system employed joint inference (decoding) with a perceptron-based named entity recognizer (NER) and a back-off model with Semantic Supervised Indexing and Skip-gram for named entity normalization. In the chemical-induced disease (CID) relation extraction phase, we proposed a pipeline that includes a coreference resolution module and a Support Vector Machine relation extraction model. The former module utilized a multi-pass sieve to extend entity recall. In this article, the UET-CAM system was improved by adding a 'silver' CID corpus to train the prediction model. This silver standard corpus of more than 50 thousand sentences was automatically built based on the Comparative Toxicogenomics Database (CTD) database. We evaluated our method on the CDR test set. Results showed that our system could reach the state of the art performance with F1 of 82.44 for the DNER task and 58.90 for the CID task. Analysis demonstrated substantial benefits of both the multi-pass sieve coreference resolution method (F1 + 4.13%) and the silver CID corpus (F1 +7.3%).Database URL: SilverCID-The silver-standard corpus for CID relation extraction is freely online available at: https://zenodo.org/record/34530 (doi:10.5281/zenodo.34530). © The Author(s) 2016. Published by Oxford University Press.
NASA Astrophysics Data System (ADS)
Ghiorso, M. S.
2014-12-01
Computational thermodynamics (CT) has now become an essential tool of petrologic and geochemical research. CT is the basis for the construction of phase diagrams, the application of geothermometers and geobarometers, the equilibrium speciation of solutions, the construction of pseudosections, calculations of mass transfer between minerals, melts and fluids, and, it provides a means of estimating materials properties for the evaluation of constitutive relations in fluid dynamical simulations. The practical application of CT to Earth science problems requires data. Data on the thermochemical properties and the equation of state of relevant materials, and data on the relative stability and partitioning of chemical elements between phases as a function of temperature and pressure. These data must be evaluated and synthesized into a self consistent collection of theoretical models and model parameters that is colloquially known as a thermodynamic database. Quantitative outcomes derived from CT reply on the existence, maintenance and integrity of thermodynamic databases. Unfortunately, the community is reliant on too few such databases, developed by a small number of research groups, and mostly under circumstances where refinement and updates to the database lag behind or are unresponsive to need. Given the increasing level of reliance on CT calculations, what is required is a paradigm shift in the way thermodynamic databases are developed, maintained and disseminated. They must become community resources, with flexible and assessable software interfaces that permit easy modification, while at the same time maintaining theoretical integrity and fidelity to the underlying experimental observations. Advances in computational and data science give us the tools and resources to address this problem, allowing CT results to be obtained at the speed of thought, and permitting geochemical and petrological intuition to play a key role in model development and calibration.
Modeling of developmental toxicology presents a significant challenge to computational toxicology due to endpoint complexity and lack of data coverage. These challenges largely account for the relatively few modeling successes using the structure–activity relationship (SAR) parad...
Modeling and Databases for Teaching Petrology
NASA Astrophysics Data System (ADS)
Asher, P.; Dutrow, B.
2003-12-01
With the widespread availability of high-speed computers with massive storage and ready transport capability of large amounts of data, computational and petrologic modeling and the use of databases provide new tools with which to teach petrology. Modeling can be used to gain insights into a system, predict system behavior, describe a system's processes, compare with a natural system or simply to be illustrative. These aspects result from data driven or empirical, analytical or numerical models or the concurrent examination of multiple lines of evidence. At the same time, use of models can enhance core foundations of the geosciences by improving critical thinking skills and by reinforcing prior knowledge gained. However, the use of modeling to teach petrology is dictated by the level of expectation we have for students and their facility with modeling approaches. For example, do we expect students to push buttons and navigate a program, understand the conceptual model and/or evaluate the results of a model. Whatever the desired level of sophistication, specific elements of design should be incorporated into a modeling exercise for effective teaching. These include, but are not limited to; use of the scientific method, use of prior knowledge, a clear statement of purpose and goals, attainable goals, a connection to the natural/actual system, a demonstration that complex heterogeneous natural systems are amenable to analyses by these techniques and, ideally, connections to other disciplines and the larger earth system. Databases offer another avenue with which to explore petrology. Large datasets are available that allow integration of multiple lines of evidence to attack a petrologic problem or understand a petrologic process. These are collected into a database that offers a tool for exploring, organizing and analyzing the data. For example, datasets may be geochemical, mineralogic, experimental and/or visual in nature, covering global, regional to local scales. These datasets provide students with access to large amount of related data through space and time. Goals of the database working group include educating earth scientists about information systems in general, about the importance of metadata about ways of using databases and datasets as educational tools and about the availability of existing datasets and databases. The modeling and databases groups hope to create additional petrologic teaching tools using these aspects and invite the community to contribute to the effort.
Data model and relational database design for the New England Water-Use Data System (NEWUDS)
Tessler, Steven
2001-01-01
The New England Water-Use Data System (NEWUDS) is a database for the storage and retrieval of water-use data. NEWUDS can handle data covering many facets of water use, including (1) tracking various types of water-use activities (withdrawals, returns, transfers, distributions, consumptive-use, wastewater collection, and treatment); (2) the description, classification and location of places and organizations involved in water-use activities; (3) details about measured or estimated volumes of water associated with water-use activities; and (4) information about data sources and water resources associated with water use. In NEWUDS, each water transaction occurs unidirectionally between two site objects, and the sites and conveyances form a water network. The core entities in the NEWUDS model are site, conveyance, transaction/rate, location, and owner. Other important entities include water resources (used for withdrawals and returns), data sources, and aliases. Multiple water-exchange estimates can be stored for individual transactions based on different methods or data sources. Storage of user-defined details is accommodated for several of the main entities. Numerous tables containing classification terms facilitate detailed descriptions of data items and can be used for routine or custom data summarization. NEWUDS handles single-user and aggregate-user water-use data, can be used for large or small water-network projects, and is available as a stand-alone Microsoft? Access database structure. Users can customize and extend the database, link it to other databases, or implement the design in other relational database applications.
Computer Administering of the Psychological Investigations: Set-Relational Representation
NASA Astrophysics Data System (ADS)
Yordzhev, Krasimir
Computer administering of a psychological investigation is the computer representation of the entire procedure of psychological assessments - test construction, test implementation, results evaluation, storage and maintenance of the developed database, its statistical processing, analysis and interpretation. A mathematical description of psychological assessment with the aid of personality tests is discussed in this article. The set theory and the relational algebra are used in this description. A relational model of data, needed to design a computer system for automation of certain psychological assessments is given. Some finite sets and relation on them, which are necessary for creating a personality psychological test, are described. The described model could be used to develop real software for computer administering of any psychological test and there is full automation of the whole process: test construction, test implementation, result evaluation, storage of the developed database, statistical implementation, analysis and interpretation. A software project for computer administering personality psychological tests is suggested.
Multidimensional Learner Model In Intelligent Learning System
NASA Astrophysics Data System (ADS)
Deliyska, B.; Rozeva, A.
2009-11-01
The learner model in an intelligent learning system (ILS) has to ensure the personalization (individualization) and the adaptability of e-learning in an online learner-centered environment. ILS is a distributed e-learning system whose modules can be independent and located in different nodes (servers) on the Web. This kind of e-learning is achieved through the resources of the Semantic Web and is designed and developed around a course, group of courses or specialty. An essential part of ILS is learner model database which contains structured data about learner profile and temporal status in the learning process of one or more courses. In the paper a learner model position in ILS is considered and a relational database is designed from learner's domain ontology. Multidimensional modeling agent for the source database is designed and resultant learner data cube is presented. Agent's modules are proposed with corresponding algorithms and procedures. Multidimensional (OLAP) analysis guidelines on the resultant learner module for designing dynamic learning strategy have been highlighted.
SPIRE Data-Base Management System
NASA Technical Reports Server (NTRS)
Fuechsel, C. F.
1984-01-01
Spacelab Payload Integration and Rocket Experiment (SPIRE) data-base management system (DBMS) based on relational model of data bases. Data bases typically used for engineering and mission analysis tasks and, unlike most commercially available systems, allow data items and data structures stored in forms suitable for direct analytical computation. SPIRE DBMS designed to support data requests from interactive users as well as applications programs.
Zhao, Yongsheng; Zhao, Jihong; Huang, Ying; Zhou, Qing; Zhang, Xiangping; Zhang, Suojiang
2014-08-15
A comprehensive database on toxicity of ionic liquids (ILs) is established. The database includes over 4000 pieces of data. Based on the database, the relationship between IL's structure and its toxicity has been analyzed qualitatively. Furthermore, Quantitative Structure-Activity relationships (QSAR) model is conducted to predict the toxicities (EC50 values) of various ILs toward the Leukemia rat cell line IPC-81. Four parameters selected by the heuristic method (HM) are used to perform the studies of multiple linear regression (MLR) and support vector machine (SVM). The squared correlation coefficient (R(2)) and the root mean square error (RMSE) of training sets by two QSAR models are 0.918 and 0.959, 0.258 and 0.179, respectively. The prediction R(2) and RMSE of QSAR test sets by MLR model are 0.892 and 0.329, by SVM model are 0.958 and 0.234, respectively. The nonlinear model developed by SVM algorithm is much outperformed MLR, which indicates that SVM model is more reliable in the prediction of toxicity of ILs. This study shows that increasing the relative number of O atoms of molecules leads to decrease in the toxicity of ILs. Copyright © 2014 Elsevier B.V. All rights reserved.
A UML Profile for Developing Databases that Conform to the Third Manifesto
NASA Astrophysics Data System (ADS)
Eessaar, Erki
The Third Manifesto (TTM) presents the principles of a relational database language that is free of deficiencies and ambiguities of SQL. There are database management systems that are created according to TTM. Developers need tools that support the development of databases by using these database management systems. UML is a widely used visual modeling language. It provides built-in extension mechanism that makes it possible to extend UML by creating profiles. In this paper, we introduce a UML profile for designing databases that correspond to the rules of TTM. We created the first version of the profile by translating existing profiles of SQL database design. After that, we extended and improved the profile. We implemented the profile by using UML CASE system StarUML™. We present an example of using the new profile. In addition, we describe problems that occurred during the profile development.
Jeffries, D J; Donkor, S; Brookes, R H; Fox, A; Hill, P C
2004-09-01
The data requirements of a large multidisciplinary tuberculosis case contact study are complex. We describe an ACCESS-based relational database system that meets our rigorous requirements for data entry and validation, while being user-friendly, flexible, exportable, and easy to install on a network or stand alone system. This includes the development of a double data entry package for epidemiology and laboratory data, semi-automated entry of ELISPOT data directly from the plate reader, and a suite of new programmes for the manipulation and integration of flow cytometry data. The double entered epidemiology and immunology databases are combined into a separate database, providing a near-real-time analysis of immuno-epidemiological data, allowing important trends to be identified early and major decisions about the study to be made and acted on. This dynamic data management model is portable and can easily be applied to other studies.
Zhang, Qingzhou; Yang, Bo; Chen, Xujiao; Xu, Jing; Mei, Changlin; Mao, Zhiguo
2014-01-01
We present a bioinformatics database named Renal Gene Expression Database (RGED), which contains comprehensive gene expression data sets from renal disease research. The web-based interface of RGED allows users to query the gene expression profiles in various kidney-related samples, including renal cell lines, human kidney tissues and murine model kidneys. Researchers can explore certain gene profiles, the relationships between genes of interests and identify biomarkers or even drug targets in kidney diseases. The aim of this work is to provide a user-friendly utility for the renal disease research community to query expression profiles of genes of their own interest without the requirement of advanced computational skills. Website is implemented in PHP, R, MySQL and Nginx and freely available from http://rged.wall-eva.net. http://rged.wall-eva.net. © The Author(s) 2014. Published by Oxford University Press.
Image-Based Airborne LiDAR Point Cloud Encoding for 3d Building Model Retrieval
NASA Astrophysics Data System (ADS)
Chen, Yi-Chen; Lin, Chao-Hung
2016-06-01
With the development of Web 2.0 and cyber city modeling, an increasing number of 3D models have been available on web-based model-sharing platforms with many applications such as navigation, urban planning, and virtual reality. Based on the concept of data reuse, a 3D model retrieval system is proposed to retrieve building models similar to a user-specified query. The basic idea behind this system is to reuse these existing 3D building models instead of reconstruction from point clouds. To efficiently retrieve models, the models in databases are compactly encoded by using a shape descriptor generally. However, most of the geometric descriptors in related works are applied to polygonal models. In this study, the input query of the model retrieval system is a point cloud acquired by Light Detection and Ranging (LiDAR) systems because of the efficient scene scanning and spatial information collection. Using Point clouds with sparse, noisy, and incomplete sampling as input queries is more difficult than that by using 3D models. Because that the building roof is more informative than other parts in the airborne LiDAR point cloud, an image-based approach is proposed to encode both point clouds from input queries and 3D models in databases. The main goal of data encoding is that the models in the database and input point clouds can be consistently encoded. Firstly, top-view depth images of buildings are generated to represent the geometry surface of a building roof. Secondly, geometric features are extracted from depth images based on height, edge and plane of building. Finally, descriptors can be extracted by spatial histograms and used in 3D model retrieval system. For data retrieval, the models are retrieved by matching the encoding coefficients of point clouds and building models. In experiments, a database including about 900,000 3D models collected from the Internet is used for evaluation of data retrieval. The results of the proposed method show a clear superiority over related methods.
Ten Years Experience In Geo-Databases For Linear Facilities Risk Assessment (Lfra)
NASA Astrophysics Data System (ADS)
Oboni, F.
2003-04-01
Keywords: geo-environmental, database, ISO14000, management, decision-making, risk, pipelines, roads, railroads, loss control, SAR, hazard identification ABSTRACT: During the past decades, characterized by the development of the Risk Management (RM) culture, a variety of different RM models have been proposed by governmental agencies in various parts of the world. The most structured models appear to have originated in the field of environmental RM. These models are briefly reviewed in the first section of the paper focusing the attention on the difference between Hazard Management and Risk Management and the need to use databases in order to allow retrieval of specific information and effective updating. The core of the paper reviews a number of different RM approaches, based on extensions of geo-databases, specifically developed for linear facilities (LF) in transportation corridors since the early 90s in Switzerland, Italy, Canada, the US and South America. The applications are compared in terms of methodology, capabilities and resources necessary to their implementation. The paper then focuses the attention on the level of detail that applications and related data have to attain. Common pitfalls related to decision making based on hazards rather than on risks are discussed. The paper focuses the last sections on the description of the next generation of linear facility RA application, including examples of results and discussion of future methodological research. It is shown that geo-databases should be linked to loss control and accident reports in order to maximize their benefits. The links between RA and ISO 14000 (environmental management code) are explicitly considered.
Context indexing of digital cardiac ultrasound records in PACS
NASA Astrophysics Data System (ADS)
Lobodzinski, S. Suave; Meszaros, Georg N.
1998-07-01
Recent wide adoption of the DICOM 3.0 standard by ultrasound equipment vendors created a need for practical clinical implementations of cardiac imaging study visualization, management and archiving, DICOM 3.0 defines only a logical and physical format for exchanging image data (still images, video, patient and study demographics). All DICOM compliant imaging studies must presently be archived on a 650 Mb recordable compact disk. This is a severe limitation for ultrasound applications where studies of 3 to 10 minutes long are a common practice. In addition, DICOM digital echocardiography objects require physiological signal indexing, content segmentation and characterization. Since DICOM 3.0 is an interchange standard only, it does not define how to database composite video objects. The goal of this research was therefore to address the issues of efficient storage, retrieval and management of DICOM compliant cardiac video studies in a distributed PACS environment. Our Web based implementation has the advantage of accommodating both DICOM defined entity-relation modules (equipment data, patient data, video format, etc.) in standard relational database tables and digital indexed video with its attributes in an object relational database. Object relational data model facilitates content indexing of full motion cardiac imaging studies through bi-directional hyperlink generation that tie searchable video attributes and related objects to individual video frames in the temporal domain. Benefits realized from use of bi-directionally hyperlinked data models in an object relational database include: (1) real time video indexing during image acquisition, (2) random access and frame accurate instant playback of previously recorded full motion imaging data, and (3) time savings from faster and more accurate access to data through multiple navigation mechanisms such as multidimensional queries on an index, queries on a hyperlink attribute, free search and browsing.
Development of New Jersey rates for the NJCMS incident delay model.
DOT National Transportation Integrated Search
2012-09-01
This study developed a working database for calculating incident rates and related delay measures, which contains incident related data collected from various data sources, such as the New Jersey Department of Transportation (NJDOT) Crash Records, Tr...
[A relational database to store Poison Centers calls].
Barelli, Alessandro; Biondi, Immacolata; Tafani, Chiara; Pellegrini, Aristide; Soave, Maurizio; Gaspari, Rita; Annetta, Maria Giuseppina
2006-01-01
Italian Poison Centers answer to approximately 100,000 calls per year. Potentially, this activity is a huge source of data for toxicovigilance and for syndromic surveillance. During the last decade, surveillance systems for early detection of outbreaks have drawn the attention of public health institutions due to the threat of terrorism and high-profile disease outbreaks. Poisoning surveillance needs the ongoing, systematic collection, analysis, interpretation, and dissemination of harmonised data about poisonings from all Poison Centers for use in public health action to reduce morbidity and mortality and to improve health. The entity-relationship model for a Poison Center relational database is extremely complex and not studied in detail. For this reason, not harmonised data collection happens among Italian Poison Centers. Entities are recognizable concepts, either concrete or abstract, such as patients and poisons, or events which have relevance to the database, such as calls. Connectivity and cardinality of relationships are complex as well. A one-to-many relationship exist between calls and patients: for one instance of entity calls, there are zero, one, or many instances of entity patients. At the same time, a one-to-many relationship exist between patients and poisons: for one instance of entity patients, there are zero, one, or many instances of entity poisons. This paper shows a relational model for a poison center database which allows the harmonised data collection of poison centers calls.
BNDB - the Biochemical Network Database.
Küntzer, Jan; Backes, Christina; Blum, Torsten; Gerasch, Andreas; Kaufmann, Michael; Kohlbacher, Oliver; Lenhof, Hans-Peter
2007-10-02
Technological advances in high-throughput techniques and efficient data acquisition methods have resulted in a massive amount of life science data. The data is stored in numerous databases that have been established over the last decades and are essential resources for scientists nowadays. However, the diversity of the databases and the underlying data models make it difficult to combine this information for solving complex problems in systems biology. Currently, researchers typically have to browse several, often highly focused, databases to obtain the required information. Hence, there is a pressing need for more efficient systems for integrating, analyzing, and interpreting these data. The standardization and virtual consolidation of the databases is a major challenge resulting in a unified access to a variety of data sources. We present the Biochemical Network Database (BNDB), a powerful relational database platform, allowing a complete semantic integration of an extensive collection of external databases. BNDB is built upon a comprehensive and extensible object model called BioCore, which is powerful enough to model most known biochemical processes and at the same time easily extensible to be adapted to new biological concepts. Besides a web interface for the search and curation of the data, a Java-based viewer (BiNA) provides a powerful platform-independent visualization and navigation of the data. BiNA uses sophisticated graph layout algorithms for an interactive visualization and navigation of BNDB. BNDB allows a simple, unified access to a variety of external data sources. Its tight integration with the biochemical network library BN++ offers the possibility for import, integration, analysis, and visualization of the data. BNDB is freely accessible at http://www.bndb.org.
Siegel, J; Kirkland, D
1991-01-01
The Composite Health Care System (CHCS), a MUMPS-based hospital information system (HIS), has evolved from the Decentralized Hospital Computer Program (DHCP) installed within VA Hospitals. The authors explore the evolution of an ancillary-based system toward an integrated model with a look at its current state and possible future. The history and relationships between orders of different types tie specific patient-related data into a logical and temporal model. Diagrams demonstrate how the database structure has evolved to support clinical needs for integration. It is suggested that a fully integrated model is capable of meeting traditional HIS needs.
AR Based App for Tourist Attraction in ESKİ ÇARŞI (Safranbolu)
NASA Astrophysics Data System (ADS)
Polat, Merve; Rakıp Karaş, İsmail; Kahraman, İdris; Alizadehashrafi, Behnam
2016-10-01
This research is dealing with 3D modeling of historical and heritage landmarks of Safranbolu that are registered by UNESCO. This is an Augmented Reality (AR) based project in order to trigger virtual three-dimensional (3D) models, cultural music, historical photos, artistic features and animated text information. The aim is to propose a GIS-based approach with these features and add to the system as attribute data in a relational database. The database will be available in an AR-based application to provide information for the tourists.
NASA Astrophysics Data System (ADS)
Pecoraro, Gaetano; Calvello, Michele
2017-04-01
In Italy rainfall-induced landslides pose a significant and widespread hazard, resulting in a large number of casualties and enormous economic damages. Mitigation of such a diffuse risk cannot be attained with structural measures only. With respect to the risk to life, early warning systems represent a viable and useful tool for landslide risk mitigation over wide areas. Inventories of rainfall-induced landslides are critical to support investigations of where and when landslides have happened and may occur in the future, i.e. to establish reliable correlations between rainfall characteristics and landslide occurrences. In this work a parametric study has been conducted to evaluate the performance of correlation models between rainfall and landslides over the Italian territory using the "FraneItalia" database, an inventory of landslides retrieved from online Italian journalistic news. The information reported for each record of this database always include: the site of occurrence of the landslides, the date of occurrence, the source of the news. Multiple landslides occurring in the same date, within the same province or region, are inventoried together in one single record of the database, in this case also reporting the number of landslides of the event. Each record the database may also include, if the related information is available: hour of occurrence; typology, volume and material of the landslide; activity phase; effects on people, structures, infrastructures, cars or other elements. The database currently contains six complete years of data (2010-2015), including more than 4000 landslide reports, most of them triggered by rainfall. For the aim of this study, different rainfall-landslides correlation models have been tested by analysing the reported landslides, within all the 144 zones identified by the national civil protection for weather-related warnings in Italy, in relation to satellite-based precipitations estimates from the Global Precipitation Measurement (GPM) NASA mission. This remote sensing database contains gridded precipitation and precipitation-error estimates, with a half-hour temporal resolution and a 0.10-degree spatial resolution, covering most of the earth starting from 2014. It is well known that satellite estimates of rainfall have some limitations in resolving specific rainfall features (e.g., shallow orographic events and short-duration, high-intensity events), yet the temporal and spatial accuracy of the GPM data may be considered adequate in relation to the scale of the analysis and the size of the warning zones used for this study. The results of the parametric analysis conducted herein, although providing some indications on the most relevant rainfall conditions leading to widespread landsliding over a warning zone, must be considered preliminary as they show a very heterogeneous behaviour of the employed rainfall-based warning models over the Italian territory. Nevertheless, they clearly show the strong potential of the continuous multi-year landslide records available from the "FraneItalia" database as an important source of information to evaluate the performance of warning models at regional scale throughout Italy.
PASS2: an automated database of protein alignments organised as structural superfamilies.
Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan
2004-04-02
The functional selection and three-dimensional structural constraints of proteins in nature often relates to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins, provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated, structural alignments for evolutionary related, sequentially distant proteins. An automated and updated version of PASS2 is, in direct correspondence with SCOP 1.63, consisting of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members both based on sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members. The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity and single-member superfamilies permit better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html
NASA Technical Reports Server (NTRS)
Shearrow, Charles A.
1999-01-01
One of the identified goals of EM3 is to implement virtual manufacturing by the time the year 2000 has ended. To realize this goal of a true virtual manufacturing enterprise the initial development of a machinability database and the infrastructure must be completed. This will consist of the containment of the existing EM-NET problems and developing machine, tooling, and common materials databases. To integrate the virtual manufacturing enterprise with normal day to day operations the development of a parallel virtual manufacturing machinability database, virtual manufacturing database, virtual manufacturing paradigm, implementation/integration procedure, and testable verification models must be constructed. Common and virtual machinability databases will include the four distinct areas of machine tools, available tooling, common machine tool loads, and a materials database. The machine tools database will include the machine envelope, special machine attachments, tooling capacity, location within NASA-JSC or with a contractor, and availability/scheduling. The tooling database will include available standard tooling, custom in-house tooling, tool properties, and availability. The common materials database will include materials thickness ranges, strengths, types, and their availability. The virtual manufacturing databases will consist of virtual machines and virtual tooling directly related to the common and machinability databases. The items to be completed are the design and construction of the machinability databases, virtual manufacturing paradigm for NASA-JSC, implementation timeline, VNC model of one bridge mill and troubleshoot existing software and hardware problems with EN4NET. The final step of this virtual manufacturing project will be to integrate other production sites into the databases bringing JSC's EM3 into a position of becoming a clearing house for NASA's digital manufacturing needs creating a true virtual manufacturing enterprise.
Representing spatial information in a computational model for network management
NASA Technical Reports Server (NTRS)
Blaisdell, James H.; Brownfield, Thomas F.
1994-01-01
While currently available relational database management systems (RDBMS) allow inclusion of spatial information in a data model, they lack tools for presenting this information in an easily comprehensible form. Computer-aided design (CAD) software packages provide adequate functions to produce drawings, but still require manual placement of symbols and features. This project has demonstrated a bridge between the data model of an RDBMS and the graphic display of a CAD system. It is shown that the CAD system can be used to control the selection of data with spatial components from the database and then quickly plot that data on a map display. It is shown that the CAD system can be used to extract data from a drawing and then control the insertion of that data into the database. These demonstrations were successful in a test environment that incorporated many features of known working environments, suggesting that the techniques developed could be adapted for practical use.
WholeCellSimDB: a hybrid relational/HDF database for whole-cell model predictions.
Karr, Jonathan R; Phillips, Nolan C; Covert, Markus W
2014-01-01
Mechanistic 'whole-cell' models are needed to develop a complete understanding of cell physiology. However, extracting biological insights from whole-cell models requires running and analyzing large numbers of simulations. We developed WholeCellSimDB, a database for organizing whole-cell simulations. WholeCellSimDB was designed to enable researchers to search simulation metadata to identify simulations for further analysis, and quickly slice and aggregate simulation results data. In addition, WholeCellSimDB enables users to share simulations with the broader research community. The database uses a hybrid relational/hierarchical data format architecture to efficiently store and retrieve both simulation setup metadata and results data. WholeCellSimDB provides a graphical Web-based interface to search, browse, plot and export simulations; a JavaScript Object Notation (JSON) Web service to retrieve data for Web-based visualizations; a command-line interface to deposit simulations; and a Python API to retrieve data for advanced analysis. Overall, we believe WholeCellSimDB will help researchers use whole-cell models to advance basic biological science and bioengineering. http://www.wholecellsimdb.org SOURCE CODE REPOSITORY: URL: http://github.com/CovertLab/WholeCellSimDB. © The Author(s) 2014. Published by Oxford University Press.
Database resources of the National Center for Biotechnology Information
Sayers, Eric W.; Barrett, Tanya; Benson, Dennis A.; Bolton, Evan; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M.; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Krasnov, Sergey; Landsman, David; Lipman, David J.; Lu, Zhiyong; Madden, Thomas L.; Madej, Tom; Maglott, Donna R.; Marchler-Bauer, Aron; Miller, Vadim; Karsch-Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D.; Schuler, Gregory D.; Sequeira, Edwin; Sherry, Stephen T.; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A.; Wagner, Lukas; Wang, Yanli; Wilbur, W. John; Yaschenko, Eugene; Ye, Jian
2012-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:22140104
Database resources of the National Center for Biotechnology Information
2013-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page. PMID:23193264
Database resources of the National Center for Biotechnology Information.
Wheeler, David L; Barrett, Tanya; Benson, Dennis A; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Geer, Lewis Y; Kapustin, Yuri; Khovayko, Oleg; Landsman, David; Lipman, David J; Madden, Thomas L; Maglott, Donna R; Ostell, James; Miller, Vadim; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Steven T; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusov, Roman L; Tatusova, Tatiana A; Wagner, Lukas; Yaschenko, Eugene
2007-01-01
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link(BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace and Assembly Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Viral Genotyping Tools, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
Database resources of the National Center for Biotechnology Information.
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Feolo, Michael; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Landsman, David; Lipman, David J; Madden, Thomas L; Maglott, Donna R; Miller, Vadim; Mizrachi, Ilene; Ostell, James; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Yaschenko, Eugene; Ye, Jian
2009-01-01
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the web applications is custom implementation of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
Uncertainty Modeling for Database Design using Intuitionistic and Rough Set Theory
2009-01-01
Definition. An intuitionistic rough relation R is a sub- set of the set cross product P(D1)× P(D2) × · · ·× P( Dm )× Dµ.× Dv. For a specific relation, R...that aj ∈ dij for all j. The interpretation space is the cross product D1× D2 × · · ·× Dm × Dµ× Dv but is limited for a given re- lation R to the set...systems, Journal of Information Science 11 (1985), 77–87. [7] T. Beaubouef and F. Petry, Rough Querying of Crisp Data in Relational Databases, Third
Scholl, Joep H G; van Hunsel, Florence P A M; Hak, Eelko; van Puijenbroek, Eugène P
2018-02-01
The statistical screening of pharmacovigilance databases containing spontaneously reported adverse drug reactions (ADRs) is mainly based on disproportionality analysis. The aim of this study was to improve the efficiency of full database screening using a prediction model-based approach. A logistic regression-based prediction model containing 5 candidate predictors was developed and internally validated using the Summary of Product Characteristics as the gold standard for the outcome. All drug-ADR associations, with the exception of those related to vaccines, with a minimum of 3 reports formed the training data for the model. Performance was based on the area under the receiver operating characteristic curve (AUC). Results were compared with the current method of database screening based on the number of previously analyzed associations. A total of 25 026 unique drug-ADR associations formed the training data for the model. The final model contained all 5 candidate predictors (number of reports, disproportionality, reports from healthcare professionals, reports from marketing authorization holders, Naranjo score). The AUC for the full model was 0.740 (95% CI; 0.734-0.747). The internal validity was good based on the calibration curve and bootstrapping analysis (AUC after bootstrapping = 0.739). Compared with the old method, the AUC increased from 0.649 to 0.740, and the proportion of potential signals increased by approximately 50% (from 12.3% to 19.4%). A prediction model-based approach can be a useful tool to create priority-based listings for signal detection in databases consisting of spontaneous ADRs. © 2017 The Authors. Pharmacoepidemiology & Drug Safety Published by John Wiley & Sons Ltd.
Haire-Joshu, Debra; Elliott, Michael; Schermbeck, Rebecca; Taricone, Elsa; Green, Scoie; Brownson, Ross C
2010-07-01
The objective of this study was to develop the Missouri Obesity, Nutrition, and Activity Policy Database, a geographically representative baseline of Missouri's existing obesity-related local policies on healthy eating and physical activity. The database is organized to reflect 7 local environments (government, community, health care, worksite, school, after school, and child care) and to describe the prevalence of obesity-related policies in these environments. We employed a stratified nested cluster design using key informant interviews and review of public records to sample 2,356 sites across the 7 target environments for the presence or absence of obesity-related policies. The school environment had the most policies (88%), followed by after school (47%) and health care (32%). Community, government, and child care environments reported smaller proportions of obesity-related policies but higher rates of funding for these policies. Worksite environments had low numbers of obesity-related policies and low funding levels (17% and 6%, respectively). Sixteen of the sampled counties had high obesity-related policy occurrence; 65 had moderate and 8 had low occurrences. Except in Missouri schools, the presence of obesity-related policies is limited. More obesity-related policies are needed so that people have access to environments that support the model behaviors necessary to halt the obesity epidemic. The Missouri Obesity, Nutrition, and Activity Policy Database provides a benchmark for evaluating progress toward the development of obesity-related policies across multiple environments in Missouri.
Integration of Evidence Base into a Probabilistic Risk Assessment
NASA Technical Reports Server (NTRS)
Saile, Lyn; Lopez, Vilma; Bickham, Grandin; Kerstman, Eric; FreiredeCarvalho, Mary; Byrne, Vicky; Butler, Douglas; Myers, Jerry; Walton, Marlei
2011-01-01
INTRODUCTION: A probabilistic decision support model such as the Integrated Medical Model (IMM) utilizes an immense amount of input data that necessitates a systematic, integrated approach for data collection, and management. As a result of this approach, IMM is able to forecasts medical events, resource utilization and crew health during space flight. METHODS: Inflight data is the most desirable input for the Integrated Medical Model. Non-attributable inflight data is collected from the Lifetime Surveillance for Astronaut Health study as well as the engineers, flight surgeons, and astronauts themselves. When inflight data is unavailable cohort studies, other models and Bayesian analyses are used, in addition to subject matters experts input on occasion. To determine the quality of evidence of a medical condition, the data source is categorized and assigned a level of evidence from 1-5; the highest level is one. The collected data reside and are managed in a relational SQL database with a web-based interface for data entry and review. The database is also capable of interfacing with outside applications which expands capabilities within the database itself. Via the public interface, customers can access a formatted Clinical Findings Form (CLiFF) that outlines the model input and evidence base for each medical condition. Changes to the database are tracked using a documented Configuration Management process. DISSCUSSION: This strategic approach provides a comprehensive data management plan for IMM. The IMM Database s structure and architecture has proven to support additional usages. As seen by the resources utilization across medical conditions analysis. In addition, the IMM Database s web-based interface provides a user-friendly format for customers to browse and download the clinical information for medical conditions. It is this type of functionality that will provide Exploratory Medicine Capabilities the evidence base for their medical condition list. CONCLUSION: The IMM Database in junction with the IMM is helping NASA aerospace program improve the health care and reduce risk for the astronauts crew. Both the database and model will continue to expand to meet customer needs through its multi-disciplinary evidence based approach to managing data. Future expansion could serve as a platform for a Space Medicine Wiki of medical conditions.
Kim, Su Ran; Lee, Hye Won; Jun, Ji Hee; Ko, Byoung-Seob
2017-03-01
Gan Mai Da Zao (GMDZ) decoction is widely used for the treatment of various diseases of the internal organ and of the central nervous system. The aim of this study is to investigate the effects of GMDZ decoction on neuropsychiatric disorders in an animal model. We searched seven databases for randomized animal studies published until April 2015: Pubmed, four Korean databases (DBpia, Oriental Medicine Advanced Searching Integrated System, Korean Studies Information Service System, and Research Information Sharing Service), and one Chinese database (China National Knowledge Infrastructure). The randomized animal studies were included if the effects of GMDZ decoction were tested on neuropsychiatric disorders. All articles were read in full and extracted predefined criteria by two independent reviewers. From a total of 258 hits, six randomized controlled animal studies were included. Five studies used a Sprague Dawley rat model for acute psychological stress, post-traumatic stress disorders, and unpredictable mild stress depression whereas one study used a Kunming mouse model for prenatal depression. The results of the studies showed that GMDZ decoction improved the related outcomes. Regardless of the dose and concentration used, GMDZ decoction significantly improved neuropsychiatric disease-related outcomes in animal models. However, additional systematic and extensive studies should be conducted to establish a strong conclusion.
NADM Conceptual Model 1.0 -- A Conceptual Model for Geologic Map Information
,
2004-01-01
Executive Summary -- The NADM Data Model Design Team was established in 1999 by the North American Geologic Map Data Model Steering Committee (NADMSC) with the purpose of drafting a geologic map data model for consideration as a standard for developing interoperable geologic map-centered databases by state, provincial, and federal geological surveys. The model is designed to be a technology-neutral conceptual model that can form the basis for a web-based interchange format using evolving information technology (e.g., XML, RDF, OWL), and guide implementation of geoscience databases in a common conceptual framework. The intended purpose is to allow geologic information sharing between geologic map data providers and users, independent of local information system implementation. The model emphasizes geoscience concepts and relationships related to information presented on geologic maps. Design has been guided by an informal requirements analysis, documentation of existing databases, technology developments, and other standardization efforts in the geoscience and computer-science communities. A key aspect of the model is the notion that representation of the conceptual framework (ontology) that underlies geologic map data must be part of the model, because this framework changes with time and understanding, and varies between information providers. The top level of the model distinguishes geologic concepts, geologic representation concepts, and metadata. The geologic representation part of the model provides a framework for representing the ontology that underlies geologic map data through a controlled vocabulary, and for establishing the relationships between this vocabulary and a geologic map visualization or portrayal. Top-level geologic classes in the model are Earth material (substance), geologic unit (parts of the Earth), geologic age, geologic structure, fossil, geologic process, geologic relation, and geologic event.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rohatgi, Upendra S.
Nuclear reactor codes require validation with appropriate data representing the plant for specific scenarios. The thermal-hydraulic data is scattered in different locations and in different formats. Some of the data is in danger of being lost. A relational database is being developed to organize the international thermal hydraulic test data for various reactor concepts and different scenarios. At the reactor system level, that data is organized to include separate effect tests and integral effect tests for specific scenarios and corresponding phenomena. The database relies on the phenomena identification sections of expert developed PIRTs. The database will provide a summary ofmore » appropriate data, review of facility information, test description, instrumentation, references for the experimental data and some examples of application of the data for validation. The current database platform includes scenarios for PWR, BWR, VVER, and specific benchmarks for CFD modelling data and is to be expanded to include references for molten salt reactors. There are place holders for high temperature gas cooled reactors, CANDU and liquid metal reactors. This relational database is called The International Experimental Thermal Hydraulic Systems (TIETHYS) database and currently resides at Nuclear Energy Agency (NEA) of the OECD and is freely open to public access. Going forward the database will be extended to include additional links and data as they become available. https://www.oecd-nea.org/tiethysweb/« less
Trezza, Alfonso; Bernini, Andrea; Langella, Andrea; Ascher, David B; Pires, Douglas E V; Sodi, Andrea; Passerini, Ilaria; Pelo, Elisabetta; Rizzo, Stanislao; Niccolai, Neri; Spiga, Ottavia
2017-10-01
The aim of this article is to report the investigation of the structural features of ABCA4, a protein associated with a genetic retinal disease. A new database collecting knowledge of ABCA4 structure may facilitate predictions about the possible functional consequences of gene mutations observed in clinical practice. In order to correlate structural and functional effects of the observed mutations, the structure of mouse P-glycoprotein was used as a template for homology modeling. The obtained structural information and genetic data are the basis of our relational database (ABCA4Database). Sequence variability among all ABCA4-deposited entries was calculated and reported as Shannon entropy score at the residue level. The three-dimensional model of ABCA4 structure was used to locate the spatial distribution of the observed variable regions. Our predictions from structural in silico tools were able to accurately link the functional effects of mutations to phenotype. The development of the ABCA4Database gathers all the available genetic and structural information, yielding a global view of the molecular basis of some retinal diseases. ABCA4 modeled structure provides a molecular basis on which to analyze protein sequence mutations related to genetic retinal disease in order to predict the risk of retinal disease across all possible ABCA4 mutations. Additionally, our ABCA4 predicted structure is a good starting point for the creation of a new data analysis model, appropriate for precision medicine, in order to develop a deeper knowledge network of the disease and to improve the management of patients.
Taylor, Cliff D.; Causey, J. Douglas; Denning, Paul; Hammarstrom, Jane M.; Hayes, Timothy S.; Horton, John D.; Kirschbaum, Michael J.; Parks, Heather L.; Wilson, Anna B.; Wintzer, Niki E.; Zientek, Michael L.
2013-01-01
Chapter 1 of this report summarizes a descriptive model of sediment-hosted stratabound copper deposits. General characteristics and subtypes of sediment-hosted stratabound copper deposits are described based upon worldwide examples. Chapter 2 provides a global database of 170 sediment-hosted copper deposits, along with a statistical evaluation of grade and tonnage data for stratabound deposits, a comparison of stratabound deposits in the CACB with those found elsewhere, a discussion of the distinctive characteristics of the subtypes of sediment-hosted copper deposits that occur within the CACB, and guidelines for using grade and tonnage distributions for assessment of undiscovered resources in sediment-hosted stratabound deposits in the CACB. Chapter 3 presents a new descriptive model of sediment-hosted structurally controlled replacement and vein (SCRV) copper deposits with descriptions of individual deposits of this type in the CACB and elsewhere. Appendix A describes a relational database of tonnage, grade, and other information for more than 100 sediment-hosted copper deposits in the CACB. These data are used to calculate the pre-mining mineral endowment for individual deposits in the CACB and serve as the basis for the grade and tonnage models presented in chapter 2. Appendix B describes three spatial databases (Esri shapefiles) for (1) point locations of more than 500 sediment-hosted copper deposits and prospects, (2) projected surface extent of 86 selected copper ore bodies, and (3) areal extent of 77 open pits, all within the CACB.
CLIGEN: Addressing deficiencies in the generator and its databases
USDA-ARS?s Scientific Manuscript database
CLIGEN is a stochastic generator that estimates daily temperatures, precipitation and other weather related phenomena. It is an intermediate model used by the Water Erosion Prediction Program (WEPP), the Wind Erosion Prediction System (WEPS), and other models that require daily weather observations....
BBN technical memorandum W1291 infrasound model feasibility study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farrell, T., BBN Systems and Technologies
1998-05-01
The purpose of this study is to determine the need and level of effort required to add existing atmospheric databases and infrasound propagation models to the DOE`s Hydroacoustic Coverage Assessment Model (HydroCAM) [1,2]. The rationale for the study is that the performance of the infrasound monitoring network will be an important factor for both the International Monitoring System (IMS) and US national monitoring capability. Many of the technical issues affecting the design and performance of the infrasound network are directly related to the variability of the atmosphere and the corresponding uncertainties in infrasound propagation. It is clear that the studymore » of these issues will be enhanced by the availability of software tools for easy manipulation and interfacing of various atmospheric databases and infrasound propagation models. In addition, since there are many similarities between propagation in the oceans and in the atmosphere, it is anticipated that much of the software infrastructure developed for hydroacoustic database manipulation and propagation modeling in HydroCAM will be directly extendible to an infrasound capability. The study approach was to talk to the acknowledged domain experts in the infrasound monitoring area to determine: 1. The major technical issues affecting infrasound monitoring network performance. 2. The need for an atmospheric database/infrasound propagation modeling capability similar to HydroCAM. 3. The state of existing infrasound propagation codes and atmospheric databases. 4. A recommended approach for developing the required capabilities. A list of the people who contributed information to this study is provided in Table 1. We also relied on our knowledge of oceanographic and meteorological data sources to determine the availability of atmospheric databases and the feasibility of incorporating this information into the existing HydroCAM geographic database software. This report presents a summary of the need for an integrated infrasound modeling capability in Section 2.0. Section 3.0 provides a recommended approach for developing this capability in two stages; a basic capability and an extended capability. This section includes a discussion of the available static and dynamic databases, and the various modeling tools which are available or could be developed under such a task. The conclusions and recommendations of the study are provided in Section 4.0.« less
In silico analysis of fragile histidine triad involved in regression of carcinoma.
Rasheed, Muhammad Asif; Tariq, Fatima; Afzal, Sara; Mannanv, Shazia
2017-04-01
Hepatocellular carcinoma (HCCa) is a primary malignancy of the liver. Many different proteins are involved in HCCa including insulin growth factor (IGF) II , signal transducers and activators of transcription (STAT) 3, STAT4, mothers against decapentaplegic homolog 4 (SMAD 4), fragile histidine triad (FHIT) and selective internal radiation therapy (SIRT) etc. The present study is based on the bioinformatics analysis of FHIT protein in order to understand the proteomics aspect and improvement of the diagnosis of the disease based on the protein. Different information related to protein were gathered from different databases, including National Centre for Biotechnology Information (NCBI) Gene, Protein and Online Mendelian Inheritance in Man (OMIM) databases, Uniprot database, String database and Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Moreover, the structure of the protein and evaluation of the quality of the structure were included from Easy modeler programme. Hence, this analysis not only helped to gather information related to the protein at one place, but also analysed the structure and quality of the protein to conclude that the protein has a role in carcinoma.
Wang, Lei; Alpert, Kathryn I.; Calhoun, Vince D.; Cobia, Derin J.; Keator, David B.; King, Margaret D.; Kogan, Alexandr; Landis, Drew; Tallis, Marcelo; Turner, Matthew D.; Potkin, Steven G.; Turner, Jessica A.; Ambite, Jose Luis
2015-01-01
SchizConnect (www.schizconnect.org) is built to address the issues of multiple data repositories in schizophrenia neuroimaging studies. It includes a level of mediation—translating across data sources—so that the user can place one query, e.g. for diffusion images from male individuals with schizophrenia, and find out from across participating data sources how many datasets there are, as well as downloading the imaging and related data. The current version handles the Data Usage Agreements across different studies, as well as interpreting database-specific terminologies into a common framework. New data repositories can also be mediated to bring immediate access to existing datasets. Compared with centralized, upload data sharing models, SchizConnect is a unique, virtual database with a focus on schizophrenia and related disorders that can mediate live data as information are being updated at each data source. It is our hope that SchizConnect can facilitate testing new hypotheses through aggregated datasets, promoting discovery related to the mechanisms underlying schizophrenic dysfunction. PMID:26142271
Gacesa, Ranko; Zucko, Jurica; Petursdottir, Solveig K; Gudmundsdottir, Elisabet Eik; Fridjonsson, Olafur H; Diminic, Janko; Long, Paul F; Cullum, John; Hranueli, Daslav; Hreggvidsson, Gudmundur O; Starcevic, Antonio
2017-06-01
The MEGGASENSE platform constructs relational databases of DNA or protein sequences. The default functional analysis uses 14 106 hidden Markov model (HMM) profiles based on sequences in the KEGG database. The Solr search engine allows sophisticated queries and a BLAST search function is also incorporated. These standard capabilities were used to generate the SCATT database from the predicted proteome of Streptomyces cattleya . The implementation of a specialised metagenome database (AMYLOMICS) for bioprospecting of carbohydrate-modifying enzymes is described. In addition to standard assembly of reads, a novel 'functional' assembly was developed, in which screening of reads with the HMM profiles occurs before the assembly. The AMYLOMICS database incorporates additional HMM profiles for carbohydrate-modifying enzymes and it is illustrated how the combination of HMM and BLAST analyses helps identify interesting genes. A variety of different proteome and metagenome databases have been generated by MEGGASENSE.
LAND-deFeND - An innovative database structure for landslides and floods and their consequences.
Napolitano, Elisabetta; Marchesini, Ivan; Salvati, Paola; Donnini, Marco; Bianchi, Cinzia; Guzzetti, Fausto
2018-02-01
Information on historical landslides and floods - collectively called "geo-hydrological hazards - is key to understand the complex dynamics of the events, to estimate the temporal and spatial frequency of damaging events, and to quantify their impact. A number of databases on geo-hydrological hazards and their consequences have been developed worldwide at different geographical and temporal scales. Of the few available database structures that can handle information on both landslides and floods some are outdated and others were not designed to store, organize, and manage information on single phenomena or on the type and monetary value of the damages and the remediation actions. Here, we present the LANDslides and Floods National Database (LAND-deFeND), a new database structure able to store, organize, and manage in a single digital structure spatial information collected from various sources with different accuracy. In designing LAND-deFeND, we defined four groups of entities, namely: nature-related, human-related, geospatial-related, and information-source-related entities that collectively can describe fully the geo-hydrological hazards and their consequences. In LAND-deFeND, the main entities are the nature-related entities, encompassing: (i) the "phenomenon", a single landslide or local inundation, (ii) the "event", which represent the ensemble of the inundations and/or landslides occurred in a conventional geographical area in a limited period, and (iii) the "trigger", which is the meteo-climatic or seismic cause (trigger) of the geo-hydrological hazards. LAND-deFeND maintains the relations between the nature-related entities and the human-related entities even where the information is missing partially. The physical model of the LAND-deFeND contains 32 tables, including nine input tables, 21 dictionary tables, and two association tables, and ten views, including specific views that make the database structure compliant with the EC INSPIRE and the Floods Directives. The LAND-deFeND database structure is open, and freely available from http://geomorphology.irpi.cnr.it/tools. Copyright © 2017 The Authors. Published by Elsevier Ltd.. All rights reserved.
High-Performance Data Analytics Beyond the Relational and Graph Data Models with GEMS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Minutoli, Marco; Bhatt, Shreyansh
Graphs represent an increasingly popular data model for data-analytics, since they can naturally represent relationships and interactions between entities. Relational databases and their pure table-based data model are not well suitable to store and process sparse data. Consequently, graph databases have gained interest in the last few years and the Resource Description Framework (RDF) became the standard data model for graph data. Nevertheless, while RDF is well suited to analyze the relationships between the entities, it is not efficient in representing their attributes and properties. In this work we propose the adoption of a new hybrid data model, based onmore » attributed graphs, that aims at overcoming the limitations of the pure relational and graph data models. We present how we have re-designed the GEMS data-analytics framework to fully take advantage of the proposed hybrid data model. To improve analysts productivity, in addition to a C++ API for applications development, we adopt GraQL as input query language. We validate our approach implementing a set of queries on net-flow data and we compare our framework performance against Neo4j. Experimental results show significant performance improvement over Neo4j, up to several orders of magnitude when increasing the size of the input data.« less
CORAL Server and CORAL Server Proxy: Scalable Access to Relational Databases from CORAL Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Valassi, A.; /CERN; Bartoldus, R.
The CORAL software is widely used at CERN by the LHC experiments to access the data they store on relational databases, such as Oracle. Two new components have recently been added to implement a model involving a middle tier 'CORAL server' deployed close to the database and a tree of 'CORAL server proxies', providing data caching and multiplexing, deployed close to the client. A first implementation of the two new components, released in the summer 2009, is now deployed in the ATLAS online system to read the data needed by the High Level Trigger, allowing the configuration of a farmmore » of several thousand processes. This paper reviews the architecture of the software, its development status and its usage in ATLAS.« less
Accessing and distributing EMBL data using CORBA (common object request broker architecture).
Wang, L; Rodriguez-Tomé, P; Redaschi, N; McNeil, P; Robinson, A; Lijnzaad, P
2000-01-01
The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences and related information traditionally made available in flat-file format. Queries through tools such as SRS (Sequence Retrieval System) also return data in flat-file format. Flat files have a number of shortcomings, however, and the resources therefore currently lack a flexible environment to meet individual researchers' needs. The Object Management Group's common object request broker architecture (CORBA) is an industry standard that provides platform-independent programming interfaces and models for portable distributed object-oriented computing applications. Its independence from programming languages, computing platforms and network protocols makes it attractive for developing new applications for querying and distributing biological data. A CORBA infrastructure developed by EMBL-EBI provides an efficient means of accessing and distributing EMBL data. The EMBL object model is defined such that it provides a basis for specifying interfaces in interface definition language (IDL) and thus for developing the CORBA servers. The mapping from the object model to the relational schema in the underlying Oracle database uses the facilities provided by PersistenceTM, an object/relational tool. The techniques of developing loaders and 'live object caching' with persistent objects achieve a smart live object cache where objects are created on demand. The objects are managed by an evictor pattern mechanism. The CORBA interfaces to the EMBL database address some of the problems of traditional flat-file formats and provide an efficient means for accessing and distributing EMBL data. CORBA also provides a flexible environment for users to develop their applications by building clients to our CORBA servers, which can be integrated into existing systems.
Accessing and distributing EMBL data using CORBA (common object request broker architecture)
Wang, Lichun; Rodriguez-Tomé, Patricia; Redaschi, Nicole; McNeil, Phil; Robinson, Alan; Lijnzaad, Philip
2000-01-01
Background: The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences and related information traditionally made available in flat-file format. Queries through tools such as SRS (Sequence Retrieval System) also return data in flat-file format. Flat files have a number of shortcomings, however, and the resources therefore currently lack a flexible environment to meet individual researchers' needs. The Object Management Group's common object request broker architecture (CORBA) is an industry standard that provides platform-independent programming interfaces and models for portable distributed object-oriented computing applications. Its independence from programming languages, computing platforms and network protocols makes it attractive for developing new applications for querying and distributing biological data. Results: A CORBA infrastructure developed by EMBL-EBI provides an efficient means of accessing and distributing EMBL data. The EMBL object model is defined such that it provides a basis for specifying interfaces in interface definition language (IDL) and thus for developing the CORBA servers. The mapping from the object model to the relational schema in the underlying Oracle database uses the facilities provided by PersistenceTM, an object/relational tool. The techniques of developing loaders and 'live object caching' with persistent objects achieve a smart live object cache where objects are created on demand. The objects are managed by an evictor pattern mechanism. Conclusions: The CORBA interfaces to the EMBL database address some of the problems of traditional flat-file formats and provide an efficient means for accessing and distributing EMBL data. CORBA also provides a flexible environment for users to develop their applications by building clients to our CORBA servers, which can be integrated into existing systems. PMID:11178259
Aßmann, C
2016-06-01
Besides large efforts regarding field work, provision of valid databases requires statistical and informational infrastructure to enable long-term access to longitudinal data sets on height, weight and related issues. To foster use of longitudinal data sets within the scientific community, provision of valid databases has to address data-protection regulations. It is, therefore, of major importance to hinder identifiability of individuals from publicly available databases. To reach this goal, one possible strategy is to provide a synthetic database to the public allowing for pretesting strategies for data analysis. The synthetic databases can be established using multiple imputation tools. Given the approval of the strategy, verification is based on the original data. Multiple imputation by chained equations is illustrated to facilitate provision of synthetic databases as it allows for capturing a wide range of statistical interdependencies. Also missing values, typically occurring within longitudinal databases for reasons of item non-response, can be addressed via multiple imputation when providing databases. The provision of synthetic databases using multiple imputation techniques is one possible strategy to ensure data protection, increase visibility of longitudinal databases and enhance the analytical potential.
Modeling climate change impacts on the forest sector
John R. Mills; Ralph Alig; Richard W. Haynes; Darius M. Adams
2000-01-01
The forest sector has had a relatively long history of applying sectorial models to estimate the effects of atmospheric issues such as acid rain, climate change, and the forestry impacts of reduced atmospheric ozone. The models of the forest sector vary in scope and complexity but share a number of common features and databases.
Carcinogenicity and Mutagenicity Data: New Initiatives to ...
Currents models for prediction of chemical carcinogenicity and mutagenicity rely upon a relatively small number of publicly available data resources, where the data being modeled are highly summarized and aggregated representations of the actual experimental results. A number of new initiatives are underway to improve access to existing public carcinogenicity and mutagenicity data for use in modeling, as well as to encourage new approaches to the use of data in modeling. Rodent bioassay results from the NIEHS National Toxicology Program (NTP) and the Berkeley Carcinogenic Potency Database (CPDB) have provided the largest public data resources for building carcinogenicity prediction models to date. However, relatively few and limited representations of these data have actually informed existing models. Initiatives, such as EPA's DSSTox Database Network, offer elaborated and quality reviewed presentations of the CPDB and expanded data linkages and coverage of chemical space for carcinogenicity and mutagenicity. In particular the latest published DSSTox CPDBAS structure-data file includes a number of species-specific and summary activity fields, including a species-specific normalized score for carcinogenic potency (TD50) and various weighted summary activities. These data are being incorporated into PubChem to provide broad
Survey data and metadata modelling using document-oriented NoSQL
NASA Astrophysics Data System (ADS)
Rahmatuti Maghfiroh, Lutfi; Gusti Bagus Baskara Nugraha, I.
2018-03-01
Survey data that are collected from year to year have metadata change. However it need to be stored integratedly to get statistical data faster and easier. Data warehouse (DW) can be used to solve this limitation. However there is a change of variables in every period that can not be accommodated by DW. Traditional DW can not handle variable change via Slowly Changing Dimension (SCD). Previous research handle the change of variables in DW to manage metadata by using multiversion DW (MVDW). MVDW is designed using relational model. Some researches also found that developing nonrelational model in NoSQL database has reading time faster than the relational model. Therefore, we propose changes to metadata management by using NoSQL. This study proposes a model DW to manage change and algorithms to retrieve data with metadata changes. Evaluation of the proposed models and algorithms result in that database with the proposed design can retrieve data with metadata changes properly. This paper has contribution in comprehensive data analysis with metadata changes (especially data survey) in integrated storage.
Molecular Oxygen in the Thermosphere: Issues and Measurement Strategies
NASA Astrophysics Data System (ADS)
Picone, J. M.; Hedin, A. E.; Drob, D. P.; Meier, R. R.; Bishop, J.; Budzien, S. A.
2002-05-01
We review the state of empirical knowledge regarding the distribution of molecular oxygen in the lower thermosphere (100-200 km), as embodied by the new NRLMSISE-00 empirical atmospheric model, its predecessors, and the underlying databases. For altitudes above 120 km, the two major classes of data (mass spectrometer and solar ultraviolet [UV] absorption) disagree significantly regarding the magnitude of the O2 density and the dependence on solar activity. As a result, the addition of the Solar Maximum Mission (SMM) data set (based on solar UV absorption) to the NRLMSIS database has directly impacted the new model, increasing the complexity of the model's formulation and generally reducing the thermospheric O2 density relative to MSISE-90. Beyond interest in the thermosphere itself, this issue materially affects detailed models of ionospheric chemistry and dynamics as well as modeling of the upper atmospheric airglow. Because these are key elements of both experimental and operational systems which measure and forecast the near-Earth space environment, we present strategies for augmenting the database through analysis of existing data and through future measurements in order to resolve this issue.
Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Lozano-Rubí, Raimundo; Serrano-Balazote, Pablo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario
2017-08-18
The objective of this research is to compare the relational and non-relational (NoSQL) database systems approaches in order to store, recover, query and persist standardized medical information in the form of ISO/EN 13606 normalized Electronic Health Record XML extracts, both in isolation and concurrently. NoSQL database systems have recently attracted much attention, but few studies in the literature address their direct comparison with relational databases when applied to build the persistence layer of a standardized medical information system. One relational and two NoSQL databases (one document-based and one native XML database) of three different sizes have been created in order to evaluate and compare the response times (algorithmic complexity) of six different complexity growing queries, which have been performed on them. Similar appropriate results available in the literature have also been considered. Relational and non-relational NoSQL database systems show almost linear algorithmic complexity query execution. However, they show very different linear slopes, the former being much steeper than the two latter. Document-based NoSQL databases perform better in concurrency than in isolation, and also better than relational databases in concurrency. Non-relational NoSQL databases seem to be more appropriate than standard relational SQL databases when database size is extremely high (secondary use, research applications). Document-based NoSQL databases perform in general better than native XML NoSQL databases. EHR extracts visualization and edition are also document-based tasks more appropriate to NoSQL database systems. However, the appropriate database solution much depends on each particular situation and specific problem.
1993-07-09
Calculate Oil and solve iteratively equation (18) for q and (l)-(S) forex . 4, Solve the velocity problemn through equation (19) to calculate q and (6)-(10) to...object.oriented models for the database to store the system information f1l. Using OOP on the formalism level is more difficult and a current field of...Multidimensional Physical Systems: Graph-theoretic Modeling, Systems and Cybernetics, vol 21 (1992), 5 .9-71 JV A RELATIONAL DATABASE FOR GENERAL
PseudoBase: a database with RNA pseudoknots.
van Batenburg, F H; Gultyaev, A P; Pleij, C W; Ng, J; Oliehoek, J
2000-01-01
PseudoBase is a database containing structural, functional and sequence data related to RNA pseudo-knots. It can be reached at http://wwwbio. Leiden Univ.nl/ approximately Batenburg/PKB.html. This page will direct the user to a retrieval page from where a particular pseudoknot can be chosen, or to a submission page which enables the user to add pseudoknot information to the database or to an informative page that elaborates on the various aspects of the database. For each pseudoknot, 12 items are stored, e.g. the nucleotides of the region that contains the pseudoknot, the stem positions of the pseudoknot, the EMBL accession number of the sequence that contains this pseudoknot and the support that can be given regarding the reliability of the pseudoknot. Access is via a small number of steps, using 16 different categories. The development process was done by applying the evolutionary methodology for software development rather than by applying the methodology of the classical waterfall model or the more modern spiral model.
An alternative database approach for management of SNOMED CT and improved patient data queries.
Campbell, W Scott; Pedersen, Jay; McClay, James C; Rao, Praveen; Bastola, Dhundy; Campbell, James R
2015-10-01
SNOMED CT is the international lingua franca of terminologies for human health. Based in Description Logics (DL), the terminology enables data queries that incorporate inferences between data elements, as well as, those relationships that are explicitly stated. However, the ontologic and polyhierarchical nature of the SNOMED CT concept model make it difficult to implement in its entirety within electronic health record systems that largely employ object oriented or relational database architectures. The result is a reduction of data richness, limitations of query capability and increased systems overhead. The hypothesis of this research was that a graph database (graph DB) architecture using SNOMED CT as the basis for the data model and subsequently modeling patient data upon the semantic core of SNOMED CT could exploit the full value of the terminology to enrich and support advanced data querying capability of patient data sets. The hypothesis was tested by instantiating a graph DB with the fully classified SNOMED CT concept model. The graph DB instance was tested for integrity by calculating the transitive closure table for the SNOMED CT hierarchy and comparing the results with transitive closure tables created using current, validated methods. The graph DB was then populated with 461,171 anonymized patient record fragments and over 2.1 million associated SNOMED CT clinical findings. Queries, including concept negation and disjunction, were then run against the graph database and an enterprise Oracle relational database (RDBMS) of the same patient data sets. The graph DB was then populated with laboratory data encoded using LOINC, as well as, medication data encoded with RxNorm and complex queries performed using LOINC, RxNorm and SNOMED CT to identify uniquely described patient populations. A graph database instance was successfully created for two international releases of SNOMED CT and two US SNOMED CT editions. Transitive closure tables and descriptive statistics generated using the graph database were identical to those using validated methods. Patient queries produced identical patient count results to the Oracle RDBMS with comparable times. Database queries involving defining attributes of SNOMED CT concepts were possible with the graph DB. The same queries could not be directly performed with the Oracle RDBMS representation of the patient data and required the creation and use of external terminology services. Further, queries of undefined depth were successful in identifying unknown relationships between patient cohorts. The results of this study supported the hypothesis that a patient database built upon and around the semantic model of SNOMED CT was possible. The model supported queries that leveraged all aspects of the SNOMED CT logical model to produce clinically relevant query results. Logical disjunction and negation queries were possible using the data model, as well as, queries that extended beyond the structural IS_A hierarchy of SNOMED CT to include queries that employed defining attribute-values of SNOMED CT concepts as search parameters. As medical terminologies, such as SNOMED CT, continue to expand, they will become more complex and model consistency will be more difficult to assure. Simultaneously, consumers of data will increasingly demand improvements to query functionality to accommodate additional granularity of clinical concepts without sacrificing speed. This new line of research provides an alternative approach to instantiating and querying patient data represented using advanced computable clinical terminologies. Copyright © 2015 Elsevier Inc. All rights reserved.
Design of Integrated Database on Mobile Information System: A Study of Yogyakarta Smart City App
NASA Astrophysics Data System (ADS)
Nurnawati, E. K.; Ermawati, E.
2018-02-01
An integration database is a database which acts as the data store for multiple applications and thus integrates data across these applications (in contrast to an Application Database). An integration database needs a schema that takes all its client applications into account. The benefit of the schema that sharing data among applications does not require an extra layer of integration services on the applications. Any changes to data made in a single application are made available to all applications at the time of database commit - thus keeping the applications’ data use better synchronized. This study aims to design and build an integrated database that can be used by various applications in a mobile device based system platforms with the based on smart city system. The built-in database can be used by various applications, whether used together or separately. The design and development of the database are emphasized on the flexibility, security, and completeness of attributes that can be used together by various applications to be built. The method used in this study is to choice of the appropriate database logical structure (patterns of data) and to build the relational-database models (Design Databases). Test the resulting design with some prototype apps and analyze system performance with test data. The integrated database can be utilized both of the admin and the user in an integral and comprehensive platform. This system can help admin, manager, and operator in managing the application easily and efficiently. This Android-based app is built based on a dynamic clientserver where data is extracted from an external database MySQL. So if there is a change of data in the database, then the data on Android applications will also change. This Android app assists users in searching of Yogyakarta (as smart city) related information, especially in culture, government, hotels, and transportation.
PropBase Query Layer: a single portal to UK subsurface physical property databases
NASA Astrophysics Data System (ADS)
Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham
2013-04-01
Until recently, the delivery of geological information for industry and public was achieved by geological mapping. Now pervasively available computers mean that 3D geological models can deliver realistic representations of the geometric location of geological units, represented as shells or volumes. The next phase of this process is to populate these with physical properties data that describe subsurface heterogeneity and its associated uncertainty. Achieving this requires capture and serving of physical, hydrological and other property information from diverse sources to populate these models. The British Geological Survey (BGS) holds large volumes of subsurface property data, derived both from their own research data collection and also other, often commercially derived data sources. This can be voxelated to incorporate this data into the models to demonstrate property variation within the subsurface geometry. All property data held by BGS has for many years been stored in relational databases to ensure their long-term continuity. However these have, by necessity, complex structures; each database contains positional reference data and model information, and also metadata such as sample identification information and attributes that define the source and processing. Whilst this is critical to assessing these analyses, it also hugely complicates the understanding of variability of the property under assessment and requires multiple queries to study related datasets making extracting physical properties from these databases difficult. Therefore the PropBase Query Layer has been created to allow simplified aggregation and extraction of all related data and its presentation of complex data in simple, mostly denormalized, tables which combine information from multiple databases into a single system. The structure from each relational database is denormalized in a generalised structure, so that each dataset can be viewed together in a common format using a simple interface. Data are re-engineered to facilitate easy loading. The query layer structure comprises tables, procedures, functions, triggers, views and materialised views. The structure contains a main table PRB_DATA which contains all of the data with the following attribution: • a unique identifier • the data source • the unique identifier from the parent database for traceability • the 3D location • the property type • the property value • the units • necessary qualifiers • precision information and an audit trail Data sources, property type and units are constrained by dictionaries, a key component of the structure which defines what properties and inheritance hierarchies are to be coded and also guides the process as to what and how these are extracted from the structure. Data types served by the Query Layer include site investigation derived geotechnical data, hydrogeology datasets, regional geochemistry, geophysical logs as well as lithological and borehole metadata. The size and complexity of the data sets with multiple parent structures requires a technically robust approach to keep the layer synchronised. This is achieved through Oracle procedures written in PL/SQL containing the logic required to carry out the data manipulation (inserts, updates, deletes) to keep the layer synchronised with the underlying databases either as regular scheduled jobs (weekly, monthly etc) or invoked on demand. The PropBase Query Layer's implementation has enabled rapid data discovery, visualisation and interpretation of geological data with greater ease, simplifying the parametrisation of 3D model volumes and facilitating the study of intra-unit heterogeneity.
Solving Relational Database Problems with ORDBMS in an Advanced Database Course
ERIC Educational Resources Information Center
Wang, Ming
2011-01-01
This paper introduces how to use the object-relational database management system (ORDBMS) to solve relational database (RDB) problems in an advanced database course. The purpose of the paper is to provide a guideline for database instructors who desire to incorporate the ORDB technology in their traditional database courses. The paper presents…
The Magnetics Information Consortium (MagIC)
NASA Astrophysics Data System (ADS)
Johnson, C.; Constable, C.; Tauxe, L.; Koppers, A.; Banerjee, S.; Jackson, M.; Solheid, P.
2003-12-01
The Magnetics Information Consortium (MagIC) is a multi-user facility to establish and maintain a state-of-the-art relational database and digital archive for rock and paleomagnetic data. The goal of MagIC is to make such data generally available and to provide an information technology infrastructure for these and other research-oriented databases run by the international community. As its name implies, MagIC will not be restricted to paleomagnetic or rock magnetic data only, although MagIC will focus on these kinds of information during its setup phase. MagIC will be hosted under EarthRef.org at http://earthref.org/MAGIC/ where two "integrated" web portals will be developed, one for paleomagnetism (currently functional as a prototype that can be explored via the http://earthref.org/databases/PMAG/ link) and one for rock magnetism. The MagIC database will store all measurements and their derived properties for studies of paleomagnetic directions (inclination, declination) and their intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). Ultimately, this database will allow researchers to study "on the internet" and to download important data sets that display paleo-secular variations in the intensity of the Earth's magnetic field over geological time, or that display magnetic data in typical Zijderveld, hysteresis/FORC and various magnetization/remanence diagrams. The MagIC database is completely integrated in the EarthRef.org relational database structure and thus benefits significantly from already-existing common database components, such as the EarthRef Reference Database (ERR) and Address Book (ERAB). The ERR allows researchers to find complete sets of literature resources as used in GERM (Geochemical Earth Reference Model), REM (Reference Earth Model) and MagIC. The ERAB contains addresses for all contributors to the EarthRef.org databases, and also for those who participated in data collection, archiving and analysis in the magnetic studies. Integration with these existing components will guarantee direct traceability to the original sources of the MagIC data and metadata. The MagIC database design focuses around the general workflow that results in the determination of typical paleomagnetic and rock magnetic analyses. This ensures that individual data points can be traced between the actual measurements and their associated specimen, sample, site, rock formation and locality. This permits a distinction between original and derived data, where the actual measurements are performed at the specimen level, and data at the sample level and higher are then derived products in the database. These relations will also allow recalculation of derived properties, such as site means, when new data becomes available for a specific locality. Data contribution to the MagIC database is critical in achieving a useful research tool. We have developed a standard data and metadata template that can be used to provide all data at the same time as publication. Software tools are provided to facilitate easy population of these templates. The tools allow for the import/export of data files in a delimited text format, and they provide some advanced functionality to validate data and to check internal coherence of the data in the template. During and after publication these standardized MagIC templates will be stored in the ERR database of EarthRef.org from where they can be downloaded at all times. Finally, the contents of these template files will be automatically parsed into the online relational database.
Fragment virtual screening based on Bayesian categorization for discovering novel VEGFR-2 scaffolds.
Zhang, Yanmin; Jiao, Yu; Xiong, Xiao; Liu, Haichun; Ran, Ting; Xu, Jinxing; Lu, Shuai; Xu, Anyang; Pan, Jing; Qiao, Xin; Shi, Zhihao; Lu, Tao; Chen, Yadong
2015-11-01
The discovery of novel scaffolds against a specific target has long been one of the most significant but challengeable goals in discovering lead compounds. A scaffold that binds in important regions of the active pocket is more favorable as a starting point because scaffolds generally possess greater optimization possibilities. However, due to the lack of sufficient chemical space diversity of the databases and the ineffectiveness of the screening methods, it still remains a great challenge to discover novel active scaffolds. Since the strengths and weaknesses of both fragment-based drug design and traditional virtual screening (VS), we proposed a fragment VS concept based on Bayesian categorization for the discovery of novel scaffolds. This work investigated the proposal through an application on VEGFR-2 target. Firstly, scaffold and structural diversity of chemical space for 10 compound databases were explicitly evaluated. Simultaneously, a robust Bayesian classification model was constructed for screening not only compound databases but also their corresponding fragment databases. Although analysis of the scaffold diversity demonstrated a very unevenly distribution of scaffolds over molecules, results showed that our Bayesian model behaved better in screening fragments than molecules. Through a literature retrospective research, several generated fragments with relatively high Bayesian scores indeed exhibit VEGFR-2 biological activity, which strongly proved the effectiveness of fragment VS based on Bayesian categorization models. This investigation of Bayesian-based fragment VS can further emphasize the necessity for enrichment of compound databases employed in lead discovery by amplifying the diversity of databases with novel structures.
The Watershed Health Assessment Tools-Investigating Fisheries (WHAT-IF) is a decision-analysis modeling toolkit for personal computers that supports watershed and fisheries management. The WHAT-IF toolkit includes a relational database, help-system functions and documentation, a...
CAEDS--Computer-Aided Engineering and Architectural Design System.
1982-08-01
elements " Annotation " Points " Lines " Polygons " Polyhedron " Group of elements Modification of above (changes or deletions) Line-weighting, cross...Research Laboratory, Champaign, IL, CERL-TR-E-153, June 1979. (4) "ARCH:MODEL, Version 1-2, Geometric Modeling Relational Database Sys- tem
Database recovery using redundant disk arrays
NASA Technical Reports Server (NTRS)
Mourad, Antoine N.; Fuchs, W. K.; Saab, Daniel G.
1992-01-01
Redundant disk arrays provide a way for achieving rapid recovery from media failures with a relatively low storage cost for large scale database systems requiring high availability. In this paper a method is proposed for using redundant disk arrays to support rapid-recovery from system crashes and transaction aborts in addition to their role in providing media failure recovery. A twin page scheme is used to store the parity information in the array so that the time for transaction commit processing is not degraded. Using an analytical model, it is shown that the proposed method achieves a significant increase in the throughput of database systems using redundant disk arrays by reducing the number of recovery operations needed to maintain the consistency of the database.
Recovery issues in databases using redundant disk arrays
NASA Technical Reports Server (NTRS)
Mourad, Antoine N.; Fuchs, W. K.; Saab, Daniel G.
1993-01-01
Redundant disk arrays provide a way for achieving rapid recovery from media failures with a relatively low storage cost for large scale database systems requiring high availability. In this paper we propose a method for using redundant disk arrays to support rapid recovery from system crashes and transaction aborts in addition to their role in providing media failure recovery. A twin page scheme is used to store the parity information in the array so that the time for transaction commit processing is not degraded. Using an analytical model, we show that the proposed method achieves a significant increase in the throughput of database systems using redundant disk arrays by reducing the number of recovery operations needed to maintain the consistency of the database.
Identifying work-related motor vehicle crashes in multiple databases.
Thomas, Andrea M; Thygerson, Steven M; Merrill, Ray M; Cook, Lawrence J
2012-01-01
To compare and estimate the magnitude of work-related motor vehicle crashes in Utah using 2 probabilistically linked statewide databases. Data from 2006 and 2007 motor vehicle crash and hospital databases were joined through probabilistic linkage. Summary statistics and capture-recapture were used to describe occupants injured in work-related motor vehicle crashes and estimate the size of this population. There were 1597 occupants in the motor vehicle crash database and 1673 patients in the hospital database identified as being in a work-related motor vehicle crash. We identified 1443 occupants with at least one record from either the motor vehicle crash or hospital database indicating work-relatedness that linked to any record in the opposing database. We found that 38.7 percent of occupants injured in work-related motor vehicle crashes identified in the motor vehicle crash database did not have a primary payer code of workers' compensation in the hospital database and 40.0 percent of patients injured in work-related motor vehicle crashes identified in the hospital database did not meet our definition of a work-related motor vehicle crash in the motor vehicle crash database. Depending on how occupants injured in work-related motor crashes are identified, we estimate the population to be between 1852 and 8492 in Utah for the years 2006 and 2007. Research on single databases may lead to biased interpretations of work-related motor vehicle crashes. Combining 2 population based databases may still result in an underestimate of the magnitude of work-related motor vehicle crashes. Improved coding of work-related incidents is needed in current databases.
SAADA: Astronomical Databases Made Easier
NASA Astrophysics Data System (ADS)
Michel, L.; Nguyen, H. N.; Motch, C.
2005-12-01
Many astronomers wish to share datasets with their community but have not enough manpower to develop databases having the functionalities required for high-level scientific applications. The SAADA project aims at automatizing the creation and deployment process of such databases. A generic but scientifically relevant data model has been designed which allows one to build databases by providing only a limited number of product mapping rules. Databases created by SAADA rely on a relational database supporting JDBC and covered by a Java layer including a lot of generated code. Such databases can simultaneously host spectra, images, source lists and plots. Data are grouped in user defined collections whose content can be seen as one unique set per data type even if their formats differ. Datasets can be correlated one with each other using qualified links. These links help, for example, to handle the nature of a cross-identification (e.g., a distance or a likelihood) or to describe their scientific content (e.g., by associating a spectrum to a catalog entry). The SAADA query engine is based on a language well suited to the data model which can handle constraints on linked data, in addition to classical astronomical queries. These constraints can be applied on the linked objects (number, class and attributes) and/or on the link qualifier values. Databases created by SAADA are accessed through a rich WEB interface or a Java API. We are currently developing an inter-operability module implanting VO protocols.
Database resources of the National Center for Biotechnology Information.
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian
2011-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
Performance assessment of EMR systems based on post-relational database.
Yu, Hai-Yan; Li, Jing-Song; Zhang, Xiao-Guang; Tian, Yu; Suzuki, Muneou; Araki, Kenji
2012-08-01
Post-relational databases provide high performance and are currently widely used in American hospitals. As few hospital information systems (HIS) in either China or Japan are based on post-relational databases, here we introduce a new-generation electronic medical records (EMR) system called Hygeia, which was developed with the post-relational database Caché and the latest platform Ensemble. Utilizing the benefits of a post-relational database, Hygeia is equipped with an "integration" feature that allows all the system users to access data-with a fast response time-anywhere and at anytime. Performance tests of databases in EMR systems were implemented in both China and Japan. First, a comparison test was conducted between a post-relational database, Caché, and a relational database, Oracle, embedded in the EMR systems of a medium-sized first-class hospital in China. Second, a user terminal test was done on the EMR system Izanami, which is based on the identical database Caché and operates efficiently at the Miyazaki University Hospital in Japan. The results proved that the post-relational database Caché works faster than the relational database Oracle and showed perfect performance in the real-time EMR system.
Data model and relational database design for the New Jersey Water-Transfer Data System (NJWaTr)
Tessler, Steven
2003-01-01
The New Jersey Water-Transfer Data System (NJWaTr) is a database design for the storage and retrieval of water-use data. NJWaTr can manage data encompassing many facets of water use, including (1) the tracking of various types of water-use activities (withdrawals, returns, transfers, distributions, consumptive-use, wastewater collection, and treatment); (2) the storage of descriptions, classifications and locations of places and organizations involved in water-use activities; (3) the storage of details about measured or estimated volumes of water associated with water-use activities; and (4) the storage of information about data sources and water resources associated with water use. In NJWaTr, each water transfer occurs unidirectionally between two site objects, and the sites and conveyances form a water network. The core entities in the NJWaTr model are site, conveyance, transfer/volume, location, and owner. Other important entities include water resource (used for withdrawals and returns), data source, permit, and alias. Multiple water-exchange estimates based on different methods or data sources can be stored for individual transfers. Storage of user-defined details is accommodated for several of the main entities. Many tables contain classification terms to facilitate the detailed description of data items and can be used for routine or custom data summarization. NJWaTr accommodates single-user and aggregate-user water-use data, can be used for large or small water-network projects, and is available as a stand-alone Microsoft? Access database. Data stored in the NJWaTr structure can be retrieved in user-defined combinations to serve visualization and analytical applications. Users can customize and extend the database, link it to other databases, or implement the design in other relational database applications.
NASA Astrophysics Data System (ADS)
Sherwood, Owen A.; Schwietzke, Stefan; Arling, Victoria A.; Etiope, Giuseppe
2017-08-01
The concentration of atmospheric methane (CH4) has more than doubled over the industrial era. To help constrain global and regional CH4 budgets, inverse (top-down) models incorporate data on the concentration and stable carbon (δ13C) and hydrogen (δ2H) isotopic ratios of atmospheric CH4. These models depend on accurate δ13C and δ2H end-member source signatures for each of the main emissions categories. Compared with meticulous measurement and calibration of isotopic CH4 in the atmosphere, there has been relatively less effort to characterize globally representative isotopic source signatures, particularly for fossil fuel sources. Most global CH4 budget models have so far relied on outdated source signature values derived from globally nonrepresentative data. To correct this deficiency, we present a comprehensive, globally representative end-member database of the δ13C and δ2H of CH4 from fossil fuel (conventional natural gas, shale gas, and coal), modern microbial (wetlands, rice paddies, ruminants, termites, and landfills and/or waste) and biomass burning sources. Gas molecular compositional data for fossil fuel categories are also included with the database. The database comprises 10 706 samples (8734 fossil fuel, 1972 non-fossil) from 190 published references. Mean (unweighted) δ13C signatures for fossil fuel CH4 are significantly lighter than values commonly used in CH4 budget models, thus highlighting potential underestimation of fossil fuel CH4 emissions in previous CH4 budget models. This living database will be updated every 2-3 years to provide the atmospheric modeling community with the most complete CH4 source signature data possible. Database digital object identifier (DOI): https://doi.org/10.15138/G3201T.
Designing a data portal for synthesis modeling
NASA Astrophysics Data System (ADS)
Holmes, M. A.
2006-12-01
Processing of field and model data in multi-disciplinary integrated science studies is a vital part of synthesis modeling. Collection and storage techniques for field data vary greatly between the participating scientific disciplines due to the nature of the data being collected, whether it be in situ, remotely sensed, or recorded by automated data logging equipment. Spreadsheets, personal databases, text files and binary files are used in the initial storage and processing of the raw data. In order to be useful to scientists, engineers and modelers the data need to be stored in a format that is easily identifiable, accessible and transparent to a variety of computing environments. The Model Operations and Synthesis (MOAS) database and associated web portal were created to provide such capabilities. The industry standard relational database is comprised of spatial and temporal data tables, shape files and supporting metadata accessible over the network, through a menu driven web-based portal or spatially accessible through ArcSDE connections from the user's local GIS desktop software. A separate server provides public access to spatial data and model output in the form of attributed shape files through an ArcIMS web-based graphical user interface.
Evaluation of low wind modeling approaches for two tall-stack databases.
Paine, Robert; Samani, Olga; Kaplan, Mary; Knipping, Eladio; Kumar, Naresh
2015-11-01
The performance of the AERMOD air dispersion model under low wind speed conditions, especially for applications with only one level of meteorological data and no direct turbulence measurements or vertical temperature gradient observations, is the focus of this study. The analysis documented in this paper addresses evaluations for low wind conditions involving tall stack releases for which multiple years of concurrent emissions, meteorological data, and monitoring data are available. AERMOD was tested on two field-study databases involving several SO2 monitors and hourly emissions data that had sub-hourly meteorological data (e.g., 10-min averages) available using several technical options: default mode, with various low wind speed beta options, and using the available sub-hourly meteorological data. These field study databases included (1) Mercer County, a North Dakota database featuring five SO2 monitors within 10 km of the Dakota Gasification Company's plant and the Antelope Valley Station power plant in an area of both flat and elevated terrain, and (2) a flat-terrain setting database with four SO2 monitors within 6 km of the Gibson Generating Station in southwest Indiana. Both sites featured regionally representative 10-m meteorological databases, with no significant terrain obstacles between the meteorological site and the emission sources. The low wind beta options show improvement in model performance helping to reduce some of the over-prediction biases currently present in AERMOD when run with regulatory default options. The overall findings with the low wind speed testing on these tall stack field-study databases indicate that AERMOD low wind speed options have a minor effect for flat terrain locations, but can have a significant effect for elevated terrain locations. The performance of AERMOD using low wind speed options leads to improved consistency of meteorological conditions associated with the highest observed and predicted concentration events. The available sub-hourly modeling results using the Sub-Hourly AERMOD Run Procedure (SHARP) are relatively unbiased and show that this alternative approach should be seriously considered to address situations dominated by low-wind meander conditions. AERMOD was evaluated with two tall stack databases (in North Dakota and Indiana) in areas of both flat and elevated terrain. AERMOD cases included the regulatory default mode, low wind speed beta options, and use of the Sub-Hourly AERMOD Run Procedure (SHARP). The low wind beta options show improvement in model performance (especially in higher terrain areas), helping to reduce some of the over-prediction biases currently present in regulatory default AERMOD. The SHARP results are relatively unbiased and show that this approach should be seriously considered to address situations dominated by low-wind meander conditions.
Organization of Heterogeneous Scientific Data Using the EAV/CR Representation
Nadkarni, Prakash M.; Marenco, Luis; Chen, Roland; Skoufos, Emmanouil; Shepherd, Gordon; Miller, Perry
1999-01-01
Entity-attribute-value (EAV) representation is a means of organizing highly heterogeneous data using a relatively simple physical database schema. EAV representation is widely used in the medical domain, most notably in the storage of data related to clinical patient records. Its potential strengths suggest its use in other biomedical areas, in particular research databases whose schemas are complex as well as constantly changing to reflect evolving knowledge in rapidly advancing scientific domains. When deployed for such purposes, the basic EAV representation needs to be augmented significantly to handle the modeling of complex objects (classes) as well as to manage interobject relationships. The authors refer to their modification of the basic EAV paradigm as EAV/CR (EAV with classes and relationships). They describe EAV/CR representation with examples from two biomedical databases that use it. PMID:10579606
Sukhinin, Dmitrii I.; Engel, Andreas K.; Manger, Paul; Hilgetag, Claus C.
2016-01-01
Databases of structural connections of the mammalian brain, such as CoCoMac (cocomac.g-node.org) or BAMS (https://bams1.org), are valuable resources for the analysis of brain connectivity and the modeling of brain dynamics in species such as the non-human primate or the rodent, and have also contributed to the computational modeling of the human brain. Another animal model that is widely used in electrophysiological or developmental studies is the ferret; however, no systematic compilation of brain connectivity is currently available for this species. Thus, we have started developing a database of anatomical connections and architectonic features of the ferret brain, the Ferret(connect)ome, www.Ferretome.org. The Ferretome database has adapted essential features of the CoCoMac methodology and legacy, such as the CoCoMac data model. This data model was simplified and extended in order to accommodate new data modalities that were not represented previously, such as the cytoarchitecture of brain areas. The Ferretome uses a semantic parcellation of brain regions as well as a logical brain map transformation algorithm (objective relational transformation, ORT). The ORT algorithm was also adopted for the transformation of architecture data. The database is being developed in MySQL and has been populated with literature reports on tract-tracing observations in the ferret brain using a custom-designed web interface that allows efficient and validated simultaneous input and proofreading by multiple curators. The database is equipped with a non-specialist web interface. This interface can be extended to produce connectivity matrices in several formats, including a graphical representation superimposed on established ferret brain maps. An important feature of the Ferretome database is the possibility to trace back entries in connectivity matrices to the original studies archived in the system. Currently, the Ferretome contains 50 reports on connections comprising 20 injection reports with more than 150 labeled source and target areas, the majority reflecting connectivity of subcortical nuclei and 15 descriptions of regional brain architecture. We hope that the Ferretome database will become a useful resource for neuroinformatics and neural modeling, and will support studies of the ferret brain as well as facilitate advances in comparative studies of mesoscopic brain connectivity. PMID:27242503
Sukhinin, Dmitrii I; Engel, Andreas K; Manger, Paul; Hilgetag, Claus C
2016-01-01
Databases of structural connections of the mammalian brain, such as CoCoMac (cocomac.g-node.org) or BAMS (https://bams1.org), are valuable resources for the analysis of brain connectivity and the modeling of brain dynamics in species such as the non-human primate or the rodent, and have also contributed to the computational modeling of the human brain. Another animal model that is widely used in electrophysiological or developmental studies is the ferret; however, no systematic compilation of brain connectivity is currently available for this species. Thus, we have started developing a database of anatomical connections and architectonic features of the ferret brain, the Ferret(connect)ome, www.Ferretome.org. The Ferretome database has adapted essential features of the CoCoMac methodology and legacy, such as the CoCoMac data model. This data model was simplified and extended in order to accommodate new data modalities that were not represented previously, such as the cytoarchitecture of brain areas. The Ferretome uses a semantic parcellation of brain regions as well as a logical brain map transformation algorithm (objective relational transformation, ORT). The ORT algorithm was also adopted for the transformation of architecture data. The database is being developed in MySQL and has been populated with literature reports on tract-tracing observations in the ferret brain using a custom-designed web interface that allows efficient and validated simultaneous input and proofreading by multiple curators. The database is equipped with a non-specialist web interface. This interface can be extended to produce connectivity matrices in several formats, including a graphical representation superimposed on established ferret brain maps. An important feature of the Ferretome database is the possibility to trace back entries in connectivity matrices to the original studies archived in the system. Currently, the Ferretome contains 50 reports on connections comprising 20 injection reports with more than 150 labeled source and target areas, the majority reflecting connectivity of subcortical nuclei and 15 descriptions of regional brain architecture. We hope that the Ferretome database will become a useful resource for neuroinformatics and neural modeling, and will support studies of the ferret brain as well as facilitate advances in comparative studies of mesoscopic brain connectivity.
NRLMSISE-00 Empirical Model of the Atmosphere: Statistical Comparisons and Scientific Issues
NASA Technical Reports Server (NTRS)
Aikin, A. C.; Picone, J. M.; Hedin, A. E.; Drob, D. P.
2001-01-01
The new NRLMSISE-00 model and the associated NRLMSIS database now include the following data: (1) total mass density from satellite accelerometers and from orbit determination, including the Jacchia and Barlier data; (2) temperature from incoherent scatter radar, and; (3) molecular oxygen number density, [O2], from solar ultraviolet occultation aboard the Solar Maximum Mission (SMM). A new component, 'anomalous oxygen,' allows for appreciable O(+) and hot atomic oxygen contributions to the total mass density at high altitudes and applies primarily to drag estimation above 500 km. Extensive tables compare our entire database to the NRLMSISE-00, MSISE-90, and Jacchia-70 models for different altitude bands and levels of geomagnetic activity. We also investigate scientific issues related to the new data sets in the NRLMSIS database. Especially noteworthy is the solar activity dependence of the Jacchia data, with which we investigate a large O(+) contribution to the total mass density under the combination of summer, low solar activity, high latitudes, and high altitudes. Under these conditions, except at very low solar activity, the Jacchia data and the Jacchia-70 model indeed show a significantly higher total mass density than does MSISE-90. However, under the corresponding winter conditions, the MSIS-class models represent a noticeable improvement relative to Jacchia-70 over a wide range of F(sub 10.7). Considering the two regimes together, NRLMSISE-00 achieves an improvement over both MSISE-90 and Jacchia-70 by incorporating advantages of each.
Detecting Spatial Patterns of Natural Hazards from the Wikipedia Knowledge Base
NASA Astrophysics Data System (ADS)
Fan, J.; Stewart, K.
2015-07-01
The Wikipedia database is a data source of immense richness and variety. Included in this database are thousands of geotagged articles, including, for example, almost real-time updates on current and historic natural hazards. This includes usercontributed information about the location of natural hazards, the extent of the disasters, and many details relating to response, impact, and recovery. In this research, a computational framework is proposed to detect spatial patterns of natural hazards from the Wikipedia database by combining topic modeling methods with spatial analysis techniques. The computation is performed on the Neon Cluster, a high performance-computing cluster at the University of Iowa. This work uses wildfires as the exemplar hazard, but this framework is easily generalizable to other types of hazards, such as hurricanes or flooding. Latent Dirichlet Allocation (LDA) modeling is first employed to train the entire English Wikipedia dump, transforming the database dump into a 500-dimension topic model. Over 230,000 geo-tagged articles are then extracted from the Wikipedia database, spatially covering the contiguous United States. The geo-tagged articles are converted into an LDA topic space based on the topic model, with each article being represented as a weighted multidimension topic vector. By treating each article's topic vector as an observed point in geographic space, a probability surface is calculated for each of the topics. In this work, Wikipedia articles about wildfires are extracted from the Wikipedia database, forming a wildfire corpus and creating a basis for the topic vector analysis. The spatial distribution of wildfire outbreaks in the US is estimated by calculating the weighted sum of the topic probability surfaces using a map algebra approach, and mapped using GIS. To provide an evaluation of the approach, the estimation is compared to wildfire hazard potential maps created by the USDA Forest service.
ERAIZDA: a model for holistic annotation of animal infectious and zoonotic diseases
Buza, Teresia M.; Jack, Sherman W.; Kirunda, Halid; Khaitsa, Margaret L.; Lawrence, Mark L.; Pruett, Stephen; Peterson, Daniel G.
2015-01-01
There is an urgent need for a unified resource that integrates trans-disciplinary annotations of emerging and reemerging animal infectious and zoonotic diseases. Such data integration will provide wonderful opportunity for epidemiologists, researchers and health policy makers to make data-driven decisions designed to improve animal health. Integrating emerging and reemerging animal infectious and zoonotic disease data from a large variety of sources into a unified open-access resource provides more plausible arguments to achieve better understanding of infectious and zoonotic diseases. We have developed a model for interlinking annotations of these diseases. These diseases are of particular interest because of the threats they pose to animal health, human health and global health security. We demonstrated the application of this model using brucellosis, an infectious and zoonotic disease. Preliminary annotations were deposited into VetBioBase database (http://vetbiobase.igbb.msstate.edu). This database is associated with user-friendly tools to facilitate searching, retrieving and downloading of disease-related information. Database URL: http://vetbiobase.igbb.msstate.edu PMID:26581408
Sujansky, Walter V; Faus, Sam A; Stone, Ethan; Brennan, Patricia Flatley
2010-10-01
Online personal health records (PHRs) enable patients to access, manage, and share certain of their own health information electronically. This capability creates the need for precise access-controls mechanisms that restrict the sharing of data to that intended by the patient. The authors describe the design and implementation of an access-control mechanism for PHR repositories that is modeled on the eXtensible Access Control Markup Language (XACML) standard, but intended to reduce the cognitive and computational complexity of XACML. The authors implemented the mechanism entirely in a relational database system using ANSI-standard SQL statements. Based on a set of access-control rules encoded as relational table rows, the mechanism determines via a single SQL query whether a user who accesses patient data from a specific application is authorized to perform a requested operation on a specified data object. Testing of this query on a moderately large database has demonstrated execution times consistently below 100ms. The authors include the details of the implementation, including algorithms, examples, and a test database as Supplementary materials. Copyright © 2010 Elsevier Inc. All rights reserved.
A Relational/Object-Oriented Database Management System: R/OODBMS
1992-09-01
Concepts In 1968, Dr. Edgar F . Codd had the idea that "predicate logic could be applied to maintaining the logical integrity of the data" in a DBMS [CD90, p...Hall, Inc., Englewood Cliffs, NJ, 1990. [Co70] Codd , E. F ., "A Relational Model for Large Shared Data Banks," Communications of the ACM, v. 13, no.6...pp. 377-387 Jun 1970. [CD90] Interview between E. F . Codd and DBMS, "Relational philosopher: the creator of the relational model talks about his
SSER: Species specific essential reactions database.
Labena, Abraham A; Ye, Yuan-Nong; Dong, Chuan; Zhang, Fa-Z; Guo, Feng-Biao
2017-04-19
Essential reactions are vital components of cellular networks. They are the foundations of synthetic biology and are potential candidate targets for antimetabolic drug design. Especially if a single reaction is catalyzed by multiple enzymes, then inhibiting the reaction would be a better option than targeting the enzymes or the corresponding enzyme-encoding gene. The existing databases such as BRENDA, BiGG, KEGG, Bio-models, Biosilico, and many others offer useful and comprehensive information on biochemical reactions. But none of these databases especially focus on essential reactions. Therefore, building a centralized repository for this class of reactions would be of great value. Here, we present a species-specific essential reactions database (SSER). The current version comprises essential biochemical and transport reactions of twenty-six organisms which are identified via flux balance analysis (FBA) combined with manual curation on experimentally validated metabolic network models. Quantitative data on the number of essential reactions, number of the essential reactions associated with their respective enzyme-encoding genes and shared essential reactions across organisms are the main contents of the database. SSER would be a prime source to obtain essential reactions data and related gene and metabolite information and it can significantly facilitate the metabolic network models reconstruction and analysis, and drug target discovery studies. Users can browse, search, compare and download the essential reactions of organisms of their interest through the website http://cefg.uestc.edu.cn/sser .
Java Web Simulation (JWS); a web based database of kinetic models.
Snoep, J L; Olivier, B G
2002-01-01
Software to make a database of kinetic models accessible via the internet has been developed and a core database has been set up at http://jjj.biochem.sun.ac.za/. This repository of models, available to everyone with internet access, opens a whole new way in which we can make our models public. Via the database, a user can change enzyme parameters and run time simulations or steady state analyses. The interface is user friendly and no additional software is necessary. The database currently contains 10 models, but since the generation of the program code to include new models has largely been automated the addition of new models is straightforward and people are invited to submit their models to be included in the database.
GIS and RDBMS Used with Offline FAA Airspace Databases
NASA Technical Reports Server (NTRS)
Clark, J.; Simmons, J.; Scofield, E.; Talbott, B.
1994-01-01
A geographic information system (GIS) and relational database management system (RDBMS) were used in a Macintosh environment to access, manipulate, and display off-line FAA databases of airport and navigational aid locations, airways, and airspace boundaries. This proof-of-concept effort used data available from the Adaptation Controlled Environment System (ACES) and Digital Aeronautical Chart Supplement (DACS) databases to allow FAA cartographers and others to create computer-assisted charts and overlays as reference material for air traffic controllers. These products were created on an engineering model of the future GRASP (GRaphics Adaptation Support Position) workstation that will be used to make graphics and text products for the Advanced Automation System (AAS), which will upgrade and replace the current air traffic control system. Techniques developed during the prototyping effort have shown the viability of using databases to create graphical products without the need for an intervening data entry step.
The development of digital library system for drug research information.
Kim, H J; Kim, S R; Yoo, D S; Lee, S H; Suh, O K; Cho, J H; Shin, H T; Yoon, J P
1998-01-01
The sophistication of computer technology and information transmission on internet has made various cyber information repository available to information consumers. In the era of information super-highway, the digital library which can be accessed from remote sites at any time is considered the prototype of information repository. Using object-oriented DBMS, the very first model of digital library for pharmaceutical researchers and related professionals in Korea has been developed. The published research papers and researchers' personal information was included in the database. For database with research papers, 13 domestic journals were abstracted and scanned for full-text image files which can be viewed by Internet web browsers. The database with researchers' personal information was also developed and interlinked to the database with research papers. These database will be continuously updated and will be combined with world-wide information as the unique digital library in the field of pharmacy.
Syed, Mustafa H; Karpinets, Tatiana V; Leuze, Michael R; Kora, Guruprasad H; Romine, Margaret R; Uberbacher, Edward C
2009-01-01
Shewanella oneidensis MR-1 is an important model organism for environmental research as it has an exceptional metabolic and respiratory versatility regulated by a complex regulatory network. We have developed a database to collect experimental and computational data relating to regulation of gene and protein expression, and, a visualization environment that enables integration of these data types. The regulatory information in the database includes predictions of DNA regulator binding sites, sigma factor binding sites, transcription units, operons, promoters, and RNA regulators including non-coding RNAs, riboswitches, and different types of terminators. Availability http://shewanella-knowledgebase.org:8080/Shewanella/gbrowserLanding.jsp PMID:20198195
Distributed databases for materials study of thermo-kinetic properties
NASA Astrophysics Data System (ADS)
Toher, Cormac
2015-03-01
High-throughput computational materials science provides researchers with the opportunity to rapidly generate large databases of materials properties. To rapidly add thermal properties to the AFLOWLIB consortium and Materials Project repositories, we have implemented an automated quasi-harmonic Debye model, the Automatic GIBBS Library (AGL). This enables us to screen thousands of materials for thermal conductivity, bulk modulus, thermal expansion and related properties. The search and sort functions of the online database can then be used to identify suitable materials for more in-depth study using more precise computational or experimental techniques. AFLOW-AGL source code is public domain and will soon be released within the GNU-GPL license.
Short Fiction on Film: A Relational DataBase.
ERIC Educational Resources Information Center
May, Charles
Short Fiction on Film is a database that was created and will run on DataRelator, a relational database manager created by Bill Finzer for the California State Department of Education in 1986. DataRelator was designed for use in teaching students database management skills and to provide teachers with examples of how a database manager might be…
A Database as a Service for the Healthcare System to Store Physiological Signal Data.
Chang, Hsien-Tsung; Lin, Tsai-Huei
2016-01-01
Wearable devices that measure physiological signals to help develop self-health management habits have become increasingly popular in recent years. These records are conducive for follow-up health and medical care. In this study, based on the characteristics of the observed physiological signal records- 1) a large number of users, 2) a large amount of data, 3) low information variability, 4) data privacy authorization, and 5) data access by designated users-we wish to resolve physiological signal record-relevant issues utilizing the advantages of the Database as a Service (DaaS) model. Storing a large amount of data using file patterns can reduce database load, allowing users to access data efficiently; the privacy control settings allow users to store data securely. The results of the experiment show that the proposed system has better database access performance than a traditional relational database, with a small difference in database volume, thus proving that the proposed system can improve data storage performance.
A Database as a Service for the Healthcare System to Store Physiological Signal Data
Lin, Tsai-Huei
2016-01-01
Wearable devices that measure physiological signals to help develop self-health management habits have become increasingly popular in recent years. These records are conducive for follow-up health and medical care. In this study, based on the characteristics of the observed physiological signal records– 1) a large number of users, 2) a large amount of data, 3) low information variability, 4) data privacy authorization, and 5) data access by designated users—we wish to resolve physiological signal record-relevant issues utilizing the advantages of the Database as a Service (DaaS) model. Storing a large amount of data using file patterns can reduce database load, allowing users to access data efficiently; the privacy control settings allow users to store data securely. The results of the experiment show that the proposed system has better database access performance than a traditional relational database, with a small difference in database volume, thus proving that the proposed system can improve data storage performance. PMID:28033415
Wang, Lei; Alpert, Kathryn I; Calhoun, Vince D; Cobia, Derin J; Keator, David B; King, Margaret D; Kogan, Alexandr; Landis, Drew; Tallis, Marcelo; Turner, Matthew D; Potkin, Steven G; Turner, Jessica A; Ambite, Jose Luis
2016-01-01
SchizConnect (www.schizconnect.org) is built to address the issues of multiple data repositories in schizophrenia neuroimaging studies. It includes a level of mediation--translating across data sources--so that the user can place one query, e.g. for diffusion images from male individuals with schizophrenia, and find out from across participating data sources how many datasets there are, as well as downloading the imaging and related data. The current version handles the Data Usage Agreements across different studies, as well as interpreting database-specific terminologies into a common framework. New data repositories can also be mediated to bring immediate access to existing datasets. Compared with centralized, upload data sharing models, SchizConnect is a unique, virtual database with a focus on schizophrenia and related disorders that can mediate live data as information is being updated at each data source. It is our hope that SchizConnect can facilitate testing new hypotheses through aggregated datasets, promoting discovery related to the mechanisms underlying schizophrenic dysfunction. Copyright © 2015 Elsevier Inc. All rights reserved.
Cybermaterials: materials by design and accelerated insertion of materials
NASA Astrophysics Data System (ADS)
Xiong, Wei; Olson, Gregory B.
2016-02-01
Cybermaterials innovation entails an integration of Materials by Design and accelerated insertion of materials (AIM), which transfers studio ideation into industrial manufacturing. By assembling a hierarchical architecture of integrated computational materials design (ICMD) based on materials genomic fundamental databases, the ICMD mechanistic design models accelerate innovation. We here review progress in the development of linkage models of the process-structure-property-performance paradigm, as well as related design accelerating tools. Extending the materials development capability based on phase-level structural control requires more fundamental investment at the level of the Materials Genome, with focus on improving applicable parametric design models and constructing high-quality databases. Future opportunities in materials genomic research serving both Materials by Design and AIM are addressed.
Clique-based data mining for related genes in a biomedical database.
Matsunaga, Tsutomu; Yonemori, Chikara; Tomita, Etsuji; Muramatsu, Masaaki
2009-07-01
Progress in the life sciences cannot be made without integrating biomedical knowledge on numerous genes in order to help formulate hypotheses on the genetic mechanisms behind various biological phenomena, including diseases. There is thus a strong need for a way to automatically and comprehensively search from biomedical databases for related genes, such as genes in the same families and genes encoding components of the same pathways. Here we address the extraction of related genes by searching for densely-connected subgraphs, which are modeled as cliques, in a biomedical relational graph. We constructed a graph whose nodes were gene or disease pages, and edges were the hyperlink connections between those pages in the Online Mendelian Inheritance in Man (OMIM) database. We obtained over 20,000 sets of related genes (called 'gene modules') by enumerating cliques computationally. The modules included genes in the same family, genes for proteins that form a complex, and genes for components of the same signaling pathway. The results of experiments using 'metabolic syndrome'-related gene modules show that the gene modules can be used to get a coherent holistic picture helpful for interpreting relations among genes. We presented a data mining approach extracting related genes by enumerating cliques. The extracted gene sets provide a holistic picture useful for comprehending complex disease mechanisms.
1994-01-01
databases and identifying new data entities, data elements, and relationships . - Standard data naming conventions, schema, and definition processes...management system. The use of such a tool could offer: (1) structured support for representation of objects and their relationships to each other (and...their relationships to related multimedia objects such as an engineering drawing of the tank object or a satellite image that contains the installation
NASA Astrophysics Data System (ADS)
Mangosing, D. C.; Chen, G.; Kusterer, J.; Rinsland, P.; Perez, J.; Sorlie, S.; Parker, L.
2011-12-01
One of the objectives of the NASA Langley Research Center's MEaSURES project, "Creating a Unified Airborne Database for Model Assessment", is the development of airborne Earth System Data Records (ESDR) for the regional and global model assessment and validation activities performed by the tropospheric chemistry and climate modeling communities. The ongoing development of ADAM, a web site designed to access a unified, standardized and relational ESDR database, meets this objective. The ESDR database is derived from publically available data sets, from NASA airborne field studies to airborne and in-situ studies sponsored by NOAA, NSF, and numerous international partners. The ADAM web development activities provide an opportunity to highlight a growing synergy between the Airborne Science Data for Atmospheric Composition (ASD-AC) group at NASA Langley and the NASA Langley's Atmospheric Sciences Data Center (ASDC). These teams will collaborate on the ADAM web application by leveraging the state-of-the-art service and message-oriented data distribution architecture developed and implemented by ASDC and using a web-based tool provided by the ASD-AC group whose user interface accommodates the nuanced perspective of science users in the atmospheric chemistry and composition and climate modeling communities.
History Places: A Case Study for Relational Database and Information Retrieval System Design
ERIC Educational Resources Information Center
Hendry, David G.
2007-01-01
This article presents a project-based case study that was developed for students with diverse backgrounds and varied inclinations for engaging technical topics. The project, called History Places, requires that student teams develop a vision for a kind of digital library, propose a conceptual model, and use the model to derive a logical model and…
A Data Management System for International Space Station Simulation Tools
NASA Technical Reports Server (NTRS)
Betts, Bradley J.; DelMundo, Rommel; Elcott, Sharif; McIntosh, Dawn; Niehaus, Brian; Papasin, Richard; Mah, Robert W.; Clancy, Daniel (Technical Monitor)
2002-01-01
Groups associated with the design, operational, and training aspects of the International Space Station make extensive use of modeling and simulation tools. Users of these tools often need to access and manipulate large quantities of data associated with the station, ranging from design documents to wiring diagrams. Retrieving and manipulating this data directly within the simulation and modeling environment can provide substantial benefit to users. An approach for providing these kinds of data management services, including a database schema and class structure, is presented. Implementation details are also provided as a data management system is integrated into the Intelligent Virtual Station, a modeling and simulation tool developed by the NASA Ames Smart Systems Research Laboratory. One use of the Intelligent Virtual Station is generating station-related training procedures in a virtual environment, The data management component allows users to quickly and easily retrieve information related to objects on the station, enhancing their ability to generate accurate procedures. Users can associate new information with objects and have that information stored in a database.
Using ontology databases for scalable query answering, inconsistency detection, and data integration
Dou, Dejing
2011-01-01
An ontology database is a basic relational database management system that models an ontology plus its instances. To reason over the transitive closure of instances in the subsumption hierarchy, for example, an ontology database can either unfold views at query time or propagate assertions using triggers at load time. In this paper, we use existing benchmarks to evaluate our method—using triggers—and we demonstrate that by forward computing inferences, we not only improve query time, but the improvement appears to cost only more space (not time). However, we go on to show that the true penalties were simply opaque to the benchmark, i.e., the benchmark inadequately captures load-time costs. We have applied our methods to two case studies in biomedicine, using ontologies and data from genetics and neuroscience to illustrate two important applications: first, ontology databases answer ontology-based queries effectively; second, using triggers, ontology databases detect instance-based inconsistencies—something not possible using views. Finally, we demonstrate how to extend our methods to perform data integration across multiple, distributed ontology databases. PMID:22163378
Massive parallelization of serial inference algorithms for a complex generalized linear model
Suchard, Marc A.; Simpson, Shawn E.; Zorych, Ivan; Ryan, Patrick; Madigan, David
2014-01-01
Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety. PMID:25328363
e23D: database and visualization of A-to-I RNA editing sites mapped to 3D protein structures.
Solomon, Oz; Eyal, Eran; Amariglio, Ninette; Unger, Ron; Rechavi, Gidi
2016-07-15
e23D, a database of A-to-I RNA editing sites from human, mouse and fly mapped to evolutionary related protein 3D structures, is presented. Genomic coordinates of A-to-I RNA editing sites are converted to protein coordinates and mapped onto 3D structures from PDB or theoretical models from ModBase. e23D allows visualization of the protein structure, modeling of recoding events and orientation of the editing with respect to nearby genomic functional sites from databases of disease causing mutations and genomic polymorphism. http://www.sheba-cancer.org.il/e23D CONTACT: oz.solomon@live.biu.ac.il or Eran.Eyal@sheba.health.gov.il. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
DOE Office of Scientific and Technical Information (OSTI.GOV)
D. D. Blackwell; K. W. Wisian; M. C. Richards
2000-04-01
Several activities related to geothermal resources in the western United States are described in this report. A database of geothermal site-specific thermal gradient and heat flow results from individual exploration wells in the western US has been assembled. Extensive temperature gradient and heat flow exploration data from the active exploration of the 1970's and 1980's were collected, compiled, and synthesized, emphasizing previously unavailable company data. Examples of the use and applications of the database are described. The database and results are available on the world wide web. In this report numerical models are used to establish basic qualitative relationships betweenmore » structure, heat input, and permeability distribution, and the resulting geothermal system. A series of steady state, two-dimensional numerical models evaluate the effect of permeability and structural variations on an idealized, generic Basin and Range geothermal system and the results are described.« less
Reactome graph database: Efficient access to complex pathway data
Korninger, Florian; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D’Eustachio, Peter
2018-01-01
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types. PMID:29377902
Reactome graph database: Efficient access to complex pathway data.
Fabregat, Antonio; Korninger, Florian; Viteri, Guilherme; Sidiropoulos, Konstantinos; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning
2018-01-01
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
A generic method for improving the spatial interoperability of medical and ecological databases.
Ghenassia, A; Beuscart, J B; Ficheur, G; Occelli, F; Babykina, E; Chazard, E; Genin, M
2017-10-03
The availability of big data in healthcare and the intensive development of data reuse and georeferencing have opened up perspectives for health spatial analysis. However, fine-scale spatial studies of ecological and medical databases are limited by the change of support problem and thus a lack of spatial unit interoperability. The use of spatial disaggregation methods to solve this problem introduces errors into the spatial estimations. Here, we present a generic, two-step method for merging medical and ecological databases that avoids the use of spatial disaggregation methods, while maximizing the spatial resolution. Firstly, a mapping table is created after one or more transition matrices have been defined. The latter link the spatial units of the original databases to the spatial units of the final database. Secondly, the mapping table is validated by (1) comparing the covariates contained in the two original databases, and (2) checking the spatial validity with a spatial continuity criterion and a spatial resolution index. We used our novel method to merge a medical database (the French national diagnosis-related group database, containing 5644 spatial units) with an ecological database (produced by the French National Institute of Statistics and Economic Studies, and containing with 36,594 spatial units). The mapping table yielded 5632 final spatial units. The mapping table's validity was evaluated by comparing the number of births in the medical database and the ecological databases in each final spatial unit. The median [interquartile range] relative difference was 2.3% [0; 5.7]. The spatial continuity criterion was low (2.4%), and the spatial resolution index was greater than for most French administrative areas. Our innovative approach improves interoperability between medical and ecological databases and facilitates fine-scale spatial analyses. We have shown that disaggregation models and large aggregation techniques are not necessarily the best ways to tackle the change of support problem.
van Walraven, Carl; Austin, Peter C; Manuel, Douglas; Knoll, Greg; Jennings, Allison; Forster, Alan J
2010-12-01
Administrative databases commonly use codes to indicate diagnoses. These codes alone are often inadequate to accurately identify patients with particular conditions. In this study, we determined whether we could quantify the probability that a person has a particular disease-in this case renal failure-using other routinely collected information available in an administrative data set. This would allow the accurate identification of a disease cohort in an administrative database. We determined whether patients in a randomly selected 100,000 hospitalizations had kidney disease (defined as two or more sequential serum creatinines or the single admission creatinine indicating a calculated glomerular filtration rate less than 60 mL/min/1.73 m²). The independent association of patient- and hospitalization-level variables with renal failure was measured using a multivariate logistic regression model in a random 50% sample of the patients. The model was validated in the remaining patients. Twenty thousand seven hundred thirteen patients had kidney disease (20.7%). A diagnostic code of kidney disease was strongly associated with kidney disease (relative risk: 34.4), but the accuracy of the code was poor (sensitivity: 37.9%; specificity: 98.9%). Twenty-nine patient- and hospitalization-level variables entered the kidney disease model. This model had excellent discrimination (c-statistic: 90.1%) and accurately predicted the probability of true renal failure. The probability threshold that maximized sensitivity and specificity for the identification of true kidney disease was 21.3% (sensitivity: 80.0%; specificity: 82.2%). Multiple variables available in administrative databases can be combined to quantify the probability that a person has a particular disease. This process permits accurate identification of a disease cohort in an administrative database. These methods may be extended to other diagnoses or procedures and could both facilitate and clarify the use of administrative databases for research and quality improvement. Copyright © 2010 Elsevier Inc. All rights reserved.
A Framework for Cloudy Model Optimization and Database Storage
NASA Astrophysics Data System (ADS)
Calvén, Emilia; Helton, Andrew; Sankrit, Ravi
2018-01-01
We present a framework for producing Cloudy photoionization models of the nebular emission from novae ejecta and storing a subset of the results in SQL database format for later usage. The database can be searched for models best fitting observed spectral line ratios. Additionally, the framework includes an optimization feature that can be used in tandem with the database to search for and improve on models by creating new Cloudy models while, varying the parameters. The database search and optimization can be used to explore the structures of nebulae by deriving their properties from the best-fit models. The goal is to provide the community with a large database of Cloudy photoionization models, generated from parameters reflecting conditions within novae ejecta, that can be easily fitted to observed spectral lines; either by directly accessing the database using the framework code or by usage of a website specifically made for this purpose.
Conceptual and logical level of database modeling
NASA Astrophysics Data System (ADS)
Hunka, Frantisek; Matula, Jiri
2016-06-01
Conceptual and logical levels form the top most levels of database modeling. Usually, ORM (Object Role Modeling) and ER diagrams are utilized to capture the corresponding schema. The final aim of business process modeling is to store its results in the form of database solution. For this reason, value oriented business process modeling which utilizes ER diagram to express the modeling entities and relationships between them are used. However, ER diagrams form the logical level of database schema. To extend possibilities of different business process modeling methodologies, the conceptual level of database modeling is needed. The paper deals with the REA value modeling approach to business process modeling using ER-diagrams, and derives conceptual model utilizing ORM modeling approach. Conceptual model extends possibilities for value modeling to other business modeling approaches.
Chang, Sun Ju; Im, Eun-Ok
2014-01-01
The purpose of the study was to develop a situation-specific theory for explaining health-related quality of life (QOL) among older South Korean adults with type 2 diabetes. To develop a situation-specific theory, three sources were considered: (a) the conceptual model of health promotion and QOL for people with chronic and disabling conditions (an existing theory related to the QOL in patients with chronic diseases); (b) a literature review using multiple databases including Cumulative Index for Nursing and Allied Health Literature (CINAHL), PubMed, PsycINFO, and two Korean databases; and (c) findings from our structural equation modeling study on health-related QOL in older South Korean adults with type 2 diabetes. The proposed situation-specific theory is constructed with six major concepts including barriers, resources, perceptual factors, psychosocial factors, health-promoting behaviors, and health-related QOL. The theory also provides the interrelationships among concepts. Health care providers and nurses could incorporate the proposed situation-specific theory into development of diabetes education programs for improving health-related QOL in older South Korean adults with type 2 diabetes.
Groundwater modeling in integrated water resources management--visions for 2020.
Refsgaard, Jens Christian; Højberg, Anker Lajer; Møller, Ingelise; Hansen, Martin; Søndergaard, Verner
2010-01-01
Groundwater modeling is undergoing a change from traditional stand-alone studies toward being an integrated part of holistic water resources management procedures. This is illustrated by the development in Denmark, where comprehensive national databases for geologic borehole data, groundwater-related geophysical data, geologic models, as well as a national groundwater-surface water model have been established and integrated to support water management. This has enhanced the benefits of using groundwater models. Based on insight gained from this Danish experience, a scientifically realistic scenario for the use of groundwater modeling in 2020 has been developed, in which groundwater models will be a part of sophisticated databases and modeling systems. The databases and numerical models will be seamlessly integrated, and the tasks of monitoring and modeling will be merged. Numerical models for atmospheric, surface water, and groundwater processes will be coupled in one integrated modeling system that can operate at a wide range of spatial scales. Furthermore, the management systems will be constructed with a focus on building credibility of model and data use among all stakeholders and on facilitating a learning process whereby data and models, as well as stakeholders' understanding of the system, are updated to currently available information. The key scientific challenges for achieving this are (1) developing new methodologies for integration of statistical and qualitative uncertainty; (2) mapping geological heterogeneity and developing scaling methodologies; (3) developing coupled model codes; and (4) developing integrated information systems, including quality assurance and uncertainty information that facilitate active stakeholder involvement and learning.
A general temporal data model and the structured population event history register
Clark, Samuel J.
2010-01-01
At this time there are 37 demographic surveillance system sites active in sub-Saharan Africa, Asia and Central America, and this number is growing continuously. These sites and other longitudinal population and health research projects generate large quantities of complex temporal data in order to describe, explain and investigate the event histories of individuals and the populations they constitute. This article presents possible solutions to some of the key data management challenges associated with those data. The fundamental components of a temporal system are identified and both they and their relationships to each other are given simple, standardized definitions. Further, a metadata framework is proposed to endow this abstract generalization with specific meaning and to bind the definitions of the data to the data themselves. The result is a temporal data model that is generalized, conceptually tractable, and inherently contains a full description of the primary data it organizes. Individual databases utilizing this temporal data model can be customized to suit the needs of their operators without modifying the underlying design of the database or sacrificing the potential to transparently share compatible subsets of their data with other similar databases. A practical working relational database design based on this general temporal data model is presented and demonstrated. This work has arisen out of experience with demographic surveillance in the developing world, and although the challenges and their solutions are more general, the discussion is organized around applications in demographic surveillance. An appendix contains detailed examples and working prototype databases that implement the examples discussed in the text. PMID:20396614
A veterinary anatomy tutoring system.
Theodoropoulos, G; Loumos, V; Antonopoulos, J
1994-02-14
A veterinary anatomy tutoring system was developed by using Knowledge Pro, an object-oriented software development tool with hypermedia capabilities, and MS Access, a relational database. Communication between them is facilitated by using the Structured Query Language (SQL). The architecture of the system is based on knowledge sets, each of which covers four different descriptions of an organ, namely gross anatomy (general description), gross anatomy (comparative features), histology, and embryology, which constitute the knowledge units. These knowledge units are linked with three global variables that define the animals, the topographies, and the system to which this organ belongs, creating three data-bases. These three data-bases are interrelated through the organ field in order to establish a relational model. This system allows versatility in the student's navigation through the information space by offering different modes for information location and presentation. These include course mode, review mode, reference mode, dissection mode, and comparison mode. In addition, the system provides a self-evaluation mode.
2017-01-01
Reusing the data from healthcare information systems can effectively facilitate clinical trials (CTs). How to select candidate patients eligible for CT recruitment criteria is a central task. Related work either depends on DBA (database administrator) to convert the recruitment criteria to native SQL queries or involves the data mapping between a standard ontology/information model and individual data source schema. This paper proposes an alternative computer-aided CT recruitment paradigm, based on syntax translation between different DSLs (domain-specific languages). In this paradigm, the CT recruitment criteria are first formally represented as production rules. The referenced rule variables are all from the underlying database schema. Then the production rule is translated to an intermediate query-oriented DSL (e.g., LINQ). Finally, the intermediate DSL is directly mapped to native database queries (e.g., SQL) automated by ORM (object-relational mapping). PMID:29065644
Validity of juvenile idiopathic arthritis diagnoses using administrative health data.
Stringer, Elizabeth; Bernatsky, Sasha
2015-03-01
Administrative health databases are valuable sources of data for conducting research including disease surveillance, outcomes research, and processes of health care at the population level. There has been limited use of administrative data to conduct studies of pediatric rheumatic conditions and no studies validating case definitions in Canada. We report a validation study of incident cases of juvenile idiopathic arthritis in the Canadian province of Nova Scotia. Cases identified through administrative data algorithms were compared to diagnoses in a clinical database. The sensitivity of algorithms that included pediatric rheumatology specialist claims was 81-86%. However, 35-48% of cases that were identified could not be verified in the clinical database depending on the algorithm used. Our case definitions would likely lead to overestimates of disease burden. Our findings may be related to issues pertaining to the non-fee-for-service remuneration model in Nova Scotia, in particular, systematic issues related to the process of submitting claims.
Zhang, Yinsheng; Zhang, Guoming; Shang, Qian
2017-01-01
Reusing the data from healthcare information systems can effectively facilitate clinical trials (CTs). How to select candidate patients eligible for CT recruitment criteria is a central task. Related work either depends on DBA (database administrator) to convert the recruitment criteria to native SQL queries or involves the data mapping between a standard ontology/information model and individual data source schema. This paper proposes an alternative computer-aided CT recruitment paradigm, based on syntax translation between different DSLs (domain-specific languages). In this paradigm, the CT recruitment criteria are first formally represented as production rules. The referenced rule variables are all from the underlying database schema. Then the production rule is translated to an intermediate query-oriented DSL (e.g., LINQ). Finally, the intermediate DSL is directly mapped to native database queries (e.g., SQL) automated by ORM (object-relational mapping).
Action and language mechanisms in the brain: data, models and neuroinformatics.
Arbib, Michael A; Bonaiuto, James J; Bornkessel-Schlesewsky, Ina; Kemmerer, David; MacWhinney, Brian; Nielsen, Finn Årup; Oztop, Erhan
2014-01-01
We assess the challenges of studying action and language mechanisms in the brain, both singly and in relation to each other to provide a novel perspective on neuroinformatics, integrating the development of databases for encoding – separately or together – neurocomputational models and empirical data that serve systems and cognitive neuroscience.
Learning the Semantics of Structured Data Sources
ERIC Educational Resources Information Center
Taheriyan, Mohsen
2015-01-01
Information sources such as relational databases, spreadsheets, XML, JSON, and Web APIs contain a tremendous amount of structured data, however, they rarely provide a semantic model to describe their contents. Semantic models of data sources capture the intended meaning of data sources by mapping them to the concepts and relationships defined by a…
Analysis of DIRAC's behavior using model checking with process algebra
NASA Astrophysics Data System (ADS)
Remenska, Daniela; Templon, Jeff; Willemse, Tim; Bal, Henri; Verstoep, Kees; Fokkink, Wan; Charpentier, Philippe; Graciani Diaz, Ricardo; Lanciotti, Elisa; Roiser, Stefan; Ciba, Krzysztof
2012-12-01
DIRAC is the grid solution developed to support LHCb production activities as well as user data analysis. It consists of distributed services and agents delivering the workload to the grid resources. Services maintain database back-ends to store dynamic state information of entities such as jobs, queues, staging requests, etc. Agents use polling to check and possibly react to changes in the system state. Each agent's logic is relatively simple; the main complexity lies in their cooperation. Agents run concurrently, and collaborate using the databases as shared memory. The databases can be accessed directly by the agents if running locally or through a DIRAC service interface if necessary. This shared-memory model causes entities to occasionally get into inconsistent states. Tracing and fixing such problems becomes formidable due to the inherent parallelism present. We propose more rigorous methods to cope with this. Model checking is one such technique for analysis of an abstract model of a system. Unlike conventional testing, it allows full control over the parallel processes execution, and supports exhaustive state-space exploration. We used the mCRL2 language and toolset to model the behavior of two related DIRAC subsystems: the workload and storage management system. Based on process algebra, mCRL2 allows defining custom data types as well as functions over these. This makes it suitable for modeling the data manipulations made by DIRAC's agents. By visualizing the state space and replaying scenarios with the toolkit's simulator, we have detected race-conditions and deadlocks in these systems, which, in several cases, were confirmed to occur in the reality. Several properties of interest were formulated and verified with the tool. Our future direction is automating the translation from DIRAC to a formal model.
GlobTherm, a global database on thermal tolerances for aquatic and terrestrial organisms.
Bennett, Joanne M; Calosi, Piero; Clusella-Trullas, Susana; Martínez, Brezo; Sunday, Jennifer; Algar, Adam C; Araújo, Miguel B; Hawkins, Bradford A; Keith, Sally; Kühn, Ingolf; Rahbek, Carsten; Rodríguez, Laura; Singer, Alexander; Villalobos, Fabricio; Ángel Olalla-Tárraga, Miguel; Morales-Castilla, Ignacio
2018-03-13
How climate affects species distributions is a longstanding question receiving renewed interest owing to the need to predict the impacts of global warming on biodiversity. Is climate change forcing species to live near their critical thermal limits? Are these limits likely to change through natural selection? These and other important questions can be addressed with models relating geographical distributions of species with climate data, but inferences made with these models are highly contingent on non-climatic factors such as biotic interactions. Improved understanding of climate change effects on species will require extensive analysis of thermal physiological traits, but such data are both scarce and scattered. To overcome current limitations, we created the GlobTherm database. The database contains experimentally derived species' thermal tolerance data currently comprising over 2,000 species of terrestrial, freshwater, intertidal and marine multicellular algae, plants, fungi, and animals. The GlobTherm database will be maintained and curated by iDiv with the aim to keep expanding it, and enable further investigations on the effects of climate on the distribution of life on Earth.
The DrugAge database of aging-related drugs.
Barardo, Diogo; Thornton, Daniel; Thoppil, Harikrishnan; Walsh, Michael; Sharifi, Samim; Ferreira, Susana; Anžič, Andreja; Fernandes, Maria; Monteiro, Patrick; Grum, Tjaša; Cordeiro, Rui; De-Souza, Evandro Araújo; Budovsky, Arie; Araujo, Natali; Gruber, Jan; Petrascheck, Michael; Fraifeld, Vadim E; Zhavoronkov, Alexander; Moskalev, Alexey; de Magalhães, João Pedro
2017-06-01
Aging is a major worldwide medical challenge. Not surprisingly, identifying drugs and compounds that extend lifespan in model organisms is a growing research area. Here, we present DrugAge (http://genomics.senescence.info/drugs/), a curated database of lifespan-extending drugs and compounds. At the time of writing, DrugAge contains 1316 entries featuring 418 different compounds from studies across 27 model organisms, including worms, flies, yeast and mice. Data were manually curated from 324 publications. Using drug-gene interaction data, we also performed a functional enrichment analysis of targets of lifespan-extending drugs. Enriched terms include various functional categories related to glutathione and antioxidant activity, ion transport and metabolic processes. In addition, we found a modest but significant overlap between targets of lifespan-extending drugs and known aging-related genes, suggesting that some but not most aging-related pathways have been targeted pharmacologically in longevity studies. DrugAge is freely available online for the scientific community and will be an important resource for biogerontologists. © 2017 The Authors. Aging Cell published by the Anatomical Society and John Wiley & Sons Ltd.
Variability sensitivity of dynamic texture based recognition in clinical CT data
NASA Astrophysics Data System (ADS)
Kwitt, Roland; Razzaque, Sharif; Lowell, Jeffrey; Aylward, Stephen
2014-03-01
Dynamic texture recognition using a database of template models has recently shown promising results for the task of localizing anatomical structures in Ultrasound video. In order to understand its clinical value, it is imperative to study the sensitivity with respect to inter-patient variability as well as sensitivity to acquisition parameters such as Ultrasound probe angle. Fully addressing patient and acquisition variability issues, however, would require a large database of clinical Ultrasound from many patients, acquired in a multitude of controlled conditions, e.g., using a tracked transducer. Since such data is not readily attainable, we advocate an alternative evaluation strategy using abdominal CT data as a surrogate. In this paper, we describe how to replicate Ultrasound variabilities by extracting subvolumes from CT and interpreting the image material as an ordered sequence of video frames. Utilizing this technique, and based on a database of abdominal CT from 45 patients, we report recognition results on an organ (kidney) recognition task, where we try to discriminate kidney subvolumes/videos from a collection of randomly sampled negative instances. We demonstrate that (1) dynamic texture recognition is relatively insensitive to inter-patient variation while (2) viewing angle variability needs to be accounted for in the template database. Since naively extending the template database to counteract variability issues can lead to impractical database sizes, we propose an alternative strategy based on automated identification of a small set of representative models.
Development of a global land cover characteristics database and IGBP DISCover from 1 km AVHRR data
Loveland, Thomas R.; Reed, B.C.; Brown, Jesslyn F.; Ohlen, D.O.; Zhu, Z.; Yang, L.; Merchant, J.W.
2000-01-01
Researchers from the U.S. Geological Survey, University of Nebraska-Lincoln and the European Commission's Joint Research Centre, Ispra, Italy produced a 1 km resolution global land cover characteristics database for use in a wide range of continental-to global-scale environmental studies. This database provides a unique view of the broad patterns of the biogeographical and ecoclimatic diversity of the global land surface, and presents a detailed interpretation of the extent of human development. The project was carried out as an International Geosphere-Biosphere Programme, Data and Information Systems (IGBP-DIS) initiative. The IGBP DISCover global land cover product is an integral component of the global land cover database. DISCover includes 17 general land cover classes defined to meet the needs of IGBP core science projects. A formal accuracy assessment of the DISCover data layer will be completed in 1998. The 1 km global land cover database was developed through a continent-by-continent unsupervised classification of 1 km monthly Advanced Very High Resolution Radiometer (AVHRR) Normalized Difference Vegetation Index (NDVI) composites covering 1992-1993. Extensive post-classification stratification was necessary to resolve spectral/temporal confusion between disparate land cover types. The complete global database consists of 961 seasonal land cover regions that capture patterns of land cover, seasonality and relative primary productivity. The seasonal land cover regions were aggregated to produce seven separate land cover data sets used for global environmental modelling and assessment. The data sets include IGBP DISCover, U.S. Geological Survey Anderson System, Simple Biosphere Model, Simple Biosphere Model 2, Biosphere-Atmosphere Transfer Scheme, Olson Ecosystems and Running Global Remote Sensing Land Cover. The database also includes all digital sources that were used in the classification. The complete database can be sourced from the website: http://edcwww.cr.usgs.gov/landdaac/glcc/glcc.html.
A Model Based Mars Climate Database for the Mission Design
NASA Technical Reports Server (NTRS)
2005-01-01
A viewgraph presentation on a model based climate database is shown. The topics include: 1) Why a model based climate database?; 2) Mars Climate Database v3.1 Who uses it ? (approx. 60 users!); 3) The new Mars Climate database MCD v4.0; 4) MCD v4.0: what's new ? 5) Simulation of Water ice clouds; 6) Simulation of Water ice cycle; 7) A new tool for surface pressure prediction; 8) Acces to the database MCD 4.0; 9) How to access the database; and 10) New web access
Wang, Xinxin; Lu, Xingmei; Zhou, Qing; Zhao, Yongsheng; Li, Xiaoqian; Zhang, Suojiang
2017-08-02
Refractive index is one of the important physical properties, which is widely used in separation and purification. In this study, the refractive index data of ILs were collected to establish a comprehensive database, which included about 2138 pieces of data from 1996 to 2014. The Group Contribution-Artificial Neural Network (GC-ANN) model and Group Contribution (GC) method were employed to predict the refractive index of ILs at different temperatures from 283.15 K to 368.15 K. Average absolute relative deviations (AARD) of the GC-ANN model and the GC method were 0.179% and 0.628%, respectively. The results showed that a GC-ANN model provided an effective way to estimate the refractive index of ILs, whereas the GC method was simple and extensive. In summary, both of the models were accurate and efficient approaches for estimating refractive indices of ILs.
Relational Database for the Geology of the Northern Rocky Mountains - Idaho, Montana, and Washington
Causey, J. Douglas; Zientek, Michael L.; Bookstrom, Arthur A.; Frost, Thomas P.; Evans, Karl V.; Wilson, Anna B.; Van Gosen, Bradley S.; Boleneus, David E.; Pitts, Rebecca A.
2008-01-01
A relational database was created to prepare and organize geologic map-unit and lithologic descriptions for input into a spatial database for the geology of the northern Rocky Mountains, a compilation of forty-three geologic maps for parts of Idaho, Montana, and Washington in U.S. Geological Survey Open File Report 2005-1235. Not all of the information was transferred to and incorporated in the spatial database due to physical file limitations. This report releases that part of the relational database that was completed for that earlier product. In addition to descriptive geologic information for the northern Rocky Mountains region, the relational database contains a substantial bibliography of geologic literature for the area. The relational database nrgeo.mdb (linked below) is available in Microsoft Access version 2000, a proprietary database program. The relational database contains data tables and other tables used to define terms, relationships between the data tables, and hierarchical relationships in the data; forms used to enter data; and queries used to extract data.
A Conceptual Model of the Information Requirements of Nursing Organizations
Miller, Emmy
1989-01-01
Three related issues play a role in the identification of the information requirements of nursing organizations. These issues are the current state of computer systems in health care organizations, the lack of a well-defined data set for nursing, and the absence of models representing data and information relevant to clinical and administrative nursing practice. This paper will examine current methods of data collection, processing, and storage in clinical and administrative nursing practice for the purpose of identifying the information requirements of nursing organizations. To satisfy these information requirements, database technology can be used; however, a model for database design is needed that reflects the conceptual framework of nursing and the professional concerns of nurses. A conceptual model of the types of data necessary to produce the desired information will be presented and the relationships among data will be delineated.
Databases for multilevel biophysiology research available at Physiome.jp.
Asai, Yoshiyuki; Abe, Takeshi; Li, Li; Oka, Hideki; Nomura, Taishin; Kitano, Hiroaki
2015-01-01
Physiome.jp (http://physiome.jp) is a portal site inaugurated in 2007 to support model-based research in physiome and systems biology. At Physiome.jp, several tools and databases are available to support construction of physiological, multi-hierarchical, large-scale models. There are three databases in Physiome.jp, housing mathematical models, morphological data, and time-series data. In late 2013, the site was fully renovated, and in May 2015, new functions were implemented to provide information infrastructure to support collaborative activities for developing models and performing simulations within the database framework. This article describes updates to the databases implemented since 2013, including cooperation among the three databases, interactive model browsing, user management, version management of models, management of parameter sets, and interoperability with applications.
Storing Data from Qweak--A Precision Measurement of the Proton's Weak Charge
NASA Astrophysics Data System (ADS)
Pote, Timothy
2008-10-01
The Qweak experiment will perform a precision measurement of the proton's parity violating weak charge at low Q-squared. The experiment will do so by measuring the asymmetry in parity-violating electron scattering. The proton's weak charge is directly related to the value of the weak mixing angle--a fundamental quantity in the Standard Model. The Standard Model makes a firm prediction for the value of the weak mixing angle and thus Qweak may provide insight into shortcomings in the SM. The Qweak experiment will run at Thomas Jefferson National Accelerator Facility in Newport News, VA. A database was designed to hold data directly related to the measurement of the proton's weak charge such as detector and beam monitor yield, asymmetry, and error as well as control structures such as the voltage across photomultiplier tubes and the temperature of the liquid hydrogen target. In order to test the database for speed and stability, it was filled with fake data that mimicked the data that Qweak is expected to collect. I will give a brief overview of the Qweak experiment and database design, and present data collected during these tests.
Anthropometry of Brazilian Air Force pilots.
da Silva, Gilvan V; Halpern, Manny; Gordon, Claire C
2017-10-01
Anthropometric data are essential for the design of military equipment including sizing of aircraft cockpits and personal gear. Currently, there are no anthropometric databases specific to Brazilian military personnel. The aim of this study was to create a Brazilian anthropometric database of Air Force pilots. The methods, protocols, descriptions, definitions, landmarks, tools and measurements procedures followed the instructions outlined in Measurer's Handbook: US Army and Marine Corps Anthropometric Surveys, 2010-2011 - NATICK/TR-11/017. The participants were measured countrywide, in all five Brazilian Geographical Regions. Thirty-nine anthropometric measurements related to cockpit design were selected. The results of 2133 males and 206 females aged 16-52 years constitute a set of basic data for cockpit design, space arrangement issues and adjustments, protective gear and equipment design, as well as for digital human modelling. Another important implication is that this study can be considered a starting point for reducing gender bias in women's career as pilots. Practitioner Summary: This paper describes the first large-scale anthropometric survey of the Brazilian Air Force pilots and the development of the related database. This study provides critical data for improving aircraft cockpit design for ergonomics and comprehensive pilot accommodation, protective gear and uniform design, as well as digital human modelling.
S&MPO - An information system for ozone spectroscopy on the WEB
NASA Astrophysics Data System (ADS)
Babikov, Yurii L.; Mikhailenko, Semen N.; Barbe, Alain; Tyuterev, Vladimir G.
2014-09-01
Spectroscopy and Molecular Properties of Ozone ("S&MPO") is an Internet accessible information system devoted to high resolution spectroscopy of the ozone molecule, related properties and data sources. S&MPO contains information on original spectroscopic data (line positions, line intensities, energies, transition moments, spectroscopic parameters) recovered from comprehensive analyses and modeling of experimental spectra as well as associated software for data representation written in PHP Java Script, C++ and FORTRAN. The line-by-line list of vibration-rotation transitions and other information is organized as a relational database under control of MySQL database tools. The main S&MPO goal is to provide access to all available information on vibration-rotation molecular states and transitions under extended conditions based on extrapolations of laboratory measurements using validated theoretical models. Applications for the S&MPO may include: education/training in molecular physics, radiative processes, laser physics; spectroscopic applications (analysis, Fourier transform spectroscopy, atmospheric optics, optical standards, spectroscopic atlases); applications to environment studies and atmospheric physics (remote sensing); data supply for specific databases; and to photochemistry (laser excitation, multiphoton processes). The system is accessible via Internet on two sites: http://smpo.iao.ru and http://smpo.univ-reims.fr.
Infrared target simulation environment for pattern recognition applications
NASA Astrophysics Data System (ADS)
Savakis, Andreas E.; George, Nicholas
1994-07-01
The generation of complete databases of IR data is extremely useful for training human observers and testing automatic pattern recognition algorithms. Field data may be used for realism, but require expensive and time-consuming procedures. IR scene simulation methods have emerged as a more economical and efficient alternative for the generation of IR databases. A novel approach to IR target simulation is presented in this paper. Model vehicles at 1:24 scale are used for the simulation of real targets. The temperature profile of the model vehicles is controlled using resistive circuits which are embedded inside the models. The IR target is recorded using an Inframetrics dual channel IR camera system. Using computer processing we place the recorded IR target in a prerecorded background. The advantages of this approach are: (1) the range and 3D target aspect can be controlled by the relative position between the camera and model vehicle; (2) the temperature profile can be controlled by adjusting the power delivered to the resistive circuit; (3) the IR sensor effects are directly incorporated in the recording process, because the real sensor is used; (4) the recorded target can embedded in various types of backgrounds recorded under different weather conditions, times of day etc. The effectiveness of this approach is demonstrated by generating an IR database of three vehicles which is used to train a back propagation neural network. The neural network is capable of classifying vehicle type, vehicle aspect, and relative temperature with a high degree of accuracy.
The Mouse Tumor Biology Database: A Comprehensive Resource for Mouse Models of Human Cancer.
Krupke, Debra M; Begley, Dale A; Sundberg, John P; Richardson, Joel E; Neuhauser, Steven B; Bult, Carol J
2017-11-01
Research using laboratory mice has led to fundamental insights into the molecular genetic processes that govern cancer initiation, progression, and treatment response. Although thousands of scientific articles have been published about mouse models of human cancer, collating information and data for a specific model is hampered by the fact that many authors do not adhere to existing annotation standards when describing models. The interpretation of experimental results in mouse models can also be confounded when researchers do not factor in the effect of genetic background on tumor biology. The Mouse Tumor Biology (MTB) database is an expertly curated, comprehensive compendium of mouse models of human cancer. Through the enforcement of nomenclature and related annotation standards, MTB supports aggregation of data about a cancer model from diverse sources and assessment of how genetic background of a mouse strain influences the biological properties of a specific tumor type and model utility. Cancer Res; 77(21); e67-70. ©2017 AACR . ©2017 American Association for Cancer Research.
Migration from relational to NoSQL database
NASA Astrophysics Data System (ADS)
Ghotiya, Sunita; Mandal, Juhi; Kandasamy, Saravanakumar
2017-11-01
Data generated by various real time applications, social networking sites and sensor devices is of very huge amount and unstructured, which makes it difficult for Relational database management systems to handle the data. Data is very precious component of any application and needs to be analysed after arranging it in some structure. Relational databases are only able to deal with structured data, so there is need of NoSQL Database management System which can deal with semi -structured data also. Relational database provides the easiest way to manage the data but as the use of NoSQL is increasing it is becoming necessary to migrate the data from Relational to NoSQL databases. Various frameworks has been proposed previously which provides mechanisms for migration of data stored at warehouses in SQL, middle layer solutions which can provide facility of data to be stored in NoSQL databases to handle data which is not structured. This paper provides a literature review of some of the recent approaches proposed by various researchers to migrate data from relational to NoSQL databases. Some researchers proposed mechanisms for the co-existence of NoSQL and Relational databases together. This paper provides a summary of mechanisms which can be used for mapping data stored in Relational databases to NoSQL databases. Various techniques for data transformation and middle layer solutions are summarised in the paper.
The IRMIS object model and services API.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saunders, C.; Dohan, D. A.; Arnold, N. D.
2005-01-01
The relational model developed for the Integrated Relational Model of Installed Systems (IRMIS) toolkit has been successfully used to capture the Advanced Photon Source (APS) control system software (EPICS process variables and their definitions). The relational tables are populated by a crawler script that parses each Input/Output Controller (IOC) start-up file when an IOC reboot is detected. User interaction is provided by a Java Swing application that acts as a desktop for viewing the process variable information. Mapping between the display objects and the relational tables was carried out with the Hibernate Object Relational Modeling (ORM) framework. Work is wellmore » underway at the APS to extend the relational modeling to include control system hardware. For this work, due in part to the complex user interaction required, the primary application development environment has shifted from the relational database view to the object oriented (Java) perspective. With this approach, the business logic is executed in Java rather than in SQL stored procedures. This paper describes the object model used to represent control system software, hardware, and interconnects in IRMIS. We also describe the services API used to encapsulate the required behaviors for creating and maintaining the complex data. In addition to the core schema and object model, many important concepts in IRMIS are captured by the services API. IRMIS is an ambitious collaborative effort for defining and developing a relational database and associated applications to comprehensively document the large and complex EPICS-based control systems of today's accelerators. The documentation effort includes process variables, control system hardware, and interconnections. The approach could also be used to document all components of the accelerator, including mechanical, vacuum, power supplies, etc. One key aspect of IRMIS is that it is a documentation framework, not a design and development tool. We do not generate EPICS control system configurations from IRMIS, and hence do not impose any additional requirements on EPICS developers.« less
Automating Relational Database Design for Microcomputer Users.
ERIC Educational Resources Information Center
Pu, Hao-Che
1991-01-01
Discusses issues involved in automating the relational database design process for microcomputer users and presents a prototype of a microcomputer-based system (RA, Relation Assistant) that is based on expert systems technology and helps avoid database maintenance problems. Relational database design is explained and the importance of easy input…
Structure and software tools of AIDA.
Duisterhout, J S; Franken, B; Witte, F
1987-01-01
AIDA consists of a set of software tools to allow for fast development and easy-to-maintain Medical Information Systems. AIDA supports all aspects of such a system both during development and operation. It contains tools to build and maintain forms for interactive data entry and on-line input validation, a database management system including a data dictionary and a set of run-time routines for database access, and routines for querying the database and output formatting. Unlike an application generator, the user of AIDA may select parts of the tools to fulfill his needs and program other subsystems not developed with AIDA. The AIDA software uses as host language the ANSI-standard programming language MUMPS, an interpreted language embedded in an integrated database and programming environment. This greatly facilitates the portability of AIDA applications. The database facilities supported by AIDA are based on a relational data model. This data model is built on top of the MUMPS database, the so-called global structure. This relational model overcomes the restrictions of the global structure regarding string length. The global structure is especially powerful for sorting purposes. Using MUMPS as a host language allows the user an easy interface between user-defined data validation checks or other user-defined code and the AIDA tools. AIDA has been designed primarily for prototyping and for the construction of Medical Information Systems in a research environment which requires a flexible approach. The prototyping facility of AIDA operates terminal independent and is even to a great extent multi-lingual. Most of these features are table-driven; this allows on-line changes in the use of terminal type and language, but also causes overhead. AIDA has a set of optimizing tools by which it is possible to build a faster, but (of course) less flexible code from these table definitions. By separating the AIDA software in a source and a run-time version, one is able to write implementation-specific code which can be selected and loaded by a special source loader, being part of the AIDA software. This feature is also accessible for maintaining software on different sites and on different installations.
Minnesota's Tech Prep Outcome Evaluation Model.
ERIC Educational Resources Information Center
Brown, James M.; Pucel, David; Twohig, Cathy; Semler, Steve; Kuchinke, K. Peter
1998-01-01
Describes the Minnesota Tech Prep Consortia Evaluation System, which collects outcomes data on enrollment, retention, related job placement, higher education, dropouts, and diplomas/degrees awarded. Explains outcome measures, database development, data collection and analysis methods, and remaining challenges. (SK)
A digital library for medical imaging activities
NASA Astrophysics Data System (ADS)
dos Santos, Marcelo; Furuie, Sérgio S.
2007-03-01
This work presents the development of an electronic infrastructure to make available a free, online, multipurpose and multimodality medical image database. The proposed infrastructure implements a distributed architecture for medical image database, authoring tools, and a repository for multimedia documents. Also it includes a peer-reviewed model that assures quality of dataset. This public repository provides a single point of access for medical images and related information to facilitate retrieval tasks. The proposed approach has been used as an electronic teaching system in Radiology as well.
Integrated Array/Metadata Analytics
NASA Astrophysics Data System (ADS)
Misev, Dimitar; Baumann, Peter
2015-04-01
Data comes in various forms and types, and integration usually presents a problem that is often simply ignored and solved with ad-hoc solutions. Multidimensional arrays are an ubiquitous data type, that we find at the core of virtually all science and engineering domains, as sensor, model, image, statistics data. Naturally, arrays are richly described by and intertwined with additional metadata (alphanumeric relational data, XML, JSON, etc). Database systems, however, a fundamental building block of what we call "Big Data", lack adequate support for modelling and expressing these array data/metadata relationships. Array analytics is hence quite primitive or non-existent at all in modern relational DBMS. Recognizing this, we extended SQL with a new SQL/MDA part seamlessly integrating multidimensional array analytics into the standard database query language. We demonstrate the benefits of SQL/MDA with real-world examples executed in ASQLDB, an open-source mediator system based on HSQLDB and rasdaman, that already implements SQL/MDA.
Sequential data access with Oracle and Hadoop: a performance comparison
NASA Astrophysics Data System (ADS)
Baranowski, Zbigniew; Canali, Luca; Grancher, Eric
2014-06-01
The Hadoop framework has proven to be an effective and popular approach for dealing with "Big Data" and, thanks to its scaling ability and optimised storage access, Hadoop Distributed File System-based projects such as MapReduce or HBase are seen as candidates to replace traditional relational database management systems whenever scalable speed of data processing is a priority. But do these projects deliver in practice? Does migrating to Hadoop's "shared nothing" architecture really improve data access throughput? And, if so, at what cost? Authors answer these questions-addressing cost/performance as well as raw performance- based on a performance comparison between an Oracle-based relational database and Hadoop's distributed solutions like MapReduce or HBase for sequential data access. A key feature of our approach is the use of an unbiased data model as certain data models can significantly favour one of the technologies tested.
Ma, Yazhen; Xu, Ting; Wan, Dongshi; Ma, Tao; Shi, Sheng; Liu, Jianquan; Hu, Quanjun
2015-03-17
Soil salinity is a significant factor that impairs plant growth and agricultural productivity, and numerous efforts are underway to enhance salt tolerance of economically important plants. Populus species are widely cultivated for diverse uses. Especially, they grow in different habitats, from salty soil to mesophytic environment, and are therefore used as a model genus for elucidating physiological and molecular mechanisms of stress tolerance in woody plants. The Salinity Tolerant Poplar Database (STPD) is an integrative database for salt-tolerant poplar genome biology. Currently the STPD contains Populus euphratica genome and its related genetic resources. P. euphratica, with a preference of the salty habitats, has become a valuable genetic resource for the exploitation of tolerance characteristics in trees. This database contains curated data including genomic sequence, genes and gene functional information, non-coding RNA sequences, transposable elements, simple sequence repeats and single nucleotide polymorphisms information of P. euphratica, gene expression data between P. euphratica and Populus tomentosa, and whole-genome alignments between Populus trichocarpa, P. euphratica and Salix suchowensis. The STPD provides useful searching and data mining tools, including GBrowse genome browser, BLAST servers and genome alignments viewer, which can be used to browse genome regions, identify similar sequences and visualize genome alignments. Datasets within the STPD can also be downloaded to perform local searches. A new Salinity Tolerant Poplar Database has been developed to assist studies of salt tolerance in trees and poplar genomics. The database will be continuously updated to incorporate new genome-wide data of related poplar species. This database will serve as an infrastructure for researches on the molecular function of genes, comparative genomics, and evolution in closely related species as well as promote advances in molecular breeding within Populus. The STPD can be accessed at http://me.lzu.edu.cn/stpd/ .
Atlas - a data warehouse for integrative bioinformatics.
Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire M S; Ling, John; Ouellette, B F Francis
2005-02-21
We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/
Atlas – a data warehouse for integrative bioinformatics
Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire MS; Ling, John; Ouellette, BF Francis
2005-01-01
Background We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. Description The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. Conclusion The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: PMID:15723693
Integration and management of massive remote-sensing data based on GeoSOT subdivision model
NASA Astrophysics Data System (ADS)
Li, Shuang; Cheng, Chengqi; Chen, Bo; Meng, Li
2016-07-01
Owing to the rapid development of earth observation technology, the volume of spatial information is growing rapidly; therefore, improving query retrieval speed from large, rich data sources for remote-sensing data management systems is quite urgent. A global subdivision model, geographic coordinate subdivision grid with one-dimension integer coding on 2n-tree, which we propose as a solution, has been used in data management organizations. However, because a spatial object may cover several grids, ample data redundancy will occur when data are stored in relational databases. To solve this redundancy problem, we first combined the subdivision model with the spatial array database containing the inverted index. We proposed an improved approach for integrating and managing massive remote-sensing data. By adding a spatial code column in an array format in a database, spatial information in remote-sensing metadata can be stored and logically subdivided. We implemented our method in a Kingbase Enterprise Server database system and compared the results with the Oracle platform by simulating worldwide image data. Experimental results showed that our approach performed better than Oracle in terms of data integration and time and space efficiency. Our approach also offers an efficient storage management system for existing storage centers and management systems.
Empirical cost models for estimating power and energy consumption in database servers
NASA Astrophysics Data System (ADS)
Valdivia Garcia, Harold Dwight
The explosive growth in the size of data centers, coupled with the widespread use of virtualization technology has brought power and energy consumption as major concerns for data center administrators. Provisioning decisions must take into consideration not only target application performance but also the power demands and total energy consumption incurred by the hardware and software to be deployed at the data center. Failure to do so will result in damaged equipment, power outages, and inefficient operation. Since database servers comprise one of the most popular and important server applications deployed in such facilities, it becomes necessary to have accurate cost models that can predict the power and energy demands that each database workloads will impose in the system. In this work we present an empirical methodology to estimate the power and energy cost of database operations. Our methodology uses multiple-linear regression to derive accurate cost models that depend only on readily available statistics such as selectivity factors, tuple size, numbers columns and relational cardinality. Moreover, our method does not need measurement of individual hardware components, but rather total power and energy consumption measured at a server. We have implemented our methodology, and ran experiments with several server configurations. Our experiments indicate that we can predict power and energy more accurately than alternative methods found in the literature.
The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes
Rigden, Daniel J
2017-01-01
Abstract This year's Database Issue of Nucleic Acids Research contains 152 papers that include descriptions of 54 new databases and update papers on 98 databases, of which 16 have not been previously featured in NAR. As always, these databases cover a broad range of molecular biology subjects, including genome structure, gene expression and its regulation, proteins, protein domains, and protein–protein interactions. Following the recent trend, an increasing number of new and established databases deal with the issues of human health, from cancer-causing mutations to drugs and drug targets. In accordance with this trend, three recently compiled databases that have been selected by NAR reviewers and editors as ‘breakthrough’ contributions, denovo-db, the Monarch Initiative, and Open Targets, cover human de novo gene variants, disease-related phenotypes in model organisms, and a bioinformatics platform for therapeutic target identification and validation, respectively. We expect these databases to attract the attention of numerous researchers working in various areas of genetics and genomics. Looking back at the past 12 years, we present here the ‘golden set’ of databases that have consistently served as authoritative, comprehensive, and convenient data resources widely used by the entire community and offer some lessons on what makes a successful database. The Database Issue is freely available online at the https://academic.oup.com/nar web site. An updated version of the NAR Molecular Biology Database Collection is available at http://www.oxfordjournals.org/nar/database/a/. PMID:28053160
Quantifying Astronaut Tasks: Robotic Technology and Future Space Suit Design
NASA Technical Reports Server (NTRS)
Newman, Dava
2003-01-01
The primary aim of this research effort was to advance the current understanding of astronauts' capabilities and limitations in space-suited EVA by developing models of the constitutive and compatibility relations of a space suit, based on experimental data gained from human test subjects as well as a 12 degree-of-freedom human-sized robot, and utilizing these fundamental relations to estimate a human factors performance metric for space suited EVA work. The three specific objectives are to: 1) Compile a detailed database of torques required to bend the joints of a space suit, using realistic, multi- joint human motions. 2) Develop a mathematical model of the constitutive relations between space suit joint torques and joint angular positions, based on experimental data and compare other investigators' physics-based models to experimental data. 3) Estimate the work envelope of a space suited astronaut, using the constitutive and compatibility relations of the space suit. The body of work that makes up this report includes experimentation, empirical and physics-based modeling, and model applications. A detailed space suit joint torque-angle database was compiled with a novel experimental approach that used space-suited human test subjects to generate realistic, multi-joint motions and an instrumented robot to measure the torques required to accomplish these motions in a space suit. Based on the experimental data, a mathematical model is developed to predict joint torque from the joint angle history. Two physics-based models of pressurized fabric cylinder bending are compared to experimental data, yielding design insights. The mathematical model is applied to EVA operations in an inverse kinematic analysis coupled to the space suit model to calculate the volume in which space-suited astronauts can work with their hands, demonstrating that operational human factors metrics can be predicted from fundamental space suit information.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wolery, Thomas J.; Tayne, Andrew; Jove-Colon, Carlos F.
Thermodynamic data are essential for understanding and evaluating geochemical processes, as by speciation-solubility calculations, reaction -path modeling, or reactive transport simulation. These data are required to evaluate both equilibrium states and the kinetic approach to such states (via the affinity term in rate laws). The development of thermodynamic databases for these purposes has a long history in geochemistry (e.g., Garrels and Christ, 1965; Helgeson et al., 1969; Helgeson et al., 1978, Johnson et al., 1992; Robie and Hemingway, 1995), paralleled by related and applicable work in the larger scientific community (e.g., Wagman et al., 1982, 1989; Cox et al., 1989;more » Barin and Platzki, 1995; Binneweis and Milke, 1999). The Yucca Mountain Project developed two qualified thermodynamic databases for to model geochemical processes, including ones involving repository components such as spent fuel. The first of the two (BSC, 2007a) was for systems containing dilute aqueous solutions only, the other (BSC, 2007b) for systems involving concentrated aqueous solutions and incorporating a model for such based on Pitzer’s (1991) equations . A 25°C-only database with similarities to the latter was also developed for WIPP (cf. Xiong, 2005). The YMP dilute systems database is widely used in the geochemistry community for a variety of applications involving rock/water interactions. The purpose of the present task is to improve these databases for work on the Used Fuel Disposition Project and maintain some semblance of order that will support qualification in support of the development of future underground high level nuclear waste disposal.« less
Multi-Sensor Scene Synthesis and Analysis
1981-09-01
Quad Trees for Image Representation and Processing ...... ... 126 2.6.2 Databases ..... ..... ... ..... ... ..... ..... 138 2.6.2.1 Definitions and...Basic Concepts ....... 138 2.6.3 Use of Databases in Hierarchical Scene Analysis ...... ... ..................... 147 2.6.4 Use of Relational Tables...Multisensor Image Database Systems (MIDAS) . 161 2.7.2 Relational Database System for Pictures .... ..... 168 2.7.3 Relational Pictorial Database
A linear geospatial streamflow modeling system for data sparse environments
Asante, Kwabena O.; Arlan, Guleid A.; Pervez, Md Shahriar; Rowland, James
2008-01-01
In many river basins around the world, inaccessibility of flow data is a major obstacle to water resource studies and operational monitoring. This paper describes a geospatial streamflow modeling system which is parameterized with global terrain, soils and land cover data and run operationally with satellite‐derived precipitation and evapotranspiration datasets. Simple linear methods transfer water through the subsurface, overland and river flow phases, and the resulting flows are expressed in terms of standard deviations from mean annual flow. In sample applications, the modeling system was used to simulate flow variations in the Congo, Niger, Nile, Zambezi, Orange and Lake Chad basins between 1998 and 2005, and the resulting flows were compared with mean monthly values from the open‐access Global River Discharge Database. While the uncalibrated model cannot predict the absolute magnitude of flow, it can quantify flow anomalies in terms of relative departures from mean flow. Most of the severe flood events identified in the flow anomalies were independently verified by the Dartmouth Flood Observatory (DFO) and the Emergency Disaster Database (EM‐DAT). Despite its limitations, the modeling system is valuable for rapid characterization of the relative magnitude of flood hazards and seasonal flow changes in data sparse settings.
Enhanced DIII-D Data Management Through a Relational Database
NASA Astrophysics Data System (ADS)
Burruss, J. R.; Peng, Q.; Schachter, J.; Schissel, D. P.; Terpstra, T. B.
2000-10-01
A relational database is being used to serve data about DIII-D experiments. The database is optimized for queries across multiple shots, allowing for rapid data mining by SQL-literate researchers. The relational database relates different experiments and datasets, thus providing a big picture of DIII-D operations. Users are encouraged to add their own tables to the database. Summary physics quantities about DIII-D discharges are collected and stored in the database automatically. Meta-data about code runs, MDSplus usage, and visualization tool usage are collected, stored in the database, and later analyzed to improve computing. Documentation on the database may be accessed through programming languages such as C, Java, and IDL, or through ODBC compliant applications such as Excel and Access. A database-driven web page also provides a convenient means for viewing database quantities through the World Wide Web. Demonstrations will be given at the poster.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McPherson, Brian J.; Pan, Feng
2014-09-24
This report summarizes development of a coupled-process reservoir model for simulating enhanced geothermal systems (EGS) that utilize supercritical carbon dioxide as a working fluid. Specifically, the project team developed an advanced chemical kinetic model for evaluating important processes in EGS reservoirs, such as mineral precipitation and dissolution at elevated temperature and pressure, and for evaluating potential impacts on EGS surface facilities by related chemical processes. We assembled a new database for better-calibrated simulation of water/brine/ rock/CO2 interactions in EGS reservoirs. This database utilizes existing kinetic and other chemical data, and we updated those data to reflect corrections for elevated temperaturemore » and pressure conditions of EGS reservoirs.« less
Fiacco, P. A.; Rice, W. H.
1991-01-01
Computerized medical record systems require structured database architectures for information processing. However, the data must be able to be transferred across heterogeneous platform and software systems. Client-Server architecture allows for distributive processing of information among networked computers and provides the flexibility needed to link diverse systems together effectively. We have incorporated this client-server model with a graphical user interface into an outpatient medical record system, known as SuperChart, for the Department of Family Medicine at SUNY Health Science Center at Syracuse. SuperChart was developed using SuperCard and Oracle SuperCard uses modern object-oriented programming to support a hypermedia environment. Oracle is a powerful relational database management system that incorporates a client-server architecture. This provides both a distributed database and distributed processing which improves performance. PMID:1807732
FReD: the floral reflectance database--a web portal for analyses of flower colour.
Arnold, Sarah E J; Faruq, Samia; Savolainen, Vincent; McOwan, Peter W; Chittka, Lars
2010-12-10
Flower colour is of great importance in various fields relating to floral biology and pollinator behaviour. However, subjective human judgements of flower colour may be inaccurate and are irrelevant to the ecology and vision of the flower's pollinators. For precise, detailed information about the colours of flowers, a full reflectance spectrum for the flower of interest should be used rather than relying on such human assessments. The Floral Reflectance Database (FReD) has been developed to make an extensive collection of such data available to researchers. It is freely available at http://www.reflectance.co.uk. The database allows users to download spectral reflectance data for flower species collected from all over the world. These could, for example, be used in modelling interactions between pollinator vision and plant signals, or analyses of flower colours in various habitats. The database contains functions for calculating flower colour loci according to widely-used models of bee colour space, reflectance graphs of the spectra and an option to search for flowers with similar colours in bee colour space. The Floral Reflectance Database is a valuable new tool for researchers interested in the colours of flowers and their association with pollinator colour vision, containing raw spectral reflectance data for a large number of flower species.
NASA Astrophysics Data System (ADS)
Sprintall, J.; Cowley, R.; Palmer, M. D.; Domingues, C. M.; Suzuki, T.; Ishii, M.; Boyer, T.; Goni, G. J.; Gouretski, V. V.; Macdonald, A. M.; Thresher, A.; Good, S. A.; Diggs, S. C.
2016-02-01
Historical ocean temperature profile observations provide a critical element for a host of ocean and climate research activities. These include providing initial conditions for seasonal-to-decadal prediction systems, evaluating past variations in sea level and Earth's energy imbalance, ocean state estimation for studying variability and change, and climate model evaluation and development. The International Quality controlled Ocean Database (IQuOD) initiative represents a community effort to create the most globally complete temperature profile dataset, with (intelligent) metadata and assigned uncertainties. With an internationally coordinated effort organized by oceanographers, with data and ocean instrumentation expertise, and in close consultation with end users (e.g., climate modelers), the IQuOD initiative will assess and maximize the potential of an irreplaceable collection of ocean temperature observations (tens of millions of profiles collected at a cost of tens of billions of dollars, since 1772) to fulfil the demand for a climate-quality global database that can be used with greater confidence in a vast range of climate change related research and services of societal benefit. Progress towards version 1 of the IQuOD database, ongoing and future work will be presented. More information on IQuOD is available at www.iquod.org.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
Keseler, Ingrid M; Mackie, Amanda; Santos-Zavaleta, Alberto; Billington, Richard; Bonavides-Martínez, César; Caspi, Ron; Fulcher, Carol; Gama-Castro, Socorro; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Muñiz-Rascado, Luis; Ong, Quang; Paley, Suzanne; Peralta-Gil, Martin; Subhraveti, Pallavi; Velázquez-Ramírez, David A; Weaver, Daniel; Collado-Vides, Julio; Paulsen, Ian; Karp, Peter D
2017-01-04
EcoCyc (EcoCyc.org) is a freely accessible, comprehensive database that collects and summarizes experimental data for Escherichia coli K-12, the best-studied bacterial model organism. New experimental discoveries about gene products, their function and regulation, new metabolic pathways, enzymes and cofactors are regularly added to EcoCyc. New SmartTable tools allow users to browse collections of related EcoCyc content. SmartTables can also serve as repositories for user- or curator-generated lists. EcoCyc now supports running and modifying E. coli metabolic models directly on the EcoCyc website. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Integrating diverse databases into an unified analysis framework: a Galaxy approach
Blankenberg, Daniel; Coraor, Nathan; Von Kuster, Gregory; Taylor, James; Nekrutenko, Anton
2011-01-01
Recent technological advances have lead to the ability to generate large amounts of data for model and non-model organisms. Whereas, in the past, there have been a relatively small number of central repositories that serve genomic data, an increasing number of distinct specialized data repositories and resources have been established. Here, we describe a generic approach that provides for the integration of a diverse spectrum of data resources into a unified analysis framework, Galaxy (http://usegalaxy.org). This approach allows the simplified coupling of external data resources with the data analysis tools available to Galaxy users, while leveraging the native data mining facilities of the external data resources. Database URL: http://usegalaxy.org PMID:21531983
A first proposal for a general description model of forensic traces
NASA Astrophysics Data System (ADS)
Lindauer, Ina; Schäler, Martin; Vielhauer, Claus; Saake, Gunter; Hildebrandt, Mario
2012-06-01
In recent years, the amount of digitally captured traces at crime scenes increased rapidly. There are various kinds of such traces, like pick marks on locks, latent fingerprints on various surfaces as well as different micro traces. Those traces are different from each other not only in kind but also in which information they provide. Every kind of trace has its own properties (e.g., minutiae for fingerprints, or raking traces for locks) but there are also large amounts of metadata which all traces have in common like location, time and other additional information in relation to crime scenes. For selected types of crime scene traces, type-specific databases already exist, such as the ViCLAS for sexual offences, the IBIS for ballistic forensics or the AFIS for fingerprints. These existing forensic databases strongly differ in the trace description models. For forensic experts it would be beneficial to work with only one database capable of handling all possible forensic traces acquired at a crime scene. This is especially the case when different kinds of traces are interrelated (e.g., fingerprints and ballistic marks on a bullet casing). Unfortunately, current research on interrelated traces as well as general forensic data models and structures is not mature enough to build such an encompassing forensic database. Nevertheless, recent advances in the field of contact-less scanning make it possible to acquire different kinds of traces with the same device. Therefore the data of these traces is structured similarly what simplifies the design of a general forensic data model for different kinds of traces. In this paper we introduce a first common description model for different forensic trace types. Furthermore, we apply for selected trace types from the well established database schema development process the phases of transferring expert knowledge in the corresponding forensic fields into an extendible, database-driven, generalised forensic description model. The trace types considered here are fingerprint traces, traces at locks, micro traces and ballistic traces. Based on these basic trace types, also combined traces (multiple or overlapped fingerprints, fingerprints on bullet casings, etc) and partial traces are considered.
MIPS: curated databases and comprehensive secondary data resources in 2010.
Mewes, H Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F X; Stümpflen, Volker; Antonov, Alexey
2011-01-01
The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38,000,000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de).
MIPS: curated databases and comprehensive secondary data resources in 2010
Mewes, H. Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F.X.; Stümpflen, Volker; Antonov, Alexey
2011-01-01
The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38 000 000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de). PMID:21109531
Optimal tree increment models for the Northeastern United Statesq
Don C. Bragg
2003-01-01
used the potential relative increment (PRI) methodology to develop optimal tree diameter growth models for the Northeastern United States. Thirty species from the Eastwide Forest Inventory Database yielded 69,676 individuals, which were then reduced to fast-growing subsets for PRI analysis. For instance, only 14 individuals from the greater than 6,300-tree eastern...
Optimal Tree Increment Models for the Northeastern United States
Don C. Bragg
2005-01-01
I used the potential relative increment (PRI) methodology to develop optimal tree diameter growth models for the Northeastern United States. Thirty species from the Eastwide Forest Inventory Database yielded 69,676 individuals, which were then reduced to fast-growing subsets for PRI analysis. For instance, only 14 individuals from the greater than 6,300-tree eastern...
Action and Language Mechanisms in the Brain: Data, Models and Neuroinformatics
Bonaiuto, James J.; Bornkessel-Schlesewsky, Ina; Kemmerer, David; MacWhinney, Brian; Nielsen, Finn Årup; Oztop, Erhan
2014-01-01
We assess the challenges of studying action and language mechanisms in the brain, both singly and in relation to each other to provide a novel perspective on neuroinformatics, integrating the development of databases for encoding – separately or together – neurocomputational models and empirical data that serve systems and cognitive neuroscience. PMID:24234916
Estimating a Service-Life Distribution Based on Production Counts and a Failure Database
Ryan, Kenneth J.; Hamada, Michael Scott; Vardeman, Stephen B.
2017-04-01
A manufacturer wanted to compare the service-life distributions of two similar products. These concern product lifetimes after installation (not manufacture). For each product, there were available production counts and an imperfect database providing information on failing units. In the real case, these units were expensive repairable units warrantied against repairs. Failure (of interest here) was relatively rare and driven by a different mode/mechanism than ordinary repair events (not of interest here). Approach: Data models for the service life based on a standard parametric lifetime distribution and a related limited failure population were developed. These models were used to develop expressionsmore » for the likelihood of the available data that properly accounts for information missing in the failure database. Results: A Bayesian approach was employed to obtain estimates of model parameters (with associated uncertainty) in order to investigate characteristics of the service-life distribution. Custom software was developed and is included as Supplemental Material to this case study. One part of a responsible approach to the original case was a simulation experiment used to validate the correctness of the software and the behavior of the statistical methodology before using its results in the application, and an example of such an experiment is included here. Because of confidentiality issues that prevent use of the original data, simulated data with characteristics like the manufacturer’s proprietary data are used to illustrate some aspects of our real analyses. Lastly, we also note that, although this case focuses on rare and complete product failure, the statistical methodology provided is directly applicable to more standard warranty data problems involving typically much larger warranty databases where entries are warranty claims (often for repairs) rather than reports of complete failures.« less
Estimating a Service-Life Distribution Based on Production Counts and a Failure Database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ryan, Kenneth J.; Hamada, Michael Scott; Vardeman, Stephen B.
A manufacturer wanted to compare the service-life distributions of two similar products. These concern product lifetimes after installation (not manufacture). For each product, there were available production counts and an imperfect database providing information on failing units. In the real case, these units were expensive repairable units warrantied against repairs. Failure (of interest here) was relatively rare and driven by a different mode/mechanism than ordinary repair events (not of interest here). Approach: Data models for the service life based on a standard parametric lifetime distribution and a related limited failure population were developed. These models were used to develop expressionsmore » for the likelihood of the available data that properly accounts for information missing in the failure database. Results: A Bayesian approach was employed to obtain estimates of model parameters (with associated uncertainty) in order to investigate characteristics of the service-life distribution. Custom software was developed and is included as Supplemental Material to this case study. One part of a responsible approach to the original case was a simulation experiment used to validate the correctness of the software and the behavior of the statistical methodology before using its results in the application, and an example of such an experiment is included here. Because of confidentiality issues that prevent use of the original data, simulated data with characteristics like the manufacturer’s proprietary data are used to illustrate some aspects of our real analyses. Lastly, we also note that, although this case focuses on rare and complete product failure, the statistical methodology provided is directly applicable to more standard warranty data problems involving typically much larger warranty databases where entries are warranty claims (often for repairs) rather than reports of complete failures.« less
Computational Thermochemistry of Jet Fuels and Rocket Propellants
NASA Technical Reports Server (NTRS)
Crawford, T. Daniel
2002-01-01
The design of new high-energy density molecules as candidates for jet and rocket fuels is an important goal of modern chemical thermodynamics. The NASA Glenn Research Center is home to a database of thermodynamic data for over 2000 compounds related to this goal, in the form of least-squares fits of heat capacities, enthalpies, and entropies as functions of temperature over the range of 300 - 6000 K. The chemical equilibrium with applications (CEA) program written and maintained by researchers at NASA Glenn over the last fifty years, makes use of this database for modeling the performance of potential rocket propellants. During its long history, the NASA Glenn database has been developed based on experimental results and data published in the scientific literature such as the standard JANAF tables. The recent development of efficient computational techniques based on quantum chemical methods provides an alternative source of information for expansion of such databases. For example, it is now possible to model dissociation or combustion reactions of small molecules to high accuracy using techniques such as coupled cluster theory or density functional theory. Unfortunately, the current applicability of reliable computational models is limited to relatively small molecules containing only around a dozen (non-hydrogen) atoms. We propose to extend the applicability of coupled cluster theory- often referred to as the 'gold standard' of quantum chemical methods- to molecules containing 30-50 non-hydrogen atoms. The centerpiece of this work is the concept of local correlation, in which the description of the electron interactions- known as electron correlation effects- are reduced to only their most important localized components. Such an advance has the potential to greatly expand the current reach of computational thermochemistry and thus to have a significant impact on the theoretical study of jet and rocket propellants.
Automatic image database generation from CAD for 3D object recognition
NASA Astrophysics Data System (ADS)
Sardana, Harish K.; Daemi, Mohammad F.; Ibrahim, Mohammad K.
1993-06-01
The development and evaluation of Multiple-View 3-D object recognition systems is based on a large set of model images. Due to the various advantages of using CAD, it is becoming more and more practical to use existing CAD data in computer vision systems. Current PC- level CAD systems are capable of providing physical image modelling and rendering involving positional variations in cameras, light sources etc. We have formulated a modular scheme for automatic generation of various aspects (views) of the objects in a model based 3-D object recognition system. These views are generated at desired orientations on the unit Gaussian sphere. With a suitable network file sharing system (NFS), the images can directly be stored on a database located on a file server. This paper presents the image modelling solutions using CAD in relation to multiple-view approach. Our modular scheme for data conversion and automatic image database storage for such a system is discussed. We have used this approach in 3-D polyhedron recognition. An overview of the results, advantages and limitations of using CAD data and conclusions using such as scheme are also presented.
ERAIZDA: a model for holistic annotation of animal infectious and zoonotic diseases.
Buza, Teresia M; Jack, Sherman W; Kirunda, Halid; Khaitsa, Margaret L; Lawrence, Mark L; Pruett, Stephen; Peterson, Daniel G
2015-01-01
There is an urgent need for a unified resource that integrates trans-disciplinary annotations of emerging and reemerging animal infectious and zoonotic diseases. Such data integration will provide wonderful opportunity for epidemiologists, researchers and health policy makers to make data-driven decisions designed to improve animal health. Integrating emerging and reemerging animal infectious and zoonotic disease data from a large variety of sources into a unified open-access resource provides more plausible arguments to achieve better understanding of infectious and zoonotic diseases. We have developed a model for interlinking annotations of these diseases. These diseases are of particular interest because of the threats they pose to animal health, human health and global health security. We demonstrated the application of this model using brucellosis, an infectious and zoonotic disease. Preliminary annotations were deposited into VetBioBase database (http://vetbiobase.igbb.msstate.edu). This database is associated with user-friendly tools to facilitate searching, retrieving and downloading of disease-related information. Database URL: http://vetbiobase.igbb.msstate.edu. © The Author(s) 2015. Published by Oxford University Press.
Aerothermal Testing for Project Orion Crew Exploration Vehicle
NASA Technical Reports Server (NTRS)
Berry, Scott A.; Horvath, Thomas J.; Lillard, Randolph P.; Kirk, Benjamin S.; Fischer-Cassady, Amy
2009-01-01
The Project Orion Crew Exploration Vehicle aerothermodynamic experimentation strategy, as it relates to flight database development, is reviewed. Experimental data has been obtained to both validate the computational predictions utilized as part of the database and support the development of engineering models for issues not adequately addressed with computations. An outline is provided of the working groups formed to address the key deficiencies in data and knowledge for blunt reentry vehicles. The facilities utilized to address these deficiencies are reviewed, along with some of the important results obtained thus far. For smooth wall comparisons of computational convective heating predictions against experimental data from several facilities, confidence was gained with the use of algebraic turbulence model solutions as part of the database. For cavities and protuberances, experimental data is being used for screening various designs, plus providing support to the development of engineering models. With the reaction-control system testing, experimental data were acquired on the surface in combination with off-body flow visualization of the jet plumes and interactions. These results are being compared against predictions for improved understanding of aftbody thermal environments and uncertainties.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agarwal, Vivek; Lybeck, Nancy J.; Pham, Binh
Research and development efforts are required to address aging and reliability concerns of the existing fleet of nuclear power plants. As most plants continue to operate beyond the license life (i.e., towards 60 or 80 years), plant components are more likely to incur age-related degradation mechanisms. To assess and manage the health of aging plant assets across the nuclear industry, the Electric Power Research Institute has developed a web-based Fleet-Wide Prognostic and Health Management (FW-PHM) Suite for diagnosis and prognosis. FW-PHM is a set of web-based diagnostic and prognostic tools and databases, comprised of the Diagnostic Advisor, the Asset Faultmore » Signature Database, the Remaining Useful Life Advisor, and the Remaining Useful Life Database, that serves as an integrated health monitoring architecture. The main focus of this paper is the implementation of prognostic models for generator step-up transformers in the FW-PHM Suite. One prognostic model discussed is based on the functional relationship between degree of polymerization, (the most commonly used metrics to assess the health of the winding insulation in a transformer) and furfural concentration in the insulating oil. The other model is based on thermal-induced degradation of the transformer insulation. By utilizing transformer loading information, established thermal models are used to estimate the hot spot temperature inside the transformer winding. Both models are implemented in the Remaining Useful Life Database of the FW-PHM Suite. The Remaining Useful Life Advisor utilizes the implemented prognostic models to estimate the remaining useful life of the paper winding insulation in the transformer based on actual oil testing and operational data.« less
Technical Aspects of Interfacing MUMPS to an External SQL Relational Database Management System
Kuzmak, Peter M.; Walters, Richard F.; Penrod, Gail
1988-01-01
This paper describes an interface connecting InterSystems MUMPS (M/VX) to an external relational DBMS, the SYBASE Database Management System. The interface enables MUMPS to operate in a relational environment and gives the MUMPS language full access to a complete set of SQL commands. MUMPS generates SQL statements as ASCII text and sends them to the RDBMS. The RDBMS executes the statements and returns ASCII results to MUMPS. The interface suggests that the language features of MUMPS make it an attractive tool for use in the relational database environment. The approach described in this paper separates MUMPS from the relational database. Positioning the relational database outside of MUMPS promotes data sharing and permits a number of different options to be used for working with the data. Other languages like C, FORTRAN, and COBOL can access the RDBMS database. Advanced tools provided by the relational database vendor can also be used. SYBASE is an advanced high-performance transaction-oriented relational database management system for the VAX/VMS and UNIX operating systems. SYBASE is designed using a distributed open-systems architecture, and is relatively easy to interface with MUMPS.
Protoptype integrated design (Pride) system reference manual. Volume 2: Schema definition
NASA Technical Reports Server (NTRS)
Fishwick, P. A.; Sutter, T. R.; Blackburn, C. L.
1983-01-01
An initial description of an evolving relational database schema is presented for the management of finite element model design and analysis data. The report presents a description of each relation including attribute names, data types, and definitions. The format of this report is such that future modifications and enhancements may be easily incorporated.
Hawthorne L. Beyer; Jeff Jenness; Samuel A. Cushman
2010-01-01
Spatial information systems (SIS) is a term that describes a wide diversity of concepts, techniques, and technologies related to the capture, management, display and analysis of spatial information. It encompasses technologies such as geographic information systems (GIS), global positioning systems (GPS), remote sensing, and relational database management systems (...
High-Performance Secure Database Access Technologies for HEP Grids
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matthew Vranicar; John Weicher
2006-04-17
The Large Hadron Collider (LHC) at the CERN Laboratory will become the largest scientific instrument in the world when it starts operations in 2007. Large Scale Analysis Computer Systems (computational grids) are required to extract rare signals of new physics from petabytes of LHC detector data. In addition to file-based event data, LHC data processing applications require access to large amounts of data in relational databases: detector conditions, calibrations, etc. U.S. high energy physicists demand efficient performance of grid computing applications in LHC physics research where world-wide remote participation is vital to their success. To empower physicists with data-intensive analysismore » capabilities a whole hyperinfrastructure of distributed databases cross-cuts a multi-tier hierarchy of computational grids. The crosscutting allows separation of concerns across both the global environment of a federation of computational grids and the local environment of a physicist’s computer used for analysis. Very few efforts are on-going in the area of database and grid integration research. Most of these are outside of the U.S. and rely on traditional approaches to secure database access via an extraneous security layer separate from the database system core, preventing efficient data transfers. Our findings are shared by the Database Access and Integration Services Working Group of the Global Grid Forum, who states that "Research and development activities relating to the Grid have generally focused on applications where data is stored in files. However, in many scientific and commercial domains, database management systems have a central role in data storage, access, organization, authorization, etc, for numerous applications.” There is a clear opportunity for a technological breakthrough, requiring innovative steps to provide high-performance secure database access technologies for grid computing. We believe that an innovative database architecture where the secure authorization is pushed into the database engine will eliminate inefficient data transfer bottlenecks. Furthermore, traditionally separated database and security layers provide an extra vulnerability, leaving a weak clear-text password authorization as the only protection on the database core systems. Due to the legacy limitations of the systems’ security models, the allowed passwords often can not even comply with the DOE password guideline requirements. We see an opportunity for the tight integration of the secure authorization layer with the database server engine resulting in both improved performance and improved security. Phase I has focused on the development of a proof-of-concept prototype using Argonne National Laboratory’s (ANL) Argonne Tandem-Linac Accelerator System (ATLAS) project as a test scenario. By developing a grid-security enabled version of the ATLAS project’s current relation database solution, MySQL, PIOCON Technologies aims to offer a more efficient solution to secure database access.« less
Database resources of the National Center for Biotechnology Information
Acland, Abigail; Agarwala, Richa; Barrett, Tanya; Beck, Jeff; Benson, Dennis A.; Bollin, Colleen; Bolton, Evan; Bryant, Stephen H.; Canese, Kathi; Church, Deanna M.; Clark, Karen; DiCuccio, Michael; Dondoshansky, Ilya; Federhen, Scott; Feolo, Michael; Geer, Lewis Y.; Gorelenkov, Viatcheslav; Hoeppner, Marilu; Johnson, Mark; Kelly, Christopher; Khotomlianski, Viatcheslav; Kimchi, Avi; Kimelman, Michael; Kitts, Paul; Krasnov, Sergey; Kuznetsov, Anatoliy; Landsman, David; Lipman, David J.; Lu, Zhiyong; Madden, Thomas L.; Madej, Tom; Maglott, Donna R.; Marchler-Bauer, Aron; Karsch-Mizrachi, Ilene; Murphy, Terence; Ostell, James; O'Sullivan, Christopher; Panchenko, Anna; Phan, Lon; Pruitt, Don Preussm Kim D.; Rubinstein, Wendy; Sayers, Eric W.; Schneider, Valerie; Schuler, Gregory D.; Sequeira, Edwin; Sherry, Stephen T.; Shumway, Martin; Sirotkin, Karl; Siyan, Karanjit; Slotta, Douglas; Soboleva, Alexandra; Soussov, Vladimir; Starchenko, Grigory; Tatusova, Tatiana A.; Trawick, Bart W.; Vakatov, Denis; Wang, Yanli; Ward, Minghong; John Wilbur, W.; Yaschenko, Eugene; Zbicz, Kerry
2014-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, PubReader, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Primer-BLAST, COBALT, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, ClinVar, MedGen, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page. PMID:24259429
2012-01-01
Background In the scientific biodiversity community, it is increasingly perceived the need to build a bridge between molecular and traditional biodiversity studies. We believe that the information technology could have a preeminent role in integrating the information generated by these studies with the large amount of molecular data we can find in bioinformatics public databases. This work is primarily aimed at building a bioinformatic infrastructure for the integration of public and private biodiversity data through the development of GIDL, an Intelligent Data Loader coupled with the Molecular Biodiversity Database. The system presented here organizes in an ontological way and locally stores the sequence and annotation data contained in the GenBank primary database. Methods The GIDL architecture consists of a relational database and of an intelligent data loader software. The relational database schema is designed to manage biodiversity information (Molecular Biodiversity Database) and it is organized in four areas: MolecularData, Experiment, Collection and Taxonomy. The MolecularData area is inspired to an established standard in Generic Model Organism Databases, the Chado relational schema. The peculiarity of Chado, and also its strength, is the adoption of an ontological schema which makes use of the Sequence Ontology. The Intelligent Data Loader (IDL) component of GIDL is an Extract, Transform and Load software able to parse data, to discover hidden information in the GenBank entries and to populate the Molecular Biodiversity Database. The IDL is composed by three main modules: the Parser, able to parse GenBank flat files; the Reasoner, which automatically builds CLIPS facts mapping the biological knowledge expressed by the Sequence Ontology; the DBFiller, which translates the CLIPS facts into ordered SQL statements used to populate the database. In GIDL Semantic Web technologies have been adopted due to their advantages in data representation, integration and processing. Results and conclusions Entries coming from Virus (814,122), Plant (1,365,360) and Invertebrate (959,065) divisions of GenBank rel.180 have been loaded in the Molecular Biodiversity Database by GIDL. Our system, combining the Sequence Ontology and the Chado schema, allows a more powerful query expressiveness compared with the most commonly used sequence retrieval systems like Entrez or SRS. PMID:22536971
Data Model and Relational Database Design for Highway Runoff Water-Quality Metadata
Granato, Gregory E.; Tessler, Steven
2001-01-01
A National highway and urban runoff waterquality metadatabase was developed by the U.S. Geological Survey in cooperation with the Federal Highway Administration as part of the National Highway Runoff Water-Quality Data and Methodology Synthesis (NDAMS). The database was designed to catalog available literature and to document results of the synthesis in a format that would facilitate current and future research on highway and urban runoff. This report documents the design and implementation of the NDAMS relational database, which was designed to provide a catalog of available information and the results of an assessment of the available data. All the citations and the metadata collected during the review process are presented in a stratified metadatabase that contains citations for relevant publications, abstracts (or previa), and reportreview metadata for a sample of selected reports that document results of runoff quality investigations. The database is referred to as a metadatabase because it contains information about available data sets rather than a record of the original data. The database contains the metadata needed to evaluate and characterize how valid, current, complete, comparable, and technically defensible published and available information may be when evaluated for application to the different dataquality objectives as defined by decision makers. This database is a relational database, in that all information is ultimately linked to a given citation in the catalog of available reports. The main database file contains 86 tables consisting of 29 data tables, 11 association tables, and 46 domain tables. The data tables all link to a particular citation, and each data table is focused on one aspect of the information collected in the literature search and the evaluation of available information. This database is implemented in the Microsoft (MS) Access database software because it is widely used within and outside of government and is familiar to many existing and potential customers. The stratified metadatabase design for the NDAMS program is presented in the MS Access file DBDESIGN.mdb and documented with a data dictionary in the NDAMS_DD.mdb file recorded on the CD-ROM. The data dictionary file includes complete documentation of the table names, table descriptions, and information about each of the 419 fields in the database.
2009-01-01
Background The majority of the genes even in well-studied multi-cellular model organisms have not been functionally characterized yet. Mining the numerous genome wide data sets related to protein function to retrieve potential candidate genes for a particular biological process remains a challenge. Description GExplore has been developed to provide a user-friendly database interface for data mining at the gene expression/protein function level to help in hypothesis development and experiment design. It supports combinatorial searches for proteins with certain domains, tissue- or developmental stage-specific expression patterns, and mutant phenotypes. GExplore operates on a stand-alone database and has fast response times, which is essential for exploratory searches. The interface is not only user-friendly, but also modular so that it accommodates additional data sets in the future. Conclusion GExplore is an online database for quick mining of data related to gene and protein function, providing a multi-gene display of data sets related to the domain composition of proteins as well as expression and phenotype data. GExplore is publicly available at: http://genome.sfu.ca/gexplore/ PMID:19917126
Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice
2015-01-01
The aim of this study is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) fuel datasets. The revision is based on the data quality indicators described by the ILCD Handbook, applied on sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and the appropriateness in terms of completeness, precision and methodology. Results show that ELCD fuel datasets have a very good quality in general terms, nevertheless some findings and recommendations in order to improve the quality of Life-Cycle Inventories have been derived. Moreover, these results ensure the quality of the fuel-related datasets to any LCA practitioner, and provide insights related to the limitations and assumptions underlying in the datasets modelling. Giving this information, the LCA practitioner will be able to decide whether the use of the ELCD fuel datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would be also useful for dataset developers and reviewers, in order to improve the overall DQR of databases.
Solomon, Nancy Pearl; Dietsch, Angela M; Dietrich-Burns, Katie E; Styrmisdottir, Edda L; Armao, Christopher S
2016-05-01
This report describes the development and preliminary analysis of a database for traumatically injured military service members with dysphagia. A multidimensional database was developed to capture clinical variables related to swallowing. Data were derived from clinical records and instrumental swallow studies, and ranged from demographics, injury characteristics, swallowing biomechanics, medications, and standardized tools (e.g., Glasgow Coma Scale, Penetration-Aspiration Scale). Bayesian Belief Network modeling was used to analyze the data at intermediate points, guide data collection, and predict outcomes. Predictive models were validated with independent data via receiver operating characteristic curves. The first iteration of the model (n = 48) revealed variables that could be collapsed for the second model (n = 96). The ability to predict recovery from dysphagia improved from the second to third models (area under the curve = 0.68 to 0.86). The third model, based on 161 cases, revealed "initial diet restrictions" as first-degree, and "Glasgow Coma Scale, intubation history, and diet change" as second-degree associates for diet restrictions at discharge. This project demonstrates the potential for bioinformatics to advance understanding of dysphagia. This database in concert with Bayesian Belief Network modeling makes it possible to explore predictive relationships between injuries and swallowing function, individual variability in recovery, and appropriate treatment options. Reprint & Copyright © 2016 Association of Military Surgeons of the U.S.
Dimensional modeling: beyond data processing constraints.
Bunardzic, A
1995-01-01
The focus of information processing requirements is shifting from the on-line transaction processing (OLTP) issues to the on-line analytical processing (OLAP) issues. While the former serves to ensure the feasibility of the real-time on-line transaction processing (which has already exceeded a level of up to 1,000 transactions per second under normal conditions), the latter aims at enabling more sophisticated analytical manipulation of data. The OLTP requirements, or how to efficiently get data into the system, have been solved by applying the Relational theory in the form of Entity-Relation model. There is presently no theory related to OLAP that would resolve the analytical processing requirements as efficiently as Relational theory provided for the transaction processing. The "relational dogma" also provides the mathematical foundation for the Centralized Data Processing paradigm in which mission-critical information is incorporated as 'one and only one instance' of data, thus ensuring data integrity. In such surroundings, the information that supports business analysis and decision support activities is obtained by running predefined reports and queries that are provided by the IS department. In today's intensified competitive climate, businesses are finding that this traditional approach is not good enough. The only way to stay on top of things, and to survive and prosper, is to decentralize the IS services. The newly emerging Distributed Data Processing, with its increased emphasis on empowering the end user, does not seem to find enough merit in the relational database model to justify relying upon it. Relational theory proved too rigid and complex to accommodate the analytical processing needs. In order to satisfy the OLAP requirements, or how to efficiently get the data out of the system, different models, metaphors, and theories have been devised. All of them are pointing to the need for simplifying the highly non-intuitive mathematical constraints found in the relational databases normalized to their 3rd normal form. Object-oriented approach insists on the importance of the common sense component of the data processing activities. But, particularly interesting, is the approach that advocates the necessity of 'flattening' the structure of the business models as we know them today. This discipline is called Dimensional Modeling and it enables users to form multidimensional views of the relevant facts which are stored in a 'flat' (non-structured), easy-to-comprehend and easy-to-access database. When using dimensional modeling, we relax many of the axioms inherent in a relational model. We focus on the knowledge of the relevant facts which are reflecting the business operations and are the real basis for the decision support and business analysis. At the core of the dimensional modeling are fact tables that contain the non-discrete, additive data. To determine the level of aggregation of these facts, we use granularity tables that specify the resolution, or the level/detail, that the user is allowed to entertain. The third component is dimension tables that embody the knowledge of the constraints to be used to form the views.
dbDSM: a manually curated database for deleterious synonymous mutations.
Wen, Pengbo; Xiao, Peng; Xia, Junfeng
2016-06-15
Synonymous mutations (SMs), which changed the sequence of a gene without directly altering the amino acid sequence of the encoded protein, were thought to have no functional consequences for a long time. They are often assumed to be neutral in models of mutation and selection and were completely ignored in many studies. However, accumulating experimental evidence has demonstrated that these mutations exert their impact on gene functions via splicing accuracy, mRNA stability, translation fidelity, protein folding and expression, and some of these mutations are implicated in human diseases. To the best of our knowledge, there is still no database specially focusing on disease-related SMs. We have developed a new database called dbDSM (database of Deleterious Synonymous Mutation), a continually updated database that collects, curates and manages available human disease-related SM data obtained from published literature. In the current release, dbDSM collects 1936 SM-disease association entries, including 1289 SMs and 443 human diseases from ClinVar, GRASP, GWAS Catalog, GWASdb, PolymiRTS database, PubMed database and Web of Knowledge. Additionally, we provided users a link to download all the data in the dbDSM and a link to submit novel data into the database. We hope dbDSM will be a useful resource for investigating the roles of SMs in human disease. dbDSM is freely available online at http://bioinfo.ahu.edu.cn:8080/dbDSM/index.jsp with all major browser supported. jfxia@ahu.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
A pilot GIS database of active faults of Mt. Etna (Sicily): A tool for integrated hazard evaluation
NASA Astrophysics Data System (ADS)
Barreca, Giovanni; Bonforte, Alessandro; Neri, Marco
2013-02-01
A pilot GIS-based system has been implemented for the assessment and analysis of hazard related to active faults affecting the eastern and southern flanks of Mt. Etna. The system structure was developed in ArcGis® environment and consists of different thematic datasets that include spatially-referenced arc-features and associated database. Arc-type features, georeferenced into WGS84 Ellipsoid UTM zone 33 Projection, represent the five main fault systems that develop in the analysed region. The backbone of the GIS-based system is constituted by the large amount of information which was collected from the literature and then stored and properly geocoded in a digital database. This consists of thirty five alpha-numeric fields which include all fault parameters available from literature such us location, kinematics, landform, slip rate, etc. Although the system has been implemented according to the most common procedures used by GIS developer, the architecture and content of the database represent a pilot backbone for digital storing of fault parameters, providing a powerful tool in modelling hazard related to the active tectonics of Mt. Etna. The database collects, organises and shares all scientific currently available information about the active faults of the volcano. Furthermore, thanks to the strong effort spent on defining the fields of the database, the structure proposed in this paper is open to the collection of further data coming from future improvements in the knowledge of the fault systems. By layering additional user-specific geographic information and managing the proposed database (topological querying) a great diversity of hazard and vulnerability maps can be produced by the user. This is a proposal of a backbone for a comprehensive geographical database of fault systems, universally applicable to other sites.
NASA Astrophysics Data System (ADS)
Knosp, B.; Gangl, M.; Hristova-Veleva, S. M.; Kim, R. M.; Li, P.; Turk, J.; Vu, Q. A.
2015-12-01
The JPL Tropical Cyclone Information System (TCIS) brings together satellite, aircraft, and model forecast data from several NASA, NOAA, and other data centers to assist researchers in comparing and analyzing data and model forecast related to tropical cyclones. The TCIS has been running a near-real time (NRT) data portal during North Atlantic hurricane season that typically runs from June through October each year, since 2010. Data collected by the TCIS varies by type, format, contents, and frequency and is served to the user in two ways: (1) as image overlays on a virtual globe and (2) as derived output from a suite of analysis tools. In order to support these two functions, the data must be collected and then made searchable by criteria such as date, mission, product, pressure level, and geospatial region. Creating a database architecture that is flexible enough to manage, intelligently interrogate, and ultimately present this disparate data to the user in a meaningful way has been the primary challenge. The database solution for the TCIS has been to use a hybrid MySQL + Solr implementation. After testing other relational database and NoSQL solutions, such as PostgreSQL and MongoDB respectively, this solution has given the TCIS the best offerings in terms of query speed and result reliability. This database solution also supports the challenging (and memory overwhelming) geospatial queries that are necessary to support analysis tools requested by users. Though hardly new technologies on their own, our implementation of MySQL + Solr had to be customized and tuned to be able to accurately store, index, and search the TCIS data holdings. In this presentation, we will discuss how we arrived on our MySQL + Solr database architecture, why it offers us the most consistent fast and reliable results, and how it supports our front end so that we can offer users a look into our "big data" holdings.
[Relational database for urinary stone ambulatory consultation. Assessment of initial outcomes].
Sáenz Medina, J; Páez Borda, A; Crespo Martinez, L; Gómez Dos Santos, V; Barrado, C; Durán Poveda, M
2010-05-01
To create a relational database for monitoring lithiasic patients. We describe the architectural details and the initial results of the statistical analysis. Microsoft Access 2002 was used as template. Four different tables were constructed to gather demographic data (table 1), clinical and laboratory findings (table 2), stone features (table 3) and therapeutic approach (table 4). For a reliability analysis of the database the number of correctly stored data was gathered. To evaluate the performance of the database, a prospective analysis was conducted, from May 2004 to August 2009, on 171 stone free patients after treatment (EWSL, surgery or medical) from a total of 511 patients stored in the database. Lithiasic status (stone free or stone relapse) was used as primary end point, while demographic factors (age, gender), lithiasic history, upper urinary tract alterations and characteristics of the stone (side, location, composition and size) were considered as predictive factors. An univariate analysis was conducted initially by chi square test and supplemented by Kaplan Meier estimates for time to stone recurrence. A multiple Cox proportional hazards regression model was generated to jointly assess the prognostic value of the demographic factors and the predictive value of stones characteristics. For the reliability analysis 22,084 data were available corresponding to 702 consultations on 511 patients. Analysis of data showed a recurrence rate of 85.4% (146/171, median time to recurrence 608 days, range 70-1758). In the univariate and multivariate analysis, none of the factors under consideration had a significant effect on recurrence rate (p=ns). The relational database is useful for monitoring patients with urolithiasis. It allows easy control and update, as well as data storage for later use. The analysis conducted for its evaluation showed no influence of demographic factors and stone features on stone recurrence.
McCabe, Simon; Arndt, Jamie; Goldenberg, Jamie L; Vess, Matthew; Vail, Kenneth E; Gibbons, Frederick X; Rogers, Ross
2015-03-01
To use insights from an integration of the terror management health model and the prototype willingness model to inform and improve nutrition-related behavior using an ecologically valid outcome. Prior to shopping, grocery shoppers were exposed to a reminder of mortality (or pain) and then visualized a healthy (vs. neutral) prototype. Receipts were collected postshopping and food items purchased were coded using a nutrition database. Compared with those in the control conditions, participants who received the mortality reminder and who were led to visualize a healthy eater prototype purchased more nutritious foods. The integration of the terror management health model and the prototype willingness model has the potential for both basic and applied advances and offers a generative ground for future research. PsycINFO Database Record (c) 2015 APA, all rights reserved.
Benchmarking Using Basic DBMS Operations
NASA Astrophysics Data System (ADS)
Crolotte, Alain; Ghazal, Ahmad
The TPC-H benchmark proved to be successful in the decision support area. Many commercial database vendors and their related hardware vendors used these benchmarks to show the superiority and competitive edge of their products. However, over time, the TPC-H became less representative of industry trends as vendors keep tuning their database to this benchmark-specific workload. In this paper, we present XMarq, a simple benchmark framework that can be used to compare various software/hardware combinations. Our benchmark model is currently composed of 25 queries that measure the performance of basic operations such as scans, aggregations, joins and index access. This benchmark model is based on the TPC-H data model due to its maturity and well-understood data generation capability. We also propose metrics to evaluate single-system performance and compare two systems. Finally we illustrate the effectiveness of this model by showing experimental results comparing two systems under different conditions.
Spatio-Temporal Data Model for Integrating Evolving Nation-Level Datasets
NASA Astrophysics Data System (ADS)
Sorokine, A.; Stewart, R. N.
2017-10-01
Ability to easily combine the data from diverse sources in a single analytical workflow is one of the greatest promises of the Big Data technologies. However, such integration is often challenging as datasets originate from different vendors, governments, and research communities that results in multiple incompatibilities including data representations, formats, and semantics. Semantics differences are hardest to handle: different communities often use different attribute definitions and associate the records with different sets of evolving geographic entities. Analysis of global socioeconomic variables across multiple datasets over prolonged time is often complicated by the difference in how boundaries and histories of countries or other geographic entities are represented. Here we propose an event-based data model for depicting and tracking histories of evolving geographic units (countries, provinces, etc.) and their representations in disparate data. The model addresses the semantic challenge of preserving identity of geographic entities over time by defining criteria for the entity existence, a set of events that may affect its existence, and rules for mapping between different representations (datasets). Proposed model is used for maintaining an evolving compound database of global socioeconomic and environmental data harvested from multiple sources. Practical implementation of our model is demonstrated using PostgreSQL object-relational database with the use of temporal, geospatial, and NoSQL database extensions.
NASA Astrophysics Data System (ADS)
Brissebrat, Guillaume; Mastrorillo, Laurence; Ramage, Karim; Boichard, Jean-Luc; Cloché, Sophie; Fleury, Laurence; Klenov, Ludmila; Labatut, Laurent; Mière, Arnaud
2013-04-01
The international HyMeX (HYdrological cycle in the Mediterranean EXperiment) project aims at a better understanding and quantification of the hydrological cycle and related processes in the Mediterranean, with emphasis on high-impact weather events, inter-annual to decadal variability of the Mediterranean coupled system, and associated trends in the context of global change. The project includes long term monitoring of environmental parameters, intensive field campaigns, use of satellite data, modelling studies, as well as post event field surveys and value-added products processing. Therefore HyMeX database incorporates various dataset types from different disciplines, either operational or research. The database relies on a strong collaboration between OMP and IPSL data centres. Field data, which are 1D time series, maps or pictures, are managed by OMP team while gridded data (satellite products, model outputs, radar data...) are managed by IPSL team. At present, the HyMeX database contains about 150 datasets, including 80 hydrological, meteorological, ocean and soil in situ datasets, 30 radar datasets, 15 satellite products, 15 atmosphere, ocean and land surface model outputs from operational (re-)analysis or forecasts and from research simulations, and 5 post event survey datasets. The data catalogue complies with international standards (ISO 19115; INSPIRE; Directory Interchange Format; Global Change Master Directory Thesaurus). It includes all the datasets stored in the HyMeX database, as well as external datasets relevant for the project. All the data, whatever the type is, are accessible through a single gateway. The database website http://mistrals.sedoo.fr/HyMeX offers different tools: - A registration procedure which enables any scientist to accept the data policy and apply for a user database account. - A search tool to browse the catalogue using thematic, geographic and/or temporal criteria. - Sorted lists of the datasets by thematic keywords, by measured parameters, by instruments or by platform type. - Forms to document observations or products that will be provided to the database. - A shopping-cart web interface to order in situ data files. - Ftp facilities to access gridded data. The website will soon propose new facilities. Many in situ datasets have been homogenized and inserted in a relational database yet, in order to enable more accurate data selection and download of different datasets in a shared format. Interoperability between the two data centres will be enhanced by the OpenDAP communication protocol associated with the Thredds catalogue software, which may also be implemented in other data centres that manage data of interest for the HyMeX project. In order to meet the operational needs for the HyMeX 2012 campaigns, a day-to-day quick look and report display website has been developed too: http://sop.hymex.org. It offers a convenient way to browse meteorological conditions and data during the campaign periods.
Duchrow, Timo; Shtatland, Timur; Guettler, Daniel; Pivovarov, Misha; Kramer, Stefan; Weissleder, Ralph
2009-01-01
Background The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them. Results Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. Conclusion Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases. The system can be accessed at . PMID:19799796
Follicle Online: an integrated database of follicle assembly, development and ovulation.
Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Cooke, Howard J; Zhang, Yuanwei; Shi, Qinghua
2015-01-01
Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and to cure the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we are reporting a database 'Follicle Online' that provides the experimentally validated gene/protein map of the folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos Taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43,000 published articles (till 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php © The Author(s) 2015. Published by Oxford University Press.
Follicle Online: an integrated database of follicle assembly, development and ovulation
Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Zhang, Yuanwei; Shi, Qinghua
2015-01-01
Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and to cure the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we are reporting a database ‘Follicle Online’ that provides the experimentally validated gene/protein map of the folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos Taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43 000 published articles (till 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php PMID:25931457
Deployment and Evaluation of an Observations Data Model
NASA Astrophysics Data System (ADS)
Horsburgh, J. S.; Tarboton, D. G.; Zaslavsky, I.; Maidment, D. R.; Valentine, D.
2007-12-01
Environmental observations are fundamental to hydrology and water resources, and the way these data are organized and manipulated either enables or inhibits the analyses that can be performed. The CUAHSI Hydrologic Information System project is developing information technology infrastructure to support hydrologic science. This includes an Observations Data Model (ODM) that provides a new and consistent format for the storage and retrieval of environmental observations in a relational database designed to facilitate integrated analysis of large datasets collected by multiple investigators. Within this data model, observations are stored with sufficient ancillary information (metadata) about the observations to allow them to be unambiguously interpreted and used, and to provide traceable heritage from raw measurements to useable information. The design is based upon a relational database model that exposes each single observation as a record, taking advantage of the capability in relational database systems for querying based upon data values and enabling cross dimension data retrieval and analysis. This data model has been deployed, as part of the HIS Server, at the WATERS Network test bed observatories across the U.S where it serves as a repository for real time data in the observatory information system. The ODM holds the data that is then made available to investigators and the public through web services and the Data Access System for Hydrology (DASH) map based interface. In the WATERS Network test bed settings the ODM has been used to ingest, analyze and publish data from a variety of sources and disciplines. This paper will present an evaluation of the effectiveness of this initial deployment and the revisions that are being instituted to address shortcomings. The ODM represents a new, systematic way for hydrologists, scientists, and engineers to organize and share their data and thereby facilitate a fuller integrated understanding of water resources based on more extensive and fully specified information.
A Physically Based Correlation of Irradiation-Induced Transition Temperature Shifts for RPV Steels
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eason, Ernest D.; Odette, George Robert; Nanstad, Randy K
2007-11-01
The reactor pressure vessels (RPVs) of commercial nuclear power plants are subject to embrittlement due to exposure to high-energy neutrons from the core, which causes changes in material toughness properties that increase with radiation exposure and are affected by many variables. Irradiation embrittlement of RPV beltline materials is currently evaluated using Regulatory Guide 1.99 Revision 2 (RG1.99/2), which presents methods for estimating the shift in Charpy transition temperature at 30 ft-lb (TTS) and the drop in Charpy upper shelf energy (ΔUSE). The purpose of the work reported here is to improve on the TTS correlation model in RG1.99/2 using themore » broader database now available and current understanding of embrittlement mechanisms. The USE database and models have not been updated since the publication of NUREG/CR-6551 and, therefore, are not discussed in this report. The revised embrittlement shift model is calibrated and validated on a substantially larger, better-balanced database compared to prior models, including over five times the amount of data used to develop RG1.99/2. It also contains about 27% more data than the most recent update to the surveillance shift database, in 2000. The key areas expanded in the current database relative to the database available in 2000 are low-flux, low-copper, and long-time, high-fluence exposures, all areas that were previously relatively sparse. All old and new surveillance data were reviewed for completeness, duplicates, and discrepancies in cooperation with the American Society for Testing and Materials (ASTM) Subcommittee E10.02 on Radiation Effects in Structural Materials. In the present modeling effort, a 10% random sample of data was reserved from the fitting process, and most aspects of the model were validated with that sample as well as other data not used in calibration. The model is a hybrid, incorporating both physically motivated features and empirical calibration to the U.S. power reactor surveillance data. It contains two terms, corresponding to the best-understood radiation damage features, matrix damage and copper-rich precipitates, although the empirical calibration will ensure that all other damage processes that are occurring are also reflected in those terms. Effects of material chemical composition, product form, and radiation exposure are incorporated, such that all effects are supported by findings of statistical significance, physical understanding, or comparison with independent data from controlled experiments, such as the Irradiation Variable (IVAR) Program. In most variable effects, the model is supported by two or three of these different forms of evidence. The key variable trends, such as the neutron fluence dependence and copper-nickel dependence in the new TTS model, are much improved over RG1.99/2 and are well supported by independent data and the current understanding of embrittlement mechanisms. The new model includes the variables copper, nickel, and fluence that are in RG1.99/2, but also includes effects of irradiation temperature, neutron flux, phosphorus, and manganese. The calibrated model is a good fit, with no significant residual error trends in any of the variables used in the model or several additional variables and variable interactions that were investigated. The report includes a chapter summarizing the current understanding of embrittlement mechanisms and one comparing the IVAR database with the TTS model predictions. Generally good agreement is found in that quantitative comparison, providing independent confirmation of the predictive capability of the TTS model. The key new insight in the TTS modeling effort, that flux effects are evident in both low (or no) copper and higher copper materials, is supported by the IVAR data. The slightly simplified version of the TTS model presented in Section 7.3 of this report is recommended for applications.« less
Leary, Alison; Cook, Rob; Jones, Sarahjane; Smith, Judith; Gough, Malcolm; Maxwell, Elaine; Punshon, Geoffrey; Radford, Mark
2016-12-16
Nursing is a safety critical activity but not easily quantified. This makes the building of predictive staffing models a challenge. The aim of this study was to determine if relationships between registered and non-registered nurse staffing levels and clinical outcomes could be discovered through the mining of routinely collected clinical data. The secondary aim was to examine the feasibility and develop the use of 'big data' techniques commonly used in industry for this area of healthcare and examine future uses. The data were obtained from 1 large acute National Health Service hospital trust in England. Routinely collected physiological, signs and symptom data from a clinical database were extracted, imported and mined alongside a bespoke staffing and outcomes database using Mathmatica V.10. The physiological data consisted of 120 million patient entries over 6 years, the bespoke database consisted of 9 years of daily data on staffing levels and safety factors such as falls. To discover patterns in these data or non-linear relationships that would contribute to modelling. To examine feasibility of this technique in this field. After mining, 40 correlations (p<0.00005) emerged between safety factors, physiological data (such as the presence or absence of nausea) and staffing factors. Several inter-related factors demonstrated step changes where registered nurse availability appeared to relate to physiological parameters or outcomes such as falls and the management of symptoms. Data extraction proved challenging as some commercial databases were not built for extraction of the massive data sets they contain. The relationship between staffing and outcomes appears to exist. It appears to be non-linear but calculable and a data-driven model appears possible. These findings could be used to build an initial mathematical model for acute staffing which could be further tested. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.
Building a Database for a Quantitative Model
NASA Technical Reports Server (NTRS)
Kahn, C. Joseph; Kleinhammer, Roger
2014-01-01
A database can greatly benefit a quantitative analysis. The defining characteristic of a quantitative risk, or reliability, model is the use of failure estimate data. Models can easily contain a thousand Basic Events, relying on hundreds of individual data sources. Obviously, entering so much data by hand will eventually lead to errors. Not so obviously entering data this way does not aid linking the Basic Events to the data sources. The best way to organize large amounts of data on a computer is with a database. But a model does not require a large, enterprise-level database with dedicated developers and administrators. A database built in Excel can be quite sufficient. A simple spreadsheet database can link every Basic Event to the individual data source selected for them. This database can also contain the manipulations appropriate for how the data is used in the model. These manipulations include stressing factors based on use and maintenance cycles, dormancy, unique failure modes, the modeling of multiple items as a single "Super component" Basic Event, and Bayesian Updating based on flight and testing experience. A simple, unique metadata field in both the model and database provides a link from any Basic Event in the model to its data source and all relevant calculations. The credibility for the entire model often rests on the credibility and traceability of the data.
MetPetDB: A database for metamorphic geochemistry
NASA Astrophysics Data System (ADS)
Spear, Frank S.; Hallett, Benjamin; Pyle, Joseph M.; Adalı, Sibel; Szymanski, Boleslaw K.; Waters, Anthony; Linder, Zak; Pearce, Shawn O.; Fyffe, Matthew; Goldfarb, Dennis; Glickenhouse, Nickolas; Buletti, Heather
2009-12-01
We present a data model for the initial implementation of MetPetDB, a geochemical database specific to metamorphic rock samples. The database is designed around the concept of preservation of spatial relationships, at all scales, of chemical analyses and their textural setting. Objects in the database (samples) represent physical rock samples; each sample may contain one or more subsamples with associated geochemical and image data. Samples, subsamples, geochemical data, and images are described with attributes (some required, some optional); these attributes also serve as search delimiters. All data in the database are classified as published (i.e., archived or published data), public or private. Public and published data may be freely searched and downloaded. All private data is owned; permission to view, edit, download and otherwise manipulate private data may be granted only by the data owner; all such editing operations are recorded by the database to create a data version log. The sharing of data permissions among a group of collaborators researching a common sample is done by the sample owner through the project manager. User interaction with MetPetDB is hosted by a web-based platform based upon the Java servlet application programming interface, with the PostgreSQL relational database. The database web portal includes modules that allow the user to interact with the database: registered users may save and download public and published data, upload private data, create projects, and assign permission levels to project collaborators. An Image Viewer module provides for spatial integration of image and geochemical data. A toolkit consisting of plotting and geochemical calculation software for data analysis and a mobile application for viewing the public and published data is being developed. Future issues to address include population of the database, integration with other geochemical databases, development of the analysis toolkit, creation of data models for derivative data, and building a community-wide user base. It is believed that this and other geochemical databases will enable more productive collaborations, generate more efficient research efforts, and foster new developments in basic research in the field of solid earth geochemistry.
[Establishment of a comprehensive database for laryngeal cancer related genes and the miRNAs].
Li, Mengjiao; E, Qimin; Liu, Jialin; Huang, Tingting; Liang, Chuanyu
2015-09-01
By collecting and analyzing the laryngeal cancer related genes and the miRNAs, to build a comprehensive laryngeal cancer-related gene database, which differs from the current biological information database with complex and clumsy structure and focuses on the theme of gene and miRNA, and it could make the research and teaching more convenient and efficient. Based on the B/S architecture, using Apache as a Web server, MySQL as coding language of database design and PHP as coding language of web design, a comprehensive database for laryngeal cancer-related genes was established, providing with the gene tables, protein tables, miRNA tables and clinical information tables of the patients with laryngeal cancer. The established database containsed 207 laryngeal cancer related genes, 243 proteins, 26 miRNAs, and their particular information such as mutations, methylations, diversified expressions, and the empirical references of laryngeal cancer relevant molecules. The database could be accessed and operated via the Internet, by which browsing and retrieval of the information were performed. The database were maintained and updated regularly. The database for laryngeal cancer related genes is resource-integrated and user-friendly, providing a genetic information query tool for the study of laryngeal cancer.
Building a genome database using an object-oriented approach.
Barbasiewicz, Anna; Liu, Lin; Lang, B Franz; Burger, Gertraud
2002-01-01
GOBASE is a relational database that integrates data associated with mitochondria and chloroplasts. The most important data in GOBASE, i. e., molecular sequences and taxonomic information, are obtained from the public sequence data repository at the National Center for Biotechnology Information (NCBI), and are validated by our experts. Maintaining a curated genomic database comes with a towering labor cost, due to the shear volume of available genomic sequences and the plethora of annotation errors and omissions in records retrieved from public repositories. Here we describe our approach to increase automation of the database population process, thereby reducing manual intervention. As a first step, we used Unified Modeling Language (UML) to construct a list of potential errors. Each case was evaluated independently, and an expert solution was devised, and represented as a diagram. Subsequently, the UML diagrams were used as templates for writing object-oriented automation programs in the Java programming language.
Burnett, Leslie; Barlow-Stewart, Kris; Proos, Anné L; Aizenberg, Harry
2003-05-01
This article describes a generic model for access to samples and information in human genetic databases. The model utilises a "GeneTrustee", a third-party intermediary independent of the subjects and of the investigators or database custodians. The GeneTrustee model has been implemented successfully in various community genetics screening programs and has facilitated research access to genetic databases while protecting the privacy and confidentiality of research subjects. The GeneTrustee model could also be applied to various types of non-conventional genetic databases, including neonatal screening Guthrie card collections, and to forensic DNA samples.
Predictive Models of Acute Mountain Sickness after Rapid Ascent to Various Altitudes
2013-01-01
unclassified relational mountain medicine database containing individ- ual ascent profiles, demographic and physiologic subject descriptors, and...course of AMS, and define the baseline demographics and physiologic descriptors that increase the risk of AMS. In addition, these models provide...substantiated this finding in un- acclimatized women (24). Other physiologic differences between men and women (i.e., differences in endothelial
Neural/Bayes network predictor for inheritable cardiac disease pathogenicity and phenotype.
Burghardt, Thomas P; Ajtai, Katalin
2018-04-11
The cardiac muscle sarcomere contains multiple proteins contributing to contraction energy transduction and its regulation during a heartbeat. Inheritable heart disease mutants affect most of them but none more frequently than the ventricular myosin motor and cardiac myosin binding protein c (mybpc3). These co-localizing proteins have mybpc3 playing a regulatory role to the energy transducing motor. Residue substitution and functional domain assignment of each mutation in the protein sequence decides, under the direction of a sensible disease model, phenotype and pathogenicity. The unknown model mechanism is decided here using a method combing neural and Bayes networks. Missense single nucleotide polymorphisms (SNPs) are clues for the disease mechanism summarized in an extensive database collecting mutant sequence location and residue substitution as independent variables that imply the dependent disease phenotype and pathogenicity characteristics in 4 dimensional data points (4ddps). The SNP database contains entries with the majority having one or both dependent data entries unfulfilled. A neural network relating causes (mutant residue location and substitution) and effects (phenotype and pathogenicity) is trained, validated, and optimized using fulfilled 4ddps. It then predicts unfulfilled 4ddps providing the implicit disease model. A discrete Bayes network interprets fulfilled and predicted 4ddps with conditional probabilities for phenotype and pathogenicity given mutation location and residue substitution thus relating the neural network implicit model to explicit features of the motor and mybpc3 sequence and structural domains. Neural/Bayes network forecasting automates disease mechanism modeling by leveraging the world wide human missense SNP database that is in place and expanding. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.
NASA Technical Reports Server (NTRS)
Kawamura, K.; Beale, G. O.; Schaffer, J. D.; Hsieh, B. J.; Padalkar, S.; Rodriguez-Moscoso, J. J.
1985-01-01
The results of the first phase of Research on an Expert System for Database Operation of Simulation/Emulation Math Models, is described. Techniques from artificial intelligence (AI) were to bear on task domains of interest to NASA Marshall Space Flight Center. One such domain is simulation of spacecraft attitude control systems. Two related software systems were developed to and delivered to NASA. One was a generic simulation model for spacecraft attitude control, written in FORTRAN. The second was an expert system which understands the usage of a class of spacecraft attitude control simulation software and can assist the user in running the software. This NASA Expert Simulation System (NESS), written in LISP, contains general knowledge about digital simulation, specific knowledge about the simulation software, and self knowledge.
A Relational Database System for Student Use.
ERIC Educational Resources Information Center
Fertuck, Len
1982-01-01
Describes an APL implementation of a relational database system suitable for use in a teaching environment in which database development and database administration are studied, and discusses the functions of the user and the database administrator. An appendix illustrating system operation and an eight-item reference list are attached. (Author/JL)
Exploring Human Cognition Using Large Image Databases.
Griffiths, Thomas L; Abbott, Joshua T; Hsu, Anne S
2016-07-01
Most cognitive psychology experiments evaluate models of human cognition using a relatively small, well-controlled set of stimuli. This approach stands in contrast to current work in neuroscience, perception, and computer vision, which have begun to focus on using large databases of natural images. We argue that natural images provide a powerful tool for characterizing the statistical environment in which people operate, for better evaluating psychological theories, and for bringing the insights of cognitive science closer to real applications. We discuss how some of the challenges of using natural images as stimuli in experiments can be addressed through increased sample sizes, using representations from computer vision, and developing new experimental methods. Finally, we illustrate these points by summarizing recent work using large image databases to explore questions about human cognition in four different domains: modeling subjective randomness, defining a quantitative measure of representativeness, identifying prior knowledge used in word learning, and determining the structure of natural categories. Copyright © 2016 Cognitive Science Society, Inc.
Nagle, D.D.; Campbell, B.G.; Lowery, M.A.
2009-01-01
The increasing use and importance of lakes for water supply to communities enhance the need for an accurate methodology to determine lake bathymetry and storage capacity. A global positioning receiver and a fathometer were used to collect position data and water depth in February 2008 at Lake William C. Bowen and Municipal Reservoir #1, Spartanburg County, South Carolina. All collected data were imported into a geographic information system database. A bathymetric surface model, contour map, and stage-area and -volume relations were created from the geographic information database.
Fleeman, N; McLeod, C; Bagust, A; Beale, S; Boland, A; Dundar, Y; Jorgensen, A; Payne, K; Pirmohamed, M; Pushpakom, S; Walley, T; de Warren-Penny, P; Dickson, R
2010-01-01
To determine whether testing for cytochrome P450 (CYP) polymorphisms in adults entering antipsychotic treatment for schizophrenia leads to improvement in outcomes, is useful in medical, personal or public health decision-making, and is a cost-effective use of health-care resources. The following electronic databases were searched for relevant published literature: Cochrane Controlled Trials Register, Cochrane Database of Systematic Reviews, Database of Abstracts of Reviews of Effectiveness, EMBASE, Health Technology Assessment database, ISI Web of Knowledge, MEDLINE, PsycINFO, NHS Economic Evaluation Database, Health Economic Evaluation Database, Cost-effectiveness Analysis (CEA) Registry and the Centre for Health Economics website. In addition, publicly available information on various genotyping tests was sought from the internet and advisory panel members. A systematic review of analytical validity, clinical validity and clinical utility of CYP testing was undertaken. Data were extracted into structured tables and narratively discussed, and meta-analysis was undertaken when possible. A review of economic evaluations of CYP testing in psychiatry and a review of economic models related to schizophrenia were also carried out. For analytical validity, 46 studies of a range of different genotyping tests for 11 different CYP polymorphisms (most commonly CYP2D6) were included. Sensitivity and specificity were high (99-100%). For clinical validity, 51 studies were found. In patients tested for CYP2D6, an association between genotype and tardive dyskinesia (including Abnormal Involuntary Movement Scale scores) was found. The only other significant finding linked the CYP2D6 genotype to parkinsonism. One small unpublished study met the inclusion criteria for clinical utility. One economic evaluation assessing the costs and benefits of CYP testing for prescribing antidepressants and 28 economic models of schizophrenia were identified; none was suitable for developing a model to examine the cost-effectiveness of CYP testing. Tests for determining genotypes appear to be accurate although not all aspects of analytical validity were reported. Given the absence of convincing evidence from clinical validity studies, the lack of clinical utility and economic studies, and the unsuitability of published schizophrenia models, no model was developed; instead key features and data requirements for economic modelling are presented. Recommendations for future research cover both aspects of research quality and data that will be required to inform the development of future economic models.
Design and Establishment of Quality Model of Fundamental Geographic Information Database
NASA Astrophysics Data System (ADS)
Ma, W.; Zhang, J.; Zhao, Y.; Zhang, P.; Dang, Y.; Zhao, T.
2018-04-01
In order to make the quality evaluation for the Fundamental Geographic Information Databases(FGIDB) more comprehensive, objective and accurate, this paper studies and establishes a quality model of FGIDB, which formed by the standardization of database construction and quality control, the conformity of data set quality and the functionality of database management system, and also designs the overall principles, contents and methods of the quality evaluation for FGIDB, providing the basis and reference for carry out quality control and quality evaluation for FGIDB. This paper designs the quality elements, evaluation items and properties of the Fundamental Geographic Information Database gradually based on the quality model framework. Connected organically, these quality elements and evaluation items constitute the quality model of the Fundamental Geographic Information Database. This model is the foundation for the quality demand stipulation and quality evaluation of the Fundamental Geographic Information Database, and is of great significance on the quality assurance in the design and development stage, the demand formulation in the testing evaluation stage, and the standard system construction for quality evaluation technology of the Fundamental Geographic Information Database.
A spatiotemporal data model for incorporating time in geographic information systems (GEN-STGIS)
NASA Astrophysics Data System (ADS)
Narciso, Flor Eugenia
Temporal Geographic Information Systems (TGIS) is a new technology, which is being developed to work with Geographic Information Systems (GIS) that deal with geographic phenomena that change over time. The capabilities of TGIS depend on the underlying data model. However, a literature review of current spatiotemporal GIS data models has shown that they are not adequate for managing time when representing temporal data. In addition, the majority of these data models have been designed to support the requirements of specific-purpose applications. In an effort to resolve this problem, the related literature has been explored. A comparative investigation of the current spatiotemporal GIS data models has been made to identify their characteristics, advantages and disadvantages, similarities and differences, and to determine why they do not work adequately. A new object-oriented General-purpose Spatiotemporal GIS (GEN-STGIS) data model is proposed here. This model provides better representation, storage and management of data related to geographic phenomena that change over time and overcomes some of the problems detected in the reviewed data models. The proposed data model has four key benefits. First, it provides the capabilities of a standard vector-based GIS embedded in the 2-D Euclidean space. Second, it includes the two temporal dimensions, valid time and transaction time, supported by temporal databases. Third, it inherits, from the object oriented approach, the flexibility, modularity and ability to handle the complexities introduced by spatial and temporal dimensions. Fourth, it improves the geographic query capabilities of current TGIS with the introduction of the concept of bounding box while providing temporal and spatiotemporal query capabilities. The data model is then evaluated in order to assess its strengths and weaknesses as a spatiotemporal GIS data model, and to determine how well the model satisfies the requirements imposed by TGIS applications. The practicality of the data model is demonstrated by the creation of a TGIS example and the partial implementation of the model using the POET Java software for developing the object-oriented database. the object-oriented database.
Optimized volume models of earthquake-triggered landslides
Xu, Chong; Xu, Xiwei; Shen, Lingling; Yao, Qi; Tan, Xibin; Kang, Wenjun; Ma, Siyuan; Wu, Xiyan; Cai, Juntao; Gao, Mingxing; Li, Kang
2016-01-01
In this study, we proposed three optimized models for calculating the total volume of landslides triggered by the 2008 Wenchuan, China Mw 7.9 earthquake. First, we calculated the volume of each deposit of 1,415 landslides triggered by the quake based on pre- and post-quake DEMs in 20 m resolution. The samples were used to fit the conventional landslide “volume-area” power law relationship and the 3 optimized models we proposed, respectively. Two data fitting methods, i.e. log-transformed-based linear and original data-based nonlinear least square, were employed to the 4 models. Results show that original data-based nonlinear least square combining with an optimized model considering length, width, height, lithology, slope, peak ground acceleration, and slope aspect shows the best performance. This model was subsequently applied to the database of landslides triggered by the quake except for two largest ones with known volumes. It indicates that the total volume of the 196,007 landslides is about 1.2 × 1010 m3 in deposit materials and 1 × 1010 m3 in source areas, respectively. The result from the relationship of quake magnitude and entire landslide volume related to individual earthquake is much less than that from this study, which reminds us the necessity to update the power-law relationship. PMID:27404212
Optimized volume models of earthquake-triggered landslides.
Xu, Chong; Xu, Xiwei; Shen, Lingling; Yao, Qi; Tan, Xibin; Kang, Wenjun; Ma, Siyuan; Wu, Xiyan; Cai, Juntao; Gao, Mingxing; Li, Kang
2016-07-12
In this study, we proposed three optimized models for calculating the total volume of landslides triggered by the 2008 Wenchuan, China Mw 7.9 earthquake. First, we calculated the volume of each deposit of 1,415 landslides triggered by the quake based on pre- and post-quake DEMs in 20 m resolution. The samples were used to fit the conventional landslide "volume-area" power law relationship and the 3 optimized models we proposed, respectively. Two data fitting methods, i.e. log-transformed-based linear and original data-based nonlinear least square, were employed to the 4 models. Results show that original data-based nonlinear least square combining with an optimized model considering length, width, height, lithology, slope, peak ground acceleration, and slope aspect shows the best performance. This model was subsequently applied to the database of landslides triggered by the quake except for two largest ones with known volumes. It indicates that the total volume of the 196,007 landslides is about 1.2 × 10(10) m(3) in deposit materials and 1 × 10(10) m(3) in source areas, respectively. The result from the relationship of quake magnitude and entire landslide volume related to individual earthquake is much less than that from this study, which reminds us the necessity to update the power-law relationship.
Baran, Michael C; Moseley, Hunter N B; Sahota, Gurmukh; Montelione, Gaetano T
2002-10-01
Modern protein NMR spectroscopy laboratories have a rapidly growing need for an easily queried local archival system of raw experimental NMR datasets. SPINS (Standardized ProteIn Nmr Storage) is an object-oriented relational database that provides facilities for high-volume NMR data archival, organization of analyses, and dissemination of results to the public domain by automatic preparation of the header files required for submission of data to the BioMagResBank (BMRB). The current version of SPINS coordinates the process from data collection to BMRB deposition of raw NMR data by standardizing and integrating the storage and retrieval of these data in a local laboratory file system. Additional facilities include a data mining query tool, graphical database administration tools, and a NMRStar v2. 1.1 file generator. SPINS also includes a user-friendly internet-based graphical user interface, which is optionally integrated with Varian VNMR NMR data collection software. This paper provides an overview of the data model underlying the SPINS database system, a description of its implementation in Oracle, and an outline of future plans for the SPINS project.
NASA Astrophysics Data System (ADS)
Othmanli, Hussein; Zhao, Chengyi; Stahr, Karl
2017-04-01
The Tarim River Basin is the largest continental basin in China. The region has extremely continental desert climate characterized by little rainfall <50 mm/a and high potential evaporation >3000 mm/a. The climate change is affecting severely the basin causing soil salinization, water shortage, and regression in crop production. Therefore, a Soil and Land Resources Information System (SLISYS-Tarim) for the regional simulation of crop yield production in the basin was developed. The SLISYS-Tarim consists of a database and an agro-ecological simulation model EPIC (Environmental Policy Integrated Climate). The database comprises relational tables including information about soils, terrain conditions, land use, and climate. The soil data implicate information of 50 soil profiles which were dug, analyzed, described and classified in order to characterize the soils in the region. DEM data were integrated with geological maps to build a digital terrain structure. Remote sensing data of Landsat images were applied for soil mapping, and for land use and land cover classification. An additional database for climate data, land management and crop information were linked to the system, too. Construction of the SLISYS-Tarim database was accomplished by integrating and overlaying the recommended thematic maps within environment of the geographic information system (GIS) to meet the data standard of the global and national SOTER digital database. This database forms appropriate input- and output data for the crop modelling with the EPIC model at various scales in the Tarim Basin. The EPIC model was run for simulating cotton production under a constructed scenario characterizing the current management practices, soil properties and climate conditions. For the EPIC model calibration, some parameters were adjusted so that the modeled cotton yield fits to the measured yield on the filed scale. The validation of the modeling results was achieved in a later step based on remote sensing data. The simulated cotton yield varied according to field management, soil type and salinity level, where soil salinity was the main limiting factor. Furthermore, the calibrated and validated EPIC model was run under several scenarios of climate conditions and land management practices to estimate the effect of climate change on cotton production and sustainability of agriculture systems in the basin. The application of SLISYS-Tarim showed that this database can be a suitable framework for storage and retrieval of soil and terrain data at various scales. The simulation with the EPIC model can assess the impact of climate change and management strategies. Therefore, SLISYS-Tarim can be a good tool for regional planning and serve the decision support system on regional and national scale.
Lin, Yun; Melby, Daniel P; Krishnan, Balaji; Adabag, Selcuk; Tholakanahalli, Venkatakrishna; Li, Jian-Ming
2017-08-01
The aim of this study is to investigate the frequency of electrosurgery-related pacemaker malfunction. A retrospective study was conducted to investigate electrosurgery-related pacemaker malfunction in consecutive patients undergoing pulse generator (PG) replacement or upgrade from two large hospitals in Minneapolis, MN between January 2011 and January 2014. The occurrence of this pacemaker malfunction was then studied by using MAUDE database for all four major device vendors. A total of 1398 consecutive patients from 2 large tertiary referral centers in Minneapolis, MN undergoing PG replacement or upgrade surgery were retrospectively studied. Four patients (0.3% of all patients), all with pacemakers from St Jude Medical (2.8%, 4 of 142) had output failure or inappropriately low pacing rate below 30 bpm during electrosurgery, despite being programmed in an asynchronous mode. During the same period, 1174 cases of pacemaker malfunctions were reported on the same models in MAUDE database, 37 of which (3.2%) were electrosurgery-related. Twenty-four cases (65%) had output failure or inappropriate low pacing rate. The distribution of adverse events was loss of pacing (59.5%), reversion to backup pacing (32.4%), inappropriate low pacing rate (5.4%), and ventricular fibrillation (2.7%). The majority of these (78.5%) occurred during PG replacement at ERI or upgrade surgery. No electrosurgery-related malfunction was found in MAUDE database on 862 pacemaker malfunction cases during the same period from other vendors. Electrosurgery during PG replacement or upgrade surgery can trigger output failure or inappropriate low pacing rate in certain models of modern pacemakers. Cautions should be taken for pacemaker-dependent patients.
Relational Databases and Biomedical Big Data.
de Silva, N H Nisansa D
2017-01-01
In various biomedical applications that collect, handle, and manipulate data, the amounts of data tend to build up and venture into the range identified as bigdata. In such occurrences, a design decision has to be taken as to what type of database would be used to handle this data. More often than not, the default and classical solution to this in the biomedical domain according to past research is relational databases. While this used to be the norm for a long while, it is evident that there is a trend to move away from relational databases in favor of other types and paradigms of databases. However, it still has paramount importance to understand the interrelation that exists between biomedical big data and relational databases. This chapter will review the pros and cons of using relational databases to store biomedical big data that previous researches have discussed and used.
MRLC-LAND COVER MAPPING, ACCURACY ASSESSMENT AND APPLICATION RESEARCH
The National Land Cover Database (NLCD), produced by the Multi-Resolution Land Characteristics (MRLC) provides consistently classified land-cover and ancillary data for the United States. These data support many of the modeling and monitoring efforts related to GPRA goals of Cle...
Methods, Knowledge Support, and Experimental Tools for Modeling
2006-10-01
open source software entities: the PostgreSQL relational database management system (http://www.postgres.org), the Apache web server (http...past. The revision control system allows the program to capture disagreements, and allows users to explore the history of such disagreements by
The Role of Religiosity on Postsecondary Degree Attainment
ERIC Educational Resources Information Center
Lee, Sang Min; Puig, Ana; Clark, Mary Ann
2007-01-01
Although previous researchers found that several individual, family, and school characteristics influenced adolescents' academic performance, religion related factors have not typically been considered for models of bachelor's degree attainment. Using longitudinal data in a national database, the authors examined the relationship between high…
NASA Technical Reports Server (NTRS)
Campbell, William J.
1985-01-01
Intelligent data management is the concept of interfacing a user to a database management system with a value added service that will allow a full range of data management operations at a high level of abstraction using human written language. The development of such a system will be based on expert systems and related artificial intelligence technologies, and will allow the capturing of procedural and relational knowledge about data management operations and the support of a user with such knowledge in an on-line, interactive manner. Such a system will have the following capabilities: (1) the ability to construct a model of the users view of the database, based on the query syntax; (2) the ability to transform English queries and commands into database instructions and processes; (3) the ability to use heuristic knowledge to rapidly prune the data space in search processes; and (4) the ability to use an on-line explanation system to allow the user to understand what the system is doing and why it is doing it. Additional information is given in outline form.
Geospatial Database for Strata Objects Based on Land Administration Domain Model (ladm)
NASA Astrophysics Data System (ADS)
Nasorudin, N. N.; Hassan, M. I.; Zulkifli, N. A.; Rahman, A. Abdul
2016-09-01
Recently in our country, the construction of buildings become more complex and it seems that strata objects database becomes more important in registering the real world as people now own and use multilevel of spaces. Furthermore, strata title was increasingly important and need to be well-managed. LADM is a standard model for land administration and it allows integrated 2D and 3D representation of spatial units. LADM also known as ISO 19152. The aim of this paper is to develop a strata objects database using LADM. This paper discusses the current 2D geospatial database and needs for 3D geospatial database in future. This paper also attempts to develop a strata objects database using a standard data model (LADM) and to analyze the developed strata objects database using LADM data model. The current cadastre system in Malaysia includes the strata title is discussed in this paper. The problems in the 2D geospatial database were listed and the needs for 3D geospatial database in future also is discussed. The processes to design a strata objects database are conceptual, logical and physical database design. The strata objects database will allow us to find the information on both non-spatial and spatial strata title information thus shows the location of the strata unit. This development of strata objects database may help to handle the strata title and information.
A Relational Algebra Query Language for Programming Relational Databases
ERIC Educational Resources Information Center
McMaster, Kirby; Sambasivam, Samuel; Anderson, Nicole
2011-01-01
In this paper, we describe a Relational Algebra Query Language (RAQL) and Relational Algebra Query (RAQ) software product we have developed that allows database instructors to teach relational algebra through programming. Instead of defining query operations using mathematical notation (the approach commonly taken in database textbooks), students…
Hatz, Maximilian H M; Leidl, Reiner; Yates, Nichola A; Stollenwerk, Björn
2014-04-01
Thrombosis inhibitors can be used to treat acute coronary syndromes (ACS). However, there are various alternative treatment strategies, of which some have been compared using health economic decision models. To assess the quality of health economic decision models comparing thrombosis inhibitors in patients with ACS undergoing percutaneous coronary intervention, and to identify areas for quality improvement. The literature databases MEDLINE, EMBASE, EconLit, National Health Service Economic Evaluation Database (NHS EED), Database of Abstracts of Reviews of Effects (DARE) and Health Technology Assessment (HTA). A review of the quality of health economic decision models was conducted by two independent reviewers, using the Philips checklist. Twenty-one relevant studies were identified. Differences were apparent regarding the model type (six decision trees, four Markov models, eight combinations, three undefined models), the model structure (types of events, Markov states) and the incorporation of data (efficacy, cost and utility data). Critical issues were the absence of particular events (e.g. thrombocytopenia, stroke) and questionable usage of utility values within some studies. As we restricted our search to health economic decision models comparing thrombosis inhibitors, interesting aspects related to the quality of studies of adjacent medical areas that compared stents or procedures could have been missed. This review identified areas where recommendations are indicated regarding the quality of future ACS decision models. For example, all critical events and relevant treatment options should be included. Models also need to allow for changing event probabilities to correctly reflect ACS and to incorporate appropriate, age-specific utility values and decrements when conducting cost-utility analyses.
Data Model for Multi Hazard Risk Assessment Spatial Support Decision System
NASA Astrophysics Data System (ADS)
Andrejchenko, Vera; Bakker, Wim; van Westen, Cees
2014-05-01
The goal of the CHANGES Spatial Decision Support System is to support end-users in making decisions related to risk reduction measures for areas at risk from multiple hydro-meteorological hazards. The crucial parts in the design of the system are the user requirements, the data model, the data storage and management, and the relationships between the objects in the system. The implementation of the data model is carried out entirely with an open source database management system with a spatial extension. The web application is implemented using open source geospatial technologies with PostGIS as the database, Python for scripting, and Geoserver and javascript libraries for visualization and the client-side user-interface. The model can handle information from different study areas (currently, study areas from France, Romania, Italia and Poland are considered). Furthermore, the data model handles information about administrative units, projects accessible by different types of users, user-defined hazard types (floods, snow avalanches, debris flows, etc.), hazard intensity maps of different return periods, spatial probability maps, elements at risk maps (buildings, land parcels, linear features etc.), economic and population vulnerability information dependent on the hazard type and the type of the element at risk, in the form of vulnerability curves. The system has an inbuilt database of vulnerability curves, but users can also add their own ones. Included in the model is the management of a combination of different scenarios (e.g. related to climate change, land use change or population change) and alternatives (possible risk-reduction measures), as well as data-structures for saving the calculated economic or population loss or exposure per element at risk, aggregation of the loss and exposure using the administrative unit maps, and finally, producing the risk maps. The risk data can be used for cost-benefit analysis (CBA) and multi-criteria evaluation (SMCE). The data model includes data-structures for CBA and SMCE. The model is at the stage where risk and cost-benefit calculations can be stored but the remaining part is currently under development. Multi-criteria information, user management and the relation of these with the rest of the model is our next step. Having a carefully designed data model plays a crucial role in the development of the whole system for rapid development, keeping the data consistent, and in the end, support the end-user in making good decisions in risk-reduction measures related to multiple natural hazards. This work is part of the EU FP7 Marie Curie ITN "CHANGES"project (www.changes-itn.edu)
Bayesian Calibration of Thermodynamic Databases and the Role of Kinetics
NASA Astrophysics Data System (ADS)
Wolf, A. S.; Ghiorso, M. S.
2017-12-01
Self-consistent thermodynamic databases of geologically relevant materials (like Berman, 1988; Holland and Powell, 1998, Stixrude & Lithgow-Bertelloni 2011) are crucial for simulating geological processes as well as interpreting rock samples from the field. These databases form the backbone of our understanding of how fluids and rocks interact at extreme planetary conditions. Considerable work is involved in their construction from experimental phase reaction data, as they must self-consistently describe the free energy surfaces (including relative offsets) of potentially hundreds of interacting phases. Standard database calibration methods typically utilize either linear programming or least squares regression. While both produce a viable model, they suffer from strong limitations on the training data (which must be filtered by hand), along with general ignorance of many of the sources of experimental uncertainty. We develop a new method for calibrating high P-T thermodynamic databases for use in geologic applications. The model is designed to handle pure solid endmember and free fluid phases and can be extended to include mixed solid solutions and melt phases. This new calibration effort utilizes Bayesian techniques to obtain optimal parameter values together with a full family of statistically acceptable models, summarized by the posterior. Unlike previous efforts, the Bayesian Logistic Uncertain Reaction (BLUR) model directly accounts for both measurement uncertainties and disequilibrium effects, by employing a kinetic reaction model whose parameters are empirically determined from the experiments themselves. Thus, along with the equilibrium free energy surfaces, we also provide rough estimates of the activation energies, entropies, and volumes for each reaction. As a first application, we demonstrate this new method on the three-phase aluminosilicate system, illustrating how it can produce superior estimates of the phase boundaries by incorporating constraints from all available data, while automatically handling variable data quality due to a combination of measurement errors and kinetic effects.
FReD: The Floral Reflectance Database — A Web Portal for Analyses of Flower Colour
Savolainen, Vincent; McOwan, Peter W.; Chittka, Lars
2010-01-01
Background Flower colour is of great importance in various fields relating to floral biology and pollinator behaviour. However, subjective human judgements of flower colour may be inaccurate and are irrelevant to the ecology and vision of the flower's pollinators. For precise, detailed information about the colours of flowers, a full reflectance spectrum for the flower of interest should be used rather than relying on such human assessments. Methodology/Principal Findings The Floral Reflectance Database (FReD) has been developed to make an extensive collection of such data available to researchers. It is freely available at http://www.reflectance.co.uk. The database allows users to download spectral reflectance data for flower species collected from all over the world. These could, for example, be used in modelling interactions between pollinator vision and plant signals, or analyses of flower colours in various habitats. The database contains functions for calculating flower colour loci according to widely-used models of bee colour space, reflectance graphs of the spectra and an option to search for flowers with similar colours in bee colour space. Conclusions/Significance The Floral Reflectance Database is a valuable new tool for researchers interested in the colours of flowers and their association with pollinator colour vision, containing raw spectral reflectance data for a large number of flower species. PMID:21170326
A dynamic clinical dental relational database.
Taylor, D; Naguib, R N G; Boulton, S
2004-09-01
The traditional approach to relational database design is based on the logical organization of data into a number of related normalized tables. One assumption is that the nature and structure of the data is known at the design stage. In the case of designing a relational database to store historical dental epidemiological data from individual clinical surveys, the structure of the data is not known until the data is presented for inclusion into the database. This paper addresses the issues concerned with the theoretical design of a clinical dynamic database capable of adapting the internal table structure to accommodate clinical survey data, and presents a prototype database application capable of processing, displaying, and querying the dental data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Denef, Vincent; Shah, Manesh B; Verberkmoes, Nathan C
The recent surge in microbial genomic sequencing, combined with the development of high-throughput liquid chromatography-mass-spectrometry-based (LC/LC-MS/MS) proteomics, has raised the question of the extent to which genomic information of one strain or environmental sample can be used to profile proteomes of related strains or samples. Even with decreasing sequencing costs, it remains impractical to obtain genomic sequence for every strain or sample analyzed. Here, we evaluate how shotgun proteomics is affected by amino acid divergence between the sample and the genomic database using a probability-based model and a random mutation simulation model constrained by experimental data. To assess the effectsmore » of nonrandom distribution of mutations, we also evaluated identification levels using in silico peptide data from sequenced isolates with average amino acid identities (AAI) varying between 76 and 98%. We compared the predictions to experimental protein identification levels for a sample that was evaluated using a database that included genomic information for the dominant organism and for a closely related variant (95% AAI). The range of models set the boundaries at which half of the proteins in a proteomic experiment can be identified to be 77-92% AAI between orthologs in the sample and database. Consistent with this prediction, experimental data indicated loss of half the identifiable proteins at 90% AAI. Additional analysis indicated a 6.4% reduction of the initial protein coverage per 1% amino acid divergence and total identification loss at 86% AAI. Consequently, shotgun proteomics is capable of cross-strain identifications but avoids most crossspecies false positives.« less
The Danish Testicular Cancer database.
Daugaard, Gedske; Kier, Maria Gry Gundgaard; Bandak, Mikkel; Mortensen, Mette Saksø; Larsson, Heidi; Søgaard, Mette; Toft, Birgitte Groenkaer; Engvad, Birte; Agerbæk, Mads; Holm, Niels Vilstrup; Lauritsen, Jakob
2016-01-01
The nationwide Danish Testicular Cancer database consists of a retrospective research database (DaTeCa database) and a prospective clinical database (Danish Multidisciplinary Cancer Group [DMCG] DaTeCa database). The aim is to improve the quality of care for patients with testicular cancer (TC) in Denmark, that is, by identifying risk factors for relapse, toxicity related to treatment, and focusing on late effects. All Danish male patients with a histologically verified germ cell cancer diagnosis in the Danish Pathology Registry are included in the DaTeCa databases. Data collection has been performed from 1984 to 2007 and from 2013 onward, respectively. The retrospective DaTeCa database contains detailed information with more than 300 variables related to histology, stage, treatment, relapses, pathology, tumor markers, kidney function, lung function, etc. A questionnaire related to late effects has been conducted, which includes questions regarding social relationships, life situation, general health status, family background, diseases, symptoms, use of medication, marital status, psychosocial issues, fertility, and sexuality. TC survivors alive on October 2014 were invited to fill in this questionnaire including 160 validated questions. Collection of questionnaires is still ongoing. A biobank including blood/sputum samples for future genetic analyses has been established. Both samples related to DaTeCa and DMCG DaTeCa database are included. The prospective DMCG DaTeCa database includes variables regarding histology, stage, prognostic group, and treatment. The DMCG DaTeCa database has existed since 2013 and is a young clinical database. It is necessary to extend the data collection in the prospective database in order to answer quality-related questions. Data from the retrospective database will be added to the prospective data. This will result in a large and very comprehensive database for future studies on TC patients.
CicerTransDB 1.0: a resource for expression and functional study of chickpea transcription factors.
Gayali, Saurabh; Acharya, Shankar; Lande, Nilesh Vikram; Pandey, Aarti; Chakraborty, Subhra; Chakraborty, Niranjan
2016-07-29
Transcription factor (TF) databases are major resource for systematic studies of TFs in specific species as well as related family members. Even though there are several publicly available multi-species databases, the information on the amount and diversity of TFs within individual species is fragmented, especially for newly sequenced genomes of non-model species of agricultural significance. We constructed CicerTransDB (Cicer Transcription Factor Database), the first database of its kind, which would provide a centralized putatively complete list of TFs in a food legume, chickpea. CicerTransDB, available at www.cicertransdb.esy.es , is based on chickpea (Cicer arietinum L.) annotation v 1.0. The database is an outcome of genome-wide domain study and manual classification of TF families. This database not only provides information of the gene, but also gene ontology, domain and motif architecture. CicerTransDB v 1.0 comprises information of 1124 genes of chickpea and enables the user to not only search, browse and download sequences but also retrieve sequence features. CicerTransDB also provides several single click interfaces, transconnecting to various other databases to ease further analysis. Several webAPI(s) integrated in the database allow end-users direct access of data. A critical comparison of CicerTransDB with PlantTFDB (Plant Transcription Factor Database) revealed 68 novel TFs in the chickpea genome, hitherto unexplored. Database URL: http://www.cicertransdb.esy.es.
NASA Astrophysics Data System (ADS)
Dornback, M.; Hourigan, T.; Etnoyer, P.; McGuinn, R.; Cross, S. L.
2014-12-01
Research on deep-sea corals has expanded rapidly over the last two decades, as scientists began to realize their value as long-lived structural components of high biodiversity habitats and archives of environmental information. The NOAA Deep Sea Coral Research and Technology Program's National Database for Deep-Sea Corals and Sponges is a comprehensive resource for georeferenced data on these organisms in U.S. waters. The National Database currently includes more than 220,000 deep-sea coral records representing approximately 880 unique species. Database records from museum archives, commercial and scientific bycatch, and from journal publications provide baseline information with relatively coarse spatial resolution dating back as far as 1842. These data are complemented by modern, in-situ submersible observations with high spatial resolution, from surveys conducted by NOAA and NOAA partners. Management of high volumes of modern high-resolution observational data can be challenging. NOAA is working with our data partners to incorporate this occurrence data into the National Database, along with images and associated information related to geoposition, time, biology, taxonomy, environment, provenance, and accuracy. NOAA is also working to link associated datasets collected by our program's research, to properly archive them to the NOAA National Data Centers, to build a robust metadata record, and to establish a standard protocol to simplify the process. Access to the National Database is provided through an online mapping portal. The map displays point based records from the database. Records can be refined by taxon, region, time, and depth. The queries and extent used to view the map can also be used to download subsets of the database. The database, map, and website is already in use by NOAA, regional fishery management councils, and regional ocean planning bodies, but we envision it as a model that can expand to accommodate data on a global scale.
Ang, Darwin N; Behrns, Kevin E
2013-07-01
The emphasis on high-quality care has spawned the development of quality programs, most of which focus on broad outcome measures across a diverse group of providers. Our aim was to investigate the clinical outcomes for a department of surgery with multiple service lines of patient care using a relational database. Mortality, length of stay (LOS), patient safety indicators (PSIs), and hospital-acquired conditions were examined for each service line. Expected values for mortality and LOS were derived from University HealthSystem Consortium regression models, whereas expected values for PSIs were derived from Agency for Healthcare Research and Quality regression models. Overall, 5200 patients were evaluated from the months of January through May of both 2011 (n = 2550) and 2012 (n = 2650). The overall observed-to-expected (O/E) ratio of mortality improved from 1.03 to 0.92. The overall O/E ratio for LOS improved from 0.92 to 0.89. PSIs that predicted mortality included postoperative sepsis (O/E:1.89), postoperative respiratory failure (O/E:1.83), postoperative metabolic derangement (O/E:1.81), and postoperative deep vein thrombosis or pulmonary embolus (O/E:1.8). Mortality and LOS can be improved by using a relational database with outcomes reported to specific service lines. Service line quality can be influenced by distribution of frequent reports, group meetings, and service line-directed interventions.
NASA Astrophysics Data System (ADS)
Cervato, C.; Fils, D.; Bohling, G.; Diver, P.; Greer, D.; Reed, J.; Tang, X.
2006-12-01
The federation of databases is not a new endeavor. Great strides have been made e.g. in the health and astrophysics communities. Reviews of those successes indicate that they have been able to leverage off key cross-community core concepts. In its simplest implementation, a federation of databases with identical base schemas that can be extended to address individual efforts, is relatively easy to accomplish. Efforts of groups like the Open Geospatial Consortium have shown methods to geospatially relate data between different sources. We present here a summary of CHRONOS's (http://www.chronos.org) experience with highly heterogeneous data. Our experience with the federation of very diverse databases shows that the wide variety of encoding options for items like locality, time scale, taxon ID, and other key parameters makes it difficult to effectively join data across them. However, the response to this is not to develop one large, monolithic database, which will suffer growth pains due to social, national, and operational issues, but rather to systematically develop the architecture that will enable cross-resource (database, repository, tool, interface) interaction. CHRONOS has accomplished the major hurdle of federating small IT database efforts with service-oriented and XML-based approaches. The application of easy-to-use procedures that allow groups of all sizes to implement and experiment with searches across various databases and to use externally created tools is vital. We are sharing with the geoinformatics community the difficulties with application frameworks, user authentication, standards compliance, and data storage encountered in setting up web sites and portals for various science initiatives (e.g., ANDRILL, EARTHTIME). The ability to incorporate CHRONOS data, services, and tools into the existing framework of a group is crucial to the development of a model that supports and extends the vitality of the small- to medium-sized research effort that is essential for a vibrant scientific community. This presentation will directly address issues of portal development related to JSR-168 and other portal API's as well as issues related to both federated and local directory-based authentication. The application of service-oriented architecture in connection with ReST-based approaches is vital to facilitate service use by experienced and less experienced information technology groups. Application of these services with XML- based schemas allows for the connection to third party tools such a GIS-based tools and software designed to perform a specific scientific analysis. The connection of all these capabilities into a combined framework based on the standard XHTML Document object model and CSS 2.0 standards used in traditional web development will be demonstrated. CHRONOS also utilizes newer client techniques such as AJAX and cross- domain scripting along with traditional server-side database, application, and web servers. The combination of the various components of this architecture creates an environment based on open and free standards that allows for the discovery, retrieval, and integration of tools and data.
[Scientometrics and bibliometrics of biomedical engineering periodicals and papers].
Zhao, Ping; Xu, Ping; Li, Bingyan; Wang, Zhengrong
2003-09-01
This investigation was made to reveal the current status, research trend and research level of biomedical engineering in Chinese mainland by means of scientometrics and to assess the quality of the four domestic publications by bibliometrics. We identified all articles of four related publications by searching Chinese and foreign databases from 1997 to 2001. All articles collected or cited by these databases were searched and statistically analyzed for finding out the relevant distributions, including databases, years, authors, institutions, subject headings and subheadings. The source of sustentation funds and the related articles were analyzed too. The results showed that two journals were cited by two foreign databases and five Chinese databases simultaneously. The output of Journal of Biomedical Engineering was the highest. Its quantity of original papers cited by EI, CA and the totality of papers sponsored by funds were higher than those of the others, but the quantity and percentage per year of biomedical articles cited by EI were decreased in all. Inland core authors and institutions had come into being in the field of biomedical engineering. Their research topics were mainly concentrated on ten subject headings which included biocompatible materials, computer-assisted signal processing, electrocardiography, computer-assisted image processing, biomechanics, algorithms, electroencephalography, automatic data processing, mechanical stress, hemodynamics, mathematical computing, microcomputers, theoretical models, etc. The main subheadings were concentrated on instrumentation, physiopathology, diagnosis, therapy, ultrasonography, physiology, analysis, surgery, pathology, method, etc.
NASA Astrophysics Data System (ADS)
Gwiazda, A.; Banas, W.; Sekala, A.; Foit, K.; Hryniewicz, P.; Kost, G.
2015-11-01
Process of workcell designing is limited by different constructional requirements. They are related to technological parameters of manufactured element, to specifications of purchased elements of a workcell and to technical characteristics of a workcell scene. This shows the complexity of the design-constructional process itself. The results of such approach are individually designed workcell suitable to the specific location and specific production cycle. Changing this parameters one must rebuild the whole configuration of a workcell. Taking into consideration this it is important to elaborate the base of typical elements of a robot kinematic chain that could be used as the tool for building Virtual modelling of kinematic chains of industrial robots requires several preparatory phase. Firstly, it is important to create a database element, which will be models of industrial robot arms. These models could be described as functional primitives that represent elements between components of the kinematic pairs and structural members of industrial robots. A database with following elements is created: the base kinematic pairs, the base robot structural elements, the base of the robot work scenes. The first of these databases includes kinematic pairs being the key component of the manipulator actuator modules. Accordingly, as mentioned previously, it includes the first stage rotary pair of fifth stage. This type of kinematic pairs was chosen due to the fact that it occurs most frequently in the structures of industrial robots. Second base consists of structural robot elements therefore it allows for the conversion of schematic structures of kinematic chains in the structural elements of the arm of industrial robots. It contains, inter alia, the structural elements such as base, stiff members - simple or angular units. They allow converting recorded schematic three-dimensional elements. Last database is a database of scenes. It includes elements of both simple and complex: simple models of technological equipment, conveyors models, models of the obstacles and like that. Using these elements it could be formed various production spaces (robotized workcells), in which it is possible to virtually track the operation of an industrial robot arm modelled in the system.
BioModels Database: a repository of mathematical models of biological processes.
Chelliah, Vijayalakshmi; Laibe, Camille; Le Novère, Nicolas
2013-01-01
BioModels Database is a public online resource that allows storing and sharing of published, peer-reviewed quantitative, dynamic models of biological processes. The model components and behaviour are thoroughly checked to correspond the original publication and manually curated to ensure reliability. Furthermore, the model elements are annotated with terms from controlled vocabularies as well as linked to relevant external data resources. This greatly helps in model interpretation and reuse. Models are stored in SBML format, accepted in SBML and CellML formats, and are available for download in various other common formats such as BioPAX, Octave, SciLab, VCML, XPP and PDF, in addition to SBML. The reaction network diagram of the models is also available in several formats. BioModels Database features a search engine, which provides simple and more advanced searches. Features such as online simulation and creation of smaller models (submodels) from the selected model elements of a larger one are provided. BioModels Database can be accessed both via a web interface and programmatically via web services. New models are available in BioModels Database at regular releases, about every 4 months.
Geographic information system/watershed model interface
Fisher, Gary T.
1989-01-01
Geographic information systems allow for the interactive analysis of spatial data related to water-resources investigations. A conceptual design for an interface between a geographic information system and a watershed model includes functions for the estimation of model parameter values. Design criteria include ease of use, minimal equipment requirements, a generic data-base management system, and use of a macro language. An application is demonstrated for a 90.1-square-kilometer subbasin of the Patuxent River near Unity, Maryland, that performs automated derivation of watershed parameters for hydrologic modeling.
NASA Astrophysics Data System (ADS)
Olszewski, R.; Pillich-Kolipińska, A.; Fiedukowicz, A.
2013-12-01
Implementation of INSPIRE Directive in Poland requires not only legal transposition but also development of a number of technological solutions. The one of such tasks, associated with creation of Spatial Information Infrastructure in Poland, is developing a complex model of georeference database. Significant funding for GBDOT project enables development of the national basic topographical database as a multiresolution database (MRDB). Effective implementation of this type of database requires developing procedures for generalization of geographic information (generalization of digital landscape model - DLM), which, treating TOPO10 component as the only source for creation of TOPO250 component, will allow keeping conceptual and classification consistency between those database elements. To carry out this task, the implementation of the system's concept (prepared previously for Head Office of Geodesy and Cartography) is required. Such system is going to execute the generalization process using constrained-based modeling and allows to keep topological relationships between the objects as well as between the object classes. Full implementation of the designed generalization system requires running comprehensive tests which would help with its calibration and parameterization of the generalization procedures (related to the character of generalized area). Parameterization of this process will allow determining the criteria of specific objects selection, simplification algorithms as well as the operation order. Tests with the usage of differentiated, related to the character of the area, generalization process parameters become nowadays the priority issue. Parameters are delivered to the system in the form of XML files, which, with the help of dedicated tool, are generated from the spreadsheet files (XLS) filled in by user. Using XLS file makes entering and modifying the parameters easier. Among the other elements defined by the external parametric files there are: criteria of object selection, metric parameters of generalization algorithms (e.g. simplification or aggregation) and the operations' sequence. Testing on the trial areas of diverse character will allow developing the rules of generalization process' realization, its parameterization with the proposed tool within the multiresolution reference database. The authors have attempted to develop a generalization process' parameterization for a number of different trial areas. The generalization of the results will contribute to the development of a holistic system of generalized reference data stored in the national geodetic and cartographic resources.
Linking Data Access to Data Models to Applications: The Estuary Data Mapper
The U.S. Environmental Protection Agency (US EPA) is developing e-Estuary, a decision-support system for coastal management. E-Estuary has three elements: an estuarine geo-referenced relational database, watershed GIS coverages, and tools to support decision-making. To facilita...
Mashup of Geo and Space Science Data Provided via Relational Databases in the Semantic Web
NASA Astrophysics Data System (ADS)
Ritschel, B.; Seelus, C.; Neher, G.; Iyemori, T.; Koyama, Y.; Yatagai, A. I.; Murayama, Y.; King, T. A.; Hughes, J. S.; Fung, S. F.; Galkin, I. A.; Hapgood, M. A.; Belehaki, A.
2014-12-01
The use of RDBMS for the storage and management of geo and space science data and/or metadata is very common. Although the information stored in tables is based on a data model and therefore well organized and structured, a direct mashup with RDF based data stored in triple stores is not possible. One solution of the problem consists in the transformation of the whole content into RDF structures and storage in triple stores. Another interesting way is the use of a specific system/service, such as e.g. D2RQ, for the access to relational database content as virtual, read only RDF graphs. The Semantic Web based -proof of concept- GFZ ISDC uses the triple store Virtuoso for the storage of general context information/metadata to geo and space science satellite and ground station data. There is information about projects, platforms, instruments, persons, product types, etc. available but no detailed metadata about the data granuals itself. Such important information, as e.g. start or end time or the detailed spatial coverage of a single measurement is stored in RDBMS tables of the ISDC catalog system only. In order to provide a seamless access to all available information about the granuals/data products a mashup of the different data resources (triple store and RDBMS) is necessary. This paper describes the use of D2RQ for a Semantic Web/SPARQL based mashup of relational databases used for ISDC data server but also for the access to IUGONET and/or ESPAS and further geo and space science data resources. RDBMS Relational Database Management System RDF Resource Description Framework SPARQL SPARQL Protocol And RDF Query Language D2RQ Accessing Relational Databases as Virtual RDF Graphs GFZ ISDC German Research Centre for Geosciences Information System and Data Center IUGONET Inter-university Upper Atmosphere Global Observation Network (Japanese project) ESPAS Near earth space data infrastructure for e-science (European Union funded project)
Computer-aided auditing of prescription drug claims.
Iyengar, Vijay S; Hermiz, Keith B; Natarajan, Ramesh
2014-09-01
We describe a methodology for identifying and ranking candidate audit targets from a database of prescription drug claims. The relevant audit targets may include various entities such as prescribers, patients and pharmacies, who exhibit certain statistical behavior indicative of potential fraud and abuse over the prescription claims during a specified period of interest. Our overall approach is consistent with related work in statistical methods for detection of fraud and abuse, but has a relative emphasis on three specific aspects: first, based on the assessment of domain experts, certain focus areas are selected and data elements pertinent to the audit analysis in each focus area are identified; second, specialized statistical models are developed to characterize the normalized baseline behavior in each focus area; and third, statistical hypothesis testing is used to identify entities that diverge significantly from their expected behavior according to the relevant baseline model. The application of this overall methodology to a prescription claims database from a large health plan is considered in detail.
Health and productivity as a business strategy.
Loeppke, Ronald; Taitel, Michael; Richling, Dennis; Parry, Thomas; Kessler, Ronald C; Hymel, Pam; Konicki, Doris
2007-07-01
The objective of this study is to assess the magnitude of health-related lost productivity relative to medical and pharmacy costs for four employers and assess the business implications of a "full-cost" approach to managing health. A database was developed by integrating medical and pharmacy claims data with employee self-report productivity and health information collected through the Health and Work Performance Questionnaire (HPQ). Information collected on employer business measures were combined with this database to model health-related lost productivity. 1) Health-related productivity costs were more than four times greater than medical and pharmacy costs. 2) The full cost of poor health is driven by different health conditions than those driving medical and pharmacy costs alone. This study demonstrates that Integrated Population Health & Productivity Management should be built on a foundation of Integrated Population Health & Productivity Measurement. Therefore, employers would reveal a blueprint for action for their integrated health and productivity enhancement strategies by measuring the full health and productivity costs related to the burdens of illness and health risk in their population.
Wauchope, R Don; Ahuja, Lajpat R; Arnold, Jeffrey G; Bingner, Ron; Lowrance, Richard; van Genuchten, Martinus T; Adams, Larry D
2003-01-01
We present an overview of USDA Agricultural Research Service (ARS) computer models and databases related to pest-management science, emphasizing current developments in environmental risk assessment and management simulation models. The ARS has a unique national interdisciplinary team of researchers in surface and sub-surface hydrology, soil and plant science, systems analysis and pesticide science, who have networked to develop empirical and mechanistic computer models describing the behavior of pests, pest responses to controls and the environmental impact of pest-control methods. Historically, much of this work has been in support of production agriculture and in support of the conservation programs of our 'action agency' sister, the Natural Resources Conservation Service (formerly the Soil Conservation Service). Because we are a public agency, our software/database products are generally offered without cost, unless they are developed in cooperation with a private-sector cooperator. Because ARS is a basic and applied research organization, with development of new science as our highest priority, these products tend to be offered on an 'as-is' basis with limited user support except for cooperating R&D relationship with other scientists. However, rapid changes in the technology for information analysis and communication continually challenge our way of doing business.
A spatial database for landslides in northern Bavaria: A methodological approach
NASA Astrophysics Data System (ADS)
Jäger, Daniel; Kreuzer, Thomas; Wilde, Martina; Bemm, Stefan; Terhorst, Birgit
2018-04-01
Landslide databases provide essential information for hazard modeling, damages on buildings and infrastructure, mitigation, and research needs. This study presents the development of a landslide database system named WISL (Würzburg Information System on Landslides), currently storing detailed landslide data for northern Bavaria, Germany, in order to enable scientific queries as well as comparisons with other regional landslide inventories. WISL is based on free open source software solutions (PostgreSQL, PostGIS) assuring good correspondence of the various softwares and to enable further extensions with specific adaptions of self-developed software. Apart from that, WISL was designed to be particularly compatible for easy communication with other databases. As a central pre-requisite for standardized, homogeneous data acquisition in the field, a customized data sheet for landslide description was compiled. This sheet also serves as an input mask for all data registration procedures in WISL. A variety of "in-database" solutions for landslide analysis provides the necessary scalability for the database, enabling operations at the local server. In its current state, WISL already enables extensive analysis and queries. This paper presents an example analysis of landslides in Oxfordian Limestones in the northeastern Franconian Alb, northern Bavaria. The results reveal widely differing landslides in terms of geometry and size. Further queries related to landslide activity classifies the majority of the landslides as currently inactive, however, they clearly possess a certain potential for remobilization. Along with some active mass movements, a significant percentage of landslides potentially endangers residential areas or infrastructure. The main aspect of future enhancements of the WISL database is related to data extensions in order to increase research possibilities, as well as to transfer the system to other regions and countries.
Hospital nurse staffing models and patient and staff-related outcomes.
Butler, Michelle; Collins, Rita; Drennan, Jonathan; Halligan, Phil; O'Mathúna, Dónal P; Schultz, Timothy J; Sheridan, Ann; Vilis, Eileen
2011-07-06
Nurse staffing interventions have been introduced across countries in recent years in response to changing patient requirements, developments in patient care, and shortages of qualified nursing staff. These include changes in skill mix, grade mix or qualification mix, staffing levels, nursing shifts or nurses' work patterns. Nurse staffing has been closely linked to patient outcomes, organisational outcomes such as costs, and staff-related outcomes. Our aim was to explore the effect of hospital nurse staffing models on patient and staff-related outcomes. We searched the following databases from inception through to May 2009: Cochrane/EPOC resources (DARE, CENTRAL, the EPOC Specialised Register), PubMed, EMBASE, CINAHL Plus, CAB Health, Virginia Henderson International Nursing Library, the Joanna Briggs Institute database, the British Library, international theses databases, as well as generic search engines. Randomised control trials, controlled clinical trials, controlled before and after studies and interrupted time series analyses of interventions relating to hospital nurse staffing models. Participants were patients and nursing staff working in hospital settings. We included any objective measure of patient or staff-related outcome. Seven reviewers working in pairs independently extracted data from each potentially relevant study and assessed risk of bias. We identified 6,202 studies that were potentially relevant to our review. Following detailed examination of each study, we included 15 studies in the review. Despite the number of studies conducted on this topic, the quality of evidence overall was very limited. We found no evidence that the addition of specialist nurses to nursing staff reduces patient death rates, attendance at the emergency department, or readmission rates, but it is likely to result in shorter patient hospital stays, and reductions in pressure ulcers. The evidence in relation to the impact of replacing Registered Nurses with unqualified nursing assistants on patient outcomes is very limited. However, it is suggested that specialist support staff, such as dietary assistants, may have an important impact on patient outcomes. Self-scheduling and primary nursing may reduce staff turnover. The introduction of team midwifery (versus standard care) may reduce medical procedures in labour and result in a shorter length of stay without compromising maternal or perinatal safety. We found no eligible studies of educational interventions, grade mix interventions, or staffing levels and therefore we are unable to draw conclusions in relation to these interventions. The findings suggest interventions relating to hospital nurse staffing models may improve some patient outcomes, particularly the addition of specialist nursing and specialist support roles to the nursing workforce. Interventions relating to hospital nurse staffing models may also improve staff-related outcomes, particularly the introduction of primary nursing and self-scheduling. However, these findings should be treated with extreme caution due to the limited evidence available from the research conducted to date.
StarView: The object oriented design of the ST DADS user interface
NASA Technical Reports Server (NTRS)
Williams, J. D.; Pollizzi, J. A.
1992-01-01
StarView is the user interface being developed for the Hubble Space Telescope Data Archive and Distribution Service (ST DADS). ST DADS is the data archive for HST observations and a relational database catalog describing the archived data. Users will use StarView to query the catalog and select appropriate datasets for study. StarView sends requests for archived datasets to ST DADS which processes the requests and returns the database to the user. StarView is designed to be a powerful and extensible user interface. Unique features include an internal relational database to navigate query results, a form definition language that will work with both CRT and X interfaces, a data definition language that will allow StarView to work with any relational database, and the ability to generate adhoc queries without requiring the user to understand the structure of the ST DADS catalog. Ultimately, StarView will allow the user to refine queries in the local database for improved performance and merge in data from external sources for correlation with other query results. The user will be able to create a query from single or multiple forms, merging the selected attributes into a single query. Arbitrary selection of attributes for querying is supported. The user will be able to select how query results are viewed. A standard form or table-row format may be used. Navigation capabilities are provided to aid the user in viewing query results. Object oriented analysis and design techniques were used in the design of StarView to support the mechanisms and concepts required to implement these features. One such mechanism is the Model-View-Controller (MVC) paradigm. The MVC allows the user to have multiple views of the underlying database, while providing a consistent mechanism for interaction regardless of the view. This approach supports both CRT and X interfaces while providing a common mode of user interaction. Another powerful abstraction is the concept of a Query Model. This concept allows a single query to be built form a single or multiple forms before it is submitted to ST DADS. Supporting this concept is the adhoc query generator which allows the user to select and qualify an indeterminate number attributes from the database. The user does not need any knowledge of how the joins across various tables are to be resolved. The adhoc generator calculates the joins automatically and generates the correct SQL query.
A GIS-Enabled, Michigan-Specific, Hierarchical Groundwater Modeling and Visualization System
NASA Astrophysics Data System (ADS)
Liu, Q.; Li, S.; Mandle, R.; Simard, A.; Fisher, B.; Brown, E.; Ross, S.
2005-12-01
Efficient management of groundwater resources relies on a comprehensive database that represents the characteristics of the natural groundwater system as well as analysis and modeling tools to describe the impacts of decision alternatives. Many agencies in Michigan have spent several years compiling expensive and comprehensive surface water and groundwater inventories and other related spatial data that describe their respective areas of responsibility. However, most often this wealth of descriptive data has only been utilized for basic mapping purposes. The benefits from analyzing these data, using GIS analysis functions or externally developed analysis models or programs, has yet to be systematically realized. In this talk, we present a comprehensive software environment that allows Michigan groundwater resources managers and frontline professionals to make more effective use of the available data and improve their ability to manage and protect groundwater resources, address potential conflicts, design cleanup schemes, and prioritize investigation activities. In particular, we take advantage of the Interactive Ground Water (IGW) modeling system and convert it to a customized software environment specifically for analyzing, modeling, and visualizing the Michigan statewide groundwater database. The resulting Michigan IGW modeling system (IGW-M) is completely window-based, fully interactive, and seamlessly integrated with a GIS mapping engine. The system operates in real-time (on the fly) providing dynamic, hierarchical mapping, modeling, spatial analysis, and visualization. Specifically, IGW-M allows water resources and environmental professionals in Michigan to: * Access and utilize the extensive data from the statewide groundwater database, interactively manipulate GIS objects, and display and query the associated data and attributes; * Analyze and model the statewide groundwater database, interactively convert GIS objects into numerical model features, automatically extract data and attributes, and simulate unsteady groundwater flow and contaminant transport in response to water and land management decisions; * Visualize and map model simulations and predictions with data from the statewide groundwater database in a seamless interactive environment. IGW-M has the potential to significantly improve the productivity of Michigan groundwater management investigations. It changes the role of engineers and scientists in modeling and analyzing the statewide groundwater database from heavily physical to cognitive problem-solving and decision-making tasks. The seamless real-time integration, real-time visual interaction, and real-time processing capability allows a user to focus on critical management issues, conflicts, and constraints, to quickly and iteratively examine conceptual approximations, management and planning scenarios, and site characterization assumptions, to identify dominant processes, to evaluate data worth and sensitivity, and to guide further data-collection activities. We illustrate the power and effectiveness of the M-IGW modeling and visualization system with a real case study and a real-time, live demonstration.
NASA Astrophysics Data System (ADS)
Liu, G.; Wu, C.; Li, X.; Song, P.
2013-12-01
The 3D urban geological information system has been a major part of the national urban geological survey project of China Geological Survey in recent years. Large amount of multi-source and multi-subject data are to be stored in the urban geological databases. There are various models and vocabularies drafted and applied by industrial companies in urban geological data. The issues such as duplicate and ambiguous definition of terms and different coding structure increase the difficulty of information sharing and data integration. To solve this problem, we proposed a national standard-driven information classification and coding method to effectively store and integrate urban geological data, and we applied the data dictionary technology to achieve structural and standard data storage. The overall purpose of this work is to set up a common data platform to provide information sharing service. Research progresses are as follows: (1) A unified classification and coding method for multi-source data based on national standards. Underlying national standards include GB 9649-88 for geology and GB/T 13923-2006 for geography. Current industrial models are compared with national standards to build a mapping table. The attributes of various urban geological data entity models are reduced to several categories according to their application phases and domains. Then a logical data model is set up as a standard format to design data file structures for a relational database. (2) A multi-level data dictionary for data standardization constraint. Three levels of data dictionary are designed: model data dictionary is used to manage system database files and enhance maintenance of the whole database system; attribute dictionary organizes fields used in database tables; term and code dictionary is applied to provide a standard for urban information system by adopting appropriate classification and coding methods; comprehensive data dictionary manages system operation and security. (3) An extension to system data management function based on data dictionary. Data item constraint input function is making use of the standard term and code dictionary to get standard input result. Attribute dictionary organizes all the fields of an urban geological information database to ensure the consistency of term use for fields. Model dictionary is used to generate a database operation interface automatically with standard semantic content via term and code dictionary. The above method and technology have been applied to the construction of Fuzhou Urban Geological Information System, South-East China with satisfactory results.
Ground Motion Prediction Model Using Artificial Neural Network
NASA Astrophysics Data System (ADS)
Dhanya, J.; Raghukanth, S. T. G.
2018-03-01
This article focuses on developing a ground motion prediction equation based on artificial neural network (ANN) technique for shallow crustal earthquakes. A hybrid technique combining genetic algorithm and Levenberg-Marquardt technique is used for training the model. The present model is developed to predict peak ground velocity, and 5% damped spectral acceleration. The input parameters for the prediction are moment magnitude ( M w), closest distance to rupture plane ( R rup), shear wave velocity in the region ( V s30) and focal mechanism ( F). A total of 13,552 ground motion records from 288 earthquakes provided by the updated NGA-West2 database released by Pacific Engineering Research Center are utilized to develop the model. The ANN architecture considered for the model consists of 192 unknowns including weights and biases of all the interconnected nodes. The performance of the model is observed to be within the prescribed error limits. In addition, the results from the study are found to be comparable with the existing relations in the global database. The developed model is further demonstrated by estimating site-specific response spectra for Shimla city located in Himalayan region.
Using SQL Databases for Sequence Similarity Searching and Analysis.
Pearson, William R; Mackey, Aaron J
2017-09-13
Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.
2014-04-25
EA’s Java application programming interface (API), the team built a tool called OWL2EA that can ingest an OWL file and generate the corresponding UML...ObjectItemStructure specification shown in Figure 10. Running this script in the relational database server MySQL creates the physical schema that
Web application and database modeling of traffic impact analysis using Google Maps
NASA Astrophysics Data System (ADS)
Yulianto, Budi; Setiono
2017-06-01
Traffic impact analysis (TIA) is a traffic study that aims at identifying the impact of traffic generated by development or change in land use. In addition to identifying the traffic impact, TIA is also equipped with mitigation measurement to minimize the arising traffic impact. TIA has been increasingly important since it was defined in the act as one of the requirements in the proposal of Building Permit. The act encourages a number of TIA studies in various cities in Indonesia, including Surakarta. For that reason, it is necessary to study the development of TIA by adopting the concept Transportation Impact Control (TIC) in the implementation of the TIA standard document and multimodal modeling. It includes TIA's standardization for technical guidelines, database and inspection by providing TIA checklists, monitoring and evaluation. The research was undertaken by collecting the historical data of junctions, modeling of the data in the form of relational database, building a user interface for CRUD (Create, Read, Update and Delete) the TIA data in the form of web programming with Google Maps libraries. The result research is a system that provides information that helps the improvement and repairment of TIA documents that exist today which is more transparent, reliable and credible.
Dynamic publication model for neurophysiology databases.
Gardner, D; Abato, M; Knuth, K H; DeBellis, R; Erde, S M
2001-08-29
We have implemented a pair of database projects, one serving cortical electrophysiology and the other invertebrate neurones and recordings. The design for each combines aspects of two proven schemes for information interchange. The journal article metaphor determined the type, scope, organization and quantity of data to comprise each submission. Sequence databases encouraged intuitive tools for data viewing, capture, and direct submission by authors. Neurophysiology required transcending these models with new datatypes. Time-series, histogram and bivariate datatypes, including illustration-like wrappers, were selected by their utility to the community of investigators. As interpretation of neurophysiological recordings depends on context supplied by metadata attributes, searches are via visual interfaces to sets of controlled-vocabulary metadata trees. Neurones, for example, can be specified by metadata describing functional and anatomical characteristics. Permanence is advanced by data model and data formats largely independent of contemporary technology or implementation, including Java and the XML standard. All user tools, including dynamic data viewers that serve as a virtual oscilloscope, are Java-based, free, multiplatform, and distributed by our application servers to any contemporary networked computer. Copyright is retained by submitters; viewer displays are dynamic and do not violate copyright of related journal figures. Panels of neurophysiologists view and test schemas and tools, enhancing community support.
2010-01-01
Background Quantitative models of biochemical and cellular systems are used to answer a variety of questions in the biological sciences. The number of published quantitative models is growing steadily thanks to increasing interest in the use of models as well as the development of improved software systems and the availability of better, cheaper computer hardware. To maximise the benefits of this growing body of models, the field needs centralised model repositories that will encourage, facilitate and promote model dissemination and reuse. Ideally, the models stored in these repositories should be extensively tested and encoded in community-supported and standardised formats. In addition, the models and their components should be cross-referenced with other resources in order to allow their unambiguous identification. Description BioModels Database http://www.ebi.ac.uk/biomodels/ is aimed at addressing exactly these needs. It is a freely-accessible online resource for storing, viewing, retrieving, and analysing published, peer-reviewed quantitative models of biochemical and cellular systems. The structure and behaviour of each simulation model distributed by BioModels Database are thoroughly checked; in addition, model elements are annotated with terms from controlled vocabularies as well as linked to relevant data resources. Models can be examined online or downloaded in various formats. Reaction network diagrams generated from the models are also available in several formats. BioModels Database also provides features such as online simulation and the extraction of components from large scale models into smaller submodels. Finally, the system provides a range of web services that external software systems can use to access up-to-date data from the database. Conclusions BioModels Database has become a recognised reference resource for systems biology. It is being used by the community in a variety of ways; for example, it is used to benchmark different simulation systems, and to study the clustering of models based upon their annotations. Model deposition to the database today is advised by several publishers of scientific journals. The models in BioModels Database are freely distributed and reusable; the underlying software infrastructure is also available from SourceForge https://sourceforge.net/projects/biomodels/ under the GNU General Public License. PMID:20587024
Ventilator-Related Adverse Events: A Taxonomy and Findings From 3 Incident Reporting Systems.
Pham, Julius Cuong; Williams, Tamara L; Sparnon, Erin M; Cillie, Tam K; Scharen, Hilda F; Marella, William M
2016-05-01
In 2009, researchers from Johns Hopkins University's Armstrong Institute for Patient Safety and Quality; public agencies, including the FDA; and private partners, including the Emergency Care Research Institute and the University HealthSystem Consortium (UHC) Safety Intelligence Patient Safety Organization, sought to form a public-private partnership for the promotion of patient safety (P5S) to advance patient safety through voluntary partnerships. The study objective was to test the concept of the P5S to advance our understanding of safety issues related to ventilator events, to develop a common classification system for categorizing adverse events related to mechanical ventilators, and to perform a comparison of adverse events across different adverse event reporting systems. We performed a cross-sectional analysis of ventilator-related adverse events reported in 2012 from the following incident reporting systems: the Pennsylvania Patient Safety Authority's Patient Safety Reporting System, UHC's Safety Intelligence Patient Safety Organization database, and the FDA's Manufacturer and User Facility Device Experience database. Once each organization had its dataset of ventilator-related adverse events, reviewers read the narrative descriptions of each event and classified it according to the developed common taxonomy. A Pennsylvania Patient Safety Authority, FDA, and UHC search provided 252, 274, and 700 relevant reports, respectively. The 3 event types most commonly reported to the UHC and the Pennsylvania Patient Safety Authority's Patient Safety Reporting System databases were airway/breathing circuit issue, human factor issues, and ventilator malfunction events. The top 3 event types reported to the FDA were ventilator malfunction, power source issue, and alarm failure. Overall, we found that (1) through the development of a common taxonomy, adverse events from 3 reporting systems can be evaluated, (2) the types of events reported in each database were related to the purpose of the database and the source of the reports, resulting in significant differences in reported event categories across the 3 systems, and (3) a public-private collaboration for investigating ventilator-related adverse events under the P5S model is feasible. Copyright © 2016 by Daedalus Enterprises.
A hybrid CNN feature model for pulmonary nodule malignancy risk differentiation.
Wang, Huafeng; Zhao, Tingting; Li, Lihong Connie; Pan, Haixia; Liu, Wanquan; Gao, Haoqi; Han, Fangfang; Wang, Yuehai; Qi, Yifan; Liang, Zhengrong
2018-01-01
The malignancy risk differentiation of pulmonary nodule is one of the most challenge tasks of computer-aided diagnosis (CADx). Most recently reported CADx methods or schemes based on texture and shape estimation have shown relatively satisfactory on differentiating the risk level of malignancy among the nodules detected in lung cancer screening. However, the existing CADx schemes tend to detect and analyze characteristics of pulmonary nodules from a statistical perspective according to local features only. Enlightened by the currently prevailing learning ability of convolutional neural network (CNN), which simulates human neural network for target recognition and our previously research on texture features, we present a hybrid model that takes into consideration of both global and local features for pulmonary nodule differentiation using the largest public database founded by the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI). By comparing three types of CNN models in which two of them were newly proposed by us, we observed that the multi-channel CNN model yielded the best discrimination in capacity of differentiating malignancy risk of the nodules based on the projection of distributions of extracted features. Moreover, CADx scheme using the new multi-channel CNN model outperformed our previously developed CADx scheme using the 3D texture feature analysis method, which increased the computed area under a receiver operating characteristic curve (AUC) from 0.9441 to 0.9702.
OBSIFRAC: database-supported software for 3D modeling of rock mass fragmentation
NASA Astrophysics Data System (ADS)
Empereur-Mot, Luc; Villemin, Thierry
2003-03-01
Under stress, fractures in rock masses tend to form fully connected networks. The mass can thus be thought of as a 3D series of blocks produced by fragmentation processes. A numerical model has been developed that uses a relational database to describe such a mass. The model, which assumes the fractures to be plane, allows data from natural networks to test theories concerning fragmentation processes. In the model, blocks are bordered by faces that are composed of edges and vertices. A fracture can originate from a seed point, its orientation being controlled by the stress field specified by an orientation matrix. Alternatively, it can be generated from a discrete set of given orientations and positions. Both kinds of fracture can occur together in a model. From an original simple block, a given fracture produces two simple polyhedral blocks, and the original block becomes compound. Compound and simple blocks created throughout fragmentation are stored in the database. Several fragmentation processes have been studied. In one scenario, a constant proportion of blocks is fragmented at each step of the process. The resulting distribution appears to be fractal, although seed points are random in each fragmented block. In a second scenario, division affects only one random block at each stage of the process, and gives a Weibull volume distribution law. This software can be used for a large number of other applications.
Chao, Edmund Y S; Armiger, Robert S; Yoshida, Hiroaki; Lim, Jonathan; Haraguchi, Naoki
2007-03-08
The ability to combine physiology and engineering analyses with computer sciences has opened the door to the possibility of creating the "Virtual Human" reality. This paper presents a broad foundation for a full-featured biomechanical simulator for the human musculoskeletal system physiology. This simulation technology unites the expertise in biomechanical analysis and graphic modeling to investigate joint and connective tissue mechanics at the structural level and to visualize the results in both static and animated forms together with the model. Adaptable anatomical models including prosthetic implants and fracture fixation devices and a robust computational infrastructure for static, kinematic, kinetic, and stress analyses under varying boundary and loading conditions are incorporated on a common platform, the VIMS (Virtual Interactive Musculoskeletal System). Within this software system, a manageable database containing long bone dimensions, connective tissue material properties and a library of skeletal joint system functional activities and loading conditions are also available and they can easily be modified, updated and expanded. Application software is also available to allow end-users to perform biomechanical analyses interactively. Examples using these models and the computational algorithms in a virtual laboratory environment are used to demonstrate the utility of these unique database and simulation technology. This integrated system, model library and database will impact on orthopaedic education, basic research, device development and application, and clinical patient care related to musculoskeletal joint system reconstruction, trauma management, and rehabilitation.
Chao, Edmund YS; Armiger, Robert S; Yoshida, Hiroaki; Lim, Jonathan; Haraguchi, Naoki
2007-01-01
The ability to combine physiology and engineering analyses with computer sciences has opened the door to the possibility of creating the "Virtual Human" reality. This paper presents a broad foundation for a full-featured biomechanical simulator for the human musculoskeletal system physiology. This simulation technology unites the expertise in biomechanical analysis and graphic modeling to investigate joint and connective tissue mechanics at the structural level and to visualize the results in both static and animated forms together with the model. Adaptable anatomical models including prosthetic implants and fracture fixation devices and a robust computational infrastructure for static, kinematic, kinetic, and stress analyses under varying boundary and loading conditions are incorporated on a common platform, the VIMS (Virtual Interactive Musculoskeletal System). Within this software system, a manageable database containing long bone dimensions, connective tissue material properties and a library of skeletal joint system functional activities and loading conditions are also available and they can easily be modified, updated and expanded. Application software is also available to allow end-users to perform biomechanical analyses interactively. Examples using these models and the computational algorithms in a virtual laboratory environment are used to demonstrate the utility of these unique database and simulation technology. This integrated system, model library and database will impact on orthopaedic education, basic research, device development and application, and clinical patient care related to musculoskeletal joint system reconstruction, trauma management, and rehabilitation. PMID:17343764
Chess databases as a research vehicle in psychology: Modeling large data.
Vaci, Nemanja; Bilalić, Merim
2017-08-01
The game of chess has often been used for psychological investigations, particularly in cognitive science. The clear-cut rules and well-defined environment of chess provide a model for investigations of basic cognitive processes, such as perception, memory, and problem solving, while the precise rating system for the measurement of skill has enabled investigations of individual differences and expertise-related effects. In the present study, we focus on another appealing feature of chess-namely, the large archive databases associated with the game. The German national chess database presented in this study represents a fruitful ground for the investigation of multiple longitudinal research questions, since it collects the data of over 130,000 players and spans over 25 years. The German chess database collects the data of all players, including hobby players, and all tournaments played. This results in a rich and complete collection of the skill, age, and activity of the whole population of chess players in Germany. The database therefore complements the commonly used expertise approach in cognitive science by opening up new possibilities for the investigation of multiple factors that underlie expertise and skill acquisition. Since large datasets are not common in psychology, their introduction also raises the question of optimal and efficient statistical analysis. We offer the database for download and illustrate how it can be used by providing concrete examples and a step-by-step tutorial using different statistical analyses on a range of topics, including skill development over the lifetime, birth cohort effects, effects of activity and inactivity on skill, and gender differences.
Design of a Multi Dimensional Database for the Archimed DataWarehouse.
Bréant, Claudine; Thurler, Gérald; Borst, François; Geissbuhler, Antoine
2005-01-01
The Archimed data warehouse project started in 1993 at the Geneva University Hospital. It has progressively integrated seven data marts (or domains of activity) archiving medical data such as Admission/Discharge/Transfer (ADT) data, laboratory results, radiology exams, diagnoses, and procedure codes. The objective of the Archimed data warehouse is to facilitate the access to an integrated and coherent view of patient medical in order to support analytical activities such as medical statistics, clinical studies, retrieval of similar cases and data mining processes. This paper discusses three principal design aspects relative to the conception of the database of the data warehouse: 1) the granularity of the database, which refers to the level of detail or summarization of data, 2) the database model and architecture, describing how data will be presented to end users and how new data is integrated, 3) the life cycle of the database, in order to ensure long term scalability of the environment. Both, the organization of patient medical data using a standardized elementary fact representation and the use of the multi dimensional model have proved to be powerful design tools to integrate data coming from the multiple heterogeneous database systems part of the transactional Hospital Information System (HIS). Concurrently, the building of the data warehouse in an incremental way has helped to control the evolution of the data content. These three design aspects bring clarity and performance regarding data access. They also provide long term scalability to the system and resilience to further changes that may occur in source systems feeding the data warehouse.
Zhao, Lei; Guo, Yi; Wang, Wei; Yan, Li-juan
2011-08-01
To evaluate the effectiveness of acupuncture as a treatment for neurovascular headache and to analyze the current situation related to acupuncture treatment. PubMed database (1966-2010), EMBASE database (1986-2010), Cochrane Library (Issue 1, 2010), Chinese Biomedical Literature Database (1979-2010), China HowNet Knowledge Database (1979-2010), VIP Journals Database (1989-2010), and Wanfang database (1998-2010) were retrieved. Randomized or quasi-randomized controlled studies were included. The priority was given to high-quality randomized, controlled trials. Statistical outcome indicators were measured using RevMan 5.0.20 software. A total of 16 articles and 1 535 cases were included. Meta-analysis showed a significant difference between the acupuncture therapy and Western medicine therapy [combined RR (random efficacy model)=1.46, 95% CI (1.21, 1.75), Z=3.96, P<0.0001], indicating an obvious superior effect of the acupuncture therapy; significant difference also existed between the comprehensive acupuncture therapy and acupuncture therapy alone [combined RR (fixed efficacy model)=3.35, 95% CI (1.92, 5.82), Z=4.28, P<0.0001], indicating that acupuncture combined with other therapies, such as points injection, scalp acupuncture, auricular acupuncture, etc., were superior to the conventional body acupuncture therapy alone. The inclusion of limited clinical studies had verified the efficacy of acupuncture in the treatment of neurovascular headache. Although acupuncture or its combined therapies provides certain advantages, most clinical studies are of small sample sizes. Large sample size, randomized, controlled trials are needed in the future for more definitive results.
NASA Astrophysics Data System (ADS)
Mandal, Chhabinath; Linthicum, D. Scott
1993-04-01
A modelling algorithm (PROGEN) for the generation of complete protein atomic coordinates from only the α-carbon coordinates is described. PROGEN utilizes an optimal geometry parameter (OGP) database for the positioning of atoms for each amino acid of the polypeptide model. The OGP database was established by examining the statistical correlations between 23 different intra-peptide and inter-peptide geometric parameters relative to the α-carbon distances for each amino acid in a library of 19 known proteins from the Brookhaven Protein Database (BPDB). The OGP files for specific amino acids and peptides were used to generate the atomic positions, with respect to α-carbons, for main-chain and side-chain atoms in the modelled structure. Refinement of the initial model was accomplished using energy minimization (EM) and molecular dynamics techniques. PROGEN was tested using 60 known proteins in the BPDB, representing a wide spectrum of primary and secondary structures. Comparison between PROGEN models and BPDB crystal reference structures gave r.m.s.d. values for peptide main-chain atoms between 0.29 and 0.76 Å, with a grand average of 0.53 Å for all 60 models. The r.m.s.d. for all non-hydrogen atoms ranged between 1.44 and 1.93 Å for the 60 polypeptide models. PROGEN was also able to make the correct assignment of cis- or trans-proline configurations in the protein structures examined. PROGEN offers a fully automatic building and refinement procedure and requires no special or specific structural considerations for the protein to be modelled.
Khan, Aihab; Husain, Syed Afaq
2013-01-01
We put forward a fragile zero watermarking scheme to detect and characterize malicious modifications made to a database relation. Most of the existing watermarking schemes for relational databases introduce intentional errors or permanent distortions as marks into the database original content. These distortions inevitably degrade the data quality and data usability as the integrity of a relational database is violated. Moreover, these fragile schemes can detect malicious data modifications but do not characterize the tempering attack, that is, the nature of tempering. The proposed fragile scheme is based on zero watermarking approach to detect malicious modifications made to a database relation. In zero watermarking, the watermark is generated (constructed) from the contents of the original data rather than introduction of permanent distortions as marks into the data. As a result, the proposed scheme is distortion-free; thus, it also resolves the inherent conflict between security and imperceptibility. The proposed scheme also characterizes the malicious data modifications to quantify the nature of tempering attacks. Experimental results show that even minor malicious modifications made to a database relation can be detected and characterized successfully.
QSAR Modeling Using Large-Scale Databases: Case Study for HIV-1 Reverse Transcriptase Inhibitors.
Tarasova, Olga A; Urusova, Aleksandra F; Filimonov, Dmitry A; Nicklaus, Marc C; Zakharov, Alexey V; Poroikov, Vladimir V
2015-07-27
Large-scale databases are important sources of training sets for various QSAR modeling approaches. Generally, these databases contain information extracted from different sources. This variety of sources can produce inconsistency in the data, defined as sometimes widely diverging activity results for the same compound against the same target. Because such inconsistency can reduce the accuracy of predictive models built from these data, we are addressing the question of how best to use data from publicly and commercially accessible databases to create accurate and predictive QSAR models. We investigate the suitability of commercially and publicly available databases to QSAR modeling of antiviral activity (HIV-1 reverse transcriptase (RT) inhibition). We present several methods for the creation of modeling (i.e., training and test) sets from two, either commercially or freely available, databases: Thomson Reuters Integrity and ChEMBL. We found that the typical predictivities of QSAR models obtained using these different modeling set compilation methods differ significantly from each other. The best results were obtained using training sets compiled for compounds tested using only one method and material (i.e., a specific type of biological assay). Compound sets aggregated by target only typically yielded poorly predictive models. We discuss the possibility of "mix-and-matching" assay data across aggregating databases such as ChEMBL and Integrity and their current severe limitations for this purpose. One of them is the general lack of complete and semantic/computer-parsable descriptions of assay methodology carried by these databases that would allow one to determine mix-and-matchability of result sets at the assay level.
Spatial cyberinfrastructures, ontologies, and the humanities.
Sieber, Renee E; Wellen, Christopher C; Jin, Yuan
2011-04-05
We report on research into building a cyberinfrastructure for Chinese biographical and geographic data. Our cyberinfrastructure contains (i) the McGill-Harvard-Yenching Library Ming Qing Women's Writings database (MQWW), the only online database on historical Chinese women's writings, (ii) the China Biographical Database, the authority for Chinese historical people, and (iii) the China Historical Geographical Information System, one of the first historical geographic information systems. Key to this integration is that linked databases retain separate identities as bases of knowledge, while they possess sufficient semantic interoperability to allow for multidatabase concepts and to support cross-database queries on an ad hoc basis. Computational ontologies create underlying semantics for database access. This paper focuses on the spatial component in a humanities cyberinfrastructure, which includes issues of conflicting data, heterogeneous data models, disambiguation, and geographic scale. First, we describe the methodology for integrating the databases. Then we detail the system architecture, which includes a tier of ontologies and schema. We describe the user interface and applications that allow for cross-database queries. For instance, users should be able to analyze the data, examine hypotheses on spatial and temporal relationships, and generate historical maps with datasets from MQWW for research, teaching, and publication on Chinese women writers, their familial relations, publishing venues, and the literary and social communities. Last, we discuss the social side of cyberinfrastructure development, as people are considered to be as critical as the technical components for its success.
Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice
2015-01-01
The aim of this paper is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and the appropriateness in terms of completeness, precision and methodology. Results show that ELCD electricity datasets have a very good quality in general terms, nevertheless some findings and recommendations in order to improve the quality of Life-Cycle Inventories have been derived. Moreover, these results ensure the quality of the electricity-related datasets to any LCA practitioner, and provide insights related to the limitations and assumptions underlying in the datasets modelling. Giving this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would be also useful for dataset developers and reviewers, in order to improve the overall Data Quality Requirements of databases.
Fisher information framework for time series modeling
NASA Astrophysics Data System (ADS)
Venkatesan, R. C.; Plastino, A.
2017-08-01
A robust prediction model invoking the Takens embedding theorem, whose working hypothesis is obtained via an inference procedure based on the minimum Fisher information principle, is presented. The coefficients of the ansatz, central to the working hypothesis satisfy a time independent Schrödinger-like equation in a vector setting. The inference of (i) the probability density function of the coefficients of the working hypothesis and (ii) the establishing of constraint driven pseudo-inverse condition for the modeling phase of the prediction scheme, is made, for the case of normal distributions, with the aid of the quantum mechanical virial theorem. The well-known reciprocity relations and the associated Legendre transform structure for the Fisher information measure (FIM, hereafter)-based model in a vector setting (with least square constraints) are self-consistently derived. These relations are demonstrated to yield an intriguing form of the FIM for the modeling phase, which defines the working hypothesis, solely in terms of the observed data. Cases for prediction employing time series' obtained from the: (i) the Mackey-Glass delay-differential equation, (ii) one ECG signal from the MIT-Beth Israel Deaconess Hospital (MIT-BIH) cardiac arrhythmia database, and (iii) one ECG signal from the Creighton University ventricular tachyarrhythmia database. The ECG samples were obtained from the Physionet online repository. These examples demonstrate the efficiency of the prediction model. Numerical examples for exemplary cases are provided.
77 FR 38277 - Wind and Water Power Program
Federal Register 2010, 2011, 2012, 2013, 2014
2012-06-27
..., modeling, and database efforts. This meeting will be a technical discussion to provide those involved in... ecological survey, modeling, and database efforts in the waters off the Mid-Atlantic. The workshop aims to... models and compatible Federal and regional databases. It is not the object of this session to obtain any...
Toward designing for trust in database automation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duez, P. P.; Jamieson, G. A.
Appropriate reliance on system automation is imperative for safe and productive work, especially in safety-critical systems. It is unsafe to rely on automation beyond its designed use; conversely, it can be both unproductive and unsafe to manually perform tasks that are better relegated to automated tools. Operator trust in automated tools mediates reliance, and trust appears to affect how operators use technology. As automated agents become more complex, the question of trust in automation is increasingly important. In order to achieve proper use of automation, we must engender an appropriate degree of trust that is sensitive to changes in operatingmore » functions and context. In this paper, we present research concerning trust in automation in the domain of automated tools for relational databases. Lee and See have provided models of trust in automation. One model developed by Lee and See identifies three key categories of information about the automation that lie along a continuum of attributional abstraction. Purpose-, process-and performance-related information serve, both individually and through inferences between them, to describe automation in such a way as to engender r properly-calibrated trust. Thus, one can look at information from different levels of attributional abstraction as a general requirements analysis for information key to appropriate trust in automation. The model of information necessary to engender appropriate trust in automation [1] is a general one. Although it describes categories of information, it does not provide insight on how to determine the specific information elements required for a given automated tool. We have applied the Abstraction Hierarchy (AH) to this problem in the domain of relational databases. The AH serves as a formal description of the automation at several levels of abstraction, ranging from a very abstract purpose-oriented description to a more concrete description of the resources involved in the automated process. The connection between an AH for an automated tool and a list of information elements at the three levels of attributional abstraction is then direct, providing a method for satisfying information requirements for appropriate trust in automation. In this paper, we will present our method for developing specific information requirements for an automated tool, based on a formal analysis of that tool and the models presented by Lee and See. We will show an example of the application of the AH to automation, in the domain of relational database automation, and the resulting set of specific information elements for appropriate trust in the automated tool. Finally, we will comment on the applicability of this approach to the domain of nuclear plant instrumentation. (authors)« less
Integrated Space Asset Management Database and Modeling
NASA Technical Reports Server (NTRS)
MacLeod, Todd; Gagliano, Larry; Percy, Thomas; Mason, Shane
2015-01-01
Effective Space Asset Management is one key to addressing the ever-growing issue of space congestion. It is imperative that agencies around the world have access to data regarding the numerous active assets and pieces of space junk currently tracked in orbit around the Earth. At the center of this issues is the effective management of data of many types related to orbiting objects. As the population of tracked objects grows, so too should the data management structure used to catalog technical specifications, orbital information, and metadata related to those populations. Marshall Space Flight Center's Space Asset Management Database (SAM-D) was implemented in order to effectively catalog a broad set of data related to known objects in space by ingesting information from a variety of database and processing that data into useful technical information. Using the universal NORAD number as a unique identifier, the SAM-D processes two-line element data into orbital characteristics and cross-references this technical data with metadata related to functional status, country of ownership, and application category. The SAM-D began as an Excel spreadsheet and was later upgraded to an Access database. While SAM-D performs its task very well, it is limited by its current platform and is not available outside of the local user base. Further, while modeling and simulation can be powerful tools to exploit the information contained in SAM-D, the current system does not allow proper integration options for combining the data with both legacy and new M&S tools. This paper provides a summary of SAM-D development efforts to date and outlines a proposed data management infrastructure that extends SAM-D to support the larger data sets to be generated. A service-oriented architecture model using an information sharing platform named SIMON will allow it to easily expand to incorporate new capabilities, including advanced analytics, M&S tools, fusion techniques and user interface for visualizations. In addition, tight control of information sharing policy will increase confidence in the system, which would encourage industry partners to provide commercial data. Combined with the integration of new and legacy M&S tools, a SIMON-based architecture will provide a robust environment that can be extended and expanded indefinitely.
Hewitt, Robin; Gobbi, Alberto; Lee, Man-Ling
2005-01-01
Relational databases are the current standard for storing and retrieving data in the pharmaceutical and biotech industries. However, retrieving data from a relational database requires specialized knowledge of the database schema and of the SQL query language. At Anadys, we have developed an easy-to-use system for searching and reporting data in a relational database to support our drug discovery project teams. This system is fast and flexible and allows users to access all data without having to write SQL queries. This paper presents the hierarchical, graph-based metadata representation and SQL-construction methods that, together, are the basis of this system's capabilities.
A dedicated database system for handling multi-level data in systems biology.
Pornputtapong, Natapol; Wanichthanarak, Kwanjeera; Nilsson, Avlant; Nookaew, Intawat; Nielsen, Jens
2014-01-01
Advances in high-throughput technologies have enabled extensive generation of multi-level omics data. These data are crucial for systems biology research, though they are complex, heterogeneous, highly dynamic, incomplete and distributed among public databases. This leads to difficulties in data accessibility and often results in errors when data are merged and integrated from varied resources. Therefore, integration and management of systems biological data remain very challenging. To overcome this, we designed and developed a dedicated database system that can serve and solve the vital issues in data management and hereby facilitate data integration, modeling and analysis in systems biology within a sole database. In addition, a yeast data repository was implemented as an integrated database environment which is operated by the database system. Two applications were implemented to demonstrate extensibility and utilization of the system. Both illustrate how the user can access the database via the web query function and implemented scripts. These scripts are specific for two sample cases: 1) Detecting the pheromone pathway in protein interaction networks; and 2) Finding metabolic reactions regulated by Snf1 kinase. In this study we present the design of database system which offers an extensible environment to efficiently capture the majority of biological entities and relations encountered in systems biology. Critical functions and control processes were designed and implemented to ensure consistent, efficient, secure and reliable transactions. The two sample cases on the yeast integrated data clearly demonstrate the value of a sole database environment for systems biology research.
Gamma-Ray Burst Intensity Distributions
NASA Technical Reports Server (NTRS)
Band, David L.; Norris, Jay P.; Bonnell, Jerry T.
2004-01-01
We use the lag-luminosity relation to calculate self-consistently the redshifts, apparent peak bolometric luminosities L(sub B1), and isotropic energies E(sub iso) for a large sample of BATSE bursts. We consider two different forms of the lag-luminosity relation; for both forms the median redshift, for our burst database is 1.6. We model the resulting sample of burst energies with power law and Gaussian dis- tributions, both of which are reasonable models. The power law model has an index of a = 1.76 plus or minus 0.05 (95% confidence) as opposed to the index of a = 2 predicted by the simple universal jet profile model; however, reasonable refinements to this model permit much greater flexibility in reconciling predicted and observed energy distributions.
The XSD-Builder Specification Language—Toward a Semantic View of XML Schema Definition
NASA Astrophysics Data System (ADS)
Fong, Joseph; Cheung, San Kuen
In the present database market, XML database model is a main structure for the forthcoming database system in the Internet environment. As a conceptual schema of XML database, XML Model has its limitation on presenting its data semantics. System analyst has no toolset for modeling and analyzing XML system. We apply XML Tree Model (shown in Figure 2) as a conceptual schema of XML database to model and analyze the structure of an XML database. It is important not only for visualizing, specifying, and documenting structural models, but also for constructing executable systems. The tree model represents inter-relationship among elements inside different logical schema such as XML Schema Definition (XSD), DTD, Schematron, XDR, SOX, and DSD (shown in Figure 1, an explanation of the terms in the figure are shown in Table 1). The XSD-Builder consists of XML Tree Model, source language, translator, and XSD. The source language is called XSD-Source which is mainly for providing an environment with concept of user friendliness while writing an XSD. The source language will consequently be translated by XSD-Translator. Output of XSD-Translator is an XSD which is our target and is called as an object language.
Testing models for the formation of the equatorial ridge on Iapetus via crater counting
NASA Astrophysics Data System (ADS)
Damptz, Amanda L.; Dombard, Andrew J.; Kirchoff, Michelle R.
2018-03-01
Iapetus's equatorial ridge, visible in global views of the moon, is unique in the Solar System. The formation of this feature is likely attributed to a key event in the evolution of Iapetus, and various models have been proposed as the source of the ridge. By surveying imagery from the Cassini and Voyager missions, this study aims to compile a database of the impact crater population on and around Iapetus's equatorial ridge, assess the relative age of the ridge from differences in cratering between on ridge and off ridge, and test the various models of ridge formation. This work presents a database that contains 7748 craters ranging from 0.83 km to 591 km in diameter. The database includes the study area in which the crater is located, the latitude and longitude of the crater, the major and minor axis lengths, and the azimuthal angle of orientation of the major axis. Analysis of crater orientation over the entire study area reveals that there is no preference for long-axis orientation, particularly in the area with the highest resolution. Comparison of the crater size-frequency distributions show that the crater distribution on the ridge appears to be depleted in craters larger than 16 km with an abruptly enhanced crater population less than 16 km in diameter up to saturation. One possible interpretation is that the ridge is a relatively younger surface with an enhanced small impactor population. Finally, the compiled results are used to examine each ridge formation hypothesis. Based on these results, a model of ridge formation via a tidally disrupted sub-satellite appears most consistent with our interpretation of a younger ridge with an enhanced small impactor population.
Advanced transportation system studies. Alternate propulsion subsystem concepts: Propulsion database
NASA Technical Reports Server (NTRS)
Levack, Daniel
1993-01-01
The Advanced Transportation System Studies alternate propulsion subsystem concepts propulsion database interim report is presented. The objective of the database development task is to produce a propulsion database which is easy to use and modify while also being comprehensive in the level of detail available. The database is to be available on the Macintosh computer system. The task is to extend across all three years of the contract. Consequently, a significant fraction of the effort in this first year of the task was devoted to the development of the database structure to ensure a robust base for the following years' efforts. Nonetheless, significant point design propulsion system descriptions and parametric models were also produced. Each of the two propulsion databases, parametric propulsion database and propulsion system database, are described. The descriptions include a user's guide to each code, write-ups for models used, and sample output. The parametric database has models for LOX/H2 and LOX/RP liquid engines, solid rocket boosters using three different propellants, a hybrid rocket booster, and a NERVA derived nuclear thermal rocket engine.
Mouse Tumor Biology (MTB): a database of mouse models for human cancer.
Bult, Carol J; Krupke, Debra M; Begley, Dale A; Richardson, Joel E; Neuhauser, Steven B; Sundberg, John P; Eppig, Janan T
2015-01-01
The Mouse Tumor Biology (MTB; http://tumor.informatics.jax.org) database is a unique online compendium of mouse models for human cancer. MTB provides online access to expertly curated information on diverse mouse models for human cancer and interfaces for searching and visualizing data associated with these models. The information in MTB is designed to facilitate the selection of strains for cancer research and is a platform for mining data on tumor development and patterns of metastases. MTB curators acquire data through manual curation of peer-reviewed scientific literature and from direct submissions by researchers. Data in MTB are also obtained from other bioinformatics resources including PathBase, the Gene Expression Omnibus and ArrayExpress. Recent enhancements to MTB improve the association between mouse models and human genes commonly mutated in a variety of cancers as identified in large-scale cancer genomics studies, provide new interfaces for exploring regions of the mouse genome associated with cancer phenotypes and incorporate data and information related to Patient-Derived Xenograft models of human cancers. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
PathCase-SB architecture and database design
2011-01-01
Background Integration of metabolic pathways resources and regulatory metabolic network models, and deploying new tools on the integrated platform can help perform more effective and more efficient systems biology research on understanding the regulation in metabolic networks. Therefore, the tasks of (a) integrating under a single database environment regulatory metabolic networks and existing models, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. Description PathCase Systems Biology (PathCase-SB) is built and released. The PathCase-SB database provides data and API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools towards facilitating the development of kinetic models for biological systems. PathCase-SB aims to integrate data of selected biological data sources on the web (currently, BioModels database and KEGG), and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database. Conclusions PathCase-SB architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB itself is already being used by researchers across the world. PMID:22070889
Clinical study of the Erlanger silver catheter--data management and biometry.
Martus, P; Geis, C; Lugauer, S; Böswald, M; Guggenbichler, J P
1999-01-01
The clinical evaluation of venous catheters for catheter-induced infections must conform to a strict biometric methodology. The statistical planning of the study (target population, design, degree of blinding), data management (database design, definition of variables, coding), quality assurance (data inspection at several levels) and the biometric evaluation of the Erlanger silver catheter project are described. The three-step data flow included: 1) primary data from the hospital, 2) relational database, 3) files accessible for statistical evaluation. Two different statistical models were compared: analyzing the first catheter only of a patient in the analysis (independent data) and analyzing several catheters from the same patient (dependent data) by means of the generalized estimating equations (GEE) method. The main result of the study was based on the comparison of both statistical models.
MTO-like reference mask modeling for advanced inverse lithography technology patterns
NASA Astrophysics Data System (ADS)
Park, Jongju; Moon, Jongin; Son, Suein; Chung, Donghoon; Kim, Byung-Gook; Jeon, Chan-Uk; LoPresti, Patrick; Xue, Shan; Wang, Sonny; Broadbent, Bill; Kim, Soonho; Hur, Jiuk; Choo, Min
2017-07-01
Advanced Inverse Lithography Technology (ILT) can result in mask post-OPC databases with very small address units, all-angle figures, and very high vertex counts. This creates mask inspection issues for existing mask inspection database rendering. These issues include: large data volumes, low transfer rate, long data preparation times, slow inspection throughput, and marginal rendering accuracy leading to high false detections. This paper demonstrates the application of a new rendering method including a new OASIS-like mask inspection format, new high-speed rendering algorithms, and related hardware to meet the inspection challenges posed by Advanced ILT masks.
(abstract) Synthesis of Speaker Facial Movements to Match Selected Speech Sequences
NASA Technical Reports Server (NTRS)
Scott, Kenneth C.
1994-01-01
We are developing a system for synthesizing image sequences the simulate the facial motion of a speaker. To perform this synthesis, we are pursuing two major areas of effort. We are developing the necessary computer graphics technology to synthesize a realistic image sequence of a person speaking selected speech sequences. Next, we are developing a model that expresses the relation between spoken phonemes and face/mouth shape. A subject is video taped speaking an arbitrary text that contains expression of the full list of desired database phonemes. The subject is video taped from the front speaking normally, recording both audio and video detail simultaneously. Using the audio track, we identify the specific video frames on the tape relating to each spoken phoneme. From this range we digitize the video frame which represents the extreme of mouth motion/shape. Thus, we construct a database of images of face/mouth shape related to spoken phonemes. A selected audio speech sequence is recorded which is the basis for synthesizing a matching video sequence; the speaker need not be the same as used for constructing the database. The audio sequence is analyzed to determine the spoken phoneme sequence and the relative timing of the enunciation of those phonemes. Synthesizing an image sequence corresponding to the spoken phoneme sequence is accomplished using a graphics technique known as morphing. Image sequence keyframes necessary for this processing are based on the spoken phoneme sequence and timing. We have been successful in synthesizing the facial motion of a native English speaker for a small set of arbitrary speech segments. Our future work will focus on advancement of the face shape/phoneme model and independent control of facial features.
Why Save Your Course as a Relational Database?
ERIC Educational Resources Information Center
Hamilton, Gregory C.; Katz, David L.; Davis, James E.
2000-01-01
Describes a system that stores course materials for computer-based training programs in a relational database called Of Course! Outlines the basic structure of the databases; explains distinctions between Of Course! and other authoring languages; and describes how data is retrieved from the database and presented to the student. (Author/LRW)
Simple Logic for Big Problems: An Inside Look at Relational Databases.
ERIC Educational Resources Information Center
Seba, Douglas B.; Smith, Pat
1982-01-01
Discusses database design concept termed "normalization" (process replacing associations between data with associations in two-dimensional tabular form) which results in formation of relational databases (they are to computers what dictionaries are to spoken languages). Applications of the database in serials control and complex systems…
Relational Database Design in Information Science Education.
ERIC Educational Resources Information Center
Brooks, Terrence A.
1985-01-01
Reports on database management system (dbms) applications designed by library school students for university community at University of Iowa. Three dbms design issues are examined: synthesis of relations, analysis of relations (normalization procedure), and data dictionary usage. Database planning prior to automation using data dictionary approach…
Toward Computational Cumulative Biology by Combining Models of Biological Datasets
Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel
2014-01-01
A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations—for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database. PMID:25427176
Toward computational cumulative biology by combining models of biological datasets.
Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel
2014-01-01
A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations-for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.
International Shock-Wave Database: Current Status
NASA Astrophysics Data System (ADS)
Levashov, Pavel
2013-06-01
Shock-wave and related dynamic material response data serve for calibrating, validating, and improving material models over very broad regions of the pressure-temperature-density phase space. Since the middle of the 20th century vast amount of shock-wave experimental information has been obtained. To systemize it a number of compendiums of shock-wave data has been issued by LLNL, LANL (USA), CEA (France), IPCP and VNIIEF (Russia). In mid-90th the drawbacks of the paper handbooks became obvious, so the first version of the online shock-wave database appeared in 1997 (http://www.ficp.ac.ru/rusbank). It includes approximately 20000 experimental points on shock compression, adiabatic expansion, measurements of sound velocity behind the shock front and free-surface-velocity for more than 650 substances. This is still a useful tool for the shock-wave community, but it has a number of serious disadvantages which can't be easily eliminated: (i) very simple data format for points and references; (ii) minimalistic user interface for data addition; (iii) absence of history of changes; (iv) bad feedback from users. The new International Shock-Wave database (ISWdb) is intended to solve these and some other problems. The ISWdb project objectives are: (i) to develop a database on thermodynamic and mechanical properties of materials under conditions of shock-wave and other dynamic loadings, selected related quantities of interest, and the meta-data that describes the provenance of the measurements and material models; and (ii) to make this database available internationally through the Internet, in an interactive form. The development and operation of the ISWdb is guided by an advisory committee. The database will be installed on two mirrored web-servers, one in Russia and the other in USA (currently only one server is available). The database provides access to original experimental data on shock compression, non-shock dynamic loadings, isentropic expansion, measurements of sound speed in the Hugoniot state, and time-dependent free-surface or window-interface velocity profiles. Users are able to search the information in the database and obtain the experimental points in tabular or plain text formats directly via the Internet using common browsers. It is also possible to plot the experimental points for comparison with different approximations and results of equation-of-state calculations. The user can present the results of calculations in text or graphical forms and compare them with any experimental data available in the database. A short history of the shock-wave database will be presented and current possibilities of ISWdb will be demonstrated. Web-site of the project: http://iswdb.info. This work is supported by SNL contracts # 1143875, 1196352.
An Initial Design of ISO 19152:2012 LADM Based Valuation and Taxation Data Model
NASA Astrophysics Data System (ADS)
Çağdaş, V.; Kara, A.; van Oosterom, P.; Lemmen, C.; Işıkdağ, Ü.; Kathmann, R.; Stubkjær, E.
2016-10-01
A fiscal registry or database is supposed to record geometric, legal, physical, economic, and environmental characteristics in relation to property units, which are subject to immovable property valuation and taxation. Apart from procedural standards, there is no internationally accepted data standard that defines the semantics of fiscal databases. The ISO 19152:2012 Land Administration Domain Model (LADM), as an international land administration standard focuses on legal requirements, but considers out of scope specifications of external information systems including valuation and taxation databases. However, it provides a formalism which allows for an extension that responds to the fiscal requirements. This paper introduces an initial version of a LADM - Fiscal Extension Module for the specification of databases used in immovable property valuation and taxation. The extension module is designed to facilitate all stages of immovable property taxation, namely the identification of properties and taxpayers, assessment of properties through single or mass appraisal procedures, automatic generation of sales statistics, and the management of tax collection, dealing with arrears and appeals. It is expected that the initial version will be refined through further activities held by a possible joint working group under FIG Commission 7 (Cadastre and Land Management) and FIG Commission 9 (Valuation and the Management of Real Estate) in collaboration with other relevant international bodies.
Modeling Liver-Related Adverse Effects of Drugs Using kNN QSAR Method
Rodgers, Amie D.; Zhu, Hao; Fourches, Dennis; Rusyn, Ivan; Tropsha, Alexander
2010-01-01
Adverse effects of drugs (AEDs) continue to be a major cause of drug withdrawals both in development and post-marketing. While liver-related AEDs are a major concern for drug safety, there are few in silico models for predicting human liver toxicity for drug candidates. We have applied the Quantitative Structure Activity Relationship (QSAR) approach to model liver AEDs. In this study, we aimed to construct a QSAR model capable of binary classification (active vs. inactive) of drugs for liver AEDs based on chemical structure. To build QSAR models, we have employed an FDA spontaneous reporting database of human liver AEDs (elevations in activity of serum liver enzymes), which contains data on approximately 500 approved drugs. Approximately 200 compounds with wide clinical data coverage, structural similarity and balanced (40/60) active/inactive ratio were selected for modeling and divided into multiple training/test and external validation sets. QSAR models were developed using the k nearest neighbor method and validated using external datasets. Models with high sensitivity (>73%) and specificity (>94%) for prediction of liver AEDs in external validation sets were developed. To test applicability of the models, three chemical databases (World Drug Index, Prestwick Chemical Library, and Biowisdom Liver Intelligence Module) were screened in silico and the validity of predictions was determined, where possible, by comparing model-based classification with assertions in publicly available literature. Validated QSAR models of liver AEDs based on the data from the FDA spontaneous reporting system can be employed as sensitive and specific predictors of AEDs in pre-clinical screening of drug candidates for potential hepatotoxicity in humans. PMID:20192250
NASA Astrophysics Data System (ADS)
Lebedeva, Liudmila; Semenova, Olga
2013-04-01
One of widely claimed problems in modern modelling hydrology is lack of available information to investigate hydrological processes and improve their representation in the models. In spite of this, one hardly might confidently say that existing "traditional" data sources have been already fully analyzed and made use of. There existed the network of research watersheds in USSR called water-balance stations where comprehensive and extensive hydrometeorological measurements were conducted according to more or less single program during the last 40-60 years. The program (where not ceased) includes observations of discharges in several, often nested and homogeneous, small watersheds, meteorological elements, evaporation, soil temperature and moisture, snow depths, etc. The network covered different climatic and landscape zones and was established in the middle of the last century with the aim of investigation of the runoff formation in different conditions. Until recently the long-term observational data accompanied by descriptions and maps had existed only in hard copies. It partly explains why these datasets are not enough exploited yet and very rarely or even never were used for the purposes of hydrological modelling although they seem to be much more promising than implementation of the completely new measuring techniques not detracting from its importance. The goal of the presented work is development of a database of observational data and supportive materials from small research watersheds across the territory of the former Soviet Union. The first version of the database will include the following information for 12 water-balance stations across Russia, Ukraine, Kazahstan and Turkmenistan: daily values of discharges (one or several watersheds), air temperature, humidity, precipitation (one or several gauges), soil and snow state variables, soil and snow evaporation. The stations will cover desert and semi desert, steppe and forest steppe, forest, permafrost and mountainous zones. Supportive material will include maps of watershed boundaries and location of observational sites. Text descriptions of the data, measuring techniques and hydrometeorological conditions related to each of the water-balance station will accompany the datasets. The database is supposed to be expanded with time in number of the stations (by 20) and available data series for each of them. It will be uploaded to the internet with open access to everyone interested in. Such a database allows one to test hydrological models and separate modules for their adequacy and workability in different conditions and can serve as a base for models comparison and evaluation. Special profit of the database will gain models that don't rely on calibration but on the adequate process representation and use of the observable parameters. One of such models, process-based Hydrograph model, will be tested against the data from every watershed from the developed database. The aim of the Hydrograph model application to the as many as possible number of research data-rich watersheds in different climatic zones is both amending the algorithms and creation and adjustment of the model parameters that allow using the model across the geographic spectrum.
Enhancing SAMOS Data Access in DOMS via a Neo4j Property Graph Database.
NASA Astrophysics Data System (ADS)
Stallard, A. P.; Smith, S. R.; Elya, J. L.
2016-12-01
The Shipboard Automated Meteorological and Oceanographic System (SAMOS) initiative provides routine access to high-quality marine meteorological and near-surface oceanographic observations from research vessels. The Distributed Oceanographic Match-Up Service (DOMS) under development is a centralized service that allows researchers to easily match in situ and satellite oceanographic data from distributed sources to facilitate satellite calibration, validation, and retrieval algorithm development. The service currently uses Apache Solr as a backend search engine on each node in the distributed network. While Solr is a high-performance solution that facilitates creation and maintenance of indexed data, it is limited in the sense that its schema is fixed. The property graph model escapes this limitation by creating relationships between data objects. The authors will present the development of the SAMOS Neo4j property graph database including new search possibilities that take advantage of the property graph model, performance comparisons with Apache Solr, and a vision for graph databases as a storage tool for oceanographic data. The integration of the SAMOS Neo4j graph into DOMS will also be described. Currently, Neo4j contains spatial and temporal records from SAMOS which are modeled into a time tree and r-tree using Graph Aware and Spatial plugin tools for Neo4j. These extensions provide callable Java procedures within CYPHER (Neo4j's query language) that generate in-graph structures. Once generated, these structures can be queried using procedures from these libraries, or directly via CYPHER statements. Neo4j excels at performing relationship and path-based queries, which challenge relational-SQL databases because they require memory intensive joins due to the limitation of their design. Consider a user who wants to find records over several years, but only for specific months. If a traditional database only stores timestamps, this type of query would be complex and likely prohibitively slow. Using the time tree model, one can specify a path from the root to the data which restricts resolutions to certain timeframes (e.g., months). This query can be executed without joins, unions, or other compute-intensive operations, putting Neo4j at a computational advantage to the SQL database alternative.
Reviews, Holdings, and Presses and Publishers in Academic Library Book Acquisitions.
ERIC Educational Resources Information Center
Calhoun, John C.
2001-01-01
Discussion of academic library book acquisition reviews pertinent literature on book reviews, book selection, and evaluation and proposes a model for academic library book acquisition using a two-year relational database file of approval plan records. Defines a core collection for the California State University system, and characterizes…
Systematic approach to verification and validation: High explosive burn models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Menikoff, Ralph; Scovel, Christina A.
2012-04-16
Most material models used in numerical simulations are based on heuristics and empirically calibrated to experimental data. For a specific model, key questions are determining its domain of applicability and assessing its relative merits compared to other models. Answering these questions should be a part of model verification and validation (V and V). Here, we focus on V and V of high explosive models. Typically, model developers implemented their model in their own hydro code and use different sets of experiments to calibrate model parameters. Rarely can one find in the literature simulation results for different models of the samemore » experiment. Consequently, it is difficult to assess objectively the relative merits of different models. This situation results in part from the fact that experimental data is scattered through the literature (articles in journals and conference proceedings) and that the printed literature does not allow the reader to obtain data from a figure in electronic form needed to make detailed comparisons among experiments and simulations. In addition, it is very time consuming to set up and run simulations to compare different models over sufficiently many experiments to cover the range of phenomena of interest. The first difficulty could be overcome if the research community were to support an online web based database. The second difficulty can be greatly reduced by automating procedures to set up and run simulations of similar types of experiments. Moreover, automated testing would be greatly facilitated if the data files obtained from a database were in a standard format that contained key experimental parameters as meta-data in a header to the data file. To illustrate our approach to V and V, we have developed a high explosive database (HED) at LANL. It now contains a large number of shock initiation experiments. Utilizing the header information in a data file from HED, we have written scripts to generate an input file for a hydro code, run a simulation, and generate a comparison plot showing simulated and experimental velocity gauge data. These scripts are then applied to several series of experiments and to several HE burn models. The same systematic approach is applicable to other types of material models; for example, equations of state models and material strength models.« less
Schomburg, Ida; Chang, Antje; Placzek, Sandra; Söhngen, Carola; Rother, Michael; Lang, Maren; Munaretto, Cornelia; Ulas, Susanne; Stelzer, Michael; Grote, Andreas; Scheer, Maurice; Schomburg, Dietmar
2013-01-01
The BRENDA (BRaunschweig ENzyme DAtabase) enzyme portal (http://www.brenda-enzymes.org) is the main information system of functional biochemical and molecular enzyme data and provides access to seven interconnected databases. BRENDA contains 2.7 million manually annotated data on enzyme occurrence, function, kinetics and molecular properties. Each entry is connected to a reference and the source organism. Enzyme ligands are stored with their structures and can be accessed via their names, synonyms or via a structure search. FRENDA (Full Reference ENzyme DAta) and AMENDA (Automatic Mining of ENzyme DAta) are based on text mining methods and represent a complete survey of PubMed abstracts with information on enzymes in different organisms, tissues or organelles. The supplemental database DRENDA provides more than 910 000 new EC number-disease relations in more than 510 000 references from automatic search and a classification of enzyme-disease-related information. KENDA (Kinetic ENzyme DAta), a new amendment extracts and displays kinetic values from PubMed abstracts. The integration of the EnzymeDetector offers an automatic comparison, evaluation and prediction of enzyme function annotations for prokaryotic genomes. The biochemical reaction database BKM-react contains non-redundant enzyme-catalysed and spontaneous reactions and was developed to facilitate and accelerate the construction of biochemical models.
NASA Astrophysics Data System (ADS)
Shennan, Ian; Bradley, Sarah L.; Edwards, Robin
2018-05-01
The new sea-level database for Britain and Ireland contains >2100 data points from 86 regions and records relative sea-level (RSL) changes over the last 20 ka and across elevations ranging from ∼+40 to -55 m. It reveals radically different patterns of RSL as we move from regions near the centre of the Celtic ice sheet at the last glacial maximum to regions near and beyond the ice limits. Validated sea-level index points and limiting data show good agreement with the broad patterns of RSL change predicted by current glacial isostatic adjustment (GIA) models. The index points show no consistent pattern of synchronous coastal advance and retreat across different regions, ∼100-500 km scale, indicating that within-estuary processes, rather than decimetre- and centennial-scale oscillations in sea level, produce major controls on the temporal pattern of horizontal shifts in coastal sedimentary environments. Comparisons between the database and GIA model predictions for multiple regions provide potentially powerful constraints on various characteristics of global GIA models, including the magnitude of MWP1A, the final deglaciation of the Laurentide ice sheet and the continued melting of Antarctica after 7 ka BP.
Avalos, Marta; Adroher, Nuria Duran; Lagarde, Emmanuel; Thiessard, Frantz; Grandvalet, Yves; Contrand, Benjamin; Orriols, Ludivine
2012-09-01
Large data sets with many variables provide particular challenges when constructing analytic models. Lasso-related methods provide a useful tool, although one that remains unfamiliar to most epidemiologists. We illustrate the application of lasso methods in an analysis of the impact of prescribed drugs on the risk of a road traffic crash, using a large French nationwide database (PLoS Med 2010;7:e1000366). In the original case-control study, the authors analyzed each exposure separately. We use the lasso method, which can simultaneously perform estimation and variable selection in a single model. We compare point estimates and confidence intervals using (1) a separate logistic regression model for each drug with a Bonferroni correction and (2) lasso shrinkage logistic regression analysis. Shrinkage regression had little effect on (bias corrected) point estimates, but led to less conservative results, noticeably for drugs with moderate levels of exposure. Carbamates, carboxamide derivative and fatty acid derivative antiepileptics, drugs used in opioid dependence, and mineral supplements of potassium showed stronger associations. Lasso is a relevant method in the analysis of databases with large number of exposures and can be recommended as an alternative to conventional strategies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rupcich, Franco; Badal, Andreu; Kyprianou, Iacovos
Purpose: The purpose of this study was to develop a database for estimating organ dose in a voxelized patient model for coronary angiography and brain perfusion CT acquisitions with any spectra and angular tube current modulation setting. The database enables organ dose estimation for existing and novel acquisition techniques without requiring Monte Carlo simulations. Methods: The study simulated transport of monoenergetic photons between 5 and 150 keV for 1000 projections over 360 Degree-Sign through anthropomorphic voxelized female chest and head (0 Degree-Sign and 30 Degree-Sign tilt) phantoms and standard head and body CTDI dosimetry cylinders. The simulations resulted in tablesmore » of normalized dose deposition for several radiosensitive organs quantifying the organ dose per emitted photon for each incident photon energy and projection angle for coronary angiography and brain perfusion acquisitions. The values in a table can be multiplied by an incident spectrum and number of photons at each projection angle and then summed across all energies and angles to estimate total organ dose. Scanner-specific organ dose may be approximated by normalizing the database-estimated organ dose by the database-estimated CTDI{sub vol} and multiplying by a physical CTDI{sub vol} measurement. Two examples are provided demonstrating how to use the tables to estimate relative organ dose. In the first, the change in breast and lung dose during coronary angiography CT scans is calculated for reduced kVp, angular tube current modulation, and partial angle scanning protocols relative to a reference protocol. In the second example, the change in dose to the eye lens is calculated for a brain perfusion CT acquisition in which the gantry is tilted 30 Degree-Sign relative to a nontilted scan. Results: Our database provides tables of normalized dose deposition for several radiosensitive organs irradiated during coronary angiography and brain perfusion CT scans. Validation results indicate total organ doses calculated using our database are within 1% of those calculated using Monte Carlo simulations with the same geometry and scan parameters for all organs except red bone marrow (within 6%), and within 23% of published estimates for different voxelized phantoms. Results from the example of using the database to estimate organ dose for coronary angiography CT acquisitions show 2.1%, 1.1%, and -32% change in breast dose and 2.1%, -0.74%, and 4.7% change in lung dose for reduced kVp, tube current modulated, and partial angle protocols, respectively, relative to the reference protocol. Results show -19.2% difference in dose to eye lens for a tilted scan relative to a nontilted scan. The reported relative changes in organ doses are presented without quantification of image quality and are for the sole purpose of demonstrating the use of the proposed database. Conclusions: The proposed database and calculation method enable the estimation of organ dose for coronary angiography and brain perfusion CT scans utilizing any spectral shape and angular tube current modulation scheme by taking advantage of the precalculated Monte Carlo simulation results. The database can be used in conjunction with image quality studies to develop optimized acquisition techniques and may be particularly beneficial for optimizing dual kVp acquisitions for which numerous kV, mA, and filtration combinations may be investigated.« less
PrionHome: a database of prions and other sequences relevant to prion phenomena.
Harbi, Djamel; Parthiban, Marimuthu; Gendoo, Deena M A; Ehsani, Sepehr; Kumar, Manish; Schmitt-Ulms, Gerold; Sowdhamini, Ramanathan; Harrison, Paul M
2012-01-01
Prions are units of propagation of an altered state of a protein or proteins; prions can propagate from organism to organism, through cooption of other protein copies. Prions contain no necessary nucleic acids, and are important both as both pathogenic agents, and as a potential force in epigenetic phenomena. The original prions were derived from a misfolded form of the mammalian Prion Protein PrP. Infection by these prions causes neurodegenerative diseases. Other prions cause non-Mendelian inheritance in budding yeast, and sometimes act as diseases of yeast. We report the bioinformatic construction of the PrionHome, a database of >2000 prion-related sequences. The data was collated from various public and private resources and filtered for redundancy. The data was then processed according to a transparent classification system of prionogenic sequences (i.e., sequences that can make prions), prionoids (i.e., proteins that propagate like prions between individual cells), and other prion-related phenomena. There are eight PrionHome classifications for sequences. The first four classifications are derived from experimental observations: prionogenic sequences, prionoids, other prion-related phenomena, and prion interactors. The second four classifications are derived from sequence analysis: orthologs, paralogs, pseudogenes, and candidate-prionogenic sequences. Database entries list: supporting information for PrionHome classifications, prion-determinant areas (where relevant), and disordered and compositionally-biased regions. Also included are literature references for the PrionHome classifications, transcripts and genomic coordinates, and structural data (including comparative models made for the PrionHome from manually curated alignments). We provide database usage examples for both vertebrate and fungal prion contexts. Using the database data, we have performed a detailed analysis of the compositional biases in known budding-yeast prionogenic sequences, showing that the only abundant bias pattern is for asparagine bias with subsidiary serine bias. We anticipate that this database will be a useful experimental aid and reference resource. It is freely available at: http://libaio.biol.mcgill.ca/prion.
PrionHome: A Database of Prions and Other Sequences Relevant to Prion Phenomena
Harbi, Djamel; Parthiban, Marimuthu; Gendoo, Deena M. A.; Ehsani, Sepehr; Kumar, Manish; Schmitt-Ulms, Gerold; Sowdhamini, Ramanathan; Harrison, Paul M.
2012-01-01
Prions are units of propagation of an altered state of a protein or proteins; prions can propagate from organism to organism, through cooption of other protein copies. Prions contain no necessary nucleic acids, and are important both as both pathogenic agents, and as a potential force in epigenetic phenomena. The original prions were derived from a misfolded form of the mammalian Prion Protein PrP. Infection by these prions causes neurodegenerative diseases. Other prions cause non-Mendelian inheritance in budding yeast, and sometimes act as diseases of yeast. We report the bioinformatic construction of the PrionHome, a database of >2000 prion-related sequences. The data was collated from various public and private resources and filtered for redundancy. The data was then processed according to a transparent classification system of prionogenic sequences (i.e., sequences that can make prions), prionoids (i.e., proteins that propagate like prions between individual cells), and other prion-related phenomena. There are eight PrionHome classifications for sequences. The first four classifications are derived from experimental observations: prionogenic sequences, prionoids, other prion-related phenomena, and prion interactors. The second four classifications are derived from sequence analysis: orthologs, paralogs, pseudogenes, and candidate-prionogenic sequences. Database entries list: supporting information for PrionHome classifications, prion-determinant areas (where relevant), and disordered and compositionally-biased regions. Also included are literature references for the PrionHome classifications, transcripts and genomic coordinates, and structural data (including comparative models made for the PrionHome from manually curated alignments). We provide database usage examples for both vertebrate and fungal prion contexts. Using the database data, we have performed a detailed analysis of the compositional biases in known budding-yeast prionogenic sequences, showing that the only abundant bias pattern is for asparagine bias with subsidiary serine bias. We anticipate that this database will be a useful experimental aid and reference resource. It is freely available at: http://libaio.biol.mcgill.ca/prion. PMID:22363733
Machine learning approaches to analysing textual injury surveillance data: a systematic review.
Vallmuur, Kirsten
2015-06-01
To synthesise recent research on the use of machine learning approaches to mining textual injury surveillance data. Systematic review. The electronic databases which were searched included PubMed, Cinahl, Medline, Google Scholar, and Proquest. The bibliography of all relevant articles was examined and associated articles were identified using a snowballing technique. For inclusion, articles were required to meet the following criteria: (a) used a health-related database, (b) focused on injury-related cases, AND used machine learning approaches to analyse textual data. The papers identified through the search were screened resulting in 16 papers selected for review. Articles were reviewed to describe the databases and methodology used, the strength and limitations of different techniques, and quality assurance approaches used. Due to heterogeneity between studies meta-analysis was not performed. Occupational injuries were the focus of half of the machine learning studies and the most common methods described were Bayesian probability or Bayesian network based methods to either predict injury categories or extract common injury scenarios. Models were evaluated through either comparison with gold standard data or content expert evaluation or statistical measures of quality. Machine learning was found to provide high precision and accuracy when predicting a small number of categories, was valuable for visualisation of injury patterns and prediction of future outcomes. However, difficulties related to generalizability, source data quality, complexity of models and integration of content and technical knowledge were discussed. The use of narrative text for injury surveillance has grown in popularity, complexity and quality over recent years. With advances in data mining techniques, increased capacity for analysis of large databases, and involvement of computer scientists in the injury prevention field, along with more comprehensive use and description of quality assurance methods in text mining approaches, it is likely that we will see a continued growth and advancement in knowledge of text mining in the injury field. Copyright © 2015 Elsevier Ltd. All rights reserved.
Herasevich, Vitaly; Pickering, Brian W; Dong, Yue; Peters, Steve G; Gajic, Ognjen
2010-03-01
To develop and validate an informatics infrastructure for syndrome surveillance, decision support, reporting, and modeling of critical illness. Using open-schema data feeds imported from electronic medical records (EMRs), we developed a near-real-time relational database (Multidisciplinary Epidemiology and Translational Research in Intensive Care Data Mart). Imported data domains included physiologic monitoring, medication orders, laboratory and radiologic investigations, and physician and nursing notes. Open database connectivity supported the use of Boolean combinations of data that allowed authorized users to develop syndrome surveillance, decision support, and reporting (data "sniffers") routines. Random samples of database entries in each category were validated against corresponding independent manual reviews. The Multidisciplinary Epidemiology and Translational Research in Intensive Care Data Mart accommodates, on average, 15,000 admissions to the intensive care unit (ICU) per year and 200,000 vital records per day. Agreement between database entries and manual EMR audits was high for sex, mortality, and use of mechanical ventilation (kappa, 1.0 for all) and for age and laboratory and monitored data (Bland-Altman mean difference +/- SD, 1(0) for all). Agreement was lower for interpreted or calculated variables, such as specific syndrome diagnoses (kappa, 0.5 for acute lung injury), duration of ICU stay (mean difference +/- SD, 0.43+/-0.2), or duration of mechanical ventilation (mean difference +/- SD, 0.2+/-0.9). Extraction of essential ICU data from a hospital EMR into an open, integrative database facilitates process control, reporting, syndrome surveillance, decision support, and outcome research in the ICU.
Crowdsourcing-Assisted Radio Environment Database for V2V Communication.
Katagiri, Keita; Sato, Koya; Fujii, Takeo
2018-04-12
In order to realize reliable Vehicle-to-Vehicle (V2V) communication systems for autonomous driving, the recognition of radio propagation becomes an important technology. However, in the current wireless distributed network systems, it is difficult to accurately estimate the radio propagation characteristics because of the locality of the radio propagation caused by surrounding buildings and geographical features. In this paper, we propose a measurement-based radio environment database for improving the accuracy of the radio environment estimation in the V2V communication systems. The database first gathers measurement datasets of the received signal strength indicator (RSSI) related to the transmission/reception locations from V2V systems. By using the datasets, the average received power maps linked with transmitter and receiver locations are generated. We have performed measurement campaigns of V2V communications in the real environment to observe RSSI for the database construction. Our results show that the proposed method has higher accuracy of the radio propagation estimation than the conventional path loss model-based estimation.
DDRprot: a database of DNA damage response-related proteins.
Andrés-León, Eduardo; Cases, Ildefonso; Arcas, Aida; Rojas, Ana M
2016-01-01
The DNA Damage Response (DDR) signalling network is an essential system that protects the genome's integrity. The DDRprot database presented here is a resource that integrates manually curated information on the human DDR network and its sub-pathways. For each particular DDR protein, we present detailed information about its function. If involved in post-translational modifications (PTMs) with each other, we depict the position of the modified residue/s in the three-dimensional structures, when resolved structures are available for the proteins. All this information is linked to the original publication from where it was obtained. Phylogenetic information is also shown, including time of emergence and conservation across 47 selected species, family trees and sequence alignments of homologues. The DDRprot database can be queried by different criteria: pathways, species, evolutionary age or involvement in (PTM). Sequence searches using hidden Markov models can be also used.Database URL: http://ddr.cbbio.es. © The Author(s) 2016. Published by Oxford University Press.
A TEX86 surface sediment database and extended Bayesian calibration
NASA Astrophysics Data System (ADS)
Tierney, Jessica E.; Tingley, Martin P.
2015-06-01
Quantitative estimates of past temperature changes are a cornerstone of paleoclimatology. For a number of marine sediment-based proxies, the accuracy and precision of past temperature reconstructions depends on a spatial calibration of modern surface sediment measurements to overlying water temperatures. Here, we present a database of 1095 surface sediment measurements of TEX86, a temperature proxy based on the relative cyclization of marine archaeal glycerol dialkyl glycerol tetraether (GDGT) lipids. The dataset is archived in a machine-readable format with geospatial information, fractional abundances of lipids (if available), and metadata. We use this new database to update surface and subsurface temperature calibration models for TEX86 and demonstrate the applicability of the TEX86 proxy to past temperature prediction. The TEX86 database confirms that surface sediment GDGT distribution has a strong relationship to temperature, which accounts for over 70% of the variance in the data. Future efforts, made possible by the data presented here, will seek to identify variables with secondary relationships to GDGT distributions, such as archaeal community composition.
Waller, P; Cassell, J A; Saunders, M H; Stevens, R
2017-03-01
In order to promote understanding of UK governance and assurance relating to electronic health records research, we present and discuss the role of the Independent Scientific Advisory Committee (ISAC) for MHRA database research in evaluating protocols proposing the use of the Clinical Practice Research Datalink. We describe the development of the Committee's activities between 2006 and 2015, alongside growth in data linkage and wider national electronic health records programmes, including the application and assessment processes, and our approach to undertaking this work. Our model can provide independence, challenge and support to data providers such as the Clinical Practice Research Datalink database which has been used for well over 1,000 medical research projects. ISAC's role in scientific oversight ensures feasible and scientifically acceptable plans are in place, while having both lay and professional membership addresses governance issues in order to protect the integrity of the database and ensure that public confidence is maintained.
Crowdsourcing-Assisted Radio Environment Database for V2V Communication †
Katagiri, Keita; Fujii, Takeo
2018-01-01
In order to realize reliable Vehicle-to-Vehicle (V2V) communication systems for autonomous driving, the recognition of radio propagation becomes an important technology. However, in the current wireless distributed network systems, it is difficult to accurately estimate the radio propagation characteristics because of the locality of the radio propagation caused by surrounding buildings and geographical features. In this paper, we propose a measurement-based radio environment database for improving the accuracy of the radio environment estimation in the V2V communication systems. The database first gathers measurement datasets of the received signal strength indicator (RSSI) related to the transmission/reception locations from V2V systems. By using the datasets, the average received power maps linked with transmitter and receiver locations are generated. We have performed measurement campaigns of V2V communications in the real environment to observe RSSI for the database construction. Our results show that the proposed method has higher accuracy of the radio propagation estimation than the conventional path loss model-based estimation. PMID:29649174
EcoCyc: a comprehensive database resource for Escherichia coli
Keseler, Ingrid M.; Collado-Vides, Julio; Gama-Castro, Socorro; Ingraham, John; Paley, Suzanne; Paulsen, Ian T.; Peralta-Gil, Martín; Karp, Peter D.
2005-01-01
The EcoCyc database (http://EcoCyc.org/) is a comprehensive source of information on the biology of the prototypical model organism Escherichia coli K12. The mission for EcoCyc is to contain both computable descriptions of, and detailed comments describing, all genes, proteins, pathways and molecular interactions in E.coli. Through ongoing manual curation, extensive information such as summary comments, regulatory information, literature citations and evidence types has been extracted from 8862 publications and added to Version 8.5 of the EcoCyc database. The EcoCyc database can be accessed through a World Wide Web interface, while the downloadable Pathway Tools software and data files enable computational exploration of the data and provide enhanced querying capabilities that web interfaces cannot support. For example, EcoCyc contains carefully curated information that can be used as training sets for bioinformatics prediction of entities such as promoters, operons, genetic networks, transcription factor binding sites, metabolic pathways, functionally related genes, protein complexes and protein–ligand interactions. PMID:15608210
MatProps: Material Properties Database and Associated Access Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Durrenberger, J K; Becker, R C; Goto, D M
2007-08-13
Coefficients for analytic constitutive and equation of state models (EOS), which are used by many hydro codes at LLNL, are currently stored in a legacy material database (Steinberg, UCRL-MA-106349). Parameters for numerous materials are available through this database, and include Steinberg-Guinan and Steinberg-Lund constitutive models for metals, JWL equations of state for high explosives, and Mie-Gruniesen equations of state for metals. These constitutive models are used in most of the simulations done by ASC codes today at Livermore. Analytic EOSs are also still used, but have been superseded in many cases by tabular representations in LEOS (http://leos.llnl.gov). Numerous advanced constitutivemore » models have been developed and implemented into ASC codes over the past 20 years. These newer models have more physics and better representations of material strength properties than their predecessors, and therefore more model coefficients. However, a material database of these coefficients is not readily available. Therefore incorporating these coefficients with those of the legacy models into a portable database that could be shared amongst codes would be most welcome. The goal of this paper is to describe the MatProp effort at LLNL to create such a database and associated access library that could be used by codes throughout the DOE complex and beyond. We have written an initial version of the MatProp database and access library and our DOE/ASC code ALE3D (Nichols et. al., UCRL-MA-152204) is able to import information from the database. The database, a link to which exists on the Sourceforge server at LLNL, contains coefficients for many materials and models (see Appendix), and includes material parameters in the following categories--flow stress, shear modulus, strength, damage, and equation of state. Future versions of the Matprop database and access library will include the ability to read and write material descriptions that can be exchanged between codes. It will also include an ability to do unit changes, i.e. have the library return parameters in user-specified unit systems. In addition to these, additional material categories can be added (e.g., phase change kinetics, etc.). The Matprop database and access library is part of a larger set of tools used at LLNL for assessing material model behavior. One of these is MSlib, a shared constitutive material model library. Another is the Material Strength Database (MSD), which allows users to compare parameter fits for specific constitutive models to available experimental data. Together with Matprop, these tools create a suite of capabilities that provide state-of-the-art models and parameters for those models to integrated simulation codes. This document is broken into several appendices. Appendix A contains a code example to retrieve several material coefficients. Appendix B contains the API for the Matprop data access library. Appendix C contains a list of the material names and model types currently available in the Matprop database. Appendix D contains a list of the parameter names for the currently recognized model types. Appendix E contains a full xml description of the material Tantalum.« less
Regional early flood warning system: design and implementation
NASA Astrophysics Data System (ADS)
Chang, L. C.; Yang, S. N.; Kuo, C. L.; Wang, Y. F.
2017-12-01
This study proposes a prototype of the regional early flood inundation warning system in Tainan City, Taiwan. The AI technology is used to forecast multi-step-ahead regional flood inundation maps during storm events. The computing time is only few seconds that leads to real-time regional flood inundation forecasting. A database is built to organize data and information for building real-time forecasting models, maintaining the relations of forecasted points, and displaying forecasted results, while real-time data acquisition is another key task where the model requires immediately accessing rain gauge information to provide forecast services. All programs related database are constructed in Microsoft SQL Server by using Visual C# to extracting real-time hydrological data, managing data, storing the forecasted data and providing the information to the visual map-based display. The regional early flood inundation warning system use the up-to-date Web technologies driven by the database and real-time data acquisition to display the on-line forecasting flood inundation depths in the study area. The friendly interface includes on-line sequentially showing inundation area by Google Map, maximum inundation depth and its location, and providing KMZ file download of the results which can be watched on Google Earth. The developed system can provide all the relevant information and on-line forecast results that helps city authorities to make decisions during typhoon events and make actions to mitigate the losses.
An Analysis of Transmission and Storage Gains from Sliding Checksum Methods
1998-11-01
analysed a protocol "rsync" for synchronising related files at different ends of a communications channel with a minimum of transmitted data. This report...Sliding Checksum Methods Executive Summary In a previous report we described, modelled and analysed a protocol "rsync" for synchronising related...collaborative writing of documentation and synchronisation of distributed databases in the situation where no one location is aware of the differences
Insertion algorithms for network model database management systems
NASA Astrophysics Data System (ADS)
Mamadolimov, Abdurashid; Khikmat, Saburov
2017-12-01
The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, forms partial order. When a database is large and a query comparison is expensive then the efficiency requirement of managing algorithms is minimizing the number of query comparisons. We consider updating operation for network model database management systems. We develop a new sequantial algorithm for updating operation. Also we suggest a distributed version of the algorithm.
Ologs: a categorical framework for knowledge representation.
Spivak, David I; Kent, Robert E
2012-01-01
In this paper we introduce the olog, or ontology log, a category-theoretic model for knowledge representation (KR). Grounded in formal mathematics, ologs can be rigorously formulated and cross-compared in ways that other KR models (such as semantic networks) cannot. An olog is similar to a relational database schema; in fact an olog can serve as a data repository if desired. Unlike database schemas, which are generally difficult to create or modify, ologs are designed to be user-friendly enough that authoring or reconfiguring an olog is a matter of course rather than a difficult chore. It is hoped that learning to author ologs is much simpler than learning a database definition language, despite their similarity. We describe ologs carefully and illustrate with many examples. As an application we show that any primitive recursive function can be described by an olog. We also show that ologs can be aligned or connected together into a larger network using functors. The various methods of information flow and institutions can then be used to integrate local and global world-views. We finish by providing several different avenues for future research.
Ologs: A Categorical Framework for Knowledge Representation
Spivak, David I.; Kent, Robert E.
2012-01-01
In this paper we introduce the olog, or ontology log, a category-theoretic model for knowledge representation (KR). Grounded in formal mathematics, ologs can be rigorously formulated and cross-compared in ways that other KR models (such as semantic networks) cannot. An olog is similar to a relational database schema; in fact an olog can serve as a data repository if desired. Unlike database schemas, which are generally difficult to create or modify, ologs are designed to be user-friendly enough that authoring or reconfiguring an olog is a matter of course rather than a difficult chore. It is hoped that learning to author ologs is much simpler than learning a database definition language, despite their similarity. We describe ologs carefully and illustrate with many examples. As an application we show that any primitive recursive function can be described by an olog. We also show that ologs can be aligned or connected together into a larger network using functors. The various methods of information flow and institutions can then be used to integrate local and global world-views. We finish by providing several different avenues for future research. PMID:22303434
Overcoming barriers to a research-ready national commercial claims database.
Newman, David; Herrera, Carolina-Nicole; Parente, Stephen T
2014-11-01
Billions of dollars have been spent on the goal of making healthcare data available to clinicians and researchers in the hopes of improving healthcare and lowering costs. However, the problems of data governance, distribution, and accessibility remain challenges for the healthcare system to overcome. In this study, we discuss some of the issues around holding, reporting, and distributing data, including the newest "big data" challenge: making the data accessible to researchers and policy makers. This article presents a case study in "big healthcare data" involving the Health Care Cost Institute (HCCI). HCCI is a nonprofit, nonpartisan, independent research institute that serves as a voluntary repository of national commercial healthcare claims data. Governance of large healthcare databases is complicated by the data-holding model and further complicated by issues related to distribution to research teams. For multi-payer healthcare claims databases, the 2 most common models of data holding (mandatory and voluntary) have different data security requirements. Furthermore, data transport and accessibility may require technological investment. HCCI's efforts offer insights from which other data managers and healthcare leaders may benefit when contemplating a data collaborative.
The immune epitope database: a historical retrospective of the first decade.
Salimi, Nima; Fleri, Ward; Peters, Bjoern; Sette, Alessandro
2012-10-01
As the amount of biomedical information available in the literature continues to increase, databases that aggregate this information continue to grow in importance and scope. The population of databases can occur either through fully automated text mining approaches or through manual curation by human subject experts. We here report our experiences in populating the National Institute of Allergy and Infectious Diseases sponsored Immune Epitope Database and Analysis Resource (IEDB, http://iedb.org), which was created in 2003, and as of 2012 captures the epitope information from approximately 99% of all papers published to date that describe immune epitopes (with the exception of cancer and HIV data). This was achieved using a hybrid model based on automated document categorization and extensive human expert involvement. This task required automated scanning of over 22 million PubMed abstracts followed by classification and curation of over 13 000 references, including over 7000 infectious disease-related manuscripts, over 1000 allergy-related manuscripts, roughly 4000 related to autoimmunity, and 1000 transplant/alloantigen-related manuscripts. The IEDB curation involves an unprecedented level of detail, capturing for each paper the actual experiments performed for each different epitope structure. Key to enabling this process was the extensive use of ontologies to ensure rigorous and consistent data representation as well as interoperability with other bioinformatics resources, including the Protein Data Bank, Chemical Entities of Biological Interest, and the NIAID Bioinformatics Resource Centers. A growing fraction of the IEDB data derives from direct submissions by research groups engaged in epitope discovery, and is being facilitated by the implementation of novel data submission tools. The present explosion of information contained in biological databases demands effective query and display capabilities to optimize the user experience. Accordingly, the development of original ways to query the database, on the basis of ontologically driven hierarchical trees, and display of epitope data in aggregate in a biologically intuitive yet rigorous fashion is now at the forefront of the IEDB efforts. We also highlight advances made in the realm of epitope analysis and predictive tools available in the IEDB. © 2012 The Authors. Immunology © 2012 Blackwell Publishing Ltd.
Spatial Data Integration Using Ontology-Based Approach
NASA Astrophysics Data System (ADS)
Hasani, S.; Sadeghi-Niaraki, A.; Jelokhani-Niaraki, M.
2015-12-01
In today's world, the necessity for spatial data for various organizations is becoming so crucial that many of these organizations have begun to produce spatial data for that purpose. In some circumstances, the need to obtain real time integrated data requires sustainable mechanism to process real-time integration. Case in point, the disater management situations that requires obtaining real time data from various sources of information. One of the problematic challenges in the mentioned situation is the high degree of heterogeneity between different organizations data. To solve this issue, we introduce an ontology-based method to provide sharing and integration capabilities for the existing databases. In addition to resolving semantic heterogeneity, better access to information is also provided by our proposed method. Our approach is consisted of three steps, the first step is identification of the object in a relational database, then the semantic relationships between them are modelled and subsequently, the ontology of each database is created. In a second step, the relative ontology will be inserted into the database and the relationship of each class of ontology will be inserted into the new created column in database tables. Last step is consisted of a platform based on service-oriented architecture, which allows integration of data. This is done by using the concept of ontology mapping. The proposed approach, in addition to being fast and low cost, makes the process of data integration easy and the data remains unchanged and thus takes advantage of the legacy application provided.
ALDB: a domestic-animal long noncoding RNA database.
Li, Aimin; Zhang, Junying; Zhou, Zhongyin; Wang, Lei; Liu, Yujuan; Liu, Yajun
2015-01-01
Long noncoding RNAs (lncRNAs) have attracted significant attention in recent years due to their important roles in many biological processes. Domestic animals constitute a unique resource for understanding the genetic basis of phenotypic variation and are ideal models relevant to diverse areas of biomedical research. With improving sequencing technologies, numerous domestic-animal lncRNAs are now available. Thus, there is an immediate need for a database resource that can assist researchers to store, organize, analyze and visualize domestic-animal lncRNAs. The domestic-animal lncRNA database, named ALDB, is the first comprehensive database with a focus on the domestic-animal lncRNAs. It currently archives 12,103 pig intergenic lncRNAs (lincRNAs), 8,923 chicken lincRNAs and 8,250 cow lincRNAs. In addition to the annotations of lincRNAs, it offers related data that is not available yet in existing lncRNA databases (lncRNAdb and NONCODE), such as genome-wide expression profiles and animal quantitative trait loci (QTLs) of domestic animals. Moreover, a collection of interfaces and applications, such as the Basic Local Alignment Search Tool (BLAST), the Generic Genome Browser (GBrowse) and flexible search functionalities, are available to help users effectively explore, analyze and download data related to domestic-animal lncRNAs. ALDB enables the exploration and comparative analysis of lncRNAs in domestic animals. A user-friendly web interface, integrated information and tools make it valuable to researchers in their studies. ALDB is freely available from http://res.xaut.edu.cn/aldb/index.jsp.
NASA Astrophysics Data System (ADS)
Nakagawa, Y.; Kawahara, S.; Araki, F.; Matsuoka, D.; Ishikawa, Y.; Fujita, M.; Sugimoto, S.; Okada, Y.; Kawazoe, S.; Watanabe, S.; Ishii, M.; Mizuta, R.; Murata, A.; Kawase, H.
2017-12-01
Analyses of large ensemble data are quite useful in order to produce probabilistic effect projection of climate change. Ensemble data of "+2K future climate simulations" are currently produced by Japanese national project "Social Implementation Program on Climate Change Adaptation Technology (SI-CAT)" as a part of a database for Policy Decision making for Future climate change (d4PDF; Mizuta et al. 2016) produced by Program for Risk Information on Climate Change. Those data consist of global warming simulations and regional downscaling simulations. Considering that those data volumes are too large (a few petabyte) to download to a local computer of users, a user-friendly system is required to search and download data which satisfy requests of the users. We develop "a database system for near-future climate change projections" for providing functions to find necessary data for the users under SI-CAT. The database system for near-future climate change projections mainly consists of a relational database, a data download function and user interface. The relational database using PostgreSQL is a key function among them. Temporally and spatially compressed data are registered on the relational database. As a first step, we develop the relational database for precipitation, temperature and track data of typhoon according to requests by SI-CAT members. The data download function using Open-source Project for a Network Data Access Protocol (OPeNDAP) provides a function to download temporally and spatially extracted data based on search results obtained by the relational database. We also develop the web-based user interface for using the relational database and the data download function. A prototype of the database system for near-future climate change projections are currently in operational test on our local server. The database system for near-future climate change projections will be released on Data Integration and Analysis System Program (DIAS) in fiscal year 2017. Techniques of the database system for near-future climate change projections might be quite useful for simulation and observational data in other research fields. We report current status of development and some case studies of the database system for near-future climate change projections.
NASA Astrophysics Data System (ADS)
De, Biplab; Adhikari, Indrani; Nandy, Ashis; Saha, Achintya; Goswami, Binoy Behari
2017-06-01
Design and development of antioxidant supplements constitute an essential aspect of research in order to derive molecules that would help to combat the free radical invasion to the human body and curb oxidative stress related diseases. The present work deals with the development of in silico models for a series of thiazolidine derivatives having antioxidant potential. The objective of the work is to obtain models that would help to design new thazolidine derivatives based on substituent modification and thereby predict their activity profile. The QSAR model thus developed helps in quantification of the extent of contribution of the various molecular fragments towards the activity of the molecules, while the 3D pharmacophore model provides a brief idea of the essential molecular features that help the molecules to interact with the neighbouring free radicals. Both the models have been extensively validated which ensures their predictive ability as well the potential to search molecular databases for selection of thiazolidine derivatives with potent antioxidant activity. The models can thus be utilised effectively for database searching with the aim to isolate active antioxidants belonging to the thiazolidine group.
Work injury management model and implication in Hong Kong: a literature review.
Chong, Cecilia Suk-Mei; Cheng, Andy Shu-Kei
2010-01-01
The objective of this review is to explore the work injury management models in literatures and the essential components in different models. The resulting information could be used to develop an integrated holistic model that could be applied in the work injury management system in Hong Kong. A keyword search of MEDLINE and CINAHL databases was conducted. A total of 68 studies related to the management of an injury were found within the above mentioned electronic database. Together with the citation tracking, there were 13 studies left for selection after the exclusion screening. Only 7 out of those 13 studies met the inclusion criteria for review. It is noticeable that the most important component in the injury management model in the reviewed literatures is early intervention. Because of limitations in Employees' Compensation Ordinance in Hong Kong, there is an impetus to have a model and practice guideline for work injury management in Hong Kong to ensure the quality of injury management services. At the end of this paper, the authors propose a work injury management model based on the employees' compensation system in Hong Kong. This model can be used as a reference for those countries adopting similar legislation as in Hong Kong.
Detailed Uncertainty Analysis of the Ares I A106 Liftoff/Transition Database
NASA Technical Reports Server (NTRS)
Hanke, Jeremy L.
2011-01-01
The Ares I A106 Liftoff/Transition Force and Moment Aerodynamics Database describes the aerodynamics of the Ares I Crew Launch Vehicle (CLV) from the moment of liftoff through the transition from high to low total angles of attack at low subsonic Mach numbers. The database includes uncertainty estimates that were developed using a detailed uncertainty quantification procedure. The Ares I Aerodynamics Panel developed both the database and the uncertainties from wind tunnel test data acquired in the NASA Langley Research Center s 14- by 22-Foot Subsonic Wind Tunnel Test 591 using a 1.75 percent scale model of the Ares I and the tower assembly. The uncertainty modeling contains three primary uncertainty sources: experimental uncertainty, database modeling uncertainty, and database query interpolation uncertainty. The final database and uncertainty model represent a significant improvement in the quality of the aerodynamic predictions for this regime of flight over the estimates previously used by the Ares Project. The maximum possible aerodynamic force pushing the vehicle towards the launch tower assembly in a dispersed case using this database saw a 40 percent reduction from the worst-case scenario in previously released data for Ares I.
Clinical records anonymisation and text extraction (CRATE): an open-source software system.
Cardinal, Rudolf N
2017-04-26
Electronic medical records contain information of value for research, but contain identifiable and often highly sensitive confidential information. Patient-identifiable information cannot in general be shared outside clinical care teams without explicit consent, but anonymisation/de-identification allows research uses of clinical data without explicit consent. This article presents CRATE (Clinical Records Anonymisation and Text Extraction), an open-source software system with separable functions: (1) it anonymises or de-identifies arbitrary relational databases, with sensitivity and precision similar to previous comparable systems; (2) it uses public secure cryptographic methods to map patient identifiers to research identifiers (pseudonyms); (3) it connects relational databases to external tools for natural language processing; (4) it provides a web front end for research and administrative functions; and (5) it supports a specific model through which patients may consent to be contacted about research. Creation and management of a research database from sensitive clinical records with secure pseudonym generation, full-text indexing, and a consent-to-contact process is possible and practical using entirely free and open-source software.
Publishing Linked Open Data for Physical Samples - Lessons Learned
NASA Astrophysics Data System (ADS)
Ji, P.; Arko, R. A.; Lehnert, K.; Bristol, S.
2016-12-01
Most data and information about physical samples and associated sampling features currently reside in relational databases. Integrating common concepts from various databases has motivated us to publish Linked Open Data for collections of physical samples, using Semantic Web technologies including the Resource Description Framework (RDF), RDF Query Language (SPARQL), and Web Ontology Language (OWL). The goal of our work is threefold: To evaluate and select ontologies in different granularities for common concepts; to establish best practices and develop a generic methodology for publishing physical sample data stored in relational database as Linked Open Data; and to reuse standard community vocabularies from the International Commission on Stratigraphy (ICS), Global Volcanism Program (GVP), General Bathymetric Chart of the Oceans (GEBCO), and others. Our work leverages developments in the EarthCube GeoLink project and the Interdisciplinary Earth Data Alliance (IEDA) facility for modeling and extracting physical sample data stored in relational databases. Reusing ontologies developed by GeoLink and IEDA has facilitated discovery and integration of data and information across multiple collections including the USGS National Geochemical Database (NGDB), System for Earth Sample Registration (SESAR), and Index to Marine & Lacustrine Geological Samples (IMLGS). We have evaluated, tested, and deployed Linked Open Data tools including Morph, Virtuoso Server, LodView, LodLive, and YASGUI for converting, storing, representing, and querying data in a knowledge base (RDF triplestore). Using persistent identifiers such as Open Researcher & Contributor IDs (ORCIDs) and International Geo Sample Numbers (IGSNs) at the record level makes it possible for other repositories to link related resources such as persons, datasets, documents, expeditions, awards, etc. to samples, features, and collections. This work is supported by the EarthCube "GeoLink" project (NSF# ICER14-40221 and others) and the "USGS-IEDA Partnership to Support a Data Lifecycle Framework and Tools" project (USGS# G13AC00381).
Knobloch, Leanne K; Knobloch-Fedders, Lynne M; Yorgason, Jeremy B; Ebata, Aaron T; McGlaughlin, Patricia C
2017-08-01
This study drew on the relational turbulence model to investigate how the interpersonal dynamics of military couples predict parents' reports of the reintegration difficulty of military children upon homecoming after deployment. Longitudinal data were collected from 118 military couples once per month for 3 consecutive months after reunion. Military couples reported on their depressive symptoms, characteristics of their romantic relationship, and the reintegration difficulty of their oldest child. Results of dyadic growth curve models indicated that the mean levels of parents' depressive symptoms (H1), relationship uncertainty (H2), and interference from a partner (H3) were positively associated with parents' reports of military children's reintegration difficulty. These findings suggest that the relational turbulence model has utility for illuminating the reintegration difficulty of military children during the postdeployment transition. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Research Directions in Database Security IV
1993-07-01
second algorithm, which is based on multiversion timestamp ordering, is that high level transactions can be forced to read arbitrarily old data values...system. The first, the single ver- sion model, stores only the latest veision of each data item, while the second, the 88 multiversion model, stores... Multiversion Database Model In the standard database model, where there is only one version of each data item, all transactions compete for the most recent
Hicks, T; Taroni, F; Curran, J; Buckleton, J; Castella, V; Ribaux, O
2010-10-01
Familial searching consists of searching for a full profile left at a crime scene in a National DNA Database (NDNAD). In this paper we are interested in the circumstance where no full match is returned, but a partial match is found between a database member's profile and the crime stain. Because close relatives share more of their DNA than unrelated persons, this partial match may indicate that the crime stain was left by a close relative of the person with whom the partial match was found. This approach has successfully solved important crimes in the UK and the USA. In a previous paper, a model, which takes into account substructure and siblings, was used to simulate a NDNAD. In this paper, we have used this model to test the usefulness of familial searching and offer guidelines for pre-assessment of the cases based on the likelihood ratio. Siblings of "persons" present in the simulated Swiss NDNAD were created. These profiles (N=10,000) were used as traces and were then compared to the whole database (N=100,000). The statistical results obtained show that the technique has great potential confirming the findings of previous studies. However, effectiveness of the technique is only one part of the story. Familial searching has juridical and ethical aspects that should not be ignored. In Switzerland for example, there are no specific guidelines to the legality or otherwise of familial searching. This article both presents statistical results, and addresses criminological and civil liberties aspects to take into account risks and benefits of familial searching. Copyright © 2009 Elsevier Ireland Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Abiteboul, Serge
1997-01-01
The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in the systems to highly structured in relational database systems. Data is accessible through a variety of interfaces including Web browsers, database query languages, application-specic interfaces, or data exchange formats. Some of this data is raw data, e.g., images or sound. Some of it has structure even if the structure is often implicit, and not as rigid or regular as that found in standard database systems. Sometimes the structure exists but has to be extracted from the data. Sometimes also it exists but we prefer to ignore it for certain purposes such as browsing. We call here semi-structured data this data that is (from a particular viewpoint) neither raw data nor strictly typed, i.e., not table-oriented as in a relational model or sorted-graph as in object databases. As will seen later when the notion of semi-structured data is more precisely de ned, the need for semi-structured data arises naturally in the context of data integration, even when the data sources are themselves well-structured. Although data integration is an old topic, the need to integrate a wider variety of data- formats (e.g., SGML or ASN.1 data) and data found on the Web has brought the topic of semi-structured data to the forefront of research. The main purpose of the paper is to isolate the essential aspects of semi- structured data. We also survey some proposals of models and query languages for semi-structured data. In particular, we consider recent works at Stanford U. and U. Penn on semi-structured data. In both cases, the motivation is found in the integration of heterogeneous data.
NASA Astrophysics Data System (ADS)
Weatherill, G. A.; Pagani, M.; Garcia, J.
2016-09-01
The creation of a magnitude-homogenized catalogue is often one of the most fundamental steps in seismic hazard analysis. The process of homogenizing multiple catalogues of earthquakes into a single unified catalogue typically requires careful appraisal of available bulletins, identification of common events within multiple bulletins and the development and application of empirical models to convert from each catalogue's native scale into the required target. The database of the International Seismological Center (ISC) provides the most exhaustive compilation of records from local bulletins, in addition to its reviewed global bulletin. New open-source tools are developed that can utilize this, or any other compiled database, to explore the relations between earthquake solutions provided by different recording networks, and to build and apply empirical models in order to harmonize magnitude scales for the purpose of creating magnitude-homogeneous earthquake catalogues. These tools are described and their application illustrated in two different contexts. The first is a simple application in the Sub-Saharan Africa region where the spatial coverage and magnitude scales for different local recording networks are compared, and their relation to global magnitude scales explored. In the second application the tools are used on a global scale for the purpose of creating an extended magnitude-homogeneous global earthquake catalogue. Several existing high-quality earthquake databases, such as the ISC-GEM and the ISC Reviewed Bulletins, are harmonized into moment magnitude to form a catalogue of more than 562 840 events. This extended catalogue, while not an appropriate substitute for a locally calibrated analysis, can help in studying global patterns in seismicity and hazard, and is therefore released with the accompanying software.
Physiological Parameters Database for PBPK Modeling (External Review Draft)
EPA released for public comment a physiological parameters database (created using Microsoft ACCESS) intended to be used in PBPK modeling. The database contains physiological parameter values for humans from early childhood through senescence. It also contains similar data for an...
Bioinformatic flowchart and database to investigate the origins and diversity of Clan AA peptidases
Llorens, Carlos; Futami, Ricardo; Renaud, Gabriel; Moya, Andrés
2009-01-01
Background Clan AA of aspartic peptidases relates the family of pepsin monomers evolutionarily with all dimeric peptidases encoded by eukaryotic LTR retroelements. Recent findings describing various pools of single-domain nonviral host peptidases, in prokaryotes and eukaryotes, indicate that the diversity of clan AA is larger than previously thought. The ensuing approach to investigate this enzyme group is by studying its phylogeny. However, clan AA is a difficult case to study due to the low similarity and different rates of evolution. This work is an ongoing attempt to investigate the different clan AA families to understand the cause of their diversity. Results In this paper, we describe in-progress database and bioinformatic flowchart designed to characterize the clan AA protein domain based on all possible protein families through ancestral reconstructions, sequence logos, and hidden markov models (HMMs). The flowchart includes the characterization of a major consensus sequence based on 6 amino acid patterns with correspondence with Andreeva's model, the structural template describing the clan AA peptidase fold. The set of tools is work in progress we have organized in a database within the GyDB project, referred to as Clan AA Reference Database . Conclusion The pre-existing classification combined with the evolutionary history of LTR retroelements permits a consistent taxonomical collection of sequence logos and HMMs. This set is useful for gene annotation but also a reference to evaluate the diversity of, and the relationships among, the different families. Comparisons among HMMs suggest a common ancestor for all dimeric clan AA peptidases that is halfway between single-domain nonviral peptidases and those coded by Ty3/Gypsy LTR retroelements. Sequence logos reveal how all clan AA families follow similar protein domain architecture related to the peptidase fold. In particular, each family nucleates a particular consensus motif in the sequence position related to the flap. The different motifs constitute a network where an alanine-asparagine-like variable motif predominates, instead of the canonical flap of the HIV-1 peptidase and closer relatives. Reviewers This article was reviewed by Daniel H. Haft, Vladimir Kapitonov (nominated by Jerry Jurka), and Ben M. Dunn (nominated by Claus Wilke). PMID:19173708
Aerodynamic Optimization of Rocket Control Surface Geometry Using Cartesian Methods and CAD Geometry
NASA Technical Reports Server (NTRS)
Nelson, Andrea; Aftosmis, Michael J.; Nemec, Marian; Pulliam, Thomas H.
2004-01-01
Aerodynamic design is an iterative process involving geometry manipulation and complex computational analysis subject to physical constraints and aerodynamic objectives. A design cycle consists of first establishing the performance of a baseline design, which is usually created with low-fidelity engineering tools, and then progressively optimizing the design to maximize its performance. Optimization techniques have evolved from relying exclusively on designer intuition and insight in traditional trial and error methods, to sophisticated local and global search methods. Recent attempts at automating the search through a large design space with formal optimization methods include both database driven and direct evaluation schemes. Databases are being used in conjunction with surrogate and neural network models as a basis on which to run optimization algorithms. Optimization algorithms are also being driven by the direct evaluation of objectives and constraints using high-fidelity simulations. Surrogate methods use data points obtained from simulations, and possibly gradients evaluated at the data points, to create mathematical approximations of a database. Neural network models work in a similar fashion, using a number of high-fidelity database calculations as training iterations to create a database model. Optimal designs are obtained by coupling an optimization algorithm to the database model. Evaluation of the current best design then gives either a new local optima and/or increases the fidelity of the approximation model for the next iteration. Surrogate methods have also been developed that iterate on the selection of data points to decrease the uncertainty of the approximation model prior to searching for an optimal design. The database approximation models for each of these cases, however, become computationally expensive with increase in dimensionality. Thus the method of using optimization algorithms to search a database model becomes problematic as the number of design variables is increased.
PACSY, a relational database management system for protein structure and chemical shift analysis.
Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo; Lee, Weontae; Markley, John L
2012-10-01
PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.edu.
ApiEST-DB: analyzing clustered EST data of the apicomplexan parasites.
Li, Li; Crabtree, Jonathan; Fischer, Steve; Pinney, Deborah; Stoeckert, Christian J; Sibley, L David; Roos, David S
2004-01-01
ApiEST-DB (http://www.cbil.upenn.edu/paradbs-servlet/) provides integrated access to publicly available EST data from protozoan parasites in the phylum Apicomplexa. The database currently incorporates a total of nearly 100,000 ESTs from several parasite species of clinical and/or veterinary interest, including Eimeria tenella, Neospora caninum, Plasmodium falciparum, Sarcocystis neurona and Toxoplasma gondii. To facilitate analysis of these data, EST sequences were clustered and assembled to form consensus sequences for each organism, and these assemblies were then subjected to automated annotation via similarity searches against protein and domain databases. The underlying relational database infrastructure, Genomics Unified Schema (GUS), enables complex biologically based queries, facilitating validation of gene models, identification of alternative splicing, detection of single nucleotide polymorphisms, identification of stage-specific genes and recognition of phylogenetically conserved and phylogenetically restricted sequences.
Pavelko, Michael T.
2010-01-01
The water-level database for the Death Valley regional groundwater flow system in Nevada and California was updated. The database includes more than 54,000 water levels collected from 1907 to 2007, from more than 1,800 wells. Water levels were assigned a primary flag and multiple secondary flags that describe hydrologic conditions and trends at the time of the measurement and identify pertinent information about the well or water-level measurement. The flags provide a subjective measure of the relative accuracy of the measurements and are used to identify which water levels are appropriate for calculating head observations in a regional transient groundwater flow model. Included in the report appendix are all water-level data and their flags, selected well data, and an interactive spreadsheet for viewing hydrographs and well locations.
Thoracolumbar spine fractures in frontal impact crashes.
Pintar, Frank A; Yoganandan, Narayan; Maiman, Dennis J; Scarboro, Mark; Rudd, Rodney W
2012-01-01
There is currently no injury assessment for thoracic or lumbar spine fractures in the motor vehicle crash standards throughout the world. Compression-related thoracolumbar fractures are occurring in frontal impacts and yet the mechanism of injury is poorly understood. The objective of this investigation was to characterize these injuries using real world crash data from the US-DOT-NHTSA NASS-CDS and CIREN databases. Thoracic and lumbar AIS vertebral body fracture codes were searched for in the two databases. The NASS database was used to characterize population trends as a function of crash year and vehicle model year. The CIREN database was used to examine a case series in more detail. From the NASS database there were 2000-4000 occupants in frontal impacts with thoracic and lumbar vertebral body fractures per crash year. There was an increasing trend in incidence rate of thoracolumbar fractures in frontal impact crashes as a function of vehicle model year from 1986 to 2008; this was not the case for other crash types. From the CIREN database, the thoracolumbar spine was most commonly fractured at either the T12 or L1 level. Major, burst type fractures occurred predominantly at T12, L1 or L5; wedge fractures were most common at L1. Most CIREN occupants were belted; there were slightly more females involved; they were almost all in bucket seats; impact location occurred approximately half the time on the road and half off the road. The type of object struck also seemed to have some influence on fractured spine level, suggesting that the crash deceleration pulse may be influential in the type of compression vector that migrates up the spinal column. Future biomechanical studies are required to define mechanistically how these fractures are influenced by these many factors.