Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Lozano-Rubí, Raimundo; Serrano-Balazote, Pablo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario
2017-08-18
The objective of this research is to compare relational and non-relational (NoSQL) database systems for storing, retrieving, querying and persisting standardized medical information in the form of ISO/EN 13606 normalized Electronic Health Record XML extracts, both in isolation and concurrently. NoSQL database systems have recently attracted much attention, but few studies in the literature address their direct comparison with relational databases when applied to build the persistence layer of a standardized medical information system. One relational and two NoSQL databases (one document-based and one native XML database) of three different sizes were created in order to evaluate and compare the response times (algorithmic complexity) of six queries of increasing complexity executed against them. Comparable results available in the literature were also considered. Both relational and non-relational (NoSQL) database systems show almost linear query execution times, but with very different slopes: that of the relational system is much steeper than those of the two NoSQL systems. Document-based NoSQL databases perform better under concurrency than in isolation, and also better than relational databases under concurrency. Non-relational NoSQL databases appear more appropriate than standard relational SQL databases when database size is extremely high (secondary use, research applications). Document-based NoSQL databases generally perform better than native XML NoSQL databases. Visualization and editing of EHR extracts are likewise document-based tasks better suited to NoSQL database systems. However, the appropriate database solution depends heavily on each particular situation and specific problem.
Migration from relational to NoSQL database
NASA Astrophysics Data System (ADS)
Ghotiya, Sunita; Mandal, Juhi; Kandasamy, Saravanakumar
2017-11-01
Data generated by real-time applications, social networking sites and sensor devices is huge in volume and unstructured, which makes it difficult for relational database management systems to handle. Data is a precious component of any application and needs to be analysed after being arranged in some structure. Relational databases can only deal with structured data, so there is a need for NoSQL database management systems, which can also deal with semi-structured data. Relational databases provide the easiest way to manage data, but as the use of NoSQL increases it is becoming necessary to migrate data from relational to NoSQL databases. Various frameworks have been proposed previously that provide mechanisms for migrating data stored in SQL warehouses, as well as middle-layer solutions that allow unstructured data to be stored in NoSQL databases. This paper provides a literature review of some of the recent approaches proposed by various researchers to migrate data from relational to NoSQL databases. Some researchers have proposed mechanisms for the co-existence of NoSQL and relational databases. The paper summarises mechanisms for mapping data stored in relational databases to NoSQL databases, along with various techniques for data transformation and middle-layer solutions.
Evaluation of relational and NoSQL database architectures to manage genomic annotations.
Schulz, Wade L; Nelson, Brent G; Felker, Donn K; Durant, Thomas J S; Torres, Richard
2016-12-01
While the adoption of next generation sequencing has rapidly expanded, the informatics infrastructure used to manage the data generated by this technology has not kept pace. Historically, relational databases have provided much of the framework for data storage and retrieval. Newer technologies based on NoSQL architectures may provide significant advantages in storage and query efficiency, thereby reducing the cost of data management. But their relative advantage when applied to biomedical data sets, such as genetic data, has not been characterized. To this end, we compared the storage, indexing, and query efficiency of a common relational database (MySQL), a document-oriented NoSQL database (MongoDB), and a relational database with NoSQL support (PostgreSQL). When used to store genomic annotations from the dbSNP database, we found the NoSQL architectures to outperform traditional, relational models for speed of data storage, indexing, and query retrieval in nearly every operation. These findings strongly support the use of novel database technologies to improve the efficiency of data management within the biological sciences.
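A minimal sketch of the comparison's flavor, assuming the mysql-connector-python and pymongo client libraries; the table layout, field names and credentials are illustrative, not the study's actual schema:

```python
# Storing one dbSNP-style annotation in a relational table versus a
# MongoDB document. Field names are hypothetical.
import mysql.connector
from pymongo import MongoClient

variant = {"rsid": "rs12345", "chrom": "1", "pos": 1014143, "ref": "C", "alt": "T"}

# Relational side: fixed schema, explicit DDL and a parameterized INSERT.
sql = mysql.connector.connect(user="demo", password="demo", database="annot")
cur = sql.cursor()
cur.execute("""CREATE TABLE IF NOT EXISTS snp (
                 rsid VARCHAR(16) PRIMARY KEY,
                 chrom VARCHAR(2), pos BIGINT, ref CHAR(1), alt CHAR(1))""")
cur.execute("INSERT INTO snp (rsid, chrom, pos, ref, alt) VALUES (%s,%s,%s,%s,%s)",
            (variant["rsid"], variant["chrom"], variant["pos"],
             variant["ref"], variant["alt"]))
sql.commit()

# Document side: schema-free insert; an index makes the lookup comparable.
mongo = MongoClient()["annot"]["snp"]
mongo.create_index("rsid")
mongo.insert_one(variant)
print(mongo.find_one({"rsid": "rs12345"}))
```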
Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario
2018-01-01
This research presents a protocol to assess the computational complexity of querying relational and non-relational (NoSQL, "not only Structured Query Language") standardized electronic health record (EHR) medical information systems. It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL with object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed linear behavior in the NoSQL cases. Within the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate for maintaining standardized medical information systems due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results for improved relational systems such as archetype relational mapping (ARM) with the same data. However, interpolation of the doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, for editing and visualization, or for situations where the aim is not only to query medical information but also to restore the EHR in exactly its original form. PMID:29608174
Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario
2018-03-19
This research presents a protocol to assess the computational complexity of querying relational and non-relational (NoSQL, "not only Structured Query Language") standardized electronic health record (EHR) medical information systems. It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL with object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed linear behavior in the NoSQL cases. Within the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate for maintaining standardized medical information systems due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results for improved relational systems such as archetype relational mapping (ARM) with the same data. However, interpolation of the doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, for editing and visualization, or for situations where the aim is not only to query medical information but also to restore the EHR in exactly its original form.
Improved Information Retrieval Performance on SQL Database Using Data Adapter
NASA Astrophysics Data System (ADS)
Husni, M.; Djanali, S.; Ciptaningtyas, H. T.; Wicaksana, I. G. N. A.
2018-02-01
The NoSQL databases, short for Not Only SQL, are increasingly used as the number of big data applications grows. Most systems still use relational databases (RDBs), but as data volumes increase each year, systems turn to NoSQL databases to analyze and access big data more quickly. NoSQL emerged as a result of the exponential growth of the internet and the development of web applications. The query syntax of a NoSQL database differs from that of an SQL database, normally requiring code changes in the application. A data adapter allows applications to keep their SQL query syntax unchanged: it provides methods that synchronize the SQL database with the NoSQL database, together with an interface through which applications can run SQL queries. Hence, this research applied a data adapter system to synchronize data between a MySQL database and Apache HBase using a direct access query approach, in which the system allows the application to accept queries while the synchronization process is in progress. Tests showed that the data adapter can synchronize the SQL database (MySQL) with the NoSQL database (Apache HBase). The system's memory usage ranged from 40% to 60% and its processor usage from 10% to 90%. The tests also showed the NoSQL database performing better than the SQL database.
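The adapter pattern the abstract describes can be sketched roughly as follows, assuming the mysql-connector-python and happybase client libraries; the class name, the queueing strategy and the toy INSERT mapping are invented for illustration and are not the authors' implementation:

```python
# Sketch of a data-adapter layer: SQL writes are applied to MySQL
# immediately and replayed into HBase in the background, so the
# application keeps accepting queries during synchronization.
import queue, threading
import happybase
import mysql.connector

class DataAdapter:
    def __init__(self):
        self.sql = mysql.connector.connect(user="demo", password="demo",
                                           database="app")
        self.hbase = happybase.Connection("localhost")
        self.pending = queue.Queue()
        threading.Thread(target=self._sync, daemon=True).start()

    def execute(self, stmt, params=()):
        cur = self.sql.cursor()
        cur.execute(stmt, params)          # application keeps plain SQL
        self.sql.commit()
        self.pending.put((stmt, params))   # queued for HBase synchronization

    def _sync(self):
        table = self.hbase.table("app")    # assumes column family 'd' exists
        while True:
            stmt, params = self.pending.get()
            # Toy mapping: mirror INSERTs keyed by the first parameter.
            if stmt.lstrip().upper().startswith("INSERT"):
                table.put(str(params[0]).encode(),
                          {b"d:raw": repr(params).encode()})
```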
SQL/NF Translator for the Triton Nested Relational Database System
1990-12-01
SQL/NF Translator for the Triton Nested Relational Database System. Thesis, Craig William Schnepf, Captain (AFIT/GCE/ENG/90D-05), presented to the Faculty of the School of Engineering of the Air Force Institute of Technology, Wright-Patterson AFB, Ohio. The SQL/NF query language used for the nested relational model is an extension of the popular relational model query language SQL.
NASA Astrophysics Data System (ADS)
Dziedzic, Adam; Mulawka, Jan
2014-11-01
NoSQL is a new approach to data storage and manipulation. The aim of this paper is to gain more insight into NoSQL databases, as we are still in the early stages of understanding when and how to use them appropriately. In this submission, descriptions of selected NoSQL databases are presented. Each database is analysed with primary focus on its data model, data access, architecture and practical usage in real applications. Furthermore, the NoSQL databases are compared with respect to data references: relational databases offer foreign keys, whereas NoSQL databases provide only limited references. An intermediate model between graph theory and relational algebra that can address this problem should be created. Finally, a proposal for a new approach to the problem of inconsistent references in Big Data storage systems is introduced.
Flexible network reconstruction from relational databases with Cytoscape and CytoSQL
2010-01-01
Background Molecular interaction networks can be efficiently studied using network visualization software such as Cytoscape. The relevant nodes, edges and their attributes can be imported into Cytoscape in various file formats, or directly from external databases through specialized third-party plugins. However, molecular data are often stored in relational databases with their own specific structure, for which dedicated plugins do not exist. Therefore, a more generic solution is presented. Results A new Cytoscape plugin 'CytoSQL' is developed to connect Cytoscape to any relational database. It allows users to launch SQL ('Structured Query Language') queries from within Cytoscape, with the option to inject node or edge features of an existing network as SQL arguments, and to convert the retrieved data to Cytoscape network components. Supported by a set of case studies, we demonstrate the flexibility and the power of the CytoSQL plugin in converting specific data subsets into meaningful network representations. Conclusions CytoSQL offers a unified approach to let Cytoscape interact with relational databases. Thanks to the power of the SQL syntax, this tool can rapidly generate and enrich networks according to very complex criteria. The plugin is available at http://www.ptools.ua.ac.be/CytoSQL. PMID:20594316
Flexible network reconstruction from relational databases with Cytoscape and CytoSQL.
Laukens, Kris; Hollunder, Jens; Dang, Thanh Hai; De Jaeger, Geert; Kuiper, Martin; Witters, Erwin; Verschoren, Alain; Van Leemput, Koenraad
2010-07-01
Molecular interaction networks can be efficiently studied using network visualization software such as Cytoscape. The relevant nodes, edges and their attributes can be imported into Cytoscape in various file formats, or directly from external databases through specialized third-party plugins. However, molecular data are often stored in relational databases with their own specific structure, for which dedicated plugins do not exist. Therefore, a more generic solution is presented. A new Cytoscape plugin 'CytoSQL' is developed to connect Cytoscape to any relational database. It allows users to launch SQL ('Structured Query Language') queries from within Cytoscape, with the option to inject node or edge features of an existing network as SQL arguments, and to convert the retrieved data to Cytoscape network components. Supported by a set of case studies, we demonstrate the flexibility and the power of the CytoSQL plugin in converting specific data subsets into meaningful network representations. CytoSQL offers a unified approach to let Cytoscape interact with relational databases. Thanks to the power of the SQL syntax, this tool can rapidly generate and enrich networks according to very complex criteria. The plugin is available at http://www.ptools.ua.ac.be/CytoSQL.
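The core mechanism, injecting features of existing nodes as SQL arguments and converting the rows back into network components, can be sketched generically; the sketch below uses sqlite3 so it is self-contained, and the table and column names are hypothetical:

```python
# Expanding a network by querying an interaction table with the IDs of
# nodes already on the canvas, the general pattern CytoSQL automates.
import sqlite3  # any DB-API driver works the same way

def expand_network(conn, node_ids):
    # Build one placeholder per injected node feature; chunk node_ids
    # for very large networks (sqlite3 limits bound variables).
    placeholders = ",".join("?" for _ in node_ids)
    rows = conn.execute(
        f"SELECT source, target, score FROM interactions "
        f"WHERE source IN ({placeholders})", node_ids).fetchall()
    # Each row becomes a candidate edge (source, target, attributes).
    return [{"source": s, "target": t, "score": sc} for s, t, sc in rows]
```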
Lee, Ken Ka-Yin; Tang, Wai-Choi; Choi, Kup-Sze
2013-04-01
Clinical data are dynamic in nature, often arranged hierarchically and stored as free text and numbers. Effective management of clinical data and the transformation of the data into a structured format for data analysis are therefore challenging issues in electronic health records development. Despite the popularity of relational databases, the scalability of the NoSQL database model and the document-centric data structure of XML databases appear to be promising features for effective clinical data management. In this paper, three database approaches--NoSQL, XML-enabled and native XML--are investigated to evaluate their suitability for structured clinical data. The database query performance is reported, together with our experience in developing the databases. The results show that the NoSQL database is the best choice for query speed, whereas XML databases are advantageous in terms of scalability, flexibility and extensibility, which are essential to cope with the characteristics of clinical data. While NoSQL and XML technologies are relatively new compared to conventional relational databases, both demonstrate the potential to become key database technologies for clinical data management as they further advance.
2012-09-01
This thesis evaluates the relative performance of several conventional SQL and NoSQL databases with a set of one billion file block hashes. Keywords: digital forensics, sector hashing, NoSQL ("Not only SQL") non-relational database management, NSRL (National Software Reference Library).
Hewitt, Robin; Gobbi, Alberto; Lee, Man-Ling
2005-01-01
Relational databases are the current standard for storing and retrieving data in the pharmaceutical and biotech industries. However, retrieving data from a relational database requires specialized knowledge of the database schema and of the SQL query language. At Anadys, we have developed an easy-to-use system for searching and reporting data in a relational database to support our drug discovery project teams. This system is fast and flexible and allows users to access all data without having to write SQL queries. This paper presents the hierarchical, graph-based metadata representation and SQL-construction methods that, together, are the basis of this system's capabilities.
Performance Evaluation of NoSQL Databases: A Case Study
2015-02-01
The system was built on a centralized relational database, and the customer decided to consider NoSQL technologies for two specific uses, among them the primary data store. The choice of a particular NoSQL database imposes a specific distributed software architecture and data model, and is a major determinant of the resulting system's qualities.
DOE Office of Scientific and Technical Information (OSTI.GOV)
David Nix, Lisa Simirenko
2006-10-25
The BioImaging Database (BID) is a relational database developed to store the data and meta-data for 3D gene expression in early Drosophila embryo development at a cellular level. The schema was written to be used with the MySQL DBMS but, with minor modifications, can be used on any SQL-compliant relational DBMS.
Network Configuration of Oracle and Database Programming Using SQL
NASA Technical Reports Server (NTRS)
Davis, Melton; Abdurrashid, Jibril; Diaz, Philip; Harris, W. C.
2000-01-01
A database can be defined as a collection of information organized in such a way that it can be retrieved and used. A database management system (DBMS) can further be defined as the tool that enables us to manage and interact with the database. The Oracle 8 Server is a state-of-the-art information management environment. It is a repository for very large amounts of data, and gives users rapid access to that data. The Oracle 8 Server allows for sharing of data between applications; the information is stored in one place and used by many systems. My research will focus primarily on SQL (Structured Query Language) programming. SQL is the way you define and manipulate data in Oracle's relational database. SQL is the industry standard adopted by all database vendors. When programming with SQL, you work on sets of data (i.e., information is not processed one record at a time).
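A small self-contained illustration of that set-at-a-time style, using sqlite3 rather than Oracle so it runs anywhere; the SQL itself would be identical in Oracle:

```python
# One UPDATE touches every qualifying row at once; there is no
# per-record loop in the client, which is the point being made above.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary REAL)")
db.executemany("INSERT INTO emp VALUES (?,?,?)",
               [("a", "ops", 100.0), ("b", "ops", 110.0), ("c", "eng", 120.0)])
db.execute("UPDATE emp SET salary = salary * 1.05 WHERE dept = 'ops'")
print(db.execute("SELECT name, salary FROM emp").fetchall())
```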
NASA Technical Reports Server (NTRS)
McGlynn, T.; Santisteban, M.
2007-01-01
This chapter provides a very brief introduction to the Structured Query Language (SQL) for getting information from relational databases. We make no pretense that this is a complete or comprehensive discussion of SQL; there are many aspects of the language that will be completely ignored in the presentation. The goal here is to provide enough background so that users understand the basic concepts involved in building and using relational databases. We also go through the steps involved in building a particular astronomical database used in some of the other presentations in this volume.
Information Security Considerations for Applications Using Apache Accumulo
2014-09-01
BigTable can process 20 petabytes per day [14] and offers a high degree of scalability on commodity hardware. NoSQL databases do not rely on the highly structured data manipulation of relational databases such as MySQL [13]; each NoSQL database has a unique programming interface that uses a lower-level procedural language (e.g., Java).
Comparison of the Frontier Distributed Database Caching System to NoSQL Databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dykstra, Dave
One of the main attractions of non-relational NoSQL databases is their ability to scale to large numbers of readers, including readers spread over a wide area. The Frontier distributed database caching system, used in production by the Large Hadron Collider CMS and ATLAS detector projects for Conditions data, is based on traditional SQL databases but also adds high scalability and the ability to be distributed over a wide area for an important subset of applications. This paper compares the major characteristics of the two different approaches and identifies the criteria for choosing which approach to prefer over the other. It also compares in some detail the NoSQL databases used by CMS and ATLAS: MongoDB, CouchDB, HBase, and Cassandra.
Research on high availability architecture of SQL and NoSQL
NASA Astrophysics Data System (ADS)
Wang, Zhiguo; Wei, Zhiqiang; Liu, Hao
2017-03-01
With the advent of the era of big data, the amount and importance of data have increased dramatically. SQL databases continue to develop in performance and scalability, but more and more companies tend to use NoSQL databases, because the NoSQL data model is simpler and its extension capacity greater than SQL's. Almost all database designers, for both SQL and NoSQL databases, aim to improve performance and ensure availability through reasonable architecture that reduces the effects of software and hardware failures, so that they can provide a better experience for their customers. In this paper, I mainly discuss the architectures of MySQL, MongoDB, and Redis, which are highly available and have been deployed in practical application environments, and design a hybrid architecture.
2002-06-01
In the current version, the Web Server is on the same server as the SWORD database; a request flows from an SQL query against the database to a results set and on to a dynamic HTML page. The prototype can still be supported by Access, but SQL Server would be a more viable tool for a fully developed application based on the number of potential users.
Integrating Scientific Array Processing into Standard SQL
NASA Astrophysics Data System (ADS)
Misev, Dimitar; Bachhuber, Johannes; Baumann, Peter
2014-05-01
We live in a time that is dominated by data. Data storage is cheap and more applications than ever accrue vast amounts of data. Storing the emerging multidimensional data sets efficiently, however, and allowing them to be queried by their inherent structure, is a challenge many databases have to face today. Despite the fact that multidimensional array data is almost always linked to additional, non-array information, array databases have mostly developed separately from relational systems, resulting in a disparity between the two database categories. The current SQL standard and SQL DBMSs support arrays - and in an extension also multidimensional arrays - but do so in a very rudimentary and inefficient way. This poster demonstrates the practicality of an SQL extension for array processing, implemented in a proof-of-concept multi-faceted system that manages a federation of array and relational database systems, providing transparent, efficient and scalable access to the heterogeneous data in them.
An effective model for store and retrieve big health data in cloud computing.
Goli-Malekabadi, Zohreh; Sargolzaei-Javan, Morteza; Akbari, Mohammad Kazem
2016-08-01
The volume of healthcare data, including different and variable text types, sounds, and images, is increasing day by day, so the storage and processing of these data is a necessary and challenging issue. Generally, relational databases are used for storing health data, but they cannot handle its massive and diverse nature. This study aimed at presenting a model based on NoSQL databases for the storage of healthcare data. Among the different types of NoSQL databases, document-based DBs were selected based on a survey of the nature of health data. The presented model was implemented in a cloud environment to exploit its distribution properties, and the data were then distributed across the database by applying the shard property. The efficiency of the model was evaluated against the previous data model, a relational database, considering query time, data preparation, flexibility, and extensibility. The results showed that the presented model performed approximately the same as SQL Server for "read" queries while acting more efficiently than SQL Server for "write" queries. The performance of the presented model was also better than SQL Server in terms of flexibility, data preparation and extensibility. Based on these observations, the proposed model was more effective than relational databases for handling health data.
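The document-based storage idea can be sketched with pymongo; the collection layout below is illustrative and not the paper's exact model:

```python
# Heterogeneous clinical entries coexist in one collection without
# schema changes: text notes, numeric vitals, and an image reference.
from pymongo import MongoClient

records = MongoClient()["health"]["records"]

records.insert_one({"patient": "p1", "type": "note",
                    "text": "stable, follow up in 2 weeks"})
records.insert_one({"patient": "p1", "type": "vitals",
                    "bp": [120, 80], "pulse": 72})
records.insert_one({"patient": "p1", "type": "image",
                    "modality": "x-ray", "file_id": "gridfs:abc123"})

# One query retrieves the patient's whole mixed record set.
for doc in records.find({"patient": "p1"}):
    print(doc["type"])
```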
Technical Aspects of Interfacing MUMPS to an External SQL Relational Database Management System
Kuzmak, Peter M.; Walters, Richard F.; Penrod, Gail
1988-01-01
This paper describes an interface connecting InterSystems MUMPS (M/VX) to an external relational DBMS, the SYBASE Database Management System. The interface enables MUMPS to operate in a relational environment and gives the MUMPS language full access to a complete set of SQL commands. MUMPS generates SQL statements as ASCII text and sends them to the RDBMS. The RDBMS executes the statements and returns ASCII results to MUMPS. The interface suggests that the language features of MUMPS make it an attractive tool for use in the relational database environment. The approach described in this paper separates MUMPS from the relational database. Positioning the relational database outside of MUMPS promotes data sharing and permits a number of different options to be used for working with the data. Other languages like C, FORTRAN, and COBOL can access the RDBMS database. Advanced tools provided by the relational database vendor can also be used. SYBASE is an advanced high-performance transaction-oriented relational database management system for the VAX/VMS and UNIX operating systems. SYBASE is designed using a distributed open-systems architecture, and is relatively easy to interface with MUMPS.
Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency.
Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio
2015-01-01
Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.
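A minimal sketch of such write-oriented persistence, assuming the DataStax cassandra-driver package; the keyspace and table schema are illustrative only:

```python
# Writing processed reads into Cassandra. Writes are cheap in
# Cassandra's log-structured storage model, which is why write-heavy
# genomic workloads are a natural fit.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()
session.execute("""CREATE KEYSPACE IF NOT EXISTS genomics
                   WITH replication = {'class': 'SimpleStrategy',
                                       'replication_factor': 1}""")
session.execute("""CREATE TABLE IF NOT EXISTS genomics.reads (
                   sample text, read_id text, sequence text,
                   PRIMARY KEY (sample, read_id))""")
session.execute("INSERT INTO genomics.reads (sample, read_id, sequence) "
                "VALUES (%s, %s, %s)", ("S1", "r0001", "ACGTACGT"))
```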
An Experimental Investigation of Complexity in Database Query Formulation Tasks
ERIC Educational Resources Information Center
Casterella, Gretchen Irwin; Vijayasarathy, Leo
2013-01-01
Information Technology professionals and other knowledge workers rely on their ability to extract data from organizational databases to respond to business questions and support decision making. Structured query language (SQL) is the standard programming language for querying data in relational databases, and SQL skills are in high demand and are…
Architecture Knowledge for Evaluating Scalable Databases
2015-01-16
Architects face problems arising from the proliferation of new data models and distributed technologies for building scalable, available data stores. No longer are relational databases the de facto standard for building data repositories: highly distributed, scalable "NoSQL" databases [11] have emerged. This is especially challenging at the data storage layer, where the multitude of competing NoSQL database technologies creates a complex and rapidly evolving design space.
Maintaining Multimedia Data in a Geospatial Database
2012-09-01
A look at PostgreSQL and MySQL as spatial databases was offered. Given their results, as each database produced result sets from zero to 100,000 rows, it was possible to see which database excelled given multiple conditions, and to benchmark the data retrieved from each table.
Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency
Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio
2015-01-01
Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254
Learning Asset Technology Integration Support Tool Design Document
2010-05-11
LATIST is built with the scripting language known as Hypertext Preprocessor (PHP) and with MySQL, a relational database management system that can also be used for content management. The LATIST tool will be implemented on a WordPress platform with MySQL as the database, and the system must work effectively with MySQL. When designing the LATIST system there are several considerations which must be accounted for in the working prototype.
Cloud-Based Distributed Control of Unmanned Systems
2015-04-01
During mission execution, at best the data is saved onto hard drives and is accessible only by the local team, so no usable data history is available. The following open source technologies are chosen to implement the back-end database and server: GeoServer, OpenLayers, PostgreSQL, and PostGIS. GeoServer and OpenLayers serve geospatial map data, and PostgreSQL is an SQL-compliant object-relational database that easily scales to accommodate large amounts of data.
Freire, Sergio Miranda; Teodoro, Douglas; Wei-Kleiner, Fang; Sundvall, Erik; Karlsson, Daniel; Lambrix, Patrick
2016-01-01
This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest. PMID:26958859
Freire, Sergio Miranda; Teodoro, Douglas; Wei-Kleiner, Fang; Sundvall, Erik; Karlsson, Daniel; Lambrix, Patrick
2016-01-01
This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest.
TabSQL: a MySQL tool to facilitate mapping user data to public databases.
Xia, Xiao-Qin; McClelland, Michael; Wang, Yipeng
2010-06-23
With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data.
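The kind of cross-query TabSQL enables can be sketched as a plain SQL join; the example below uses sqlite3 to stay self-contained, and the table names are hypothetical rather than TabSQL's actual schema:

```python
# A user's gene list joined against an imported annotation table,
# filtering and annotating in one statement instead of custom scripts.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE user_genes (gene TEXT, fold_change REAL)")
db.execute("CREATE TABLE go_annot (gene TEXT, go_term TEXT)")
db.executemany("INSERT INTO user_genes VALUES (?,?)",
               [("TP53", 2.1), ("BRCA1", -1.4)])
db.executemany("INSERT INTO go_annot VALUES (?,?)",
               [("TP53", "GO:0006915"), ("BRCA1", "GO:0006281")])
rows = db.execute("""SELECT u.gene, u.fold_change, a.go_term
                     FROM user_genes u JOIN go_annot a ON u.gene = a.gene
                     WHERE u.fold_change > 1.5""").fetchall()
print(rows)  # [('TP53', 2.1, 'GO:0006915')]
```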
TabSQL: a MySQL tool to facilitate mapping user data to public databases
2010-01-01
Background With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. Results We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. Conclusions TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data. PMID:20573251
Comparison of the Frontier Distributed Database Caching System to NoSQL Databases
NASA Astrophysics Data System (ADS)
Dykstra, Dave
2012-12-01
One of the main attractions of non-relational “NoSQL” databases is their ability to scale to large numbers of readers, including readers spread over a wide area. The Frontier distributed database caching system, used in production by the Large Hadron Collider CMS and ATLAS detector projects for Conditions data, is based on traditional SQL databases but also adds high scalability and the ability to be distributed over a wide-area for an important subset of applications. This paper compares the major characteristics of the two different approaches and identifies the criteria for choosing which approach to prefer over the other. It also compares in some detail the NoSQL databases used by CMS and ATLAS: MongoDB, CouchDB, HBase, and Cassandra.
Evaluating a NoSQL Alternative for Chilean Virtual Observatory Services
NASA Astrophysics Data System (ADS)
Antognini, J.; Araya, M.; Solar, M.; Valenzuela, C.; Lira, F.
2015-09-01
Currently, the standards and protocols for data access in the Virtual Observatory architecture (DAL) are generally implemented with relational databases based on SQL. In particular, the Astronomical Data Query Language (ADQL), the language used by the IVOA to represent queries to VO services, was created to satisfy the different data access protocols, such as Simple Cone Search. ADQL is based on SQL92 and has extra functionality implemented using PgSphere. An emerging alternative to SQL is the family of so-called NoSQL databases, which can be classified into several categories such as column, document, key-value, graph and object stores, each recommended for different scenarios. Among their notable characteristics are schema-free design, easy replication support, simple APIs and Big Data readiness. The Chilean Virtual Observatory (ChiVO) is developing a functional prototype based on the IVOA architecture, with the following relevant factors: performance, scalability, flexibility, complexity, and functionality. Currently, it is very difficult to compare these factors due to a lack of alternatives. The objective of this paper is to compare NoSQL alternatives with SQL through the implementation of a REST Web API that satisfies ChiVO's needs: a SESAME-style name resolver for the data from ALMA. We therefore propose a test scenario, configuring a NoSQL database with data from different sources and evaluating the feasibility and performance of a Simple Cone Search service built on it. This comparison will help pave the way for the application of Big Data databases in the Virtual Observatory.
Comprehensive Routing Security Development and Deployment for the Internet
2015-02-01
RPSTIR depends on several other open source packages. MySQL, a widely used and popular open source database package, was chosen for database support and holds the local RPKI database cache. OpenSSL provides the cryptographic libraries for X.509 certificates. The ODBC MySQL Connector supplies ODBC (Open Database Connectivity), a standard programming interface (API) to the database.
Use of Graph Database for the Integration of Heterogeneous Biological Data.
Yoon, Byoung-Ha; Kim, Seon-Kyu; Kim, Seon-Young
2017-03-01
Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
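The contrast the authors measured can be illustrated with a single Cypher pattern match that would require several JOINs in MySQL; this sketch assumes the official neo4j Python driver, and the labels, relationship types and credentials are illustrative:

```python
# A multi-hop traversal (drug -> protein -> disease) expressed as one
# Cypher pattern; the equivalent SQL needs a join per hop.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "pass"))
with driver.session() as s:
    result = s.run(
        "MATCH (d:Drug)-[:TARGETS]->(p:Protein)"
        "-[:ASSOCIATED_WITH]->(x:Disease) "
        "WHERE x.name = $disease RETURN d.name LIMIT 10",
        disease="melanoma")
    for record in result:
        print(record["d.name"])
driver.close()
```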
Use of Graph Database for the Integration of Heterogeneous Biological Data
Yoon, Byoung-Ha; Kim, Seon-Kyu
2017-01-01
Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data. PMID:28416946
NASA Astrophysics Data System (ADS)
Velazquez, Enrique Israel
Improvements in medical and genomic technologies have dramatically increased the production of electronic data over the last decade. As a result, data management is rapidly becoming a major determinant, and urgent challenge, for the development of Precision Medicine. Although successful data management is achievable using Relational Database Management Systems (RDBMS), exponential data growth is a significant contributor to failure scenarios. Growing amounts of data can also be observed in other sectors, such as economics and business, which, together with the previous facts, suggests that alternate database approaches (NoSQL) may soon be required for efficient storage and management of big databases. However, this hypothesis has been difficult to test in the Precision Medicine field since alternate database architectures are complex to assess and means to integrate heterogeneous electronic health records (EHR) with dynamic genomic data are not easily available. In this dissertation, we present a novel set of experiments for identifying NoSQL database approaches that enable effective data storage and management in Precision Medicine using patients' clinical and genomic information from The Cancer Genome Atlas (TCGA). The first experiment draws on performance and scalability from biologically meaningful queries with differing complexity and database sizes. The second experiment measures performance and scalability in database updates without schema changes. The third experiment assesses performance and scalability in database updates with schema modifications due to dynamic data. We have identified two NoSQL approaches, based on Cassandra and Redis, which seem to be the ideal database management systems for our precision medicine queries in terms of performance and scalability. We present NoSQL approaches and show how they can be used to manage clinical and genomic big data. Our research is relevant to public health since we are focusing on one of the main challenges to the development of Precision Medicine and, consequently, investigating a potential solution to the progressively increasing demands on health care.
Database constraints applied to metabolic pathway reconstruction tools.
Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi
2014-01-01
Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes.
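The parameter tuning described above is typically done by inspecting and adjusting server variables; a small sketch assuming mysql-connector-python, with example values only, since the right settings depend on workload and hardware:

```python
# Inspect and adjust MySQL server variables from a client session.
import mysql.connector

conn = mysql.connector.connect(user="root", password="root")
cur = conn.cursor()
cur.execute("SHOW VARIABLES LIKE 'innodb_buffer_pool_size'")
print(cur.fetchone())                            # current buffer pool size
cur.execute("SET GLOBAL max_connections = 500")  # a dynamic variable
# Static variables (and, before MySQL 5.7.5, the buffer pool size) must
# instead be set in my.cnf and require a server restart.
```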
PACSY, a relational database management system for protein structure and chemical shift analysis.
Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo; Lee, Weontae; Markley, John L
2012-10-01
PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.edu.
Integrated Array/Metadata Analytics
NASA Astrophysics Data System (ADS)
Misev, Dimitar; Baumann, Peter
2015-04-01
Data comes in various forms and types, and integration usually presents a problem that is often simply ignored and solved with ad-hoc solutions. Multidimensional arrays are a ubiquitous data type that we find at the core of virtually all science and engineering domains, as sensor, model, image, and statistics data. Naturally, arrays are richly described by and intertwined with additional metadata (alphanumeric relational data, XML, JSON, etc.). Database systems, however, a fundamental building block of what we call "Big Data", lack adequate support for modelling and expressing these array data/metadata relationships. Array analytics is hence quite primitive or non-existent in modern relational DBMS. Recognizing this, we extended SQL with a new SQL/MDA part, seamlessly integrating multidimensional array analytics into the standard database query language. We demonstrate the benefits of SQL/MDA with real-world examples executed in ASQLDB, an open-source mediator system based on HSQLDB and rasdaman that already implements SQL/MDA.
Flexible Decision Support in Device-Saturated Environments
2003-10-01
The system can also output tuples to a remote MySQL or Postgres database. The GUI allows the user to pose queries using SQL and to display query results, and it can query archived results as well. DatabaseConnection.java handles connections to an external database (such as MySQL or Postgres), and Debug.java contains the code for printing debug messages. It is possible to output the results of queries to a MySQL or Postgres database for archival.
The MAO NASU Plate Archive Database. Current Status and Perspectives
NASA Astrophysics Data System (ADS)
Pakuliak, L. K.; Sergeeva, T. P.
2006-04-01
The preliminary online version of the database of the MAO NASU plate archive is built on the relational database management system MySQL. It permits easy supplementing of the database with new collections of astronegatives, provides high flexibility in constructing SQL queries for data search optimization, PHP Basic-Authorization-protected access to the administrative interface, and a wide range of search parameters. The current status of the database will be reported, and a brief description of the search engine and the means of supporting database integrity will be given. Methods and means of data verification and tasks for further development will be discussed.
PACSY, a relational database management system for protein structure and chemical shift analysis
Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo
2012-01-01
PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.edu. PMID:22903636
DOE Office of Scientific and Technical Information (OSTI.GOV)
Femec, D.A.
This report describes two code-generating tools used to speed design and implementation of relational databases and user interfaces: CREATE-SCHEMA and BUILD-SCREEN. CREATE-SCHEMA produces the SQL commands that actually create and define the database. BUILD-SCREEN takes templates for data entry screens and generates the screen management system routine calls to display the desired screen. Both tools also generate the related FORTRAN declaration statements and precompiled SQL calls. Included with this report is the source code for a number of FORTRAN routines and functions used by the user interface. This code is broadly applicable to a number of different databases.
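A toy analogue of the CREATE-SCHEMA idea, generating SQL DDL from a declarative template; the template format is invented for illustration (the report's actual tools also emitted FORTRAN declarations and precompiled SQL calls):

```python
# Generate CREATE TABLE statements from a declarative schema template.
TEMPLATE = {
    "sample": {"id": "INTEGER PRIMARY KEY",
               "name": "VARCHAR(64) NOT NULL",
               "collected": "DATE"},
}

def create_schema(template):
    stmts = []
    for table, cols in template.items():
        body = ",\n  ".join(f"{col} {ctype}" for col, ctype in cols.items())
        stmts.append(f"CREATE TABLE {table} (\n  {body}\n);")
    return "\n".join(stmts)

print(create_schema(TEMPLATE))
```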
Analysis and Development of a Web-Enabled Planning and Scheduling Database Application
2013-09-01
This thesis establishes an entity-relationship diagram for the desired process, constructs an operable database using MySQL, and provides a web-enabled interface for the population of the database. Keywords: development, design, process reengineering, MySQL, structured query language (SQL), phpMyAdmin.
Database Constraints Applied to Metabolic Pathway Reconstruction Tools
Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi
2014-01-01
Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes. PMID:25202745
Wiley, Laura K.; Sivley, R. Michael; Bush, William S.
2013-01-01
Efficient storage and retrieval of genomic annotations based on range intervals is necessary, given the amount of data produced by next-generation sequencing studies. The indexing strategies of relational database systems (such as MySQL) greatly inhibit their use in genomic annotation tasks. This has led to the development of stand-alone applications that are dependent on flat-file libraries. In this work, we introduce MyNCList, an implementation of the NCList data structure within a MySQL database. MyNCList enables the storage, update and rapid retrieval of genomic annotations from the convenience of a relational database system. Range-based annotations of 1 million variants are retrieved in under a minute, making this approach feasible for whole-genome annotation tasks. Database URL: https://github.com/bushlab/mynclist PMID:23894185
Levy, C.; Beauchamp, C.
1996-01-01
This poster describes the methods used and working prototype that was developed from an abstraction of the relational model from the VA's hierarchical DHCP database. Overlaying the relational model on DHCP permits multiple user views of the physical data structure, enhances access to the database by providing a link to commercial (SQL based) software, and supports a conceptual managed care data model based on primary and longitudinal patient care. The goal of this work was to create a relational abstraction of the existing hierarchical database; to construct, using SQL data definition language, user views of the database which reflect the clinical conceptual view of DHCP, and to allow the user to work directly with the logical view of the data using GUI based commercial software of their choosing. The workstation is intended to serve as a platform from which a managed care information model could be implemented and evaluated.
NoSQL: collection document and cloud by using a dynamic web query form
NASA Astrophysics Data System (ADS)
Abdalla, Hemn B.; Lin, Jinzhao; Li, Guoquan
2015-07-01
MongoDB (from "humongous") is an open-source document database and a leading NoSQL database. NoSQL ("Not Only SQL") denotes a class of next-generation, non-relational, open-source and horizontally scalable databases that provide a mechanism for storing and retrieving documents. Whereas data were previously stored and retrieved with SQL queries, here we use MongoDB and therefore issue no MySQL or SQL queries at all: document files are imported directly into a folder (drive) using an IO BufferReader, and retrieved from that folder using a BufferWriter. We also provide security for the stored files, since a document kept in a plain local folder could be viewed and modified by anyone. To prevent this, the original document files are converted to another format, in this paper a binary format, before being stored; at storage time the storage space issues a private key for accessing each file. Any user who tries to open a stored document sees only its binary form, and only the file's owner can view the original format, using the personal secret key received from the cloud.
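A minimal sketch of the store-as-binary idea using pymongo; the collection layout, field names, and key scheme are illustrative assumptions, and real protection would use proper encryption rather than a key comparison.

    import secrets
    from pymongo import MongoClient
    from bson.binary import Binary

    client = MongoClient("mongodb://localhost:27017")
    files = client["docstore"]["files"]

    def store(name, path):
        """Import a document file directly, stored as binary plus a private key."""
        key = secrets.token_hex(16)  # stands in for the cloud-issued secret key
        with open(path, "rb") as fh:
            files.insert_one({"name": name, "payload": Binary(fh.read()), "key": key})
        return key

    def retrieve(name, key):
        """Return the original bytes only when the owner's key matches."""
        doc = files.find_one({"name": name})
        if doc is None or doc["key"] != key:
            raise PermissionError("wrong or missing access key")
        return bytes(doc["payload"])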
Epstein, Richard H; Dexter, Franklin
2017-07-01
Comorbidity adjustment is often performed during outcomes and health care resource utilization research. Our goal was to develop an efficient algorithm in structured query language (SQL) to determine the Elixhauser comorbidity index. We wrote an SQL algorithm to calculate the Elixhauser comorbidities from Diagnosis Related Group and International Classification of Diseases (ICD) codes. Validation was by comparison to expected comorbidities from combinations of these codes and to the 2013 Nationwide Readmissions Database (NRD). The SQL algorithm matched perfectly with expected comorbidities for all combinations of ICD-9 or ICD-10, and Diagnosis Related Groups. Of 13 585 859 evaluable NRD records, the algorithm matched 100% of the listed comorbidities. Processing time was ∼0.05 ms/record. The SQL Elixhauser code was efficient and computationally identical to the SAS algorithm used for the NRD. This algorithm may be useful where preprocessing of large datasets in a relational database environment and comorbidity determination is desired before statistical analysis. A validated SQL procedure to calculate Elixhauser comorbidities and the van Walraven index from ICD-9 or ICD-10 discharge diagnosis codes has been published.
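A toy version of the set-based idea, run here in sqlite3; the ICD-9 prefix mapping shown is an illustrative subset, not the validated Elixhauser/van Walraven mapping from the paper.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE diagnoses (visit_id INTEGER, icd9 TEXT);
        CREATE TABLE icd_map (icd9_prefix TEXT, comorbidity TEXT);
        INSERT INTO diagnoses VALUES (1, '4280'), (1, '25000'), (2, '4111');
        -- illustrative subset only: CHF and diabetes prefixes
        INSERT INTO icd_map VALUES ('428', 'CHF'), ('250', 'DIABETES');
    """)
    # One row per (visit, comorbidity) present; set-based, no per-record looping.
    rows = con.execute("""
        SELECT DISTINCT d.visit_id, m.comorbidity
        FROM diagnoses d
        JOIN icd_map m ON d.icd9 LIKE m.icd9_prefix || '%'
        ORDER BY d.visit_id
    """).fetchall()
    print(rows)  # e.g. [(1, 'CHF'), (1, 'DIABETES')]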
Joint Battlespace Infosphere: Information Management Within a C2 Enterprise
2005-06-01
using. In version 1.2, we support both MySQL and Oracle as underlying implementations where the XML metadata schema is mapped into relational tables in...Identity Servers, Role-Based Access Control, and Policy Representation – Databases: Oracle, MySQL, TigerLogic, Berkeley XML DB...Instrumentation Services...converted to SQL for execution. Invocations are then forwarded to the appropriate underlying IOR core components that have the responsibility of issuing
Photo-z-SQL: Photometric redshift estimation framework
NASA Astrophysics Data System (ADS)
Beck, Róbert; Dobos, László; Budavári, Tamás; Szalay, Alexander S.; Csabai, István
2017-04-01
Photo-z-SQL is a flexible template-based photometric redshift estimation framework that can be seamlessly integrated into a SQL database (or DB) server and executed on demand in SQL. The DB integration eliminates the need to move large photometric datasets outside a database for redshift estimation, and uses the computational capabilities of DB hardware. Photo-z-SQL performs both maximum likelihood and Bayesian estimation and handles inputs of variable photometric filter sets and corresponding broad-band magnitudes.
The HARPS-N archive through a Cassandra, NoSQL database suite?
NASA Astrophysics Data System (ADS)
Molinari, Emilio; Guerra, Jose; Harutyunyan, Avet; Lodi, Marcello; Martin, Adrian
2016-07-01
The TNG-INAF is developing the science archive for the WEAVE instrument. The underlying architecture of the archive is based on a non-relational database, more precisely on an Apache Cassandra cluster, which uses NoSQL technology. In order to test and validate this architecture, we created a local archive which we populated with all the HARPS-N spectra collected at the TNG since the instrument's start of operations in mid-2012, and developed tools for the analysis of this data set. The HARPS-N data set is two orders of magnitude smaller than WEAVE's, but we want to demonstrate the ability to walk through a complete data set and produce scientific output as valuable as that produced by an ordinary pipeline, without directly accessing the FITS files. The analytics is done with Apache Solr and Spark and on a relational PostgreSQL database. As an example, we produce observables such as metallicity indexes for the targets in the archive and compare the results with those coming from the HARPS-N regular data reduction software. The aim of this experiment is to explore the viability of a high-availability cluster and distributed NoSQL database as a platform for complex scientific analytics on a large data set, which will then be ported to the WEAVE Archive System (WAS) that we are developing for the WEAVE multi-object fiber spectrograph.
Database on Demand: insight how to build your own DBaaS
NASA Astrophysics Data System (ADS)
Gaspar Aparicio, Ruben; Coterillo Coz, Ignacio
2015-12-01
At CERN, a number of key database applications run on user-managed MySQL, PostgreSQL and Oracle database services. The Database on Demand (DBoD) project was born out of an idea to provide the CERN user community with an environment to develop and run database services as a complement to the central Oracle-based database service. Database on Demand empowers users to perform certain actions that had traditionally been done by database administrators, providing an enterprise platform for database applications. It also allows the CERN user community to run different database engines; presently three major RDBMS (relational database management system) vendors are offered. In this article we show the actual status of the service after almost three years of operations, some insight into our redesigned software engineering, and its near-future evolution.
NASA Astrophysics Data System (ADS)
Knosp, B.; Gangl, M.; Hristova-Veleva, S. M.; Kim, R. M.; Li, P.; Turk, J.; Vu, Q. A.
2015-12-01
The JPL Tropical Cyclone Information System (TCIS) brings together satellite, aircraft, and model forecast data from several NASA, NOAA, and other data centers to assist researchers in comparing and analyzing data and model forecasts related to tropical cyclones. The TCIS has run a near-real-time (NRT) data portal during the North Atlantic hurricane season, which typically runs from June through October, each year since 2010. Data collected by the TCIS vary by type, format, contents, and frequency and are served to the user in two ways: (1) as image overlays on a virtual globe and (2) as derived output from a suite of analysis tools. In order to support these two functions, the data must be collected and then made searchable by criteria such as date, mission, product, pressure level, and geospatial region. Creating a database architecture that is flexible enough to manage, intelligently interrogate, and ultimately present this disparate data to the user in a meaningful way has been the primary challenge. The database solution for the TCIS has been a hybrid MySQL + Solr implementation. After testing other relational database and NoSQL solutions, such as PostgreSQL and MongoDB respectively, this solution has given the TCIS the best offerings in terms of query speed and result reliability. This database solution also supports the challenging (and memory overwhelming) geospatial queries that are necessary to support the analysis tools requested by users. Though MySQL and Solr are hardly new technologies on their own, our implementation had to be customized and tuned to accurately store, index, and search the TCIS data holdings. In this presentation, we will discuss how we arrived at our MySQL + Solr database architecture, why it offers us the most consistently fast and reliable results, and how it supports our front end so that we can offer users a look into our "big data" holdings.
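A minimal sketch of the hybrid lookup pattern described above: Solr answers the search criteria and returns matching IDs, and the relational database then serves the full records. The Solr core name, field names, and the use of sqlite3 as a stand-in for MySQL are all assumptions for illustration.

    import sqlite3
    import requests

    SOLR = "http://localhost:8983/solr/tcis/select"

    def search_granules(mission, start, end):
        """Ask Solr for granule ids matching a mission and a date range."""
        params = {
            "q": f"mission:{mission}",
            "fq": f"obs_date:[{start} TO {end}]",
            "fl": "granule_id",
            "rows": 100,
            "wt": "json",
        }
        docs = requests.get(SOLR, params=params).json()["response"]["docs"]
        return [d["granule_id"] for d in docs]

    def fetch_records(con, ids):
        """Pull the full rows for the matched ids from the relational side."""
        marks = ",".join("?" * len(ids))
        return con.execute(
            f"SELECT * FROM granules WHERE granule_id IN ({marks})", ids
        ).fetchall()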
An SQL query generator for CLIPS
NASA Technical Reports Server (NTRS)
Snyder, James; Chirica, Laurian
1990-01-01
As expert systems become more widely used, their access to large amounts of external information becomes increasingly important. This information exists in several forms, such as statistical, tabular data, knowledge gained by experts, and large databases of information maintained by companies. Because many expert systems, including CLIPS, do not provide access to this external information, much of the usefulness of expert systems is left untapped. The scope of this paper is to describe a database extension for the CLIPS expert system shell. The current industry standard database language is SQL. Due to SQL standardization, large amounts of information stored on various computers, potentially at different locations, will be more easily accessible. Expert systems should be able to directly access these existing databases rather than requiring information to be re-entered into the expert system environment. The ORACLE relational database management system (RDBMS) was used to provide a database connection within the CLIPS environment. To facilitate relational database access, a query generation system was developed as a CLIPS user function. Queries are entered in a CLIPS-like syntax and passed to the query generator, which constructs an SQL query and submits it for execution to the ORACLE RDBMS. The query results are asserted as CLIPS facts. The query generator was developed primarily for use within the ICADS project (Intelligent Computer Aided Design System) currently being developed by the CAD Research Unit at California Polytechnic State University (Cal Poly). In ICADS, several parallel or distributed expert systems access a common knowledge base of facts. Each expert system has a narrow domain of interest and therefore needs only certain portions of the information. The query generator provides a common method of accessing this information and allows the expert system to specify what data is needed without specifying how to retrieve it.
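A sketch of what such a query generator does, under an invented CLIPS-like pattern syntax (the abstract does not give the actual ICADS user-function syntax): the pattern is parsed and rewritten as SQL, which the host system would submit to the RDBMS before asserting the results as facts.

    import re

    def clips_to_sql(pattern):
        """Translate e.g. '(find room (area > 200))' into an SQL query string.

        The parenthesised pattern syntax here is hypothetical, for illustration.
        """
        m = re.fullmatch(r"\(find (\w+) \((\w+) (=|<|>|<=|>=) (\S+)\)\)", pattern)
        if m is None:
            raise ValueError(f"unrecognised pattern: {pattern}")
        table, column, op, value = m.groups()
        return f"SELECT * FROM {table} WHERE {column} {op} {value}"

    print(clips_to_sql("(find room (area > 200))"))
    # SELECT * FROM room WHERE area > 200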
Database Entity Persistence with Hibernate for the Network Connectivity Analysis Model
2014-04-01
time savings in the Java coding development process. Appendices A and B describe setup procedures for installing the MySQL database...development environment is required: • the open source MySQL Database Management System (DBMS) from Oracle, which is a Java Database Connectivity (JDBC)...compliant DBMS • the MySQL JDBC Driver library that comes as a plug-in with the Netbeans distribution • the latest Java Development Kit with the latest
Teaching Case: Introduction to NoSQL in a Traditional Database Course
ERIC Educational Resources Information Center
Fowler, Brad; Godin, Joy; Geddy, Margaret
2016-01-01
Many organizations are dealing with the increasing demands of big data, so they are turning to NoSQL databases as their preferred system for handling the unique problems of capturing and storing massive amounts of data. Therefore, it is likely that employees in all sizes of organizations will encounter NoSQL databases. Thus, to be more job-ready,…
ERIC Educational Resources Information Center
Lundquist, Carol; Frieder, Ophir; Holmes, David O.; Grossman, David
1999-01-01
Describes a scalable, parallel, relational database-drive information retrieval engine. To support portability across a wide range of execution environments, all algorithms adhere to the SQL-92 standard. By incorporating relevance feedback algorithms, accuracy is enhanced over prior database-driven information retrieval efforts. Presents…
Multi-Resolution Playback of Network Trace Files
2015-06-01
a complete MySQL database, C++ developer tools and the libraries utilized in the development of the system (Boost and Libcrafter), and Wireshark...XE suite has a limit to the allowed size of each database. In order to be scalable, the project had to switch to the MySQL database suite. The...programs that access the database use the MySQL C++ connector, provided by Oracle, and the supplied methods and libraries.
Constructing a Graph Database for Semantic Literature-Based Discovery.
Hristovski, Dimitar; Kastrin, Andrej; Dinevski, Dejan; Rindflesch, Thomas C
2015-01-01
Literature-based discovery (LBD) generates discoveries, or hypotheses, by combining what is already known in the literature. Potential discoveries have the form of relations between biomedical concepts; for example, a drug may be determined to treat a disease other than the one for which it was intended. LBD views the knowledge in a domain as a network; a set of concepts along with the relations between them. As a starting point, we used SemMedDB, a database of semantic relations between biomedical concepts extracted with SemRep from Medline. SemMedDB is distributed as a MySQL relational database, which has some problems when dealing with network data. We transformed and uploaded SemMedDB into the Neo4j graph database, and implemented the basic LBD discovery algorithms with the Cypher query language. We conclude that storing the data needed for semantic LBD is more natural in a graph database. Also, implementing LBD discovery algorithms is conceptually simpler with a graph query language when compared with standard SQL.
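A minimal sketch of an open-discovery LBD query in Cypher via the official Neo4j Python driver; the node label, relation types (SemRep-style predicates), and connection details are assumptions, not the paper's exact schema.

    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    # Open discovery: X -> Y -> Z where X and Z are not yet directly related.
    QUERY = """
    MATCH (x:Concept {name: $start})-[:TREATS|CAUSES]->(y:Concept)
          -[:CAUSES|ASSOCIATED_WITH]->(z:Concept)
    WHERE NOT (x)--(z)
    RETURN DISTINCT z.name AS candidate
    LIMIT 25
    """

    with driver.session() as session:
        for record in session.run(QUERY, start="metformin"):
            print(record["candidate"])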
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roberts, D
Purpose: A unified database system was developed to allow accumulation, review and analysis of quality assurance (QA) data for measurement, treatment, imaging and simulation equipment in our department. Recording these data in a database allows a unified and structured approach to review and analysis of data gathered using commercial database tools. Methods: A clinical database was developed to track records of quality assurance operations on linear accelerators, a computed tomography (CT) scanner, high dose rate (HDR) afterloader and imaging systems such as on-board imaging (OBI) and Calypso in our department. The database was developed using the Microsoft Access database and Visual Basic for Applications (VBA) programming interface. Separate modules were written for accumulation, review and analysis of daily, monthly and annual QA data. All modules were designed to use structured query language (SQL) as the basis of data accumulation and review. The SQL strings are dynamically re-written at run time. The database also features embedded documentation, storage of documents produced during QA activities and the ability to annotate all data within the database. Tests are defined in a set of tables that define test type, specific value, and schedule. Results: Daily, monthly and annual QA data have been taken in parallel with established procedures to test MQA. The database has been used to aggregate data across machines to examine the consistency of machine parameters and operations within the clinic for several months. Conclusion: The MQA application has been developed as an interface to a commercially available SQL engine (JET 5.0) and a standard database back-end. The MQA system has been used for several months for routine data collection. The system is robust, relatively simple to extend and can be migrated to a commercial SQL server.
Providing R-Tree Support for Mongodb
NASA Astrophysics Data System (ADS)
Xiang, Longgang; Shao, Xiaotian; Wang, Dehao
2016-06-01
Supporting large amounts of spatial data is a significant characteristic of modern databases. However, unlike mature relational databases such as Oracle and PostgreSQL, most current burgeoning NoSQL databases are not well designed for storing geospatial data, which is becoming increasingly important in various fields. In this paper, we propose a novel method to provide an R-tree index, as well as the corresponding spatial range query and nearest neighbour query functions, for MongoDB, one of the most prevalent NoSQL databases. First, after an in-depth analysis of MongoDB's features, we devise an efficient tabular document structure which flattens the R-tree index into MongoDB collections. We then describe the relevant mechanisms of the R-tree operations and discuss in detail how to integrate the R-tree into MongoDB. Finally, we present experimental results which show that our proposed method outperforms the built-in spatial index of MongoDB. Our research will greatly facilitate big data management issues with MongoDB in a variety of geospatial information applications.
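One plausible flattening of an R-tree into a MongoDB collection, with a recursive range search over node documents; the document fields and tree layout are assumptions for illustration, not the paper's exact design, and real code would also bulk-load and balance the tree.

    from pymongo import MongoClient

    nodes = MongoClient("mongodb://localhost:27017")["geo"]["rtree_nodes"]
    # Node documents: {_id, is_leaf, entries: [{mbr: [minx, miny, maxx, maxy],
    #   child: <node _id>}] for internal nodes, or
    #   [{mbr: [...], obj_id: <object id>}] for leaves.

    def intersects(a, b):
        """Axis-aligned MBR overlap test."""
        return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

    def range_query(node_id, window, hits):
        """Descend only into subtrees whose MBR intersects the query window."""
        node = nodes.find_one({"_id": node_id})
        for entry in node["entries"]:
            if intersects(entry["mbr"], window):
                if node["is_leaf"]:
                    hits.append(entry["obj_id"])
                else:
                    range_query(entry["child"], window, hits)
        return hits

    # results = range_query("root", [0.0, 0.0, 10.0, 10.0], [])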
2004-03-01
with MySQL. This choice was made because MySQL is open source. Any significant database engine such as Oracle or MS-SQL or even MS Access can be used...Figure 6. The DoD vs. Commercial Life Cycle...necessarily be interested in SCADA network security 13. MySQL (Database server) – This station represents a typical data server for a web page
Methods to Secure Databases Against Vulnerabilities
2015-12-01
for several languages such as C, C++, PHP, Java and Python [16]. MySQL will work well with very large databases. The documentation references...using Eclipse and connected to each database management system using Python and Java drivers provided by MySQL, MongoDB, and Datastax (for Cassandra...tiers in Python and Java. Problem: MySQL / MongoDB / Cassandra. 1. Injection: a. Tautologies: Vulnerable / Vulnerable / Not Vulnerable. b. Illegal query
DESPIC: Detecting Early Signatures of Persuasion in Information Cascades
2015-08-27
over NoSQL Databases, Proceedings of the 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2014), 26-MAY-14...over NoSQL Databases. Proceedings of the 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2014). Chicago, IL, USA...distributed NoSQL databases including HBase and Riak, we finalized the requirements of the optimal computational architecture to support our framework
Scale-Independent Relational Query Processing
2013-10-04
source options are also available, including PostgreSQL, MySQL, and SQLite. These modern relational databases are generally very complex software systems...and Their Application to Data Stream Management. IGI Global, 2010. [68] George Reese. Database Programming with JDBC and Java, Second Edition. Ed. by
NVST Data Archiving System Based On FastBit NoSQL Database
NASA Astrophysics Data System (ADS)
Liu, Ying-bo; Wang, Feng; Ji, Kai-fan; Deng, Hui; Dai, Wei; Liang, Bo
2014-06-01
The New Vacuum Solar Telescope (NVST) is a 1-meter vacuum solar telescope that aims to observe the fine structures of active regions on the Sun. The main tasks of the NVST are high resolution imaging and spectral observations, including measurements of the solar magnetic field. The NVST has collected more than 20 million FITS files since it began routine observations in 2012 and produces a maximum of 120 thousand observational records (files) in a day. Given the large number of files, effective archiving and retrieval of files becomes a critical and urgent problem. In this study, we implement a new data archiving system for the NVST based on the FastBit Not Only Structured Query Language (NoSQL) database. Compared to a relational database (i.e., MySQL; My Structured Query Language), the FastBit database shows distinctive advantages in indexing and querying performance. In a large-scale database of 40 million records, the multi-field combined query response time of the FastBit database is about 15 times faster and fully meets the requirements of the NVST. Our study brings a new idea for massive astronomical data archiving and would contribute to the design of data management systems for other astronomical telescopes.
High dimensional biological data retrieval optimization with NoSQL technology.
Wang, Shicai; Pandis, Ioannis; Wu, Chao; He, Sijin; Johnson, David; Emam, Ibrahim; Guitton, Florian; Guo, Yike
2014-01-01
High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, when querying relational databases for hundreds of different patient gene expression records queries are slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase on query performance on MongoDB. The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data.
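A sketch of the key-value idea using happybase (a common Python HBase client); the composite row key, column family, and table name are assumptions for illustration, not tranSMART's actual schema. The point is that one gene's expression values across patients become a contiguous row-key range, so a scan replaces an expensive relational join.

    import happybase

    connection = happybase.Connection("localhost")
    table = connection.table("expression")

    # Composite row key gene|patient keeps all values for a gene contiguous.
    def put_value(gene, patient, value):
        table.put(f"{gene}|{patient}".encode(),
                  {b"e:value": str(value).encode()})

    def gene_values(gene):
        """Scan the contiguous key range for one gene across all patients."""
        prefix = f"{gene}|".encode()
        for key, data in table.scan(row_prefix=prefix):
            patient = key.decode().split("|", 1)[1]
            yield patient, float(data[b"e:value"])

    # put_value("TP53", "patient42", 7.31)
    # print(list(gene_values("TP53")))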
NASA Astrophysics Data System (ADS)
Ivankovic, D.; Dadic, V.
2009-04-01
Some oceanographic parameters have to be inserted into the database manually; others (for example, data from a CTD probe) are inserted from various files. All these parameters require visualization, validation and manipulation from the research vessel or scientific institution, and also public presentation. For these purposes a web-based system was developed, containing dynamic SQL procedures and Java applets. The technology background is an Oracle 10g relational database and Oracle Application Server. Web interfaces are developed using PL/SQL stored database procedures (mod PL/SQL). Additional parts for data visualization include the use of Java applets and JavaScript. The mapping tool is the Google Maps API (JavaScript) and, as an alternative, a Java applet. The graph is realized as a dynamically generated web page containing a Java applet. The mapping tool and graph are georeferenced: a click on some part of the graph automatically initiates a zoom or marker at the location where the parameter was measured. This feature is very useful for data validation. Code for data manipulation and visualization is partially realized with dynamic SQL, which allows us to separate the data definition from the code for data manipulation. Adding a new parameter to the system requires only data definition and description, without programming an interface for this kind of data.
2013-01-01
commercial NoSQL database system. The results show that IndexedHBase provides a data loading speed that is 6 times faster than Riak, and is...compare it with Riak, a widely adopted commercial NoSQL database system. The results show that IndexedHBase provides a data loading speed that is 6...events. This chapter describes our research towards building an efficient and scalable storage platform for Truthy. Many existing NoSQL databases
Quality Attribute-Guided Evaluation of NoSQL Databases: A Case Study
2015-01-16
evaluations of NoSQL databases specifically, and big data systems in general, that have become apparent during our study. Keywords: NoSQL, distributed...technology, namely that of big data software systems [1]. At the heart of big data systems is a collection of database technologies that are more...born organizations such as Google and Amazon [3][4], along with those of numerous other big data innovators, have created a variety of open source and
Global ISR: Toward a Comprehensive Defense Against Unauthorized Code Execution
2010-10-01
implementation using two of the most popular open-source servers: the Apache web server and the MySQL database server. For Apache, we measure the effect that...utility ab. [Fig. 3: The MySQL test-insert benchmark measures...various SQL operations; total execution time (sec) for Native, Null, ISR and ISR-MP.] The figure draws total execution time as reported by the benchmark utility. Finally, we benchmarked a MySQL database server using
SQL is Dead; Long-live SQL: Relational Database Technology in Science Contexts
NASA Astrophysics Data System (ADS)
Howe, B.; Halperin, D.
2014-12-01
Relational databases are often perceived as a poor fit in science contexts: Rigid schemas, poor support for complex analytics, unpredictable performance, significant maintenance and tuning requirements --- these idiosyncrasies often make databases unattractive in science contexts characterized by heterogeneous data sources, complex analysis tasks, rapidly changing requirements, and limited IT budgets. In this talk, I'll argue that although the value proposition of typical relational database systems are weak in science, the core ideas that power relational databases have become incredibly prolific in open source science software, and are emerging as a universal abstraction for both big data and small data. In addition, I'll talk about two open source systems we are building to "jailbreak" the core technology of relational databases and adapt them for use in science. The first is SQLShare, a Database-as-a-Service system supporting collaborative data analysis and exchange by reducing database use to an Upload-Query-Share workflow with no installation, schema design, or configuration required. The second is Myria, a service that supports much larger scale data, complex analytics, and supports multiple back end systems. Finally, I'll describe some of the ways our collaborators in oceanography, astronomy, biology, fisheries science, and more are using these systems to replace script-based workflows for reasons of performance, flexibility, and convenience.
A UML Profile for Developing Databases that Conform to the Third Manifesto
NASA Astrophysics Data System (ADS)
Eessaar, Erki
The Third Manifesto (TTM) presents the principles of a relational database language that is free of deficiencies and ambiguities of SQL. There are database management systems that are created according to TTM. Developers need tools that support the development of databases by using these database management systems. UML is a widely used visual modeling language. It provides built-in extension mechanism that makes it possible to extend UML by creating profiles. In this paper, we introduce a UML profile for designing databases that correspond to the rules of TTM. We created the first version of the profile by translating existing profiles of SQL database design. After that, we extended and improved the profile. We implemented the profile by using UML CASE system StarUML™. We present an example of using the new profile. In addition, we describe problems that occurred during the profile development.
Reliability Information Analysis Center 1st Quarter 2007, Technical Area Task (TAT) Report
2007-02-05
• Created new SQL Server database for "PC Configuration" web application. Added roles for security, closed 4235, and posted application to production. • Wrote...and ran SQL Server scripts to migrate production databases to new server. • Created backup jobs for new SQL Server databases. • Continued...second phase of the TENA demo. Extensive tasking was established and assigned. A TENA interface to EW Server was reaffirmed after some uncertainty about
A Brief Assessment of LC2IEDM, MIST and Web Services for use in Naval Tactical Data Management
2004-07-01
server software, messaging between the client and server, and a database. The MIST database is implemented in an open source DBMS named PostgreSQL... PostgreSQL had its beginnings at the University of California, Berkeley, in 1986 [11]. The development of PostgreSQL has since evolved into a...contact history from the database.
Evaluation of NoSQL databases for DIRAC monitoring and beyond
NASA Astrophysics Data System (ADS)
Mathe, Z.; Casajus Ramo, A.; Stagni, F.; Tomassetti, L.
2015-12-01
Nowadays, many database systems are available but they may not be optimized for storing time series data. Monitoring DIRAC jobs would be better done using a database optimised for storing time series data. So far it was done using a MySQL database, which is not well suited for such an application. Therefore alternatives have been investigated. Choosing an appropriate database for storing huge amounts of time series data is not trivial as one must take into account different aspects such as manageability, scalability and extensibility. We compared the performance of Elasticsearch, OpenTSDB (based on HBase) and InfluxDB NoSQL databases, using the same set of machines and the same data. We also evaluated the effort required for maintaining them. Using the LHCb Workload Management System (WMS), based on DIRAC as a use case we set up a new monitoring system, in parallel with the current MySQL system, and we stored the same data into the databases under test. We evaluated Grafana (for OpenTSDB) and Kibana (for ElasticSearch) metrics and graph editors for creating dashboards, in order to have a clear picture on the usability of each candidate. In this paper we present the results of this study and the performance of the selected technology. We also give an outlook of other potential applications of NoSQL databases within the DIRAC project.
Software Application for Supporting the Education of Database Systems
ERIC Educational Resources Information Center
Vágner, Anikó
2015-01-01
The article introduces an application which supports the education of database systems, particularly the teaching of SQL and PL/SQL in Oracle Database Management System environment. The application has two parts, one is the database schema and its content, and the other is a C# application. The schema is to administrate and store the tasks and the…
Using an image-extended relational database to support content-based image retrieval in a PACS.
Traina, Caetano; Traina, Agma J M; Araújo, Myrian R B; Bueno, Josiane M; Chino, Fabio J T; Razente, Humberto; Azevedo-Marques, Paulo M
2005-12-01
This paper presents a new Picture Archiving and Communication System (PACS), called cbPACS, which has content-based image retrieval capabilities. The cbPACS answers range and k-nearest-neighbor similarity queries, employing a relational database manager extended to support images. The images are compared through their features, which are extracted by an image-processing module and stored in the extended relational database. The database extensions were developed with the aim of efficiently answering similarity queries by taking advantage of specialized indexing methods. The main concept supporting the extensions is the definition, inside the relational manager, of distance functions based on features extracted from the images. An extension to the SQL language enables the construction of an interpreter that intercepts the extended commands and translates them to standard SQL, allowing any relational database server to be used. At present, the system works with features based on the color distribution of the images, through normalized histograms as well as metric histograms. Metric histograms are invariant to scale, translation and rotation of images, and also to brightness transformations. The cbPACS is prepared to integrate new image features, based on texture and on the shape of the main objects in the image.
Benchmarking database performance for genomic data.
Khushi, Matloob
2015-06-01
Genomic regions represent features such as gene annotations, transcription factor binding sites and epigenetic modifications. Performing various genomic operations such as identifying overlapping/non-overlapping regions or nearest gene annotations are common research needs. The data can be saved in a database system for easy management, however, there is no comprehensive database built-in algorithm at present to identify overlapping regions. Therefore I have developed a novel region-mapping (RegMap) SQL-based algorithm to perform genomic operations and have benchmarked the performance of different databases. Benchmarking identified that PostgreSQL extracts overlapping regions much faster than MySQL. Insertion and data uploads in PostgreSQL were also better, although general searching capability of both databases was almost equivalent. In addition, using the algorithm pair-wise, overlaps of >1000 datasets of transcription factor binding sites and histone marks, collected from previous publications, were reported and it was found that HNF4G significantly co-locates with cohesin subunit STAG1 (SA1). © 2015 Wiley Periodicals, Inc.
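A minimal sqlite3 sketch of the pair-wise overlap operation such a benchmark exercises; the table and column names are illustrative, and the interval convention is an assumption.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tf_sites (chrom TEXT, start INT, stop INT, name TEXT);
        CREATE TABLE histones (chrom TEXT, start INT, stop INT, mark TEXT);
        CREATE INDEX idx_tf ON tf_sites (chrom, start, stop);
        INSERT INTO tf_sites VALUES ('chr1', 100, 200, 'HNF4G');
        INSERT INTO histones VALUES ('chr1', 150, 400, 'H3K27ac');
    """)
    # Two regions overlap iff each starts before the other ends.
    overlaps = con.execute("""
        SELECT a.name, b.mark
        FROM tf_sites a JOIN histones b
          ON a.chrom = b.chrom
         AND a.start < b.stop AND b.start < a.stop
    """).fetchall()
    print(overlaps)  # [('HNF4G', 'H3K27ac')]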
DOMe: A deduplication optimization method for the NewSQL database backups
Wang, Longxiang; Zhu, Zhengdong; Zhang, Xingjun; Wang, Yinfeng
2017-01-01
Reducing the duplicated data of database backups is an important application scenario for data deduplication technology. NewSQL is an emerging database system that is now being used more and more widely. NewSQL systems need to improve data reliability by periodically backing up in-memory data, resulting in a lot of duplicated data. The traditional deduplication method is not optimized for the NewSQL server system and cannot take full advantage of hardware resources to optimize deduplication performance. Recent research pointed out that the future NewSQL server will have thousands of CPU cores, large DRAM and huge NVRAM. Therefore, how to utilize these hardware resources to optimize the performance of data deduplication is an important issue. To solve this problem, we propose a deduplication optimization method (DOMe) for NewSQL system backup. To take advantage of the large number of CPU cores in the NewSQL server, DOMe parallelizes the deduplication method based on the fork-join framework. The fingerprint index, which is the key data structure in the deduplication process, is implemented as a pure in-memory hash table, making full use of the large DRAM in the NewSQL system and eliminating the fingerprint-index performance bottleneck of traditional deduplication methods. H-store is used as a typical NewSQL database system to implement the DOMe method. DOMe is experimentally analyzed on two representative backup data sets. The experimental results show that: 1) DOMe can reduce the duplicated NewSQL backup data; 2) DOMe significantly improves deduplication performance by parallelizing the CDC algorithm: where the theoretical speedup ratio of the server is 20.8, the speedup ratio of DOMe reaches up to 18; 3) DOMe improves deduplication throughput by 1.5 times through the pure in-memory index optimization. PMID:29049307
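A compact sketch of the two ideas in Python: chunks are fingerprinted in parallel (a stand-in for the fork-join parallelization) and looked up in a pure in-memory index. Fixed-size chunking stands in for the CDC algorithm, and the chunk size is arbitrary.

    import hashlib
    from concurrent.futures import ProcessPoolExecutor

    CHUNK = 4096  # fixed-size chunking stands in for content-defined chunking

    def chunks(data):
        for i in range(0, len(data), CHUNK):
            yield data[i:i + CHUNK]

    def fingerprint(chunk):
        return hashlib.sha1(chunk).digest()

    def dedup(backup):
        """Return (unique chunks, total chunks) for one backup image."""
        index = set()  # pure in-memory fingerprint index
        total = 0
        with ProcessPoolExecutor() as pool:  # parallel fingerprinting
            for fp in pool.map(fingerprint, chunks(backup), chunksize=64):
                total += 1
                index.add(fp)
        return len(index), total

    if __name__ == "__main__":
        image = b"abc" * 100_000  # highly duplicated synthetic backup
        print(dedup(image))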
Evolution of the use of relational and NoSQL databases in the ATLAS experiment
NASA Astrophysics Data System (ADS)
Barberis, D.
2016-09-01
The ATLAS experiment used for many years a large database infrastructure based on Oracle to store several different types of non-event data: time-dependent detector configuration and conditions data, calibrations and alignments, configurations of Grid sites, catalogues for data management tools, job records for distributed workload management tools, run and event metadata. The rapid development of "NoSQL" databases (structured storage services) in the last five years allowed an extended and complementary usage of traditional relational databases and new structured storage tools in order to improve the performance of existing applications and to extend their functionalities using the possibilities offered by the modern storage systems. The trend is towards using the best tool for each kind of data, separating for example the intrinsically relational metadata from payload storage, and records that are frequently updated and benefit from transactions from archived information. Access to all components has to be orchestrated by specialised services that run on front-end machines and shield the user from the complexity of data storage infrastructure. This paper describes this technology evolution in the ATLAS database infrastructure and presents a few examples of large database applications that benefit from it.
Protecting Database Centric Web Services against SQL/XPath Injection Attacks
NASA Astrophysics Data System (ADS)
Laranjeiro, Nuno; Vieira, Marco; Madeira, Henrique
Web services represent a powerful interface for back-end database systems and are increasingly being used in business critical applications. However, field studies show that a large number of web services are deployed with security flaws (e.g., having SQL Injection vulnerabilities). Although several techniques for the identification of security vulnerabilities have been proposed, developing non-vulnerable web services is still a difficult task. In fact, security-related concerns are hard to apply as they involve adding complexity to already complex code. This paper proposes an approach to secure web services against SQL and XPath Injection attacks, by transparently detecting and aborting service invocations that try to take advantage of potential vulnerabilities. Our mechanism was applied to secure several web services specified by the TPC-App benchmark, showing to be 100% effective in stopping attacks, non-intrusive and very easy to use.
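A simplified sketch of the detect-and-abort idea: valid invocations are learned as SQL "skeletons" with literals stripped, and any invocation whose skeleton was never learned is aborted. The regexes and workload are illustrative; the mechanism in the paper is more thorough.

    import re

    def skeleton(sql):
        """Strip literals so structurally identical queries compare equal."""
        sql = re.sub(r"'(?:[^']|'')*'", "?", sql)  # string literals
        sql = re.sub(r"\b\d+\b", "?", sql)         # numeric literals
        return re.sub(r"\s+", " ", sql).strip().lower()

    # Learned during an attack-free training phase.
    LEARNED = {skeleton("SELECT * FROM users WHERE id = 42")}

    def execute_guarded(sql):
        if skeleton(sql) not in LEARNED:
            raise PermissionError("aborting: query structure never learned")
        ...  # forward to the real database here

    execute_guarded("SELECT * FROM users WHERE id = 7")  # same skeleton: passes
    try:
        execute_guarded("SELECT * FROM users WHERE id = 7 OR 1 = 1")
    except PermissionError as err:
        print(err)  # the tautology changes the skeleton, so it is aborted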
Social media based NPL system to find and retrieve ARM data: Concept paper
DOE Office of Scientific and Technical Information (OSTI.GOV)
Devarakonda, Ranjeet; Giansiracusa, Michael T.; Kumar, Jitendra
Information connectivity and retrieval has a role in our daily lives. The most pervasive source of online information is databases. The amount of data is growing at a rapid rate and database technology is improving and having a profound effect. Almost all online applications store and retrieve information from databases. One challenge in supplying the public with wider access to informational databases is the need for knowledge of database languages like Structured Query Language (SQL). Although the SQL language has been published in many forms, not everybody is able to write SQL queries. Another challenge is that it may not be practical to make the public aware of the structure of the database. There is a need for novice users to query relational databases using their natural language. To solve this problem, many natural language interfaces to structured databases have been developed. The goal is to provide a more intuitive method for generating database queries and delivering responses. Social media makes it possible to interact with a wide section of the population. Through this medium, and with the help of Natural Language Processing (NLP), we can make the data of the Atmospheric Radiation Measurement Data Center (ADC) more accessible to the public. We propose an architecture for using Apache Lucene/Solr [1], OpenML [2,3], and Kafka [4] to generate an automated query/response system with inputs from Twitter [5], our Cassandra DB, and our log database. Using the Twitter API and NLP we can give the public the ability to ask questions of our database and get automated responses.
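A toy sketch of the natural-language-to-query step; the question patterns, table, and column names are invented for illustration and bear no relation to the actual ADC schema, and a real system would use NLP rather than regexes.

    import re
    import sqlite3

    # Pattern -> parameterized SQL template.
    PATTERNS = [
        (re.compile(r"how many (\w+) files for site (\w+)", re.I),
         "SELECT COUNT(*) FROM files WHERE datastream = ? AND site = ?"),
    ]

    def answer(question, con):
        for pattern, template in PATTERNS:
            m = pattern.search(question)
            if m:
                return con.execute(template, m.groups()).fetchone()[0]
        return "Sorry, I did not understand the question."

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE files (datastream TEXT, site TEXT)")
    con.execute("INSERT INTO files VALUES ('radar', 'sgp'), ('radar', 'sgp')")
    print(answer("How many radar files for site sgp?", con))  # 2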
A database for coconut crop improvement.
Rajagopal, Velamoor; Manimekalai, Ramaswamy; Devakumar, Krishnamurthy; Rajesh; Karun, Anitha; Niral, Vittal; Gopal, Murali; Aziz, Shamina; Gunasekaran, Marimuthu; Kumar, Mundappurathe Ramesh; Chandrasekar, Arumugam
2005-12-08
Coconut crop improvement requires a number of biotechnology and bioinformatics tools. A database containing information on CG (coconut germplasm), CCI (coconut cultivar identification), CD (coconut disease), MIFSPC (microbial information systems in plantation crops) and VO (vegetable oils) is described. The database was developed using MySQL and PostgreSQL running in Linux operating system. The database interface is developed in PHP, HTML and JAVA. http://www.bioinfcpcri.org.
Assembling proteomics data as a prerequisite for the analysis of large scale experiments
Schmidt, Frank; Schmid, Monika; Thiede, Bernd; Pleißner, Klaus-Peter; Böhme, Martina; Jungblut, Peter R
2009-01-01
Background Despite the complete determination of the genome sequence of a huge number of bacteria, their proteomes remain relatively poorly defined. Besides new methods to increase the number of identified proteins, new database applications are necessary to store and present results of large-scale proteomics experiments. Results In the present study, a database concept has been developed to address these issues and to offer complete information via a web interface. In our concept, the Oracle based data repository system SQL-LIMS plays the central role in the proteomics workflow and was applied to the proteomes of Mycobacterium tuberculosis, Helicobacter pylori, Salmonella typhimurium and protein complexes such as the 20S proteasome. Technical operations of our proteomics labs were used as the standard for SQL-LIMS template creation. By means of a Java based data parser, post-processed data of different approaches, such as LC/ESI-MS, MALDI-MS and 2-D gel electrophoresis (2-DE), were stored in SQL-LIMS. A minimum set of the proteomics data were transferred to our public 2D-PAGE database using a Java based interface (Data Transfer Tool) meeting the requirements of the PEDRo standardization. Furthermore, the stored proteomics data were extractable from SQL-LIMS via XML. Conclusion The Oracle based data repository system SQL-LIMS played the central role in the proteomics workflow concept. Technical operations of our proteomics labs were used as standards for SQL-LIMS templates. Using a Java based parser, post-processed data of different approaches such as LC/ESI-MS, MALDI-MS, 1-DE and 2-DE were stored in SQL-LIMS. Thus, unique data formats of different instruments were unified and stored in SQL-LIMS tables. Moreover, a unique submission identifier allowed fast access to all experimental data. This was the main advantage compared to multi-software solutions, especially where personnel fluctuations are high. Moreover, large-scale and high-throughput experiments must be managed in a comprehensive repository system such as SQL-LIMS, to query results in a systematic manner. On the other hand, these database systems are expensive and require at least one full-time administrator and a specialized lab manager. Moreover, the high technical dynamics in proteomics may cause problems in adjusting to new data formats. To summarize, SQL-LIMS met the requirements of proteomics data handling, especially in skilled processes such as gel electrophoresis or mass spectrometry, and fulfilled the PSI standardization criteria. The data transfer into a public domain via DTT facilitated validation of proteomics data. Additionally, evaluation of mass spectra by post-processing using MS-Screener improved the reliability of mass analysis and prevented storage of data junk. PMID:19166578
Reactive Aggregate Model Protecting Against Real-Time Threats
2014-09-01
on the underlying functionality of three core components: • MS SQL Server 2008 backend database • Microsoft IIS running on Windows Server 2008...services. The capstone tested a Linux-based Apache web server with the following software implementations: • MySQL as a Linux-based backend server for...malicious compromise. 1. Assumptions: • GINA could connect to a backend MS SQL database through proper configuration of DotNetNuke. • GINA had access
Zhang, Yinsheng; Zhang, Guoming; Shang, Qian
2017-01-01
Reusing the data from healthcare information systems can effectively facilitate clinical trials (CTs). How to select candidate patients eligible for CT recruitment criteria is a central task. Related work either depends on DBA (database administrator) to convert the recruitment criteria to native SQL queries or involves the data mapping between a standard ontology/information model and individual data source schema. This paper proposes an alternative computer-aided CT recruitment paradigm, based on syntax translation between different DSLs (domain-specific languages). In this paradigm, the CT recruitment criteria are first formally represented as production rules. The referenced rule variables are all from the underlying database schema. Then the production rule is translated to an intermediate query-oriented DSL (e.g., LINQ). Finally, the intermediate DSL is directly mapped to native database queries (e.g., SQL) automated by ORM (object-relational mapping).
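A minimal sketch of the translation chain, collapsing the intermediate DSL step: a production-rule-style criterion is mapped straight to a parameterized SQL query (the paradigm in the paper goes through LINQ and an ORM). Table, column, and rule contents are invented.

    import sqlite3

    # Recruitment criterion as a production rule over database columns.
    RULE = [("age", ">=", 18), ("hba1c", "<", 7.0)]

    def rule_to_sql(rule, table="patients"):
        """Translate a rule into a parameterized query (ORM step elided)."""
        where = " AND ".join(f"{col} {op} ?" for col, op, _ in rule)
        params = [value for _, _, value in rule]
        return f"SELECT id FROM {table} WHERE {where}", params

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE patients (id INT, age INT, hba1c REAL)")
    con.executemany("INSERT INTO patients VALUES (?, ?, ?)",
                    [(1, 34, 6.1), (2, 15, 6.0), (3, 51, 8.2)])
    sql, params = rule_to_sql(RULE)
    print(con.execute(sql, params).fetchall())  # [(1,)] -- eligible candidates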
Distributed Episodic Exploratory Planning (DEEP)
2008-12-01
API). For DEEP, Hibernate offered the following advantages: • Abstracts SQL by utilizing HQL so any database with a Java Database Connectivity...[acronym-list fragment: Hibernate, SQL; ICCRTS, International Command and Control Research and Technology Symposium; JDB, Java Distributed Blackboard; JDBC, Java Database Connectivity]...selected because of its opportunistic reasoning capabilities and implemented in Java for platform independence. Java was chosen for ease of
ExplorEnz: a MySQL database of the IUBMB enzyme nomenclature.
McDonald, Andrew G; Boyce, Sinéad; Moss, Gerard P; Dixon, Henry B F; Tipton, Keith F
2007-07-27
We describe the database ExplorEnz, which is the primary repository for EC numbers and enzyme data that are being curated on behalf of the IUBMB. The enzyme nomenclature is incorporated into many other resources, including the ExPASy-ENZYME, BRENDA and KEGG bioinformatics databases. The data, which are stored in a MySQL database, preserve the formatting of chemical and enzyme names. A simple, easy to use, web-based query interface is provided, along with an advanced search engine for more complex queries. The database is publicly available at http://www.enzyme-database.org. The data are available for download as SQL and XML files via FTP. ExplorEnz has powerful and flexible search capabilities and provides the scientific community with the most up-to-date version of the IUBMB Enzyme List.
Survey data and metadata modelling using document-oriented NoSQL
NASA Astrophysics Data System (ADS)
Rahmatuti Maghfiroh, Lutfi; Gusti Bagus Baskara Nugraha, I.
2018-03-01
Survey data collected from year to year undergo metadata changes, yet they need to be stored in an integrated way so that statistical data can be obtained faster and more easily. A data warehouse (DW) can be used to address this need; however, the change of variables in every period cannot be accommodated by a traditional DW, which cannot handle variable changes via Slowly Changing Dimensions (SCD). Previous research handled the change of variables in a DW by managing metadata with a multiversion DW (MVDW), designed using the relational model. Other research has found that a non-relational model in a NoSQL database has faster reading time than the relational model. Therefore, we propose managing metadata changes using NoSQL. This study proposes a DW model to manage change and algorithms to retrieve data with metadata changes. Evaluation of the proposed models and algorithms shows that a database with the proposed design can retrieve data with metadata changes properly. This paper contributes to comprehensive data analysis with metadata changes (especially for survey data) in integrated storage.
The Protein Disease Database of human body fluids: II. Computer methods and data issues.
Lemkin, P F; Orr, G A; Goldstein, M P; Creed, G J; Myrick, J E; Merril, C R
1995-01-01
The Protein Disease Database (PDD) is a relational database of proteins and diseases. With this database it is possible to screen for quantitative protein abnormalities associated with disease states. These quantitative relationships use data drawn from the peer-reviewed biomedical literature. Assays may also include those observed in high-resolution electrophoretic gels that offer the potential to quantitate many proteins in a single test as well as data gathered by enzymatic or immunologic assays. We are using the Internet World Wide Web (WWW) and the Web browser paradigm as an access method for wide distribution and querying of the Protein Disease Database. The WWW hypertext transfer protocol and its Common Gateway Interface make it possible to build powerful graphical user interfaces that can support easy-to-use data retrieval using query specification forms or images. The details of these interactions are totally transparent to the users of these forms. Using a client-server SQL relational database, user query access, initial data entry and database maintenance are all performed over the Internet with a Web browser. We discuss the underlying design issues, mapping mechanisms and assumptions that we used in constructing the system, data entry, access to the database server, security, and synthesis of derived two-dimensional gel image maps and hypertext documents resulting from SQL database searches.
2009-01-01
[Table residue from a report on Web-based survey software: database servers supported include MS SQL Server, Oracle 9i/10g and MySQL; operating systems supported include Windows 2003 Server and Windows 2000 Server (32-bit); web servers include WebStar (Mac OS X), SunONE and Internet Information Services (IIS).] ...challenges of Web-based surveys are: 1) identifying the best Commercial Off the Shelf (COTS) Web-based survey packages to serve the particular
2001-09-01
replication) -- all from Visual Basic and VBA. In fact, we found that the SQL Server engine actually had a plethora of options, most formidable of... versions of Microsoft products? Specifically, the pending release of Microsoft Office 2002, the new SQL Server 2000 database engine, and Microsoft Visual Basic.NET. This thesis describes our use of the Spiral Development Model to...
Spectrum Savings from High Performance Recording and Playback Onboard the Test Article
2013-02-20
execute within a Windows 7 environment, and data is recorded on SSDs. The underlying database is implemented using MySQL. Figure 1 illustrates the... MySQL database. This is effectively the time at which the recorded data are available for retransmission. CPU and memory utilization were collected... [Table 1 residue, CPU utilization with a 260 Mbits/sec load: MySQL avg. 3.9%; EQDR total avg. 21.6%; difference from the total system CPU (27.8
Using Virtual Servers to Teach the Implementation of Enterprise-Level DBMSs: A Teaching Note
ERIC Educational Resources Information Center
Wagner, William P.; Pant, Vik
2010-01-01
One of the areas where demand has remained strong for MIS students is in the area of database management. Since the early days, this topic has been a mainstay in the MIS curriculum. Students of database management today typically learn about relational databases, SQL, normalization, and how to design and implement various kinds of database…
2014-09-01
NoSQL Data Store Technologies. John Klein, Patrick Donohoe and Neil Ernst, Software Engineering Institute... distribute data. 4. Data Replication – determines how a NoSQL database facilitates reliable, high performance data replication to build
ExplorEnz: a MySQL database of the IUBMB enzyme nomenclature
McDonald, Andrew G; Boyce, Sinéad; Moss, Gerard P; Dixon, Henry BF; Tipton, Keith F
2007-01-01
Background We describe the database ExplorEnz, which is the primary repository for EC numbers and enzyme data that are being curated on behalf of the IUBMB. The enzyme nomenclature is incorporated into many other resources, including the ExPASy-ENZYME, BRENDA and KEGG bioinformatics databases. Description The data, which are stored in a MySQL database, preserve the formatting of chemical and enzyme names. A simple, easy to use, web-based query interface is provided, along with an advanced search engine for more complex queries. The database is publicly available at . The data are available for download as SQL and XML files via FTP. Conclusion ExplorEnz has powerful and flexible search capabilities and provides the scientific community with the most up-to-date version of the IUBMB Enzyme List. PMID:17662133
Petaminer: Using ROOT for efficient data storage in MySQL database
NASA Astrophysics Data System (ADS)
Cranshaw, J.; Malon, D.; Vaniachine, A.; Fine, V.; Lauret, J.; Hamill, P.
2010-04-01
High Energy and Nuclear Physics (HENP) experiments store Petabytes of event data and Terabytes of calibration data in ROOT files. The Petaminer project is developing a custom MySQL storage engine to enable the MySQL query processor to directly access experimental data stored in ROOT files. Our project is addressing the problem of efficient navigation to PetaBytes of HENP experimental data described with event-level TAG metadata, which is required by data intensive physics communities such as the LHC and RHIC experiments. Physicists need to be able to compose a metadata query and rapidly retrieve the set of matching events, where improved efficiency will facilitate the discovery process by permitting rapid iterations of data evaluation and retrieval. Our custom MySQL storage engine enables the MySQL query processor to directly access TAG data stored in ROOT TTrees. As ROOT TTrees are column-oriented, reading them directly provides improved performance over traditional row-oriented TAG databases. Leveraging the flexible and powerful SQL query language to access data stored in ROOT TTrees, the Petaminer approach enables rich MySQL index-building capabilities for further performance optimization.
Interactive DataBase of Cosmic Ray Anisotropy (DB A10)
NASA Astrophysics Data System (ADS)
Asipenka, A.S.; Belov, A.V.; Eroshenko, E.F.; Klepach, E.G.; Oleneva, V.A.; Yake, V.G.
Data on the hourly means of cosmic ray density and anisotropy derived by the GSM method over 1957-2006 have been introduced into a MySQL database. This format allows access to the data both locally and over the Internet. Using a combination of the scripting language PHP and a MySQL database, an Internet project was created that gives users access to the CR anisotropy data in different formats (http://cr20.izmiran.ru/AnisotropyCR/main.htm/). The PHP and MySQL pairing provides fast data retrieval even over the Internet, since a request and the subsequent processing of data are carried out on the project server. Using MySQL as the basis for storing data on cosmic ray variations makes it possible to construct requests of different structures, extends the variety of data presentation, and enables the data to be matched to other systems and used in other projects.
2016-03-01
Representational state transfer, Java messaging service, Java application programming interface (API), Internet relay chat (IRC)/extensible messaging and... JBoss application server or an Apache Tomcat servlet container instance. The relational database management system can be either PostgreSQL or MySQL... Java library called direct web remoting. This library has been part of the core CACE architecture for quite some time; however, there have not been
Novel Method of Storing and Reconstructing Events at Fermilab E-906/SeaQuest Using a MySQL Database
NASA Astrophysics Data System (ADS)
Hague, Tyler
2010-11-01
Fermilab E-906/SeaQuest is a fixed target experiment at Fermi National Accelerator Laboratory. We are investigating the antiquark asymmetry in the nucleon sea. By examining the ratio of the Drell-Yan cross sections of proton-proton and proton-deuterium collisions we can determine the asymmetry ratio. An essential part of developing the analysis software is updating the event reconstruction to modern software tools. We are doing this in a unique way, by performing a majority of the calculations within an SQL database. Using a MySQL database allows us to take advantage of off-the-shelf software without sacrificing ROOT compatibility, and to avoid network bottlenecks through server-side data selection. Using our raw data we create stubs, or partial tracks, at each station, which are pieced together to create full tracks. Our reconstruction process uses dynamically created SQL statements to analyze the data. These SQL statements create tables that contain the final reconstructed tracks as well as intermediate values. This poster explains the reconstruction process and how it is being implemented.
The SQL Server Database for Non Computer Professional Teaching Reform
ERIC Educational Resources Information Center
Liu, Xiangwei
2012-01-01
This paper summarizes the teaching methods of the SQL Server database course for non-computer majors and analyzes the current situation of the course. According to the characteristics of the curriculum for non-computer majors, it puts forward some teaching reform methods and puts them into practice, improving the students' analytical ability, practical ability and…
NASA Astrophysics Data System (ADS)
Hu, Haibin
2017-05-01
Among numerous Web security issues, SQL injection is the most notable and dangerous. In this study, the characteristics and procedures of SQL injection are analyzed, and a method for detecting SQL injection attacks is illustrated. A defense, resistance and remedy model for SQL injection attacks is established from the perspective of non-intrusive SQL injection attack and defense. Moreover, the server's ability to resist SQL injection attacks has been comprehensively improved through security strategies for the operating system, IIS, the database, etc. The corresponding code has been implemented, and the method is well applied in actual projects.
Relational Algebra and SQL: Better Together
ERIC Educational Resources Information Center
McMaster, Kirby; Sambasivam, Samuel; Hadfield, Steven; Wolthuis, Stuart
2013-01-01
In this paper, we describe how database instructors can teach Relational Algebra and Structured Query Language together through programming. Students write query programs consisting of sequences of Relational Algebra operations vs. Structured Query Language SELECT statements. The query programs can then be run interactively, allowing students to…
Assessment & Commitment Tracking System (ACTS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bryant, Robert A.; Childs, Teresa A.; Miller, Michael A.
2004-12-20
The ACTS computer code provides a centralized tool for planning and scheduling assessments, tracking and managing actions associated with assessments or that result from an event or condition, and "mining" data for reporting and analyzing information for improving performance. The ACTS application is designed to work with the MS SQL database management system. All database interfaces are written in SQL. The following software is used to develop and support the ACTS application: Cold Fusion, HTML, JavaScript, Quest TOAD, Microsoft Visual Source Safe (VSS), HTML Mailer for sending email, Microsoft SQL, and Microsoft Internet Information Server.
An Array Library for Microsoft SQL Server with Astrophysical Applications
NASA Astrophysics Data System (ADS)
Dobos, L.; Szalay, A. S.; Blakeley, J.; Falck, B.; Budavári, T.; Csabai, I.
2012-09-01
Today's scientific simulations produce output on the 10-100 TB scale. This unprecedented amount of data requires data handling techniques that are beyond what is used for ordinary files. Relational database systems have been successfully used to store and process scientific data, but the new requirements constantly generate new challenges. Moving terabytes of data among servers on a timely basis is a tough problem, even with the newest high-throughput networks. Thus, moving the computations as close to the data as possible and minimizing the client-server overhead are absolutely necessary. At least data subsetting and preprocessing have to be done inside the server process. Out of the box commercial database systems perform very well in scientific applications from the perspective of data storage optimization, data retrieval, and memory management but lack basic functionality like handling scientific data structures or enabling advanced math inside the database server. The most important gap in Microsoft SQL Server is the lack of a native array data type. Fortunately, the technology exists to extend the database server with custom-written code that enables us to address these problems. We present the prototype of a custom-built extension to Microsoft SQL Server that adds array handling functionality to the database system. With our Array Library, fixed-size arrays of all basic numeric data types can be created and manipulated efficiently. Also, the library is designed to integrate seamlessly with the most common math libraries, such as BLAS, LAPACK, FFTW, etc. With the help of these libraries, complex operations, such as matrix inversions or Fourier transformations, can be done on-the-fly, from SQL code, inside the database server process. We are currently testing the prototype with two different scientific data sets: The Indra cosmological simulation will use it to store particle and density data from N-body simulations, and the Milky Way Laboratory project will use it to store galaxy simulation data.
NoSQL technologies for the CMS Conditions Database
NASA Astrophysics Data System (ADS)
Sipos, Roland
2015-12-01
With the restart of the LHC in 2015, the growth of the CMS Conditions dataset will continue, so the need for consistent and highly available access to the Conditions is a strong motivation to revisit different aspects of the current data storage solutions. We present a study of alternative data storage backends for the Conditions Databases, evaluating some of the most popular NoSQL databases to support a key-value representation of the CMS Conditions. The definition of the database infrastructure is based on the need to store the conditions as BLOBs. Because of this, individual conditions can reach sizes that may require special treatment (splitting) in these NoSQL databases. As big binary objects may be problematic in several database systems, and also to give an accurate baseline, a testing framework extension was implemented to measure the characteristics of handling arbitrary binary data in these databases. Based on the evaluation, prototypes of a document store, a column-oriented store and a plain key-value store were deployed. An adaptation layer to access the backends in the CMS Offline software was developed to provide transparent support for these NoSQL databases in the CMS context. Additional data modelling approaches and considerations in the software layer, as well as deployment and automation of the databases, are also covered in the research. In this paper we present the results of the evaluation as well as a performance comparison of the prototypes studied.
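The "splitting" mentioned above is a generic chunking technique for large binary values; a minimal Python sketch, assuming a 4 MiB backend value limit (an assumption for illustration, not a CMS figure), might look like this:

import hashlib

CHUNK = 4 * 1024 * 1024  # assumed 4 MiB backend value-size limit

def split_blob(payload: bytes, chunk=CHUNK):
    """Yield (key, chunk) pairs; the key encodes content hash and chunk index."""
    digest = hashlib.sha1(payload).hexdigest()
    for i in range(0, len(payload), chunk):
        yield f"{digest}:{i // chunk}", payload[i:i + chunk]

def join_blob(pairs):
    """Reassemble chunks ordered by the index suffix of their keys."""
    return b"".join(c for _, c in sorted(pairs, key=lambda kv: int(kv[0].rsplit(":", 1)[1])))

blob = b"x" * (9 * 1024 * 1024)
parts = list(split_blob(blob))   # three chunks under the assumed limit
assert join_blob(parts) == blob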
Incremental Query Rewriting with Resolution
NASA Astrophysics Data System (ADS)
Riazanov, Alexandre; Aragão, Marcelo A. T.
We address the problem of semantic querying of relational databases (RDB) modulo knowledge bases using very expressive knowledge representation formalisms, such as full first-order logic or its various fragments. We propose to use a resolution-based first-order logic (FOL) reasoner for computing schematic answers to deductive queries, with the subsequent translation of these schematic answers to SQL queries which are evaluated using a conventional relational DBMS. We call our method incremental query rewriting, because an original semantic query is rewritten into a (potentially infinite) series of SQL queries. In this chapter, we outline the main idea of our technique - using abstractions of databases and constrained clauses for deriving schematic answers, and provide completeness and soundness proofs to justify the applicability of this technique to the case of resolution for FOL without equality. The proposed method can be directly used with regular RDBs, including legacy databases. Moreover, we propose it as a potential basis for an efficient Web-scale semantic search technology.
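To make the final rewriting step concrete, here is an illustrative Python sketch (not the authors' reasoner; the table and column names are hypothetical) that turns one schematic answer, a conjunction of database atoms with shared variables, into an ordinary SQL query:

def schematic_to_sql(atoms, answer_var):
    """atoms: list of (table, {column: var_or_const}); returns a SQL string.
    Uppercase strings are treated as variables, everything else as constants."""
    froms, wheres, bound = [], [], {}
    for i, (table, cols) in enumerate(atoms):
        alias = f"t{i}"
        froms.append(f"{table} {alias}")
        for col, term in cols.items():
            ref = f"{alias}.{col}"
            if isinstance(term, str) and term.isupper():      # a variable
                if term in bound:
                    wheres.append(f"{ref} = {bound[term]}")   # join on shared variable
                else:
                    bound[term] = ref
            else:
                wheres.append(f"{ref} = '{term}'")            # a constant
    return (f"SELECT {bound[answer_var]} FROM " + ", ".join(froms)
            + (" WHERE " + " AND ".join(wheres) if wheres else ""))

# answer(X) :- patient(X), diagnosis(X, flu)
print(schematic_to_sql(
    [("patient", {"id": "X"}), ("diagnosis", {"patient_id": "X", "code": "flu"})], "X"))
# SELECT t0.id FROM patient t0, diagnosis t1
#   WHERE t1.patient_id = t0.id AND t1.code = 'flu'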
Comparing IndexedHBase and Riak for Serving Truthy: Performance of Data Loading and Query Evaluation
2013-08-01
[Subject terms: performance evaluation, distributed database, NoSQL, HBase, indexing. Xiaoming Gao, Judy Qiu.] ...common hashtags created during a given time window. With the purpose of finding a solution for these challenges, we evaluate NoSQL databases such as
2012-11-27
with powerful analysis tools and an informatics approach leveraging best-of-breed NoSQL databases, in order to store, search and retrieve relevant... dictionaries, and JavaScript also has good support. The MongoDB project [15] was chosen as a scalable NoSQL data store for the cheminformatics components
Sujansky, Walter V; Faus, Sam A; Stone, Ethan; Brennan, Patricia Flatley
2010-10-01
Online personal health records (PHRs) enable patients to access, manage, and share certain of their own health information electronically. This capability creates the need for precise access-controls mechanisms that restrict the sharing of data to that intended by the patient. The authors describe the design and implementation of an access-control mechanism for PHR repositories that is modeled on the eXtensible Access Control Markup Language (XACML) standard, but intended to reduce the cognitive and computational complexity of XACML. The authors implemented the mechanism entirely in a relational database system using ANSI-standard SQL statements. Based on a set of access-control rules encoded as relational table rows, the mechanism determines via a single SQL query whether a user who accesses patient data from a specific application is authorized to perform a requested operation on a specified data object. Testing of this query on a moderately large database has demonstrated execution times consistently below 100ms. The authors include the details of the implementation, including algorithms, examples, and a test database as Supplementary materials. Copyright © 2010 Elsevier Inc. All rights reserved.
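A minimal sketch of the core idea, using SQLite instead of the authors' production DBMS, and with illustrative table and column names rather than those of the paper, is:

import sqlite3

db = sqlite3.connect(":memory:")
# Access-control rules encoded as relational table rows.
db.execute("""CREATE TABLE access_rule (
    grantee TEXT, app TEXT, operation TEXT, object_type TEXT, allowed INTEGER)""")
db.execute("INSERT INTO access_rule VALUES ('dr_smith', 'portal', 'read', 'medication', 1)")

def is_authorized(user, app, op, obj_type):
    """One SQL query decides whether a (user, application, operation, object) request is allowed."""
    row = db.execute(
        """SELECT allowed FROM access_rule
           WHERE grantee = ? AND app = ? AND operation = ? AND object_type = ?""",
        (user, app, op, obj_type)).fetchone()
    return bool(row and row[0])

print(is_authorized("dr_smith", "portal", "read", "medication"))  # True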
Respiratory cancer database: An open access database of respiratory cancer gene and miRNA.
Choubey, Jyotsna; Choudhari, Jyoti Kant; Patel, Ashish; Verma, Mukesh Kumar
2017-01-01
Respiratory cancer database (RespCanDB) is a genomic and proteomic database of cancers of the respiratory organs. It also includes information on medicinal plants used for the treatment of various respiratory cancers, with the structures of their active constituents, as well as pharmacological and chemical information on drugs associated with various respiratory cancers. Data in RespCanDB have been manually collected from published research articles and from other databases. Data have been integrated using MySQL, a relational database management system. MySQL manages all data in the back-end and provides commands to retrieve and store the data in the database. The web interface of the database has been built in ASP. RespCanDB is expected to contribute to the scientific community's understanding of respiratory cancer biology as well as to the development of new ways of diagnosing and treating respiratory cancer. Currently, the database contains the oncogenomic information of lung cancer, laryngeal cancer, and nasopharyngeal cancer. Data for other cancers, such as oral and tracheal cancers, will be added in the near future. The URL of RespCanDB is http://ridb.subdic-bioinformatics-nitrr.in/.
Supporting temporal queries on clinical relational databases: the S-WATCH-QL language.
Combi, C.; Missora, L.; Pinciroli, F.
1996-01-01
Due to the ubiquitous and special nature of time, clinical databases in particular need dedicated temporal data types and operators. In this paper we describe S-WATCH-QL (Structured Watch Query Language), a temporal extension of SQL, the widespread query language based on the relational model. S-WATCH-QL extends the well-known SQL with: a) temporal data types that allow the storage of information with different levels of granularity; b) historical relations that can store both instantaneous valid times and intervals together; c) temporal clauses, functions and predicates that allow complex temporal queries to be defined. PMID:8947722
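S-WATCH-QL's own syntax is not reproduced in the abstract; as a stand-in, the following SQLite sketch shows the kind of valid-time interval-overlap query that such temporal clauses abbreviate, over an illustrative clinical table:

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE therapy (
    patient TEXT, drug TEXT, valid_from TEXT, valid_to TEXT)""")
db.executemany("INSERT INTO therapy VALUES (?,?,?,?)", [
    ("p1", "insulin",   "1995-01-10", "1995-06-30"),
    ("p1", "metformin", "1995-05-01", "1995-12-31"),
])

# "Which drugs did p1 receive concurrently?" -- the interval-overlap predicate
# a temporal extension would hide behind an OVERLAPS-style operator.
rows = db.execute("""
    SELECT a.drug, b.drug FROM therapy a JOIN therapy b
    ON a.patient = b.patient AND a.drug < b.drug
    AND a.valid_from <= b.valid_to AND b.valid_from <= a.valid_to
    WHERE a.patient = 'p1'""").fetchall()
print(rows)  # [('insulin', 'metformin')]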
MySQL/PHP web database applications for IPAC proposal submission
NASA Astrophysics Data System (ADS)
Crane, Megan K.; Storrie-Lombardi, Lisa J.; Silbermann, Nancy A.; Rebull, Luisa M.
2008-07-01
The Infrared Processing and Analysis Center (IPAC) is NASA's multi-mission center of expertise for long-wavelength astrophysics. Proposals for various IPAC missions and programs are ingested via MySQL/PHP web database applications. Proposers use web forms to enter coversheet information and upload PDF files related to the proposal. Upon proposal submission, a unique directory is created on the webserver into which all of the uploaded files are placed. The coversheet information is converted into a PDF file using a PHP extension called FPDF. The files are concatenated into one PDF file using the command-line tool pdftk and then forwarded to the review committee. This work was performed at the California Institute of Technology under contract to the National Aeronautics and Space Administration.
Evaluation of Sub Query Performance in SQL Server
NASA Astrophysics Data System (ADS)
Oktavia, Tanty; Sujarwo, Surya
2014-03-01
The paper explores several subquery methods used in a query and their impact on query performance. The study uses an experimental approach to evaluate the performance of each subquery method combined with an indexing strategy. The subquery methods consist of IN, EXISTS, a relational operator, and a relational operator combined with the TOP operator. The experiments show that using a relational operator combined with an indexing strategy in a subquery yields better performance than the same method without an indexing strategy, and also better than the other methods. In summary, for applications that emphasize the performance of retrieving data from a database, it is better to use a relational operator combined with an indexing strategy. This study was done on Microsoft SQL Server 2012.
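The subquery forms compared in the study can be written against an illustrative two-table schema; the SQLite sketch below is only a stand-in for the SQL Server 2012 setup, with the index representing the indexing strategy:

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE INDEX idx_orders_customer ON orders(customer_id);  -- the indexing strategy
""")

q_in     = "SELECT id FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE region='EU')"
q_exists = ("SELECT id FROM orders o WHERE EXISTS "
            "(SELECT 1 FROM customers c WHERE c.id = o.customer_id AND c.region='EU')")
q_rel    = "SELECT id FROM orders WHERE customer_id = (SELECT MIN(id) FROM customers WHERE region='EU')"

# Inspect how the engine plans each subquery form.
for q in (q_in, q_exists, q_rel):
    print(db.execute("EXPLAIN QUERY PLAN " + q).fetchall())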
Evolution of the LBT Telemetry System
NASA Astrophysics Data System (ADS)
Summers, K.; Biddick, C.; De La Peña, M. D.; Summers, D.
2014-05-01
The Large Binocular Telescope (LBT) Telescope Control System (TCS) records about 10GB of telemetry data per night. Additionally, the vibration monitoring system records about 9GB of telemetry data per night. Through 2013, we have amassed over 6TB of Hierarchical Data Format (HDF5) files and almost 9TB in a MySQL database of TCS and vibration data. The LBT telemetry system, in its third major revision since 2004, provides the mechanism to capture and store this data. The telemetry system has evolved from a simple HDF file system with MySQL stream definitions within the TCS, to a separate system using a MySQL database system for the definitions and data, and finally to no database use at all, using HDF5 files.
Lessons Learned With a Global Graph and Ozone Widget Framework (OWF) Testbed
2013-05-01
of operating system and database environments. The following is one example. Requirements are: Java 1.6+ and a Relational Database Management... We originally tried to use MySQL as our database, because we were more familiar with it, but since the database dumps as well as most of the... Global Graph Rest Services: in order to set up the Global Graph Rest Services, you will need to have the following dependencies installed: Java 1.6
StreptomycesInforSys: A web-enabled information repository
Jain, Chakresh Kumar; Gupta, Vidhi; Gupta, Ashvarya; Gupta, Sanjay; Wadhwa, Gulshan; Sharma, Sanjeev Kumar; Sarethy, Indira P
2012-01-01
Members of Streptomyces produce 70% of natural bioactive products. A considerable amount of information, based on a polyphasic approach, is available for the classification of Streptomyces. This information, based on phenotypic, genotypic and bioactive-component production profiles, is crucial for pharmacological screening programmes, but it is scattered across various journals, books and other resources, many of which are not freely accessible. The designed database incorporates polyphasic typing information with combinations of search options to aid in efficient screening of new isolates. This will help in the preliminary categorization of isolates into appropriate groups. It is a free relational database compatible with existing operating systems. A cross-platform technology with the XAMPP web server has been used to develop, manage, and facilitate user queries effectively with database support. Employment of PHP, a platform-independent scripting language embedded in HTML, and the database management software MySQL facilitates dynamic information storage and retrieval. The user-friendly, open and flexible freeware (PHP, MySQL and Apache) is foreseen to reduce running and maintenance costs. Availability www.sis.biowaves.org PMID:23275736
Database Migration for Command and Control
2002-11-01
[Table residue: a matrix of command-and-control data stores -- air defense data, defended asset lists, TADIL warnings, and mission and execution data -- held in Oracle 7.3/7.3.2 and flat files, with proprietary and standard SQL interfaces, automated OLTP processes, discrete transactions with near-real-time response, user updates, external interfaces, auto/manual backup, messaging, and proprietary internal replication behind a SQL Web server.]
DataSpread: Unifying Databases and Spreadsheets.
Bendre, Mangesh; Sun, Bofan; Zhang, Ding; Zhou, Xinyan; Chang, Kevin ChenChuan; Parameswaran, Aditya
2015-08-01
Spreadsheet software is often the tool of choice for ad-hoc tabular data management, processing, and visualization, especially on tiny data sets. On the other hand, relational database systems offer significant power, expressivity, and efficiency over spreadsheet software for data management, while lacking in the ease of use and ad-hoc analysis capabilities. We demonstrate DataSpread, a data exploration tool that holistically unifies databases and spreadsheets. It continues to offer a Microsoft Excel-based spreadsheet front-end, while in parallel managing all the data in a back-end database, specifically, PostgreSQL. DataSpread retains all the advantages of spreadsheets, including ease of use, ad-hoc analysis and visualization capabilities, and a schema-free nature, while also adding the advantages of traditional relational databases, such as scalability and the ability to use arbitrary SQL to import, filter, or join external or internal tables and have the results appear in the spreadsheet. DataSpread needs to reason about and reconcile differences in the notions of schema, addressing of cells and tuples, and the current "pane" (which exists in spreadsheets but not in traditional databases), and support data modifications at both the front-end and the back-end. Our demonstration will center on our first and early prototype of the DataSpread, and will give the attendees a sense for the enormous data exploration capabilities offered by unifying spreadsheets and databases.
NASA Astrophysics Data System (ADS)
Gaspar Aparicio, R.; Gomez, D.; Coterillo Coz, I.; Wojcik, D.
2012-12-01
At CERN a number of key database applications run on user-managed MySQL database services. The Database on Demand project was born out of an idea to provide the CERN user community with an environment to develop and run database services outside of the centralised Oracle-based database services. Database on Demand (DBoD) empowers users to perform certain actions that have traditionally been done by database administrators (DBAs), providing an enterprise platform for database applications. It also allows the CERN user community to run different database engines, presently the open community version of MySQL and single-instance Oracle database servers. This article describes the technology approach taken to face this challenge, the service level agreement (SLA) that the project provides, and possible evolution scenarios.
2014-06-01
central location. Each of the SQLite databases is converted and stored in one MySQL database, and the pcap files are parsed to extract call information... from the specific communications applications used during the experiment. This extracted data is then stored in the same MySQL database. With all... rhythm of the event. Figure 3 demonstrates the application usage over the course of the experiment for the EXDIR. As seen, the EXDIR spent the majority
An Analysis Platform for Mobile Ad Hoc Network (MANET) Scenario Execution Log Data
2016-01-01
these technologies. 4.1 Backend technologies: Java 1.8, mysql-connector-java-5.0.8.jar, Tomcat, VirtualBox, Kali MANET virtual machine. 4.2 Frontend technologies: LAMPP. 4.3 Database: MySQL Server. 5. Database: the SEDAP database settings and structure are described in this section... contains all the backend Java functionality, including the web services, and should be placed in the webapps directory inside the Tomcat installation
Data structures and organisation: Special problems in scientific applications
NASA Astrophysics Data System (ADS)
Read, Brian J.
1989-12-01
In this paper we discuss and offer answers to the following questions: What, really, are the benefits of databases in physics? Are scientific databases essentially different from conventional ones? What are the drawbacks of a commercial database management system for use with scientific data? Do they outweigh the advantages? Do database systems have adequate graphics facilities, or is a separate graphics package necessary? SQL as a standard language has deficiencies, but what are they for scientific data in particular? Indeed, is the relational model appropriate anyway? Or should we turn to object-oriented databases?
EasyKSORD: A Platform of Keyword Search Over Relational Databases
NASA Astrophysics Data System (ADS)
Peng, Zhaohui; Li, Jing; Wang, Shan
Keyword Search Over Relational Databases (KSORD) enables casual users to use keyword queries (a set of keywords) to search relational databases just like searching the Web, without any knowledge of the database schema or any need of writing SQL queries. Based on our previous work, we design and implement a novel KSORD platform named EasyKSORD for users and system administrators to use and manage different KSORD systems in a novel and simple manner. EasyKSORD supports advanced queries, efficient data-graph-based search engines, multiform result presentations, and system logging and analysis. Through EasyKSORD, users can search relational databases easily and read search results conveniently, and system administrators can easily monitor and analyze the operations of KSORD and manage KSORD systems much better.
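EasyKSORD's data-graph-based engines are far more sophisticated than this, but a toy Python sketch of the basic KSORD idea, fanning one keyword out over the text columns known to the engine and unioning the matches so the user never writes SQL, is:

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE paper (title TEXT);
    CREATE TABLE author (name TEXT);
    INSERT INTO paper VALUES ('Keyword search over relational data');
    INSERT INTO author VALUES ('Zhaohui Peng');
""")

# Schema knowledge held by the (hypothetical) search engine.
TEXT_COLUMNS = {"paper": ["title"], "author": ["name"]}

def keyword_query(keyword):
    """Build one UNION ALL query probing every known text column for the keyword."""
    parts = [f"SELECT '{t}' AS source, {c} AS hit FROM {t} WHERE {c} LIKE ?"
             for t, cols in TEXT_COLUMNS.items() for c in cols]
    return " UNION ALL ".join(parts), [f"%{keyword}%"] * len(parts)

sql, params = keyword_query("relational")
print(db.execute(sql, params).fetchall())  # [('paper', 'Keyword search over relational data')]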
Using PHP/MySQL to Manage Potential Mass Impacts
NASA Technical Reports Server (NTRS)
Hager, Benjamin I.
2010-01-01
This paper presents a new application using commercially available software to manage mass properties for spaceflight vehicles. PHP/MySQL (PHP: Hypertext Preprocessor, and MySQL, a Structured Query Language database) are a web scripting language and a database system commonly used in concert with each other. They open up new opportunities to develop cutting edge mass properties tools, and in particular, tools for the management of potential mass impacts (threats and opportunities). The paper begins by providing an overview of the functions and capabilities of PHP/MySQL. The focus of this paper is on how PHP/MySQL are being used to develop an advanced "web accessible" database system for identifying and managing mass impacts on NASA's Ares I Upper Stage program, managed by the Marshall Space Flight Center. To fully describe this application, examples of the data, search functions, and views are provided to promote, not only the function, but the security, ease of use, simplicity, and eye-appeal of this new application. This paper concludes with an overview of the other potential mass properties applications and tools that could be developed using PHP/MySQL. The premise behind this paper is that PHP/MySQL are software tools that are easy to use and readily available for the development of cutting edge mass properties applications. These tools are capable of providing "real-time" searching and status of an active database, automated report generation, and other capabilities to streamline and enhance mass properties management. By using PHP/MySQL, proven existing methods for managing mass properties can be adapted to present-day information technology to accelerate mass properties data gathering, analysis, and reporting, allowing mass property management to keep pace with today's fast-paced design and development processes.
Enhanced DIII-D Data Management Through a Relational Database
NASA Astrophysics Data System (ADS)
Burruss, J. R.; Peng, Q.; Schachter, J.; Schissel, D. P.; Terpstra, T. B.
2000-10-01
A relational database is being used to serve data about DIII-D experiments. The database is optimized for queries across multiple shots, allowing for rapid data mining by SQL-literate researchers. The relational database relates different experiments and datasets, thus providing a big picture of DIII-D operations. Users are encouraged to add their own tables to the database. Summary physics quantities about DIII-D discharges are collected and stored in the database automatically. Meta-data about code runs, MDSplus usage, and visualization tool usage are collected, stored in the database, and later analyzed to improve computing. Documentation on the database may be accessed through programming languages such as C, Java, and IDL, or through ODBC compliant applications such as Excel and Access. A database-driven web page also provides a convenient means for viewing database quantities through the World Wide Web. Demonstrations will be given at the poster.
Nosql for Storage and Retrieval of Large LIDAR Data Collections
NASA Astrophysics Data System (ADS)
Boehm, J.; Liu, K.
2015-08-01
Developments in LiDAR technology over the past decades have made LiDAR a mature and widely accepted source of geospatial information, which in turn has led to an enormous growth in data volume. The central idea for a file-centric storage of LiDAR point clouds is the observation that large collections of LiDAR data are typically delivered as large collections of files, rather than single files of terabyte size. This split of the dataset, commonly referred to as tiling, was usually done to accommodate a specific processing pipeline, so it makes sense to preserve it. A document-oriented NoSQL database can easily emulate this data partitioning by representing each tile (file) in a separate document. The document stores the metadata of the tile, while the actual files are stored in a distributed file system emulated by the NoSQL database. We demonstrate the use of MongoDB, a highly scalable document-oriented NoSQL database, for storing large LiDAR files. MongoDB, like any NoSQL database, allows queries on the attributes of the document; as a specialty, MongoDB also allows spatial queries. Hence we can perform spatial queries on the bounding boxes of the LiDAR tiles. Inserting and retrieving files on a cloud-based database is compared to a native file system and cloud storage in terms of transfer speed.
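A minimal sketch of this file-centric pattern, assuming a MongoDB instance running on localhost and an illustrative tiles collection, is: each tile becomes one document holding its metadata and a GeoJSON bounding box, the raw file goes to GridFS, and a 2dsphere index supports spatial queries over tile footprints.

import gridfs
from pymongo import MongoClient, GEOSPHERE

db = MongoClient("mongodb://localhost:27017")["lidar"]
fs = gridfs.GridFS(db)
db.tiles.create_index([("bbox", GEOSPHERE)])  # spatial index on tile footprints

def ingest_tile(path, bbox_polygon):
    """Store one LiDAR tile: raw file into GridFS, metadata into a document."""
    with open(path, "rb") as f:
        file_id = fs.put(f, filename=path)
    db.tiles.insert_one({"file": file_id, "bbox": bbox_polygon})

# Which tiles intersect a query region?
region = {"type": "Polygon",
          "coordinates": [[[-0.14, 51.50], [-0.10, 51.50],
                           [-0.10, 51.53], [-0.14, 51.53], [-0.14, 51.50]]]}
hits = db.tiles.find({"bbox": {"$geoIntersects": {"$geometry": region}}})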
BioWarehouse: a bioinformatics database warehouse toolkit
Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D
2006-01-01
Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315
A Data Warehouse to Support Condition Based Maintenance (CBM)
2005-05-01
Application (VBA) code sequence to import the original MAST-generated CSV and then create a single output table in DBASE IV format. The DBASE IV format... database architecture (Oracle, Sybase, MS-SQL, etc.). This design includes table definitions, comments, specification of table attributes, primary and foreign... built queries and applications. Needs the application developers to construct data views. No SQL programming experience. b. Power Database User - knows
2016-03-01
[Acronym-list residue: IT (information technology), JBOD (just a bunch of disks), JDBC (Java database connectivity), JPME (Joint Professional Military Education), JSO (Joint Service Officer), JVM (Java virtual machine), MPP (massively parallel processing), MPTE (Manpower, Personnel, Training, and Education), NAVMAC (Navy...)] ...external database, whether it is MySQL, Oracle, DB2, or SQL Server (Teller, 2015). Connectors optimize the data transfer by obtaining metadata
Quality Attribute-Guided Evaluation of NoSQL Databases: An Experience Report
2014-10-18
detailed technical evaluations of NoSQL databases specifically, and big data systems in general, that have become apparent during our study... big data software systems [Agarwal 2011]. Internet-born organizations such as Google and Amazon are at the cutting edge of this revolution... [Chang 2008], along with those of numerous other big data innovators, have made a variety of open source and commercial data management technologies
Footprint Representation of Planetary Remote Sensing Data
NASA Astrophysics Data System (ADS)
Walter, S. H. G.; Gasselt, S. V.; Michael, G.; Neukum, G.
The geometric outline of remote sensing image data, the so-called footprint, can be represented as a number of coordinate tuples. These polygons are associated with corresponding attribute information, such as orbit name, ground and image resolution, solar longitude and illumination conditions, to form a powerful basis for classification of planetary experiment data. Speed, handling and extended capabilities are the reasons for using geodatabases to store and access these data types. Techniques for such a spatial database of footprint data are demonstrated using the Relational Database Management System (RDBMS) PostgreSQL, spatially enabled by the PostGIS extension. As examples, footprints of the HRSC and OMEGA instruments, both onboard ESA's Mars Express orbiter, are generated and connected to attribute information. The aim is to provide high-resolution footprints of the OMEGA instrument to the science community for the first time and make them available for web-based mapping applications like the "Planetary Interactive GIS-on-the-Web Analyzable Database" (PIGWAD), produced by the USGS. Map overlays with HRSC or other instruments like MOC and THEMIS (footprint maps are already available for these instruments and can be integrated into the database) allow on-the-fly intersection and comparison as well as extended statistics of the data. Footprint polygons are generated one by one using standard software provided by the instrument teams. Attribute data is calculated and stored together with the geometric information. In the case of HRSC, the coordinates of the footprints are already available in the VICAR label of each image file. Using the VICAR RTL and PostgreSQL's libpq C library, they are loaded into the database using the Well-Known Text (WKT) notation of the Open Geospatial Consortium, Inc. (OGC). For the OMEGA instrument, image data is read using IDL routines developed and distributed by the OMEGA team. Image outlines are exported together with relevant attribute data to the industry-standard Shapefile format. These files are translated into a Structured Query Language (SQL) command sequence suitable for insertion into the PostGIS/PostgreSQL database using the shp2pgsql data loader provided by the PostGIS software. PostgreSQL's advanced features, such as geometry types, rules, operators and functions, allow complex spatial queries and on-the-fly processing of data at the DBMS level, e.g. generalisation of the outlines. Processing done by the DBMS, visualisation via GIS systems and utilisation for web-based applications like mapservers will be demonstrated.
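A condensed sketch of the loading and overlay steps, using psycopg2 against an assumed PostGIS-enabled database (the footprint table, orbit names and connection settings are illustrative, not the project's actual schema), is:

import psycopg2

conn = psycopg2.connect("dbname=footprints user=postgres")
cur = conn.cursor()
cur.execute("""CREATE TABLE IF NOT EXISTS footprint (
    orbit TEXT, instrument TEXT, geom geometry(Polygon))""")

# One footprint outline, expressed in OGC Well-Known Text.
wkt = "POLYGON((10 20, 11 20, 11 21, 10 21, 10 20))"
cur.execute("INSERT INTO footprint VALUES (%s, %s, ST_GeomFromText(%s))",
            ("h0123_0000", "HRSC", wkt))

# On-the-fly overlay: which OMEGA footprints intersect a given HRSC footprint?
cur.execute("""SELECT o.orbit FROM footprint o, footprint h
               WHERE o.instrument = 'OMEGA' AND h.orbit = %s
               AND ST_Intersects(o.geom, h.geom)""", ("h0123_0000",))
print(cur.fetchall())
conn.commit()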
Alignment of high-throughput sequencing data inside in-memory databases.
Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias
2014-01-01
In the era of high-throughput DNA sequencing techniques, performant analysis of DNA sequences is highly important, yet computer-supported DNA analysis remains a time-intensive task. In this paper we explore the potential of a new in-memory database technology, SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure that the results are comparable, MySQL was run in memory as well, utilizing its integrated memory engine for database table creation. We implemented stored procedures for exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, the performance analysis between HANA and MySQL compares the execution times of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which indicates high potential in the new in-memory concepts and should lead to further development of DNA analysis procedures in the future.
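As a stand-in for the paper's stored procedures, the following SQLite sketch illustrates "computation close to the data" with naive exact read matching via the built-in INSTR function; it is not BWA's Burrows-Wheeler index, and the reference sequence is invented for illustration:

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE reference (chrom TEXT, seq TEXT)")
db.execute("INSERT INTO reference VALUES ('chr1', 'ACGTACGTTAGC')")

def exact_match(read):
    """Return (chrom, 1-based offset) pairs for the first exact occurrence of the read."""
    return db.execute(
        "SELECT chrom, INSTR(seq, ?) FROM reference WHERE INSTR(seq, ?) > 0",
        (read, read)).fetchall()

print(exact_match("CGTT"))  # [('chr1', 6)]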
Experience with ATLAS MySQL PanDA database service
NASA Astrophysics Data System (ADS)
Smirnov, Y.; Wlodek, T.; De, K.; Hover, J.; Ozturk, N.; Smith, J.; Wenaus, T.; Yu, D.
2010-04-01
The PanDA distributed production and analysis system has been in production use for ATLAS data processing and analysis since late 2005 in the US, and globally throughout ATLAS since early 2008. Its core architecture is based on a set of stateless web services served by Apache and backed by a suite of MySQL databases that are the repository for all PanDA information: active and archival job queues, dataset and file catalogs, site configuration information, monitoring information, system control parameters, and so on. This database system is one of the most critical components of PanDA, and has successfully delivered the functional and scaling performance required by PanDA, currently operating at a scale of half a million jobs per week, with much growth still to come. In this paper we describe the design and implementation of the PanDA database system, its architecture of MySQL servers deployed at BNL and CERN, backup strategy and monitoring tools. The system has been developed, thoroughly tested, and brought to production to provide highly reliable, scalable, flexible and available database services for ATLAS Monte Carlo production, reconstruction and physics analysis.
Methods, Knowledge Support, and Experimental Tools for Modeling
2006-10-01
open source software entities: the PostgreSQL relational database management system (http://www.postgres.org), the Apache web server (http... past. The revision control system allows the program to capture disagreements, and allows users to explore the history of such disagreements by
JBioWH: an open-source Java framework for bioinformatics data integration
Vera, Roberto; Perez-Riverol, Yasset; Perez, Sonia; Ligeti, Balázs; Kertész-Farkas, Attila; Pongor, Sándor
2013-01-01
The Java BioWareHouse (JBioWH) project is an open-source platform-independent programming framework that allows a user to build his/her own integrated database from the most popular data sources. JBioWH can be used for intensive querying of multiple data sources and the creation of streamlined task-specific data sets on local PCs. JBioWH is based on a MySQL relational database scheme and includes JAVA API parser functions for retrieving data from 20 public databases (e.g. NCBI, KEGG, etc.). It also includes a client desktop application for (non-programmer) users to query data. In addition, JBioWH can be tailored for use in specific circumstances, including the handling of massive queries for high-throughput analyses or CPU intensive calculations. The framework is provided with complete documentation and application examples and it can be downloaded from the Project Web site at http://code.google.com/p/jbiowh. A MySQL server is available for demonstration purposes at hydrax.icgeb.trieste.it:3307. Database URL: http://code.google.com/p/jbiowh PMID:23846595
CheD: chemical database compilation tool, Internet server, and client for SQL servers.
Trepalin, S V; Yarkov, A V
2001-01-01
An efficient program, which runs on a personal computer, for the storage, retrieval, and processing of chemical information is presented. The program can work either as a stand-alone application or in conjunction with a specifically written Web server application or with standard SQL servers, e.g., Oracle, Interbase, and MS SQL. New types of data fields are introduced, e.g., arrays for spectral information storage, HTML and database links, and user-defined functions. CheD has an open architecture; thus, custom data types, controls, and services may be added. A WWW server application for chemical data retrieval features an easy and user-friendly installation on Windows NT or 95 platforms.
NASA Astrophysics Data System (ADS)
Barnsley, R. M.; Steele, Iain A.; Smith, R. J.; Mawson, Neil R.
2014-07-01
The Small Telescopes Installed at the Liverpool Telescope (STILT) project has been in operation since March 2009, collecting data with three wide field unfiltered cameras: SkycamA, SkycamT and SkycamZ. To process the data, a pipeline was developed to automate source extraction, catalogue cross-matching, photometric calibration and database storage. In this paper, modifications and further developments to this pipeline will be discussed, including a complete refactor of the pipeline's codebase into Python, migration of the back-end database technology from MySQL to PostgreSQL, and changing the catalogue used for source cross-matching from USNO-B1 to APASS. In addition to this, details will be given relating to the development of a preliminary front-end to the source extracted database which will allow a user to perform common queries such as cone searches and light curve comparisons of catalogue and non-catalogue matched objects. Some next steps and future ideas for the project will also be presented.
The establishment and use of the point source catalog database of the 2MASS near infrared survey
NASA Astrophysics Data System (ADS)
Gao, Y. F.; Shan, H. G.; Cheng, D.
2003-02-01
The 2MASS near infrared survey project is introduced briefly. The 2MASS point source catalog (2MASS PSC) database and its network query system have been established using the PHP Hypertext Preprocessor and the MySQL database server. Using the system, one can not only query information about sources listed in the catalog but also draw the related plots. Moreover, after diagnostics of the 2MASS data, some research fields that can benefit from this database are suggested.
A SQL-Database Based Meta-CASE System and its Query Subsystem
NASA Astrophysics Data System (ADS)
Eessaar, Erki; Sgirka, Rünno
Meta-CASE systems simplify the creation of CASE (Computer Aided System Engineering) systems. In this paper, we present a meta-CASE system that provides a web-based user interface and uses an object-relational database management system (ORDBMS) as its basis. The use of an ORDBMS allows us to integrate the different parts of the system and simplifies the creation of meta-CASE and CASE systems. ORDBMSs provide a powerful query mechanism. The proposed system allows developers to use queries to evaluate and gradually improve artifacts and to calculate values of software measures. We illustrate the use of the system with the SimpleM modeling language and discuss the use of SQL in the context of queries about artifacts. We have created a prototype of the meta-CASE system using the PostgreSQL™ ORDBMS and the PHP scripting language.
Querying and Computing with BioCyc Databases
Krummenacker, Markus; Paley, Suzanne; Mueller, Lukas; Yan, Thomas; Karp, Peter D.
2006-01-01
Summary We describe multiple methods for accessing and querying the complex and integrated cellular data in the BioCyc family of databases: access through multiple file formats, access through Application Program Interfaces (APIs) for LISP, Perl and Java, and SQL access through the BioWarehouse relational database. Availability The Pathway Tools software and 20 BioCyc DBs in Tiers 1 and 2 are freely available to academic users; fees apply to some types of commercial use. For download instructions see http://BioCyc.org/download.shtml PMID:15961440
2004-06-01
remote databases, has seen little vendor acceptance. Each database (Oracle, DB2, MySQL, etc.) has its own client-server protocol. Therefore each... existing standards – SQL, X.500/LDAP, FTP, etc. • View information dissemination as selective replication – state-oriented vs. message-oriented... allowing the application to start. The resource management system would serve as a broker to the resources, making sure that resources are not
Delayed Instantiation Bulk Operations for Management of Distributed, Object-Based Storage Systems
2009-08-01
source and destination object sets, while they have attribute pages to indicate that history. Fourth, we allow for operations to occur on any objects... client dialogue to the PostgreSQL database, where server-side functions implement the service logic for the requests. The translation is done... to satisfy client requests, and performs delayed instantiation bulk operations. It is built around a PostgreSQL database with tables for storing
Wong, Wing Chung; Kim, Dewey; Carter, Hannah; Diekhans, Mark; Ryan, Michael C; Karchin, Rachel
2011-08-01
Thousands of cancer exomes are currently being sequenced, yielding millions of non-synonymous single nucleotide variants (SNVs) of possible relevance to disease etiology. Here, we provide a software toolkit to prioritize SNVs based on their predicted contribution to tumorigenesis. It includes a database of precomputed, predictive features covering all positions in the annotated human exome and can be used either stand-alone or as part of a larger variant discovery pipeline. MySQL database, source code and binaries are freely available for academic/government use at http://wiki.chasmsoftware.org. Source in Python and C++. Requires a 32- or 64-bit Linux system (tested on Fedora Core 8, 10, 11 and Ubuntu 10), 2.5 ≤ Python < 3.0, MySQL server > 5.0, 60 GB available hard disk space (50 MB for software and data files, 40 GB for the MySQL database dump when uncompressed), and 2 GB of RAM.
HBVPathDB: a database of HBV infection-related molecular interaction network.
Zhang, Yi; Bo, Xiao-Chen; Yang, Jing; Wang, Sheng-Qi
2005-03-21
To describe the interactions of molecules and genes between hepatitis B virus (HBV) and its host, in order to understand how viral and host genes and molecules are networked into a biological system and to elucidate the mechanism of HBV infection. The knowledge of HBV infection-related reactions was organized into various kinds of pathways with carefully drawn graphs in HBVPathDB. Pathway information is stored in a relational database management system (DBMS), currently the most efficient way to manage large amounts of data, and querying is implemented with the powerful Structured Query Language (SQL). The search engine is written in Personal Home Page (PHP) with embedded SQL, and a web retrieval interface was developed for searching with the Hypertext Markup Language (HTML). We present the first version of HBVPathDB, an HBV infection-related molecular interaction network database comprising 306 pathways and 1050 molecules. With carefully drawn graphs, the pathway information stored in HBVPathDB can be browsed in an intuitive way. We developed an easy-to-use interface for flexible access to the details of the database, and convenient software to query and browse its pathway information. Four search options (category search, gene search, description search, and unitized search) are supported by the search engine of the database. The database is freely available at http://www.bio-inf.net/HBVPathDB/HBV/. HBVPathDB already contains a considerable amount of HBV infection-related pathway information, suitable for in-depth analysis of the molecular interaction network between virus and host. HBVPathDB integrates pathway datasets with convenient software for query, browsing and visualization, giving users more opportunity to identify key regulatory molecules as potential drug targets and to explore possible mechanisms of HBV infection based on gene expression datasets.
Visualization of Vgi Data Through the New NASA Web World Wind Virtual Globe
NASA Astrophysics Data System (ADS)
Brovelli, M. A.; Kilsedar, C. E.; Zamboni, G.
2016-06-01
GeoWeb 2.0, laying the foundations of Volunteered Geographic Information (VGI) systems, has led to platforms where users can contribute to geographic knowledge that is open to access. Moreover, as a result of advancements in 3D visualization, virtual globes able to visualize geographic data even in browsers have emerged. However, the integration of VGI systems and virtual globes has not been fully realized. The study presented aims to visualize volunteered data in 3D, considering also ease of use for the general public, using Free and Open Source Software (FOSS). The new Application Programming Interface (API) of NASA, Web World Wind, written in JavaScript and based on the Web Graphics Library (WebGL), is cross-platform and cross-browser, so a virtual globe created using this API is accessible through any WebGL-supported browser on different operating systems and devices. As a result, no installation or configuration is required on the client side, making the collected data more usable; this is not the case with World Wind for Java, which requires installation and configuration of the Java Virtual Machine (JVM). Furthermore, the data collected through various VGI platforms might be in different formats, stored in a traditional relational database or in a NoSQL database. The project developed aims to visualize and query data collected through the Open Data Kit (ODK) platform and a cross-platform application, where data are stored in a relational PostgreSQL database and a NoSQL CouchDB database, respectively.
Open Clients for Distributed Databases
NASA Astrophysics Data System (ADS)
Chayes, D. N.; Arko, R. A.
2001-12-01
We are actively developing a collection of open source example clients that demonstrate use of our "back end" data management infrastructure. The data management system is reported elsewhere at this meeting (Arko and Chayes: A Scaleable Database Infrastructure). In addition to their primary goal of being examples for others to build upon, some of these clients may have limited utility in themselves. More information about the clients and the data infrastructure is available online at http://data.ldeo.columbia.edu. The examples to be demonstrated include several web-based clients: those developed for the Community Review System of the Digital Library for Earth System Education, a real-time watch-standers' log book, an offline interface to log book entries, a simple client to search multibeam metadata, and others. These are Internet-enabled, generally web-based front ends that support searches against one or more relational databases using industry-standard SQL queries. In addition to the web-based clients, simple SQL searches from within Excel and similar applications will be demonstrated. By defining, documenting and publishing a clear interface to the fully searchable databases, it becomes relatively easy to construct client interfaces that are optimized for specific applications, in comparison to building a monolithic data and user interface system.
Streamlining the Process of Acquiring Secure Open Architecture Software Systems
2013-10-08
Microsoft .NET, Enterprise Java Beans, GNU Lesser General Public License (LGPL) libraries, and data communication protocols like the Hypertext Transfer...NetBeans development environments), customer relationship management (SugarCRM), database management systems (PostgreSQL, MySQL), operating
The unified database for the fixed target experiment BM@N
NASA Astrophysics Data System (ADS)
Gertsenberger, K. V.
2016-09-01
The article describes the database developed as the comprehensive data storage of the fixed target experiment BM@N [1] at the Joint Institute for Nuclear Research (JINR) in Dubna. The structure and purposes of the BM@N facility will be briefly presented. The scheme of the unified database and its parameters will be described in detail. The BM@N database, implemented on the PostgreSQL database management system (DBMS), provides user access to the current information of the experiment. The interfaces developed for access to the database will also be presented: one was implemented as a set of C++ classes for accessing the data without SQL statements, the other as a Web interface available on the Web page of the BM@N experiment.
Life in extra dimensions of database world or penetration of NoSQL in HEP community
NASA Astrophysics Data System (ADS)
Kuznetsov, V.; Evans, D.; Metson, S.
2012-12-01
The recent buzzword in the IT world is NoSQL. Major players such as Facebook, Yahoo, Google, etc. have widely adopted different “NoSQL” solutions for their needs. Horizontal scalability, a flexible data model and management of big data volumes are only a few advantages of NoSQL. In the CMS experiment we use several of them in the production environment. Here, we present CMS projects based on NoSQL solutions, their strengths and weaknesses, as well as our experience with those tools and their coexistence with standard RDBMS solutions in our applications.
A Database Design for the Brazilian Air Force Military Personnel Control System.
1987-06-01
GIVEN A RECNUM GET MOVING HISTORICAL". 77 SEL4 PIC X(70) VALUE "4. GIVEN A RECNUM GET NOMINATION HISTORICAL". 77 SEL5 PIC X(70) VALUE "5. GIVEN A...WHERE "RECNUM = :RECNUM". 77 SQL-SEL3-LENGTH PIC S9999 VALUE 150 COMP. 77 SQL-SEL4 PIC X(150) VALUE "SELECT ABBREV,DTNOM,DTEXO,SITN FROM...NOMINATION WHERE RECNUM". 77 SQL-SEL4-LENGTH PIC S9999 VALUE 150 COMP. 77 SQL-SEL5 PIC X(150) VALUE "SELECT ABBREV,DTDES,DTWAIVER,SITD FROM DESIG WHERE RECNUM"
Standard Port-Visit Cost Forecasting Model for U.S. Navy Husbanding Contracts
2009-12-01
Protocol (HTTP) server. 2. MySQL. An open-source database. 3. PHP. A common scripting language used for Web development. E. IMPLEMENTATION OF...Inc. (2009). MySQL Community Server (Version 5.1) [Software]. Available from http://dev.mysql.com/downloads/ The PHP Group (2009). PHP (Version...Logistics Services MySQL My Structured Query Language NAVSUP Navy Supply Systems Command NC Non-Contract Items NPS Naval Postgraduate
Information Collection using Handheld Devices in Unreliable Networking Environments
2014-06-01
different types of mobile devices that connect wirelessly to a database server. The actual backend database is not important to the mobile clients...Google's infrastructure and local servers with MySQL and PostgreSQL on the backend (ODK 2014b). (2) Google Fusion Tables are used to do basic link...how we conduct business. Our requirements to share information do not change simply because there is little or no existing infrastructure in our
MicroRNA Gene Regulatory Networks in Peripheral Nerve Sheath Tumors
2013-09-01
3.0 hierarchical clustering of both the X and the Y-axis using Centroid linkage. The resulting clustered matrices were visualized using Java Treeview...To score potential ceRNA interactions, the 54979 human interactions were loaded into a MySQL database and when the user selects a given mRNA all...on the fly using PHP interactions with MySQL in a similar fashion as previously described in our publicly available databases such as sarcoma
Astronomical Archive at Tartu Observatory
NASA Astrophysics Data System (ADS)
Annuk, K.
2007-10-01
Archiving astronomical data is an important task not only at large observatories but also at small ones. Here we describe the astronomical archive at Tartu Observatory. The archive consists of old photographic plate images, photographic spectrograms, CCD direct images and CCD spectroscopic data. The photographic plate digitizing project was started in 2005. An on-line database (based on MySQL) was created; it includes CCD data as well as photographic data. A PHP-MySQL interface was written to provide access to all data.
ERIC Educational Resources Information Center
Bosc, P.; Lietard, L.; Pivert, O.
2003-01-01
Considers flexible querying of relational databases. Highlights include SQL languages and basic aggregate operators; Sugeno's fuzzy integral; evaluation examples; and how and under what conditions other aggregate functions could be applied to fuzzy sets in a flexible query. (Author/LRW)
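To make the idea of fuzzy-set based flexible querying concrete, here is a small illustrative Python sketch, not taken from the article: rows fetched with ordinary SQL receive a membership degree per fuzzy predicate, and a t-norm (min) aggregates them into a degree of satisfaction. The membership functions and the data are invented.

```python
# Assumed trapezoidal membership for "young" (full below 25, zero above 40).
def young(age):
    if age <= 25: return 1.0
    if age >= 40: return 0.0
    return (40 - age) / 15.0

# Assumed ramp membership for "well paid" between 40k and 80k.
def well_paid(salary):
    if salary >= 80000: return 1.0
    if salary <= 40000: return 0.0
    return (salary - 40000) / 40000.0

rows = [("ann", 28, 70000), ("bob", 45, 90000), ("eve", 24, 42000)]

# Fuzzy conjunction via min, then rank rows by degree of satisfaction.
ranked = sorted(
    ((name, min(young(age), well_paid(sal))) for name, age, sal in rows),
    key=lambda t: t[1], reverse=True)
print(ranked)  # [('ann', 0.75), ('eve', 0.05), ('bob', 0.0)]
```

More sophisticated aggregates, such as the Sugeno integral discussed in the article, replace the simple min with a weighted combination of the sorted degrees.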
Dr. John H. Hopps Jr. Research Scholars Program
2014-10-20
Program staff, alumni and existing participants. Over the course of the last five months, SageFox has successfully obtained IRB approval for all...and awards. Progress made in development of the HoppsNet system included design and implementation of a relational database in MySQL, development of
NASA Technical Reports Server (NTRS)
Maluf, David A.; Tran, Peter B.
2003-01-01
An object-relational database management system is an integrated hybrid cooperative approach combining the best practices of both the relational model, utilizing SQL queries, and the object-oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information-on-demand database framework called NETMARK is introduced. NETMARK takes advantage of the Oracle 8i object-relational database, using physical-address data types for very efficient keyword search of records spanning both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to address the vast amounts of unstructured and semi-structured documents existing within NASA enterprises. Today, NETMARK is a flexible, high-throughput open database framework for managing, storing, and searching unstructured or semi-structured arbitrary hierarchical models, such as XML and HTML.
Protein Simulation Data in the Relational Model.
Simms, Andrew M; Daggett, Valerie
2012-10-01
High performance computing is leading to unprecedented volumes of data. Relational databases offer a robust and scalable model for storing and analyzing scientific data. However, these features do not come without a cost: significant design effort is required to build a functional and efficient repository. Modeling protein simulation data in a relational database presents several challenges: the data captured from individual simulations are large, multi-dimensional, and must integrate with both simulation software and external data sites. Here we present the dimensional design and relational implementation of a comprehensive data warehouse for storing and analyzing molecular dynamics simulations using SQL Server.
Protein Simulation Data in the Relational Model
Simms, Andrew M.; Daggett, Valerie
2011-01-01
High performance computing is leading to unprecedented volumes of data. Relational databases offer a robust and scalable model for storing and analyzing scientific data. However, these features do not come without a cost—significant design effort is required to build a functional and efficient repository. Modeling protein simulation data in a relational database presents several challenges: the data captured from individual simulations are large, multi-dimensional, and must integrate with both simulation software and external data sites. Here we present the dimensional design and relational implementation of a comprehensive data warehouse for storing and analyzing molecular dynamics simulations using SQL Server. PMID:23204646
"Science SQL" as a Building Block for Flexible, Standards-based Data Infrastructures
NASA Astrophysics Data System (ADS)
Baumann, Peter
2016-04-01
We have learnt to live with the pain of separating data and metadata into non-interoperable silos. For metadata, we enjoy the flexibility of databases, be they relational, graph, or some other NoSQL. Contrasting this, users still "drown in files" as an unstructured, low-level archiving paradigm. It is time to bridge this chasm, which once was technologically induced but today can be overcome. One building block towards a common re-integrated information space is support for massive multi-dimensional spatio-temporal arrays. These "datacubes" appear as sensor, image, simulation, and statistics data in all science and engineering domains, and beyond. For example, 2-D satellite imagery, 3-D x/y/t image timeseries and x/y/z geophysical voxel data, and 4-D x/y/z/t climate data contribute to today's data deluge in the Earth sciences. Virtual observatories in the Space sciences routinely generate Petabytes of such data. Life sciences deal with microarray data, confocal microscopy, and human brain data, which all fall into the same category. The ISO SQL/MDA (Multi-Dimensional Arrays) candidate standard extends SQL with modelling and query support for n-D arrays ("datacubes") in a flexible, domain-neutral way. This heralds a new generation of services with new quality parameters, such as flexibility, ease of access, embedding into well-known user tools, and scalability mechanisms that remain completely transparent to users. Technology like the EU rasdaman ("raster data manager") Array Database system can support all of the above examples simultaneously, with one technology. This is practically proven: as of today, rasdaman is in operational use on hundreds of Terabytes of satellite image timeseries datacubes, with transparent query distribution across more than 1,000 nodes. Therefore, Array Databases offering SQL/MDA constitute a natural common building block for next-generation data infrastructures. As initiator and editor of the standard, we present principles, implementation facets, and application examples as a basis for further discussion. Further, we highlight recent implementation progress in parallelization, data distribution, and query optimization, showing their effects on real-life use cases.
Nadkarni, P M
1997-08-01
Concept Locator (CL) is a client-server application that accesses a Sybase relational database server containing a subset of the UMLS Metathesaurus for the purpose of retrieval of concepts corresponding to one or more query expressions supplied to it. CL's query grammar permits complex Boolean expressions, wildcard patterns, and parenthesized (nested) subexpressions. CL translates the query expressions supplied to it into one or more SQL statements that actually perform the retrieval. The generated SQL is optimized by the client to take advantage of the strengths of the server's query optimizer, and sidesteps its weaknesses, so that execution is reasonably efficient.
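As a hedged illustration of the general technique (not CL's actual grammar or SQL generation), the Python sketch below translates a nested Boolean keyword expression with wildcards into SQL set operations; the terms table and its columns are hypothetical. Mapping AND to INTERSECT and OR to UNION lets the server's optimizer use a term index independently for each branch, which is one way to sidestep a weak optimizer, as the abstract describes.

```python
# Translate a nested Boolean expression into SQL with %s placeholders.
# An expression is either a keyword string (with optional * wildcards)
# or a tuple ("AND" | "OR", left_expr, right_expr); NOT is omitted here.
def to_sql(expr):
    if isinstance(expr, str):
        pattern = expr.replace("*", "%")  # query wildcard -> SQL LIKE wildcard
        return ("SELECT concept_id FROM terms WHERE term LIKE %s", [pattern])
    op, left, right = expr
    lsql, largs = to_sql(left)
    rsql, rargs = to_sql(right)
    setop = {"AND": "INTERSECT", "OR": "UNION"}[op]
    return (f"({lsql}) {setop} ({rsql})", largs + rargs)

sql, args = to_sql(("AND", "heart*", ("OR", "attack", "infarct*")))
print(sql)   # parenthesized INTERSECT/UNION of indexed term lookups
print(args)  # ['heart%', 'attack', 'infarct%']
```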
Astronomical Data Processing Using SciQL, an SQL Based Query Language for Array Data
NASA Astrophysics Data System (ADS)
Zhang, Y.; Scheers, B.; Kersten, M.; Ivanova, M.; Nes, N.
2012-09-01
SciQL (pronounced as ‘cycle’) is a novel SQL-based array query language for scientific applications, with both tables and arrays as first-class citizens. SciQL lowers the entrance fee for adopting a relational DBMS (RDBMS) in scientific domains, because it includes functionality often found only in mathematics software packages. In this paper, we demonstrate the usefulness of SciQL for astronomical data processing using examples from the Transient Key Project of the LOFAR radio telescope; in particular, how the LOFAR light-curve database of all detected sources can be constructed by correlating sources across the spatial, frequency, time and polarisation domains.
Teaching Structured Design of Network Algorithms in Enhanced Versions of SQL
ERIC Educational Resources Information Center
de Brock, Bert
2004-01-01
From time to time developers of (database) applications will encounter, explicitly or implicitly, structures such as trees, graphs, and networks. Such applications can, for instance, relate to bills of material, organization charts, networks of (rail)roads, networks of conduit pipes (e.g., plumbing, electricity), telecom networks, and data…
SQLGEN: a framework for rapid client-server database application development.
Nadkarni, P M; Cheung, K H
1995-12-01
SQLGEN is a framework for rapid client-server relational database application development. It relies on an active data dictionary on the client machine that stores metadata on one or more database servers to which the client may be connected. The dictionary generates dynamic Structured Query Language (SQL) to perform common database operations; it also stores information about the access rights of the user at log-in time, which is used to partially self-configure the behavior of the client to disable inappropriate user actions. SQLGEN uses a microcomputer database as the client to store metadata in relational form, to transiently capture server data in tables, and to allow rapid application prototyping followed by porting to client-server mode with modest effort. SQLGEN is currently used in several production biomedical databases.
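In the spirit of SQLGEN's dictionary-driven dynamic SQL, a minimal Python sketch follows; the dictionary layout, table names and metadata are invented for illustration and are not SQLGEN's actual design.

```python
# A toy "active data dictionary": per-table metadata drives SQL generation,
# so application code never hand-writes statements for common operations.
DICTIONARY = {
    "patient": {"pk": "patient_id", "columns": ["patient_id", "name", "dob"]},
}

def make_select(table, where=None):
    """Generate a parameterized SELECT for a dictionary-described table."""
    meta = DICTIONARY[table]
    sql = f"SELECT {', '.join(meta['columns'])} FROM {table}"
    args = []
    if where:
        clauses = []
        for col, val in where.items():
            clauses.append(f"{col} = %s")
            args.append(val)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, args

def make_update(table, pk_value, changes):
    """Generate a parameterized UPDATE keyed on the dictionary's primary key."""
    meta = DICTIONARY[table]
    sets = ", ".join(f"{c} = %s" for c in changes)
    sql = f"UPDATE {table} SET {sets} WHERE {meta['pk']} = %s"
    return sql, list(changes.values()) + [pk_value]

print(make_select("patient", {"name": "Smith"}))
print(make_update("patient", 42, {"dob": "1970-01-01"}))
```

Access rights fetched at log-in time, as the abstract describes, would simply prune which tables and operations the dictionary exposes to a given client.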
Specialized microbial databases for inductive exploration of microbial genome sequences
Fang, Gang; Ho, Christine; Qiu, Yaowu; Cubas, Virginie; Yu, Zhou; Cabau, Cédric; Cheung, Frankie; Moszer, Ivan; Danchin, Antoine
2005-01-01
Background The enormous amount of genome sequence data calls for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subqueries, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore, a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya) has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequence data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organizes data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tengcongensis; LeptoList, with two different genomes of Leptospira interrogans; and SepiList, Staphylococcus epidermidis) associated with related organisms for comparison. PMID:15698474
Database Reports Over the Internet
NASA Technical Reports Server (NTRS)
Smith, Dean Lance
2002-01-01
Most of the summer was spent developing software that would permit existing test report forms to be printed over the web on a printer that is supported by Adobe Acrobat Reader. The data is stored in a DBMS (Data Base Management System). The client asks for the information from the database using an HTML (Hyper Text Markup Language) form in a web browser. JavaScript is used with the forms to assist the user and verify the integrity of the entered data. Queries to a database are made in SQL (Structured Query Language), a widely supported standard for making queries to databases. Java servlets, programs written in the Java programming language running under the control of network server software, interrogate the database and complete a PDF form template kept in a file. The completed report is sent to the browser requesting the report. Some errors are sent to the browser in an HTML web page, others are reported to the server. Access to the databases was restricted since the data are being transported to new DBMS software that will run on new hardware. However, the SQL queries were made to Microsoft Access, a DBMS that is available on most PCs (Personal Computers). Access does support the SQL commands that were used, and a database was created with Access that contained typical data for the report forms. Some of the problems and features are discussed below.
CHASM and SNVBox: toolkit for detecting biologically important single nucleotide mutations in cancer
Carter, Hannah; Diekhans, Mark; Ryan, Michael C.; Karchin, Rachel
2011-01-01
Summary: Thousands of cancer exomes are currently being sequenced, yielding millions of non-synonymous single nucleotide variants (SNVs) of possible relevance to disease etiology. Here, we provide a software toolkit to prioritize SNVs based on their predicted contribution to tumorigenesis. It includes a database of precomputed, predictive features covering all positions in the annotated human exome and can be used either stand-alone or as part of a larger variant discovery pipeline. Availability and Implementation: MySQL database, source code and binaries are freely available for academic/government use at http://wiki.chasmsoftware.org. Source in Python and C++. Requires a 32- or 64-bit Linux system (tested on Fedora Core 8, 10, 11 and Ubuntu 10), 2.5 ≤ Python < 3.0, MySQL server > 5.0, 60 GB available hard disk space (50 MB for software and data files, 40 GB for the MySQL database dump when uncompressed), and 2 GB of RAM. Contact: karchin@jhu.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:21685053
Datacube Services in Action, Using Open Source and Open Standards
NASA Astrophysics Data System (ADS)
Baumann, P.; Misev, D.
2016-12-01
Array Databases comprise novel, promising technology for massive spatio-temporal datacubes, extending the SQL paradigm of "any query, anytime" to n-D arrays. On the server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. The rasdaman ("raster data manager") system, which has pioneered Array Databases, is available in open source at www.rasdaman.org. Its declarative query language extends SQL with array operators which are optimized and parallelized on the server side. The rasdaman engine, which is part of OSGeo Live, is mature and in operational use on databases individually holding dozens of Terabytes. Further, the rasdaman concepts have strongly impacted international Big Data standards in the field, including the forthcoming MDA ("Multi-Dimensional Array") extension to ISO SQL, the OGC Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS) standards, and the forthcoming INSPIRE WCS/WCPS; rasdaman is the WCS Core Reference Implementation in OGC. In our talk we present concepts, architecture, operational services, and standardization impact of open-source rasdaman, as well as experiences made.
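For a sense of what such datacube queries look like in practice, the sketch below sends a WCPS query to a rasdaman/petascope OGC endpoint over HTTP. The server URL is a placeholder, the coverage name and time axis follow the pattern of rasdaman's public demo datasets, and the exact request parameters may differ between deployments.

```python
# Hedged sketch: average one year of a hypothetical temperature datacube
# server-side and fetch only the scalar result, encoded as CSV.
import requests

wcps = ('for c in (AvgLandTemp) '
        'return encode(avg(c[ansi("2014-01":"2014-12")]), "csv")')

resp = requests.get(
    "https://example.org/rasdaman/ows",           # placeholder endpoint
    params={"service": "WCS", "version": "2.0.1",
            "request": "ProcessCoverages", "query": wcps})
print(resp.text)  # the aggregation runs in the array engine, not the client
```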
SiC: An Agent Based Architecture for Preventing and Detecting Attacks to Ubiquitous Databases
NASA Astrophysics Data System (ADS)
Pinzón, Cristian; de Paz, Yanira; Bajo, Javier; Abraham, Ajith; Corchado, Juan M.
One of the main attacks on ubiquitous databases is the Structured Query Language (SQL) injection attack, which causes severe damage both in the commercial aspect and in the user’s confidence. This chapter proposes the SiC architecture as a solution to the SQL injection attack problem. This is a hierarchical distributed multiagent architecture, which involves an entirely new approach with respect to existing architectures for the prevention and detection of SQL injections. SiC incorporates a kind of intelligent agent, which integrates a case-based reasoning system. This agent, which is the core of the architecture, allows the application of detection techniques based on anomalies as well as those based on patterns, providing a great degree of autonomy, flexibility, robustness and dynamic scalability. The characteristics of the multiagent system allow an architecture to detect attacks from different types of devices, regardless of the physical location. The architecture has been tested on a medical database, guaranteeing safe access from various devices such as PDAs and notebook computers.
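As background to the attack the chapter addresses, the following self-contained Python example (using SQLite for brevity, not the chapter's medical database or detection architecture) shows a classic injection payload succeeding against string-concatenated SQL and failing against a parameterized query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "' OR '1'='1"  # classic injection payload

# Vulnerable: concatenation lets the payload rewrite the WHERE clause,
# so the query matches every row regardless of the name supplied.
unsafe = f"SELECT role FROM users WHERE name = '{user_input}'"
print(conn.execute(unsafe).fetchall())          # [('admin',)] -- leaked

# Safe: the placeholder keeps the payload as data, never as SQL syntax.
safe = "SELECT role FROM users WHERE name = ?"
print(conn.execute(safe, (user_input,)).fetchall())  # [] -- no match
```

Detection architectures such as SiC operate a layer above this, classifying incoming query patterns as anomalous or matching known attack signatures.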
Implementation of a Computerized Maintenance Management System
NASA Technical Reports Server (NTRS)
Shen, Yong-Hong; Askari, Bruce
1994-01-01
A premier Computerized Maintenance Management System (CMMS) has been established for the NASA Ames pressure component certification program. The CMMS takes full advantage of the latest computer technology and an SQL relational database to perform periodic services for vital pressure components. The Ames certification program is briefly described, and aspects of the CMMS implementation are discussed as they relate to the certification objectives.
GEOMAGIA50: An archeointensity database with PHP and MySQL
NASA Astrophysics Data System (ADS)
Korhonen, K.; Donadini, F.; Riisager, P.; Pesonen, L. J.
2008-04-01
The GEOMAGIA50 database stores 3798 archeomagnetic and paleomagnetic intensity determinations dated to the past 50,000 years. It also stores details of the measurement setup for each determination, which are used for ranking the data according to prescribed reliability criteria. The ranking system aims to alleviate the data reliability problem inherent in this kind of data. GEOMAGIA50 is based on two popular open source technologies. The MySQL database management system is used for storing the data, whereas the functionality and user interface are provided by server-side PHP scripts. This technical brief gives a detailed description of GEOMAGIA50 from a technical viewpoint.
Monitoring SLAC High Performance UNIX Computing Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lettsome, Annette K.; /Bethune-Cookman Coll. /SLAC
2005-12-15
Knowledge of the effectiveness and efficiency of computers is important when working with high performance systems. The monitoring of such systems is advantageous in order to foresee possible misfortunes or system failures. Ganglia is a software system designed for high performance computing systems to retrieve specific monitoring information. An alternative storage facility for Ganglia's collected data is needed since its default storage system, the round-robin database (RRD), struggles with data integrity. The creation of a script-driven MySQL database solves this dilemma. This paper describes the process taken in the creation and implementation of the MySQL database for use by Ganglia. Comparisons between data storage by both databases are made using gnuplot and Ganglia's real-time graphical user interface.
NASA Technical Reports Server (NTRS)
Maluf, David A.; Tran, Peter B.
2003-01-01
An object-relational database management system is an integrated hybrid cooperative approach combining the best practices of both the relational model, utilizing SQL queries, and the object-oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information-on-demand database framework called NETMARK is introduced. NETMARK takes advantage of the Oracle 8i object-relational database, using physical-address data types for very efficient keyword search of records spanning both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to address the vast amounts of unstructured and semi-structured documents existing within NASA enterprises. Today, NETMARK is a flexible, high-throughput open database framework for managing, storing, and searching unstructured or semi-structured arbitrary hierarchical models, such as XML and HTML.
An Extensible Schema-less Database Framework for Managing High-throughput Semi-Structured Documents
NASA Technical Reports Server (NTRS)
Maluf, David A.; Tran, Peter B.; La, Tracy; Clancy, Daniel (Technical Monitor)
2002-01-01
An object-relational database management system is an integrated hybrid cooperative approach combining the best practices of both the relational model, utilizing SQL queries, and the object-oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information-on-demand database framework called NETMARK is introduced. NETMARK takes advantage of the Oracle 8i object-relational database, using physical-address data types for very efficient keyword searches of records for both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to address the vast amounts of unstructured and semi-structured documents existing within NASA enterprises. Today, NETMARK is a flexible, high-throughput open database framework for managing, storing, and searching unstructured or semi-structured arbitrary hierarchical models, such as XML and HTML.
Migration of legacy mumps applications to relational database servers.
O'Kane, K C
2001-07-01
An extended implementation of the Mumps language is described that facilitates vendor-neutral migration of legacy Mumps applications to SQL-based relational database servers. Implemented as a compiler, this system translates Mumps programs to operating-system-independent, standard C code for subsequent compilation to fully stand-alone, binary executables. Added built-in functions and support modules extend the native hierarchical Mumps database with access to industry-standard, networked, relational database management servers (RDBMS), thus freeing Mumps applications from dependence upon vendor-specific, proprietary, unstandardized database models. Unlike Mumps systems that have added captive, proprietary RDBMS access, the programs generated by this development environment can be used with any RDBMS system that supports common network access protocols. Additional features include a built-in web server interface and the ability to interoperate directly with programs and functions written in other languages.
Photo-z-SQL: Integrated, flexible photometric redshift computation in a database
NASA Astrophysics Data System (ADS)
Beck, R.; Dobos, L.; Budavári, T.; Szalay, A. S.; Csabai, I.
2017-04-01
We present a flexible template-based photometric redshift estimation framework, implemented in C#, that can be seamlessly integrated into a SQL database (or DB) server and executed on-demand in SQL. The DB integration eliminates the need to move large photometric datasets outside a database for redshift estimation, and utilizes the computational capabilities of DB hardware. The code is able to perform both maximum likelihood and Bayesian estimation, and can handle inputs of variable photometric filter sets and corresponding broad-band magnitudes. It is possible to take into account the full covariance matrix between filters, and filter zero points can be empirically calibrated using measurements with given redshifts. The list of spectral templates and the prior can be specified flexibly, and the expensive synthetic magnitude computations are done via lazy evaluation, coupled with a caching of results. Parallel execution is fully supported. For large upcoming photometric surveys such as the LSST, the ability to perform in-place photo-z calculation would be a significant advantage. Also, the efficient handling of variable filter sets is a necessity for heterogeneous databases, for example the Hubble Source Catalog, and for cross-match services such as SkyQuery. We illustrate the performance of our code on two reference photo-z estimation testing datasets, and provide an analysis of execution time and scalability with respect to different configurations. The code is available for download at https://github.com/beckrob/Photo-z-SQL.
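The lazy evaluation plus caching of expensive synthetic magnitudes can be sketched in a few lines; the Python illustration below is only an analogy to the C# in-database implementation described above, with a stand-in for the real synthetic photometry (redshifting a template SED and integrating it through a filter curve), and the template/filter names are invented.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # memoize: each (template, filter, z) computed once
def synthetic_magnitude(template_id, filter_id, z):
    print(f"computing {template_id}/{filter_id} at z={z}")  # shows laziness
    return 20.0 + 0.5 * z  # placeholder for the expensive integration

synthetic_magnitude("Ell1", "SDSS_r", 0.35)  # computed on first demand
synthetic_magnitude("Ell1", "SDSS_r", 0.35)  # served from the cache
```

Because nothing is computed until a fit actually requests that grid point, large template sets and heterogeneous filter sets stay cheap: only the combinations that real queries touch are ever evaluated.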
Astronomical databases of Nikolaev Observatory
NASA Astrophysics Data System (ADS)
Protsyuk, Y.; Mazhaev, A.
2008-07-01
Several astronomical databases were created at Nikolaev Observatory during the last years. The databases are built using the MySQL engine and PHP scripts. They are available on the NAO web-site http://www.mao.nikolaev.ua.
Creating databases for biological information: an introduction.
Stein, Lincoln
2013-06-01
The essence of bioinformatics is dealing with large quantities of information. Whether it be sequencing data, microarray data files, mass spectrometric data (e.g., fingerprints), the catalog of strains arising from an insertional mutagenesis project, or even large numbers of PDF files, there inevitably comes a time when the information can simply no longer be managed with files and directories. This is where databases come into play. This unit briefly reviews the characteristics of several database management systems, including flat file, indexed file, relational databases, and NoSQL databases. It compares their strengths and weaknesses and offers some general guidelines for selecting an appropriate database management system. Copyright 2013 by John Wiley & Sons, Inc.
NoSQL data model for semi-automatic integration of ethnomedicinal plant data from multiple sources.
Ningthoujam, Sanjoy Singh; Choudhury, Manabendra Dutta; Potsangbam, Kumar Singh; Chetia, Pankaj; Nahar, Lutfun; Sarker, Satyajit D; Basar, Norazah; Das Talukdar, Anupam
2014-01-01
Sharing traditional knowledge with the scientific community could refine scientific approaches to phytochemical investigation and conservation of ethnomedicinal plants. As such, integration of traditional knowledge with scientific data using a single platform for sharing is greatly needed. However, ethnomedicinal data are available in heterogeneous formats, which depend on cultural aspects, survey methodology and the focus of the study. Phytochemical and bioassay data are also available from many open sources in various standards and customised formats. The aim was to design a flexible data model that could integrate both primary and curated ethnomedicinal plant data from multiple sources. The current model is based on MongoDB, one of the Not only Structured Query Language (NoSQL) databases. Although it does not enforce a schema, the model was adapted so that it could incorporate both standard and customised ethnomedicinal plant data formats from different sources. The model presented can integrate both primary and secondary data related to ethnomedicinal plants. Accommodation of disparate data was accomplished by a feature of this database that supports a different set of fields for each document. It also allows storage of similar data having different properties. The model presented is scalable to a highly complex level with continuing maturation of the database, and is applicable for storing, retrieving and sharing ethnomedicinal plant data. It can also serve as a flexible alternative to a relational and normalised database. Copyright © 2014 John Wiley & Sons, Ltd.
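A minimal pymongo sketch of the flexibility the model relies on is shown below: two plant records with different field sets live in one collection and remain uniformly queryable. The collection and field names are illustrative, not the paper's actual schema.

```python
from pymongo import MongoClient

# Assumes a local MongoDB instance; database/collection names are invented.
plants = MongoClient()["ethnomed"]["plants"]

# First survey record: local names and preparation details.
plants.insert_one({
    "species": "Azadirachta indica",
    "local_names": ["neem"],
    "uses": [{"ailment": "skin infection", "preparation": "leaf paste"}],
})

# Second record carries entirely different fields; no schema change needed.
plants.insert_one({
    "species": "Ocimum tenuiflorum",
    "survey": {"district": "Cachar", "year": 2012},
    "phytochemicals": ["eugenol"],
})

# Documents of different shapes are still queried through one interface.
for doc in plants.find({"species": {"$regex": "^Ocimum"}}):
    print(doc)
```

In a normalized relational design, each optional field group would require extra tables or nullable columns; here heterogeneity is absorbed per document.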
Code of Federal Regulations, 2011 CFR
2011-10-01
... Microsoft Open Database Connectivity (ODBC) standard. ODBC is a Windows technology that allows a database software package to import data from a database created using a different software package. We currently...-compatible format. All databases must be supported with adequate documentation on data attributes, SQL...
New Directions in the NOAO Observing Proposal System
NASA Astrophysics Data System (ADS)
Gasson, David; Bell, Dave
For the past eight years NOAO has been refining its on-line observing proposal system. Virtually all related processes are now handled electronically. Members of the astronomical community can submit proposals through email, web form, or via the Gemini Phase I Tool. NOAO staff can use the system to do administrative tasks, scheduling, and compilation of various statistics. In addition, all information relevant to the TAC process is made available on-line, including the proposals themselves (in HTML, PDF and PostScript) and technical comments. Grades and TAC comments are entered and edited through web forms, and can be sorted and filtered according to specified criteria. Current developments include a move away from proprietary solutions, toward open standards such as SQL (in the form of the MySQL relational database system), Perl, PHP and XML.
Integrating Radar Image Data with Google Maps
NASA Technical Reports Server (NTRS)
Chapman, Bruce D.; Gibas, Sarah
2010-01-01
A public Web site has been developed as a method for displaying the multitude of radar imagery collected by NASA s Airborne Synthetic Aperture Radar (AIRSAR) instrument during its 16-year mission. Utilizing NASA s internal AIRSAR site, the new Web site features more sophisticated visualization tools that enable the general public to have access to these images. The site was originally maintained at NASA on six computers: one that held the Oracle database, two that took care of the software for the interactive map, and three that were for the Web site itself. Several tasks were involved in moving this complicated setup to just one computer. First, the AIRSAR database was migrated from Oracle to MySQL. Then the back-end of the AIRSAR Web site was updated in order to access the MySQL database. To do this, a few of the scripts needed to be modified; specifically three Perl scripts that query that database. The database connections were then updated from Oracle to MySQL, numerous syntax errors were corrected, and a query was implemented that replaced one of the stored Oracle procedures. Lastly, the interactive map was designed, implemented, and tested so that users could easily browse and access the radar imagery through the Google Maps interface.
2MASS Catalog Server Kit Version 2.1
NASA Astrophysics Data System (ADS)
Yamauchi, C.
2013-10-01
The 2MASS Catalog Server Kit is open source software for easily constructing a high performance search server for important astronomical catalogs. This software utilizes the open source RDBMS PostgreSQL, so any user can set up the database on a local computer by following the step-by-step installation guide. The kit provides highly optimized stored functions for positional searches, similar to SDSS SkyServer. Together with these, the powerful SQL environment of PostgreSQL will meet various users' demands. We released 2MASS Catalog Server Kit version 2.1 in 2012 May, which supports the latest WISE All-Sky catalog (563,921,584 rows) and 9 major all-sky catalogs. Local databases are often indispensable for observatories with unstable or narrow-band networks or severe use, such as retrieving large numbers of records within a small period of time. This software is best suited for such purposes, and the additional supported catalogs and improvements of version 2.1 cover a wider range of applications, including advanced calibration systems and scientific studies using complicated SQL queries. Official page: http://www.ir.isas.jaxa.jp/~cyamauch/2masskit/
NASA Astrophysics Data System (ADS)
Shao, Weber; Kupelian, Patrick A.; Wang, Jason; Low, Daniel A.; Ruan, Dan
2014-03-01
We devise a paradigm for representing DICOM-RT structure sets in a database management system, in such a way that secondary calculations of geometric information can be performed quickly from the existing contour definitions. The implementation of this paradigm is achieved using the PostgreSQL database system and the PostGIS extension, a geographic information system commonly used for encoding geographical map data. The proposed paradigm eliminates the overhead of retrieving large data records from the database, as well as the need to implement various numerical and data parsing routines, when additional information related to the geometry of the anatomy is desired.
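As a hypothetical illustration of the paradigm (the table and column names are assumptions, not the authors' schema), contours stored as PostGIS geometries can yield secondary geometric quantities, such as per-slice contour area, directly in SQL, with no need to fetch and parse the raw DICOM-RT records client-side:

```python
import psycopg2

# Placeholder connection and schema: structure_contours(roi_name, slice_z, contour),
# where contour is a PostGIS POLYGON in patient coordinates (mm).
conn = psycopg2.connect(dbname="rtplans")
with conn.cursor() as cur:
    cur.execute("""
        SELECT roi_name, slice_z, ST_Area(contour) AS area_mm2
        FROM structure_contours
        WHERE roi_name = %s
        ORDER BY slice_z
    """, ("PTV",))
    for roi, z, area in cur.fetchall():
        print(roi, z, area)  # per-slice area computed inside the database
```

The same pattern extends to intersections, unions and distance computations between structures via the corresponding PostGIS functions (ST_Intersection, ST_Union, ST_Distance).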
ExplorEnz: the primary source of the IUBMB enzyme list
McDonald, Andrew G.; Boyce, Sinéad; Tipton, Keith F.
2009-01-01
ExplorEnz is the MySQL database that is used for the curation and dissemination of the International Union of Biochemistry and Molecular Biology (IUBMB) Enzyme Nomenclature. A simple web-based query interface is provided, along with an advanced search engine for more complex Boolean queries. The WWW front-end is accessible at http://www.enzyme-database.org, from where downloads of the database as SQL and XML are also available. An associated form-based curatorial application has been developed to facilitate the curation of enzyme data as well as the internal and public review processes that occur before an enzyme entry is made official. Suggestions for new enzyme entries, or modifications to existing ones, can be made using the forms provided at http://www.enzyme-database.org/forms.php. PMID:18776214
Using SQL Databases for Sequence Similarity Searching and Analysis.
Pearson, William R; Mackey, Aaron J
2017-09-13
Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
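The kind of cross-kingdom summary the unit describes might look like the sketch below; the hits/proteins/taxonomy tables and their columns are illustrative, not the actual search_demo schema, and PostgreSQL is used here although the unit's own protocols may target a different server.

```python
import psycopg2

# Placeholder database: similarity-search hits joined to a taxonomy table
# to count significant homologs of one query protein per superkingdom.
conn = psycopg2.connect(dbname="search_demo")
with conn.cursor() as cur:
    cur.execute("""
        SELECT t.superkingdom, COUNT(DISTINCT h.subject_acc) AS n_homologs
        FROM hits h
        JOIN proteins p ON p.acc   = h.subject_acc
        JOIN taxonomy t ON t.taxid = p.taxid
        WHERE h.query_acc = %s AND h.evalue < 1e-6
        GROUP BY t.superkingdom
    """, ("P0A7G6",))  # hypothetical query accession
    for kingdom, n in cur.fetchall():
        print(kingdom, n)
```

Once results live in tables, varying the E-value cutoff or restricting to a taxonomic subset is a one-line change to the WHERE clause rather than a re-run of the search.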
Monitoring of IaaS and scientific applications on the Cloud using the Elasticsearch ecosystem
NASA Astrophysics Data System (ADS)
Bagnasco, S.; Berzano, D.; Guarise, A.; Lusso, S.; Masera, M.; Vallero, S.
2015-05-01
The private Cloud at the Torino INFN computing centre offers IaaS services to different scientific computing applications. The infrastructure is managed with the OpenNebula cloud controller. The main stakeholders of the facility are a grid Tier-2 site for the ALICE collaboration at LHC, an interactive analysis facility for the same experiment and a grid Tier-2 site for the BES-III collaboration, plus an increasing number of other small tenants. Besides keeping track of the usage, the automation of dynamic allocation of resources to tenants requires detailed monitoring and accounting of resource usage. As a first investigation towards this, we set up a monitoring system to inspect the site activities both in terms of IaaS and of applications running on the hosted virtual instances. For this purpose we used the Elasticsearch, Logstash and Kibana stack. In the current implementation, the heterogeneous accounting information is fed to different MySQL databases and sent to Elasticsearch via a custom Logstash plugin. For the IaaS metering, we developed sensors for the OpenNebula API. The IaaS-level information gathered through the API is sent to the MySQL database through an ad-hoc developed RESTful web service, which is also used for other accounting purposes. Concerning the application level, we used the ROOT plugin TProofMonSenderSQL to collect accounting data from the interactive analysis facility. The BES-III virtual instances used to be monitored with Zabbix; as a proof of concept, we also retrieve the information contained in the Zabbix database. Each of these three cases is indexed separately in Elasticsearch. We are now starting to consider dismissing the intermediate level provided by the SQL database and evaluating a NoSQL option as a unique central database for all the monitoring information. We set up a set of Kibana dashboards with pre-defined queries in order to monitor the relevant information in each case. In this way we have achieved a uniform monitoring interface for both the IaaS and the scientific applications, mostly leveraging off-the-shelf tools.
Ferreira Junior, José Raniery; Oliveira, Marcelo Costa; de Azevedo-Marques, Paulo Mazzoncini
2016-12-01
Lung cancer is the leading cause of cancer-related deaths in the world, and its main manifestation is pulmonary nodules. Detection and classification of pulmonary nodules are challenging tasks that must be done by qualified specialists, but image interpretation errors make those tasks difficult. In order to aid radiologists in those hard tasks, it is important to integrate computer-based tools with the lesion detection, pathology diagnosis, and image interpretation processes. However, computer-aided diagnosis research faces the problem of not having enough shared medical reference data for the development, testing, and evaluation of computational methods for diagnosis. In order to minimize this problem, this paper presents a public nonrelational document-oriented cloud-based database of pulmonary nodules characterized by 3D texture attributes, identified by experienced radiologists and classified according to nine different subjective characteristics by the same specialists. Our goal with the development of this database is to improve computer-aided lung cancer diagnosis and pulmonary nodule detection and classification research through the deployment of this database in a cloud Database as a Service framework. Pulmonary nodule data were provided by the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI), image descriptors were acquired by volumetric texture analysis, and the database schema was developed using a document-oriented Not only Structured Query Language (NoSQL) approach. The proposed database now holds 379 exams, 838 nodules, and 8237 images, of which 4029 are CT scans and 4208 are manually segmented nodules, and it is allocated in a MongoDB instance on a cloud infrastructure.
Toward Phase IV, Populating the WOVOdat Database
NASA Astrophysics Data System (ADS)
Ratdomopurbo, A.; Newhall, C. G.; Schwandner, F. M.; Selva, J.; Ueda, H.
2009-12-01
One of the challenges for volcanologists is the fact that more and more people are likely to live on volcanic slopes. Information about volcanic activity during unrest should be accurate and rapidly distributed. As unrest may lead to eruption, evacuation may be necessary to minimize damage and casualties. The decision to evacuate people is usually based on the interpretation of monitoring data. Over the past several decades, monitoring volcanoes has used more and more sophisticated instruments. A huge volume of data is collected in order to understand the state of activity and behaviour of a volcano. WOVOdat, the World Organization of Volcano Observatories (WOVO) Database of Volcanic Unrest, will provide context within which scientists can interpret the state of their own volcano, during and between crises. After a decision during the 2000 IAVCEI General Assembly to create WOVOdat, development has passed through several phases: Concept Development (Phase-I, 2000-2002), Database Design (Phase-II, 2003-2006) and Pilot Testing (Phase-III, 2007-2008). For WOVOdat to be operational, two steps remain: Database Population (Phase-IV) and Enhancement and Maintenance (Phase-V). Since January 2009, the WOVOdat project has been hosted by the Earth Observatory of Singapore for at least a 5-year period. According to the original planning in 2002, this 5-year period will be used for completing Phase-IV. As the WOVOdat design is not yet tested for all types of data, 2009 is still reserved for building the back-end relational database management system (RDBMS) of WOVOdat and testing it with more complex data. Fine-tuning of WOVOdat's RDBMS design is being done with each new upload of observatory data. The next and main phase of WOVOdat development will be data population, managing data transfer from multiple observatory formats to the WOVOdat format. Data population will depend on two important things: the availability of SQL databases at volcano observatories and their data-sharing policies. Hence, a strong collaboration with every WOVO observatory is important. For volcanoes where the data are not in an SQL system, the WOVOdat project will help scientists working on the volcano to start building an SQL database.
ERIC Educational Resources Information Center
Mills, Robert J.; Dupin-Bryant, Pamela A.; Johnson, John D.; Beaulieu, Tanya Y.
2015-01-01
The demand for Information Systems (IS) graduates with expertise in Structured Query Language (SQL) and database management is vast and projected to increase as "big data" becomes ubiquitous. To prepare students to solve complex problems in a data-driven world, educators must explore instructional strategies to help link prior knowledge…
Efficient data management tools for the heterogeneous big data warehouse
NASA Astrophysics Data System (ADS)
Alekseev, A. A.; Osipova, V. V.; Ivanov, M. A.; Klimentov, A.; Grigorieva, N. V.; Nalamwar, H. S.
2016-09-01
The traditional RDBMS is well suited to normalized data structures and has served well for decades, but the technology is not optimal for data processing and analysis in data-intensive fields like social networks, the oil and gas industry, experiments at the Large Hadron Collider, etc. Several challenges have been raised recently regarding the scalability of data-warehouse-like workloads against the transactional schema, in particular for the analysis of archived data or the aggregation of data for summary and accounting purposes. The paper evaluates new database technologies like HBase, Cassandra, and MongoDB, commonly referred to as NoSQL databases, for handling messy, varied and large amounts of data. The evaluation considers the performance, throughput and scalability of the above technologies for several scientific and industrial use-cases. This paper outlines the technologies and architectures needed for processing Big Data, as well as the back-end application that implements data migration from an RDBMS to a NoSQL data warehouse, the NoSQL database organization, and how it could be useful for further data analytics.
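A minimal sketch of such a migration back-end follows, assuming a MySQL source table and a MongoDB target; the connection parameters, table and collection names are placeholders, not the paper's implementation.

```python
import pymysql
from pymongo import MongoClient

# Placeholder source (MySQL) and target (MongoDB) connections.
src = pymysql.connect(host="localhost", user="etl", password="secret",
                      database="warehouse",
                      cursorclass=pymysql.cursors.DictCursor)
dst = MongoClient()["warehouse"]["measurements"]

with src.cursor() as cur:
    cur.execute("SELECT * FROM measurements")
    while True:
        batch = cur.fetchmany(1000)   # stream in batches, not row by row
        if not batch:
            break
        dst.insert_many(batch)        # each row dict becomes one document
```

A real migration would additionally denormalize joined tables into nested documents, which is where most of the schema-design effort in an RDBMS-to-NoSQL move goes.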
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christopher Slominski
2009-10-01
Archiving a large fraction of the EPICS signals within the Jefferson Lab (JLAB) Accelerator control system is vital for postmortem and real-time analysis of the accelerator performance. This analysis is performed on a daily basis by scientists, operators, engineers, technicians, and software developers. Archiving poses unique challenges due to the magnitude of the control system. A MySQL Archiving system (Mya) was developed to scale to the needs of the control system; it currently archives 58,000 EPICS variables, updating at a rate of 11,000 events per second. In addition to the large collection rate, retrieval of the archived data must also be fast and robust. Archived data retrieval clients obtain data at a rate over 100,000 data points per second. Managing the data in a relational database provides a number of benefits. This paper describes an archiving solution that uses an open source database and standard off-the-shelf hardware to reach high performance archiving needs. Mya has been in production at Jefferson Lab since February of 2007.
2015-09-01
Detectability (Figure 20. Excel VBA Codes for Checker)...National Vulnerability Database OS Operating System SQL Structured Query Language VC Verification Condition VBA Visual Basic for Applications...checks each of these assertions for detectability by Daikon. The checker is an Excel Visual Basic for Applications (VBA) script that checks the
NASA Technical Reports Server (NTRS)
Baldwin, John; Zendejas, Silvino; Gutheinz, Sandy; Borden, Chester; Wang, Yeou-Fang
2009-01-01
Mission and Assets Database (MADB) Version 1.0 is an SQL database system with a Web user interface to centralize information. The database stores flight project support resource requirements, view periods, antenna information, schedule, and forecast results for use in mid-range and long-term planning of Deep Space Network (DSN) assets.
Development of Human Face Literature Database Using Text Mining Approach: Phase I.
Kaur, Paramjit; Krishan, Kewal; Sharma, Suresh K
2018-06-01
The face is an important part of the human body by which an individual communicates in society. Its importance is highlighted by the fact that a person deprived of a face cannot sustain themselves in the living world. The number of experiments being performed and the number of research papers being published in the domain of the human face have surged in the past few decades. Several scientific disciplines conduct research on the human face, including Medical Science, Anthropology, Information Technology (Biometrics, Robotics, Artificial Intelligence, etc.), Psychology, Forensic Science and Neuroscience. This highlights the need to collect and manage data concerning the human face so that public, free access to it can be provided to the scientific community. This can be attained by developing databases and tools on the human face using a bioinformatics approach. The current research emphasizes creating a database of the literature on the human face. The database can be accessed on the basis of specific keywords, journal name, date of publication, author's name, etc. The collected research papers are stored in the form of a database. Hence, the database will be beneficial to the research community, as comprehensive information dedicated to the human face can be found in one place. Information related to facial morphologic features, facial disorders, facial asymmetry, facial abnormalities, and many other parameters can be extracted from this database. The front end has been developed using Hypertext Markup Language and Cascading Style Sheets, with JavaScript as the scripting language. The back end has been developed using the hypertext preprocessor (PHP). MySQL is used for database development, as it is the most widely used relational database management system. The XAMPP (cross-platform, Apache, MySQL, PHP, Perl) open source web application software has been used as the server. The database is still under development; the current paper discusses the initial steps of its creation and the work done to date.
[Establishment of a comprehensive database for laryngeal cancer related genes and the miRNAs].
Li, Mengjiao; E, Qimin; Liu, Jialin; Huang, Tingting; Liang, Chuanyu
2015-09-01
By collecting and analyzing laryngeal cancer related genes and miRNAs, we built a comprehensive laryngeal cancer-related gene database which, unlike current biological information databases with complex and clumsy structures, focuses on the theme of genes and miRNAs, making research and teaching more convenient and efficient. Based on the B/S architecture, using Apache as the web server, MySQL for the database design and PHP for the web design, a comprehensive database for laryngeal cancer-related genes was established, providing gene tables, protein tables, miRNA tables and clinical information tables for patients with laryngeal cancer. The established database contains 207 laryngeal cancer related genes, 243 proteins, 26 miRNAs, and their particular information such as mutations, methylations, differential expression, and the empirical references of laryngeal cancer relevant molecules. The database can be accessed and operated via the Internet, through which browsing and retrieval of the information are performed. The database is maintained and updated regularly. The database for laryngeal cancer related genes is resource-integrated and user-friendly, providing a genetic information query tool for the study of laryngeal cancer.
The Ruby UCSC API: accessing the UCSC genome database using Ruby.
Mishima, Hiroyuki; Aerts, Jan; Katayama, Toshiaki; Bonnal, Raoul J P; Yoshiura, Koh-ichiro
2012-09-21
The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. A simple application programming interface (API) in a scripting language aimed at the biologist was, however, not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby. The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database, including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast. The API uses the bin index (if available) when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby). Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will help biologists query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help is provided via the website at http://rubyucscapi.userecho.com/.
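For context, this is roughly the hand-written SQL against UCSC's public MySQL server that the API's interval queries replace; the refGene table and its bin/txStart/txEnd columns are as published by UCSC, while the bin values shown are illustrative rather than computed:

  -- Interval query on hg19 refGene using the UCSC bin index
  SELECT name, name2, chrom, txStart, txEnd
  FROM refGene
  WHERE chrom = 'chr17'
    AND bin IN (585, 73, 9, 1, 0)  -- bins covering the range (illustrative values)
    AND txStart < 41277500
    AND txEnd   > 41196312;        -- BRCA1 region coordinates on hg19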
NASA Astrophysics Data System (ADS)
Maffei, A. R.; Chandler, C. L.; Work, T.; Allen, J.; Groman, R. C.; Fox, P. A.
2009-12-01
Content Management Systems (CMSs) provide powerful features that can be of use to oceanographic (and other geo-science) data managers. However, in many instances, geo-science data management offices have previously designed customized schemas for their metadata. The WHOI Ocean Informatics initiative and the NSF-funded Biological and Chemical Oceanography Data Management Office (BCO-DMO) have jointly sponsored a project to port an existing relational database containing oceanographic metadata, along with an existing interface coded in Cold Fusion middleware, to a Drupal6 Content Management System. The goal was to translate all the existing database tables, input forms, website reports, and other features present in the existing system to employ Drupal CMS features. The replacement features include Drupal content types, CCK node-reference fields, themes, RDB, SPARQL, workflow, and a number of other supporting modules. Strategic use of some Drupal6 CMS features enables three separate but complementary interfaces that provide access to oceanographic research metadata via the MySQL database: 1) a Drupal6-powered front-end; 2) a standard SQL port (used to provide a Mapserver interface to the metadata and data); and 3) a SPARQL port (feeding a new faceted search capability being developed). Future plans include the creation of science ontologies, by scientist/technologist teams, that will drive semantically-enabled faceted search capabilities planned for the site. Incorporation of semantic technologies included in the future Drupal 7 core release is also anticipated. Using a public domain CMS as opposed to proprietary middleware, and taking advantage of the many features of Drupal 6 that are designed to support semantically-enabled interfaces, will help prepare the BCO-DMO database for interoperability with other ecosystem databases.
A Quality-Control-Oriented Database for a Mesoscale Meteorological Observation Network
NASA Astrophysics Data System (ADS)
Lussana, C.; Ranci, M.; Uboldi, F.
2012-04-01
In the operational context of a local weather service, data accessibility and quality related issues must be managed by taking into account a wide set of user needs. This work describes the structure and the choices made for the operational implementation of a database system storing data from highly automated observing stations, metadata, and information on data quality. Lombardy's environmental protection agency, ARPA Lombardia, manages a highly automated mesoscale meteorological network. A Quality Assurance System (QAS) ensures that reliable observational information is collected and disseminated to the users. The weather unit in ARPA Lombardia, at the same time an important QAS component and an intensive data user, has developed a database specifically aimed at: 1) providing quick access to data for operational activities and 2) ensuring data quality for real-time applications, by means of an Automatic Data Quality Control (ADQC) procedure. Quantities stored in the archive include hourly aggregated observations of precipitation amount, temperature, wind, relative humidity, pressure, and global and net solar radiation. The ADQC performs several independent tests on raw data and compares their results in a decision-making procedure. An important ADQC component is the Spatial Consistency Test based on Optimal Interpolation. Interpolated and cross-validation analysis values are also stored in the database, providing further information to human operators and useful estimates in case of missing data. The technical solution adopted is based on a LAMP (Linux, Apache, MySQL and PHP) system, constituting an open source environment suitable for both development and operational practice. The ADQC procedure itself is performed by R scripts directly interacting with the MySQL database. Users and network managers can access the database by using a set of web-based PHP applications.
Extending SQL to Support Privacy Policies
NASA Astrophysics Data System (ADS)
Ghazinour, Kambiz; Pun, Sampson; Majedi, Maryam; Chinaci, Amir H.; Barker, Ken
Increasing concerns over Internet applications that violate user privacy by exploiting (back-end) database vulnerabilities must be addressed to protect both customer privacy and to ensure corporate strategic assets remain trustworthy. This chapter describes an extension onto database catalogues and Structured Query Language (SQL) for supporting privacy in Internet applications, such as in social networks, e-health, e-government, etc. The idea is to introduce new predicates to SQL commands to capture common privacy requirements, such as purpose, visibility, generalization, and retention for both mandatory and discretionary access control policies. The contribution is that corporations, when creating the underlying databases, will be able to define what their mandatory privacy policies are with which all application users have to comply. Furthermore, each application user, when providing their own data, will be able to define their own privacy policies with which other users have to comply. The extension is supported with underlying catalogues and algorithms. The experiments demonstrate a very reasonable overhead for the extension. The result is a low-cost mechanism to create new systems that are privacy aware and also to transform legacy databases to their privacy-preserving equivalents. Although the examples are from social networks, one can apply the results to data security and user privacy of other enterprises as well.
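A sketch of what such privacy-extended DDL might look like; the predicate syntax below is reconstructed from the requirements the chapter names (purpose, visibility, retention) and is illustrative only, not the authors' actual grammar:

  -- Illustrative privacy-extended SQL (hypothetical syntax)
  CREATE TABLE patient_contact (
      patient_id INT PRIMARY KEY,
      email      VARCHAR(255)
          PURPOSE    ('appointment-reminders')  -- why the value may be used
          VISIBLE TO ('treating-physician')     -- who may see it
          RETENTION  (INTERVAL '2' YEAR)        -- how long it may be kept
  );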
DOE Office of Scientific and Technical Information (OSTI.GOV)
This software is a plug-in that interfaces between Phoenix Integration's Model Center and Base SAS 9.2 applications. The end use of the plug-in is to link input and output data residing in SAS tables or MS SQL to and from "legacy" software programs without recoding. The potential end users are those who need to run legacy code and want data stored in a SQL database.
A Web-based Tool for SDSS and 2MASS Database Searches
NASA Astrophysics Data System (ADS)
Hendrickson, M. A.; Uomoto, A.; Golimowski, D. A.
We have developed a web site using HTML, PHP, Python, and MySQL that extracts, processes, and displays data from the Sloan Digital Sky Survey (SDSS) and the Two-Micron All-Sky Survey (2MASS). The goal is to locate brown dwarf candidates in the SDSS database by looking at color cuts; however, this site could also be useful for targeted searches of other databases. MySQL databases are created from broad searches of SDSS and 2MASS data. Broad queries on the SDSS and 2MASS database servers are run weekly so that observers have the most up-to-date information from which to select candidates for observation. Observers can look at detailed information about specific objects, including finding charts, images, and available spectra. In addition, updates from previous observations can be added by any collaborators; this format makes observational collaboration simple. Observers can also restrict the database search, just before or during an observing run, to select objects of special interest.
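A sketch of the color-cut selection such a site would run; the table names, join key, and numeric thresholds are all illustrative, not the project's actual schema or cuts:

  -- Hypothetical brown dwarf candidate selection over pre-matched sources
  SELECT s.objid, s.ra, s.dec, s.i_mag - s.z_mag AS iz_color
  FROM sdss_phot AS s
  JOIN twomass_phot AS t ON t.objid = s.objid
  WHERE s.i_mag - s.z_mag > 1.7   -- red optical color (illustrative cut)
    AND t.j_mag - t.k_mag > 1.0   -- red near-infrared color (illustrative cut)
  ORDER BY iz_color DESC;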
Operationalizing Semantic Medline for meeting the information needs at point of care.
Rastegar-Mojarad, Majid; Li, Dingcheng; Liu, Hongfang
2015-01-01
Scientific literature is one of the popular resources for providing decision support at point of care. It is highly desirable to bring the most relevant literature to support the evidence-based clinical decision making process. Motivated by the recent advance in semantically enhanced information retrieval, we have developed a system, which aims to bring semantically enriched literature, Semantic Medline, to meet the information needs at point of care. This study reports our work towards operationalizing the system for real time use. We demonstrate that the migration of a relational database implementation to a NoSQL (Not only SQL) implementation significantly improves the performance and makes the use of Semantic Medline at point of care decision support possible.
MaGnET: Malaria Genome Exploration Tool.
Sharman, Joanna L; Gerloff, Dietlind L
2013-09-15
The Malaria Genome Exploration Tool (MaGnET) is a software tool enabling intuitive 'exploration-style' visualization of functional genomics data relating to the malaria parasite, Plasmodium falciparum. MaGnET provides innovative integrated graphic displays for different datasets, including genomic location of genes, mRNA expression data, protein-protein interactions and more. Any selection of genes to explore made by the user is easily carried over between the different viewers for different datasets, and can be changed interactively at any point (without returning to a search). Availability: free online use (Java Web Start) or download (Java application archive and MySQL database; requires local MySQL installation) at http://malariagenomeexplorer.org. Contact: joanna.sharman@ed.ac.uk or dgerloff@ffame.org. Supplementary data are available at Bioinformatics online.
Brassica ASTRA: an integrated database for Brassica genomic research.
Love, Christopher G; Robinson, Andrew J; Lim, Geraldine A C; Hopkins, Clare J; Batley, Jacqueline; Barker, Gary; Spangenberg, German C; Edwards, David
2005-01-01
Brassica ASTRA is a public database for genomic information on Brassica species. The database incorporates expressed sequences with Swiss-Prot and GenBank comparative sequence annotation as well as secondary Gene Ontology (GO) annotation derived from the comparison with Arabidopsis TAIR GO annotations. Simple sequence repeat molecular markers are identified within resident sequences and mapped onto the closely related Arabidopsis genome sequence. Bacterial artificial chromosome (BAC) end sequences derived from the Multinational Brassica Genome Project are also mapped onto the Arabidopsis genome sequence enabling users to identify candidate Brassica BACs corresponding to syntenic regions of Arabidopsis. This information is maintained in a MySQL database with a web interface providing the primary means of interrogation. The database is accessible at http://hornbill.cspp.latrobe.edu.au.
Teaching Advanced SQL Skills: Text Bulk Loading
ERIC Educational Resources Information Center
Olsen, David; Hauser, Karina
2007-01-01
Studies show that advanced database skills are important for students to be prepared for today's highly competitive job market. A common task for database administrators is to insert a large amount of data into a database. This paper illustrates how an up-to-date, advanced database topic, namely bulk insert, can be incorporated into a database…
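For illustration, the task the paper teaches in its two mainstream dialects — a minimal sketch with invented file paths and table names:

  -- T-SQL (SQL Server) bulk load
  BULK INSERT dbo.Customers
  FROM 'C:\data\customers.csv'
  WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

  -- MySQL equivalent
  LOAD DATA INFILE '/data/customers.csv'
  INTO TABLE Customers
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
  IGNORE 1 LINES;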
Establishment and Assessment of Plasma Disruption and Warning Databases from EAST
NASA Astrophysics Data System (ADS)
Wang, Bo; Robert, Granetz; Xiao, Bingjia; Li, Jiangang; Yang, Fei; Li, Junjun; Chen, Dalong
2016-12-01
Disruption and disruption-warning databases for the EAST tokamak have been established by a disruption research group. The disruption database, based on Structured Query Language (SQL), comprises 41 disruption parameters, including current quench characteristics, EFIT equilibrium characteristics, kinetic parameters, halo currents, and vertical motion. Presently, most disruption databases are based on plasma experiments from non-superconducting tokamak devices. The purposes of the EAST database are to determine disruption characteristics and statistics for the fully superconducting tokamak EAST, to elucidate the physics underlying tokamak disruptions, to explore the influence of disruptions on superconducting magnets, and to extrapolate toward future burning plasma devices. In order to quantitatively assess the usefulness of various plasma parameters for predicting disruptions, an SQL database similar to that of Alcator C-Mod has been created for EAST by compiling values of a number of proposed disruption-relevant parameters sampled from all plasma discharges in the 2015 campaign. Detailed statistical results and analysis of the two databases on the EAST tokamak are presented. Supported by the National Magnetic Confinement Fusion Science Program of China (No. 2014GB103000).
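A sketch of the kind of statistical query such a disruption database supports; the table and column names are invented, not EAST's actual 41-parameter schema:

  -- Hypothetical disruption statistics for the 2015 campaign
  SELECT COUNT(*)                    AS n_disruptions,
         AVG(current_quench_time_ms) AS avg_quench_ms,
         MAX(halo_current_fraction)  AS worst_halo_fraction
  FROM disruptions
  WHERE campaign = '2015'
    AND plasma_current_ka > 250;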
Information integration for a sky survey by data warehousing
NASA Astrophysics Data System (ADS)
Luo, A.; Zhang, Y.; Zhao, Y.
The virtualization service of the data system for the sky survey LAMOST is very important for astronomers. The service needs to integrate information from data collections, catalogs, and references, and to support simple federation of a set of distributed files and associated metadata. Data warehousing has been in existence for several years and has demonstrated superiority over traditional relational database management systems by providing novel indexing schemes that support efficient on-line analytical processing (OLAP) of large databases. Relational database systems such as Oracle now support the warehouse capability, which includes extensions to the SQL language to support OLAP operations, and a number of metadata management tools have been created. The information integration of LAMOST by applying data warehousing effectively provides data and knowledge on-line.
Bergamino, Maurizio; Hamilton, David J; Castelletti, Lara; Barletta, Laura; Castellan, Lucio
2015-03-01
In this study, we describe the development and utilization of a relational database designed to manage the clinical and radiological data of patients with brain tumors. The Brain Tumor Database was implemented using MySQL v.5.0, while the graphical user interface was created using PHP and HTML, thus making it easily accessible through a web browser. This web-based approach allows for multiple institutions to potentially access the database. The BT Database can record brain tumor patient information (e.g. clinical features, anatomical attributes, and radiological characteristics) and be used for clinical and research purposes. Analytic tools to automatically generate statistics and different plots are provided. The BT Database is a free and powerful user-friendly tool with a wide range of possible clinical and research applications in neurology and neurosurgery. The BT Database graphical user interface source code and manual are freely available at http://tumorsdatabase.altervista.org.
PylotDB - A Database Management, Graphing, and Analysis Tool Written in Python
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barnette, Daniel W.
2012-01-04
PylotDB, written completely in Python, provides a user interface (UI) with which to interact with, analyze, graph data from, and manage open source databases such as MySQL. The UI spares the user from needing in-depth knowledge of the database application programming interface (API). PylotDB allows the user to generate various kinds of plots from user-selected data; generate statistical information on text as well as numerical fields; backup and restore databases; compare database tables across different databases as well as across different servers; extract information from any field to create new fields; generate, edit, and delete databases, tables, and fields; generate or read into a table CSV data; and similar operations. Since much of the database information is brought under control of the Python computer language, PylotDB is not intended for huge databases, for which MySQL and Oracle, for example, are better suited. PylotDB is better suited for smaller databases that might typically be needed in a small research group situation. PylotDB can also be used as a learning tool for database applications in general.
The Comparison of SQL, QBE, and DFQL as Query Languages for Relational Databases
1994-03-01
Chen, R S; Nadkarni, P; Marenco, L; Levin, F; Erdos, J; Miller, P L
2000-01-01
The entity-attribute-value representation with classes and relationships (EAV/CR) provides a flexible and simple database schema to store heterogeneous biomedical data. In certain circumstances, however, the EAV/CR model is known to retrieve data less efficiently than conventionally based database schemas. To perform a pilot study that systematically quantifies performance differences for database queries directed at real-world microbiology data modeled with EAV/CR and conventional representations, and to explore the relative merits of different EAV/CR query implementation strategies. Clinical microbiology data obtained over a ten-year period were stored using both database models. Query execution times were compared for four clinically oriented attribute-centered and entity-centered queries operating under varying conditions of database size and system memory. The performance characteristics of three different EAV/CR query strategies were also examined. Performance was similar for entity-centered queries in the two database models. Performance in the EAV/CR model was approximately three to five times less efficient than its conventional counterpart for attribute-centered queries. The differences in query efficiency became slightly greater as database size increased, although they were reduced with the addition of system memory. The authors found that EAV/CR queries formulated using multiple, simple SQL statements executed in batch were more efficient than single, large SQL statements. This paper describes a pilot project to explore issues in and compare query performance for EAV/CR and conventional database representations. Although attribute-centered queries were less efficient in the EAV/CR model, these inefficiencies may be addressable, at least in part, by the use of more powerful hardware or more memory, or both.
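The attribute-centered inefficiency the study measures comes from EAV's self-joins; a generic sketch (schema names invented, not the authors' exact tables):

  -- EAV form: one self-join per attribute predicate
  SELECT e1.entity_id
  FROM eav_data e1
  JOIN eav_data e2 ON e2.entity_id = e1.entity_id
  WHERE e1.attribute_id = 101 AND e1.value = 'E. coli'        -- organism
    AND e2.attribute_id = 102 AND e2.value = 'ampicillin-R';  -- susceptibility

  -- Conventional form: one indexed scan of a wide table, no self-joins
  SELECT culture_id
  FROM cultures
  WHERE organism = 'E. coli' AND ampicillin = 'R';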
Zhang, Qingzhou; Yang, Bo; Chen, Xujiao; Xu, Jing; Mei, Changlin; Mao, Zhiguo
2014-01-01
We present a bioinformatics database named Renal Gene Expression Database (RGED), which contains comprehensive gene expression data sets from renal disease research. The web-based interface of RGED allows users to query the gene expression profiles in various kidney-related samples, including renal cell lines, human kidney tissues and murine model kidneys. Researchers can explore certain gene profiles, the relationships between genes of interests and identify biomarkers or even drug targets in kidney diseases. The aim of this work is to provide a user-friendly utility for the renal disease research community to query expression profiles of genes of their own interest without the requirement of advanced computational skills. Availability and implementation: Website is implemented in PHP, R, MySQL and Nginx and freely available from http://rged.wall-eva.net. Database URL: http://rged.wall-eva.net PMID:25252782
Insect barcode information system.
Pratheepa, Maria; Jalali, Sushil Kumar; Arokiaraj, Robinson Silvester; Venkatesan, Thiruvengadam; Nagesh, Mandadi; Panda, Madhusmita; Pattar, Sharath
2014-01-01
Insect Barcode Information System, called Insect Barcode Informática (IBIn), is an online database resource developed by the National Bureau of Agriculturally Important Insects, Bangalore. This database provides acquisition, storage, analysis and publication of DNA barcode records of agriculturally important insects, for researchers specifically in India and other countries. It bridges a gap in bioinformatics by integrating molecular, morphological and distribution details of agriculturally important insects. IBIn was developed using PHP/MySQL following the relational database management concept. The database is based on the client-server architecture, where many clients can access data simultaneously. IBIn is freely available online and is user-friendly. IBIn allows registered users to input new information, and to search and view information related to DNA barcodes of agriculturally important insects. This paper provides the current status of insect barcoding in India and a brief introduction to the IBIn database. http://www.nabg-nbaii.res.in/barcode.
How to maintain blood supply during computer network breakdown: a manual backup system.
Zeiler, T; Slonka, J; Bürgi, H R; Kretschmer, V
2000-12-01
Electronic data management systems using computer network systems and client/server architecture are increasingly used in laboratories and transfusion services. Severe problems arise if there is no network access to the database server and critical functions are not available. We describe a manual backup system (MBS) developed to maintain the delivery of blood products to patients in a hospital transfusion service in case of a computer network breakdown. All data are kept on a central SQL database connected to peripheral workstations in a local area network (LAN). Request entry from wards is performed via machine-readable request forms containing self-adhesive specimen labels with barcodes for test tubes. Data entry occurs on-line by bidirectional automated systems or off-line manually. One of the workstations in the laboratory contains a second SQL database which is frequently and incrementally updated. This workstation is run as a stand-alone, read-only database if the central SQL database is not available. In case of a network breakdown, the time-graded MBS is launched. Patient data, requesting ward, and ordered tests/requests are photocopied through a template from the request forms onto special MBS worksheets, which serve as the laboratory journal for manual processing and result reporting (a copy is left in the laboratory). As soon as the network is running again, the data from the off-line period are entered into the primary SQL server. The MBS was successfully used on several occasions; the documentation of a 90-min breakdown period is presented in detail. Additional work resulted from the copying and the belated manual data entry after restoration of the system. There was no delay in the issue of blood products or result reporting. The backup system described has proven to be simple, quick, and safe for maintaining urgent blood supply and distribution of laboratory results in case of an unexpected network breakdown.
Data Processing on Database Management Systems with Fuzzy Query
NASA Astrophysics Data System (ADS)
Şimşek, Irfan; Topuz, Vedat
In this study, a fuzzy query tool (SQLf) for non-fuzzy database management systems was developed, and sample fuzzy queries were run on real data with the tool. The performance of SQLf was tested with data about Marmara University students' food grants. The food grant data were collected in a MySQL database via a form filled out on the web, in which students described their social and economic conditions for the food grant request. This form consists of questions with fuzzy and crisp answers. The main purpose of the fuzzy query is to determine the students who deserve the grant. SQLf easily found the eligible students for the grant through predefined fuzzy values. The fuzzy query tool could be used just as easily with other database systems such as Oracle and SQL Server.
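A sketch of an SQLf-style fuzzy selection; the syntax follows the SQLf research literature rather than any shipping DBMS, and the fuzzy predicates are assumed to be defined over the crisp form columns:

  -- Illustrative fuzzy query: the 10 applicants best satisfying the
  -- fuzzy predicates 'low' income and 'poor' housing (assumed fuzzy sets)
  SELECT 10 student_id, name
  FROM grant_applicants
  WHERE income = 'low' AND housing = 'poor';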
GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus
Zhu, Yuelin; Davis, Sean; Stephens, Robert; Meltzer, Paul S.; Chen, Yidong
2008-01-01
The NCBI Gene Expression Omnibus (GEO) represents the largest public repository of microarray data. However, finding data in GEO can be challenging. We have developed GEOmetadb in an attempt to make querying the GEO metadata both easier and more powerful. All GEO metadata records, as well as the relationships between them, are parsed and stored in a local MySQL database. A powerful, flexible web search interface with several convenient utilities provides query capabilities not available via NCBI tools. In addition, a Bioconductor package, GEOmetadb, that utilizes an SQLite export of the entire GEOmetadb database is also available, rendering the entire GEO metadata accessible with the full power of SQL-based queries from within R. Availability: the web interface and SQLite databases are available at http://gbnci.abcc.ncifcrf.gov/geo/. The Bioconductor package is available via the Bioconductor project. The corresponding MATLAB implementation is also available at the same website. Contact: yidong@mail.nih.gov PMID:18842599
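An example of the SQL-based querying the SQLite export enables; gse, gsm, and the gse_gsm link table are the package's documented tables, though the exact columns should be verified against a current GEOmetadb copy:

  -- Series matching a keyword, ranked by sample count
  SELECT gse.gse, gse.title, COUNT(gsm.gsm) AS n_samples
  FROM gse
  JOIN gse_gsm ON gse_gsm.gse = gse.gse
  JOIN gsm     ON gsm.gsm     = gse_gsm.gsm
  WHERE gse.title LIKE '%breast cancer%'
  GROUP BY gse.gse, gse.title
  ORDER BY n_samples DESC;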
NASA Astrophysics Data System (ADS)
Vacca, G.; Pili, D.; Fiorino, D. R.; Pintus, V.
2017-05-01
The presented work is part of the research project titled "Tecniche murarie tradizionali: conoscenza per la conservazione ed il miglioramento prestazionale" (Traditional building techniques: from knowledge to conservation and performance improvement), with the purpose of studying the building techniques of the 13th-18th centuries in the Sardinia Region (Italy) for their knowledge, conservation, and promotion. The end purpose of the entire study is to improve the performance of the examined structures. In particular, the task of the authors within the research project was to build a WebGIS to manage the data collected during the examination and study phases. This infrastructure was built entirely with Open Source software. The work consisted of designing a database built in PostgreSQL and its spatial extension PostGIS, which allows storing and managing feature geometries and spatial data. Data input is performed via a form built in HTML and PHP. The HTML part is based on Bootstrap, an open library for websites and web applications. The implementation of this template used both PHP and Javascript code. The PHP code manages the reading and writing of data to the database, using embedded SQL queries. As of today, we have surveyed and archived more than 300 buildings, belonging to three main macro categories: fortification architectures, religious architectures, and residential architectures. More than 150 masonry samples have been investigated in relation to the construction techniques. The database is published on the Internet as a WebGIS built using the Leaflet Javascript open libraries, which allow creating map sites with background maps and navigation, input, and query tools. This, too, uses an interaction of HTML, Javascript, PHP and SQL code.
Shao, Wei; Shan, Jigui; Kearney, Mary F; Wu, Xiaolin; Maldarelli, Frank; Mellors, John W; Luke, Brian; Coffin, John M; Hughes, Stephen H
2016-07-04
The NCI Retrovirus Integration Database is a MySQL-based relational database created for storing and retrieving comprehensive information about retroviral integration sites, primarily, but not exclusively, HIV-1. The database is accessible to the public for submission or extraction of data originating from experiments aimed at collecting information related to retroviral integration sites including: the site of integration into the host genome, the virus family and subtype, the origin of the sample, gene exons/introns associated with integration, and proviral orientation. Information about the references from which the data were collected is also stored in the database. Tools are built into the website that can be used to map the integration sites to the UCSC genome browser, to plot the integration site patterns on a chromosome, and to display provirus LTRs in their inserted genome sequence. The website is robust, user friendly, and allows users to query the database and analyze the data dynamically. https://rid.ncifcrf.gov or http://home.ncifcrf.gov/hivdrp/resources.htm.
DRUMS: Disk Repository with Update Management and Select option for high throughput sequencing data.
Nettling, Martin; Thieme, Nils; Both, Andreas; Grosse, Ivo
2014-02-04
New technologies for analyzing biological samples, like next generation sequencing, are producing a growing amount of data together with quality scores. Moreover, software tools (e.g., for mapping sequence reads), calculating transcription factor binding probabilities, estimating epigenetic modification enriched regions or determining single nucleotide polymorphisms increase this amount of position-specific DNA-related data even further. Hence, requesting data becomes challenging and expensive and is often implemented using specialised hardware. In addition, picking specific data as fast as possible becomes increasingly important in many fields of science. The general problem of handling big data sets was addressed by developing specialized databases like HBase, HyperTable or Cassandra. However, these database solutions require specialized or distributed hardware, leading to expensive investments. To the best of our knowledge, there is no database capable of (i) storing billions of position-specific DNA-related records, (ii) performing fast and resource-saving requests, and (iii) running on single standard computer hardware. Here, we present DRUMS (Disk Repository with Update Management and Select option), satisfying demands (i)-(iii). It tackles the weaknesses of traditional databases while handling position-specific DNA-related data in an efficient manner. DRUMS is capable of storing up to billions of records. Moreover, it focuses on optimizing single lookups such as range requests, which are needed constantly for computations in bioinformatics. To validate the power of DRUMS, we compare it to the widely used MySQL database, in a test setting of two biological data sets on standard desktop hardware. DRUMS outperforms MySQL in writing and reading records by a factor of two up to a factor of 10000. Furthermore, it can work with significantly larger data sets. Our work focuses on mid-sized data sets of up to several billion records without requiring cluster technology. Storing position-specific data is a general problem, and the concept we present here is a generalized approach. Hence, it can be easily applied to other fields of bioinformatics.
Architecture for biomedical multimedia information delivery on the World Wide Web
NASA Astrophysics Data System (ADS)
Long, L. Rodney; Goh, Gin-Hua; Neve, Leif; Thoma, George R.
1997-10-01
Research engineers at the National Library of Medicine are building a prototype system for the delivery of multimedia biomedical information on the World Wide Web. This paper discusses the architecture and design considerations for the system, which will be used initially to make images and text from the third National Health and Nutrition Examination Survey (NHANES) publicly available. We categorized our analysis as follows: (1) fundamental software tools: we analyzed trade-offs among use of conventional HTML/CGI, X Window Broadway, and Java; (2) image delivery: we examined the use of unconventional TCP transmission methods; (3) database manager and database design: we discuss the capabilities and planned use of the Informix object-relational database manager and the planned schema for the NHANES database; (4) storage requirements for our Sun server; (5) user interface considerations; (6) the compatibility of the system with other standard research and analysis tools; (7) image display: we discuss considerations for consistent image display for end users. Finally, we discuss the scalability of the system in terms of incorporating larger or more databases of similar data, and the extensibility of the system for supporting content-based retrieval of biomedical images. The system prototype is called the Web-based Medical Information Retrieval System. An early version was built as a Java applet and tested on Unix, PC, and Macintosh platforms. This prototype used the MiniSQL database manager to do text queries on a small database of records of participants in the second NHANES survey. The full records and associated x-ray images were retrievable and displayable on a standard Web browser. A second version has now been built, also a Java applet, using the MySQL database manager.
Rasdaman for Big Spatial Raster Data
NASA Astrophysics Data System (ADS)
Hu, F.; Huang, Q.; Scheele, C. J.; Yang, C. P.; Yu, M.; Liu, K.
2015-12-01
Spatial raster data have grown exponentially over the past decade. Recent advancements in data acquisition technology, such as remote sensing, have allowed us to collect massive observation data of various spatial resolutions and domain coverages. The volume, velocity, and variety of such spatial data, along with the computationally intensive nature of spatial queries, pose a grand challenge to storage technologies for effective big data management. While high performance computing platforms (e.g., cloud computing) can be used to solve the computing-intensive issues in big data analysis, data have to be managed in a way that is suitable for distributed parallel processing. Recently, rasdaman (raster data manager) has emerged as a scalable and cost-effective database solution to store and retrieve massive multi-dimensional arrays, such as sensor, image, and statistics data. Within this paper, the pros and cons of using rasdaman to manage and query spatial raster data are examined and compared with other common approaches, including file-based systems, relational databases (e.g., PostgreSQL/PostGIS), and NoSQL databases (e.g., MongoDB and Hive). Earth Observing System (EOS) data collected from NASA's Atmospheric Science Data Center (ASDC) are used and stored in these selected database systems, and a set of spatial and non-spatial queries is designed to benchmark their performance on retrieving large-scale, multi-dimensional arrays of EOS data. Lessons learned from using rasdaman are discussed as well.
A new relational database structure and online interface for the HITRAN database
NASA Astrophysics Data System (ADS)
Hill, Christian; Gordon, Iouli E.; Rothman, Laurence S.; Tennyson, Jonathan
2013-11-01
A new format for the HITRAN database is proposed. By storing the line-transition data in a number of linked tables described by a relational database schema, it is possible to overcome the limitations of the existing format, which have become increasingly apparent over the last few years as new and more varied data are being used by radiative-transfer models. Although the database in the new format can be searched using the well-established Structured Query Language (SQL), a web service, HITRANonline, has been deployed to allow users to make most common queries of the database using a graphical user interface in a web page. The advantages of the relational form of the database to ensuring data integrity and consistency are explored, and the compatibility of the online interface with the emerging standards of the Virtual Atomic and Molecular Data Centre (VAMDC) project is discussed. In particular, the ability to access HITRAN data using a standard query language from other websites, command line tools and from within computer programs is described.
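A sketch of how line transitions might be retrieved from such a relational layout; the table and column names are invented for illustration and are not HITRANonline's actual schema:

  -- Water vapour lines in a wavenumber window, joined across linked tables
  SELECT t.wavenumber, t.intensity, i.iso_name
  FROM transitions   AS t
  JOIN isotopologues AS i ON i.iso_id      = t.iso_id
  JOIN molecules     AS m ON m.molecule_id = i.molecule_id
  WHERE m.molecule_name = 'H2O'
    AND t.wavenumber BETWEEN 1500.0 AND 1600.0
  ORDER BY t.wavenumber;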
High Performance Semantic Factoring of Giga-Scale Semantic Graph Databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joslyn, Cliff A.; Adolf, Robert D.; Al-Saffar, Sinan
2010-10-04
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture, and present the results of our deploying that for the analysis of the Billion Triple dataset with respect to its semantic factors.
2014-04-25
... Using EA's Java application programming interface (API), the team built a tool called OWL2EA that can ingest an OWL file and generate the corresponding UML ... ObjectItemStructure specification shown in Figure 10. Running this script in the relational database server MySQL creates the physical schema that ...
Network Security Visualization
1999-09-27
... performing SQL generation and result-set binding, inserting acquired security events into the database and gathering the requested data for Console scene ... objects is also auto-generated by a VBA script. Built into the auto-generated table access objects are the preferred join paths between tables. This ... much of the Server itself) never have to deal with SQL directly. This is one aspect of laying the groundwork for supporting RDBMSs from multiple vendors.
A Relational Encoding of a Conceptual Model with Multiple Temporal Dimensions
NASA Astrophysics Data System (ADS)
Gubiani, Donatella; Montanari, Angelo
The theoretical interest and the practical relevance of a systematic treatment of multiple temporal dimensions is widely recognized in the database and information system communities. Nevertheless, most relational databases have no temporal support at all. A few of them provide a limited support, in terms of temporal data types and predicates, constructors, and functions for the management of time values (borrowed from the SQL standard). One (resp., two) temporal dimensions are supported by historical and transaction-time (resp., bitemporal) databases only. In this paper, we provide a relational encoding of a conceptual model featuring four temporal dimensions, namely, the classical valid and transaction times, plus the event and availability times. We focus our attention on the distinctive technical features of the proposed temporal extension of the relation model. In the last part of the paper, we briefly show how to implement it in a standard DBMS.
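A minimal sketch of the four dimensions as plain columns, with one bitemporal-style query extended by the extra times; this is an illustrative encoding, not the paper's full mapping:

  -- Each fact carries event, valid, transaction, and availability times
  CREATE TABLE employee_salary (
      emp_id     INT,
      salary     NUMERIC(10,2),
      event_time DATE,        -- when the change happened in reality
      valid_from DATE,        -- when the fact holds in the modeled world
      valid_to   DATE,
      tx_from    TIMESTAMP,   -- when the row was current in the database
      tx_to      TIMESTAMP,
      avail_from TIMESTAMP,   -- when the fact became known/available
      PRIMARY KEY (emp_id, valid_from, tx_from)
  );

  -- "What did the database say on 2010-01-01 about salaries valid on 2009-06-30?"
  SELECT emp_id, salary
  FROM employee_salary
  WHERE DATE '2009-06-30' BETWEEN valid_from AND valid_to
    AND TIMESTAMP '2010-01-01 00:00:00' BETWEEN tx_from AND tx_to;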
NASA Astrophysics Data System (ADS)
McEnery, J. A.; Jitkajornwanich, K.
2012-12-01
This presentation will describe the methodology and overall system development by which a benchmark dataset of precipitation information has been used to characterize the depth-area-duration relations in heavy rain storms occurring over regions of Texas. Over the past two years, project investigators along with the National Weather Service (NWS) West Gulf River Forecast Center (WGRFC) have developed and operated a gateway data system to ingest, store, and disseminate NWS multi-sensor precipitation estimates (MPE). As a pilot project of the Integrated Water Resources Science and Services (IWRSS) initiative, this testbed uses a Structured Query Language (SQL) server to maintain a full archive of current and historic MPE values within the WGRFC service area. These time series values are made available for public access as web services in the standard WaterML format. Having this volume of information maintained in a comprehensive database now allows the use of relational analysis capabilities within SQL to leverage these multi-sensor precipitation values and produce a valuable derivative product. The area of focus for this study is North Texas, using values that originated from the West Gulf River Forecast Center (WGRFC), one of three River Forecast Centers currently represented in the holdings of this data system. Over the past two decades, NEXRAD radar has dramatically improved the ability to record rainfall. The resulting hourly MPE values, distributed over an approximately 4 km by 4 km grid, are considered by the NWS to be the "best estimate" of rainfall. The data server provides an accepted standard interface for internet access to the largest time-series dataset of NEXRAD-based MPE values ever assembled. An automated script has been written to search and extract storms over the 18-year period of record from the contents of this massive historical precipitation database. Not only can it extract site-specific storms, but also duration-specific storms and storms separated by user-defined inter-event periods. A separate storm database has been created to store the selected output. By storing output within tables in a separate database, we can make use of powerful SQL capabilities to perform flexible pattern analysis. Previous efforts have made use of historic data from limited clusters of irregularly spaced physical gauges, for which the spatial extent of the observational network has been a limiting factor. The relatively dense distribution of MPE provides a virtual mesh of observations stretched over the landscape. This work combines a unique hydrologic data resource with programming and database analysis to characterize storm depth-area-duration relationships.
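A sketch of the gaps-and-islands logic behind such storm extraction, written in T-SQL since the testbed runs on an SQL server; the table, columns, and the 6-hour inter-event threshold are all assumptions:

  -- Group wet hours into storms separated by >6 dry hours, per grid cell
  WITH wet AS (
      SELECT cell_id, obs_hour, precip_mm,
             LAG(obs_hour) OVER (PARTITION BY cell_id ORDER BY obs_hour) AS prev_hour
      FROM mpe_hourly
      WHERE precip_mm > 0
  ),
  grp AS (
      SELECT *,
             SUM(CASE WHEN prev_hour IS NULL
                        OR DATEDIFF(hour, prev_hour, obs_hour) > 6
                      THEN 1 ELSE 0 END)
               OVER (PARTITION BY cell_id ORDER BY obs_hour) AS storm_id
      FROM wet
  )
  SELECT cell_id, storm_id,
         MIN(obs_hour)  AS storm_start,
         MAX(obs_hour)  AS storm_end,
         SUM(precip_mm) AS storm_depth_mm
  FROM grp
  GROUP BY cell_id, storm_id;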
Adding Hierarchical Objects to Relational Database General-Purpose XML-Based Information Managements
NASA Technical Reports Server (NTRS)
Lin, Shu-Chun; Knight, Chris; La, Tracy; Maluf, David; Bell, David; Tran, Khai Peter; Gawdiak, Yuri
2006-01-01
NETMARK is a flexible, high-throughput software system for managing, storing, and rapidly searching unstructured and semi-structured documents. NETMARK transforms such documents from their original highly complex, constantly changing, heterogeneous data formats into well-structured, common data formats using Hypertext Markup Language (HTML) and/or Extensible Markup Language (XML). The software implements an object-relational database system that combines the best practices of the relational model utilizing Structured Query Language (SQL) with those of the object-oriented, semantic database model for creating complex data. In particular, NETMARK takes advantage of the Oracle 8i object-relational database model, using physical-address data types for very efficient keyword searches of records across both context and content. NETMARK also supports multiple international standards, such as WebDAV for drag-and-drop file management and SOAP for integrated information management using Web services. The document-organization and -searching capabilities afforded by NETMARK are likely to make this software attractive for use in disciplines as diverse as science, auditing, and law enforcement.
Agile Datacube Analytics (not just) for the Earth Sciences
NASA Astrophysics Data System (ADS)
Misev, Dimitar; Merticariu, Vlad; Baumann, Peter
2017-04-01
Metadata are considered small, smart, and queryable; data, on the other hand, are known as big, clumsy, and hard to analyze. Consequently, gridded data - such as images, image timeseries, and climate datacubes - are managed separately from the metadata, and with different, restricted retrieval capabilities. One reason for this silo approach is that databases, while good at tables, XML hierarchies, RDF graphs, etc., traditionally do not support multi-dimensional arrays well. This gap is being closed by Array Databases, which extend the SQL paradigm of "any query, anytime" to NoSQL arrays. They introduce semantically rich modelling combined with declarative, high-level query languages on n-D arrays. On the server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. This way, they offer new vistas in flexibility, scalability, performance, and data integration. In this respect, the forthcoming ISO SQL extension MDA ("Multi-dimensional Arrays") will be a game changer in Big Data Analytics. We introduce concepts and opportunities through the example of rasdaman ("raster data manager"), which in fact has pioneered the field of Array Databases and forms the blueprint for ISO SQL/MDA and further Big Data standards, such as OGC WCPS for querying spatio-temporal Earth datacubes. With operational installations exceeding 140 TB, queries have been split across more than one thousand cloud nodes, using CPUs as well as GPUs. Installations can easily be mashed up securely, enabling large-scale location-transparent query processing in federations. Federation queries have been demonstrated live at EGU 2016, spanning Europe and Australia in the context of the intercontinental EarthServer initiative, visualized through NASA WorldWind.
Enabling On-Demand Database Computing with MIT SuperCloud Database Management System
2015-09-15
... (arc.liv.ac.uk/trac/SGE) provides these services and is independent of programming language (C, Fortran, Java, Matlab, etc.) or parallel programming ... a MySQL database to store DNS records. The DNS records are controlled via a simple web service interface that allows records to be created ...
A Tutorial in Creating Web-Enabled Databases with Inmagic DB/TextWorks through ODBC.
ERIC Educational Resources Information Center
Breeding, Marshall
2000-01-01
Explains how to create Web-enabled databases. Highlights include Inmagic's DB/Text WebPublisher product called DB/TextWorks; ODBC (Open Database Connectivity) drivers; Perl programming language; HTML coding; Structured Query Language (SQL); Common Gateway Interface (CGI) programming; and examples of HTML pages and Perl scripts. (LRW)
Zero tolerance for incorrect data: Best practices in SQL transaction programming
NASA Astrophysics Data System (ADS)
Laiho, M.; Skourlas, C.; Dervos, D. A.
2015-02-01
DBMS products differ in the way they support even the basic SQL transaction services. In this paper, a framework of best practices in SQL transaction programming is given and discussed. SQL developers are advised to experiment with and verify the services supported by the DBMS product used. The framework has been developed by DBTechNet, a European network of teachers, trainers and ICT professionals. A course module on SQL transactions, offered by the LLP "DBTech VET Teachers" programme, is also presented and discussed. Aims and objectives of the programme include the introduction of the topics and content of SQL transactions and concurrency control into HE/VET curricula, and addressing the need for initial and continuous training on these topics for in-company trainers, VET teachers, and Higher Education students. An overview of the course module, its learning outcomes, the education and training (E&T) content, and virtual database labs with hands-on self-practicing exercises, plus instructions for the teacher/trainer on the pedagogy and usage of the course module's content, are briefly described. The main principle adopted is to "learn by verifying in practice", and the transactions course motto is: "Zero Tolerance for Incorrect Data".
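A minimal example of the kind of disciplined transaction the course drills: explicit boundaries, a verified business rule, and rollback on violation (T-SQL style; table names invented):

  -- Funds transfer with zero tolerance for incorrect data
  BEGIN TRANSACTION;

  UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE acct_id = 2;

  -- Commit only if the debited account stayed non-negative
  IF EXISTS (SELECT 1 FROM accounts WHERE acct_id = 1 AND balance < 0)
      ROLLBACK TRANSACTION
  ELSE
      COMMIT TRANSACTION;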
Spatio-Temporal Data Model for Integrating Evolving Nation-Level Datasets
NASA Astrophysics Data System (ADS)
Sorokine, A.; Stewart, R. N.
2017-10-01
The ability to easily combine data from diverse sources in a single analytical workflow is one of the greatest promises of Big Data technologies. However, such integration is often challenging, as datasets originate from different vendors, governments, and research communities, resulting in multiple incompatibilities of data representations, formats, and semantics. Semantic differences are the hardest to handle: different communities often use different attribute definitions and associate the records with different sets of evolving geographic entities. Analysis of global socioeconomic variables across multiple datasets over prolonged time is often complicated by differences in how boundaries and histories of countries or other geographic entities are represented. Here we propose an event-based data model for depicting and tracking the histories of evolving geographic units (countries, provinces, etc.) and their representations in disparate data. The model addresses the semantic challenge of preserving the identity of geographic entities over time by defining criteria for an entity's existence, a set of events that may affect its existence, and rules for mapping between different representations (datasets). The proposed model is used for maintaining an evolving compound database of global socioeconomic and environmental data harvested from multiple sources. A practical implementation of our model is demonstrated using a PostgreSQL object-relational database with temporal, geospatial, and NoSQL database extensions.
Clinical Databases for Chest Physicians.
Courtwright, Andrew M; Gabriel, Peter E
2018-04-01
A clinical database is a repository of patient medical and sociodemographic information focused on one or more specific health condition or exposure. Although clinical databases may be used for research purposes, their primary goal is to collect and track patient data for quality improvement, quality assurance, and/or actual clinical management. This article aims to provide an introduction and practical advice on the development of small-scale clinical databases for chest physicians and practice groups. Through example projects, we discuss the pros and cons of available technical platforms, including Microsoft Excel and Access, relational database management systems such as Oracle and PostgreSQL, and Research Electronic Data Capture. We consider approaches to deciding the base unit of data collection, creating consensus around variable definitions, and structuring routine clinical care to complement database aims. We conclude with an overview of regulatory and security considerations for clinical databases.
Development of a replicated database of DHCP data for evaluation of drug use.
Graber, S E; Seneker, J A; Stahl, A A; Franklin, K O; Neel, T E; Miller, R A
1996-01-01
This case report describes development and testing of a method to extract clinical information stored in the Veterans Affairs (VA) Decentralized Hospital Computer System (DHCP) for the purpose of analyzing data about groups of patients. The authors used a microcomputer-based, structured query language (SQL)-compatible, relational database system to replicate a subset of the Nashville VA Hospital's DHCP patient database. This replicated database contained the complete current Nashville DHCP prescription, provider, patient, and drug data sets, and a subset of the laboratory data. A pilot project employed this replicated database to answer questions that might arise in drug-use evaluation, such as identification of cases of polypharmacy, suboptimal drug regimens, and inadequate laboratory monitoring of drug therapy. These database queries included as candidates for review all prescriptions for all outpatients. The queries demonstrated that specific drug-use events could be identified for any time interval represented in the replicated database. PMID:8653451
Efficient hemodynamic event detection utilizing relational databases and wavelet analysis
NASA Technical Reports Server (NTRS)
Saeed, M.; Mark, R. G.
2001-01-01
Development of a temporal query framework for time-oriented medical databases has hitherto been a challenging problem. We describe a novel method for the detection of hemodynamic events in multiparameter trends utilizing wavelet coefficients in a MySQL relational database. Storage of the wavelet coefficients allowed for a compact representation of the trends, and provided robust descriptors for the dynamics of the parameter time series. A data model was developed to allow for simplified queries along several dimensions and time scales. Of particular importance, the data model and wavelet framework allowed for queries to be processed with minimal table-join operations. A web-based search engine was developed to allow for user-defined queries. Typical queries required between 0.01 and 0.02 seconds, with at least two orders of magnitude improvement in speed over conventional queries. This powerful and innovative structure will facilitate research on large-scale time-oriented medical databases.
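The point about minimal table-join operations can be illustrated with a hedged sketch; the table and column names below are invented stand-ins for the paper's wavelet-coefficient storage, not its actual schema:

    -- One row per parameter, scale, and time window. Large coefficients at
    -- coarse scales flag abrupt trends, so event detection can run against
    -- this single table instead of joining the raw time series.
    SELECT patient_id, window_start
    FROM wavelet_coeff
    WHERE parameter = 'ABP_mean'
      AND scale = 5
      AND ABS(coeff) > 20.0
    ORDER BY window_start;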
Zhang, Qingzhou; Yang, Bo; Chen, Xujiao; Xu, Jing; Mei, Changlin; Mao, Zhiguo
2014-01-01
We present a bioinformatics database named Renal Gene Expression Database (RGED), which contains comprehensive gene expression data sets from renal disease research. The web-based interface of RGED allows users to query the gene expression profiles in various kidney-related samples, including renal cell lines, human kidney tissues and murine model kidneys. Researchers can explore profiles of certain genes, examine the relationships between genes of interest, and identify biomarkers or even drug targets in kidney diseases. The aim of this work is to provide a user-friendly utility for the renal disease research community to query expression profiles of genes of their own interest without requiring advanced computational skills. The website is implemented in PHP, R, MySQL and Nginx and is freely available at http://rged.wall-eva.net. © The Author(s) 2014. Published by Oxford University Press.
Find the fish: using PROC SQL to build a relational database
Fabrizio, Mary C.; Nelson, Scott N.
1995-01-01
Reliable estimates of abundance and survival, gained through mark-recapture studies, are necessary to better understand how to manage and restore lake trout populations in the Great Lakes. Working with a 24-year data set from a mark-recapture study conducted in Lake Superior, we attempted to disclose information on tag shedding by examining recaptures of double-tagged fish. The data set consisted of 64,288 observations on fish which had been marked with one or more tags; a subset of these fish had been marked with two tags at initial capture. Although DATA and PROC statements could be used to obtain some of the information we sought, these statements could not be used to extract a complete set of results from the double-tagging experiments. We therefore used SQL processing to create three tables representing the same information but in a fully normalized relational structure. In addition, we created indices to efficiently examine complex relationships among the individual capture records. This approach allowed us to obtain all the information necessary to estimate tag retention through subsequent modeling. We believe that our success with SQL was due in large part to its ability to simultaneously scan the same table more than once and to permit consideration of other tables in sub-queries.
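The authors used SAS PROC SQL; the same double-tagging question can be sketched in generic SQL, with invented table and column names standing in for their data set:

    -- Fish that received two tags at initial capture but were recaptured
    -- carrying only one are direct evidence of tag shedding.
    SELECT r.fish_id, r.recapture_date
    FROM recaptures r
    JOIN taggings  t ON t.fish_id = r.fish_id
    WHERE t.tags_applied = 2
      AND r.tags_present = 1;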
New tools and methods for direct programmatic access to the dbSNP relational database.
Saccone, Scott F; Quan, Jiaxi; Mehta, Gaurang; Bolze, Raphael; Thomas, Prasanth; Deelman, Ewa; Tischfield, Jay A; Rice, John P
2011-01-01
Genome-wide association studies often incorporate information from public biological databases in order to provide a biological reference for interpreting the results. The dbSNP database is an extensive source of information on single nucleotide polymorphisms (SNPs) for many different organisms, including humans. We have developed free software that will download and install a local MySQL implementation of the dbSNP relational database for a specified organism. We have also designed a system for classifying dbSNP tables in terms of common tasks we wish to accomplish using the database. For each task we have designed a small set of custom tables that facilitate task-related queries and provide entity-relationship diagrams for each task composed from the relevant dbSNP tables. In order to expose these concepts and methods to a wider audience we have developed web tools for querying the database and browsing documentation on the tables and columns to clarify the relevant relational structure. All web tools and software are freely available to the public at http://cgsmd.isi.edu/dbsnpq. Resources such as these for programmatically querying biological databases are essential for viably integrating biological information into genetic association experiments on a genome-wide scale.
2012-10-01
higher Java v5, Apache Struts v2, Hibernate v2, C3PO, SQL*Net client / JDBC, Database Server Oracle 10.0.2, Desktop Client Internet Explorer...for mobile Smartphones - A Java-based framework utilizing Apache Struts on the server - Relational database to handle data storage requirements B...technologies are as follows: Technology Use Requirements Java Application Provides the backend application software to drive the PHR-A 7 BEA Web
Design and Implementation of CNEOST Image Database Based on NoSQL System
NASA Astrophysics Data System (ADS)
Wang, X.
2013-07-01
The China Near Earth Object Survey Telescope (CNEOST) is the largest Schmidt telescope in China, and it has acquired more than 3 TB of astronomical image data since it saw first light in 2006. After the upgrade of the CCD camera in 2013, over 10 TB of data will be obtained every year. The management of massive images is not only an indispensable part of the data processing pipeline but also the basis of data sharing. Based on an analysis of requirements, an image management system is designed and implemented by employing a non-relational database.
Design and Implementation of CNEOST Image Database Based on NoSQL System
NASA Astrophysics Data System (ADS)
Wang, Xin
2014-04-01
The China Near Earth Object Survey Telescope is the largest Schmidt telescope in China, and it has acquired more than 3 TB of astronomical image data since it saw first light in 2006. After the upgrade of the CCD camera in 2013, over 10 TB of data will be obtained every year. The management of the massive images is not only an indispensable part of the data processing pipeline but also the basis of data sharing. Based on an analysis of requirements, an image management system is designed and implemented by employing a non-relational database.
Information Flow Integrity for Systems of Independently-Developed Components
2015-06-22
We also examined three programs (Apache, MySQL, and PHP) in detail to evaluate the efficacy of using the provided package test suites to generate...method are just as effective as hooks that were manually placed over the course of years while greatly reducing the burden on programmers. "Leveraging...to validate optimizations of real-world, mature applications: the Apache software suite, the Mozilla Suite, and the MySQL database. "Validating Library
Processing of the WLCG monitoring data using NoSQL
NASA Astrophysics Data System (ADS)
Andreeva, J.; Beche, A.; Belov, S.; Dzhunov, I.; Kadochnikov, I.; Karavakis, E.; Saiz, P.; Schovancova, J.; Tuckett, D.
2014-06-01
The Worldwide LHC Computing Grid (WLCG) today includes more than 150 computing centres where more than 2 million jobs are being executed daily and petabytes of data are transferred between sites. Monitoring the computing activities of the LHC experiments, over such a huge heterogeneous infrastructure, is extremely demanding in terms of computation, performance and reliability. Furthermore, the generated monitoring flow is constantly increasing, which represents another challenge for the monitoring systems. While existing solutions are traditionally based on Oracle for data storage and processing, recent developments evaluate NoSQL for processing large-scale monitoring datasets. NoSQL databases are getting increasingly popular for processing datasets at the terabyte and petabyte scale using commodity hardware. In this contribution, the integration of NoSQL data processing in the Experiment Dashboard framework is described along with first experiences of using this technology for monitoring the LHC computing activities.
2006-09-01
Each of these layers will be described in more detail to include relevant technologies (Java, PDA, Hibernate, and PostgreSQL) used to implement...Logic Layer - Object-Relational Mapper (Hibernate) Data 35 capable in order to interface with Java applications. Based on meeting the selection...further discussed. Query List Application Logic Layer Hibernate Apache - Java Servlet - Hibernate Interface - OR Mapper - RDBMS Interface
ERIC Educational Resources Information Center
Terawaki, Yuki; Takahashi, Yuichi; Kodama, Yasushi; Yana, Kazuo
2011-01-01
This paper describes an integration of the different Relational Database Management Systems (RDBMS) of two Course Management Systems (CMS), Sakai and the Common Factory for Inspiration and Value in Education (CFIVE). First, when the service of CMS is provided campus-wide, the problems of user support, CMS operation and customization of CMS are…
Domain fusion analysis by applying relational algebra to protein sequence and domain databases
Truong, Kevin; Ikura, Mitsuhiko
2003-01-01
Background Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages these efforts will become increasingly powerful. Results This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From the scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypotheses. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. Conclusion As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time. PMID:12734020
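The core relational-algebra step can be rendered as a short SQL sketch; the protein_domain table is an assumed simplification of the Pfam-derived data, not the authors' implementation:

    -- Candidate fusions: pairs of domains carried by the same protein.
    -- The full analysis also requires the two domains to occur on
    -- separate proteins in another species (an anti-join, omitted here).
    SELECT DISTINCT d1.domain_id AS domain_a,
                    d2.domain_id AS domain_b
    FROM protein_domain d1
    JOIN protein_domain d2
      ON d1.protein_id = d2.protein_id
     AND d1.domain_id  < d2.domain_id;   -- avoid duplicates and self-pairs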
ERIC Educational Resources Information Center
Liu, Xia; Liu, Lai C.; Koong, Kai S.; Lu, June
2003-01-01
Analysis of 300 information technology job postings in two Internet databases identified the following skill categories: programming languages (Java, C/C++, and Visual Basic were most frequent); website development (57% sought SQL and HTML skills); databases (nearly 50% required Oracle); networks (only Windows NT or wide-area/local-area networks);…
Computerized database management system for breast cancer patients.
Sim, Kok Swee; Chong, Sze Siang; Tso, Chih Ping; Nia, Mohsen Esmaeili; Chong, Aun Kee; Abbas, Siti Fathimah
2014-01-01
Data analysis based on breast cancer risk factors such as age, race, breastfeeding, hormone replacement therapy, family history, and obesity was conducted on breast cancer patients using a new enhanced computerized database management system. MySQL, a Structured Query Language (SQL) relational database management system, is selected as the application to store the patient data collected from hospitals in Malaysia. An automatic calculation tool is embedded in this system to assist the data analysis. The results are plotted automatically and a user-friendly graphical user interface is developed that can control the MySQL database. Case studies show breast cancer incidence rate is highest among Malay women, followed by Chinese and Indian. The peak age for breast cancer incidence is from 50 to 59 years old. Results suggest that the chance of developing breast cancer is increased in older women, and reduced with breastfeeding practice. The weight status might affect the breast cancer risk differently. Additional studies are needed to confirm these findings.
Techniques of Photometry and Astrometry with APASS, Gaia, and Pan-STARRs Results (Abstract)
NASA Astrophysics Data System (ADS)
Green, W.
2017-12-01
(Abstract only) The databases with the APASS DR9, Gaia DR1, and the Pan-STARRs 3pi DR1 data releases are publicly available for use. There is a bit of data-mining involved to download and manage these reference stars. This paper discusses the use of these databases to acquire accurate photometric references as well as techniques for improving results. Images are prepared in the usual way: zero, dark, flat-fields, and WCS solutions with Astrometry.net. Images are then processed with SExtractor to produce an ASCII table of identifying photometric features. The database manages photometric catalogs and images converted to ASCII tables. Scripts convert the files into SQL and assimilate them into database tables. Using SQL techniques, each image star is merged with reference data to produce publishable results. The VYSOS has over 13,000 images of the ONC5 field to process, with roughly 100 total fields in the campaign. This paper provides an overview of this daunting task.
Image query and indexing for digital x rays
NASA Astrophysics Data System (ADS)
Long, L. Rodney; Thoma, George R.
1998-12-01
The web-based medical information retrieval system (WebMIRS) allows Internet access to databases containing 17,000 digitized x-ray spine images and associated text data from the National Health and Nutrition Examination Surveys (NHANES). WebMIRS allows SQL query of the text, and viewing of the returned text records and images using a standard browser. We are now working (1) to determine the utility of data directly derived from the images in our databases, and (2) to investigate the feasibility of computer-assisted or automated indexing of the images to support retrieval of images of interest to biomedical researchers in the field of osteoarthritis. To build an initial database based on image data, we are manually segmenting a subset of the vertebrae, using techniques from vertebral morphometry. From this, we will derive vertebral features and add them to the database. This image-derived data will enhance the user's data access capability by enabling the creation of combined SQL/image-content queries.
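A combined SQL/image-content query of the kind envisioned might look like the following hedged sketch; the tables, columns, and the height-ratio threshold are hypothetical, not WebMIRS's schema:

    -- Survey respondents over 60 whose segmented vertebrae show anterior
    -- height loss, joining survey text data to image-derived features.
    SELECT s.subject_id, v.vertebra_level, v.anterior_height
    FROM survey s
    JOIN vertebral_features v ON v.subject_id = s.subject_id
    WHERE s.age > 60
      AND v.anterior_height / v.posterior_height < 0.8;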
Chapter 51: How to Build a Simple Cone Search Service Using a Local Database
NASA Astrophysics Data System (ADS)
Kent, B. R.; Greene, G. R.
The cone search service protocol will be examined from the server side in this chapter. A simple cone search service will be set up and configured locally using MySQL. Data will be read into a table, and the Java JDBC will be used to connect to the database. Readers will understand the VO cone search specification and how to use it to query a database on their local systems and return an XML/VOTable file based on an input of RA/DEC coordinates and a search radius. The cone search in this example will be deployed as a Java servlet. The resulting cone search can be tested with a verification service. This basic setup can be used with other languages and relational databases.
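The server-side query issued by such a servlet might resemble this sketch; the catalog table is hypothetical, and the distance test uses a small-angle approximation that ignores RA wrap-around at 0/360 degrees:

    -- Objects within 'radius' degrees of (ra0, dec0); JDBC binds the
    -- parameters in order: decMin, decMax, ra0, dec0, dec0, radius.
    SELECT obj_id, ra, decl
    FROM catalog
    WHERE decl BETWEEN ? AND ?
      AND POW((ra - ?) * COS(RADIANS(?)), 2)
          + POW(decl - ?, 2) <= POW(?, 2);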
High performance semantic factoring of giga-scale semantic graph databases.
DOE Office of Scientific and Technical Information (OSTI.GOV)
al-Saffar, Sinan; Adolf, Bob; Haglin, David
2010-10-01
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture and present the results of deploying it for the analysis of the Billion Triple dataset with respect to its semantic factors, including basic properties, connected components, namespace interaction, and typed paths.
MaGnET: Malaria Genome Exploration Tool
Sharman, Joanna L.; Gerloff, Dietlind L.
2013-01-01
Summary: The Malaria Genome Exploration Tool (MaGnET) is a software tool enabling intuitive ‘exploration-style’ visualization of functional genomics data relating to the malaria parasite, Plasmodium falciparum. MaGnET provides innovative integrated graphic displays for different datasets, including genomic location of genes, mRNA expression data, protein–protein interactions and more. Any selection of genes to explore made by the user is easily carried over between the different viewers for different datasets, and can be changed interactively at any point (without returning to a search). Availability and Implementation: Free online use (Java Web Start) or download (Java application archive and MySQL database; requires local MySQL installation) at http://malariagenomeexplorer.org Contact: joanna.sharman@ed.ac.uk or dgerloff@ffame.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23894142
Software Engineering Laboratory (SEL) database organization and user's guide, revision 2
NASA Technical Reports Server (NTRS)
Morusiewicz, Linda; Bristow, John
1992-01-01
The organization of the Software Engineering Laboratory (SEL) database is presented. Included are definitions and detailed descriptions of the database tables and views, the SEL data, and system support data. The mapping from the SEL and system support data to the base table is described. In addition, techniques for accessing the database through the Database Access Manager for the SEL (DAMSEL) system and via the ORACLE structured query language (SQL) are discussed.
Software Engineering Laboratory (SEL) database organization and user's guide
NASA Technical Reports Server (NTRS)
So, Maria; Heller, Gerard; Steinberg, Sandra; Spiegel, Douglas
1989-01-01
The organization of the Software Engineering Laboratory (SEL) database is presented. Included are definitions and detailed descriptions of the database tables and views, the SEL data, and system support data. The mapping from the SEL and system support data to the base tables is described. In addition, techniques for accessing the database, through the Database Access Manager for the SEL (DAMSEL) system and via the ORACLE structured query language (SQL), are discussed.
Hosting and publishing astronomical data in SQL databases
NASA Astrophysics Data System (ADS)
Galkin, Anastasia; Klar, Jochen; Riebe, Kristin; Matokevic, Gal; Enke, Harry
2017-04-01
In astronomy, terabytes and petabytes of data are produced by ground instruments, satellite missions and simulations. At the Leibniz-Institute for Astrophysics Potsdam (AIP) we host and publish terabytes of cosmological simulation and observational data. The public archive at AIP has now reached a size of 60 TB and growing, and helps to produce numerous scientific papers. The web framework Daiquiri offers a dedicated web interface for each of the hosted scientific databases. Scientists all around the world run SQL queries which include specific astrophysical functions and get their desired data in reasonable time. Daiquiri supports the scientific projects by offering a number of administration tools such as database and user management, contact messages to the staff, and support for the organization of meetings and workshops. The webpages can be customized, and the WordPress integration supports the participating scientists in maintaining the documentation and the projects' news sections.
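Queries submitted through such an interface are ordinary SQL over the published tables; the following is an illustrative example against a hypothetical simulation catalog, not Daiquiri's actual schema:

    -- The hundred most massive halos in a chosen snapshot of a
    -- hypothetical cosmological simulation database.
    SELECT halo_id, x, y, z, mass
    FROM simulation.halos
    WHERE snapnum = 63
      AND mass > 1e12
    ORDER BY mass DESC
    LIMIT 100;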
Understanding and Capturing People’s Mobile App Privacy Preferences
2013-10-28
The entire apps' metadata takes up about 500 MB of storage space when stored in a MySQL database and all the binary files take approximately 300 GB of...functionality that can de-compile Dalvik bytecodes to Java source code faster than other de-compilers. Given the scale of the app analysis we planned on...Java libraries, such as parsers, SQL connectors, etc. Targeted Ads 137 admob, adwhirl, greystripe… Provided by mobile behavioral ads company to
Shark: SQL and Analytics with Cost-Based Query Optimization on Coarse-Grained Distributed Memory
2014-01-13
RDBMS and contains a database (often MySQL or Derby) with a namespace for tables, table metadata and partition information. Table data is stored in an...serialization/deserialization) Java interface implementations with corresponding object inspectors. The Hive driver controls the processing of queries, coordinat...native API, RDD operations are invoked through a functional interface similar to DryadLINQ [32] in Scala, Java or Python. For example, the Scala code for
Search extension transforms Wiki into a relational system: a case for flavonoid metabolite database.
Arita, Masanori; Suwa, Kazuhiro
2008-09-17
In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. On the other hand, in the area of biology the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases may describe conflicts or ambiguities (e.g. a protein pair that does and does not interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL. To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a database of flavonoids with 6902 molecular structures from over 1687 plant species was implemented on MediaWiki, the background system of Wikipedia. Registered users can describe any information in an arbitrary format. The structured part is subject to text-string searches to realize relational operations. The system was written in the PHP language as an extension of MediaWiki. All modifications are open-source and publicly available. This scheme benefits from both the free-formatted Wiki style and the concise and structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost of database maintenance is alleviated.
Search extension transforms Wiki into a relational system: A case for flavonoid metabolite database
Arita, Masanori; Suwa, Kazuhiro
2008-01-01
Background In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. On the other hand, in the area of biology the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases may describe conflicts or ambiguities (e.g. a protein pair that does and does not interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL. Results To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a database of flavonoids with 6902 molecular structures from over 1687 plant species was implemented on MediaWiki, the background system of Wikipedia. Registered users can describe any information in an arbitrary format. The structured part is subject to text-string searches to realize relational operations. The system was written in the PHP language as an extension of MediaWiki. All modifications are open-source and publicly available. Conclusion This scheme benefits from both the free-formatted Wiki style and the concise and structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost of database maintenance is alleviated. PMID:18822113
A Medical Image Backup Architecture Based on a NoSQL Database and Cloud Computing Services.
Santos Simões de Almeida, Luan Henrique; Costa Oliveira, Marcelo
2015-01-01
The use of digital systems for storing medical images generates a huge volume of data. Digital images are commonly stored and managed on a Picture Archiving and Communication System (PACS), under the DICOM standard. However, PACS is limited because it is strongly dependent on the server's physical space. Alternatively, Cloud Computing arises as an extensive, low-cost, and reconfigurable resource. However, medical images contain patient information that cannot be made available in a public cloud. Therefore, a mechanism to anonymize these images is needed. This poster presents a solution for this issue by taking digital images from PACS, converting the information contained in each image file to a NoSQL database, and using cloud computing to store the digital images.
Celesti, Antonio; Fazio, Maria; Romano, Agata; Bramanti, Alessia; Bramanti, Placido; Villari, Massimo
2018-05-01
The Open Archive Information System (OAIS) is a reference model for organizing people and resources in a system, and it is already adopted in care centers and medical systems to efficiently manage clinical data, medical personnel, and patients. Archival storage systems are typically implemented using traditional relational database systems, but the relation-oriented technology strongly limits efficiency in the management of huge amounts of patients' clinical data, especially in emerging cloud-based systems that are distributed. In this paper, we present an OAIS healthcare architecture able to manage a huge amount of HL7 clinical documents in a scalable way. Specifically, it is based on a NoSQL column-oriented Database Management System deployed in the cloud, thus benefiting from big tables and wide rows over a virtual distributed infrastructure. We developed a prototype of the proposed architecture at the IRCCS, and we evaluated its efficiency in a real case study.
NASA Astrophysics Data System (ADS)
Guion, A., Jr.; Hodgkins, H.
2015-12-01
The Center of Excellence in Remote Sensing Education and Research (CERSER) has implemented three research projects during the summer Research Experience for Undergraduates (REU) program gathering water quality data for local waterways. The data had been compiled manually using pen and paper and then entered into a spreadsheet. With the spread of electronic devices capable of interacting with databases, the development of an electronic method of entering and manipulating the water quality data was pursued during this project. This project focused on the development of an interactive database to gather, display, and analyze data collected from local waterways. The database and entry form were built in MySQL on a PHP server, allowing participants to enter data from anywhere Internet access is available. The project then researched applying these data to the Google Maps site to provide labeling and information to users. The NIA server at http://nia.ecsu.edu is used to host the application for download and for storage of the databases. Water Quality Database Team members included the authors plus Derek Morris Jr., Kathryne Burton and Mr. Jeff Wood as mentor.
Shuttle-Data-Tape XML Translator
NASA Technical Reports Server (NTRS)
Barry, Matthew R.; Osborne, Richard N.
2005-01-01
JSDTImport is a computer program for translating native Shuttle Data Tape (SDT) files from American Standard Code for Information Interchange (ASCII) format into databases in other formats. JSDTImport solves the problem of organizing the SDT content, affording flexibility to enable users to choose how to store the information in a database to better support client and server applications. JSDTImport can be dynamically configured by use of a simple Extensible Markup Language (XML) file. JSDTImport uses this XML file to define how each record and field will be parsed, its layout and definition, and how the resulting database will be structured. JSDTImport also includes a client application programming interface (API) layer that provides abstraction for the data-querying process. The API enables a user to specify the search criteria to apply in gathering all the data relevant to a query. The API can be used to organize the SDT content and translate into a native XML database. The XML format is structured into efficient sections, enabling excellent query performance by use of the XPath query language. Optionally, the content can be translated into a Structured Query Language (SQL) database for fast, reliable SQL queries on standard database server computers.
ADVICE--Educational System for Teaching Database Courses
ERIC Educational Resources Information Center
Cvetanovic, M.; Radivojevic, Z.; Blagojevic, V.; Bojovic, M.
2011-01-01
This paper presents a Web-based educational system, ADVICE, that helps students to bridge the gap between database management system (DBMS) theory and practice. The usage of ADVICE is presented through a set of laboratory exercises developed to teach students conceptual and logical modeling, SQL, formal query languages, and normalization. While…
SORTEZ: a relational translator for NCBI's ASN.1 database.
Hart, K W; Searls, D B; Overton, G C
1994-07-01
The National Center for Biotechnology Information (NCBI) has created a database collection that includes several protein and nucleic acid sequence databases, a biosequence-specific subset of MEDLINE, as well as value-added information such as links between similar sequences. Information in the NCBI database is modeled in Abstract Syntax Notation 1 (ASN.1), an Open Systems Interconnection protocol designed for the purpose of exchanging structured data between software applications rather than as a data model for database systems. While the NCBI database is distributed with an easy-to-use information retrieval system, ENTREZ, the ASN.1 data model currently lacks an ad hoc query language for general-purpose data access. For that reason, we have developed a software package, SORTEZ, that transforms the ASN.1 database (or other databases with nested data structures) to a relational data model and subsequently to a relational database management system (Sybase) where information can be accessed through the relational query language, SQL. Because the need to transform data from one data model and schema to another arises naturally in several important contexts, including efficient execution of specific applications, access to multiple databases and adaptation to database evolution, this work also serves as a practical study of the issues involved in the various stages of database transformation. We show that transformation from the ASN.1 data model to a relational data model can be largely automated, but that schema transformation and data conversion require considerable domain expertise and would greatly benefit from additional support tools.
Cross-Matching Source Observations from the Palomar Transient Factory (PTF)
NASA Astrophysics Data System (ADS)
Laher, Russ; Grillmair, C.; Surace, J.; Monkewitz, S.; Jackson, E.
2009-01-01
Over the four-year lifetime of the PTF project, approximately 40 billion instances of astronomical-source observations will be extracted from the image data. The instances will correspond to the same astronomical objects being observed at roughly 25-50 different times, and so a very large catalog containing important object-variability information will be the chief PTF product. Organizing astronomical-source catalogs is conventionally done by dividing the catalog into declination zones and sorting by right ascension within each zone (e.g., the USNO-A star catalog), in order to facilitate catalog searches. This method was reincarnated as the "zones" algorithm in a SQL-Server database implementation (Szalay et al., MSR-TR-2004-32), with corrections given by Gray et al. (MSR-TR-2006-52). The primary advantage of this implementation is that all of the work is done entirely on the database server and client/server communication is eliminated. We implemented the methods outlined in Gray et al. for a PostgreSQL database. We programmed the methods as database functions in the PL/pgSQL procedural language. The cross-matching is currently based on source positions, but we intend to extend it to use both positions and positional uncertainties to form a chi-square statistic for optimal thresholding. The database design includes three main tables, plus a handful of internal tables. The Sources table stores the SExtractor source extractions taken at various times; the MergedSources table stores statistics about the astronomical objects, which are the result of cross-matching records in the Sources table; and the Merges table associates cross-matched primary keys in the Sources table with primary keys in the MergedSources table. Besides judicious database indexing, we have also internally partitioned the Sources table by declination zone, in order to speed up the population of Sources records and make the database more manageable. The catalog will be accessible to the public after the proprietary period through IRSA (irsa.ipac.caltech.edu).
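The zones idea reduces a spherical cross-match to integer-range and interval predicates; a condensed PostgreSQL-style sketch follows, with a 0.5-degree zone height and a roughly 3-arcsecond match radius as illustrative choices (and no cos(dec) correction on RA, which a real implementation would need):

    -- merged_sources stores a precomputed zone = FLOOR(decl / 0.5).
    -- Candidate matches are confined to a few zones and an RA interval,
    -- so the join can use ordinary B-tree indexes.
    SELECT s.source_id, m.merged_id
    FROM sources s
    JOIN merged_sources m
      ON m.zone BETWEEN FLOOR((s.decl - 0.0008) / 0.5)
                    AND FLOOR((s.decl + 0.0008) / 0.5)
     AND m.ra   BETWEEN s.ra - 0.0008 AND s.ra + 0.0008
    WHERE POW(m.ra - s.ra, 2) + POW(m.decl - s.decl, 2) <= POW(0.0008, 2);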
Changes in Exercise Data Management
NASA Technical Reports Server (NTRS)
Buxton, R. E.; Kalogera, K. L.; Hanson, A. M.
2018-01-01
The suite of exercise hardware aboard the International Space Station (ISS) generates an immense amount of data. The data collected from the treadmill, cycle ergometer, and resistance strength training hardware are basic exercise parameters (time, heart rate, speed, load, etc.). The raw data are post-processed in the laboratory and more detailed parameters are calculated from each exercise data file. Updates have recently been made to how these valuable data are stored, adding an additional level of data security, increasing data accessibility, and resulting in overall increased efficiency of medical report delivery. Questions regarding exercise performance or how exercise may influence other variables of crew health frequently arise within the crew health care community. Inquiries over the health of the exercise hardware often need quick analysis and response to ensure the exercise system is operable on a continuous basis. Consolidating all of the exercise system data in a single repository enables a quick response to both the medical and engineering communities. A SQL Server database is currently in use, and provides a secure location for all of the exercise data from ISS Expedition 1 to the current day. The database has been structured to update derived metrics automatically, making analysis and reporting available within minutes of dropping the inflight data into the database. Commercial tools were evaluated to help aggregate and visualize data from the SQL database. The Tableau software provides a manageable interface, which has improved the laboratory's output time of crew reports by 67%. Expansion of the SQL database to include additional medical requirement metrics, the addition of 'app-like' tools for mobile visualization, and collaborative use (e.g. operational support teams, research groups, and International Partners) of the data system are currently being explored.
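One way derived metrics can update automatically as inflight data arrive is a trigger; this T-SQL sketch assumes invented table and column names and is not the laboratory's actual implementation:

    -- Recompute a session-level metric whenever raw exercise rows arrive.
    CREATE TRIGGER trg_update_metrics ON exercise_raw
    AFTER INSERT AS
    BEGIN
        UPDATE m
        SET m.avg_heart_rate = (SELECT AVG(r.heart_rate)
                                FROM exercise_raw r
                                WHERE r.session_id = m.session_id)
        FROM exercise_metrics m
        JOIN inserted i ON i.session_id = m.session_id;
    END;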
Design and Implementation of 3D Model Data Management System Based on SQL
NASA Astrophysics Data System (ADS)
Li, Shitao; Zhang, Shixin; Zhang, Zhanling; Li, Shiming; Jia, Kun; Hu, Zhongxu; Ping, Liang; Hu, Youming; Li, Yanlei
CAD/CAM technology plays an increasingly important role in the machinery manufacturing industry. As an important means of production, the three-dimensional models accumulated over many years of design work are valuable. Thus the management of these three-dimensional models is of great significance. This paper gives a detailed explanation of a method to design three-dimensional model databases based on SQL and to implement functions such as insertion, modification, inquiry, and preview.
DRUMS: Disk Repository with Update Management and Select option for high throughput sequencing data
2014-01-01
Background New technologies for analyzing biological samples, like next generation sequencing, are producing a growing amount of data together with quality scores. Moreover, software tools (e.g., for mapping sequence reads, calculating transcription factor binding probabilities, estimating epigenetic modification enriched regions or determining single nucleotide polymorphisms) increase this amount of position-specific DNA-related data even further. Hence, requesting data becomes challenging and expensive and is often implemented using specialised hardware. In addition, picking specific data as fast as possible becomes increasingly important in many fields of science. The general problem of handling big data sets was addressed by developing specialized databases like HBase, HyperTable or Cassandra. However, these database solutions also require specialized or distributed hardware, leading to expensive investments. To the best of our knowledge, there is no database capable of (i) storing billions of position-specific DNA-related records, (ii) performing fast and resource saving requests, and (iii) running on single standard computer hardware. Results Here, we present DRUMS (Disk Repository with Update Management and Select option), satisfying demands (i)-(iii). It tackles the weaknesses of traditional databases while handling position-specific DNA-related data in an efficient manner. DRUMS is capable of storing up to billions of records. Moreover, it focuses on optimizing related single lookups as range requests, which are needed constantly for computations in bioinformatics. To validate the power of DRUMS, we compare it to the widely used MySQL database. The test setting considers two biological data sets. We use standard desktop hardware as the test environment. Conclusions DRUMS outperforms MySQL in writing and reading records by a factor of two up to a factor of 10000. Furthermore, it can work with significantly larger data sets. Our work focuses on mid-sized data sets of up to several billion records without requiring cluster technology. Storing position-specific data is a general problem and the concept we present here is a generalized approach. Hence, it can be easily applied to other fields of bioinformatics. PMID:24495746
BIRS - Bioterrorism Information Retrieval System.
Tewari, Ashish Kumar; Rashi; Wadhwa, Gulshan; Sharma, Sanjeev Kumar; Jain, Chakresh Kumar
2013-01-01
Bioterrorism is the intended use of pathogenic strains of microbes to spread terror in a population. There is a definite need to promote research for development of vaccines, therapeutics and diagnostic methods as a part of preparedness for any bioterror attack in the future. BIRS is an open-access database of collective information on the organisms related to bioterrorism. The architecture of the database utilizes current open-source technology, viz. PHP ver. 5.3.19, MySQL, and IIS server under the Windows platform. The database stores information on literature, generic information, and unique pathways of about 10 microorganisms involved in bioterrorism. This may serve as a collective repository to accelerate the drug discovery and vaccine design process against such bioterrorist agents (microbes). The available data has been validated from various online resources and literature mining in order to provide the user with a comprehensive information system. The database is freely available at http://www.bioterrorism.biowaves.org.
SGDB: a database of synthetic genes re-designed for optimizing protein over-expression.
Wu, Gang; Zheng, Yuanpu; Qureshi, Imran; Zin, Htar Thant; Beck, Tyler; Bulka, Blazej; Freeland, Stephen J
2007-01-01
Here we present the Synthetic Gene Database (SGDB): a relational database that houses sequences and associated experimental information on synthetic (artificially engineered) genes from all peer-reviewed studies published to date. At present, the database comprises information from more than 200 published experiments. This resource not only provides reference material to guide experimentalists in designing new genes that improve protein expression, but also offers a dataset for analysis by bioinformaticians who seek to test ideas regarding the underlying factors that influence gene expression. The SGDB was built under the MySQL database management system. We also offer an XML schema for standardized data description of synthetic genes. Users can access the database at http://www.evolvingcode.net/codon/sgdb/index.php, or batch-download all information through XML files. Moreover, users may visually compare the coding sequences of a synthetic gene and its natural counterpart with an integrated web tool at http://www.evolvingcode.net/codon/sgdb/aligner.php, and discuss questions, findings and related information on an associated e-forum at http://www.evolvingcode.net/forum/viewforum.php?f=27.
Kilintzis, Vassilis; Beredimas, Nikolaos; Chouvarda, Ioanna
2014-01-01
An integral part of a system that manages medical data is the persistent storage engine. For almost twenty-five years, Relational Database Management Systems (RDBMS) were considered the obvious choice, yet today new technologies have emerged that demand attention as possible alternatives. Triplestores store information in terms of RDF triples without necessarily binding to a specific predefined structural model. In this paper we present an attempt to compare the performance of the Apache JENA-Fuseki and Virtuoso Universal Server 6 triplestores with that of the MySQL 5.6 RDBMS for storing and retrieving medical information that is communicated as RDF/XML ontology instances over a RESTful web service. The results show that the performance, calculated as the average time of storing and retrieving instances, is significantly better using Virtuoso Server, while MySQL performed better than Fuseki.
NASA Astrophysics Data System (ADS)
Dabiru, L.; O'Hara, C. G.; Shaw, D.; Katragadda, S.; Anderson, D.; Kim, S.; Shrestha, B.; Aanstoos, J.; Frisbie, T.; Policelli, F.; Keblawi, N.
2006-12-01
The Research Project Knowledge Base (RPKB) is currently being designed and will be implemented in a manner that is fully compatible and interoperable with enterprise architecture tools developed to support NASA's Applied Sciences Program. Through user needs assessment and collaboration with Stennis Space Center, Goddard Space Flight Center, and NASA's DEVELOP staff, insight into information needs for the RPKB was gathered from across NASA scientific communities of practice. To enable efficient, consistent, standard, structured, and managed data entry and research results compilation, a prototype RPKB has been designed and fully integrated with the existing NASA Earth Science Systems Components database. The RPKB will compile research project and keyword information of relevance to the six major science focus areas, 12 national applications, and the Global Change Master Directory (GCMD). The RPKB will include information about projects awarded from NASA research solicitations, project investigator information, research publications, NASA data products employed, and model or decision support tools used or developed, as well as new data product information. The RPKB will be developed in a multi-tier architecture that will include a SQL Server relational database backend, middleware, and front-end client interfaces for data entry. The purpose of this project is to intelligently harvest the results of research sponsored by the NASA Applied Sciences Program and related research programs. We present various approaches for a wide spectrum of knowledge discovery of research results, publications, projects, etc. from the NASA Systems Components database and global information systems, and show how this is implemented in a SQL Server database. The application of knowledge discovery is useful for intelligent query answering and multiple-layered database construction. Using advanced EA tools such as the Earth Science Architecture Tool (ESAT), RPKB will enable NASA and partner agencies to efficiently identify significant results for new experiment directions and principal investigators to formulate experiment directions for new proposals.
A veterinary anatomy tutoring system.
Theodoropoulos, G; Loumos, V; Antonopoulos, J
1994-02-14
A veterinary anatomy tutoring system was developed using Knowledge Pro, an object-oriented software development tool with hypermedia capabilities, and MS Access, a relational database. Communication between them is facilitated by the Structured Query Language (SQL). The architecture of the system is based on knowledge sets, each of which covers four different descriptions of an organ, namely gross anatomy (general description), gross anatomy (comparative features), histology, and embryology, which constitute the knowledge units. These knowledge units are linked with three global variables that define the animals, the topographies, and the system to which the organ belongs, creating three databases. These three databases are interrelated through the organ field in order to establish a relational model. This system allows versatility in the student's navigation through the information space by offering different modes for information location and presentation. These include course mode, review mode, reference mode, dissection mode, and comparison mode. In addition, the system provides a self-evaluation mode.
Observational database for studies of nearby universe
NASA Astrophysics Data System (ADS)
Kaisina, E. I.; Makarov, D. I.; Karachentsev, I. D.; Kaisin, S. S.
2012-01-01
We present the description of a database of galaxies of the Local Volume (LVG), located within 10 Mpc around the Milky Way. It contains more than 800 objects. Based on an analysis of functional capabilities, we used the PostgreSQL DBMS as a management system for our LVG database. Applying semantic modelling methods, we developed a physical ER-model of the database. We describe the developed architecture of the database table structure, and the implemented web-access, available at http://www.sao.ru/lv/lvgdb.
Studies of Big Data metadata segmentation between relational and non-relational databases
NASA Astrophysics Data System (ADS)
Golosova, M. V.; Grigorieva, M. A.; Klimentov, A. A.; Ryabinkin, E. A.; Dimitrov, G.; Potekhin, M.
2015-12-01
In recent years the concepts of Big Data have become well established in IT. Systems managing large data volumes produce metadata that describe the data and workflows. These metadata are used to obtain information about current system state and for statistical and trend analysis of the processes these systems drive. Over time the amount of stored metadata can grow dramatically. In this article we present our studies demonstrating how metadata storage scalability and performance can be improved by using a hybrid RDBMS/NoSQL architecture.
Domain fusion analysis by applying relational algebra to protein sequence and domain databases.
Truong, Kevin; Ikura, Mitsuhiko
2003-05-06
Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages these efforts will become increasingly powerful. This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From the scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypotheses. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.
Mining Bug Databases for Unidentified Software Vulnerabilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dumidu Wijayasekara; Milos Manic; Jason Wright
2012-06-01
Identifying software vulnerabilities is becoming more important as critical and sensitive systems increasingly rely on complex software systems. It has been suggested in previous work that some bugs are only identified as vulnerabilities long after the bug has been made public. These vulnerabilities are known as hidden impact vulnerabilities. This paper discusses the feasibility and necessity to mine common publicly available bug databases for vulnerabilities that are yet to be identified. We present bug database analysis of two well known and frequently used software packages, namely Linux kernel and MySQL. It is shown that for both Linux and MySQL, a significant portion of vulnerabilities that were discovered for the time period from January 2006 to April 2011 were hidden impact vulnerabilities. It is also shown that the percentage of hidden impact vulnerabilities has increased in the last two years, for both software packages. We then propose an improved hidden impact vulnerability identification methodology based on text mining bug databases, and conclude by discussing a few potential problems faced by such a classifier.
2003-06-01
delivery Data Access (1980s) "What were unit sales in New England last March?" Relational databases (RDBMS), Structured Query Language (SQL)...macros written in Visual Basic for Applications (VBA). [Figure 20, "Iteration two class diagram": Tech OASIS export script, import filter, data processing method, MS Excel, and VBA macro, with contains/sends-data-to/executes relationships.]
2002-09-01
Basic for Applications (VBA) 6.0, as macros may not be supported in future versions of Access. Access 2000 offers Internet-related features for...security features from Microsoft's SQL Server. [1] 3. System Requirements Access 2000 is a resource-intensive application, as are all Office 2000...1] • Modules – Functions and procedures written in the Visual Basic for Applications (VBA) programming language. The capabilities of modules
Experience with Multi-Tier Grid MySQL Database Service Resiliency at BNL
NASA Astrophysics Data System (ADS)
Wlodek, Tomasz; Ernst, Michael; Hover, John; Katramatos, Dimitrios; Packard, Jay; Smirnov, Yuri; Yu, Dantong
2011-12-01
We describe the use of F5's BIG-IP smart switch technology (3600 Series and Local Traffic Manager v9.0) to provide load balancing and automatic fail-over to multiple Grid services (GUMS, VOMS) and their associated back-end MySQL databases. This resiliency is introduced in front of the external application servers and also for the back-end database systems, which is what makes it "multi-tier". The combination of solutions chosen to ensure high availability of the services, in particular the database replication and fail-over mechanism, are discussed in detail. The paper explains the design and configuration of the overall system, including virtual servers, machine pools, and health monitors (which govern routing), as well as the master-slave database scheme and fail-over policies and procedures. Pre-deployment planning and stress testing will be outlined. Integration of the systems with our Nagios-based facility monitoring and alerting is also described. And application characteristics of GUMS and VOMS which enable effective clustering will be explained. We then summarize our practical experiences and real-world scenarios resulting from operating a major US Grid center, and assess the applicability of our approach to other Grid services in the future.
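The back-end replication that such fail-over policies depend on uses standard MySQL commands; a minimal sketch with placeholder host and log-file values (the site's actual configuration is not shown here):

    -- On a slave: point at the master's binary log and start replicating.
    CHANGE MASTER TO
        MASTER_HOST='gums-db-master.example.org',
        MASTER_USER='repl',
        MASTER_PASSWORD='********',
        MASTER_LOG_FILE='mysql-bin.000042',
        MASTER_LOG_POS=4;
    START SLAVE;
    -- During fail-over, the switch redirects traffic while the surviving
    -- slave is promoted (STOP SLAVE, then begin accepting writes).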
Integrated Substrate and Thin Film Design Methods
1999-02-01
Proper Representation: Once the required chemical databases had been converted to the Excel format, VBA macros were written to convert chemical ... ternary systems databases were imported from MS Excel to MS Access to implement SQL queries. Further, this database was connected via an ODBC model to the ... VBA macro, corresponding to each of the elements A, B, and C, respectively. The B loop began with the next alphabetical choice of element symbols
An integrated data-analysis and database system for AMS 14C
NASA Astrophysics Data System (ADS)
Kjeldsen, Henrik; Olsen, Jesper; Heinemeier, Jan
2010-04-01
AMSdata is the name of a combined database and data-analysis system for AMS 14C and stable-isotope work that has been developed at Aarhus University. The system (1) contains routines for data analysis of AMS and MS data, (2) allows a flexible and accurate description of sample extraction and pretreatment, also when samples are split into several fractions, and (3) keeps track of all measured, calculated and attributed data. The structure of the database is flexible and allows an unlimited number of measurement and pretreatment procedures. The AMS 14C data analysis routine is fairly advanced and flexible, and it can be easily optimized for different kinds of measuring processes. Technically, the system is based on a Microsoft SQL server and includes stored SQL procedures for the data analysis. Microsoft Office Access is used for the (graphical) user interface, and in addition Excel, Word and Origin are exploited for input and output of data, e.g. for plotting data during data analysis.
Techniques for Efficiently Managing Large Geosciences Data Sets
NASA Astrophysics Data System (ADS)
Kruger, A.; Krajewski, W. F.; Bradley, A. A.; Smith, J. A.; Baeck, M. L.; Steiner, M.; Lawrence, R. E.; Ramamurthy, M. K.; Weber, J.; Delgreco, S. A.; Domaszczynski, P.; Seo, B.; Gunyon, C. A.
2007-12-01
We have developed techniques and software tools for efficiently managing large geosciences data sets. While the techniques were developed as part of an NSF-funded ITR project that focuses on making NEXRAD weather data and rainfall products available to hydrologists and other scientists, they are relevant to other geosciences disciplines that deal with large data sets. Metadata, relational databases, data compression, and networking are central to our methodology. Data and derived products are stored on file servers in a compressed format. URLs to, and metadata about, the data and derived products are managed in a PostgreSQL database. Virtually all access to the data and products is through this database. Geosciences data normally require a number of processing steps to transform the raw data into useful products: data quality assurance, coordinate transformations and georeferencing, applying calibration information, and many more. We have developed the concept of crawlers that manage this scientific workflow. Crawlers are unattended processes that run indefinitely and, at set intervals, query the database for their next assignment. A database table functions as a roster for the crawlers. Crawlers perform well-defined tasks that are, except perhaps for sequencing, largely independent of other crawlers. Once a crawler is done with its current assignment, it updates the database roster table and gets its next assignment by querying the database. We have developed a library that enables one to quickly add crawlers. The library provides hooks to external (i.e., C-language) compiled codes, so that developers can work and contribute independently. Processes called ingesters inject data into the system. The bulk of the data are from a real-time feed using UCAR/Unidata's IDD/LDM software. An exciting recent development is the establishment of a Unidata HYDRO feed that delivers value-added metadata over the IDD/LDM. Ingesters grab the metadata and populate the PostgreSQL tables. These and other concepts we have developed have enabled us to efficiently manage a 70 TB (and growing) weather radar data set.
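The crawler/roster pattern lends itself to a very small sketch; the roster schema, database name and task semantics below are illustrative assumptions, not the project's actual design:

    import time
    import psycopg2

    # Assumed roster table:
    #   CREATE TABLE roster (id serial PRIMARY KEY, task text,
    #                        state text DEFAULT 'pending');
    conn = psycopg2.connect("dbname=radar user=crawler")

    def next_assignment():
        # Atomically claim the oldest pending task from the roster.
        with conn, conn.cursor() as cur:
            cur.execute(
                "UPDATE roster SET state = 'running' "
                "WHERE id = (SELECT id FROM roster WHERE state = 'pending' "
                "            ORDER BY id LIMIT 1 FOR UPDATE) "
                "RETURNING id, task")
            return cur.fetchone()

    while True:                    # crawlers run indefinitely
        job = next_assignment()
        if job is None:
            time.sleep(60)         # poll the database at set intervals
            continue
        job_id, task = job
        # ... perform the well-defined task (QA, georeferencing, ...) ...
        with conn, conn.cursor() as cur:
            cur.execute("UPDATE roster SET state = 'done' WHERE id = %s",
                        (job_id,))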
DB Dehydrogenase: an online integrated structural database on enzyme dehydrogenase.
Nandy, Suman Kumar; Bhuyan, Rajabrata; Seal, Alpana
2012-01-01
Dehydrogenase enzymes are almost indispensable for metabolic processes. Shortage or malfunctioning of dehydrogenases often leads to several acute diseases such as cancers, retinal diseases, diabetes mellitus, Alzheimer's disease, and hepatitis B and C. With the advancement of modern-day research, huge amounts of sequence, structural and functional data are generated every day, widening the gap between structural attributes and their functional understanding. DB Dehydrogenase is an effort to relate the functionalities of dehydrogenases to their structures. It is a completely web-based structural database, covering almost all dehydrogenases of known structure [~150 enzyme classes, ~1200 entries from ~160 organisms]. It was created by extracting and integrating various online resources to provide true and reliable data, and is implemented as a MySQL relational database with user-friendly web interfaces written in CGI Perl. Flexible search options are provided for data extraction and exploration. To summarize, with the sequence, structure and function of all dehydrogenases in one place, along with the necessary cross-referencing options, this database will be useful for researchers carrying out further work in this field. The database is available for free at http://www.bifku.in/DBD/
Virtual file system on NoSQL for processing high volumes of HL7 messages.
Kimura, Eizen; Ishihara, Ken
2015-01-01
The Standardized Structured Medical Information Exchange (SS-MIX) is intended to be the standard repository for HL7 messages, but it depends on a local file system, which limits its scalability. We implemented a virtual file system using NoSQL to incorporate modern computing technology into SS-MIX and to allow the system to integrate local patient IDs from different healthcare systems into a universal system. We discuss its implementation using the database MongoDB and describe its performance in a case study.
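One plausible way to realize such a virtual file system is to store each HL7 message in MongoDB's GridFS, keyed by SS-MIX-style path components; a sketch under assumed field names, not the authors' actual schema:

    import gridfs
    from pymongo import MongoClient

    db = MongoClient()["ssmix"]
    fs = gridfs.GridFS(db)

    # Store an HL7 v2 message with metadata mirroring the SS-MIX directory
    # layout (patient ID / date / data type), removing the dependence on a
    # local file system.
    msg = b"MSH|^~\\&|HIS|HOSP|GW|HOSP|20150101000000||ADT^A01|1|P|2.5"
    fs.put(msg, patient_id="0000012345", date="20150101", data_type="ADT-01")

    # Retrieve every stored message for one universal patient ID.
    for f in fs.find({"patient_id": "0000012345"}):
        print(f.read()[:13])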
Accessing the public MIMIC-II intensive care relational database for clinical research.
Scott, Daniel J; Lee, Joon; Silva, Ikaro; Park, Shinhyuk; Moody, George B; Celi, Leo A; Mark, Roger G
2013-01-10
The Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database is a free, public resource for intensive care research. The database was officially released in 2006, and has attracted a growing number of researchers in academia and industry. We present the two major software tools that facilitate accessing the relational database: the web-based QueryBuilder and a downloadable virtual machine (VM) image. QueryBuilder and the MIMIC-II VM have been developed successfully and are freely available to MIMIC-II users. Simple example SQL queries and the resulting data are presented. Clinical studies pertaining to acute kidney injury and prediction of fluid requirements in the intensive care unit are shown as typical examples of research performed with MIMIC-II. In addition, MIMIC-II has also provided data for annual PhysioNet/Computing in Cardiology Challenges, including the 2012 Challenge "Predicting mortality of ICU Patients". QueryBuilder is a web-based tool that provides easy access to MIMIC-II. For more computationally intensive queries, one can locally install a complete copy of MIMIC-II in a VM. Both publicly available tools provide the MIMIC-II research community with convenient querying interfaces and complement the value of the MIMIC-II relational database.
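In the spirit of the simple example queries the paper presents, a short script of the sort a researcher might run against a locally installed MIMIC-II VM; the schema and column names follow the published MIMIC-II layout but should be treated as illustrative:

    import psycopg2

    conn = psycopg2.connect("dbname=mimic2 user=researcher")
    cur = conn.cursor()

    # A basic cohort query: the first five patients and their demographics.
    cur.execute("""
        SELECT subject_id, sex, dob
        FROM mimic2v26.d_patients
        ORDER BY subject_id
        LIMIT 5;
    """)
    for subject_id, sex, dob in cur.fetchall():
        print(subject_id, sex, dob)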
BioPepDB: an integrated data platform for food-derived bioactive peptides.
Li, Qilin; Zhang, Chao; Chen, Hongjun; Xue, Jitong; Guo, Xiaolei; Liang, Ming; Chen, Ming
2018-03-12
Food-derived bioactive peptides play critical roles in regulating most biological processes and have considerable biological, medical and industrial importance. However, a large body of active-peptide data, including sequences, functions, sources, commercial product information, references and other information, is poorly integrated. BioPepDB is a searchable database of food-derived bioactive peptides and their related articles, including more than four thousand bioactive peptide entries. Moreover, BioPepDB provides modules of prediction and hydrolysis-simulation for discovering novel peptides. It can serve as a reference database to investigate the function of different bioactive peptides. BioPepDB is available at http://bis.zju.edu.cn/biopepdbr/ . The web page utilises Apache, PHP5 and MySQL to provide the user interface for accessing the database and predicting novel peptides. The database itself is operated on a specialised server.
ADASS Web Database XML Project
NASA Astrophysics Data System (ADS)
Barg, M. I.; Stobie, E. B.; Ferro, A. J.; O'Neil, E. J.
In the spring of 2000, at the request of the ADASS Program Organizing Committee (POC), we began organizing information from previous ADASS conferences in an effort to create a centralized database. The beginnings of this database originated from data (invited speakers, participants, papers, etc.) extracted from HyperText Markup Language (HTML) documents from past ADASS host sites. Unfortunately, not all HTML documents are well formed and parsing them proved to be an iterative process. It was evident at the beginning that if these Web documents were organized in a standardized way, such as XML (Extensible Markup Language), the processing of this information across the Web could be automated, more efficient, and less error prone. This paper will briefly review the many programming tools available for processing XML, including Java, Perl and Python, and will explore the mapping of relational data from our MySQL database to XML.
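The relational-to-XML mapping the paper explores can be illustrated in a few lines; the table layout here is invented for the example, not the actual ADASS schema:

    import sqlite3
    import xml.etree.ElementTree as ET

    # Stand-in for the MySQL conference database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE papers (year INT, author TEXT, title TEXT)")
    conn.execute("INSERT INTO papers VALUES (2000, 'Barg', 'Web Database XML')")

    # Map each relational row to a well-formed XML element.
    root = ET.Element("adass")
    for year, author, title in conn.execute("SELECT * FROM papers"):
        paper = ET.SubElement(root, "paper", year=str(year))
        ET.SubElement(paper, "author").text = author
        ET.SubElement(paper, "title").text = title

    print(ET.tostring(root, encoding="unicode"))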
2010-07-01
http://www.iono.noa.gr/ElectronDensity/EDProfile.php The web service has been developed with the following open source tools: a) PHP, for the ... MySQL for the database, which was based on an enhancement of the DIAS database. Below we present some screen shots to demonstrate the functionality
Case and Model Driven Dynamic Template Linking
2005-06-01
store the trips in a PostgreSQL database (www.postgresql.org), and the values stored in this database could be re-used to provide values for similar trips ... [A feature-comparison table follows, covering support for Preferences, Print Form, Close Form, Quit, and Show User Action History.] 6.5 DAML Ontologies
NASA Astrophysics Data System (ADS)
Grimes, J.; Mahoney, A. R.; Heinrichs, T. A.; Eicken, H.
2012-12-01
Sensor data can be highly variable in nature and also varied depending on the physical quantity being observed, the sensor hardware and the sampling parameters. The sea ice mass balance site (MBS) operated in Barrow by the University of Alaska Fairbanks (http://seaice.alaska.edu/gi/observatories/barrow_sealevel) is a multisensor platform consisting of a thermistor string, air and water temperature sensors, acoustic altimeters above and below the ice and a humidity sensor. Each sensor has a unique specification and configuration. The data from multiple sensors are combined to generate sea ice data products. For example, ice thickness is calculated from the positions of the upper and lower ice surfaces, which are determined using data from downward-looking and upward-looking acoustic altimeters above and below the ice, respectively. As a data clearinghouse, the Geographic Information Network of Alaska (GINA) processes real time data from many sources, including the Barrow MBS. Doing so requires a system that is easy to use, yet also offers the flexibility to handle data from multisensor observing platforms. In the case of the Barrow MBS, the metadata system needs to accommodate the addition of new sensors and the retirement of old ones from year to year, as well as instrument configuration changes caused by, for example, spring melt or inquisitive polar bears. We also require ease of use for both administrators and end users. Here we present the data and processing steps of a sensor data system powered by the NoSQL storage engine MongoDB. The system has been developed to ingest, process, disseminate and archive data from the Barrow MBS. Storing sensor data from many different sources in a generalized format is a challenging task, especially for traditional SQL databases with a set schema. MongoDB is a NoSQL (not only SQL) database that does not require a fixed schema. There are several advantages to using this model over the traditional relational database management system (RDBMS) model. The lack of a required schema allows flexibility in how the data can be ingested into the database. For example, MongoDB imposes no restrictions on field names. For researchers using the system, this means that the name they have chosen for a sensor is carried through the database, any processing, and the final output, helping to preserve data integrity. Also, MongoDB allows data to be pushed to it dynamically, meaning that field attributes can be defined at the point of ingestion. This allows any sensor data to be ingested as a document and this functionality to be carried through to the user interface, allowing greater adaptability to different use-case scenarios. In presenting the MongoDB data system being developed for the Barrow MBS, we demonstrate the versatility of this approach and its suitability as the foundation of a Barrow node of the Arctic Observing Network.
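A small sketch of the schemaless ingestion described above: two documents with different, researcher-chosen field names land in the same MongoDB collection without any prior schema definition (the field names are invented for illustration):

    from datetime import datetime, timezone
    from pymongo import MongoClient

    obs = MongoClient()["mbs"]["observations"]

    # Each sensor mix carries its own field names through to the output.
    obs.insert_one({
        "time": datetime.now(timezone.utc),
        "thermistor_string_c": [-1.8, -1.6, -1.2],  # per-depth temperatures
        "altimeter_above_ice_m": 0.42,
    })
    obs.insert_one({
        "time": datetime.now(timezone.utc),
        "altimeter_below_ice_m": 1.13,              # a different sensor mix
        "air_humidity_pct": 78.0,
    })

    # Query only the documents that carry a given sensor's field.
    for doc in obs.find({"altimeter_below_ice_m": {"$exists": True}}):
        print(doc["time"], doc["altimeter_below_ice_m"])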
Experiences with DCE: the pro7 communication server based on OSF-DCE functionality.
Schulte, M; Lordieck, W
1997-01-01
The pro7 communication server is a new approach to managing communication between different applications on different hardware platforms in a hospital environment. Its most important features are the use of OSF/DCE for realising remote procedure calls between different platforms, the use of an SQL-92 compatible relational database, and the design of a new software development tool (called the protocol definition language compiler) for describing the interface of a new application that is to be integrated into a hospital environment.
A comparison of different database technologies for the CMS AsyncStageOut transfer database
NASA Astrophysics Data System (ADS)
Ciangottini, D.; Balcas, J.; Mascheroni, M.; Rupeika, E. A.; Vaandering, E.; Riahi, H.; Silva, J. M. D.; Hernandez, J. M.; Belforte, S.; Ivanov, T. T.
2017-10-01
AsyncStageOut (ASO) is the component of the CMS distributed data analysis system (CRAB) that manages users' transfers in a centrally controlled way using the File Transfer System (FTS3) at CERN. It addresses a major weakness of the previous, decentralized model, namely that the transfer of the user's output data to a single remote site was part of the job execution, resulting in inefficient use of job slots and an unacceptable failure rate. Currently ASO manages up to 600k files of various sizes per day from more than 500 users per month, spread over more than 100 sites. ASO uses a NoSQL database (CouchDB) for internal bookkeeping and as a way to communicate with other CRAB components. Since ASO/CRAB were put in production in 2014, the number of transfers has constantly increased, up to a point where the pressure on the central CouchDB instance became critical, creating new challenges for the system's scalability, performance, and monitoring. This forced a re-engineering of the ASO application to increase its scalability and lower its operational effort. In this contribution we present a comparison of the performance of the current NoSQL implementation and a new SQL implementation, and show how their different strengths and features influenced the design choices and operational experience. We also discuss other architectural changes introduced in the system to handle the increasing load and latency in delivering output to the user.
Improved oilfield GHG accounting using a global oilfield database
NASA Astrophysics Data System (ADS)
Roberts, S.; Brandt, A. R.; Masnadi, M.
2016-12-01
The definition of oil is shifting in considerable ways. Conventional oil resources are declining as oil sands, heavy oils, and others emerge. Technological advances mean that these unconventional hydrocarbons are now viable resources. Meanwhile, scientific evidence is mounting that climate change is occurring. The oil sector is responsible for 35% of global greenhouse gas (GHG) emissions, but the climate impacts of these new unconventional oils are not well understood. As such, the Oil Climate Index (OCI) project has been an international effort to evaluate the total life-cycle GHG emissions of different oil fields globally. Over the course of the first and second phases of the project, 30 and 75 global oil fields have been investigated, respectively. The 75 fields account for about 25% of global oil production. For the third phase of the project, the aim is to expand the OCI to cover close to 100% of global oil production, leading to the analysis of 8000 fields. To accomplish this, a robust database system is required to handle and manipulate the data. Therefore, the data were migrated into a relational database queried with SQL (Structured Query Language). The use of SQL allows users to process the data more efficiently than was possible with the previously used program (Microsoft Excel). Next, a graphical user interface (GUI) was implemented in C# to make the data interactive, enabling people to update the database without prior knowledge of SQL.
SwePep, a database designed for endogenous peptides and mass spectrometry.
Fälth, Maria; Sköld, Karl; Norrman, Mathias; Svensson, Marcus; Fenyö, David; Andren, Per E
2006-06-01
A new database, SwePep, specifically designed for endogenous peptides, has been constructed to significantly speed up the identification process from complex tissue samples utilizing mass spectrometry. In the identification process the experimental peptide masses are compared with the peptide masses stored in the database, both with and without possible post-translational modifications. This intermediate identification step is fast and singles out peptides that are potential endogenous peptides, which can later be confirmed with tandem mass spectrometry data. Successful applications of this methodology are presented. The SwePep database is a relational database developed using MySQL and Java. The database contains 4180 annotated endogenous peptides from different tissues originating from 394 different species, as well as 50 novel peptides from brain tissue identified in our laboratory. Information about the peptides, including mass, isoelectric point, sequence, and precursor protein, is also stored in the database. This new approach holds great potential for removing the bottleneck that occurs during the identification process in the field of peptidomics. The SwePep database is available to the public.
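The intermediate identification step reduces to a mass-window lookup; a toy version with an assumed one-table schema (SwePep's real schema is richer):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE peptides (sequence TEXT, mono_mass REAL)")
    db.execute("INSERT INTO peptides VALUES ('YGGFM', 573.226)")  # Met-enkephalin

    def candidates(measured_mass, ppm=10.0):
        # Match an experimental mass against stored masses within a tolerance.
        tol = measured_mass * ppm / 1e6
        return db.execute(
            "SELECT sequence, mono_mass FROM peptides "
            "WHERE mono_mass BETWEEN ? AND ?",
            (measured_mass - tol, measured_mass + tol)).fetchall()

    print(candidates(573.225))  # candidate peptides for later MS/MS confirmation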
Killion, Patrick J; Sherlock, Gavin; Iyer, Vishwanath R
2003-01-01
Background: The power of microarray analysis can be realized only if data are systematically archived and linked to biological annotations as well as analysis algorithms. Description: The Longhorn Array Database (LAD) is a MIAME compliant microarray database that operates on PostgreSQL and Linux. It is a fully open source version of the Stanford Microarray Database (SMD), one of the largest microarray databases. LAD is available at ... Conclusions: Our development of LAD provides a simple, free, open, reliable and proven solution for storage and analysis of two-color microarray data. PMID:12930545
CGDM: collaborative genomic data model for molecular profiling data using NoSQL.
Wang, Shicai; Mares, Mihaela A; Guo, Yi-Ke
2016-12-01
High-throughput molecular profiling has greatly improved patient stratification and mechanistic understanding of diseases. With the increasing amount of data used in translational medicine studies in recent years, there is a need to improve the performance of data warehouses in terms of data retrieval and statistical processing. Both relational and key-value models have been used for managing molecular profiling data. Key-value models such as SeqWare have been shown to be particularly advantageous in terms of query processing speed for large datasets. However, more improvement can be achieved, particularly through better indexing techniques in the key-value models, taking advantage of the types of queries that are specific to high-throughput molecular profiling data. In this article, we introduce a Collaborative Genomic Data Model (CGDM), aimed at significantly increasing the query processing speed for the main classes of queries on genomic databases. CGDM creates three Collaborative Global Clustering Index Tables (CGCITs) to address the velocity and variety issues at the cost of limited extra volume. Several benchmarking experiments were carried out, comparing CGDM implemented on HBase to the traditional SQL data model (TDM) implemented on both HBase and MySQL Cluster, using large publicly available molecular profiling datasets taken from NCBI and HapMap. In the microarray case, CGDM on HBase performed up to 246 times faster than TDM on HBase and 7 times faster than TDM on MySQL Cluster. In the single nucleotide polymorphism case, CGDM on HBase outperformed TDM on HBase by up to 351 times and TDM on MySQL Cluster by up to 9 times. The CGDM source code is available at https://github.com/evanswang/CGDM.
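The clustering-index idea can be suggested with HBase row-key design; this is an illustration in the spirit of CGDM's index tables, written with the happybase client, not the paper's exact layout:

    import happybase

    conn = happybase.Connection("localhost")  # assumes a local HBase Thrift server
    table = conn.table("expression")          # assumes a table with family 'v'

    # Row keys like b"BRCA2|sample0042" cluster all samples of one gene,
    # so a gene-level query becomes a single contiguous scan.
    table.put(b"BRCA2|sample0042", {b"v:value": b"7.31"})
    table.put(b"BRCA2|sample0043", {b"v:value": b"6.90"})

    for key, data in table.scan(row_prefix=b"BRCA2|"):
        print(key, data[b"v:value"])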
NASA Astrophysics Data System (ADS)
Shanafield, Harold; Shamblin, Stephanie; Devarakonda, Ranjeet; McMurry, Ben; Walker Beaty, Tammy; Wilson, Bruce; Cook, Robert B.
2011-02-01
The FLUXNET global network of regional flux tower networks serves to coordinate the regional and global analysis of eddy covariance based CO2, water vapor and energy flux measurements taken at more than 500 sites in continuous long-term operation. The FLUXNET database presently contains information about the location, characteristics, and data availability of each of these sites. To facilitate the coordination and distribution of this information, we redesigned the underlying database and associated web site. We chose the PostgreSQL database as a platform based on its performance, stability and GIS extensions. PostgreSQL allows us to enhance our search and presentation capabilities, which will in turn provide increased functionality for users seeking to understand the FLUXNET data. The redesigned database will also significantly decrease the burden of managing such highly varied data. The website is being developed using the Drupal content management system, which provides many community-developed modules and a robust framework for custom feature development. In parallel, we are working with the regional networks to ensure that the information in the FLUXNET database is identical to that in the regional networks. Going forward, we also plan to develop an automated way to synchronize information with the regional networks.
BIRS – Bioterrorism Information Retrieval System
Tewari, Ashish Kumar; Rashi; Wadhwa, Gulshan; Sharma, Sanjeev Kumar; Jain, Chakresh Kumar
2013-01-01
Bioterrorism is the intended use of pathogenic strains of microbes to spread terror in a population. There is a definite need to promote research on the development of vaccines, therapeutics and diagnostic methods as part of preparedness for any future bioterror attack. BIRS is an open-access database of collective information on the organisms related to bioterrorism. The database architecture utilizes current open-source technology, viz. PHP ver. 5.3.19 and MySQL on an IIS server under the Windows platform. The database stores information on the literature, generic information and unique pathways of about 10 microorganisms involved in bioterrorism. It may serve as a collective repository to accelerate drug discovery and vaccine design against such bioterrorist agents (microbes). The available data have been validated against various online resources and by literature mining in order to provide the user with a comprehensive information system. Availability: The database is freely available at http://www.bioterrorism.biowaves.org PMID:23390356
BGDB: a database of bivalent genes.
Li, Qingyan; Lian, Shuabin; Dai, Zhiming; Xiang, Qian; Dai, Xianhua
2013-01-01
A bivalent gene is a gene marked with both H3K4me3 and H3K27me3 epigenetic modifications in the same area, and such genes are proposed to play a pivotal role related to pluripotency in embryonic stem (ES) cells. Identification of these bivalent genes and understanding of their functions are important for further research on lineage specification and embryo development. So far, a large amount of genome-wide histone modification data has been generated in mouse and human ES cells. These valuable data make it possible to identify bivalent genes, but no comprehensive data repositories or analysis tools are currently available for bivalent genes. In this work, we developed BGDB, the database of bivalent genes. The database contains 6897 bivalent genes in human and mouse ES cells, which were manually collected from the scientific literature. Each entry contains curated information, including genomic context, sequences, gene ontology and other relevant information. The web services of the BGDB database were implemented with PHP + MySQL + JavaScript, and provide diverse query functions. Database URL: http://dailab.sysu.edu.cn/bgdb/
Managing hydrological measurements for small and intermediate projects: RObsDat
NASA Astrophysics Data System (ADS)
Reusser, Dominik E.
2014-05-01
Hydrological measurements need good management so that the data are not lost. Multiple, often overlapping files from various loggers with heterogeneous formats need to be merged. Data need to be validated and cleaned and subsequently converted to the format required by the hydrological target application. Preferably, all these steps should be easily traceable. RObsDat is an R package designed to support such data management. It comes with a command line user interface to help hydrologists enter and adjust their data in a database following the Observations Data Model (ODM) standard by CUAHSI. RObsDat helps in the setup of the database within one of the free database engines MySQL, PostgreSQL or SQLite. It imports the controlled water vocabulary from the CUAHSI web service and provides a smart interface between the hydrologist and the database: already existing data entries are detected and duplicates avoided. The data import function converts different data table designs to make import simple. Cleaning and modification of data are handled with a simple version control system. Variable and location names are treated in a user friendly way, accepting and processing multiple versions. A new development is the use of spacetime objects for subsequent processing.
CRAVE: a database, middleware and visualization system for phenotype ontologies.
Gkoutos, Georgios V; Green, Eain C J; Greenaway, Simon; Blake, Andrew; Mallon, Ann-Marie; Hancock, John M
2005-04-01
A major challenge in modern biology is to link genome sequence information to organismal function. In many organisms this is being done by characterizing phenotypes resulting from mutations. Efficiently expressing phenotypic information requires the combinatorial use of ontologies. However, tools are not currently available to visualize combinations of ontologies. Here we describe CRAVE (Concept Relation Assay Value Explorer), a package allowing storage, active updating and visualization of multiple ontologies. CRAVE is a web-accessible Java application that accesses an underlying MySQL database of ontologies via a Java persistence middleware layer (Chameleon). This maps the database tables into discrete Java classes and creates memory-resident, interlinked objects corresponding to the ontology data. These Java objects are accessed via calls through the middleware's application programming interface. CRAVE allows simultaneous display and linking of multiple ontologies and searching using Boolean and advanced searches.
NEMiD: a web-based curated microbial diversity database with geo-based plotting.
Bhattacharjee, Kaushik; Joshi, Santa Ram
2014-01-01
The majority of the Earth's microbes remain unknown, and their potential utility cannot be exploited until they are discovered and characterized. They provide wide scope for the development of new strains as well as biotechnological uses. The documentation and bioprospection of microorganisms carry enormous significance considering their relevance to human welfare. This calls for an urgent need to develop a database with emphasis on the microbial diversity of the largest untapped reservoirs in the biosphere. The data annotated in the North-East India Microbial database (NEMiD) were obtained by the isolation and characterization of microbes from different parts of the Eastern Himalayan region. The database was constructed as a relational database management system (RDBMS) for data storage in MySQL in the back-end on a Linux server and implemented in an Apache/PHP environment. This database provides a basis for understanding the soil microbial diversity pattern in this megabiodiversity hotspot and indicates the distribution patterns of various organisms along with their identification. The NEMiD database is freely available at www.mblabnehu.info/nemid/.
A Recommender System in the Cyber Defense Domain
2014-03-27
monitoring software is a Java-based program sending updates to the database on the sensor machine. The host monitoring program gathers information about ... 3.2.2 Database. A MySQL database located on the sensor machine acts as the storage for the sensors on the network. Snort, Nmap, vulnerability scores, and ... machine with the IDS and the recommender is labeled "sensor". The recommender system code is written in Java and compiled using Java version 1.6.0_24
NASA Astrophysics Data System (ADS)
Knosp, B.; Gangl, M. E.; Hristova-Veleva, S. M.; Kim, R. M.; Lambrigtsen, B.; Li, P.; Niamsuwan, N.; Shen, T. P. J.; Turk, F. J.; Vu, Q. A.
2014-12-01
The JPL Tropical Cyclone Information System (TCIS) brings together satellite, aircraft, and model forecast data from several NASA, NOAA, and other data centers to assist researchers in comparing and analyzing data related to tropical cyclones. The TCIS has been supporting specific science field campaigns, such as the Genesis and Rapid Intensification Processes (GRIP) campaign and the Hurricane and Severe Storm Sentinel (HS3) campaign, by creating near real-time (NRT) data visualization portals. These portals are intended to assist in mission planning, enhance the understanding of current physical processes, and improve model data by comparing it to satellite and aircraft observations. The TCIS NRT portals allow the user to view plots on a Google Earth interface. To complement these visualizations, the team has been working on developing data analysis tools to let the user actively interrogate areas of Level 2 swath and two-dimensional plots they see on their screen. As expected, these observation and model data are quite voluminous, and bottlenecks in the system architecture can occur when the databases try to run geospatial searches for data files that need to be read by the tools. To improve the responsiveness of the data analysis tools, the TCIS team has been conducting studies on how to best store Level 2 swath footprints and run sub-second geospatial searches to discover data. The first objective was to improve the sampling accuracy of the footprints being stored in the TCIS database by comparing the Java-based NASA PO.DAAC Level 2 Swath Generator with a TCIS Python swath generator. The second objective was to compare the performance of four database implementations - MySQL, MySQL+Solr, MongoDB, and PostgreSQL - to see which database management system would yield the best geospatial query and storage performance. The final objective was to integrate our chosen technologies with our Joint Probability Density Function (Joint PDF), Wave Number Analysis, and Automated Rotational Center Hurricane Eye Retrieval (ARCHER) tools. In this presentation, we will compare the enabling technologies we tested and discuss which ones we selected for integration into the TCIS' data analysis tool architecture. We will also show how these techniques have been automated to provide access to NRT data through our analysis tools.
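The footprint search they benchmarked maps naturally onto MongoDB's 2dsphere index; a sketch with invented collection and field names:

    from pymongo import MongoClient

    fp = MongoClient()["tcis"]["footprints"]
    fp.create_index([("geometry", "2dsphere")])

    # Store one Level 2 swath footprint as a GeoJSON polygon.
    fp.insert_one({
        "granule": "L2_swath_001",
        "geometry": {"type": "Polygon", "coordinates": [[
            [-60.0, 15.0], [-55.0, 15.0], [-55.0, 20.0],
            [-60.0, 20.0], [-60.0, 15.0]]]},
    })

    # Which granules cover a point near the storm center?
    hit = fp.find_one({"geometry": {"$geoIntersects": {"$geometry": {
        "type": "Point", "coordinates": [-57.0, 17.0]}}}})
    print(hit["granule"])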
A System for Web-based Access to the HSOS Database
NASA Astrophysics Data System (ADS)
Lin, G.
Huairou Solar Observing Station (HSOS) operates world-class magnetograph and dopplergraph instruments, and access to their data has been opened to the world. Web-based access will provide a powerful, convenient tool for data searching and for solar physics research. Now that the data are open to the world, it is necessary to provide them to users via the Web. In this presentation, the author describes the general design and programming construction of the system, which is built with PHP and MySQL. The author also introduces basic features of PHP and MySQL.
Toxicity ForeCaster (ToxCast™) Data
The data are organized into different data sets and include descriptions of ToxCast chemicals and assays, files summarizing the screening results, a MySQL database, chemicals screened through Tox21, and available data generated from animal toxicity studies.
STINGRAY: system for integrated genomic resources and analysis.
Wagner, Glauber; Jardim, Rodrigo; Tschoeke, Diogo A; Loureiro, Daniel R; Ocaña, Kary A C S; Ribeiro, Antonio C B; Emmel, Vanessa E; Probst, Christian M; Pitaluga, André N; Grisard, Edmundo C; Cavalcanti, Maria C; Campos, Maria L M; Mattoso, Marta; Dávila, Alberto M R
2014-03-07
The STINGRAY system has been conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interface that makes the system intuitive, facilitating the tasks of data analysis and annotation. STINGRAY proved to be an easy-to-use and complete system for analyzing sequencing data. While both Sanger and NGS platforms are supported, the system can be faster with Sanger data, since large NGS datasets could potentially slow down the MySQL database usage. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/.
NASA Astrophysics Data System (ADS)
Dervos, D. A.; Skourlas, C.; Laiho, M.
2015-02-01
"DBTech VET Teachers" project is Leonardo da Vinci Multilateral Transfer of Innovation Project co-financed by the European Commission's Lifelong Learning Programme. The aim of the project is to renew the teaching of database technologies in VET (Vocational Education and Training) institutes, on the basis of the current and real needs of ICT industry in Europe. Training of the VET teachers is done with the systems used in working life and they are taught to guide students to learning by verifying. In this framework, a course module on SQL transactions is prepared and offered. In this paper we present and briefly discuss some qualitative/quantitative data collected from its first pilot offering to an international audience in Greece during May-June 2013. The questionnaire/evaluation results, and the types of participants who have attended the course offering, are presented. Conclusions are also presented.
Motivational and metacognitive feedback in SQL-Tutor*
NASA Astrophysics Data System (ADS)
Hull, Alison; du Boulay, Benedict
2015-04-01
Motivation and metacognition are strongly intertwined, with learners high in self-efficacy more likely to use a variety of self-regulatory learning strategies, as well as to persist longer on challenging tasks. The aim of the research was to improve the learner's focus on the process and experience of problem-solving while using an Intelligent Tutoring System (ITS) and including motivational and metacognitive feedback based on the learner's past states and experiences. An existing ITS, SQL-Tutor, was used with first-year undergraduates studying a database module. The study used two versions of SQL-Tutor: the Control group used a base version providing domain feedback and the Study group used an extended version that also provided motivational and metacognitive feedback. This paper summarises the pre- and post-process results. Comparisons between groups showed some differing trends both in learning outcomes and behaviour in favour of the Study group.
UbSRD: The Ubiquitin Structural Relational Database.
Harrison, Joseph S; Jacobs, Tim M; Houlihan, Kevin; Van Doorslaer, Koenraad; Kuhlman, Brian
2016-02-22
The structurally defined ubiquitin-like homology fold (UBL) can engage in several unique protein-protein interactions and many of these complexes have been characterized with high-resolution techniques. Using Rosetta's structural classification tools, we have created the Ubiquitin Structural Relational Database (UbSRD), an SQL database of features for all 509 UBL-containing structures in the PDB, allowing users to browse these structures by protein-protein interaction and providing a platform for quantitative analysis of structural features. We used UbSRD to define the recognition features of ubiquitin (UBQ) and SUMO observed in the PDB and the orientation of the UBQ tail while interacting with certain types of proteins. While some of the interaction surfaces on UBQ and SUMO overlap, each molecule has distinct features that aid in molecular discrimination. Additionally, we find that the UBQ tail is malleable and can adopt a variety of conformations upon binding. UbSRD is accessible as an online resource at rosettadesign.med.unc.edu/ubsrd.
FJET Database Project: Extract, Transform, and Load
NASA Technical Reports Server (NTRS)
Samms, Kevin O.
2015-01-01
The Data Mining & Knowledge Management team at Kennedy Space Center is providing data management services to the Frangible Joint Empirical Test (FJET) project at Langley Research Center (LARC). FJET is a project under the NASA Engineering and Safety Center (NESC). The purpose of FJET is to conduct an assessment of mild detonating fuse (MDF) frangible joints (FJs) for human spacecraft separation tasks in support of the NASA Commercial Crew Program. The Data Mining & Knowledge Management team has been tasked with creating and managing a database for the efficient storage and retrieval of FJET test data. This paper details the Extract, Transform, and Load (ETL) process as it is related to gathering FJET test data into a Microsoft SQL relational database, and making that data available to the data users. Lessons learned, procedures implemented, and programming code samples are discussed to help detail the learning experienced as the Data Mining & Knowledge Management team adapted to changing requirements and new technology while maintaining flexibility of design in various aspects of the data management project.
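A generic ETL step of the kind described, reading a test-data file, converting units, and loading rows into SQL Server; the connection string, file layout and table are placeholders, not the FJET implementation:

    import csv
    import pyodbc

    conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                          "SERVER=dbhost;DATABASE=fjet;Trusted_Connection=yes;")
    cur = conn.cursor()

    with open("test_run.csv") as f:
        for row in csv.DictReader(f):
            cur.execute(
                "INSERT INTO test_results (run_id, channel, value_si) "
                "VALUES (?, ?, ?)",
                row["run_id"], row["channel"],
                float(row["value"]) * 1e-3)   # e.g. transform mV to V
    conn.commit()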
Performance analysis of different database in new internet mapping system
NASA Astrophysics Data System (ADS)
Yao, Xing; Su, Wei; Gao, Shuai
2017-03-01
In the Mapping System of the New Internet, massive numbers of mapping entries between AID and RID need to be stored, added, updated, and deleted. To better cope with a large volume of mapping-entry update and query requests, the Mapping System of the New Internet must use a high-performance database. In this paper, we focus on the performance of Redis, SQLite, and MySQL, three typical databases. The results show that a Mapping System based on each of these databases can suit different needs, so the choice should follow the actual deployment situation.
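A micro-benchmark in the spirit of such a comparison, timing AID-to-RID inserts in Redis and SQLite (the absolute numbers depend entirely on the setup; MySQL would be timed the same way):

    import time
    import sqlite3
    import redis

    N = 10_000
    pairs = [(f"aid{i}", f"rid{i}") for i in range(N)]

    r = redis.Redis()                 # assumes a local Redis server
    t0 = time.perf_counter()
    for aid, rid in pairs:
        r.set(aid, rid)
    print("redis inserts:", time.perf_counter() - t0)

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE mapping (aid TEXT PRIMARY KEY, rid TEXT)")
    t0 = time.perf_counter()
    db.executemany("INSERT INTO mapping VALUES (?, ?)", pairs)
    db.commit()
    print("sqlite inserts:", time.perf_counter() - t0)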
A complete history of everything
NASA Astrophysics Data System (ADS)
Lanclos, Kyle; Deich, William T. S.
2012-09-01
This paper discusses Lick Observatory's local solution for retaining a complete history of everything. Leveraging our existing deployment of a publish/subscribe communications model that is used to broadcast the state of all systems at Lick Observatory, a monitoring daemon runs on a dedicated server that subscribes to and records all published messages. Our success with this system is a testament to the power of simple, straightforward approaches to complex problems. The solution itself is written in Python, and the initial version required about a week of development time; the data are stored in PostgreSQL database tables using a distinctly simple schema. Over time, we addressed scaling issues as the data set grew, which involved reworking the PostgreSQL database schema on the back-end. We also duplicate the data in flat files to enable recovery or migration of the data from one server to another. This paper will cover both the initial design as well as the solutions to the subsequent deployment issues, the trade-offs that motivated those choices, and the integration of this history database with existing client applications.
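The recorder reduces to a subscriber callback appending rows to one table; a bare-bones sketch in which the messaging hook is a stand-in, since Lick's actual publish/subscribe API is site-specific:

    import json
    import psycopg2

    conn = psycopg2.connect("dbname=history user=monitor")
    conn.autocommit = True

    def on_message(timestamp, keyword, value):
        # Called for every broadcast keyword update; append it verbatim.
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO history (ts, keyword, value) VALUES (%s, %s, %s)",
                (timestamp, keyword, json.dumps(value)))

    # subscribe_all(on_message)   # hand the callback to the site's message bus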
DOE Office of Scientific and Technical Information (OSTI.GOV)
Enders, Alexander L.; Lousteau, Angela L.
The Desktop Analysis Reporting Tool (DART) is a software package that allows users to easily view and analyze daily files that span long periods. DART gives users the capability to quickly determine the state of health of a radiation portal monitor (RPM), troubleshoot and diagnose problems, and view data in various time frames to perform trend analysis. In short, it converts the data strings written in the daily files into meaningful tables and plots. The standalone version of DART ("soloDART") utilizes a database engine that is included with the application; no additional installations are necessary. There is also a networked version of DART ("polyDART") that is designed to maximize the benefit of a centralized data repository while distributing the workload to individual desktop machines. This networked approach requires a more complex database manager, Structured Query Language (SQL) Server; however, SQL Server is not currently provided with DART. Regardless of which version is used, DART will import daily files from RPMs, store the relevant data in its database, and produce reports for status, trend analysis, and reporting purposes.
Construction of Database for Pulsating Variable Stars
NASA Astrophysics Data System (ADS)
Chen, B. Q.; Yang, M.; Jiang, B. W.
2011-07-01
A database of pulsating variable stars has been constructed to allow Chinese astronomers to study variable stars conveniently. The database currently includes about 230000 variable stars in the Galactic bulge, LMC and SMC observed by the MACHO (MAssive Compact Halo Objects) and OGLE (Optical Gravitational Lensing Experiment) projects. The software stack used for the construction is LAMP, i.e., Linux + Apache + MySQL + PHP. A web page is provided to search the photometric data and light curves in the database by the right ascension and declination of the object. More data will be incorporated into the database.
Software Architecture Evolution
2013-12-01
system’s major components occurring via a Java Message Service message bus [69]. This architecture was designed to promote loose coupling of soft- ware...play reconfiguration of the system. The components were Java -based and platform-independent; the interfaces by which they communicated were based on...The MPCS database, a MySQL database used for storing telemetry as well as some other information, such as logs and commanding data [68]. This
2011-09-01
rate Python’s maturity as “High.” Python is nine years old and has been continuously been developed and enhanced since then. During fiscal year 2010...We rate Python’s developer toolkit availability/extensibility as “Yes.” Python runs on a SQL database and is 64 compatible with Oracle database...MODEL...........................................................................11 D. GOAL DEVELOPMENT
Cost Considerations in Cloud Computing
2014-01-01
investments. 2. Database Options: The potential promise that "big data" analytics holds for many enterprise mission areas makes relevant the question of the ... development of a range of new distributed file systems and databases that have better scalability properties than traditional SQL databases. Hadoop ... data. Many systems exist that extend or supplement Hadoop, such as Apache Accumulo, which provides a highly granular mechanism for managing security
PDBj Mine: design and implementation of relational database interface for Protein Data Bank Japan
Kinjo, Akira R.; Yamashita, Reiko; Nakamura, Haruki
2010-01-01
This article is a tutorial for PDBj Mine, a new database and its interface for Protein Data Bank Japan (PDBj). In PDBj Mine, data are loaded from files in the PDBMLplus format (an extension of PDBML, PDB's canonical XML format, enriched with annotations), which are then served for the user of PDBj via the worldwide web (WWW). We describe the basic design of the relational database (RDB) and web interfaces of PDBj Mine. The contents of PDBMLplus files are first broken into XPath entities, and these paths and data are indexed in the way that reflects the hierarchical structure of the XML files. The data for each XPath type are saved into the corresponding relational table that is named as the XPath itself. The generation of table definitions from the PDBMLplus XML schema is fully automated. For efficient search, frequently queried terms are compiled into a brief summary table. Casual users can perform simple keyword search, and 'Advanced Search' which can specify various conditions on the entries. More experienced users can query the database using SQL statements which can be constructed in a uniform manner. Thus, PDBj Mine achieves a combination of the flexibility of XML documents and the robustness of the RDB. Database URL: http://www.pdbj.org/ PMID:20798081
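A hedged illustration of querying tables named after XPaths, as PDBj Mine does; the exact table and column names below are guesses for the sake of the example, not PDBj Mine's actual identifiers:

    import psycopg2

    conn = psycopg2.connect("dbname=pdbjmine")
    cur = conn.cursor()

    # Tables are named as the XPath itself, so the identifier is quoted.
    cur.execute("""
        SELECT pdbid, val
        FROM "/PDBx:datablock/PDBx:structCategory/PDBx:struct/PDBx:struct.title"
        WHERE val ILIKE %s
        LIMIT 5;
    """, ("%ubiquitin%",))
    for row in cur.fetchall():
        print(row)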
A spatial-temporal system for dynamic cadastral management.
Nan, Liu; Renyi, Liu; Guangliang, Zhu; Jiong, Xie
2006-03-01
A practical spatio-temporal database (STDB) technique for dynamic urban land management is presented. One of the STDB models, the expanded model of Base State with Amendments (BSA), is selected as the basis for developing the dynamic cadastral management technique. Two approaches, Section Fast Indexing (SFI) and Storage Factors of Variable Granularity (SFVG), are used to improve the efficiency of the BSA model. Both spatial graphic data and attribute data are stored, through a succinct engine, in a standard relational database management system (RDBMS) for the actual implementation of the BSA model. The spatio-temporal database is divided into three interdependent sub-databases: the present DB, the history DB and the procedures-tracing DB. The efficiency of database operation is improved by the database connection in the bottom layer of Microsoft SQL Server. The spatio-temporal system can be provided at low cost while satisfying the basic needs of urban land management in China. The approaches presented in this paper may also be of significance to countries where land patterns change frequently or to agencies where financial resources are limited.
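The BSA idea itself fits in a few lines: any historical state is the base state plus the amendments recorded up to the query date (a toy model, ignoring the SFI/SFVG optimizations):

    from datetime import date

    base_state = {"parcel_42": {"owner": "A", "area_m2": 500}}
    amendments = [   # ordered change records, as the STDB would store them
        (date(2004, 3, 1), "parcel_42", {"owner": "B"}),
        (date(2005, 7, 9), "parcel_42", {"area_m2": 460}),
    ]

    def state_at(when):
        # Replay amendments up to the requested date onto the base state.
        state = {k: dict(v) for k, v in base_state.items()}
        for t, parcel, change in amendments:
            if t <= when:
                state[parcel].update(change)
        return state

    print(state_at(date(2004, 12, 31)))   # owner 'B', area still 500 m2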
Cyclone: java-based querying and computing with Pathway/Genome databases.
Le Fèvre, François; Smidtas, Serge; Schächter, Vincent
2007-05-15
Cyclone aims at facilitating the use of BioCyc, a collection of Pathway/Genome Databases (PGDBs). Cyclone provides a fully extensible Java Object API to analyze and visualize these data. Cyclone can read and write PGDBs, and can write its own data in the CycloneML format. This format is automatically generated from the BioCyc ontology by Cyclone itself, ensuring continued compatibility. Cyclone objects can also be stored in a relational database CycloneDB. Queries can be written in SQL, and in an intuitive and concise object-oriented query language, Hibernate Query Language (HQL). In addition, Cyclone interfaces easily with Java software including the Eclipse IDE for HQL edition, the Jung API for graph algorithms or Cytoscape for graph visualization. Cyclone is freely available under an open source license at: http://sourceforge.net/projects/nemo-cyclone. For download and installation instructions, tutorials, use cases and examples, see http://nemo-cyclone.sourceforge.net.
Catalogue of HI PArameters (CHIPA)
NASA Astrophysics Data System (ADS)
Saponara, J.; Benaglia, P.; Koribalski, B.; Andruchow, I.
2015-08-01
The catalogue of HI parameters of galaxies (CHIPA) is the natural continuation of the compilation by M.C. Martin in 1998. CHIPA provides the most important parameters of nearby galaxies derived from observations of the neutral hydrogen line. The catalogue contains information on 1400 galaxies across the sky and of different morphological types. Parameters such as the optical diameter of the galaxy, the blue magnitude, the distance, the morphological type and the HI extension are listed, among others. Maps of the HI distribution, velocity and velocity dispersion can also be displayed in some cases. The main objective of this catalogue is to facilitate bibliographic queries through a database accessible from the internet, which will be available in 2015 (the website is under construction). The database was built using the open-source MySQL (SQL: Structured Query Language) relational database management system, while the website was built with HTML (Hypertext Markup Language) and PHP (Hypertext Preprocessor).
Sorokin, Anatoly; Selkov, Gene; Goryanin, Igor
2012-07-16
The volume of the experimentally measured time series data is rapidly growing, while storage solutions offering better data types than simple arrays of numbers or opaque blobs for keeping series data are sorely lacking. A number of indexing methods have been proposed to provide efficient access to time series data, but none has so far been integrated into a tried-and-proven database system. To explore the possibility of such integration, we have developed a data type for time series storage in PostgreSQL, an object-relational database system, and equipped it with an access method based on SAX (Symbolic Aggregate approXimation). This new data type has been successfully tested in a database supporting a large-scale plant gene expression experiment, and it was additionally tested on a very large set of simulated time series data. Copyright © 2011 Elsevier B.V. All rights reserved.
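For readers unfamiliar with SAX, the transform itself is compact; a minimal numpy/scipy sketch (generic SAX, not the authors' PostgreSQL access method) looks like this:

```python
import numpy as np
from scipy.stats import norm

def sax(series, n_segments=8, alphabet="abcd"):
    """Symbolic Aggregate approXimation: map a series to a short symbol word."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)          # z-normalize
    # Piecewise Aggregate Approximation: mean of each of n_segments chunks.
    paa = np.array([c.mean() for c in np.array_split(x, n_segments)])
    # Breakpoints splitting N(0,1) into equiprobable regions
    # (for a 4-letter alphabet: -0.674, 0, 0.674).
    cuts = norm.ppf(np.linspace(0, 1, len(alphabet) + 1)[1:-1])
    return "".join(alphabet[i] for i in np.searchsorted(cuts, paa))

t = np.linspace(0, 2 * np.pi, 128)
print(sax(np.sin(t)))  # 'cddcbaab' -- equal words flag candidate matches
```

Indexing then amounts to storing the short words and comparing words instead of raw series.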
Survey of Machine Learning Methods for Database Security
NASA Astrophysics Data System (ADS)
Kamra, Ashish; Bertino, Elisa
Application of machine learning techniques to database security is an emerging area of research. In this chapter, we present a survey of approaches that use machine learning/data mining techniques to enhance the traditional security mechanisms of databases. There are two key database security areas in which these techniques have found applications, namely, detection of SQL injection attacks and anomaly detection for defending against insider threats. Apart from research prototypes and tools, various third-party commercial products are also available that provide database activity monitoring solutions by profiling database users and applications; we survey such products as well. We end the chapter with a primer on mechanisms for responding to database anomalies.
High Performance Descriptive Semantic Analysis of Semantic Graph Databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joslyn, Cliff A.; Adolf, Robert D.; al-Saffar, Sinan
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to understand their inherent semantic structure, whether codified in explicit ontologies or not. Our group is researching novel methods for what we call descriptive semantic analysis of RDF triplestores, to serve purposes of analysis, interpretation, visualization, and optimization. But data size and computational complexity make it increasingly necessary to bring high performance computational resources to bear on this task. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multi-threaded architecture of the Cray XMT platform, conventional servers, and large data stores. In this paper we describe that architecture and our methods, and present the results of our analyses of basic properties, connected components, namespace interaction, and typed paths for the Billion Triple Challenge 2010 dataset.
Effective spatial database support for acquiring spatial information from remote sensing images
NASA Astrophysics Data System (ADS)
Jin, Peiquan; Wan, Shouhong; Yue, Lihua
2009-12-01
In this paper, a new approach to maintaining spatial information acquired from remote-sensing images is presented, which is based on an Object-Relational DBMS (ORDBMS). Under this approach, the detected and recognized results of targets are stored in an ORDBMS-based spatial database system where they can be further accessed, and users can query the spatial information through the standard SQL interface. This approach differs from the traditional ArcSDE-based method because the spatial information management module is totally integrated into the DBMS and becomes one of its core modules. We focus on three issues, namely the general framework for the ORDBMS-based spatial database system, the definitions of the add-in spatial data types and operators, and the process of developing a spatial DataBlade on Informix. The results show that ORDBMS-based spatial database support for image-based target detection and recognition is easy and practical to implement.
2014-01-01
Protein biomarkers offer major benefits for diagnosis and monitoring of disease processes. Recent advances in protein mass spectrometry make it feasible to use this very sensitive technology to detect and quantify proteins in blood. To explore the potential of blood biomarkers, we conducted a thorough review to evaluate the reliability of data in the literature and to determine the spectrum of proteins reported to exist in blood, with the goal of creating a Federated Database of Blood Proteins (FDBP). A unique feature of our approach is the use of a SQL database for all of the peptide data; the power of the SQL database combined with standard informatic algorithms such as BLAST and the statistical analysis system (SAS) allowed the rapid annotation and analysis of the database without the need to create special programs to manage the data. Our mathematical analysis and review show that in addition to the usual secreted proteins found in blood, there are many reports of intracellular proteins, with good agreement on transcription factors and DNA remodelling factors, as well as cellular receptors and their signal transduction enzymes. Overall, we have catalogued about 12,130 proteins identified by at least one unique peptide, and of these 3858 have 3 or more peptide correlations. The FDBP with annotations should facilitate testing blood for specific disease biomarkers. PMID:24476026
The HISTMAG database: combining historical, archaeomagnetic and volcanic data
NASA Astrophysics Data System (ADS)
Arneitz, Patrick; Leonhardt, Roman; Schnepp, Elisabeth; Heilig, Balázs; Mayrhofer, Franziska; Kovacs, Peter; Hejda, Pavel; Valach, Fridrich; Vadasz, Gergely; Hammerl, Christa; Egli, Ramon; Fabian, Karl; Kompein, Niko
2017-09-01
Records of the past geomagnetic field can be divided into two main categories: instrumental historical observations on the one hand, and field estimates based on the magnetization acquired by rocks, sediments and archaeological artefacts on the other. In this paper, a new database combining historical, archaeomagnetic and volcanic records is presented. HISTMAG is a relational database, implemented in MySQL, and can be accessed via a web-based interface (http://www.conrad-observatory.at/zamg/index.php/data-en/histmag-database). It combines available global historical data compilations covering the last ∼500 yr as well as archaeomagnetic and volcanic data collections from the last 50 000 yr. Furthermore, new historical and archaeomagnetic records, mainly from central Europe, have been acquired. In total, 190 427 records are currently available in the HISTMAG database, most of them historical declination measurements (155 525). The original database structure was complemented by new fields which allow for a detailed description of the different data types, and a user-comment function provides the possibility of scientific discussion about individual records. The HISTMAG database therefore supports thorough reliability and uncertainty assessments of the widely differing data sets, which are an essential basis for geomagnetic field reconstructions. A database analysis revealed a systematic offset, relative to other historical records, in declination records derived from compass roses on historical geographical maps, whereas maps created for mining activities represent a reliable source.
ARCPHdb: A comprehensive protein database for SF1 and SF2 helicase from archaea.
Moukhtar, Mirna; Chaar, Wafi; Abdel-Razzak, Ziad; Khalil, Mohamad; Taha, Samir; Chamieh, Hala
2017-01-01
Superfamily 1 and Superfamily 2 helicases, two of the largest helicase protein families, play vital roles in many biological processes including replication, transcription and translation. The study of helicase proteins in model archaeal microorganisms has largely contributed to the understanding of their function, architecture and assembly. Based on a large phylogenomics approach, we have identified and classified all SF1 and SF2 protein families in ninety-five sequenced archaeal genomes. Here we developed an online webserver linked to a specialized protein database named ARCPHdb to provide access to the SF1 and SF2 helicase families from archaea. ARCPHdb was implemented using the MySQL relational database. Web interfaces were developed using NetBeans. Data were stored according to UniProt accession numbers, NCBI RefSeq IDs, PDB IDs and Entrez databases. A user-friendly interactive web interface has been developed to browse, search and download archaeal helicase protein sequences, their available 3D structure models, and related documentation available in the literature provided by ARCPHdb. The database provides direct links to matching external databases. ARCPHdb is the first online database to compile all protein information on SF1 and SF2 helicases from archaea in one platform, providing an essential information resource for researchers in the field. Copyright © 2016 Elsevier Ltd. All rights reserved.
JNDMS Task Authorization 2 Report
2013-10-01
uses Barnyard to store alarms from all DREnet Snort sensors in a MySQL database. Barnyard is an open source tool designed to work with Snort to take... Glossary fragments from the report: ITI (Information Technology Infrastructure); J2EE (Java 2 Enterprise Edition); JAR (Java Archive, an archive file format defined by Java standards); JDBC (Java Database Connectivity); JDW (JNDMS Data Warehouse); JNDMS (Joint Network and Defence Management System)...
Meta sequence analysis of human blood peptides and their parent proteins.
Bowden, Peter; Pendrak, Voitek; Zhu, Peihong; Marshall, John G
2010-04-18
Sequence analysis of blood peptides and their qualities will be key to understanding the mechanisms that contribute to error in LC-ESI-MS/MS. Analysis of peptides and their proteins at the level of sequences is much more direct and informative than the comparison of disparate accession numbers. A portable database of all blood peptide and protein sequences with descriptor fields and gene ontology terms might be useful for designing immunological or MRM assays from human blood. The results of twelve studies of human blood peptides and/or proteins identified by LC-MS/MS and correlated against a disparate array of genetic libraries were parsed and matched to proteins from the human ENSEMBL, SwissProt and RefSeq databases by SQL. The reported peptide and protein sequences were organized into an SQL database with full protein sequences and up to five unique peptides in order of prevalence, along with the peptide count for each protein. Structured Query Language (SQL) or BLAST was used to acquire descriptive information from current databases. Sampling error at the level of peptides is the largest source of disparity between groups. Chi-square analysis of peptide-to-protein distributions confirmed the significant agreement between groups on identified proteins. Copyright 2010. Published by Elsevier B.V.
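The sequence-level matching described here can be illustrated with a toy sqlite3 example (invented two-table layout, not the authors' schema): peptides are assigned to parent proteins by substring search over full protein sequences, rather than by comparing accession numbers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE proteins (acc TEXT PRIMARY KEY, seq TEXT);
CREATE TABLE peptides (pep TEXT PRIMARY KEY);
""")
conn.executemany("INSERT INTO proteins VALUES (?, ?)", [
    ("P1", "MKWVTFISLLFLFSSAYS"),      # toy sequences, not real entries
    ("P2", "MDSKGSSQKGSRLLLLLVVSNLL"),
])
conn.executemany("INSERT INTO peptides VALUES (?)",
                 [("FISLLFLF",), ("GSSQKGSR",), ("QQQQQQQQ",)])

# A peptide may hit several proteins; unmatched peptides drop out of the join.
rows = conn.execute("""
    SELECT pep.pep, pro.acc
    FROM peptides pep JOIN proteins pro ON INSTR(pro.seq, pep.pep) > 0
    ORDER BY pep.pep""").fetchall()
print(rows)  # [('FISLLFLF', 'P1'), ('GSSQKGSR', 'P2')]
```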
BIGNASim: a NoSQL database structure and analysis portal for nucleic acids simulation data
Hospital, Adam; Andrio, Pau; Cugnasco, Cesare; Codo, Laia; Becerra, Yolanda; Dans, Pablo D.; Battistini, Federica; Torres, Jordi; Goñi, Ramón; Orozco, Modesto; Gelpí, Josep Ll.
2016-01-01
Molecular dynamics simulation (MD) is, just behind genomics, the bioinformatics tool that generates the largest amounts of data and consumes the largest amount of CPU time in supercomputing centres. MD trajectories are obtained after months of calculations, analysed in situ, and in practice forgotten. Several projects to generate stable trajectory databases have been developed for proteins, but no equivalent exists in the nucleic acids world. We present here a novel database system to store MD trajectories and analyses of nucleic acids. The initial data set available consists mainly of the benchmark of the new molecular dynamics force-field, parmBSC1. It contains 156 simulations, with over 120 μs of total simulation time. A deposition protocol is available to accept the submission of new trajectory data. The database is based on the combination of two NoSQL engines: Cassandra for storing trajectories and MongoDB for storing analysis results and simulation metadata. The analyses available include backbone geometries, helical analysis, NMR observables and a variety of mechanical analyses. Individual trajectories and combined meta-trajectories can be downloaded from the portal. The system is accessible through http://mmb.irbbarcelona.org/BIGNASim/. Supplementary Material is also available on-line at http://mmb.irbbarcelona.org/BIGNASim/SuppMaterial/. PMID:26612862
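As a sketch of how the MongoDB half of such a split design is typically queried from Python, the snippet below uses pymongo; the collection and field names (analyses, traj_id, sim_time_ns, mean_twist) are hypothetical stand-ins, not BIGNASim's actual schema:

```python
from pymongo import MongoClient

# Analysis results and metadata live in MongoDB; the heavyweight
# trajectory frames would live in Cassandra in a BIGNASim-like design.
client = MongoClient("mongodb://localhost:27017/")
db = client["nasim_demo"]  # hypothetical database name

# Helical-analysis documents for simulations longer than 1 microsecond.
cursor = db.analyses.find(
    {"type": "helical", "sim_time_ns": {"$gt": 1000}},
    {"traj_id": 1, "mean_twist": 1, "_id": 0},
)
for doc in cursor:
    print(doc["traj_id"], doc.get("mean_twist"))
```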
Chen, Mingyang; Stott, Amanda C; Li, Shenggang; Dixon, David A
2012-04-01
A robust metadata database called the Collaborative Chemistry Database Tool (CCDBT) for massive amounts of computational chemistry raw data has been designed and implemented. It performs data synchronization and simultaneously extracts the metadata. Computational chemistry data in various formats from different computing sources, software packages, and users can be parsed into uniform metadata for storage in a MySQL database. Parsing is performed by a parsing pyramid: parsers written for the different levels of data types and data sets, created by the parser loader after it loads the parser engines and configurations. Copyright © 2011 Elsevier Inc. All rights reserved.
De-MA: a web Database for electron Microprobe Analyses to assist EMP lab manager and users
NASA Astrophysics Data System (ADS)
Allaz, J. M.
2012-12-01
Lab managers and users of electron microprobe (EMP) facilities require comprehensive, yet flexible documentation structures, as well as an efficient scheduling mechanism. A single on-line database system for managing reservations, and providing information on standards, quantitative and qualitative setups (element mapping, etc.), and X-ray data has been developed for this purpose. This system is particularly useful in multi-user facilities where experience ranges from beginners to the highly experienced. New users and occasional facility users will find these tools extremely useful in developing and maintaining high quality, reproducible, and efficient analyses. This user-friendly database is available through the web, and uses MySQL as the database and PHP/HTML as the scripting language (dynamic website). The database includes several tables for standards information, X-ray lines, X-ray element mapping, PHA, element setups, and the agenda. It is configurable for up to five different EMPs in a single lab, each of them having up to five spectrometers and as many diffraction crystals as required. The installation should be done on a web server supporting PHP/MySQL, although installation on a personal computer is possible using third-party freeware to create a local Apache server and to enable PHP/MySQL. Since it is web-based, any user outside the EMP lab can access this database anytime through any web browser and on any operating system. The access can be secured using general password protection (e.g. htaccess). The web interface consists of 6 main menus. (1) "Standards" lists standards defined in the database, and displays detailed information on each (e.g. material type, name, reference, comments, and analyses). Images such as EDS spectra or BSE can be associated with a standard. (2) "Analyses" lists typical setups used for quantitative analyses, and allows calculation of a mineral composition from a mineral formula, or of a mineral formula from a fixed amount of oxygen or cations (using an analysis in element or oxide weight-%); the latter includes re-calculation of H2O/CO2 based on stoichiometry, and oxygen correction for F and Cl. Another option lists available standards and possible peak or background interferences for a series of elements. (3) "X-ray maps" lists the different setups recommended for element mapping using WDS, and provides a map calculator to facilitate map setups and to estimate the total mapping time. (4) "X-ray data" lists all X-ray lines for a specific element (K, L, M, absorption edges, and satellite peaks) in terms of energy, wavelength and peak position. A check for possible interferences on peak or background is also possible. Theoretical X-ray peak positions for each crystal are calculated from the 2d spacing of each crystal and the wavelength of each line. (5) "Agenda" displays the reservation dates for each month and for each EMP lab defined. It also offers a reservation request option, the request being sent by email to the EMP manager for approval. (6) Finally, "Admin" is password restricted, and contains all necessary options to manage the database through user-friendly forms. The installation of this database is made easy, and knowledge of HTML, PHP, or MySQL is unnecessary to install, configure, manage, or use it. A working database is accessible at http://cub.geoloweb.ch.
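The formula recalculation offered by the "Analyses" menu is a standard petrological computation, sketched below for an illustrative three-oxide system (the oxide list and code are ours, not the database's): convert oxide weight-% to moles, then renormalize cations to a fixed oxygen basis.

```python
# (molar mass g/mol, cations per oxide unit, oxygens per oxide unit)
OXIDES = {
    "SiO2": (60.08, 1, 2),
    "MgO":  (40.30, 1, 1),
    "FeO":  (71.84, 1, 1),
}

def cations_per_formula(wt_pct, n_oxygens):
    """Cations per formula unit from an oxide wt% analysis on a fixed O basis."""
    moles = {ox: w / OXIDES[ox][0] for ox, w in wt_pct.items()}
    total_o = sum(m * OXIDES[ox][2] for ox, m in moles.items())
    scale = n_oxygens / total_o                  # renormalize oxygen to the basis
    return {ox: m * OXIDES[ox][1] * scale for ox, m in moles.items()}

# Forsterite (Mg2SiO4) check: recovers ~2 Mg and ~1 Si on a 4-oxygen basis.
print(cations_per_formula({"MgO": 57.3, "SiO2": 42.7}, n_oxygens=4))
```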
SAMSON Technology Demonstrator
2014-06-01
requested. The SAMSON TD was tested with two different policy engines: 1. A custom XACML-based element matching engine using a MySQL database for... performed during the course of the event. Full information protection across the sphere of access management, information protection and auditing was in...
Construction of the Database for Pulsating Variable Stars
NASA Astrophysics Data System (ADS)
Chen, Bing-Qiu; Yang, Ming; Jiang, Bi-Wei
2012-01-01
A database for pulsating variable stars has been constructed to support the study of variable stars in China. The database includes about 230,000 variable stars in the Galactic bulge, LMC and SMC, observed over a roughly 10-year period by the MACHO (MAssive Compact Halo Objects) and OGLE (Optical Gravitational Lensing Experiment) projects. The software used for the construction is LAMP, i.e., Linux+Apache+MySQL+PHP. A web page is provided for searching the photometric data and light curves in the database through the right ascension and declination of an object. Because of the flexibility of this database, more up-to-date data on variable stars can be incorporated conveniently.
Martin, Stanton L; Blackmon, Barbara P; Rajagopalan, Ravi; Houfek, Thomas D; Sceeles, Robert G; Denn, Sheila O; Mitchell, Thomas K; Brown, Douglas E; Wing, Rod A; Dean, Ralph A
2002-01-01
We have created a federated database for genome studies of Magnaporthe grisea, the causal agent of rice blast disease, by integrating end sequence data from BAC clones, genetic marker data and BAC contig assembly data. A library of 9216 BAC clones providing >25-fold coverage of the entire genome was end sequenced and fingerprinted by HindIII digestion. The Image/FPC software package was then used to generate an assembly of 188 contigs covering >95% of the genome. The database contains the results of this assembly integrated with hybridization data of genetic markers to the BAC library. AceDB was used for the core database engine, and a MySQL relational database, populated with numerical representations of BAC clones within FPC contigs, was used to create appropriately scaled images. The database is being used to facilitate sequencing efforts. It also gives researchers mapping known genes or other sequences of interest rapid and easy access to the fundamental organization of the M. grisea genome. This database, MagnaportheDB, can be accessed on the web at http://www.cals.ncsu.edu/fungal_genomics/mgdatabase/int.htm.
NASA Astrophysics Data System (ADS)
Ifimov, Gabriela; Pigeau, Grace; Arroyo-Mora, J. Pablo; Soffer, Raymond; Leblanc, George
2017-10-01
In this study the development and implementation of a geospatial database model for the management of multiscale datasets encompassing airborne imagery and associated metadata is presented. To develop the multi-source geospatial database we have used a Relational Database Management System (RDBMS) on a Structured Query Language (SQL) server, which was then integrated into ArcGIS and implemented as a geodatabase. The acquired datasets were compiled, standardized, and integrated into the RDBMS, where logical associations between different types of information were linked (e.g. location, date, and instrument). Airborne data, at different processing levels (digital numbers through geocorrected reflectance), were implemented in the geospatial database, where the datasets are linked spatially and temporally. An example dataset consisting of airborne hyperspectral imagery, collected for inter- and intra-annual vegetation characterization and detection of potential hydrocarbon seepage events over pipeline areas, is presented. Our work provides a model for the management of airborne imagery, which is a challenging aspect of data management in remote sensing, especially when large volumes of data are collected.
Fourment, Mathieu; Gibbs, Mark J
2008-02-05
Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically.
NASA Astrophysics Data System (ADS)
Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee
2010-04-01
The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to the underlying database. We describe the design of the query system, provide details of the language components, and give an overview of how this component fits into the overall data discovery system architecture.
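The core trick can be sketched compactly: treat the schema as a graph whose edges carry foreign-key join conditions, find a path between the tables a query touches, and emit the joins automatically. The table and key names below are invented for illustration and are not the actual DBS schema:

```python
from collections import deque

# Hypothetical schema graph: each edge carries its join condition.
EDGES = {
    ("dataset", "block"): "dataset.id = block.dataset_id",
    ("block", "file"):    "block.id = file.block_id",
    ("dataset", "site"):  "dataset.site_id = site.id",
}
ADJ = {}
for (a, b), cond in EDGES.items():
    ADJ.setdefault(a, []).append((b, cond))
    ADJ.setdefault(b, []).append((a, cond))

def join_conditions(src, dst):
    """BFS over the schema graph; returns join conditions along a shortest path."""
    seen, queue = {src}, deque([(src, [])])
    while queue:
        table, conds = queue.popleft()
        if table == dst:
            return conds
        for nxt, cond in ADJ.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, conds + [cond]))
    raise ValueError(f"no join path from {src} to {dst}")

# A user asks for the files of a dataset without knowing any keys:
conds = join_conditions("dataset", "file")
print("SELECT file.name FROM dataset, block, file WHERE " + " AND ".join(conds))
```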
BGDB: a database of bivalent genes
Li, Qingyan; Lian, Shuabin; Dai, Zhiming; Xiang, Qian; Dai, Xianhua
2013-01-01
A bivalent gene is a gene marked with both H3K4me3 and H3K27me3 epigenetic modifications in the same region, and is proposed to play a pivotal role related to pluripotency in embryonic stem (ES) cells. Identification of these bivalent genes and understanding their functions are important for further research on lineage specification and embryo development. So far, many genome-wide histone modification datasets have been generated in mouse and human ES cells. These valuable data make it possible to identify bivalent genes, but no comprehensive data repositories or analysis tools are currently available for bivalent genes. In this work, we develop BGDB, the database of bivalent genes. The database contains 6897 bivalent genes in human and mouse ES cells, manually collected from the scientific literature. Each entry contains curated information, including genomic context, sequences, gene ontology and other relevant information. The web services of the BGDB database were implemented with PHP + MySQL + JavaScript, and provide diverse query functions. Database URL: http://dailab.sysu.edu.cn/bgdb/ PMID:23894186
Reactome graph database: Efficient access to complex pathway data.
Fabregat, Antonio; Korninger, Florian; Viteri, Guilherme; Sidiropoulos, Konstantinos; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning
2018-01-01
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
A Magnetic Petrology Database for Satellite Magnetic Anomaly Interpretations
NASA Astrophysics Data System (ADS)
Nazarova, K.; Wasilewski, P.; Didenko, A.; Genshaft, Y.; Pashkevich, I.
2002-05-01
A Magnetic Petrology Database (MPDB) is now being compiled at NASA/Goddard Space Flight Center in cooperation with Russian and Ukrainian institutions. The purpose of this database is to provide the geomagnetic community with a comprehensive and user-friendly method of accessing magnetic petrology data via the Internet for more realistic interpretation of satellite magnetic anomalies. Magnetic petrology data have been accumulated at NASA/Goddard Space Flight Center, the United Institute of Physics of the Earth (Russia) and the Institute of Geophysics (Ukraine) over several decades, and our archives now hold many thousands of records. The MPDB was, and continues to be, in high demand, especially since the recent launch into near-Earth orbit of the mini-constellation of three satellites - Oersted (1999), Champ (2000) and SAC-C (2000) - which will provide lithospheric magnetic maps with better spatial and amplitude resolution (about 1 nT). The MPDB is focused on lower crustal and upper mantle rocks and will include data on mantle xenoliths, serpentinized ultramafic rocks, granulites, iron quartzites and rocks from Archean-Proterozoic metamorphic sequences from around the world. A substantial amount of data comes from the area of the unique Kursk Magnetic Anomaly and the Kola Deep Borehole (which recovered 12 km of continental crust). A prototype MPDB can be found on the Geodynamics Branch web server of Goddard Space Flight Center at http://core2.gsfc.nasa.gov/terr_mag/magnpetr.html. The MPDB employs a searchable relational design and consists of 7 interrelated tables; the schema is shown at http://core2.gsfc.nasa.gov/terr_mag/doc.html. The MySQL database server was utilized to implement the MPDB, and SQL (Structured Query Language) is used to query it. Query results are presented on the web using the PHP scripting language and CGI scripts. The prototype MPDB is designed to be searched by major satellite magnetic anomaly, tectonic structure, geographical location, rock type, magnetic properties, chemistry and reference; see http://core2.gsfc.nasa.gov/terr_mag/query1.html. The output of the database is an HTML-structured table, a text file, or a downloadable file. This database will be very useful for studies of lithospheric satellite magnetic anomalies on the Earth and other terrestrial planets.
Cario, Clinton L; Witte, John S
2018-03-15
As whole-genome tumor sequence and biological annotation datasets grow in size, number and content, there is an increasing basic science and clinical need for efficient and accurate data management and analysis software. With the emergence of increasingly sophisticated data stores, execution environments and machine learning algorithms, there is also a need for the integration of functionality across frameworks. We present orchid, a Python-based software package for the management, annotation and machine learning of cancer mutations. Building on technologies of parallel workflow execution, in-memory database storage and machine learning analytics, orchid efficiently handles millions of mutations and hundreds of features in an easy-to-use manner. We describe the implementation of orchid and demonstrate its ability to distinguish tissue of origin in 12 tumor types based on 339 features using a random forest classifier. Orchid and our annotated tumor mutation database are freely available at https://github.com/wittelab/orchid. Software is implemented in Python 2.7, and makes use of MySQL or MemSQL databases. Groovy 2.4.5 is optionally required for parallel workflow execution. JWitte@ucsf.edu. Supplementary data are available at Bioinformatics online.
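A minimal scikit-learn sketch of the classification task described, with synthetic stand-in data (orchid's real pipeline derives its feature matrix from the mutation database):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in: 1200 mutations x 339 features, 12 tissue-of-origin labels.
X = rng.normal(size=(1200, 339))
y = rng.integers(0, 12, size=1200)
X[np.arange(1200), y] += 2.0          # inject a weak per-class signal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```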
Finding Protein and Nucleotide Similarities with FASTA.
Pearson, William R
2016-03-24
The FASTA programs provide a comprehensive set of rapid similarity searching tools (fasta36, fastx36, tfastx36, fasty36, tfasty36), similar to those provided by the BLAST package, as well as programs for slower, optimal, local, and global similarity searches (ssearch36, ggsearch36), and for searching with short peptides and oligonucleotides (fasts36, fastm36). The FASTA programs use an empirical strategy for estimating statistical significance that accommodates a range of similarity scoring matrices and gap penalties, improving alignment boundary accuracy and search sensitivity. The FASTA programs can produce "BLAST-like" alignment and tabular output, for ease of integration into existing analysis pipelines, and can search small, representative databases, and then report results for a larger set of sequences, using links from the smaller dataset. The FASTA programs work with a wide variety of database formats, including mySQL and postgreSQL databases. The programs also provide a strategy for integrating domain and active site annotations into alignments and highlighting the mutational state of functionally critical residues. These protocols describe how to use the FASTA programs to characterize protein and DNA sequences, using protein:protein, protein:DNA, and DNA:DNA comparisons. Copyright © 2016 John Wiley & Sons, Inc.
tcpl: The ToxCast Pipeline for High-Throughput Screening Data
Motivation: The large and diverse high-throughput chemical screening efforts carried out by the US EPA ToxCast program require an efficient, transparent, and reproducible data pipeline. Summary: The tcpl R package and its associated MySQL database provide a generalized platform fo...
NoSQL Based 3D City Model Management System
NASA Astrophysics Data System (ADS)
Mao, B.; Harrie, L.; Cao, J.; Wu, Z.; Shen, J.
2014-04-01
To manage increasingly complicated 3D city models, a framework based on a NoSQL database is proposed in this paper. The framework supports import and export of 3D city models according to international standards such as CityGML, KML/COLLADA and X3D. We also suggest and implement 3D model analysis and visualization in the framework. For city model analysis, 3D geometry data and semantic information (such as name, height, area, price and so on) are stored and processed separately. We use a Map-Reduce method to deal with the 3D geometry data since it is more complex, while the semantic analysis is mainly based on database query operations. For visualization, a multiple 3D city representation structure, CityTree, is implemented within the framework to support dynamic LODs based on the user viewpoint. The proposed framework is also easily extensible and supports geoindexes to speed up querying. Our experimental results show that the proposed 3D city management system can efficiently fulfil the analysis and visualization requirements.
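The geometry/semantics split can be illustrated with a toy Python sketch (document layout invented for illustration): geometry goes through a map-reduce pass, while semantic questions reduce to plain attribute queries.

```python
from functools import reduce

# Hypothetical city-model documents: geometry (triangle areas) plus semantics.
buildings = [
    {"name": "B1", "district": "north", "height": 22.0, "tri_areas": [10.0, 12.5]},
    {"name": "B2", "district": "north", "height": 95.0, "tri_areas": [40.0, 33.0]},
    {"name": "B3", "district": "south", "height": 18.0, "tri_areas": [9.0, 7.5]},
]

# Geometry side: map each document to (district, surface area), then reduce.
mapped = [(b["district"], sum(b["tri_areas"])) for b in buildings]
def merge(acc, kv):
    k, v = kv
    acc[k] = acc.get(k, 0.0) + v
    return acc
print(reduce(merge, mapped, {}))   # {'north': 95.5, 'south': 16.5}

# Semantic side: a plain attribute query, no geometry processing involved.
print([b["name"] for b in buildings if b["height"] > 50.0])   # ['B2']
```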
The BDNYC database of low-mass stars, brown dwarfs, and planetary mass companions
NASA Astrophysics Data System (ADS)
Cruz, Kelle; Rodriguez, David; Filippazzo, Joseph; Gonzales, Eileen; Faherty, Jacqueline K.; Rice, Emily; BDNYC
2018-01-01
We present a web interface to a database of low-mass stars, brown dwarfs, and planetary mass companions. Users can send SELECT SQL queries to the database, perform searches by coordinates or name, check the database inventory on specified objects, and even plot spectra interactively. The initial version of this database contains information for 198 objects and version 2 will contain over 1000 objects. The database currently includes photometric data from 2MASS, WISE, and Spitzer, and version 2 will include a significant portion of the publicly available optical and NIR spectra for brown dwarfs. The database is maintained and curated by the BDNYC research group and we welcome contributions from other researchers via GitHub.
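The flavor of SELECT query such an interface accepts can be sketched with sqlite3 against a hypothetical local snapshot (table and column names invented for illustration, with toy data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # stand-in for a local database snapshot
conn.executescript("""
CREATE TABLE sources (id INTEGER PRIMARY KEY, names TEXT, ra REAL, dec REAL);
CREATE TABLE photometry (source_id INTEGER, band TEXT, magnitude REAL);
""")
conn.execute("INSERT INTO sources VALUES (1, 'TWA 27', 165.466, -39.548)")
conn.execute("INSERT INTO photometry VALUES (1, '2MASS J', 13.00)")

# The sort of SELECT a user might submit: photometry for sources in a sky box.
rows = conn.execute("""
    SELECT s.names, p.band, p.magnitude
    FROM sources s JOIN photometry p ON p.source_id = s.id
    WHERE s.ra BETWEEN 160 AND 170 AND s.dec BETWEEN -45 AND -35
""").fetchall()
print(rows)   # [('TWA 27', '2MASS J', 13.0)]
```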
Enhancing SAMOS Data Access in DOMS via a Neo4j Property Graph Database.
NASA Astrophysics Data System (ADS)
Stallard, A. P.; Smith, S. R.; Elya, J. L.
2016-12-01
The Shipboard Automated Meteorological and Oceanographic System (SAMOS) initiative provides routine access to high-quality marine meteorological and near-surface oceanographic observations from research vessels. The Distributed Oceanographic Match-Up Service (DOMS) under development is a centralized service that allows researchers to easily match in situ and satellite oceanographic data from distributed sources to facilitate satellite calibration, validation, and retrieval algorithm development. The service currently uses Apache Solr as a backend search engine on each node in the distributed network. While Solr is a high-performance solution that facilitates creation and maintenance of indexed data, it is limited in the sense that its schema is fixed. The property graph model escapes this limitation by creating relationships between data objects. The authors will present the development of the SAMOS Neo4j property graph database, including new search possibilities that take advantage of the property graph model, performance comparisons with Apache Solr, and a vision for graph databases as a storage tool for oceanographic data. The integration of the SAMOS Neo4j graph into DOMS will also be described. Currently, Neo4j contains spatial and temporal records from SAMOS which are modeled into a time tree and an r-tree using the GraphAware and Spatial plugin tools for Neo4j. These extensions provide callable Java procedures within Cypher (Neo4j's query language) that generate in-graph structures. Once generated, these structures can be queried using procedures from these libraries, or directly via Cypher statements. Neo4j excels at performing relationship and path-based queries, which challenge relational SQL databases because they require memory-intensive joins due to the limitations of their design. Consider a user who wants to find records over several years, but only for specific months. If a traditional database only stores timestamps, this type of query would be complex and likely prohibitively slow. Using the time tree model, one can specify a path from the root to the data which restricts resolutions to certain timeframes (e.g., months). This query can be executed without joins, unions, or other compute-intensive operations, putting Neo4j at a computational advantage over the SQL database alternative.
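A sketch of the month-restricted lookup described above, using the official neo4j Python driver; the node labels and relationship types are hypothetical, patterned on common time-tree layouts rather than on the SAMOS graph itself:

```python
from neo4j import GraphDatabase

# Hypothetical time tree: (:Year)-[:CHILD]->(:Month), records attached to months.
CYPHER = """
MATCH (y:Year)-[:CHILD]->(m:Month)<-[:AT]-(r:Record)
WHERE y.value >= $start_year AND y.value <= $end_year
  AND m.value IN $months
RETURN y.value AS year, m.value AS month, count(r) AS n_records
ORDER BY year, month
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    # June and July only, across six years -- no joins or unions needed.
    for rec in session.run(CYPHER, start_year=2010, end_year=2015, months=[6, 7]):
        print(rec["year"], rec["month"], rec["n_records"])
driver.close()
```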
A Study of the Efficiency of Spatial Indexing Methods Applied to Large Astronomical Databases
NASA Astrophysics Data System (ADS)
Donaldson, Tom; Berriman, G. Bruce; Good, John; Shiao, Bernie
2018-01-01
Spatial indexing of astronomical databases generally uses quadrature methods, which partition the sky into cells used to create an index (usually a B-tree) written as a database column. We report the results of a study to compare the performance of two common indexing methods, HTM and HEALPix, on Solaris and Windows database servers installed with a PostgreSQL database, and on a Windows server installed with MS SQL Server. The indexing was applied to the 2MASS All-Sky Catalog and to the Hubble Source Catalog. On each server, the study compared indexing performance by submitting 1 million queries at each index level with random sky positions and random cone search radii, computed on a logarithmic scale between 1 arcsec and 1 degree, and measuring the time to complete the query and write the output. These simulated queries, intended to model realistic use patterns, were run in a uniform way on many combinations of indexing method and indexing level. The query times in all simulations are strongly I/O-bound and are linear with the number of records returned for large numbers of sources. There are, however, considerable differences between simulations, which reveal that hardware I/O throughput is a more important factor in managing the performance of a DBMS than the choice of indexing scheme. The choice of index itself is relatively unimportant: for comparable index levels, the performance is consistent within the scatter of the timings. At small index levels (large cells; e.g. level 4, cell size 3.7 deg), there is large scatter in the timings because of wide variations in the number of sources found in the cells. At larger index levels, performance improves and scatter decreases, but the improvement at level 8 (cell size 14 arcmin) and higher is masked to some extent by the timing scatter caused by the range of query sizes. At very high levels (20; 0.0004 arcsec), the granularity of the cells becomes so high that a large number of extraneous empty cells begins to degrade performance. Thus, for the use patterns studied here, database performance is not critically dependent on the exact choice of index or level.
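The pattern being benchmarked is a two-step cone search: the spatial index narrows the scan to candidate cells, and exact angular separations prune false positives. A sketch with healpy (assuming it is installed; the in-memory "table" stands in for an indexed database column):

```python
import numpy as np
import healpy as hp

NSIDE = 2**8                      # HEALPix level 8: cells of roughly 14 arcmin

def cone_search(catalog, ra0, dec0, radius_deg):
    """catalog rows are (ra, dec, pix) with pix precomputed at NSIDE."""
    vec = hp.ang2vec(ra0, dec0, lonlat=True)
    # Step 1: candidate cells via the index (inclusive=True avoids misses).
    cells = set(hp.query_disc(NSIDE, vec, np.radians(radius_deg), inclusive=True))
    candidates = [row for row in catalog if row[2] in cells]
    # Step 2: exact separation test on the (few) surviving candidates.
    out = []
    for ra, dec, _ in candidates:
        v = hp.ang2vec(ra, dec, lonlat=True)
        if np.degrees(np.arccos(np.clip(np.dot(vec, v), -1, 1))) <= radius_deg:
            out.append((ra, dec))
    return out

catalog = [(10.0, 20.0, hp.ang2pix(NSIDE, 10.0, 20.0, lonlat=True)),
           (11.5, 20.0, hp.ang2pix(NSIDE, 11.5, 20.0, lonlat=True))]
print(cone_search(catalog, 10.0, 20.0, radius_deg=1.0))   # [(10.0, 20.0)]
```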
PROTICdb: a web-based application to store, track, query, and compare plant proteome data.
Ferry-Dumazet, Hélène; Houel, Gwenn; Montalent, Pierre; Moreau, Luc; Langella, Olivier; Negroni, Luc; Vincent, Delphine; Lalanne, Céline; de Daruvar, Antoine; Plomion, Christophe; Zivy, Michel; Joets, Johann
2005-05-01
PROTICdb is a web-based application, mainly designed to store and analyze plant proteome data obtained by two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) and mass spectrometry (MS). The purposes of PROTICdb are (i) to store, track, and query information related to proteomic experiments, i.e., from tissue sampling to protein identification and quantitative measurements, and (ii) to integrate information from the user's own expertise and other sources into a knowledge base, used to support data interpretation (e.g., for the determination of allelic variants or products of post-translational modifications). Data insertion into the relational database of PROTICdb is achieved either by uploading outputs of image analysis and MS identification software, or by filling web forms. 2-D PAGE annotated maps can be displayed, queried, and compared through a graphical interface. Links to external databases are also available. Quantitative data can be easily exported in a tabulated format for statistical analyses. PROTICdb is based on the Oracle or the PostgreSQL Database Management System and is freely available upon request at the following URL: http://moulon.inra.fr/bioinfo/PROTICdb.
Large Survey Database: A Distributed Framework for Storage and Analysis of Large Datasets
NASA Astrophysics Data System (ADS)
Juric, Mario
2011-01-01
The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching and querying of large survey catalogs (>10^9 rows, >1 TB). The primary driver behind its development is the analysis of Pan-STARRS PS1 data. It is specifically optimized for fast queries and parallel sweeps of positionally and temporally indexed datasets. It transparently scales to more than 100 nodes, and can be made to function in "shared nothing" architectures. An LSD database consists of a set of vertically and horizontally partitioned tables, physically stored as compressed HDF5 files. Vertically, we partition the tables into groups of related columns ('column groups'), storing together logically related data (e.g., astrometry, photometry). Horizontally, the tables are partitioned into partially overlapping 'cells' by position in space (lon, lat) and time (t). This organization allows for fast lookups based on spatial and temporal coordinates, as well as data and task distribution. The design was inspired by the success of Google BigTable (Chang et al., 2006). Our programming model is a pipelined extension of MapReduce (Dean and Ghemawat, 2004). An SQL-like query language is used to access data. For complex tasks, map-reduce 'kernels' that operate on query results on a per-cell basis can be written, with the framework taking care of scheduling and execution. The combination leverages users' familiarity with SQL, while offering a fully distributed computing environment. LSD adds little overhead compared to direct Python file I/O: in tests, we swept through 1.1 Grows (billion rows) of PanSTARRS+SDSS data (220 GB) in less than 15 minutes on a dual-CPU machine. In a cluster environment, we achieved bandwidths of 17 Gbit/s (I/O limited). Based on current experience, we believe LSD should scale to be useful for analysis and storage of LSST-scale datasets. It can be downloaded from http://mwscience.net/lsd.
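The per-cell programming model reads naturally in plain Python; the sketch below stands in for LSD's actual kernel API (cell layout and kernel invented for illustration): a map function runs on each cell independently and a reduce step merges the partial results.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

# Hypothetical spatially partitioned catalog: one list of rows per cell.
CELLS = {
    "cell_001": [{"mag": 14.2}, {"mag": 17.9}, {"mag": 21.3}],
    "cell_002": [{"mag": 13.1}, {"mag": 19.8}],
}

def map_kernel(rows):
    """Per-cell kernel: histogram magnitudes into 2-mag bins."""
    return Counter(int(r["mag"] // 2) * 2 for r in rows)

def run():
    with ProcessPoolExecutor() as pool:
        partials = pool.map(map_kernel, CELLS.values())  # map over cells
        return sum(partials, Counter())                  # reduce: merge histograms

if __name__ == "__main__":
    print(run())   # Counter with one count per 2-mag bin, merged across cells
```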
MODBASE, a database of annotated comparative protein structure models
Pieper, Ursula; Eswar, Narayanan; Stuart, Ashley C.; Ilyin, Valentin A.; Sali, Andrej
2002-01-01
MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10^-4) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server. PMID:11752309
TreeQ-VISTA: An Interactive Tree Visualization Tool with Functional Annotation Query Capabilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gu, Shengyin; Anderson, Iain; Kunin, Victor
2007-05-07
Summary: We describe a general multiplatform exploratory tool called TreeQ-Vista, designed for presenting functional annotations in a phylogenetic context. Traits, such as phenotypic and genomic properties, are interactively queried from a relational database with a user-friendly interface which provides a set of tools for users with or without SQL knowledge. The query results are projected onto a phylogenetic tree and can be displayed in multiple color groups. A rich set of browsing, grouping and query tools are provided to facilitate trait exploration, comparison and analysis. Availability: The program, detailed tutorial and examples are available online at http://genome-test.lbl.gov/vista/TreeQVista.
Nuclear Data Online Services at Peking University
NASA Astrophysics Data System (ADS)
Fan, T. S.; Guo, Z. Y.; Ye, W. G.; Liu, W. L.; Liu, T. J.; Liu, C. X.; Chen, J. X.; Tang, G. Y.; Shi, Z. M.; Huang, X. L.; Chen, J. E.
2005-05-01
The Institute of Heavy Ion Physics at Peking University has developed a new nuclear data online services software package. Through the web site (http://ndos.nst.pku.edu.cn), it offers online access to main relational nuclear databases: five evaluated neutron libraries (BROND, CENDL, ENDF, JEF, JENDL), the ENSDF library, the EXFOR library, the IAEA photonuclear library and the charged particle data of the FENDL library. This software allows the comparison and graphic representations of the different data sets. The computer programs of this package are based on the Linux implementation of PHP and the MySQL software.
Unobtrusive Social Network Data From Email
2008-12-01
...Outlook archived files and stores that data into an SQL database. Communication... Applications (VBA) program was installed on the personal computers (PC) of all participants, in the session window of their Microsoft Outlook. Details of...
Acoustic Metadata Management and Transparent Access to Networked Oceanographic Data Sets
2013-09-30
connectivity (ODBC) compliant data source for which drivers are available (e.g. MySQL, Oracle Database, Postgres) can now be imported. Implementation... the possibility of speeding data transmission through compression (implemented) or the potential to use alternative data formats such as JavaScript...
Social Impacts Module (SIM) Transition
2012-09-28
Connection parameters (from the report's configuration table):
- user (String): the authorized user's name to access the PAVE database. Applies only to Microsoft SQL Server; leave blank otherwise.
- passwd (String): the password if an authorized user's name is required; otherwise, leave blank.
- driver (String): the class name for the driver to...
Fish Karyome: A karyological information network database of Indian Fishes.
Nagpure, Naresh Sahebrao; Pathak, Ajey Kumar; Pati, Rameshwar; Singh, Shri Prakash; Singh, Mahender; Sarkar, Uttam Kumar; Kushwaha, Basdeo; Kumar, Ravindra
2012-01-01
'Fish Karyome', a database on karyological information of Indian fishes, has been developed; it serves as a central source for karyotype data about Indian fishes compiled from the published literature. Fish Karyome is intended to serve as a liaison tool for researchers and contains karyological information on 171 of the 2438 finfish species reported in India; it is publicly available via the World Wide Web. The database provides information on chromosome number, morphology, sex chromosomes, karyotype formula and cytogenetic markers, etc. Additionally, it provides phenotypic information including the species name, its classification, locality of sample collection, common name, local name, sex, geographical distribution, and IUCN Red List status. Fish and karyotype images and references for the 171 finfish species have also been included in the database. Fish Karyome was developed using SQL Server 2008, a relational database management system, Microsoft's ASP.NET 2008 and Macromedia's Flash technology under the Windows 7 operating environment. The system also enables users to input new information and images into the database, and to search and view the information and images of interest using various search options. Fish Karyome has a wide range of applications in species characterization and identification, sex determination, chromosomal mapping, karyo-evolution and the systematics of fishes.
TNAURice: Database on rice varieties released from Tamil Nadu Agricultural University
Ramalingam, Jegadeesan; Arul, Loganathan; Sathishkumar, Natarajan; Vignesh, Dhandapani; Thiyagarajan, Katiannan; Samiyappan, Ramasamy
2010-01-01
We developed TNAURice: a database comprising the rice varieties released from a public institution, Tamil Nadu Agricultural University (TNAU), Coimbatore, India. Backed by MS-SQL at the back end and ASP.NET at the front end, this database provides information on both quantitative and qualitative descriptors of the rice varieties, inclusive of their parental details. Enabled by a user-friendly search utility, the database can be effectively searched by varietal descriptors, and the entire contents are navigable as well. The database comes in handy to plant breeders involved in varietal improvement programs when deciding on the choice of parental lines. TNAURice is available for public access at http://www.btistnau.org/germdefault.aspx. PMID:21364829
[Design and establishment of modern literature database about acupuncture Deqi].
Guo, Zheng-rong; Qian, Gui-feng; Pan, Qiu-yin; Wang, Yang; Xin, Si-yuan; Li, Jing; Hao, Jie; Hu, Ni-juan; Zhu, Jiang; Ma, Liang-xiao
2015-02-01
A search on acupuncture Deqi was conducted in four Chinese-language biomedical databases (CNKI, Wan-Fang, VIP and CBM) and the PubMed database, using keywords such as "Deqi", "needle sensation", "needling feeling", "needle feel" and "obtaining qi". A "Modern Literature Database for Acupuncture Deqi" was then established using Microsoft SQL Server 2005 Express Edition, specifying the contents, data types, information structure and logic constraints of the system table fields. The database supports detailed queries on general information about clinical trials, acupuncturists' experience, ancient medical works, comprehensive literature, etc. The present databank lays a foundation for subsequent evaluation of literature quality about Deqi and for data mining of as-yet-undetected Deqi knowledge.
EST databases and web tools for EST projects.
Shen, Yao-Qing; O'Brien, Emmet; Koski, Liisa; Lang, B Franz; Burger, Gertraud
2009-01-01
This chapter outlines key considerations for constructing and implementing an EST database. Instead of showing the technological details step by step, emphasis is put on the design of an EST database suited to the specific needs of EST projects and how to choose the most suitable tools. Using TBestDB as an example, we illustrate the essential factors to be considered for database construction and the steps for data population and annotation. This process employs technologies such as PostgreSQL, Perl, and PHP to build the database and interface, and tools such as AutoFACT for data processing and annotation. We discuss these in comparison to other available technologies and tools, and explain the reasons for our choices.
SinEx DB: a database for single exon coding sequences in mammalian genomes.
Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S
2016-01-01
Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes, including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33, and the complete dataset of SEG sequences and their functional predictions is available for download. SinEx DB can be interrogated by (i) browsing a phylogenetic schema, (ii) carrying out BLAST searches against the in-house SinEx DB of SEGs and (iii) an advanced search mode in which the database can be searched by keywords and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl. © The Author(s) 2016. Published by Oxford University Press.
Knowlton, Michelle N; Li, Tongbin; Ren, Yongliang; Bill, Brent R; Ellis, Lynda Bm; Ekker, Stephen C
2008-01-07
The zebrafish is a powerful model vertebrate amenable to high-throughput in vivo genetic analyses. Examples include reverse genetic screens using morpholino knockdown, expression-based screening using enhancer trapping and forward genetic screening using transposon insertional mutagenesis. We have created a database to facilitate web-based distribution of data from such genetic studies. The MOrpholino DataBase is a MySQL relational database with an online, PHP interface. Multiple quality control levels allow differential access to data in raw and finished formats. MODBv1 includes sequence information relating to almost 800 morpholinos and their targets and phenotypic data regarding the dose effect of each morpholino (mortality, toxicity and defects). To improve the searchability of this database, we have incorporated a fixed-vocabulary defect ontology that allows for the organization of morpholino effects based on the anatomical structure affected and the defect produced. This also allows comparison between species utilizing Phenotypic Attribute Trait Ontology (PATO) designated terminology. MODB is also cross-linked with ZFIN, allowing full searches between the two databases. MODB offers users the ability to retrieve morpholino data by sequence of morpholino or target, name of target, anatomical structure affected and defect produced. MODB data can be used for functional genomic analysis of morpholino design to maximize efficacy and minimize toxicity. MODB also serves as a template for future sequence-based functional genetic screen databases, and it is currently being used as a model for the creation of a mutagenic insertional transposon database.
NetIntel: A Database for Manipulation of Rich Social Network Data
2005-03-03
between entities in a social or organizational system. For most of its history, social network analysis has operated on a notion of a dataset - a clearly... and procedural), as well as stored procedure and trigger capabilities. For the current implementation, we have chosen the PostgreSQL [1] database. Of the... data and easy-to-use facilities for export of data into analysis tools as well as online browsing and data entry. References: [1] PostgreSQL
2009-07-01
data were recognized as being largely geospatial and thus a GIS was considered the most reasonable way to proceed. The Postgres suite of software also... for the ESRI (2009) geodatabase environment but is applicable for this Postgres-based system. We then introduce and discuss spatial reference... PostgreSQL database using a PostgreSQL ODBC connection. This procedure identified 100 tables with 737 columns. This is after the removal of two...
BIGNASim: a NoSQL database structure and analysis portal for nucleic acids simulation data.
Hospital, Adam; Andrio, Pau; Cugnasco, Cesare; Codo, Laia; Becerra, Yolanda; Dans, Pablo D; Battistini, Federica; Torres, Jordi; Goñi, Ramón; Orozco, Modesto; Gelpí, Josep Ll
2016-01-04
Molecular dynamics (MD) simulation is, just behind genomics, the bioinformatics tool that generates the largest amounts of data and uses the largest amount of CPU time in supercomputing centres. MD trajectories are obtained after months of calculations, analysed in situ, and in practice forgotten. Several projects to generate stable trajectory databases have been developed for proteins, but no equivalent exists in the nucleic acids world. We present here a novel database system to store MD trajectories and analyses of nucleic acids. The initial data set available consists mainly of the benchmark of the new molecular dynamics force field, parmBSC1. It contains 156 simulations, with over 120 μs of total simulation time. A deposition protocol is available to accept the submission of new trajectory data. The database is based on the combination of two NoSQL engines, Cassandra for storing trajectories and MongoDB for storing analysis results and simulation metadata. The analyses available include backbone geometries, helical analysis, NMR observables and a variety of mechanical analyses. Individual trajectories and combined meta-trajectories can be downloaded from the portal. The system is accessible through http://mmb.irbbarcelona.org/BIGNASim/. Supplementary Material is also available on-line at http://mmb.irbbarcelona.org/BIGNASim/SuppMaterial/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
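The split described above - a column store for bulky trajectory frames and a document store for queryable analyses - can be sketched in a few lines of Python. This is a hedged illustration, not the BIGNASim code: the keyspace "bignasim", table "frames", collection "analyses" and all field names are assumptions.

```python
# Minimal sketch of a two-engine layout: Cassandra for append-only trajectory
# frames, MongoDB for small, richly queryable analysis documents.
# All names below are hypothetical, not the published BIGNASim schema.
from cassandra.cluster import Cluster   # pip install cassandra-driver
from pymongo import MongoClient         # pip install pymongo

frames = Cluster(["127.0.0.1"]).connect("bignasim")
analyses = MongoClient("mongodb://127.0.0.1:27017")["bignasim"]["analyses"]

# Bulky, write-once frame data goes to the column store...
frames.execute(
    "INSERT INTO frames (traj_id, frame_no, coords) VALUES (%s, %s, %s)",
    ("parmbsc1_run042", 1, b"packed xyz coordinates"),
)

# ...while per-trajectory metadata and analysis results go to the
# document store, where they can be filtered and aggregated flexibly.
analyses.insert_one({
    "traj_id": "parmbsc1_run042",
    "force_field": "parmBSC1",
    "helical_twist_avg_deg": 34.1,
})
```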
NASA Astrophysics Data System (ADS)
Bemm, Stefan; Sandmeier, Christine; Wilde, Martina; Jaeger, Daniel; Schwindt, Daniel; Terhorst, Birgit
2014-05-01
The area of the Swabian-Franconian cuesta landscape (Southern Germany) is highly prone to landslides. This was apparent in the late spring of 2013, when numerous landslides occurred as a consequence of heavy and long-lasting rainfall. The specific climatic situation caused numerous damages with serious impact on settlements and infrastructure. Knowledge of the spatial distribution of landslides, their processes and characteristics is important to evaluate the potential risk posed by mass movements in these areas. In the frame of two projects, about 400 landslides were mapped and detailed data sets were compiled during the years 2011 to 2014 at the Franconian Alb. The studies are related to the project "Slope stability and hazard zones in the northern Bavarian cuesta" (DFG, German Research Foundation) as well as to the LfU (The Bavarian Environment Agency) within the project "Georisks and climate change - hazard indication map Jura". The central goal of the present study is to create a spatial database for landslides. The database should contain all fundamental parameters to characterize the mass movements and should provide the potential for secure data storage and data management, as well as statistical evaluations. The spatial database was created with PostgreSQL, an object-relational database management system, and PostGIS, a spatial database extender for PostgreSQL, which provides the possibility to store spatial and geographic objects and to connect to several GIS applications, such as GRASS GIS, SAGA GIS, QGIS and GDAL, a geospatial library (Obe and Hsu 2011). Database access for querying, importing, and exporting spatial and non-spatial data is ensured by using GUI or non-GUI connections. The database allows the use of procedural languages for writing advanced functions in the R, Python or Perl programming languages. It is possible to work directly with the (spatial) data entirety of the database in R. The inventory of the database includes (amongst others) information on location, landslide types and causes, geomorphological positions, geometries, hazards and damages, as well as assessments related to the activity of landslides. Furthermore, spatial objects are stored which represent the components of a landslide, in particular the scarps and the accumulation areas. In addition, waterways, map sheets, contour lines, detailed infrastructure data, digital elevation models, and aspect and slope data are included. Examples of spatial queries to the database are intersections of raster and vector data for calculating slope gradients or aspects of landslide areas and for creating multiple overlaying sections for the comparison of slopes, as well as distances to infrastructure or to the next receiving drainage. Further queries retrieve information on landslide magnitudes, distribution and clustering, as well as potential correlations with geomorphological or geological conditions. The data management concept in this study can be implemented for any academic, public or private use, because it is independent of any obligatory licenses. The created spatial database offers a platform for interdisciplinary research and socio-economic questions, as well as for landslide susceptibility and hazard indication mapping. Obe, R.O., Hsu, L.S., 2011. PostGIS in Action. Manning Publications, Stamford, 492 pp.
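One of the query patterns mentioned above, the distance from each landslide to the nearest infrastructure, can be sketched with PostGIS from Python. The table and column names ("landslides", "roads", "geom") are assumptions for illustration, not the project's actual schema, and distances are in metres only if the geometries use a projected CRS.

```python
# Hedged sketch: nearest-infrastructure distance per landslide via PostGIS.
# Assumes hypothetical tables landslides(id, geom) and roads(geom).
import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect("dbname=landslide_db user=gis")
cur = conn.cursor()
cur.execute("""
    SELECT l.id,
           (SELECT MIN(ST_Distance(l.geom, r.geom))
              FROM roads r
             WHERE ST_DWithin(l.geom, r.geom, 5000)  -- 5 km search radius
           ) AS dist_to_road
      FROM landslides l;
""")
for landslide_id, dist in cur.fetchall():
    print(landslide_id, dist)
```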
Temsch, W; Luger, A; Riedl, M
2008-01-01
This article presents a mathematical model to calculate HbA1c values based on self-measured blood glucose and past HbA1c levels, thereby enabling patients to monitor diabetes therapy between scheduled checkups. This method could help physicians to make treatment decisions if implemented in a system where glucose data are transferred to a remote server. The method, however, cannot replace HbA1c measurements; past HbA1c values are needed to calibrate the method. The mathematical model of HbA1c formation was developed based on biochemical principles. Unlike an existing HbA1c formula, the new model respects the decreasing contribution of older glucose levels to current HbA1c values. About 12 standard SQL statements embedded in a PHP program were used to perform the Fourier transform. Regression analysis was used to calibrate results against previous HbA1c values. The method can be readily implemented in any SQL database. The predicted HbA1c values thus obtained were in accordance with measured values. They also matched the results of the HbA1c formula in the elevated range. By contrast, the formula was too "optimistic" in the range of better glycemic control. Individual analysis of two subjects improved the accuracy of values and reflected the bias introduced by different glucometers and individual measurement habits.
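The core idea, older glucose readings contributing less to the current HbA1c, can be illustrated with a simple exponentially weighted mean. This is only a sketch of the principle, not the authors' biochemical model: the 30-day half-life is an arbitrary assumption, and the final conversion uses the widely published ADAG linear relation (eAG = 28.7 x HbA1c - 46.7) rather than the paper's own formula.

```python
# Illustrative only: exponentially weighted mean glucose -> estimated HbA1c.
def estimate_hba1c(glucose_mg_dl, half_life_days=30.0, interval_days=1.0):
    """glucose_mg_dl: self-measured values, oldest first, evenly spaced."""
    decay = 0.5 ** (interval_days / half_life_days)  # weight lost per sample
    weight, weighted_sum, weight_total = 1.0, 0.0, 0.0
    for value in reversed(glucose_mg_dl):  # newest reading gets weight 1.0
        weighted_sum += weight * value
        weight_total += weight
        weight *= decay
    mean_glucose = weighted_sum / weight_total
    return (mean_glucose + 46.7) / 28.7   # ADAG relation, solved for HbA1c

print(round(estimate_hba1c([150, 160, 140, 155, 170]), 2))  # ~7.0
```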
Cellular Consequences of Telomere Shortening in Histologically Normal Breast Tissues
2013-09-01
using the open-source, Java-based image analysis software package ImageJ (http://rsb.info.nih.gov/ij/) and a custom-designed plugin ("Telometer")... Tabulated data were stored in a MySQL (http://www.mysql.com) database and viewed through Microsoft Access (Microsoft Corp.).
An approach in building a chemical compound search engine in oracle database.
Wang, H; Volarath, P; Harrison, R
2005-01-01
Searching for and identifying chemical compounds is an important process in drug design and in chemistry research. An efficient search engine involves a close coupling of the search algorithm and the database implementation. The database must process chemical structures, which demands approaches to represent, store, and retrieve structures in a database system. In this paper, a general database framework for a chemical compound search engine in an Oracle database is described. The framework is devoted to eliminating data type constraints for potential search algorithms, which is a crucial step toward building a domain-specific query language on top of SQL. A search engine implementation based on the database framework is also demonstrated. The convenience of the implementation emphasizes the efficiency and simplicity of the framework.
Monitoring performance of a highly distributed and complex computing infrastructure in LHCb
NASA Astrophysics Data System (ADS)
Mathe, Z.; Haen, C.; Stagni, F.
2017-10-01
In order to ensure an optimal performance of the LHCb Distributed Computing, based on LHCbDIRAC, it is necessary to be able to inspect the behaviour over time of many components: firstly the agents and services on which the infrastructure is built, but also all the computing tasks and data transfers that are managed by this infrastructure. This consists of recording and then analysing time series of a large number of observables, for which the usage of SQL relational databases is far from optimal. Therefore, within DIRAC we have been studying novel possibilities based on NoSQL databases (ElasticSearch, OpenTSDB and InfluxDB); as a result of this study we developed a new monitoring system based on ElasticSearch. It has been deployed on the LHCb Distributed Computing infrastructure, for which it collects data from all the components (agents, services, jobs) and allows creating reports through Kibana and a web user interface based on the DIRAC web framework. In this paper we describe this new implementation of the DIRAC monitoring system. We give details on the ElasticSearch implementation within the general DIRAC framework, as well as an overview of the advantages of the pipeline aggregation used for creating a dynamic bucketing of the time series. We present the advantages of using the ElasticSearch DSL high-level library for creating and running queries. Finally, we present the performance of the system.
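The combination the abstract names, the elasticsearch-dsl library plus a pipeline aggregation over dynamically bucketed time series, might look roughly like the sketch below; the index name "lhcb_jobs" and all field names are invented for illustration, not the actual DIRAC monitoring schema.

```python
# Sketch: dynamic time bucketing with a pipeline aggregation (derivative),
# written with the elasticsearch-dsl high-level library.
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch("http://localhost:9200")
s = Search(using=client, index="lhcb_jobs").filter("term", status="Running")

hist = s.aggs.bucket("per_hour", "date_histogram",
                     field="timestamp", fixed_interval="1h")
hist.metric("avg_jobs", "avg", field="job_count")
hist.pipeline("jobs_trend", "derivative", buckets_path="avg_jobs")

response = s.execute()
for bucket in response.aggregations.per_hour.buckets:
    print(bucket.key_as_string, bucket.avg_jobs.value)
```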
Murphy, SN; Barnett, GO; Chueh, HC
2000-01-01
The patient base of the Partners HealthCare System in Boston exceeds 1.8 million. Many of these patients are desirable for participation in research studies. To facilitate their discovery, we developed a data warehouse to contain clinical characteristics of these patients. The data warehouse contains diagnoses and procedures from administrative databases. The patients are indexed across institutions and their demographics provided by an Enterprise Master Patient Indexing service. Characteristics of the diagnoses and procedures such as associated providers, dates of service, inpatient/outpatient status, and other visit-related characteristics are also fed from the administrative systems. The targeted users of this system are research clinicians interested in finding patient cohorts for research studies. Their data requirements were analyzed and have been reported elsewhere. We did not expect the clinicians to become expert users of the system. Tools for querying healthcare data have traditionally been text based, although graphical interfaces have been pursued. In order to support the simple drag-and-drop visual model, as well as the identification and distribution of the patient data, a three-tier software architecture was developed. The user interface was developed in Visual Basic and distributed as an ActiveX object embedded in an HTML page. The middle layer was developed in Java and Microsoft COM. The queries are represented throughout their lifetime as XML objects, and the Microsoft SQL7 database is queried and managed in standard SQL. PMID:11080028
NASA Astrophysics Data System (ADS)
Nakagawa, Y.; Kawahara, S.; Araki, F.; Matsuoka, D.; Ishikawa, Y.; Fujita, M.; Sugimoto, S.; Okada, Y.; Kawazoe, S.; Watanabe, S.; Ishii, M.; Mizuta, R.; Murata, A.; Kawase, H.
2017-12-01
Analyses of large ensemble data are quite useful for producing probabilistic projections of climate change effects. Ensemble data of "+2K future climate simulations" are currently produced by the Japanese national project "Social Implementation Program on Climate Change Adaptation Technology (SI-CAT)" as a part of a database for Policy Decision making for Future climate change (d4PDF; Mizuta et al. 2016) produced by the Program for Risk Information on Climate Change. Those data consist of global warming simulations and regional downscaling simulations. Considering that those data volumes are too large (a few petabytes) to download to users' local computers, a user-friendly system is required to search and download the data that satisfy the users' requests. Under SI-CAT, we are developing "a database system for near-future climate change projections" that provides functions to find the necessary data. The system mainly consists of a relational database, a data download function and a user interface. The relational database, using PostgreSQL, is the key function among them. Temporally and spatially compressed data are registered in the relational database. As a first step, we developed the relational database for precipitation, temperature and typhoon track data according to requests by SI-CAT members. The data download function, using the Open-source Project for a Network Data Access Protocol (OPeNDAP), provides a means to download temporally and spatially extracted data based on search results obtained from the relational database. We also developed a web-based user interface for using the relational database and the data download function. A prototype of the system is currently in operational testing on our local server. The system will be released on the Data Integration and Analysis System Program (DIAS) in fiscal year 2017. The techniques behind it might also be quite useful for simulation and observational data in other research fields. We report the current status of development and some case studies of the system.
A MYSQL-BASED DATA ARCHIVER: PRELIMINARY RESULTS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matthew Bickley; Christopher Slominski
2008-01-23
Following an evaluation of the archival requirements of the Jefferson Laboratory accelerator's user community, a prototyping effort was executed to determine if an archiver based on MySQL had sufficient functionality to meet those requirements. This approach was chosen because an archiver based on a relational database enables the development effort to focus on data acquisition and management, letting the database take care of storage, indexing and data consistency. It was clear from the prototype effort that there were no performance impediments to successful implementation of a final system. With our performance concerns addressed, the lab undertook the design and development of an operational system. The system is in its operational testing phase now. This paper discusses the archiver system requirements, some of the design choices and their rationale, and presents the acquisition, storage and retrieval performance.
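Why a relational engine "takes care of storage, indexing and data consistency" is easy to see in a toy version of such an archiver. The schema below is purely illustrative, not the Jefferson Lab design: a composite primary key on (channel, timestamp) provides the index that drives retrieval, while the engine enforces uniqueness and durability.

```python
# Toy time-series archiver table in MySQL; all names are hypothetical.
import pymysql  # pip install pymysql

conn = pymysql.connect(host="localhost", user="archiver",
                       password="secret", database="archive")
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS samples (
            channel_id INT NOT NULL,
            ts         DATETIME(6) NOT NULL,  -- microsecond resolution
            value      DOUBLE,
            PRIMARY KEY (channel_id, ts)      -- storage + index for free
        )""")
    cur.execute("INSERT INTO samples VALUES (%s, %s, %s)",
                (42, "2008-01-23 12:00:00.000001", 3.14))
conn.commit()

# Retrieval is an indexed range scan handled entirely by the database:
with conn.cursor() as cur:
    cur.execute("SELECT ts, value FROM samples WHERE channel_id=%s "
                "AND ts BETWEEN %s AND %s ORDER BY ts",
                (42, "2008-01-23", "2008-01-24"))
    print(cur.fetchall())
```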
A Hybrid Multilevel Storage Architecture for Electric Power Dispatching Big Data
NASA Astrophysics Data System (ADS)
Yan, Hu; Huang, Bibin; Hong, Bowen; Hu, Jing
2017-10-01
Electric power dispatching is the center of the whole power system. Over its long run time, the power dispatching center has accumulated a large amount of data. These data are now stored in different professional power systems and form many isolated islands of information. Integrating these data and performing comprehensive analysis can greatly improve the intelligence level of power dispatching. In this paper, a hybrid multilevel storage architecture for electric power dispatching big data is proposed. It introduces a relational database and a NoSQL database to establish a panoramic power grid data center, effectively meeting the storage needs of power dispatching big data, including unified storage of structured and unstructured data, fast access to massive real-time data, data version management and so on. It lays a solid foundation for follow-up in-depth analysis of power dispatching big data.
Informatics application provides instant research to practice benefits.
Bowles, K. H.; Peng, T.; Qian, R.; Naylor, M. D.
2001-01-01
A web-based research information system was designed to enable our research team to efficiently measure health-related quality of life among frail older adults in a variety of health care settings (home care, nursing homes, assisted living, PACE). The structure, process, and outcome data are collected using laptop computers and downloaded to a SQL database. Unique features of this project are the ability to transfer research to practice by instantly sharing individual and aggregate results with the clinicians caring for these elders, directly impacting the quality of their care. Clinicians can also dial in to the database to access standard queries or receive customized reports about the patients in their facilities. This paper describes the development and implementation of the information system. The conference presentation will include a demonstration and examples of research-to-practice benefits. PMID:11825156
NASA Astrophysics Data System (ADS)
Moreau, N.; Dubernet, M. L.
2006-07-01
Basecol is a combination of a website (using PHP and HTML) and a MySQL database concerning molecular ro-vibrational transitions induced by collisions with atoms or molecules. This database has been created in view of the scientific preparation of the Heterodyne Instrument for the Far-Infrared on board the Herschel Space Observatory (HSO). Basecol offers access to numerical and bibliographic data through various output methods such as ASCII, HTML or VOTable (a first step towards a VO-compliant system). A web service using Apache Axis has been developed in order to provide direct access to the data for external applications.
Web client and ODBC access to legacy database information: a low cost approach.
Sanders, N. W.; Mann, N. H.; Spengler, D. M.
1997-01-01
A new method has been developed for the Department of Orthopaedics of Vanderbilt University Medical Center to access departmental clinical data. Previously, these data were stored only in the medical center's mainframe DB2 database; they are now additionally stored in a departmental SQL database. Access to the data is available via any ODBC-compliant front end or a web client. With a small budget and no full-time staff, we were able to give our department on-line access to many years' worth of patient data that was previously inaccessible. PMID:9357735
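The "any ODBC-compliant front end" point is what makes the approach cheap: once the departmental copy exists, a few lines of any ODBC client can query it. A minimal Python sketch follows; the DSN, credentials and table names are assumptions, not the department's actual setup.

```python
# Hedged sketch: querying a departmental SQL copy over ODBC.
import pyodbc  # pip install pyodbc

conn = pyodbc.connect("DSN=OrthoDept;UID=reader;PWD=secret")
for row in conn.execute(
        "SELECT patient_id, visit_date FROM clinic_visits "
        "WHERE visit_date >= ?", "1995-01-01"):
    print(row.patient_id, row.visit_date)
```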
EVpedia: a community web portal for extracellular vesicles research.
Kim, Dae-Kyum; Lee, Jaewook; Kim, Sae Rom; Choi, Dong-Sic; Yoon, Yae Jin; Kim, Ji Hyun; Go, Gyeongyun; Nhung, Dinh; Hong, Kahye; Jang, Su Chul; Kim, Si-Hyun; Park, Kyong-Su; Kim, Oh Youn; Park, Hyun Taek; Seo, Ji Hye; Aikawa, Elena; Baj-Krzyworzeka, Monika; van Balkom, Bas W M; Belting, Mattias; Blanc, Lionel; Bond, Vincent; Bongiovanni, Antonella; Borràs, Francesc E; Buée, Luc; Buzás, Edit I; Cheng, Lesley; Clayton, Aled; Cocucci, Emanuele; Dela Cruz, Charles S; Desiderio, Dominic M; Di Vizio, Dolores; Ekström, Karin; Falcon-Perez, Juan M; Gardiner, Chris; Giebel, Bernd; Greening, David W; Gross, Julia Christina; Gupta, Dwijendra; Hendrix, An; Hill, Andrew F; Hill, Michelle M; Nolte-'t Hoen, Esther; Hwang, Do Won; Inal, Jameel; Jagannadham, Medicharla V; Jayachandran, Muthuvel; Jee, Young-Koo; Jørgensen, Malene; Kim, Kwang Pyo; Kim, Yoon-Keun; Kislinger, Thomas; Lässer, Cecilia; Lee, Dong Soo; Lee, Hakmo; van Leeuwen, Johannes; Lener, Thomas; Liu, Ming-Lin; Lötvall, Jan; Marcilla, Antonio; Mathivanan, Suresh; Möller, Andreas; Morhayim, Jess; Mullier, François; Nazarenko, Irina; Nieuwland, Rienk; Nunes, Diana N; Pang, Ken; Park, Jaesung; Patel, Tushar; Pocsfalvi, Gabriella; Del Portillo, Hernando; Putz, Ulrich; Ramirez, Marcel I; Rodrigues, Marcio L; Roh, Tae-Young; Royo, Felix; Sahoo, Susmita; Schiffelers, Raymond; Sharma, Shivani; Siljander, Pia; Simpson, Richard J; Soekmadji, Carolina; Stahl, Philip; Stensballe, Allan; Stępień, Ewa; Tahara, Hidetoshi; Trummer, Arne; Valadi, Hadi; Vella, Laura J; Wai, Sun Nyunt; Witwer, Kenneth; Yáñez-Mó, María; Youn, Hyewon; Zeidler, Reinhard; Gho, Yong Song
2015-03-15
Extracellular vesicles (EVs) are spherical bilayered proteolipids, harboring various bioactive molecules. Due to the complexity of the vesicular nomenclatures and components, online searches for EV-related publications and vesicular components are currently challenging. We present an improved version of EVpedia, a public database for EVs research. This community web portal contains a database of publications and vesicular components, identification of orthologous vesicular components, bioinformatic tools and a personalized function. EVpedia includes 6879 publications, 172 080 vesicular components from 263 high-throughput datasets, and has been accessed more than 65 000 times from more than 750 cities. In addition, about 350 members from 73 international research groups have participated in developing EVpedia. This free web-based database might serve as a useful resource to stimulate the emerging field of EV research. The web site was implemented in PHP, Java, MySQL and Apache, and is freely available at http://evpedia.info. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
NASA Astrophysics Data System (ADS)
Krishna, B.; Gustafson, W. I., Jr.; Vogelmann, A. M.; Toto, T.; Devarakonda, R.; Palanisamy, G.
2016-12-01
This paper presents a new way of providing ARM data discovery through data analysis and visualization services. ARM stands for Atmospheric Radiation Measurement; the program was created to study cloud formation processes and their influence on radiative transfer, and also includes additional measurements of aerosol and precipitation at various highly instrumented ground and mobile stations. The total volume of ARM data is roughly 900 TB. The current search for ARM data is performed by using its metadata, such as the site name, instrument name, date, etc. NoSQL technologies were explored to improve the capabilities of data searching, not only by metadata but also by measurement values. Two technologies currently being implemented for testing are Apache Cassandra (a NoSQL database) and Apache Spark (a NoSQL-based analytics framework). Both of these technologies were developed to work in a distributed environment and hence can handle large data volumes for storage and analytics. D3.js is a JavaScript library that can generate interactive data visualizations in web browsers by making use of the commonly used SVG, HTML5 and CSS standards. To test the performance of NoSQL for ARM data, we will be using ARM's popular measurements to locate the data based on their values. Recently, NoSQL technology has been applied to a pilot project called LASSO, which stands for LES ARM Symbiotic Simulation and Observation Workflow. LASSO will be packaging LES output and observations in "data bundles", and analyses will require the ability for users to analyze both observations and LES model output, either individually or together, across multiple time periods. The LASSO implementation strategy implies that enormous data storage is required for the above-mentioned quantities. Thus, NoSQL was used to provide a powerful means to store portions of the data and give users search capabilities over each simulation's traits through a web application. Based on the user's selection, plots are created dynamically, along with ancillary information that enables the user to locate and download the data that fulfilled their required traits.
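Value-based search of the kind described, finding data by measurement values rather than metadata, is where the Cassandra-plus-Spark pairing pays off. The sketch below assumes the spark-cassandra-connector package and a hypothetical keyspace "arm" with a "measurements" table; ARM's real schema is not given in the abstract.

```python
# Sketch: filtering measurements by value with Spark over Cassandra.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("arm-value-search")
         .config("spark.jars.packages",
                 "com.datastax.spark:spark-cassandra-connector_2.12:3.4.1")
         .config("spark.cassandra.connection.host", "127.0.0.1")
         .getOrCreate())

measurements = (spark.read
                .format("org.apache.spark.sql.cassandra")
                .options(keyspace="arm", table="measurements")
                .load())

# Search by measured value, not just metadata: heavy-precipitation records.
hits = measurements.filter("variable = 'precip_rate' AND value > 10.0")
hits.select("site", "timestamp", "value").show()
```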
Database usage and performance for the Fermilab Run II experiments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bonham, D.; Box, D.; Gallas, E.
2004-12-01
The Run II experiments at Fermilab, CDF and D0, have extensive database needs covering many areas of their online and offline operations. Delivering data to users and processing farms worldwide has represented major challenges to both experiments. The range of applications employing databases includes calibration (conditions), trigger information, run configuration, run quality, luminosity, data management, and others. Oracle is the primary database product being used for these applications at Fermilab, and some of its advanced features have been employed, such as table partitioning and replication. There is also experience with open source database products such as MySQL for secondary databases used, for example, in monitoring. Tools employed for monitoring the operation and diagnosing problems are also described.
A new data management system for the French National Registry of human alveolar echinococcosis cases
Charbonnier, Amandine; Knapp, Jenny; Demonmerot, Florent; Bresson-Hadni, Solange; Raoul, Francis; Grenouillet, Frédéric; Millon, Laurence; Vuitton, Dominique Angèle; Damy, Sylvie
2014-01-01
Alveolar echinococcosis (AE) is an endemic zoonosis in France due to the cestode Echinococcus multilocularis. The French National Reference Centre for Alveolar Echinococcosis (CNR-EA), connected to the FrancEchino network, is responsible for recording all AE cases diagnosed in France. Administrative, epidemiological and medical information on the French AE cases may currently be considered exhaustive only at the time of diagnosis. To constitute a reference data set, an information system (IS) was developed using a relational database management system (MySQL). The current data set will evolve towards a dynamic surveillance system, including follow-up data (e.g. imaging, serology), and will be connected to environmental and parasitological data relative to E. multilocularis to better understand the pathogen transmission pathway. A particularly important goal is the possible interoperability of the IS with similar European and other databases abroad; this new IS could play a supporting role in the creation of new AE registries. PMID:25526544
Xu, Weijia; Ozer, Stuart; Gutell, Robin R.
2010-01-01
With an increasingly large number of properly aligned sequences, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective with sequences from diversified phylogenetic classifications. We propose a new approach that utilizes coevolution rates among pairs of nucleotide positions, using the phylogenetic and evolutionary relationships of the organisms of the aligned sequences. With a novel data schema to manage the relevant information within a relational database, our method, implemented with Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times larger, and with 50% better sensitivity, than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure. PMID:20502534
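The covariation idea behind this approach can be shown in a few lines: two alignment columns that change together (as paired bases must, to preserve complementarity) have high mutual information. This toy calculation only illustrates the principle; it is not the authors' SQL Server implementation or their phylogeny-weighted measure.

```python
# Toy covariation score: mutual information between two alignment columns.
import math
from collections import Counter

def mutual_information(col_i, col_j):
    n = len(col_i)
    pi, pj = Counter(col_i), Counter(col_j)
    pij = Counter(zip(col_i, col_j))
    return sum((c / n) * math.log2((c / n) / ((pi[a] / n) * (pj[b] / n)))
               for (a, b), c in pij.items())

col1 = list("AAGGCCUU")
col2 = list("UUCCGGAA")  # varies in lockstep, as a Watson-Crick pair would
print(mutual_information(col1, col2))  # high MI suggests an interaction
```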
Fourment, Mathieu; Gibbs, Mark J
2008-01-01
Background: Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments, so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. Results: The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. Conclusion: VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically. PMID:18251994
MetReS, an Efficient Database for Genomic Applications.
Vilaplana, Jordi; Alves, Rui; Solsona, Francesc; Mateo, Jordi; Teixidó, Ivan; Pifarré, Marc
2018-02-01
MetReS (Metabolic Reconstruction Server) is a genomic database that is shared between two software applications that address important biological problems. Biblio-MetReS is a data-mining tool that enables the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the processes of interest and their function. The main goal of this work was to identify the areas where the performance of the MetReS database could be improved and to test whether this improvement would scale to larger datasets and more complex types of analysis. The study was started with a relational database, MySQL, which is the current database server used by the applications. We also tested the performance of an alternative data-handling framework, Apache Hadoop, which is currently used for large-scale data processing. We found that this data-handling framework is likely to greatly improve the efficiency of the MetReS applications as the dataset and the processing needs increase by several orders of magnitude, as is expected to happen in the near future.
Ultrabroadband photonic internet: safety aspects
NASA Astrophysics Data System (ADS)
Kalicki, Arkadiusz; Romaniuk, Ryszard
2008-11-01
Web applications have become the most popular medium on the Internet. The popularity and ease of use of web application frameworks, together with careless development, result in a large number of vulnerabilities and attacks. Several types of attacks are possible because of improper input validation. SQL injection is the ability to execute arbitrary SQL queries in a database through an existing application. Cross-site scripting is a vulnerability that allows malicious web users to inject code into the web pages viewed by other users. Cross-Site Request Forgery (CSRF) is an attack that tricks the victim into loading a page that contains a malicious request. Web spam in blogs is a further concern. There are several techniques to mitigate these attacks. Most important are strong web application design, correct input validation, defined data types for each field, and parameterized statements in SQL queries, as shown in the sketch below. Server hardening with a firewall, modern security policy systems and safe web framework interpreter configuration are essential. It is advisable to maintain a proper security level on the client side, keep software updated and install personal web firewalls or IDS/IPS systems. Good habits are logging out from services just after finishing work and using a separate web browser for the most important sites, such as e-banking.
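The difference between string-built SQL and a parameterized statement, the key mitigation named above, is easy to demonstrate. The sketch uses Python's standard-library sqlite3 purely for illustration; the table and payload are invented.

```python
# Vulnerable vs. parameterized query, side by side.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cr3t')")

user_input = "' OR '1'='1"  # classic injection payload

# VULNERABLE: the payload rewrites the WHERE clause and leaks every row.
bad = conn.execute(
    "SELECT * FROM users WHERE name = '" + user_input + "'").fetchall()

# SAFE: the driver binds the value, so the payload is treated as plain data.
good = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()

print(bad)   # [('alice', 's3cr3t')] -- injected
print(good)  # [] -- no user literally named "' OR '1'='1"
```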
The VERITAS Facility: A Virtual Environment Platform for Human Performance Research
2016-01-01
IAE: The IAE supports the audio environment that users experience during the course of an experiment. This includes environmental sounds, user-to-... In the future, we are looking towards a database-based system that would use MySQL or an equivalent product to store the large data sets and provide standard...
Teaching Undergraduate Software Engineering Using Open Source Development Tools
2012-01-01
...ware. Some example appliances are: a LAMP stack, Redmine, a MySQL database, Moodle, Tomcat on Apache, and Bugzilla. Some of the important features... Ada, C, C++, PHP, Python, etc., and also supports a wide range of SDKs such as Google's Android SDK and the Google Web Toolkit SDK. Additionally...
78 FR 55699 - Privacy Act of 1974; Proposed New Systems of Records
Federal Register 2010, 2011, 2012, 2013, 2014
2013-09-11
... FMC OIT staff at its Washington, DC headquarters. The FMC GSS is made up of servers, switches, gateways, and two firewall devices. The servers, switches, gateways, and firewall devices are physically... within the confines of FMC-39, FMC General Support System (FMC GSS) and FMC-41, FMC SQL Database (FMCDB...
Oracle Applications Patch Administration Tool (PAT) Beta Version
DOE Office of Scientific and Technical Information (OSTI.GOV)
2002-01-04
PAT is a Patch Administration Tool that provides analysis, tracking, and management of Oracle Application patches. Its capabilities are outlined below.
Patch Analysis & Management:
- Administration: patch data maintenance -- track which Oracle Application patches have been applied to which database instance and machine.
- Patch analysis: capture text files (readme.txt and driver files); form comparison detail; report comparison detail; PL/SQL package comparison detail; SQL scripts detail; JSP module comparison detail; parse and load the current applptch.txt (10.7) or load patch data from the Oracle Application database patch tables (11i).
- Display analysis: compare the patch to be applied with the currently installed Oracle Application appl_top code versions; patch detail and module comparison detail; analyze and display one Oracle Application module patch.
- Patch management: automatic queueing and execution of patches.
Administration:
- Parameter maintenance -- settings for the directory structure of the Oracle Application appl_top.
- Validation data maintenance -- machine names and instances to patch.
Operation:
- Patch data maintenance: schedule a patch (queue for later execution); run a patch (queue for immediate execution); review the patch logs.
- Patch management reports.
WebCN: A web-based computation tool for in situ-produced cosmogenic nuclides
NASA Astrophysics Data System (ADS)
Ma, Xiuzeng; Li, Yingkui; Bourgeois, Mike; Caffee, Marc; Elmore, David; Granger, Darryl; Muzikar, Paul; Smith, Preston
2007-06-01
Cosmogenic nuclide techniques are increasingly being utilized in geoscience research. For this, it is critical to establish an effective, easily accessible and well-defined tool for cosmogenic nuclide computations. We have been developing a web-based tool (WebCN) to calculate surface exposure ages and erosion rates based on the nuclide concentrations measured by accelerator mass spectrometry. WebCN for 10Be and 26Al has been completed and published at http://www.physics.purdue.edu/primelab/for_users/rockage.html. WebCN for 36Cl is under construction. WebCN is designed as a three-tier client/server model and uses the open-source PostgreSQL for database management and PHP for the interface design and calculations. On the client side, an internet browser and Microsoft Access are used as application interfaces to access the system. Open Database Connectivity is used to link PostgreSQL and Microsoft Access. WebCN accounts for both the spatial and temporal distributions of the cosmic ray flux to calculate the production rates of in situ-produced cosmogenic nuclides at the Earth's surface.
Multiple-Feature Extracting Modules Based Leak Mining System Design
Cho, Ying-Chiang; Pan, Jen-Yi
2013-01-01
Over the years, human dependence on the Internet has increased dramatically. A large amount of information is placed on the Internet and retrieved from it daily, which makes web security in terms of online information a major concern. In recent years, the most problematic issues in web security have been e-mail address leakage and SQL injection attacks. There are many possible causes of information leakage, such as inadequate precautions during the programming process, which lead to the leakage of e-mail addresses entered online or insufficient protection of database information, a loophole that enables malicious users to steal online content. In this paper, we implement a crawler mining system that is equipped with SQL injection vulnerability detection, by means of an algorithm developed for the web crawler. In addition, we analyze portal sites of the governments of various countries or regions in order to investigate the information leaking status of each site. Subsequently, we analyze the database structure and content of each site, using the data collected. Thus, we make use of practical verification in order to focus on information security and privacy through black-box testing. PMID:24453892
Ontology-based data integration between clinical and research systems.
Mate, Sebastian; Köpcke, Felix; Toddenroth, Dennis; Martin, Marcus; Prokosch, Hans-Ulrich; Bürkle, Thomas; Ganslandt, Thomas
2015-01-01
Data from the electronic medical record comprise numerous structured but uncoded elements, which are not linked to standard terminologies. Reuse of such data for secondary research purposes has recently gained in importance. However, the identification of relevant data elements and the creation of database jobs for extraction, transformation and loading (ETL) are challenging: with current methods such as data warehousing, it is not feasible to efficiently maintain and reuse semantically complex data extraction and transformation routines. We present an ontology-supported approach to overcome this challenge by making use of abstraction: instead of defining ETL procedures at the database level, we use ontologies to organize and describe the medical concepts of both the source system and the target system. Instead of using unique, specifically developed SQL statements or ETL jobs, we define declarative transformation rules within ontologies and illustrate how these constructs can then be used to automatically generate SQL code to perform the desired ETL procedures. This demonstrates how a suitable level of abstraction may not only aid the interpretation of clinical data, but can also foster the reutilization of methods for unlocking it.
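A toy version of this abstraction can make the idea concrete: mapping rules are kept as data (here plain dictionaries standing in for ontology annotations), and the ETL SQL is generated from them instead of being hand-written. Everything below, table names, rule format and transform syntax, is invented for illustration and is not the authors' system.

```python
# Declarative mapping rules -> generated ETL SQL (illustrative only).
RULES = [
    {"source": "emr.labs",   "source_col": "gluc",
     "target_col": "glucose_mgdl",
     "transform": "CAST({col} AS DECIMAL(6,1))"},
    {"source": "emr.vitals", "source_col": "sbp",
     "target_col": "systolic_bp",
     "transform": "{col}"},
]

def generate_etl_sql(rules, target="research.observations"):
    """Turn each declarative rule into one INSERT ... SELECT statement."""
    statements = []
    for rule in rules:
        expr = rule["transform"].format(col=rule["source_col"])
        statements.append(
            f"INSERT INTO {target} ({rule['target_col']}) "
            f"SELECT {expr} FROM {rule['source']};")
    return statements

for stmt in generate_etl_sql(RULES):
    print(stmt)
```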
DOE Office of Scientific and Technical Information (OSTI.GOV)
Valassi, A.; Clemencic, M.; Dykstra, D.
The Persistency Framework consists of three software packages (CORAL, COOL and POOL) addressing the data access requirements of the LHC experiments in different areas. It is the result of the collaboration between the CERN IT Department and the three experiments (ATLAS, CMS and LHCb) that use this software to access their data. POOL is a hybrid technology store for C++ objects, metadata catalogs and collections. CORAL is a relational database abstraction layer with an SQL-free API. COOL provides specific software tools and components for the handling of conditions data. This paper reports on the status and outlook of the project and reviews in detail the usage of each package in the three experiments.
Visualization for genomics: the Microbial Genome Viewer.
Kerkhoven, Robert; van Enckevort, Frank H J; Boekhorst, Jos; Molenaar, Douwe; Siezen, Roland J
2004-07-22
A Web-based visualization tool, the Microbial Genome Viewer, is presented that allows the user to combine complex genomic data in a highly interactive way. This Web tool enables the interactive generation of chromosome wheels and linear genome maps from genome annotation data stored in a MySQL database. The generated images are in scalable vector graphics (SVG) format, which is suitable for creating high-quality scalable images and dynamic Web representations. Gene-related data such as transcriptome and time-course microarray experiments can be superimposed on the maps for visual inspection. The Microbial Genome Viewer 1.0 is freely available at http://www.cmbi.kun.nl/MGV
DB-PABP: a database of polyanion-binding proteins
Fang, Jianwen; Dong, Yinghua; Salamat-Miller, Nazila; Russell Middaugh, C.
2008-01-01
The interactions between polyanions (PAs) and polyanion-binding proteins (PABPs) have been found to play significant roles in many essential biological processes including intracellular organization, transport and protein folding. Furthermore, many neurodegenerative disease-related proteins are PABPs. Thus, a better understanding of PA/PABP interactions may not only enhance our understanding of biological systems but also provide new clues to these deadly diseases. The literature in this field is widely scattered, suggesting the need for a comprehensive and searchable database of PABPs. The DB-PABP is a comprehensive, manually curated and searchable database of experimentally characterized PABPs. It is freely available and can be accessed online at http://pabp.bcf.ku.edu/DB_PABP/. The DB-PABP was implemented as a MySQL relational database. An interactive web interface was created using Java Server Pages (JSP). The search page of the database is organized into a main search form and a section for utilities. The main search form enables custom searches via four menus: protein names, polyanion names, the source species of the proteins and the methods used to discover the interactions. Available utilities include a commonality matrix, a function of listing PABPs by the number of interacting polyanions and a string search for author surnames. The DB-PABP is maintained at the University of Kansas. We encourage users to provide feedback and submit new data and references. PMID:17916573
NASA Astrophysics Data System (ADS)
Celicourt, P.; Piasecki, M.
2014-12-01
The high cost of hydro-meteorological data acquisition, communication and publication systems, along with limited qualified human resources, is considered the main reason why hydro-meteorological data collection remains a challenge, especially in developing countries. Despite significant advances in sensor network technologies, which in the last two decades gave birth to open hardware and software and to low-cost (less than $50), low-power (on the order of a few milliwatts) sensor platforms, sensor and sensor network deployment remains a labor-intensive, time-consuming, cumbersome, and thus expensive task. These factors give rise to the need for an affordable, simple-to-deploy, scalable and self-organizing end-to-end (from sensor to publication) system suitable for deployment in such countries. The envisioned system will consist of a few Sensed-And-Programmed Arduino-based sensor nodes, with low-cost sensors measuring parameters relevant to hydrological processes, and a Raspberry Pi micro-computer hosting the in-the-field back-end data management. The latter comprises the Python/Django model of the CUAHSI Observations Data Model (ODM), namely DjangODM, backed by a PostgreSQL database server. We are also developing a Python-based data processing script which will be paired with the data autoloading capability of Django to populate the DjangODM database with the incoming data. To publish the data, WOFpy (WaterOneFlow Web Services in Python), developed by the Texas Water Development Board for 'Water Data for Texas', will be used; it can produce WaterML web services from a variety of back-end database installations such as SQLite, MySQL, and PostgreSQL. A step further would be the development of an appealing online visualization tool using Python statistics and analytics tools (SciPy, NumPy, Pandas) showing the spatial distribution of variables across an entire watershed as a time-variant layer on top of a basemap.
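What "the Python/Django model of the CUAHSI ODM" might look like can be sketched with two of the ODM's core tables. The field names follow common ODM conventions (Sites, DataValues), but this is an assumption for illustration, not the actual DjangODM code.

```python
# Hedged sketch of two CUAHSI-ODM-style tables as Django models.
from django.db import models

class Site(models.Model):
    site_code = models.CharField(max_length=50, unique=True)
    latitude = models.FloatField()
    longitude = models.FloatField()

class DataValue(models.Model):
    site = models.ForeignKey(Site, on_delete=models.CASCADE)
    variable_code = models.CharField(max_length=50)   # e.g. "precip_mm"
    local_date_time = models.DateTimeField()
    data_value = models.FloatField()

    class Meta:
        # One value per site/variable/timestamp; Django maps this to a
        # composite unique index in PostgreSQL.
        unique_together = ("site", "variable_code", "local_date_time")
```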
Oliveira, S R M; Almeida, G V; Souza, K R R; Rodrigues, D N; Kuser-Falcão, P R; Yamagishi, M E B; Santos, E H; Vieira, F D; Jardine, J G; Neshich, G
2007-10-05
An effective strategy for managing protein databases is to provide mechanisms to transform raw data into consistent, accurate and reliable information. Such mechanisms will greatly reduce operational inefficiencies and improve one's ability to better handle scientific objectives and interpret the research results. To achieve this challenging goal for the STING project, we introduce Sting_RDB, a relational database of structural parameters for protein analysis with support for data warehousing and data mining. In this article, we highlight the main features of Sting_RDB and show how a user can explore it for efficient and biologically relevant queries. Considering its importance for molecular biologists, effort has been made to advance Sting_RDB toward data quality assessment. To the best of our knowledge, Sting_RDB is one of the most comprehensive data repositories for protein analysis, now also capable of providing its users with a data quality indicator. This paper differs from our previous study in many aspects. First, we introduce Sting_RDB, a relational database with mechanisms for efficient and relevant queries using SQL. Sting_RDB evolved from the earlier, text (flat file)-based database, in which data consistency and integrity were not guaranteed. Second, we provide support for data warehousing and mining. Third, a data quality indicator was introduced. Finally, and probably most importantly, complex queries that could not be posed on a text-based database are now easily implemented. Further details are accessible at the Sting_RDB demo web page: http://www.cbi.cnptia.embrapa.br/StingRDB.
Informatics in radiology: use of CouchDB for document-based storage of DICOM objects.
Rascovsky, Simón J; Delgado, Jorge A; Sanz, Alexander; Calvo, Víctor D; Castrillón, Gabriel
2012-01-01
Picture archiving and communication systems traditionally have depended on schema-based Structured Query Language (SQL) databases for imaging data management. To optimize database size and performance, many such systems store a reduced set of Digital Imaging and Communications in Medicine (DICOM) metadata, discarding informational content that might be needed in the future. As an alternative to traditional database systems, document-based key-value stores recently have gained popularity. These systems store documents containing key-value pairs that facilitate data searches without predefined schemas. Document-based key-value stores are especially suited to archive DICOM objects because DICOM metadata are highly heterogeneous collections of tag-value pairs conveying specific information about imaging modalities, acquisition protocols, and vendor-supported postprocessing options. The authors used an open-source document-based database management system (Apache CouchDB) to create and test two such databases; CouchDB was selected for its overall ease of use, capability for managing attachments, and reliance on HTTP and Representational State Transfer standards for accessing and retrieving data. A large database was created first in which the DICOM metadata from 5880 anonymized magnetic resonance imaging studies (1,949,753 images) were loaded by using a Ruby script. To provide the usual DICOM query functionality, several predefined "views" (standard queries) were created by using JavaScript. For performance comparison, the same queries were executed in both the CouchDB database and a SQL-based DICOM archive. The capabilities of CouchDB for attachment management and database replication were separately assessed in tests of a similar, smaller database. Results showed that CouchDB allowed efficient storage and interrogation of all DICOM objects; with the use of information retrieval algorithms such as map-reduce, all the DICOM metadata stored in the large database were searchable with only a minimal increase in retrieval time over that with the traditional database management system. Results also indicated possible uses for document-based databases in data mining applications such as dose monitoring, quality assurance, and protocol optimization. RSNA, 2012
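The pattern the authors describe, one schema-free document per DICOM object plus JavaScript map views for the usual queries, can be sketched against CouchDB's plain HTTP API. The database name, tags kept and view below are illustrative only, not the study's actual design documents.

```python
# Sketch: DICOM metadata as CouchDB documents, queried via a map view.
import requests

COUCH = "http://admin:secret@localhost:5984"
requests.put(f"{COUCH}/dicom")  # create the database

# One document per DICOM object: arbitrary tag-value pairs, no fixed schema.
requests.post(f"{COUCH}/dicom", json={
    "SOPInstanceUID": "1.2.840.99999.1.2.3.4",   # fake UID for illustration
    "Modality": "MR",
    "StudyDate": "20120101",
    "ProtocolName": "T1_MPRAGE",
})

# A design document holding a JavaScript map view indexed by modality.
requests.put(f"{COUCH}/dicom/_design/queries", json={
    "views": {"by_modality": {
        "map": "function(doc){ emit(doc.Modality, doc.SOPInstanceUID); }"}}})

hits = requests.get(f"{COUCH}/dicom/_design/queries/_view/by_modality",
                    params={"key": '"MR"'}).json()
print(hits["rows"])
```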
S&MPO - An information system for ozone spectroscopy on the WEB
NASA Astrophysics Data System (ADS)
Babikov, Yurii L.; Mikhailenko, Semen N.; Barbe, Alain; Tyuterev, Vladimir G.
2014-09-01
Spectroscopy and Molecular Properties of Ozone ("S&MPO") is an Internet-accessible information system devoted to high-resolution spectroscopy of the ozone molecule, related properties and data sources. S&MPO contains information on original spectroscopic data (line positions, line intensities, energies, transition moments, spectroscopic parameters) recovered from comprehensive analyses and modeling of experimental spectra, as well as associated software for data representation written in PHP, JavaScript, C++ and FORTRAN. The line-by-line list of vibration-rotation transitions and other information is organized as a relational database under the control of MySQL database tools. The main S&MPO goal is to provide access to all available information on vibration-rotation molecular states and transitions under extended conditions, based on extrapolations of laboratory measurements using validated theoretical models. Applications of S&MPO may include: education/training in molecular physics, radiative processes and laser physics; spectroscopic applications (analysis, Fourier transform spectroscopy, atmospheric optics, optical standards, spectroscopic atlases); environmental studies and atmospheric physics (remote sensing); data supply for specific databases; and photochemistry (laser excitation, multiphoton processes). The system is accessible via the Internet on two sites: http://smpo.iao.ru and http://smpo.univ-reims.fr.
Yu, Kebing; Salomon, Arthur R
2009-12-01
Recently, dramatic progress has been achieved in expanding the sensitivity, resolution, mass accuracy, and scan rate of mass spectrometers able to fragment and identify peptides through MS/MS. Unfortunately, this enhanced ability to acquire proteomic data has not been accompanied by a concomitant increase in the availability of flexible tools allowing users to rapidly assimilate, explore, and analyze this data and adapt to various experimental workflows with minimal user intervention. Here we fill this critical gap by providing a flexible relational database called PeptideDepot for organization of expansive proteomic data sets, collation of proteomic data with available protein information resources, and visual comparison of multiple quantitative proteomic experiments. Our software design, built upon the synergistic combination of a MySQL database for safe warehousing of proteomic data with a FileMaker-driven graphical user interface for flexible adaptation to diverse workflows, enables proteomic end-users to directly tailor the presentation of proteomic data to the unique analysis requirements of the individual proteomics lab. PeptideDepot may be deployed as an independent software tool or integrated directly with our high throughput autonomous proteomic pipeline used in the automated acquisition and post-acquisition analysis of proteomic data.
A Database-Based and Web-Based Meta-CASE System
NASA Astrophysics Data System (ADS)
Eessaar, Erki; Sgirka, Rünno
Each Computer Aided Software Engineering (CASE) system provides support to a software process or to specific tasks or activities that are part of a software process. Each meta-CASE system allows us to create new CASE systems. The creators of a new CASE system have to specify the abstract syntax of the language that is used in the system, as well as the functionality and non-functional properties of the new system. Many meta-CASE systems record their data directly in files. In this paper, we introduce a meta-CASE system whose enabling technology is an object-relational database management system (ORDBMS). The system allows users to manage specifications of languages and to create models by using these languages. The system has a web-based, form-based user interface. We have created a proof-of-concept prototype of the system by using the PostgreSQL ORDBMS and the PHP scripting language.
Exploring No-SQL alternatives for ALMA monitoring system
NASA Astrophysics Data System (ADS)
Shen, Tzu-Chiang; Soto, Ruben; Merino, Patricio; Peña, Leonel; Bartsch, Marcelo; Aguirre, Alvaro; Ibsen, Jorge
2014-07-01
The Atacama Large Millimeter/submillimeter Array (ALMA) will be a unique research instrument composed of at least 66 reconfigurable high-precision antennas, located on the Chajnantor plain in the Chilean Andes at an elevation of 5000 m. This paper describes the experience gained after several years working with the monitoring system, which has a strong requirement of collecting and storing up to 150K variables at a maximum sampling rate of 20.8 kHz. The original design was built on top of a cluster of relational database servers and network-attached storage with a Fibre Channel interface. As the number of monitoring points increased with the number of antennas included in the array, the monitoring system proved able to handle the increased data rate in the collection and storage area (only one month of data), but the data query interface showed serious performance degradation. A solution based on a NoSQL platform was explored as an alternative to the long-term storage system. Among several alternatives, MongoDB was selected. In the data flow, intermediate cache servers based on Redis were introduced to allow faster streaming of the most recently acquired data to web-based charts and applications for online data analysis.
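A minimal Python sketch of the two-tier flow this abstract describes, under stated assumptions: the collection name (alma_mon.samples), the Redis key layout (recent:<antenna>:<point>) and the retention depth are invented for illustration, not taken from the ALMA system.
    import time
    import redis
    from pymongo import MongoClient

    mongo = MongoClient("localhost", 27017)
    samples = mongo.alma_mon.samples          # MongoDB: durable long-term store
    cache = redis.Redis(host="localhost", port=6379)

    def record(antenna, point, value):
        doc = {"antenna": antenna, "point": point, "value": value, "t": time.time()}
        samples.insert_one(doc)               # persist for later queries
        key = f"recent:{antenna}:{point}"
        cache.lpush(key, value)               # newest sample first
        cache.ltrim(key, 0, 999)              # keep only the last 1000 samples

    record("DV01", "cryostat_temp", 4.2)
    # Web charts read the hot window from Redis, never touching MongoDB:
    print(cache.lrange("recent:DV01:cryostat_temp", 0, 9))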
The Mayak Worker Dosimetry System (MWDS-2013): Implementation of the Dose Calculations.
Zhdanov, A; Vostrotin, V; Efimov, A; Birchall, A; Puncher, M
2016-07-15
The calculation of internal doses for the Mayak Worker Dosimetry System (MWDS-2013) involved extensive computational resources due to the complexity and sheer number of calculations required. The required output consisted of a set of 1000 hyper-realizations: each hyper-realization consists of a set (one for each worker) of probability distributions of organ doses. This report describes the hardware components and computational approaches required to make the calculation tractable. Together with the software, this system is referred to here as the 'PANDORA system'. It is based on a commercial SQL Server database distributed across six workstations. A complete run of the entire Mayak worker cohort entailed an enormous number of calculations in PANDORA and, due to the relatively slow speed of writing the data into the SQL Server, each run took about 47 days. Quality control was monitored by comparing doses calculated in PANDORA with those from a specially modified version of the commercial software 'IMBA Professional Plus'. Suggestions are also made for increasing calculation and storage efficiency in future dosimetry calculations using PANDORA. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
A high-speed drug interaction search system for ease of use in the clinical environment.
Takada, Masahiro; Inada, Hiroshi; Nakazawa, Kazuo; Tani, Shoko; Iwata, Michiaki; Sugimoto, Yoshihisa; Nagata, Satoru
2012-12-01
With the advancement of pharmaceutical development, drug interactions have become increasingly complex. As a result, a computer-based drug interaction search system is required to organize the full body of drug interaction data. To overcome problems faced with existing systems, we developed a drug interaction search system using a hash table, which offers higher processing speeds and easier maintenance operations compared with relational databases (RDB). In order to compare the performance of our system and a MySQL RDB in terms of search speed, drug interaction searches were repeated for all 45 possible combinations of two out of a group of 10 drugs, for two data sets containing 5,604 and 56,040 drug interaction records. As the principal result, our system was able to process the search approximately 19 times faster than the system using the MySQL RDB. Our system also has several other merits, such as that drug interaction data can be created in comma-separated value (CSV) format, thereby facilitating data maintenance. Although our system uses the well-known method of a hash table, it is expected to resolve problems common to existing systems and to be an effective system that enables the safe management of drugs.
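A minimal sketch of the hash-table approach described above, assuming a hypothetical interactions.csv file and column layout: records are loaded once into an in-memory dict keyed by the unordered drug pair, so each lookup is a constant-time probe rather than an RDB query.
    import csv
    from itertools import combinations

    interactions = {}
    with open("interactions.csv", newline="", encoding="utf-8") as f:
        for drug_a, drug_b, severity, note in csv.reader(f):
            # frozenset makes the key order-independent: (a, b) == (b, a)
            interactions[frozenset((drug_a, drug_b))] = (severity, note)

    def check(drug_a, drug_b):
        return interactions.get(frozenset((drug_a, drug_b)))

    # The paper's benchmark shape: all 45 pairings of 10 drugs.
    drugs = [f"drug{i}" for i in range(10)]
    results = {(a, b): check(a, b) for a, b in combinations(drugs, 2)}
Keeping the source of truth in CSV, as the authors note, also means maintenance is a text edit followed by a reload rather than a schema change.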
Harris, Eric S J; Erickson, Sean D; Tolopko, Andrew N; Cao, Shugeng; Craycroft, Jane A; Scholten, Robert; Fu, Yanling; Wang, Wenquan; Liu, Yong; Zhao, Zhongzhen; Clardy, Jon; Shamu, Caroline E; Eisenberg, David M
2011-05-17
Ethnobotanically driven drug-discovery programs include data related to many aspects of the preparation of botanical medicines, from initial plant collection to chemical extraction and fractionation. The Traditional Medicine Collection Tracking System (TM-CTS) was created to organize and store data of this type for an international collaborative project involving the systematic evaluation of commonly used Traditional Chinese Medicinal plants. The system was developed using domain-driven design techniques, and is implemented using Java, Hibernate, PostgreSQL, Business Intelligence and Reporting Tools (BIRT), and Apache Tomcat. The TM-CTS relational database schema contains over 70 data types, comprising over 500 data fields. The system incorporates a number of unique features that are useful in the context of ethnobotanical projects such as support for information about botanical collection, method of processing, quality tests for plants with existing pharmacopoeia standards, chemical extraction and fractionation, and historical uses of the plants. The database also accommodates data provided in multiple languages and integration with a database system built to support high throughput screening based drug discovery efforts. It is accessed via a web-based application that provides extensive, multi-format reporting capabilities. This new database system was designed to support a project evaluating the bioactivity of Chinese medicinal plants. The software used to create the database is open source, freely available, and could potentially be applied to other ethnobotanically driven natural product collection and drug-discovery programs. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Harris, Eric S. J.; Erickson, Sean D.; Tolopko, Andrew N.; Cao, Shugeng; Craycroft, Jane A.; Scholten, Robert; Fu, Yanling; Wang, Wenquan; Liu, Yong; Zhao, Zhongzhen; Clardy, Jon; Shamu, Caroline E.; Eisenberg, David M.
2011-01-01
Aim of the study. Ethnobotanically-driven drug-discovery programs include data related to many aspects of the preparation of botanical medicines, from initial plant collection to chemical extraction and fractionation. The Traditional Medicine-Collection Tracking System (TM-CTS) was created to organize and store data of this type for an international collaborative project involving the systematic evaluation of commonly used Traditional Chinese Medicinal plants. Materials and Methods. The system was developed using domain-driven design techniques, and is implemented using Java, Hibernate, PostgreSQL, Business Intelligence and Reporting Tools (BIRT), and Apache Tomcat. Results. The TM-CTS relational database schema contains over 70 data types, comprising over 500 data fields. The system incorporates a number of unique features that are useful in the context of ethnobotanical projects such as support for information about botanical collection, method of processing, quality tests for plants with existing pharmacopoeia standards, chemical extraction and fractionation, and historical uses of the plants. The database also accommodates data provided in multiple languages and integration with a database system built to support high throughput screening based drug discovery efforts. It is accessed via a web-based application that provides extensive, multi-format reporting capabilities. Conclusions. This new database system was designed to support a project evaluating the bioactivity of Chinese medicinal plants. The software used to create the database is open source, freely available, and could potentially be applied to other ethnobotanically-driven natural product collection and drug-discovery programs. PMID:21420479
Sakurai, Nozomu; Ara, Takeshi; Kanaya, Shigehiko; Nakamura, Yukiko; Iijima, Yoko; Enomoto, Mitsuo; Motegi, Takeshi; Aoki, Koh; Suzuki, Hideyuki; Shibata, Daisuke
2013-01-15
High-accuracy mass values detected by high-resolution mass spectrometry analysis enable prediction of elemental compositions, and thus are used for metabolite annotations in metabolomic studies. Here, we report an application of a relational database that significantly improves the rate of elemental composition predictions. By searching a database of pre-calculated elemental compositions with fixed kinds and numbers of atoms, the approach eliminates the redundant evaluations of the same formula that occur in repeated calculations with other tools. When our approach is compared with HR2, which is one of the fastest tools available, our database search times were at least 109 times shorter than those of HR2. When a solid-state drive (SSD) was applied, the search time was 488 times shorter at 5 ppm mass tolerance and 1833 times shorter at 0.1 ppm. Even when the search with HR2 was performed with 8 threads on a high-spec Windows 7 PC, the database search times were at least 26 and 115 times shorter without and with the SSD. These improvements were enhanced on a low-spec Windows XP PC. We constructed a web service 'MFSearcher' to query the database in a RESTful manner. Available for free at http://webs2.kazusa.or.jp/mfsearcher. The web service is implemented in Java, MySQL, Apache and Tomcat, with all major browsers supported. sakurai@kazusa.or.jp Supplementary data are available at Bioinformatics online.
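The pre-calculation idea can be sketched in a few lines (this is not MFSearcher's actual schema, just an illustration): exact masses are computed and indexed once, and a tolerance query becomes an index range scan instead of a fresh combinatorial enumeration.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE formula (composition TEXT, mass REAL)")
    db.execute("CREATE INDEX idx_mass ON formula(mass)")
    db.executemany("INSERT INTO formula VALUES (?, ?)",
                   [("C6H12O6", 180.06339), ("C6H8O7", 192.02700)])

    def search(mass, ppm):
        tol = mass * ppm * 1e-6  # convert ppm tolerance to an absolute window
        return db.execute("SELECT composition, mass FROM formula "
                          "WHERE mass BETWEEN ? AND ?",
                          (mass - tol, mass + tol)).fetchall()

    print(search(180.0634, 5))  # all stored formulas within a 5 ppm window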
Building An Integrated Neurodegenerative Disease Database At An Academic Health Center
Xie, Sharon X.; Baek, Young; Grossman, Murray; Arnold, Steven E.; Karlawish, Jason; Siderowf, Andrew; Hurtig, Howard; Elman, Lauren; McCluskey, Leo; Van Deerlin, Vivianna; Lee, Virginia M.-Y.; Trojanowski, John Q.
2010-01-01
Background It is becoming increasingly important to study common and distinct etiologies, clinical and pathological features, and mechanisms related to neurodegenerative diseases such as Alzheimer's disease (AD), Parkinson's disease (PD), amyotrophic lateral sclerosis (ALS), and frontotemporal lobar degeneration (FTLD). These comparative studies rely on powerful database tools to quickly generate data sets which match diverse and complementary criteria set by the studies. Methods In this paper, we present a novel Integrated NeuroDegenerative Disease (INDD) database developed at the University of Pennsylvania (Penn) through a consortium of Penn investigators. Since these investigators work on AD, PD, ALS and FTLD, this allowed us to achieve the goal of developing an INDD database for these major neurodegenerative disorders. We used Microsoft SQL Server as the platform, with built-in "backwards" functionality to provide Access as a front-end client to interface with the database. We used PHP Hypertext Preprocessor to create the front-end web interface and then integrated the individual neurodegenerative disease databases using a master lookup table. We also present methods of data entry, database security, database backups, and database audit trails for this INDD database. Results We compare the results of a biomarker study using the INDD database with those obtained using an alternative approach of querying individual databases separately. Conclusions We have demonstrated that the Penn INDD database has the ability to query multiple database tables from a single console with high accuracy and reliability. The INDD database provides a powerful tool for generating data sets in comparative studies across several neurodegenerative diseases. PMID:21784346
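A hedged sketch of the master-lookup integration pattern described above (table and column names invented, SQLite standing in for SQL Server): each disease database keeps its local IDs, and a master table maps a single global INDD ID onto them so one query spans several databases.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE master (indd_id INTEGER, source TEXT, local_id INTEGER);
    CREATE TABLE ad (local_id INTEGER, mmse INTEGER);   -- Alzheimer's table
    CREATE TABLE pd (local_id INTEGER, updrs INTEGER);  -- Parkinson's table
    INSERT INTO master VALUES (1, 'ad', 101), (1, 'pd', 205);
    INSERT INTO ad VALUES (101, 24);
    INSERT INTO pd VALUES (205, 31);
    """)

    # One console-style query across both disease tables for INDD patient 1.
    row = db.execute("""
        SELECT m1.indd_id, ad.mmse, pd.updrs
        FROM master m1
        JOIN ad ON m1.source = 'ad' AND ad.local_id = m1.local_id
        JOIN master m2 ON m2.indd_id = m1.indd_id AND m2.source = 'pd'
        JOIN pd ON pd.local_id = m2.local_id
        WHERE m1.indd_id = 1""").fetchone()
    print(row)  # (1, 24, 31)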
NASA Astrophysics Data System (ADS)
Boichard, Jean-Luc; Brissebrat, Guillaume; Cloche, Sophie; Eymard, Laurence; Fleury, Laurence; Mastrorillo, Laurence; Moulaye, Oumarou; Ramage, Karim
2010-05-01
The AMMA project includes aircraft, ground-based and ocean measurements, intensive use of satellite data and diverse modelling studies. The AMMA database therefore aims at storing a great amount and a large variety of data, and at providing the data as rapidly and safely as possible to the AMMA research community. In order to stimulate the exchange of information and collaboration between researchers from different disciplines or using different tools, the database provides a detailed description of the products and uses standardized formats. The AMMA database contains: AMMA field campaign datasets; historical data in West Africa from 1850 (operational networks and previous scientific programs); satellite products from past and future satellites, (re-)mapped on a regular latitude/longitude grid and stored in NetCDF format (CF Convention); and model outputs from atmosphere or ocean operational (re-)analyses and forecasts, and from research simulations, processed in the same way as the satellite products. Before accessing the data, every user has to sign the AMMA data and publication policy. This charter only covers the use of data in the framework of scientific objectives and categorically excludes the redistribution of data to third parties and usage for commercial applications. Some collaboration between data producers and users, and mention of the AMMA project in any publication, is also required. The AMMA database and the associated online tools have been fully developed and are managed by two teams in France (IPSL Database Centre, Paris and OMP, Toulouse). Users can access the data of both data centres through a single web portal. The website is composed of several modules: registration (forms to register and to read and sign the data use charter on a first visit); a data access interface (a user-friendly tool for building a data extraction request by selecting various criteria such as location, time and parameters, covering local, satellite and model data); and documentation (a catalogue of all available data and their metadata). These tools have been developed using standard, free languages and software: a Linux system with an Apache web server and a Tomcat application server; J2EE tools (JSF and Struts frameworks, Hibernate); the PostgreSQL and MySQL relational database management systems; and an OpenLDAP directory. In order to facilitate access to the data by African scientists, the complete system has been mirrored at the AGRHYMET Regional Centre in Niamey and has been operational there since January 2009. Users can now access metadata and request data through either of two equivalent portals: http://database.amma-international.org or http://amma.agrhymet.ne/amma-data.
Development of a Prototype Detailing Management System for the Civil Engineer Corps
2002-09-01
[Indexed excerpt] Only front matter survives: a figure caption ("Figure 30. Associate Members To Billets SQL Statement") and acronym-list entries (ASCII: American Standard Code for Information Interchange; EMPRS: Electronic Military Personnel Record System; VBA: Visual Basic for Applications; SDLC), followed by a fragment noting that VBA, rather than macros, should be used for capturing keystrokes or carrying out a series of actions when opening an Access database.
2001-09-01
[Indexed excerpt] The MEIMS prototype was programmed in Microsoft Access 97 using Visual Basic for Applications (VBA) and had very little documentation. A truncated passage mentions the FAA and a version using Access 2000 as an interface with SQL Server as the database engine, followed by a survey fragment: "Question 1: Did you have any problems accessing the program? Y/N".
DOE Office of Scientific and Technical Information (OSTI.GOV)
Paulson, Patrick R.; Gibson, Tara D.; Schuchardt, Karen L.
2008-03-01
Requirements for the provenance store and access API are developed. Existing RDF stores and APIs are evaluated against the requirements and performance benchmarks. The team's conclusion is to use MySQL as a database backend, with a possible move to Oracle in the near-term future. Both the Jena and Sesame APIs will be supported, but new code will use the Jena API.
Web-Based Search and Plot System for Nuclear Reaction Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Otuka, N.; Nakagawa, T.; Fukahori, T.
2005-05-24
A web-based search and plot system for nuclear reaction data has been developed, covering experimental data in EXFOR format and evaluated data in ENDF format. The system is implemented for Linux OS, with Perl and MySQL used for CGI scripts and the database manager, respectively. Two prototypes for experimental and evaluated data are presented.
Named Entity Recognition in a Hungarian NL Based QA System
NASA Astrophysics Data System (ADS)
Tikkl, Domonkos; Szidarovszky, P. Ferenc; Kardkovacs, Zsolt T.; Magyar, Gábor
In the WoW project, our purpose is to create a complex search interface with the following features: search in the deep-web content of contracted partners' databases, processing of Hungarian natural language (NL) questions and their transformation into SQL queries for database access, and image search supported by a visual thesaurus that describes the visual content of images in a structural form (also in Hungarian). This paper primarily focuses on a particular problem of the question-processing task: entity recognition. Before going into details, we give a short overview of the project's aims.
A Framework for Cloudy Model Optimization and Database Storage
NASA Astrophysics Data System (ADS)
Calvén, Emilia; Helton, Andrew; Sankrit, Ravi
2018-01-01
We present a framework for producing Cloudy photoionization models of the nebular emission from novae ejecta and storing a subset of the results in an SQL database for later use. The database can be searched for the models best fitting observed spectral line ratios. Additionally, the framework includes an optimization feature that can be used in tandem with the database to search for and improve on models by creating new Cloudy models while varying the parameters. The database search and optimization can be used to explore the structures of nebulae by deriving their properties from the best-fit models. The goal is to provide the community with a large database of Cloudy photoionization models, generated from parameters reflecting conditions within novae ejecta, that can be easily fitted to observed spectral lines, either by directly accessing the database using the framework code or through a website specifically made for this purpose.
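The database-search step lends itself to a short illustration. The sketch below (schema, file and line-ratio names are assumptions, not taken from the framework) ranks stored models against observed line ratios by chi-square, which is the kind of best-fit lookup the abstract describes.
    import sqlite3

    db = sqlite3.connect("cloudy_models.db")  # hypothetical model database
    db.row_factory = sqlite3.Row

    observed = {"oiii_hb": 8.1, "nii_ha": 0.4}  # observed line ratios

    def chi2(model, sigma=0.1):
        # simple chi-square against the observed ratios, fixed fractional error
        return sum(((model[k] - v) / sigma) ** 2 for k, v in observed.items())

    models = db.execute("SELECT model_id, oiii_hb, nii_ha FROM models").fetchall()
    best = min(models, key=chi2)
    print(best["model_id"], chi2(best))
The optimizer described above would then take the best-fit row's parameters as a starting point and launch new Cloudy runs around them.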
An image database management system for conducting CAD research
NASA Astrophysics Data System (ADS)
Gruszauskas, Nicholas; Drukker, Karen; Giger, Maryellen L.
2007-03-01
The development of image databases for CAD research is not a trivial task. The collection and management of images and their related metadata from multiple sources is a time-consuming but necessary process. By standardizing and centralizing the methods by which these data are maintained, one can generate subsets of a larger database that match the specific criteria of a particular research project in a quick and efficient manner. A research-oriented management system of this type is highly desirable in a multi-modality CAD research environment. An online, web-based database system for the storage and management of research-specific medical image metadata was designed for use with four modalities of breast imaging: screen-film mammography, full-field digital mammography, breast ultrasound and breast MRI. The system was designed to consolidate data from multiple clinical sources and provide the user with the ability to anonymize the data. Input concerning the type of data to be stored as well as the desired searchable parameters was solicited from researchers in each modality. The backbone of the database was created using MySQL. A robust and easy-to-use interface for entering, removing, modifying and searching information in the database was created using HTML and PHP. This standardized system can be accessed using any modern web-browsing software and is fundamental to our various research projects on computer-aided detection, diagnosis, cancer risk assessment, multimodality lesion assessment, and prognosis. Our CAD database system stores large amounts of research-related metadata and successfully generates subsets of cases that match the user's desired search criteria.
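A minimal sketch of how such a search interface can turn user-selected criteria into a subset query; the table and column names are invented. Values are bound as parameters, and in a real system the column names would additionally be checked against a whitelist before being placed in the SQL string.
    def build_query(criteria):
        clauses, params = [], []
        for column, value in criteria.items():
            clauses.append(f"{column} = %s")  # %s placeholder, MySQL driver style
            params.append(value)
        where = " AND ".join(clauses) or "1"
        return f"SELECT case_id FROM images WHERE {where}", params

    sql, params = build_query({"modality": "ultrasound", "anonymized": 1})
    # cursor.execute(sql, params)  # e.g. with a MySQL driver such as mysqlclient
    print(sql, params)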
Design of Integrated Database on Mobile Information System: A Study of Yogyakarta Smart City App
NASA Astrophysics Data System (ADS)
Nurnawati, E. K.; Ermawati, E.
2018-02-01
An integration database is a database that acts as the data store for multiple applications and thus integrates data across these applications (in contrast to an application database). An integration database needs a schema that takes all of its client applications into account. The benefit of such a schema is that sharing data among applications does not require an extra layer of integration services on the applications; any change to data made in a single application becomes available to all applications at database commit time, keeping the applications' use of the data better synchronized. This study aims to design and build an integrated database that can be used by various applications on a mobile-device-based platform for a smart city system. The resulting database can be used by the various applications either together or separately. The design and development of the database emphasize flexibility, security, and completeness of the attributes that the applications to be built can share. The method used in this study is to choose an appropriate logical database structure (patterns of data), build the relational database models (database design), test the resulting design with prototype apps, and analyze system performance with test data. The integrated database can be used by both administrators and users in an integral and comprehensive platform, and it helps admins, managers, and operators manage the application easily and efficiently. The Android-based app is built on a dynamic client-server model in which data are extracted from an external MySQL database, so if data change in the database, the data in the Android applications also change. The app assists users in searching for information related to Yogyakarta (as a smart city), especially culture, government, hotels, and transportation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Poliakov, Alexander; Couronne, Olivier
2002-11-04
Aligning large vertebrate genomes that are structurally complex poses a variety of problems not encountered on smaller scales. Such genomes are rich in repetitive elements and contain multiple segmental duplications, which increases the difficulty of identifying true orthologous DNA segments in alignments. The sizes of the sequences make many alignment algorithms designed for comparing single proteins extremely inefficient when processing large genomic intervals. We integrated both local and global alignment tools and developed a suite of programs for automatically aligning large vertebrate genomes and identifying conserved non-coding regions in the alignments. Our method uses the BLAT local alignment program to find anchors on the base genome and identify regions of possible homology for a query sequence. These regions are post-processed to find the best candidates, which are then globally aligned using the AVID global alignment program. In the last step, conserved non-coding segments are identified using VISTA. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90% of known coding exons in the human genome. The GenomeVISTA software is a suite of Perl programs built on a MySQL database platform. The scheduler gets control data from the database, builds a queue of jobs, and dispatches them to a PC cluster for execution. The main program, running on each node of the cluster, processes individual sequences. A Perl library acts as an interface between the database and the above programs. The use of a separate library allows the programs to function independently of the database schema. The library also improves on the standard Perl MySQL database interface package by providing auto-reconnect functionality and improved error handling.
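The scheduler pattern in the last paragraph is easy to sketch; the control-table schema and the dispatch step below are placeholders (SQLite standing in for the MySQL back end), not the GenomeVISTA code itself.
    import sqlite3
    import queue

    db = sqlite3.connect("control.db")  # stands in for the MySQL control database
    jobs = queue.Queue()

    # Build a queue of pending alignment jobs from the control table.
    for seq_id, fasta_path in db.execute(
            "SELECT seq_id, path FROM jobs WHERE status = 'pending'"):
        jobs.put((seq_id, fasta_path))

    # Dispatch each job; on the real system this would submit to a cluster node.
    while not jobs.empty():
        seq_id, fasta_path = jobs.get()
        print(f"dispatch BLAT/AVID pipeline for {seq_id}: {fasta_path}")
        db.execute("UPDATE jobs SET status = 'running' WHERE seq_id = ?", (seq_id,))
    db.commit()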
CBS Genome Atlas Database: a dynamic storage for bioinformatic results and sequence data.
Hallin, Peter F; Ussery, David W
2004-12-12
Currently, new bacterial genomes are being published on a monthly basis. With the growing amount of genome sequence data, there is a demand for a flexible and easy-to-maintain structure for storing sequence data and the results of bioinformatic analysis. More than 150 sequenced bacterial genomes are now available, and comparisons of properties for taxonomically similar organisms are not readily available to many biologists. In addition to the most basic information, such as AT content, chromosome length, tRNA count and rRNA count, a large number of more complex calculations are needed to perform detailed comparative genomics. DNA structural calculations such as curvature and stacking energy, and DNA composition measures such as base skews, oligo skews and repeats at the local and global level, are just a few of the analyses presented on the CBS Genome Atlas web page. Complex analyses, changing methods and the frequent addition of new models are factors that require a dynamic database layout. Using basic tools like the GNU Make system, csh, Perl and MySQL, we have created a flexible database environment for storing and maintaining such results for a collection of complete microbial genomes. Currently, these results amount to more than 220 pieces of information. The backbone of this solution is a program package written in Perl, which enables administrators to synchronize and update the database content. The MySQL database has been connected to the CBS web server via PHP4 to present dynamic web content to users outside the center. This solution is tightly fitted to the existing server infrastructure, and the solutions proposed here can perhaps serve as a template for other research groups facing similar database issues. A web-based user interface that is dynamically linked to the Genome Atlas Database can be accessed via www.cbs.dtu.dk/services/GenomeAtlas/. This paper has a supplemental information page which links to the examples presented: www.cbs.dtu.dk/services/GenomeAtlas/suppl/bioinfdatabase.
Pan, Shiyang; Mu, Yuan; Wang, Hong; Wang, Tong; Huang, Peijun; Ma, Jianfeng; Jiang, Li; Zhang, Jie; Gu, Bing; Yi, Lujiang
2010-04-01
To meet the need to manage medical case information and biospecimens simultaneously, we developed a novel medical case information system integrated with biospecimen management. The database, established with MS SQL Server 2000, covered the basic information, clinical diagnosis, imaging diagnosis, pathological diagnosis and clinical treatment of patients; the physicochemical properties, inventory management and laboratory analysis of biospecimens; and user logs and data maintenance. The client application, developed in Visual C++ 6.0 and based on the client/server model, was used to implement medical case and biospecimen management. The system supports input, browsing, querying and summarization of cases and related biospecimen information, and can automatically synthesize case records from the database. It supports not only long-term follow-up of individual patients but also management of grouped cases organized according to research aims. This system can improve the efficiency and quality of clinical research in which biospecimens are used in a coordinated manner. It realizes synthesized and dynamic management of medical cases and biospecimens, and may be considered a new management platform.
The data quality analyzer: A quality control program for seismic data
NASA Astrophysics Data System (ADS)
Ringler, A. T.; Hagerty, M. T.; Holland, J.; Gonzales, A.; Gee, L. S.; Edwards, J. D.; Wilson, D.; Baker, A. M.
2015-03-01
The U.S. Geological Survey's Albuquerque Seismological Laboratory (ASL) has several initiatives underway to enhance and track the quality of data produced from ASL seismic stations and to improve communication about data problems to the user community. The Data Quality Analyzer (DQA) is one such development and is designed to characterize seismic station data quality in a quantitative and automated manner. The DQA consists of a metric calculator, a PostgreSQL database, and a Web interface: The metric calculator, SEEDscan, is a Java application that reads and processes miniSEED data and generates metrics based on a configuration file. SEEDscan compares hashes of metadata and data to detect changes in either and performs subsequent recalculations as needed. This ensures that the metric values are up to date and accurate. SEEDscan can be run as a scheduled task or on demand. The PostgreSQL database acts as a central hub where metric values and limited station descriptions are stored at the channel level with one-day granularity. The Web interface dynamically loads station data from the database and allows the user to make requests for time periods of interest, review specific networks and stations, plot metrics as a function of time, and adjust the contribution of various metrics to the overall quality grade of the station. The quantification of data quality is based on the evaluation of various metrics (e.g., timing quality, daily noise levels relative to long-term noise models, and comparisons between broadband data and event synthetics). Users may select which metrics contribute to the assessment and those metrics are aggregated into a "grade" for each station. The DQA is being actively used for station diagnostics and evaluation based on the completed metrics (availability, gap count, timing quality, deviation from a global noise model, deviation from a station noise model, coherence between co-located sensors, and comparison between broadband data and synthetics for earthquakes) on stations in the Global Seismographic Network and Advanced National Seismic System.
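The hash-comparison idea attributed to SEEDscan can be sketched as follows (file layout, digest choice and state file are assumptions, not the DQA implementation): a digest of each day's data is stored, and metrics are recomputed only when the digest changes.
    import hashlib
    import json
    import pathlib

    state_file = pathlib.Path("digests.json")
    state = json.loads(state_file.read_text()) if state_file.exists() else {}

    def needs_recalc(day_file: pathlib.Path) -> bool:
        # Hash the day's waveform file; a changed digest means the data changed.
        digest = hashlib.md5(day_file.read_bytes()).hexdigest()
        changed = state.get(str(day_file)) != digest
        state[str(day_file)] = digest
        return changed

    for f in sorted(pathlib.Path("miniseed").glob("*.mseed")):
        if needs_recalc(f):
            print("recompute metrics for", f)

    state_file.write_text(json.dumps(state))
This keeps the stored metric values up to date without rescanning unchanged days, which is what lets the system run either on a schedule or on demand.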
Updates to the Virtual Atomic and Molecular Data Centre
NASA Astrophysics Data System (ADS)
Hill, Christian; Tennyson, Jonathan; Gordon, Iouli E.; Rothman, Laurence S.; Dubernet, Marie-Lise
2014-06-01
The Virtual Atomic and Molecular Data Centre (VAMDC) has established a set of standards for the storage and transmission of atomic and molecular data and an SQL-based query language (VSS2) for searching online databases, known as nodes. The project has also created an online service, the VAMDC Portal, through which all of these databases may be searched and their results compared and aggregated. Since its inception four years ago, the VAMDC e-infrastructure has grown to encompass over 40 databases, including HITRAN, in more than 20 countries, and engages actively with scientists on six continents. Associated with the portal is a growing suite of software tools for the transformation of data from its native, XML-based XSAMS format to a range of more convenient human-readable (such as HTML) and machine-readable (such as CSV) formats. The relational database for HITRAN, created as part of the VAMDC project, is a flexible and extensible data model that is able to represent a wider range of parameters than the current fixed-format text-based one. Over the next year, a new online interface to this database will be tested, released and fully documented; this web application, HITRANonline, will fully replace the ageing and incomplete JavaHAWKS software suite.
Rallapalli, P M; Kemball-Cook, G; Tuddenham, E G; Gomez, K; Perkins, S J
2013-07-01
Factor IX (FIX) is important in the coagulation cascade, being activated to FIXa on cleavage. Defects in the human F9 gene frequently lead to hemophilia B. Our aim was to assess 1113 unique F9 mutations, corresponding to 3721 patient entries, in a new and up-to-date interactive web database alongside the FIXa protein structure. The mutation database was built using MySQL, and structural analyses were based on a homology model of the human FIXa structure derived from closely related crystal structures. Mutations have been found in 336 (73%) of the 461 residues in FIX. There were 812 unique point mutations, 182 deletions, 54 polymorphisms, 39 insertions and 26 others, which together comprise a total of 1113 unique variants. The 64 unique mild-severity mutations in the mature protein with known circulating protein phenotypes include 15 (23%) quantitative type I mutations and 41 (64%) predominantly qualitative type II mutations. Inhibitors were described in 59 reports (1.6%), corresponding to 25 unique mutations. The interactive database provides insights into the mechanisms of hemophilia B. Type II mutations are deduced to disrupt predominantly those structural regions involved in functional interactions. The interactive features of the database will assist in making judgments about patient management. © 2013 International Society on Thrombosis and Haemostasis.
NOAO observing proposal processing system
NASA Astrophysics Data System (ADS)
Bell, David J.; Gasson, David; Hartman, Mia
2002-12-01
Since going electronic in 1994, NOAO has continued to refine and enhance its observing proposal handling system. Virtually all related processes are now handled electronically. Members of the astronomical community can submit proposals through email, web form or via Gemini's downloadable Phase-I Tool. NOAO staff can use online interfaces for administrative tasks, technical reviews, telescope scheduling, and compilation of various statistics. In addition, all information relevant to the TAC process is made available online. The system, now known as ANDES, is designed as a thin-client architecture (web pages are now used for almost all database functions) built using open source tools (FreeBSD, Apache, MySQL, Perl, PHP) to process descriptively-marked (LaTeX, XML) proposal documents.
Motamed, Cyrus; Bourgain, Jean Louis
2016-06-01
Anaesthesia Information Management Systems (AIMS) generate large amounts of data, which might be useful for quality assurance programs. This study was designed to highlight the multiple contributions of our AIMS system in extracting quality indicators over a period of 10 years. The study was conducted from 2002 to 2011. Two methods were used to extract anaesthesia indicators: manual extraction of individual files for monitoring neuromuscular relaxation, and structured query language (SQL) extraction for the other indicators, which were postoperative nausea and vomiting (PONV), pain and sedation scores, pain-related medications, and postoperative hypothermia. For each indicator, a program of information/meetings and adaptation/suggestions for operating room and PACU personnel was initiated to improve quality assurance, while data were extracted each year. The study included 77,573 patients. The mean overall completeness of data for the initial years ranged from 55 to 85% depending on the indicator, and then improved to 95% completeness for the last 5 years. The rate of neuromuscular monitoring was initially 67% and then increased to 95% (P<0.05). The rate of pharmacological reversal remained around 53% throughout the study. Regarding the SQL data, an improvement in severe postoperative pain and PONV scores was observed throughout the study, while mild postoperative hypothermia remained a challenge despite efforts at improvement. The AIMS system permitted the follow-up of certain indicators through manual sampling, and of many more via SQL extraction, in a sustained and non-time-consuming way across the years. However, it requires competent and, above all, dedicated resources to handle the database. Copyright © 2016 Société française d'anesthésie et de réanimation (Sfar). Published by Elsevier Masson SAS. All rights reserved.
A web-based data visualization tool for the MIMIC-II database.
Lee, Joon; Ribey, Evan; Wallace, James R
2016-02-04
Although MIMIC-II, a public intensive care database, has been recognized as an invaluable resource for many medical researchers worldwide, becoming a proficient MIMIC-II researcher requires knowledge of SQL programming and an understanding of the MIMIC-II database schema. These are challenging requirements especially for health researchers and clinicians who may have limited computer proficiency. In order to overcome this challenge, our objective was to create an interactive, web-based MIMIC-II data visualization tool that first-time MIMIC-II users can easily use to explore the database. The tool offers two main features: Explore and Compare. The Explore feature enables the user to select a patient cohort within MIMIC-II and visualize the distributions of various administrative, demographic, and clinical variables within the selected cohort. The Compare feature enables the user to select two patient cohorts and visually compare them with respect to a variety of variables. The tool is also helpful to experienced MIMIC-II researchers who can use it to substantially accelerate the cumbersome and time-consuming steps of writing SQL queries and manually visualizing extracted data. Any interested researcher can use the MIMIC-II data visualization tool for free to quickly and conveniently conduct a preliminary investigation on MIMIC-II with a few mouse clicks. Researchers can also use the tool to learn the characteristics of the MIMIC-II patients. Since it is still impossible to conduct multivariable regression inside the tool, future work includes adding analytics capabilities. Also, the next version of the tool will aim to utilize MIMIC-III which contains more data.
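Conceptually, the Explore and Compare features contrast a variable's distribution within and across cohorts; the toy pandas sketch below (not the tool's code; the file and column names are invented) does the same for an extracted admissions table.
    import pandas as pd

    adm = pd.read_csv("admissions.csv")  # hypothetical MIMIC-II extract

    cohort_a = adm[adm["icu_type"] == "MICU"]
    cohort_b = adm[adm["icu_type"] == "CCU"]

    # Explore: distribution of one variable within each selected cohort.
    print(cohort_a["age"].describe())
    print(cohort_b["age"].describe())

    # Compare: a summary statistic side by side across the two cohorts.
    print(adm.groupby("icu_type")["hospital_mortality"].mean())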
A public HTLV-1 molecular epidemiology database for sequence management and data mining.
Araujo, Thessika Hialla Almeida; Souza-Brito, Leandro Inacio; Libin, Pieter; Deforche, Koen; Edwards, Dustin; de Albuquerque-Junior, Antonio Eduardo; Vandamme, Anne-Mieke; Galvao-Castro, Bernardo; Alcantara, Luiz Carlos Junior
2012-01-01
It is estimated that 15 to 20 million people are infected with the human T-cell lymphotropic virus type 1 (HTLV-1). At present, there are more than 2,000 unique HTLV-1 isolate sequences published. A central database aggregating sequence information from a range of epidemiological aspects, including HTLV-1 infections, pathogenesis, origins, and evolutionary dynamics, would be useful to scientists and physicians worldwide. As described here, we have developed a database that collects and annotates sequence data and can be accessed through a user-friendly search interface. The HTLV-1 Molecular Epidemiology Database website is available at http://htlv1db.bahia.fiocruz.br/. All data were obtained from publications available in GenBank or through contact with the authors. The database was developed using the Apache web server 2.1.6 and the MySQL database management system. The webpage interfaces were developed in HTML, with server-side scripting written in PHP. The HTLV-1 Molecular Epidemiology Database is hosted on the Gonçalo Moniz/FIOCRUZ Research Center server. There are currently 2,457 registered sequences, with 2,024 (82.37%) of those sequences representing unique isolates. Of these sequences, 803 (39.67%) contain information about clinical status (TSP/HAM, 17.19%; ATL, 7.41%; asymptomatic, 12.89%; other diseases, 2.17%; and no information, 60.32%). Further, 7.26% of sequences contain information on patient gender, while 5.23% of sequences provide the age of the patient. The HTLV-1 Molecular Epidemiology Database retrieves and stores annotated HTLV-1 proviral sequences from clinical, epidemiological, and geographical studies. The collected sequences and related information are now accessible on a publicly available and user-friendly website. This open-access database will support clinical research and vaccine development related to viral genotype.
2007-06-01
[Indexed excerpt] The recoverable text notes that support for other databases such as MySQL, Oracle, and Derby will be added to future versions of the program, and that setting a factor requires more than changing a single value. The remainder is figure-list residue (non-penetrating vs. penetrating results; coverage; interaction profile for D U-2 and C RQ-4; R-squared vs. number of regression trees).
ERIC Educational Resources Information Center
Poitras, Eric G.; Naismith, Laura M.; Doleck, Tenzin; Lajoie, Susanne P.
2016-01-01
This study aimed to identify misconceptions in medical student knowledge by mining user interactions in the MedU online learning environment. Data from 13000 attempts at a single virtual patient case were extracted from the MedU MySQL database. A subgroup discovery method was applied to identify patterns in learner-generated annotations and…
Digitizing Consumption Across the Operational Spectrum
2014-09-01
[Indexed excerpt] Figure-list residue (Figure 13: NoSQL (key, value) dictionary example; Figure 14: Java-implemented dictionary and query result; Figure 15: global database architecture) surrounds one recoverable sentence: Figure 14 illustrates a query submitted in Java and the result that would be shown using the data in Figure 13.
A System for the Automatic Assembly of Test Questions Using a No-SQL Database
ERIC Educational Resources Information Center
Shin, Sanggyu; Hashimoto, Hiroshi
2014-01-01
We describe a system that automatically assembles test questions from a set of examples. Our system can create test questions appropriate for each user's level at low cost. In particular, when a user review their lesson, our system provides new test questions which are assembled based on their previous test results and past mistakes, rather than a…
Looking toward the Future: A Case Study of Open Source Software in the Humanities
ERIC Educational Resources Information Center
Quamen, Harvey
2006-01-01
In this article Harvey Quamen examines how the philosophy of open source software might be of particular benefit to humanities scholars in the near future--particularly for academic journals with limited financial resources. To this end he provides a case study in which he describes his use of open source technology (MySQL database software and…
Predicting Host Level Reachability via Static Analysis of Routing Protocol Configuration
2007-09-01
[Indexed excerpt] Only PostgreSQL dump residue survives (SET statements, COMMENT and ACL entries for schema public with owner postgres, and REVOKE/GRANT statements); no abstract text is recoverable.
WebDB Component Builder - Lessons Learned
DOE Office of Scientific and Technical Information (OSTI.GOV)
Macedo, C.
2000-02-15
Oracle WebDB is the easiest way to produce web-enabled, lightweight, enterprise-centric applications. This concept from Oracle has tantalized our taste for simple web development with a purely web-based tool that lives nowhere else but in the database. Its online wizards, templates, and query builders, which produce PL/SQL behind the curtains, can be used straight ''out of the box'' by both novice and seasoned developers. This presentation introduces lessons learned by developing and deploying applications built using the WebDB Component Builder in conjunction with custom PL/SQL code to empower a hybrid application. There are two kinds of WebDB components: those that display data to end users via reporting, and those that let end users update data in the database via entry forms. The presentation also discusses various methods within the Component Builder to enhance the applications pushed to the desktop. The demonstrated example is an application entitled HOME (Helping Others More Effectively), built to manage a yearly United Way Campaign effort. Our task was to build an end-to-end application which could manage approximately 900 non-profit agencies, an average of 4,100 individual contributions, and $1.2 million. Using WebDB, the shell of the application was put together in a matter of a few weeks. However, we encountered some hurdles that WebDB, in its stage of infancy (v2.0), could not solve for us directly. Together with custom PL/SQL, WebDB's Component Builder became a powerful tool that enabled us to produce a very flexible hybrid application.
Ontology-Based Data Integration between Clinical and Research Systems
Mate, Sebastian; Köpcke, Felix; Toddenroth, Dennis; Martin, Marcus; Prokosch, Hans-Ulrich
2015-01-01
Data from the electronic medical record comprise numerous structured but uncoded elements, which are not linked to standard terminologies. Reuse of such data for secondary research purposes has gained in importance recently. However, the identification of relevant data elements and the creation of database jobs for extraction, transformation and loading (ETL) are challenging: with current methods such as data warehousing, it is not feasible to efficiently maintain and reuse semantically complex data extraction and transformation routines. We present an ontology-supported approach to overcome this challenge by making use of abstraction: instead of defining ETL procedures at the database level, we use ontologies to organize and describe the medical concepts of both the source system and the target system. Instead of using unique, specifically developed SQL statements or ETL jobs, we define declarative transformation rules within ontologies and illustrate how these constructs can then be used to automatically generate SQL code to perform the desired ETL procedures. This demonstrates how a suitable level of abstraction may not only aid the interpretation of clinical data, but can also foster the reutilization of methods for unlocking it. PMID:25588043
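A minimal sketch of the declarative-rule idea, with plain dicts standing in for ontology annotations and an invented rule format: each mapping carries enough metadata to be compiled into the SQL that an ETL job would otherwise hand-code.
    # Each rule describes one source-to-target concept mapping declaratively.
    rules = [
        {"concept": "SystolicBP", "source": "vitals", "column": "value",
         "where": "code = 'BP-SYS'", "unit_factor": 1.0},
        {"concept": "WeightKg", "source": "obs", "column": "weight_lb",
         "where": "weight_lb IS NOT NULL", "unit_factor": 0.45359237},
    ]

    def rule_to_sql(rule):
        # Compile a declarative mapping into an executable ETL statement.
        return (f"INSERT INTO research.{rule['concept']} (patient_id, value) "
                f"SELECT patient_id, {rule['column']} * {rule['unit_factor']} "
                f"FROM clinical.{rule['source']} WHERE {rule['where']};")

    for rule in rules:
        print(rule_to_sql(rule))
Because the rules live with the concept descriptions rather than in hand-written jobs, adding or correcting a mapping changes data, not code, which is the maintainability gain the abstract argues for.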
myChEMBL: a virtual machine implementation of open data and cheminformatics tools.
Ochoa, Rodrigo; Davies, Mark; Papadatos, George; Atkinson, Francis; Overington, John P
2014-01-15
myChEMBL is a completely open platform, which combines public domain bioactivity data with open source database and cheminformatics technologies. myChEMBL consists of a Linux (Ubuntu) Virtual Machine featuring a PostgreSQL schema with the latest version of the ChEMBL database, as well as the latest RDKit cheminformatics libraries. In addition, a self-contained web interface is available, which can be modified and improved according to user specifications. The VM is available at: ftp://ftp.ebi.ac.uk/pub/databases/chembl/VM/myChEMBL/current. The web interface and web services code is available at: https://github.com/rochoa85/myChEMBL.
Viral Genome DataBase: storing and analyzing genes and proteins from complete viral genomes.
Hiscock, D; Upton, C
2000-05-01
The Viral Genome DataBase (VGDB) contains detailed information on the genes and predicted protein sequences from 15 completely sequenced genomes of large (>100 kb) viruses (2847 genes). The stored data include DNA sequence, protein sequence, GenBank and user-entered notes, molecular weight (MW), isoelectric point (pI), amino acid content, A + T%, nucleotide frequency, dinucleotide frequency and codon use. The VGDB is a MySQL database with a user-friendly Java GUI. The results of queries can be easily sorted by any of the individual parameters. The software and additional figures and information are available at http://athena.bioc.uvic.ca/genomes/index.html.
Using CLIPS in a distributed system: The Network Control Center (NCC) expert system
NASA Technical Reports Server (NTRS)
Wannemacher, Tom
1990-01-01
This paper describes an intelligent troubleshooting system for the Help Desk domain. It was developed on an IBM-compatible 80286 PC using Microsoft C and CLIPS, and an AT&T 3B2 minicomputer using the UNIFY database and a combination of shell scripts, C programs and SQL queries. The two computers are linked by a LAN. The functions of this system are to help non-technical NCC personnel handle trouble calls, to keep a log of problem calls with complete, concise information, and to keep a historical database of problems. The database helps identify hardware and software problem areas and provides a source of new rules for the troubleshooting knowledge base.
FCDD: A Database for Fruit Crops Diseases.
Chauhan, Rupal; Jasrai, Yogesh; Pandya, Himanshu; Chaudhari, Suman; Samota, Chand Mal
2014-01-01
The Fruit Crops Diseases Database (FCDD) requires a number of biotechnology and bioinformatics tools. The FCDD is a unique bioinformatics resource that compiles detailed information on 162 fruit crop diseases, covering disease type, causal organism, images, symptoms and control. The FCDD contains 171 phytochemicals from 25 fruits, their 2D images and their 20 possible sequences. This information has been manually extracted and manually verified from numerous sources, including other electronic databases, textbooks and scientific journals. FCDD is fully searchable and supports extensive text search. The main focus of the FCDD is on providing available information on fruit crop diseases, which will help in the discovery of potential drugs from one of the most common bioresources: fruits. The database was developed using MySQL. The database interface is developed in PHP, HTML and Java. FCDD is freely available at http://www.fruitcropsdd.com/.
The NOAO Data Lab PHAT Photometry Database
NASA Astrophysics Data System (ADS)
Olsen, Knut; Williams, Ben; Fitzpatrick, Michael; PHAT Team
2018-01-01
We present a database containing both the combined photometric object catalog and the single epoch measurements from the Panchromatic Hubble Andromeda Treasury (PHAT). This database is hosted by the NOAO Data Lab (http://datalab.noao.edu), and as such exposes a number of data services to the PHAT photometry, including access through a Table Access Protocol (TAP) service, direct PostgreSQL queries, web-based and programmatic query interfaces, remote storage space for personal database tables and files, and a JupyterHub-based Notebook analysis environment, as well as image access through a Simple Image Access (SIA) service. We show how the Data Lab database and Jupyter Notebook environment allow for straightforward and efficient analyses of PHAT catalog data, including maps of object density, depth, and color, extraction of light curves of variable objects, and proper motion exploration.
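Programmatic access through a TAP service of this kind might look like the following pyvo sketch; the service URL and the table and column names are assumptions for illustration, not the documented PHAT schema.
    import pyvo

    tap = pyvo.dal.TAPService("https://datalab.noao.edu/tap")  # assumed endpoint
    result = tap.search("""
        SELECT TOP 100 ra, dec, f475w_vega_mag
        FROM phat_v2.phot_mod
        WHERE f475w_vega_mag < 24
    """)
    print(result.to_table())  # an astropy Table of the matching objects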
Lu, Ying-Hao; Kuo, Chen-Chun; Huang, Yaw-Bin
2011-08-01
We selected HTML, PHP and JavaScript as the programming languages to build "WebBio", a web-based system for patient data on biological products, and used MySQL as the database. WebBio is based on the PHP-MySQL suite and runs under an Apache server on a Linux machine. WebBio provides data management, search and data analysis functions for 20 kinds of biological products (plasma expanders, human immunoglobulins and hematological products). There are two particular features in WebBio: (1) for medication safety, pharmacists can rapidly find out which patients used contaminated products; and (2) statistical charts for a specific product can be automatically generated to reduce pharmacists' workload. WebBio has successfully turned traditional paperwork into web-based data management.
Building an integrated neurodegenerative disease database at an academic health center.
Xie, Sharon X; Baek, Young; Grossman, Murray; Arnold, Steven E; Karlawish, Jason; Siderowf, Andrew; Hurtig, Howard; Elman, Lauren; McCluskey, Leo; Van Deerlin, Vivianna; Lee, Virginia M-Y; Trojanowski, John Q
2011-07-01
It is becoming increasingly important to study common and distinct etiologies, clinical and pathological features, and mechanisms related to neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease, amyotrophic lateral sclerosis, and frontotemporal lobar degeneration. These comparative studies rely on powerful database tools to quickly generate data sets that match the diverse and complementary criteria they set. In this article, we present a novel integrated neurodegenerative disease (INDD) database, which was developed at the University of Pennsylvania (Penn) with the help of a consortium of Penn investigators. Because these investigators work on Alzheimer's disease, Parkinson's disease, amyotrophic lateral sclerosis, and frontotemporal lobar degeneration, we were able to achieve the goal of developing an INDD database for these major neurodegenerative disorders. We used the Microsoft SQL Server as a platform, with built-in "backwards" functionality to provide Access as a front-end client to interface with the database. We used PHP Hypertext Preprocessor to create the front-end web interface and then used a master lookup table to integrate the individual neurodegenerative disease databases. We also present methods of data entry, database security, database backups, and database audit trails for this INDD database. Using the INDD database, we compared the results of a biomarker study with those obtained using an alternative approach of querying the individual databases separately. We have demonstrated that the Penn INDD database has the ability to query multiple database tables from a single console with high accuracy and reliability. The INDD database provides a powerful tool for generating data sets in comparative studies on several neurodegenerative diseases. Copyright © 2011 The Alzheimer's Association. Published by Elsevier Inc. All rights reserved.
mpMoRFsDB: a database of molecular recognition features in membrane proteins.
Gypas, Foivos; Tsaousis, Georgios N; Hamodrakas, Stavros J
2013-10-01
Molecular recognition features (MoRFs) are small, intrinsically disordered regions in proteins that undergo a disorder-to-order transition upon binding to their partners. MoRFs are involved in protein-protein interactions and may function as the initial step in molecular recognition. The aim of this work was to collect, organize and store all membrane proteins that contain MoRFs. Membrane proteins constitute ∼30% of fully sequenced proteomes and are responsible for a wide variety of cellular functions. MoRFs were classified according to their secondary structure after interaction with their partners. We identified MoRFs in transmembrane and peripheral membrane proteins. The position of transmembrane protein MoRFs was determined in relation to the protein's topology. All information was stored in a publicly available MySQL database with a user-friendly web interface. A Jmol applet is integrated for visualization of the structures. mpMoRFsDB provides valuable information related to disorder-based protein-protein interactions in membrane proteins. http://bioinformatics.biol.uoa.gr/mpMoRFsDB
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gaponov, Yu.A.; Igarashi, N.; Hiraki, M.
2004-05-12
An integrated controlling system and a unified database for high throughput protein crystallography experiments have been developed. The main features of protein crystallography experiments (purification, crystallization, crystal harvesting, data collection, data processing) were integrated into the software under development. All information necessary to perform protein crystallography experiments is stored in a MySQL relational database (except raw X-ray data, which are stored on a central data server). The database contains four mutually linked hierarchical trees describing protein crystals, data collection on protein crystals, and experimental data processing. A database editor was designed and developed. The editor supports the basic database functions to view, create, modify and delete user records in the database. Two search engines were realized: direct search of necessary information in the database and object-oriented search. The system is based on TCP/IP secure UNIX sockets with four predefined sending and receiving behaviors, which support communications between all connected servers and clients with remote control functions (creating and modifying data for experimental conditions, data acquisition, viewing experimental data, and performing data processing). Two secure login schemes were designed and developed: a direct method (using the developed Linux clients with a secure connection) and an indirect method (using a secure SSL connection with secure X11 support from any operating system with X-terminal and SSH support). Part of the system has been implemented on a new MAD beamline, NW12, at the Photon Factory Advanced Ring for general user experiments.
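A sketch of storing one of the linked hierarchical trees as a self-referencing table and walking it with a recursive query; the schema is illustrative, shown in SQLite rather than the system's MySQL back-end.

```python
# Sketch of one hierarchical tree (crystal -> data collection ->
# processing) as a self-referencing table with a recursive query.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE node (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),
    kind TEXT,    -- 'crystal', 'collection', 'processing'
    label TEXT
);
INSERT INTO node VALUES
    (1, NULL, 'crystal',    'lysozyme xtal #3'),
    (2, 1,    'collection', 'MAD scan, NW12'),
    (3, 2,    'processing', 'integration run 1');
""")

# Walk the whole subtree under a given crystal record.
rows = con.execute("""
    WITH RECURSIVE subtree(id, kind, label) AS (
        SELECT id, kind, label FROM node WHERE id = 1
        UNION ALL
        SELECT n.id, n.kind, n.label
        FROM node n JOIN subtree s ON n.parent_id = s.id
    )
    SELECT * FROM subtree
""").fetchall()
print(rows)
```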
NASA Astrophysics Data System (ADS)
Bagnasco, S.; Berzano, D.; Guarise, A.; Lusso, S.; Masera, M.; Vallero, S.
2015-12-01
The INFN computing centre in Torino hosts a private Cloud, which is managed with the OpenNebula cloud controller. The infrastructure offers Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) services to different scientific computing applications. The main stakeholders of the facility are a grid Tier-2 site for the ALICE collaboration at the LHC, an interactive analysis facility for the same experiment and a grid Tier-2 site for the BESIII collaboration, plus an increasing number of other small tenants. The dynamic allocation of resources to tenants is partially automated. This feature requires detailed monitoring and accounting of resource usage. We set up a monitoring framework to inspect the site activities both in terms of IaaS and of the applications running on the hosted virtual instances. For this purpose we used the ElasticSearch, Logstash and Kibana (ELK) stack. The infrastructure relies on a MySQL database back-end for data preservation and to retain the flexibility to choose a different monitoring solution if needed. The heterogeneous accounting information is transferred from the database to the ElasticSearch engine via a custom Logstash plugin. Each use-case is indexed separately in ElasticSearch, and we set up Kibana dashboards with pre-defined queries in order to monitor the relevant information in each case. For the IaaS metering, we developed sensors for the OpenNebula API. The IaaS-level information gathered through the API is sent to the MySQL database through a purpose-built RESTful web service. Moreover, we have developed a billing system for our private Cloud, which relies on the RabbitMQ message queue for asynchronous communication with the database and on the ELK stack for its graphical interface. The Italian Grid accounting framework is also migrating to a similar set-up. At the application level, we used the ROOT plugin TProofMonSenderSQL to collect accounting data from the interactive analysis facility. The BESIII virtual instances used to be monitored with Zabbix; as a proof of concept, we also retrieve the information contained in the Zabbix database. In this way we have achieved a uniform monitoring interface for both the IaaS and the scientific applications, mostly leveraging off-the-shelf tools. At present, we are working to define a model for monitoring-as-a-service, based on the tools described above, which Cloud tenants can easily configure to suit their specific needs.
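A sketch of the database-to-ElasticSearch transfer step; the paper does this with a custom Logstash plugin, while this approximation uses the elasticsearch-py client directly, with an assumed host, index name and record layout.

```python
# Sketch of pushing one accounting record from the MySQL back-end into
# ElasticSearch (the Torino set-up uses a custom Logstash plugin; the
# host, index name and record fields below are assumptions).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# One accounting record pulled from the MySQL back-end (schematic).
record = {
    "tenant": "alice-tier2",
    "vcpus": 8,
    "wallclock_hours": 124.5,
    "timestamp": "2015-11-02T10:00:00Z",
}

# Each use-case gets its own index, as in the Torino set-up.
es.index(index="iaas-accounting", document=record)
```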
BtoxDB: a comprehensive database of protein structural data on toxin-antitoxin systems.
Barbosa, Luiz Carlos Bertucci; Garrido, Saulo Santesso; Marchetto, Reinaldo
2015-03-01
Toxin-antitoxin (TA) systems are diverse and abundant genetic modules in prokaryotic cells that are typically formed by two genes encoding a stable toxin and a labile antitoxin. Because TA systems are able to repress growth or kill cells and are considered to be important actors in cell persistence (multidrug resistance without genetic change), these modules are considered potential targets for alternative drug design. In this scenario, structural information for the proteins in these systems is highly valuable. In this report, we describe the development of a web-based system, named BtoxDB, that stores all protein structural data on TA systems. The BtoxDB database was implemented as a MySQL relational database using PHP scripting language. Web interfaces were developed using HTML, CSS and JavaScript. The data were collected from the PDB, UniProt and Entrez databases. These data were appropriately filtered using specialized literature and our previous knowledge about toxin-antitoxin systems. The database provides three modules ("Search", "Browse" and "Statistics") that enable searches, acquisition of contents and access to statistical data. Direct links to matching external databases are also available. The compilation of all protein structural data on TA systems in one platform is highly useful for researchers interested in this content. BtoxDB is publicly available at http://www.gurupi.uft.edu.br/btoxdb. Copyright © 2015 Elsevier Ltd. All rights reserved.
ProteoLens: a visual analytic tool for multi-scale database-driven biological network data mining.
Huan, Tianxiao; Sivachenko, Andrey Y; Harrison, Scott H; Chen, Jake Y
2008-08-12
New systems biology studies require researchers to understand how interplay among myriads of biomolecular entities is orchestrated in order to achieve high-level cellular and physiological functions. Many software tools have been developed in the past decade to help researchers visually navigate large networks of biomolecular interactions with built-in template-based query capabilities. To further advance researchers' ability to interrogate global physiological states of cells through multi-scale visual network explorations, new visualization software tools still need to be developed to empower the analysis. A robust visual data analysis platform, driven by database management systems and able to perform bi-directional data processing-to-visualization with declarative querying capabilities, is needed. We developed ProteoLens as a Java-based visual analytic software tool for creating, annotating and exploring multi-scale biological networks. It supports direct database connectivity to either Oracle or PostgreSQL database tables/views, on which SQL statements using both Data Definition Language (DDL) and Data Manipulation Language (DML) may be specified. The robust query languages embedded directly within the visualization software help users to bring their network data into a visualization context for annotation and exploration. ProteoLens supports graph/network-represented data in the standard Graph Modeling Language (GML) format, which enables interoperation with a wide range of other visual layout tools. The architectural design of ProteoLens decouples complex network data visualization tasks into two distinct phases: (1) creating network data association rules, which are mapping rules between network node IDs or edge IDs and data attributes such as functional annotations, expression levels, scores, synonyms, descriptions, etc.; (2) applying network data association rules to build the network and perform the visual annotation of graph nodes and edges according to associated data values. We demonstrated the advantages of these new capabilities through three biological network visualization case studies: a human disease association network, a drug-target interaction network and a protein-peptide mapping network. The architectural design of ProteoLens makes it suitable for bioinformatics expert data analysts who are experienced with relational database management and perform large-scale integrated network visual explorations. ProteoLens is a promising visual analytic platform that will facilitate knowledge discovery in future network and systems biology studies.
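A sketch of a ProteoLens-style data association rule, mapping node IDs to expression values and applying the mapping to a network; it uses networkx in Python, and the graph, IDs and values are illustrative.

```python
# Sketch of "data association rules": map node IDs to data attributes,
# then apply the mapping to annotate a GML-style network.
import networkx as nx

# A small protein-interaction network standing in for a GML file
# (in practice: G = nx.read_gml("network.gml")).
G = nx.Graph()
G.add_edges_from([("TP53", "MDM2"), ("TP53", "EP300")])

# Association rule: node ID -> expression level (e.g. from an SQL view).
expression = {"TP53": 2.4, "MDM2": 0.7, "EP300": 1.1}

# Apply the rule: annotate graph nodes with the associated values.
nx.set_node_attributes(G, expression, name="expression")
print(G.nodes["TP53"])   # {'expression': 2.4}
```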
NASA Astrophysics Data System (ADS)
Steigies, Christian
2012-07-01
The Neutron Monitor Database project, www.nmdb.eu, was funded in 2008 and 2009 by the European Commission's 7th Framework Programme (FP7). Neutron monitors (NMs) have been in use worldwide since the International Geophysical Year (IGY) in 1957, and cosmic ray data from the IGY and the improved NM64 NMs have been distributed since that time, but a common data format existed only for data with one-hour resolution. These data were first distributed in printed books, later via the World Data Center ftp server. In the 1990s the first NM stations started to record data at higher resolutions (typically 1 minute) and publish it on their web pages. However, each NM station chose its own format, making it cumbersome to work with this distributed data. In NMDB, all European and some neighboring NM stations came together to agree on a common format for high-resolution data and made it available via a centralized database. The goal of NMDB is to make all data from all NM stations available in real time. The original NMDB network has recently been joined by the Bartol Research Institute (Newark DE, USA), the National Autonomous University of Mexico and North-West University (Potchefstroom, South Africa). The data are accessible to everyone via an easy-to-use web interface, but expert users can also access the database directly to build applications like real-time space weather alerts. Even though SQL databases are used today by most web services (blogs, wikis, social media, e-commerce), the power of an SQL database has not yet been fully realized by the scientific community. In training courses, we teach how to make use of NMDB, how to join NMDB, and how to ensure data quality. The present status of the extended NMDB will be presented. The consortium welcomes further data providers to help increase the scientific contributions of the worldwide neutron monitor network to heliospheric physics and space weather.
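A sketch of the direct database access available to expert users; host, credentials, table and column names are placeholders, since the real schema is documented at www.nmdb.eu.

```python
# Sketch of a direct SQL query against the NMDB database. All
# connection details, table and column names below are placeholders;
# consult www.nmdb.eu for the actual access instructions and schema.
import mysql.connector

cnx = mysql.connector.connect(
    host="nmdb.eu", user="guest", password="guest", database="nmdb"
)
cur = cnx.cursor()

# One day of 1-minute pressure-corrected count rates for one station.
cur.execute("""
    SELECT start_date_time, corr_for_pressure
    FROM OULU_ori
    WHERE start_date_time BETWEEN '2012-07-01' AND '2012-07-02'
    ORDER BY start_date_time
""")
for timestamp, rate in cur:
    print(timestamp, rate)
cnx.close()
```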
Yu, Kebing; Salomon, Arthur R.
2010-01-01
Recently, dramatic progress has been achieved in expanding the sensitivity, resolution, mass accuracy, and scan rate of mass spectrometers able to fragment and identify peptides through tandem mass spectrometry (MS/MS). Unfortunately, this enhanced ability to acquire proteomic data has not been accompanied by a concomitant increase in the availability of flexible tools allowing users to rapidly assimilate, explore, and analyze this data and adapt to a variety of experimental workflows with minimal user intervention. Here we fill this critical gap by providing a flexible relational database called PeptideDepot for organization of expansive proteomic data sets, collation of proteomic data with available protein information resources, and visual comparison of multiple quantitative proteomic experiments. Our software design, built upon the synergistic combination of a MySQL database for safe warehousing of proteomic data with a FileMaker-driven graphical user interface for flexible adaptation to diverse workflows, enables proteomic end-users to directly tailor the presentation of proteomic data to the unique analysis requirements of the individual proteomics lab. PeptideDepot may be deployed as an independent software tool or integrated directly with our High Throughput Autonomous Proteomic Pipeline (HTAPP) used in the automated acquisition and post-acquisition analysis of proteomic data. PMID:19834895
The design of moral education website for college students based on ASP.NET
NASA Astrophysics Data System (ADS)
Sui, Chunling; Du, Ruiqing
2012-01-01
A moral education website offers a solution to the low transmission speed and limited reach of traditional moral education. The aim of this paper is to illustrate the design of one moral education website and the advantages of using it to support moral teaching. The rationale for a moral education website is discussed at the beginning of the paper, and the development tools are introduced. The system design is illustrated through its module design and database design, and how to access data in the SQL Server database is discussed in detail. Finally, a conclusion is drawn based on the discussions in this paper.
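The paper's data access layer is ADO.NET under ASP.NET; the following is an analogous sketch in Python via pyodbc, with hypothetical server, database and table names.

```python
# Sketch of reading from the site's SQL Server database. The paper
# uses ADO.NET; this analogous Python/pyodbc version assumes a
# hypothetical "MoralEdu" database and "Articles" table.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=MoralEdu;UID=web;PWD=secret"
)
cur = conn.cursor()

# Fetch published articles for the site's course-material module.
cur.execute(
    "SELECT title, published_on FROM Articles "
    "WHERE is_published = 1 ORDER BY published_on DESC"
)
for title, published_on in cur.fetchall():
    print(title, published_on)
conn.close()
```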
Evolution of the architecture of the ATLAS Metadata Interface (AMI)
NASA Astrophysics Data System (ADS)
Odier, J.; Aidel, O.; Albrand, S.; Fulachier, J.; Lambert, F.
2015-12-01
The ATLAS Metadata Interface (AMI) is now a mature application. Over the years, the number of users and the number of provided functions has dramatically increased. It is necessary to adapt the hardware infrastructure in a seamless way so that the quality of service remains high. We describe the evolution of AMI from its beginnings, when it was served by a single MySQL backend database server, to its current state: a cluster of virtual machines at the French Tier-1, an Oracle database at Lyon with complementary replication to the Oracle DB at CERN, and an AMI back-up server.
DCMS: A data analytics and management system for molecular simulation.
Kumar, Anand; Grupcev, Vladimir; Berrada, Meryem; Fogarty, Joseph C; Tu, Yi-Cheng; Zhu, Xingquan; Pandit, Sagar A; Xia, Yuni
Molecular Simulation (MS) is a powerful tool for studying physical/chemical features of large systems and has seen applications in many scientific and engineering domains. During the simulation process, the experiments generate a very large number of atoms and aim to observe their spatial and temporal relationships for scientific analysis. The sheer data volumes and their intensive interactions impose significant challenges for data access, management, and analysis. To date, existing MS software systems fall short on storage and handling of MS data, mainly because of the lack of a platform to support applications that involve intensive data access and analytical processing. In this paper, we present the database-centric molecular simulation (DCMS) system our team developed in the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface (i.e., SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is to handle the analytical queries that are often compute-intensive. For that, we developed novel indexing and query processing strategies (including algorithms running on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS. Index structures are generated to store analysis results that may be interesting to other users, so that the results are readily available without duplicating the analysis. We have developed a prototype of DCMS based on the PostgreSQL system, and experiments using real MS data and workloads show that DCMS significantly outperforms existing MS software systems. We have also used it as a platform to test other data management issues such as security and compression.
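A sketch of the kind of spatial range query DCMS pushes into the DBMS, over a hypothetical atom-position table in SQLite; DCMS itself builds custom index structures inside PostgreSQL.

```python
# Sketch of a spatial range query over simulation data; the schema is
# hypothetical, and a plain B-tree index stands in for DCMS's custom
# spatial indexing.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE atom (frame INTEGER, atom_id INTEGER, x REAL, y REAL, z REAL);
CREATE INDEX atom_frame_xyz ON atom (frame, x, y, z);
INSERT INTO atom VALUES (0, 1, 0.5, 0.5, 0.5), (0, 2, 9.0, 9.0, 9.0);
""")

# All atoms inside a query box in one frame.
rows = con.execute("""
    SELECT atom_id FROM atom
    WHERE frame = 0
      AND x BETWEEN 0 AND 1 AND y BETWEEN 0 AND 1 AND z BETWEEN 0 AND 1
""").fetchall()
print(rows)   # [(1,)]
```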
F. O. Kern, Elizabeth; Beischel, Scott; Stalnaker, Randal; Aron, David C.; Kirsh, Susan R.; Watts, Sharon A.
2008-01-01
Background: Little information is available describing how to implement a disease registry from an electronic patient record system. The aim of this report is to describe the technology, methods, and utility of a diabetes registry populated by the Veterans Health Information Systems and Technology Architecture (VistA), which underlies the computerized patient record system of the Veterans Health Administration (VHA) in Veterans Affairs Integrated Service Network 10 (VISN 10). Methods: VISN 10 data from VistA were mapped to a relational SQL-based data system using KB_SQL software. Operational definitions for diabetes, active clinical management, and responsible providers were used to create views of patient-level data in the diabetes registry. Query Analyzer was used to access the data views directly. Semi-customizable reports were created by linking the diabetes registry to a Web page using Microsoft ASP.NET 2.0. A retrospective observational study design was used to analyze trends in the process of care and outcomes. Results: Since October 2001, 81,227 patients with diabetes have enrolled in VISN 10: approximately 42,000 are currently under active management by VISN 10 providers. By tracking primary care visits, we assigned 91% to a clinic group responsible for diabetes care. In the Cleveland Veterans Affairs Medical Center (VAMC), the frequency of mean annual hemoglobin A1c levels ≥9% has declined significantly over 5 years. Almost 4000 patients have been seen in diabetes intervention programs at the Cleveland VAMC over the past 4 years. Conclusions: A diabetes registry can be populated from the database underlying the VHA electronic patient record system and linked to Web-based and ad hoc queries useful for quality improvement. PMID:19885172
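A sketch of a registry-style view computing mean annual HbA1c per patient, as used for the ≥9% trend above; the schema is hypothetical and shown in SQLite rather than the registry's SQL Server environment.

```python
# Sketch of a registry view over patient-level lab data; the table and
# column names are hypothetical (the real registry is mapped from
# VistA via KB_SQL into SQL Server).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a1c_result (patient_id TEXT, year INTEGER, a1c REAL);
INSERT INTO a1c_result VALUES
    ('P1', 2006, 9.8), ('P1', 2006, 9.2), ('P2', 2006, 7.1);

-- Patients whose mean annual HbA1c is >= 9%, per year.
CREATE VIEW poor_control AS
    SELECT patient_id, year, AVG(a1c) AS mean_a1c
    FROM a1c_result
    GROUP BY patient_id, year
    HAVING AVG(a1c) >= 9.0;
""")
print(con.execute("SELECT * FROM poor_control").fetchall())
# [('P1', 2006, 9.5)]
```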
A spatial database for landslides in northern Bavaria: A methodological approach
NASA Astrophysics Data System (ADS)
Jäger, Daniel; Kreuzer, Thomas; Wilde, Martina; Bemm, Stefan; Terhorst, Birgit
2018-04-01
Landslide databases provide essential information for hazard modeling, damage to buildings and infrastructure, mitigation, and research needs. This study presents the development of a landslide database system named WISL (Würzburg Information System on Landslides), currently storing detailed landslide data for northern Bavaria, Germany, in order to enable scientific queries as well as comparisons with other regional landslide inventories. WISL is based on free open-source software (PostgreSQL, PostGIS), which ensures good interoperability among the components and enables further extensions with specific adaptations of self-developed software. Beyond that, WISL was designed for easy communication with other databases. As a central prerequisite for standardized, homogeneous data acquisition in the field, a customized data sheet for landslide description was compiled. This sheet also serves as the input mask for all data registration procedures in WISL. A variety of in-database solutions for landslide analysis provides the necessary scalability for the database, enabling operations at the local server. In its current state, WISL already enables extensive analyses and queries. This paper presents an example analysis of landslides in Oxfordian limestones in the northeastern Franconian Alb, northern Bavaria. The results reveal widely differing landslides in terms of geometry and size. Further queries related to landslide activity classify the majority of the landslides as currently inactive; however, they clearly possess a certain potential for remobilization. Along with some active mass movements, a significant percentage of landslides potentially endangers residential areas or infrastructure. The main aspect of future enhancements of the WISL database is the extension of its data holdings in order to increase research possibilities, as well as the transfer of the system to other regions and countries.
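A sketch of an in-database PostGIS query of the kind WISL supports, finding landslides near residential areas; connection details and table and column names are assumptions.

```python
# Sketch of an "in-database" PostGIS proximity query: landslides whose
# outline lies within 100 m of a residential area. The connection
# string, tables and columns are assumptions for illustration.
import psycopg2

conn = psycopg2.connect("dbname=wisl user=reader")
cur = conn.cursor()
cur.execute("""
    SELECT l.landslide_id
    FROM landslide l
    JOIN residential_area r
      ON ST_DWithin(l.geom, r.geom, 100)  -- metres in a projected CRS
""")
print(cur.fetchall())
conn.close()
```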
Applications of GIS and database technologies to manage a Karst Feature Database
Gao, Y.; Tipping, R.G.; Alexander, E.C.
2006-01-01
This paper describes the management of a Karst Feature Database (KFD) in Minnesota. Two sets of applications, in both GIS and a Database Management System (DBMS), have been developed for the KFD of Minnesota. These applications were used to manage and to enhance the usability of the KFD. Structured Query Language (SQL) was used to manipulate transactions of the database and to facilitate the functionality of the user interfaces. The Database Administrator (DBA) authorized users with different access permissions to enhance the security of the database. Database consistency and recovery are accomplished by creating data logs and maintaining backups on a regular basis. The working database provides guidelines and management tools for future studies of karst features in Minnesota. The methodology used in designing this DBMS is applicable to developing GIS-based databases for analyzing and managing geomorphic and hydrologic datasets at both regional and local scales. The short-term goal of this research is to develop a regional KFD for the Upper Mississippi Valley Karst, and the long-term goal is to expand this database to manage and study karst features at national and global scales.
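A sketch of the transactional pattern described above, wrapping related changes in one transaction so that consistency is preserved; table names are illustrative and SQLite stands in for the unnamed DBMS.

```python
# Sketch of transaction-based consistency for a karst feature edit:
# the feature insert and its log entry commit together or not at all.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE karst_feature (id INTEGER PRIMARY KEY, type TEXT)")
con.execute("CREATE TABLE edit_log (feature_id INTEGER, action TEXT)")

try:
    with con:  # one transaction: both statements commit or neither does
        con.execute("INSERT INTO karst_feature VALUES (101, 'sinkhole')")
        con.execute("INSERT INTO edit_log VALUES (101, 'insert')")
except sqlite3.Error as exc:
    print("rolled back:", exc)

print(con.execute("SELECT * FROM edit_log").fetchall())  # [(101, 'insert')]
```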
Analysis of Cloud-Based Database Systems
2015-06-01
The study notes concerns over access to European Union (EU) citizens' data under the Patriot Act [3], as well as wide-reaching outages caused by unforeseen virtualization bugs [4] that left customers helpless. Performance data were collected from SQL Server Profiler traces, and the trace results captured from the cloud test bed are analyzed both before and after increasing system resources.
Development of Integrated Information System for Travel Bureau Company
NASA Astrophysics Data System (ADS)
Karma, I. G. M.; Susanti, J.
2018-01-01
In relation to the effectiveness of decision-making by the management of a travel bureau company, especially by managers, information frequently arrives late or incomplete. Although operations are already computer-assisted, each existing application handles only one particular activity and is not integrated with the others. This research is intended to produce an integrated information system that handles the overall operational activities of the company. Applying an object-oriented system development approach, the system is built with the Visual Basic .NET programming language and the MySQL database package. The result is a system that consists of four separate program packages: a Reservation System, an AR System, an AP System and an Accounting System. Based on the output, we can conclude that this system is able to produce integrated information related to reservations, operations and finance, yielding up-to-date information to support operational activities and decision-making by related parties.