Heterogeneous database integration in biomedicine.
Sujansky, W
2001-08-01
The rapid expansion of biomedical knowledge, reduction in computing costs, and spread of internet access have created an ocean of electronic data. The decentralized nature of our scientific community and healthcare system, however, has resulted in a patchwork of diverse, or heterogeneous, database implementations, making access to and aggregation of data across databases very difficult. The database heterogeneity problem applies equally to clinical data describing individual patients and biological data characterizing our genome. Specifically, databases are highly heterogeneous with respect to the data models they employ, the data schemas they specify, the query languages they support, and the terminologies they recognize. Heterogeneous database systems attempt to unify disparate databases by providing uniform conceptual schemas that resolve representational heterogeneities, and by providing querying capabilities that aggregate and integrate distributed data. Research in this area has applied a variety of database and knowledge-based techniques, including semantic data modeling, ontology definition, query translation, query optimization, and terminology mapping. Existing systems have addressed heterogeneous database integration in the realms of molecular biology, hospital information systems, and application portability.
System for Performing Single Query Searches of Heterogeneous and Dispersed Databases
NASA Technical Reports Server (NTRS)
Maluf, David A. (Inventor); Okimura, Takeshi (Inventor); Gurram, Mohana M. (Inventor); Tran, Vu Hoang (Inventor); Knight, Christopher D. (Inventor); Trinh, Anh Ngoc (Inventor)
2017-01-01
The present invention is a distributed computer system of heterogeneous databases joined in an information grid and configured with an Application Programming Interface hardware which includes a search engine component for performing user-structured queries on multiple heterogeneous databases in real time. This invention reduces overhead associated with the impedance mismatch that commonly occurs in heterogeneous database queries.
Performance related issues in distributed database systems
NASA Technical Reports Server (NTRS)
Mukkamala, Ravi
1991-01-01
The key elements of research performed during the year-long effort of this project are: Investigate the effects of heterogeneity in distributed real-time systems; Study the requirements of TRAC towards building a heterogeneous database system; Study the effects of performance modeling on distributed database performance; and Experiment with an ORACLE-based heterogeneous system.
NASA Technical Reports Server (NTRS)
Moroh, Marsha
1988-01-01
A methodology for building interfaces of resident database management systems to a heterogeneous distributed database management system under development at NASA, the DAVID system, was developed. The feasibility of that methodology was demonstrated by construction of the software necessary to perform the interface task. The interface terminology developed in the course of this research is presented. The work performed and the results are summarized.
Heterogeneous distributed query processing: The DAVID system
NASA Technical Reports Server (NTRS)
Jacobs, Barry E.
1985-01-01
The objective of the Distributed Access View Integrated Database (DAVID) project is the development of an easy to use computer system with which NASA scientists, engineers and administrators can uniformly access distributed heterogeneous databases. Basically, DAVID will be a database management system that sits alongside already existing database and file management systems. Its function is to enable users to access the data in other languages and file systems without having to learn the data manipulation languages. Given here is an outline of a talk on the DAVID project and several charts.
Pape-Haugaard, Louise; Frank, Lars
2011-01-01
A major obstacle in ensuring ubiquitous information is the utilization of heterogeneous systems in eHealth. The objective in this paper is to illustrate how an architecture for distributed eHealth databases can be designed without lacking the characteristic features of traditional sustainable databases. The approach is firstly to explain traditional architecture in central and homogeneous distributed database computing, followed by a possible approach to use an architectural framework to obtain sustainability across disparate systems, i.e. heterogeneous databases, and concludes with a discussion. It is seen that, through a method of using relaxed ACID properties on a service-oriented architecture, it is possible to achieve data consistency, which is essential when ensuring sustainable interoperability.
Interconnecting heterogeneous database management systems
NASA Technical Reports Server (NTRS)
Gligor, V. D.; Luckenbaugh, G. L.
1984-01-01
It is pointed out that there is still a great need for the development of improved communication between remote, heterogeneous database management systems (DBMS). Problems regarding the effective communication between distributed DBMSs are primarily related to significant differences between local data managers, local data models and representations, and local transaction managers. A system of interconnected DBMSs which exhibit such differences is called a network of distributed, heterogeneous DBMSs. In order to achieve effective interconnection of remote, heterogeneous DBMSs, the users must have uniform, integrated access to the different DBMSs. The present investigation is mainly concerned with an analysis of the existing approaches to interconnecting heterogeneous DBMSs, taking into account four experimental DBMS projects.
Managing Heterogeneous Information Systems through Discovery and Retrieval of Generic Concepts.
ERIC Educational Resources Information Center
Srinivasan, Uma; Ngu, Anne H. H.; Gedeon, Tom
2000-01-01
Introduces a conceptual integration approach to heterogeneous databases or information systems that exploits the similarity in metalevel information and performs metadata mining on database objects to discover a set of concepts that serve as a domain abstraction and provide a conceptual layer above existing legacy systems. Presents results of…
Heterogeneous distributed databases: A case study
NASA Technical Reports Server (NTRS)
Stewart, Tracy R.; Mukkamala, Ravi
1991-01-01
Alternatives are reviewed for accessing distributed heterogeneous databases and a recommended solution is proposed. The current study is limited to the Automated Information Systems Center at the Naval Sea Combat Systems Engineering Station at Norfolk, VA. This center maintains two databases located on Digital Equipment Corporation's VAX computers running under the VMS operating system. The first database, ICMS, resides on a VAX11/780 and has been implemented using VAX DBMS, a CODASYL based system. The second database, CSA, resides on a VAX 6460 and has been implemented using the ORACLE relational database management system (RDBMS). Both databases are used for configuration management within the U.S. Navy. Different customer bases are supported by each database. ICMS tracks U.S. Navy ships and major systems (anti-sub, sonar, etc.). Even though the major systems on ships and submarines have totally different functions, some of the equipment within the major systems is common to both ships and submarines.
Distributed Access View Integrated Database (DAVID) system
NASA Technical Reports Server (NTRS)
Jacobs, Barry E.
1991-01-01
The Distributed Access View Integrated Database (DAVID) System, which was adopted by the Astrophysics Division for their Astrophysics Data System, is a solution to the system heterogeneity problem. The heterogeneous components of the Astrophysics problem are outlined. The Library and Library Consortium levels of the DAVID approach are described. The 'books' and 'kits' level is discussed. The Universal Object Typer Management System level is described. The relation of the DAVID project with the Small Business Innovative Research (SBIR) program is explained.
A New Approach To Secure Federated Information Bases Using Agent Technology.
ERIC Educational Resources Information Center
Weippi, Edgar; Klug, Ludwig; Essmayr, Wolfgang
2003-01-01
Discusses database agents which can be used to establish federated information bases by integrating heterogeneous databases. Highlights include characteristics of federated information bases, including incompatible database management systems, schemata, and frequently changing context; software agent technology; Java agents; system architecture;…
Heterogeneity in Health Care Computing Environments
Sengupta, Soumitra
1989-01-01
This paper discusses issues of heterogeneity in computer systems, networks, databases, and presentation techniques, and the problems it creates in developing integrated medical information systems. The need for institutional, comprehensive goals is emphasized. Using the Columbia-Presbyterian Medical Center's computing environment as the case study, various steps to solve the heterogeneity problem are presented.
NASA Technical Reports Server (NTRS)
Kelley, Steve; Roussopoulos, Nick; Sellis, Timos; Wallace, Sarah
1993-01-01
The Universal Index System (UIS) is an index management system that uses a uniform interface to solve the heterogeneity problem among database management systems. UIS not only provides an easy-to-use common interface to access all underlying data, but also accommodates different underlying database management systems, storage representations, and access methods.
Common Database Interface for Heterogeneous Software Engineering Tools.
1987-12-01
Master of Science thesis in Information Systems, Air Force Institute of Technology, Air University. Subject terms: database management systems; programming (computers); computer files; information transfer; interfaces. Report sections include the System 690 configuration, database functions, and software engineering environments.
SIMS: addressing the problem of heterogeneity in databases
NASA Astrophysics Data System (ADS)
Arens, Yigal
1997-02-01
The heterogeneity of remotely accessible databases -- with respect to contents, query language, semantics, organization, etc. -- presents serious obstacles to convenient querying. The SIMS (single interface to multiple sources) system addresses this global integration problem. It does so by defining a single language for describing the domain about which information is stored in the databases and using this language as the query language. Each database to which SIMS is to provide access is modeled using this language. The model describes a database's contents, organization, and other relevant features. SIMS uses these models, together with a planning system drawing on techniques from artificial intelligence, to decompose a given user's high-level query into a series of queries against the databases and other data manipulation steps. The retrieval plan is constructed so as to minimize data movement over the network and maximize parallelism to increase execution speed. SIMS can recover from network failures during plan execution by obtaining data from alternate sources, when possible. SIMS has been demonstrated in the domains of medical informatics and logistics, using real databases.
A probabilistic approach to information retrieval in heterogeneous databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chatterjee, A.; Segev, A.
During the past decade, organizations have increased their scope and operations beyond their traditional geographic boundaries. At the same time, they have adopted heterogeneous and incompatible information systems independently of each other, without careful consideration that one day they may need to be integrated. As a result of this diversity, many important business applications today require access to data stored in multiple autonomous databases. This paper examines the problem of inter-database information retrieval in a heterogeneous environment, where conventional techniques are no longer efficient. To solve the problem, broader definitions for the join, union, intersection and selection operators are proposed. Also, a probabilistic method to specify the selectivity of these operators is discussed. An algorithm to compute these probabilities is provided in pseudocode.
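To make the broadened operators concrete, the following sketch (not the authors' algorithm; attribute names, probabilities, and the independence assumption are illustrative only) shows how a probabilistic selection and join over heterogeneous tuples might be expressed:

```python
# Minimal sketch of probabilistic selection and join over heterogeneous tuples.
# Each tuple carries a probability that it actually satisfies the query predicate;
# the independence assumption used when combining probabilities is illustrative only.

def prob_select(relation, predicate, match_prob):
    """Keep tuples that may satisfy the predicate, tagging each with its probability."""
    return [(row, match_prob(row)) for row in relation if predicate(row)]

def prob_join(left, right, similarity):
    """Join two probabilistic relations; a pair survives with probability
    p_left * p_right * similarity(l, r) (independence assumed for illustration)."""
    result = []
    for l, pl in left:
        for r, pr in right:
            s = similarity(l, r)
            if s > 0.0:
                result.append(((l, r), pl * pr * s))
    return result

# Hypothetical example: matching customer records from two autonomous databases.
db_a = prob_select([{"name": "J. Smith", "city": "Oslo"}],
                   predicate=lambda row: row["city"] == "Oslo",
                   match_prob=lambda row: 0.9)
db_b = prob_select([{"name": "John Smith", "city": "Oslo"}],
                   predicate=lambda row: row["city"] == "Oslo",
                   match_prob=lambda row: 0.8)
matches = prob_join(db_a, db_b,
                    similarity=lambda a, b: 0.7 if a["name"].split()[-1] == b["name"].split()[-1] else 0.0)
print(matches)  # the pair survives with probability 0.9 * 0.8 * 0.7 = 0.504
```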
Bichutskiy, Vadim Y.; Colman, Richard; Brachmann, Rainer K.; Lathrop, Richard H.
2006-01-01
Complex problems in life science research give rise to multidisciplinary collaboration, and hence, to the need for heterogeneous database integration. The tumor suppressor p53 is mutated in close to 50% of human cancers, and a small drug-like molecule with the ability to restore native function to cancerous p53 mutants is a long-held medical goal of cancer treatment. The Cancer Research DataBase (CRDB) was designed in support of a project to find such small molecules. As a cancer informatics project, the CRDB involved small molecule data, computational docking results, functional assays, and protein structure data. As an example of the hybrid strategy for data integration, it combined the mediation and data warehousing approaches. This paper uses the CRDB to illustrate the hybrid strategy as a viable approach to heterogeneous data integration in biomedicine, and provides a design method for those considering similar systems. More efficient data sharing implies increased productivity, and, hopefully, improved chances of success in cancer research. (Code and database schemas are freely downloadable, http://www.igb.uci.edu/research/research.html.) PMID:19458771
[Tumor Data Interacted System Design Based on Grid Platform].
Liu, Ying; Cao, Jiaji; Zhang, Haowei; Zhang, Ke
2016-06-01
In order to satisfy the demands of massive and heterogeneous tumor clinical data processing and the multi-center collaborative diagnosis and treatment of tumor diseases, a Tumor Data Interacted System (TDIS) was established based on a grid platform, realizing a virtualized platform for tumor diagnosis services that shares tumor information in real time and carries out standardized management. The system adopts Globus Toolkit 4.0 tools to build the open grid service framework and encapsulates data resources based on the Web Services Resource Framework (WSRF). The system uses middleware technology to provide a unified access interface for heterogeneous data interaction, which can optimize the interactive process with virtualized services to query and call tumor information resources flexibly. For massive amounts of heterogeneous tumor data, a federated storage and multiple-authorization mode is selected as the security services mechanism, with real-time monitoring and load balancing. The system can cooperatively manage multi-center heterogeneous tumor data to realize tumor patient data query, sharing and analysis, and can compare and match resources in a typical clinical database or in clinical information databases at other service nodes; thus it can assist doctors in consulting similar cases and drawing up multidisciplinary treatment plans for tumors. Consequently, the system can improve the efficiency of tumor diagnosis and treatment, and promote the development of the collaborative tumor diagnosis model.
An incremental database access method for autonomous interoperable databases
NASA Technical Reports Server (NTRS)
Roussopoulos, Nicholas; Sellis, Timos
1994-01-01
We investigated a number of design and performance issues of interoperable database management systems (DBMS's). The major results of our investigation were obtained in the areas of client-server database architectures for heterogeneous DBMS's, incremental computation models, buffer management techniques, and query optimization. We finished a prototype of an advanced client-server workstation-based DBMS which allows access to multiple heterogeneous commercial DBMS's. Experiments and simulations were then run to compare its performance with the standard client-server architectures. The focus of this research was on adaptive optimization methods for heterogeneous database systems. Adaptive buffer management accounts for the random and object-oriented access methods for which no known characterization of the access patterns exists. Adaptive query optimization means that value distributions and selectivities, which play the most significant role in query plan evaluation, are continuously refined to reflect the actual values as opposed to static ones that are computed off-line. Query feedback is a concept that was first introduced to the literature by our group. We employed query feedback for both adaptive buffer management and for computing value distributions and selectivities. For adaptive buffer management, we use the page faults of prior executions to achieve more 'informed' management decisions. For the estimation of the distributions of the selectivities, we use curve-fitting techniques, such as least squares and splines, for regressing on these values.
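As a hedged illustration of query-feedback-driven selectivity estimation, the sketch below regresses observed selectivities against predicate constants with a least-squares polynomial fit; the feedback values and predicate are hypothetical, and the spline-based refinement mentioned above is not reproduced:

```python
# Sketch of query-feedback-driven selectivity estimation: regress observed
# selectivities (from prior query executions) against predicate constants,
# then predict the selectivity of a new predicate. Data values are hypothetical.
import numpy as np

# (predicate constant, observed selectivity) pairs collected as query feedback
feedback = [(10, 0.05), (25, 0.12), (40, 0.22), (60, 0.35), (80, 0.48)]
xs = np.array([c for c, _ in feedback], dtype=float)
ys = np.array([s for _, s in feedback], dtype=float)

# Least-squares fit of a low-degree polynomial to the feedback points.
coeffs = np.polyfit(xs, ys, deg=2)
estimate = np.polyval(coeffs, 50.0)          # predicted selectivity of "attr < 50"
print(f"estimated selectivity: {estimate:.3f}")
```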
A semantic data dictionary method for database schema integration in CIESIN
NASA Astrophysics Data System (ADS)
Hinds, N.; Huang, Y.; Ravishankar, C.
1993-08-01
CIESIN (Consortium for International Earth Science Information Network) is funded by NASA to investigate the technology necessary to integrate and facilitate the interdisciplinary use of Global Change information. A clear part of this mission is providing a link between the various global change data sets, in particular the physical sciences and the human (social) sciences. The typical scientist using the CIESIN system will want to know how phenomena in an outside field affect his/her work. For example, a medical researcher might ask: how does air quality affect emphysema? This and many similar questions will require sophisticated semantic data integration. The researcher who raised the question may be familiar with medical data sets containing emphysema occurrences. But this same investigator may know little, if anything, about the existence or location of air-quality data. It is easy to envision a system which would allow that investigator to locate and perform a "join" on two data sets, one containing emphysema cases and the other containing air-quality levels. No such system exists today. One major obstacle to providing such a system will be overcoming heterogeneity, which falls into two broad categories. "Database system" heterogeneity involves differences in data models and packages. "Data semantic" heterogeneity involves differences in terminology between disciplines, which translate into data semantic issues, and varying levels of data refinement, from raw to summary. Our work investigates a global data dictionary mechanism to facilitate a merged data service. Specifically, we propose using a semantic tree during schema definition to aid in locating and integrating heterogeneous databases.
BIOZON: a system for unification, management and analysis of heterogeneous biological data.
Birkland, Aaron; Yona, Golan
2006-02-15
Integration of heterogeneous data types is a challenging problem, especially in biology, where the number of databases and data types increase rapidly. Amongst the problems that one has to face are integrity, consistency, redundancy, connectivity, expressiveness and updatability. Here we present a system (Biozon) that addresses these problems, and offers biologists a new knowledge resource to navigate through and explore. Biozon unifies multiple biological databases consisting of a variety of data types (such as DNA sequences, proteins, interactions and cellular pathways). It is fundamentally different from previous efforts as it uses a single extensive and tightly connected graph schema wrapped with hierarchical ontology of documents and relations. Beyond warehousing existing data, Biozon computes and stores novel derived data, such as similarity relationships and functional predictions. The integration of similarity data allows propagation of knowledge through inference and fuzzy searches. Sophisticated methods of query that span multiple data types were implemented and first-of-a-kind biological ranking systems were explored and integrated. The Biozon system is an extensive knowledge resource of heterogeneous biological data. Currently, it holds more than 100 million biological documents and 6.5 billion relations between them. The database is accessible through an advanced web interface that supports complex queries, "fuzzy" searches, data materialization and more, online at http://biozon.org.
Constructing compact and effective graphs for recommender systems via node and edge aggregations
Lee, Sangkeun; Kahng, Minsuk; Lee, Sang-goo
2014-12-10
Exploiting graphs for recommender systems has great potential to flexibly incorporate heterogeneous information for producing better recommendation results. As our baseline approach, we first introduce a naive graph-based recommendation method, which operates with a heterogeneous log-metadata graph constructed from user log and content metadata databases. Although the naive graph-based recommendation method is simple, it allows us to take advantage of heterogeneous information and shows promising flexibility and recommendation accuracy. However, it often leads to extensive processing time due to the sheer size of the graphs constructed from entire user log and content metadata databases. In this paper, we propose node and edge aggregation approaches to constructing compact and effective graphs called Factor-Item bipartite graphs by aggregating nodes and edges of a log-metadata graph. Furthermore, experimental results using real world datasets indicate that our approach can significantly reduce the size of graphs exploited for recommender systems without sacrificing the recommendation quality.
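A rough sketch of the node and edge aggregation idea follows; it collapses a toy log-metadata graph into factor-item edge weights using plain dictionaries. The field names, data, and weighting rule are illustrative and do not reproduce the paper's exact construction:

```python
# Illustrative sketch (not the paper's exact construction): collapse a
# heterogeneous log-metadata graph into a compact factor-item bipartite graph
# by aggregating play events through item metadata. Field names are hypothetical.
from collections import defaultdict

play_log = [("u1", "song_a"), ("u1", "song_b"), ("u2", "song_a"), ("u3", "song_c")]
metadata = {"song_a": {"genre": "jazz", "artist": "X"},
            "song_b": {"genre": "jazz", "artist": "Y"},
            "song_c": {"genre": "rock", "artist": "X"}}

# Edge weights between factor nodes (e.g. "genre=jazz") and item nodes,
# aggregated over all user-play edges that touch the item.
factor_item = defaultdict(int)
for _user, item in play_log:
    for attr, value in metadata[item].items():
        factor_item[(f"{attr}={value}", item)] += 1

for (factor, item), weight in sorted(factor_item.items()):
    print(factor, item, weight)
```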
Database interfaces on NASA's heterogeneous distributed database system
NASA Technical Reports Server (NTRS)
Huang, Shou-Hsuan Stephen
1989-01-01
The syntax and semantics of all commands used in the template are described. Template builders should consult this document for proper commands in the template. Previous documents (Semiannual reports) described other aspects of this project. Appendix 1 contains all substituting commands used in the system. Appendix 2 includes all repeating commands. Appendix 3 is a collection of DEFINE templates from eight different DBMS's.
Database interfaces on NASA's heterogeneous distributed database system
NASA Technical Reports Server (NTRS)
Huang, S. H. S.
1986-01-01
The purpose of the ORACLE interface is to enable the DAVID program to submit queries and transactions to databases running under the ORACLE DBMS. The interface package is made up of several modules. The progress of these modules is described below. The two approaches used in implementing the interface are also discussed. Detailed discussion of the design of the templates is shown and concluding remarks are presented.
NASA Astrophysics Data System (ADS)
Thakore, Arun K.; Sauer, Frank
1994-05-01
The organization of modern medical care environments into disease-related clusters, such as a cancer center, a diabetes clinic, etc., has the side-effect of introducing multiple heterogeneous databases, often containing similar information, within the same organization. This heterogeneity fosters incompatibility and prevents the effective sharing of data amongst applications at different sites. Although integration of heterogeneous databases is now feasible, in the medical arena this is often an ad hoc process, not founded on proven database technology or formal methods. In this paper we illustrate the use of a high-level object-oriented semantic association method to model information found in different databases into a conceptual global model that integrates the databases. We provide examples from the medical domain to illustrate an integration approach resulting in a consistent global view, without attacking the autonomy of the underlying databases.
Case retrieval in medical databases by fusing heterogeneous information.
Quellec, Gwénolé; Lamard, Mathieu; Cazuguel, Guy; Roux, Christian; Cochener, Béatrice
2011-01-01
A novel content-based heterogeneous information retrieval framework, particularly well suited to browse medical databases and support new generation computer aided diagnosis (CADx) systems, is presented in this paper. It was designed to retrieve possibly incomplete documents, consisting of several images and semantic information, from a database; more complex data types such as videos can also be included in the framework. The proposed retrieval method relies on image processing, in order to characterize each individual image in a document by their digital content, and information fusion. Once the available images in a query document are characterized, a degree of match, between the query document and each reference document stored in the database, is defined for each attribute (an image feature or a metadata). A Bayesian network is used to recover missing information if need be. Finally, two novel information fusion methods are proposed to combine these degrees of match, in order to rank the reference documents by decreasing relevance for the query. In the first method, the degrees of match are fused by the Bayesian network itself. In the second method, they are fused by the Dezert-Smarandache theory: the second approach lets us model our confidence in each source of information (i.e., each attribute) and take it into account in the fusion process for a better retrieval performance. The proposed methods were applied to two heterogeneous medical databases, a diabetic retinopathy database and a mammography screening database, for computer aided diagnosis. Precisions at five of 0.809 ± 0.158 and 0.821 ± 0.177, respectively, were obtained for these two databases, which is very promising.
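The following sketch illustrates the general shape of attribute-wise fusion for ranking reference documents. A simple confidence-weighted average stands in for the paper's Bayesian-network and Dezert-Smarandache fusion, and all attribute names, weights, and values are hypothetical:

```python
# Sketch of attribute-wise degree-of-match fusion for case retrieval.
# A confidence-weighted average stands in for the Bayesian-network and
# Dezert-Smarandache fusion described in the paper; names and data are illustrative.

def degree_of_match(query_value, ref_value):
    """Toy degree of match for a numeric attribute in [0, 1]."""
    if query_value is None or ref_value is None:
        return None                      # missing information
    return max(0.0, 1.0 - abs(query_value - ref_value))

def fuse(query, reference, confidence):
    """Combine per-attribute degrees of match, weighting by our confidence
    in each information source and skipping attributes that are missing."""
    num, den = 0.0, 0.0
    for attr, weight in confidence.items():
        d = degree_of_match(query.get(attr), reference.get(attr))
        if d is not None:
            num += weight * d
            den += weight
    return num / den if den else 0.0

query = {"image_feature_1": 0.7, "age_norm": 0.5}
references = {"case_17": {"image_feature_1": 0.65, "age_norm": 0.4},
              "case_42": {"image_feature_1": 0.2,  "age_norm": None}}
confidence = {"image_feature_1": 0.8, "age_norm": 0.2}

ranking = sorted(references, key=lambda c: fuse(query, references[c], confidence), reverse=True)
print(ranking)   # case_17 ranks above case_42
```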
Fiacco, P. A.; Rice, W. H.
1991-01-01
Computerized medical record systems require structured database architectures for information processing. However, the data must be able to be transferred across heterogeneous platform and software systems. Client-server architecture allows for distributive processing of information among networked computers and provides the flexibility needed to link diverse systems together effectively. We have incorporated this client-server model with a graphical user interface into an outpatient medical record system, known as SuperChart, for the Department of Family Medicine at SUNY Health Science Center at Syracuse. SuperChart was developed using SuperCard and Oracle. SuperCard uses modern object-oriented programming to support a hypermedia environment. Oracle is a powerful relational database management system that incorporates a client-server architecture. This provides both a distributed database and distributed processing, which improves performance. PMID:1807732
An Examination of Multi-Tier Designs for Legacy Data Access
1997-12-01
heterogeneous relational database management systems. The first test system incorporates a two-tier architecture design using Java, and the second system...employs a three-tier architecture design using Java and CORBA. Data on replication times for the two-tier and three-tier designs are presented
Stahl, Olivier; Duvergey, Hugo; Guille, Arnaud; Blondin, Fanny; Vecchio, Alexandre Del; Finetti, Pascal; Granjeaud, Samuel; Vigy, Oana; Bidaut, Ghislain
2013-06-06
With the advance of post-genomic technologies, the need for tools to manage large-scale data in biology becomes more pressing. This involves annotating and storing data securely, as well as granting permissions flexibly with several technologies (all array types, flow cytometry, proteomics) for collaborative work and data sharing. This task is not easily achieved with most systems available today. We developed Djeen (Database for Joomla!'s Extensible Engine), a new Research Information Management System (RIMS) for collaborative projects. Djeen is a user-friendly application, designed to streamline data storage and annotation collaboratively. Its database model, kept simple, is compliant with most technologies and allows storing and managing heterogeneous data with the same system. Advanced permissions are managed through different roles. Templates allow Minimum Information (MI) compliance. Djeen allows managing projects associated with heterogeneous data types while enforcing annotation integrity and minimum information. Projects are managed within a hierarchy and user permissions are finely grained for each project, user and group. Djeen Component source code (version 1.5.1) and installation documentation are available under CeCILL license from http://sourceforge.net/projects/djeen/files and supplementary material.
2013-01-01
Background With the advance of post-genomic technologies, the need for tools to manage large-scale data in biology becomes more pressing. This involves annotating and storing data securely, as well as granting permissions flexibly with several technologies (all array types, flow cytometry, proteomics) for collaborative work and data sharing. This task is not easily achieved with most systems available today. Findings We developed Djeen (Database for Joomla!’s Extensible Engine), a new Research Information Management System (RIMS) for collaborative projects. Djeen is a user-friendly application, designed to streamline data storage and annotation collaboratively. Its database model, kept simple, is compliant with most technologies and allows storing and managing heterogeneous data with the same system. Advanced permissions are managed through different roles. Templates allow Minimum Information (MI) compliance. Conclusion Djeen allows managing projects associated with heterogeneous data types while enforcing annotation integrity and minimum information. Projects are managed within a hierarchy and user permissions are finely grained for each project, user and group. Djeen Component source code (version 1.5.1) and installation documentation are available under CeCILL license from http://sourceforge.net/projects/djeen/files and supplementary material. PMID:23742665
The Database Query Support Processor (QSP)
NASA Technical Reports Server (NTRS)
1993-01-01
The number and diversity of databases available to users continues to increase dramatically. Currently, the trend is towards decentralized, client server architectures that (on the surface) are less expensive to acquire, operate, and maintain than information architectures based on centralized, monolithic mainframes. The database query support processor (QSP) effort evaluates the performance of a network level, heterogeneous database access capability. Air Force Material Command's Rome Laboratory has developed an approach, based on ANSI standard X3.138-1988, 'The Information Resource Dictionary System (IRDS)', to seamless access to heterogeneous databases based on extensions to data dictionary technology. To successfully query a decentralized information system, users must know what data are available from which source, or have the knowledge and system privileges necessary to find out this information. Privacy and security considerations prohibit free and open access to every information system in every network. Even in completely open systems, time required to locate relevant data (in systems of any appreciable size) would be better spent analyzing the data, assuming the original question was not forgotten. Extensions to data dictionary technology have the potential to more fully automate the search and retrieval for relevant data in a decentralized environment. Substantial amounts of time and money could be saved by not having to teach users what data resides in which systems and how to access each of those systems. Information describing data and how to get it could be removed from the application and placed in a dedicated repository where it belongs. The result is simplified applications that are less brittle and less expensive to build and maintain. Software technology providing the required functionality is off the shelf. The key difficulty is in defining the metadata required to support the process. The database query support processor effort will provide quantitative data on the amount of effort required to implement an extended data dictionary at the network level, add new systems, adapt to changing user needs, and provide sound estimates on operations and maintenance costs and savings.
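A minimal sketch of the extended-data-dictionary idea follows: a metadata repository records which system holds which data element and how to reach it, so applications can locate sources without hard-coding them. The entries and field names are hypothetical and are not the IRDS schema:

```python
# Sketch of an extended data dictionary: a repository describing which system
# holds which data elements and how to reach it, so applications need not
# hard-code source-specific knowledge. All entries are hypothetical.

DATA_DICTIONARY = [
    {"element": "aircraft_maintenance_record", "system": "LOGDB",
     "access": {"protocol": "sql", "host": "logdb.example.mil", "table": "maint_hist"}},
    {"element": "part_inventory", "system": "SUPPLY",
     "access": {"protocol": "sql", "host": "supply.example.mil", "table": "parts"}},
    {"element": "part_inventory", "system": "DEPOT",
     "access": {"protocol": "file", "host": "depot.example.mil", "path": "/data/parts.dat"}},
]

def locate(element):
    """Return every source that can answer a request for the given data element."""
    return [e for e in DATA_DICTIONARY if e["element"] == element]

for source in locate("part_inventory"):
    print(source["system"], source["access"])
```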
Ultra-Structure database design methodology for managing systems biology data and analyses
Maier, Christopher W; Long, Jeffrey G; Hemminger, Bradley M; Giddings, Morgan C
2009-01-01
Background Modern, high-throughput biological experiments generate copious, heterogeneous, interconnected data sets. Research is dynamic, with frequently changing protocols, techniques, instruments, and file formats. Because of these factors, systems designed to manage and integrate modern biological data sets often end up as large, unwieldy databases that become difficult to maintain or evolve. The novel rule-based approach of the Ultra-Structure design methodology presents a potential solution to this problem. By representing both data and processes as formal rules within a database, an Ultra-Structure system constitutes a flexible framework that enables users to explicitly store domain knowledge in both a machine- and human-readable form. End users themselves can change the system's capabilities without programmer intervention, simply by altering database contents; no computer code or schemas need be modified. This provides flexibility in adapting to change, and allows integration of disparate, heterogeneous data sets within a small core set of database tables, facilitating joint analysis and visualization without becoming unwieldy. Here, we examine the application of Ultra-Structure to our ongoing research program for the integration of large proteomic and genomic data sets (proteogenomic mapping). Results We transitioned our proteogenomic mapping information system from a traditional entity-relationship design to one based on Ultra-Structure. Our system integrates tandem mass spectrum data, genomic annotation sets, and spectrum/peptide mappings, all within a small, general framework implemented within a standard relational database system. General software procedures driven by user-modifiable rules can perform tasks such as logical deduction and location-based computations. The system is not tied specifically to proteogenomic research, but is rather designed to accommodate virtually any kind of biological research. Conclusion We find Ultra-Structure offers substantial benefits for biological information systems, the largest being the integration of diverse information sources into a common framework. This facilitates systems biology research by integrating data from disparate high-throughput techniques. It also enables us to readily incorporate new data types, sources, and domain knowledge with no change to the database structure or associated computer code. Ultra-Structure may be a significant step towards solving the hard problem of data management and integration in the systems biology era. PMID:19691849
Active in-database processing to support ambient assisted living systems.
de Morais, Wagner O; Lundström, Jens; Wickström, Nicholas
2014-08-12
As an alternative to the existing software architectures that underpin the development of smart homes and ambient assisted living (AAL) systems, this work presents a database-centric architecture that takes advantage of active databases and in-database processing. Current platforms supporting AAL systems use database management systems (DBMSs) exclusively for data storage. Active databases employ database triggers to detect and react to events taking place inside or outside of the database. DBMSs can be extended with stored procedures and functions that enable in-database processing. This means that the data processing is integrated and performed within the DBMS. The feasibility and flexibility of the proposed approach were demonstrated with the implementation of three distinct AAL services. The active database was used to detect bed-exits and to discover common room transitions and deviations during the night. In-database machine learning methods were used to model early night behaviors. Consequently, active in-database processing avoids transferring sensitive data outside the database, and this improves performance, security and privacy. Furthermore, centralizing the computation into the DBMS facilitates code reuse, adaptation and maintenance. These are important system properties that take into account the evolving heterogeneity of users, their needs and the devices that are characteristic of smart homes and AAL systems. Therefore, DBMSs can provide capabilities to address requirements for scalability, security, privacy, dependability and personalization in applications of smart environments in healthcare.
Active In-Database Processing to Support Ambient Assisted Living Systems
de Morais, Wagner O.; Lundström, Jens; Wickström, Nicholas
2014-01-01
As an alternative to the existing software architectures that underpin the development of smart homes and ambient assisted living (AAL) systems, this work presents a database-centric architecture that takes advantage of active databases and in-database processing. Current platforms supporting AAL systems use database management systems (DBMSs) exclusively for data storage. Active databases employ database triggers to detect and react to events taking place inside or outside of the database. DBMSs can be extended with stored procedures and functions that enable in-database processing. This means that the data processing is integrated and performed within the DBMS. The feasibility and flexibility of the proposed approach were demonstrated with the implementation of three distinct AAL services. The active database was used to detect bed-exits and to discover common room transitions and deviations during the night. In-database machine learning methods were used to model early night behaviors. Consequently, active in-database processing avoids transferring sensitive data outside the database, and this improves performance, security and privacy. Furthermore, centralizing the computation into the DBMS facilitates code reuse, adaptation and maintenance. These are important system properties that take into account the evolving heterogeneity of users, their needs and the devices that are characteristic of smart homes and AAL systems. Therefore, DBMSs can provide capabilities to address requirements for scalability, security, privacy, dependability and personalization in applications of smart environments in healthcare. PMID:25120164
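The sketch below illustrates the active-database pattern described above, using SQLite triggers driven from Python; in a production DBMS the reaction logic would typically live in stored procedures. Table names, event names, and the night-time rule are assumptions made for illustration:

```python
# Minimal sketch of the active-database idea using SQLite from Python:
# a trigger reacts to sensor events inside the database itself (in a full
# DBMS the reaction could call a stored procedure). Table names, event
# names and the night-time rule are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sensor_events (ts TEXT, sensor TEXT, event TEXT);
CREATE TABLE alerts        (ts TEXT, kind TEXT);

-- React to bed-pressure-released events recorded between 00:00 and 06:00.
CREATE TRIGGER bed_exit_alert
AFTER INSERT ON sensor_events
WHEN NEW.sensor = 'bed_pressure' AND NEW.event = 'released'
     AND time(NEW.ts) BETWEEN '00:00:00' AND '06:00:00'
BEGIN
    INSERT INTO alerts (ts, kind) VALUES (NEW.ts, 'bed_exit');
END;
""")

conn.execute("INSERT INTO sensor_events VALUES ('2014-08-12 02:31:00', 'bed_pressure', 'released')")
conn.execute("INSERT INTO sensor_events VALUES ('2014-08-12 14:05:00', 'bed_pressure', 'released')")
print(conn.execute("SELECT * FROM alerts").fetchall())   # only the night event raises an alert
```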
Deeply learnt hashing forests for content based image retrieval in prostate MR images
NASA Astrophysics Data System (ADS)
Shah, Amit; Conjeti, Sailesh; Navab, Nassir; Katouzian, Amin
2016-03-01
The deluge in the size and heterogeneity of medical image databases necessitates content-based retrieval systems for their efficient organization. In this paper, we propose such a system to retrieve prostate MR images which share similarities in appearance and content with a query image. We introduce deeply learnt hashing forests (DL-HF) for this image retrieval task. DL-HF effectively leverages the semantic descriptiveness of deep learnt Convolutional Neural Networks. This is used in conjunction with hashing forests, which are unsupervised random forests. DL-HF hierarchically parses the deep-learnt feature space to encode subspaces with compact binary code words. We propose a similarity preserving feature descriptor called Parts Histogram, which is derived from DL-HF. Correlation defined on this descriptor is used as a similarity metric for retrieval from the database. Validations on a publicly available multi-center prostate MR image database established the validity of the proposed approach. The proposed method is fully automated without any user interaction and is not dependent on any external image standardization like image normalization and registration. This image retrieval method is generalizable and well suited for retrieval in heterogeneous databases of other imaging modalities and anatomies.
Semantic mediation in the national geologic map database (US)
Percy, D.; Richard, S.; Soller, D.
2008-01-01
Controlled language is the primary challenge in merging heterogeneous databases of geologic information. Each agency or organization produces databases with different schema, and different terminology for describing the objects within. In order to make some progress toward merging these databases using current technology, we have developed software and a workflow that allows for the "manual semantic mediation" of these geologic map databases. Enthusiastic support from many state agencies (stakeholders and data stewards) has shown that the community supports this approach. Future implementations will move toward a more Artificial Intelligence-based approach, using expert-systems or knowledge-bases to process data based on the training sets we have developed manually.
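A minimal sketch of such manual semantic mediation is shown below: expert-curated mappings translate each agency's local map-unit terms into a shared concept so that merged queries can be phrased against one vocabulary. All terms and mappings are invented for illustration:

```python
# Sketch of manual semantic mediation: each agency's local geologic terms are
# mapped (by a domain expert) onto a shared concept, so merged queries can be
# expressed against the shared vocabulary. All terms and mappings are illustrative.

TERM_MAP = {
    ("state_a", "qal"):         "alluvium",
    ("state_a", "granite_1"):   "granite",
    ("state_b", "Qal-deposit"): "alluvium",
    ("state_b", "grnt"):        "granite",
}

def mediate(source, local_term):
    """Translate a source-specific map-unit term to the shared concept, if known."""
    return TERM_MAP.get((source, local_term), "UNMAPPED:" + local_term)

records = [("state_a", "qal"), ("state_b", "Qal-deposit"), ("state_b", "basalt?")]
for source, term in records:
    print(source, term, "->", mediate(source, term))
```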
Realization of Real-Time Clinical Data Integration Using Advanced Database Technology
Yoo, Sooyoung; Kim, Boyoung; Park, Heekyong; Choi, Jinwook; Chun, Jonghoon
2003-01-01
As information & communication technologies have advanced, interest in mobile health care systems has grown. In order to obtain information seamlessly from distributed and fragmented clinical data from heterogeneous institutions, we need solutions that integrate data. In this article, we introduce a method for information integration based on real-time message communication using trigger and advanced database technologies. Messages were devised to conform to HL7, a standard for electronic data exchange in healthcare environments. The HL7 based system provides us with an integrated environment in which we are able to manage the complexities of medical data. We developed this message communication interface to generate and parse HL7 messages automatically from the database point of view. We discuss how easily real time data exchange is performed in the clinical information system, given the requirement for minimum loading of the database system. PMID:14728271
A dedicated database system for handling multi-level data in systems biology.
Pornputtapong, Natapol; Wanichthanarak, Kwanjeera; Nilsson, Avlant; Nookaew, Intawat; Nielsen, Jens
2014-01-01
Advances in high-throughput technologies have enabled extensive generation of multi-level omics data. These data are crucial for systems biology research, though they are complex, heterogeneous, highly dynamic, incomplete and distributed among public databases. This leads to difficulties in data accessibility and often results in errors when data are merged and integrated from varied resources. Therefore, integration and management of systems biological data remain very challenging. To overcome this, we designed and developed a dedicated database system that can serve and solve the vital issues in data management and thereby facilitate data integration, modeling and analysis in systems biology within a single database. In addition, a yeast data repository was implemented as an integrated database environment which is operated by the database system. Two applications were implemented to demonstrate the extensibility and utilization of the system. Both illustrate how the user can access the database via the web query function and implemented scripts. These scripts are specific to two sample cases: 1) detecting the pheromone pathway in protein interaction networks; and 2) finding metabolic reactions regulated by Snf1 kinase. In this study we present the design of a database system which offers an extensible environment to efficiently capture the majority of biological entities and relations encountered in systems biology. Critical functions and control processes were designed and implemented to ensure consistent, efficient, secure and reliable transactions. The two sample cases on the integrated yeast data clearly demonstrate the value of a single database environment for systems biology research.
A generalized strategy for building resident database interfaces
NASA Technical Reports Server (NTRS)
Moroh, Marsha; Wanderman, Ken
1990-01-01
A strategy for building resident interfaces to host heterogeneous distributed data base management systems is developed. The strategy is used to construct several interfaces. A set of guidelines is developed for users to construct their own interfaces.
Gupta, Amarnath; Bug, William; Marenco, Luis; Qian, Xufei; Condit, Christopher; Rangarajan, Arun; Müller, Hans Michael; Miller, Perry L.; Sanders, Brian; Grethe, Jeffrey S.; Astakhov, Vadim; Shepherd, Gordon; Sternberg, Paul W.; Martone, Maryann E.
2009-01-01
The overarching goal of the NIF (Neuroscience Information Framework) project is to be a one-stop-shop for Neuroscience. This paper provides a technical overview of how the system is designed. The technical goal of the first version of the NIF system was to develop an information system that a neuroscientist can use to locate relevant information from a wide variety of information sources by simple keyword queries. Although the user would provide only keywords to retrieve information, the NIF system is designed to treat them as concepts whose meanings are interpreted by the system. Thus, a search for a term should find a record containing synonyms of the term. The system is targeted to find information from web pages, publications, databases, web sites built upon databases, XML documents and any other modality in which such information may be published. We have designed a system to achieve this functionality. A central element in the system is an ontology called NIFSTD (for NIF Standard) constructed by amalgamating a number of known and newly developed ontologies. NIFSTD is used by our ontology management module, called OntoQuest, to perform ontology-based search over data sources. The NIF architecture currently provides three different mechanisms for searching heterogeneous data sources, including relational databases, web sites, XML documents and full text of publications. Version 1.0 of the NIF system is currently in beta test and may be accessed through http://nif.nih.gov. PMID:18958629
Gupta, Amarnath; Bug, William; Marenco, Luis; Qian, Xufei; Condit, Christopher; Rangarajan, Arun; Müller, Hans Michael; Miller, Perry L; Sanders, Brian; Grethe, Jeffrey S; Astakhov, Vadim; Shepherd, Gordon; Sternberg, Paul W; Martone, Maryann E
2008-09-01
The overarching goal of the NIF (Neuroscience Information Framework) project is to be a one-stop-shop for Neuroscience. This paper provides a technical overview of how the system is designed. The technical goal of the first version of the NIF system was to develop an information system that a neuroscientist can use to locate relevant information from a wide variety of information sources by simple keyword queries. Although the user would provide only keywords to retrieve information, the NIF system is designed to treat them as concepts whose meanings are interpreted by the system. Thus, a search for a term should find a record containing synonyms of the term. The system is targeted to find information from web pages, publications, databases, web sites built upon databases, XML documents and any other modality in which such information may be published. We have designed a system to achieve this functionality. A central element in the system is an ontology called NIFSTD (for NIF Standard) constructed by amalgamating a number of known and newly developed ontologies. NIFSTD is used by our ontology management module, called OntoQuest, to perform ontology-based search over data sources. The NIF architecture currently provides three different mechanisms for searching heterogeneous data sources, including relational databases, web sites, XML documents and full text of publications. Version 1.0 of the NIF system is currently in beta test and may be accessed through http://nif.nih.gov.
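As a rough illustration of treating keywords as concepts, the sketch below expands a query term with ontology synonyms before searching; the tiny synonym table is invented and is not NIFSTD:

```python
# Sketch of concept-based keyword expansion: the query keyword is mapped to an
# ontology concept and expanded with its synonyms before the source-specific
# searches are issued. The small synonym table below is illustrative only.

ONTOLOGY = {
    "cerebellar purkinje cell": {"purkinje cell", "purkinje neuron"},
    "gabaergic neuron":         {"gaba neuron", "inhibitory neuron"},
}

def expand(keyword):
    """Return the keyword plus the synonyms of any concept it names."""
    terms = {keyword.lower()}
    for concept, synonyms in ONTOLOGY.items():
        if keyword.lower() == concept or keyword.lower() in synonyms:
            terms |= {concept} | synonyms
    return terms

print(expand("Purkinje cell"))
# issue one search per source with every term in the expanded set
```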
The Battle Command Sustainment Support System: Initial Analysis Report
2016-09-01
The National Enterprise Data Portal (NEDP) is comprised of an Oracle Database 10g, referred to as the National Data Server, a main forwarding gateway/web server, and one or more data forwarding gateways (DFG). Together with the Oracle Database 10g, these components provide a heterogeneous data source that aligns various data sources, with features such as diagnostic monitoring and asynchronous commits.
NASA Astrophysics Data System (ADS)
Piasecki, M.; Beran, B.
2007-12-01
Search engines have changed the way we see the Internet. The ability to find information by just typing in keywords was a big contribution to the overall web experience. While the conventional search engine methodology has worked well for textual documents, locating scientific data remains a problem since such data are stored in databases not readily accessible to search engine bots. Considering the different temporal, spatial and thematic coverage of different databases, especially for interdisciplinary research, it is typically necessary to work with multiple data sources. These sources can be federal agencies, which generally offer national coverage, or regional sources, which cover a smaller area with higher detail. However, for a given geographic area of interest there often exists more than one database with relevant data. Thus being able to query multiple databases simultaneously is a desirable feature that would be tremendously useful for scientists. Development of such a search engine requires dealing with various heterogeneity issues. In scientific databases, systems often impose controlled vocabularies, which ensure that they are generally homogeneous within themselves but semantically heterogeneous when moving between different databases. This bounds the possible semantics-related problems, making them easier to solve than in conventional search engines that deal with free text. We have developed a search engine that enables querying multiple data sources simultaneously and returns data in a standardized output despite the aforementioned heterogeneity issues between the underlying systems. This application relies mainly on metadata catalogs or indexing databases, ontologies and web services, with virtual globe and AJAX technologies for the graphical user interface. Users can trigger a search of dozens of different parameters over hundreds of thousands of stations from multiple agencies by providing a keyword, a spatial extent (i.e., a bounding box), and a temporal bracket. As part of this development we have also added an environment that allows users to do some of the semantic tagging, i.e., the linkage of a variable name (which can be anything they desire) to defined concepts in the ontology structure, which in turn provides the backbone of the search engine.
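A schematic sketch of the fan-out behavior described above follows: one user request (variable keyword, bounding box, time bracket) is dispatched to several source adapters and the answers are merged into a common record format. The adapter functions, sources, and fields are hypothetical placeholders for agency-specific web-service calls:

```python
# Sketch of fanning one user query (keyword, bounding box, time bracket) out to
# several heterogeneous catalog services and merging the answers into one
# standardized record format. Adapters and field names are hypothetical.

def query_source_a(variable, bbox, start, end):
    # placeholder for an agency-specific web-service call plus vocabulary/unit mapping
    return [{"site": "A-001", "variable": variable, "value": 3.2, "time": start}]

def query_source_b(variable, bbox, start, end):
    # placeholder for a second, differently structured source
    return [{"site": "B-117", "variable": variable, "value": 2.9, "time": start}]

def federated_search(variable, bbox, start, end):
    """Query every registered source adapter and return results in one common schema."""
    results = []
    for adapter in (query_source_a, query_source_b):
        results.extend(adapter(variable, bbox, start, end))
    return results

print(federated_search("streamflow", (-77.5, 40.0, -76.5, 41.0),
                       "2007-01-01", "2007-12-31"))
```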
Surviving the Glut: The Management of Event Streams in Cyberphysical Systems
NASA Astrophysics Data System (ADS)
Buchmann, Alejandro
Alejandro Buchmann is Professor in the Department of Computer Science, Technische Universität Darmstadt, where he heads the Databases and Distributed Systems Group. He received his MS (1977) and PhD (1980) from the University of Texas at Austin. He was an Assistant/Associate Professor at the Institute for Applied Mathematics and Systems IIMAS/UNAM in Mexico, doing research on databases for CAD, geographic information systems, and object-oriented databases. At Computer Corporation of America (later Xerox Advanced Information Systems) in Cambridge, Mass., he worked in the areas of active databases and real-time databases, and at GTE Laboratories, Waltham, in the areas of distributed object systems and the integration of heterogeneous legacy systems. In 1991 he returned to academia and joined T.U. Darmstadt. His current research interests are at the intersection of middleware, databases, event-based distributed systems, ubiquitous computing, and very large distributed systems (P2P, WSN). Much of the current research is concerned with guaranteeing quality of service and reliability properties in these systems, for example, scalability, performance, transactional behaviour, consistency, and end-to-end security. Many research projects imply collaboration with industry and cover a broad spectrum of application domains. Further information can be found at http://www.dvs.tu-darmstadt.de
Ontology based heterogeneous materials database integration and semantic query
NASA Astrophysics Data System (ADS)
Zhao, Shuai; Qian, Quan
2017-10-01
Materials digital data, high-throughput experiments and high-throughput computations are regarded as three key pillars of materials genome initiatives. With the fast growth of materials data, the integration and sharing of data have become very urgent and have gradually become a hot topic of materials informatics. Due to the lack of semantic description, it is difficult to integrate data deeply at the semantic level when adopting conventional heterogeneous database integration approaches such as federated databases or data warehouses. In this paper, a semantic integration method is proposed that creates a semantic ontology by extracting the database schema semi-automatically. Other heterogeneous databases are integrated into the ontology by means of relational algebra and the rooted graph. Based on the integrated ontology, semantic queries can be performed using SPARQL. In the experiments, two well-known first-principles computation databases, OQMD and Materials Project, are used as the integration targets, which shows the availability and effectiveness of our method.
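The sketch below illustrates the general flavor of ontology-backed integration, mapping a few relational facts into RDF triples by hand and querying them with SPARQL via the rdflib library. The library choice, namespace, and property names are assumptions; the paper's semi-automatic schema extraction is not reproduced:

```python
# Sketch of querying integrated materials data as an RDF graph with SPARQL.
# The namespace, properties, and sample values are hypothetical; rdflib is used
# only as a convenient illustration, not as the tooling used in the paper.
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/materials#")
g = Graph()

# Hand-mapped "rows" from two hypothetical source databases.
g.add((EX.sample1, RDF.type, EX.Compound))
g.add((EX.sample1, EX.formula, Literal("Fe2O3")))
g.add((EX.sample1, EX.bandGap, Literal(2.1)))
g.add((EX.sample2, RDF.type, EX.Compound))
g.add((EX.sample2, EX.formula, Literal("TiO2")))
g.add((EX.sample2, EX.bandGap, Literal(3.0)))

# Semantic query over the integrated graph: compounds with a band gap above 2.5 eV.
results = g.query("""
    PREFIX ex: <http://example.org/materials#>
    SELECT ?formula ?gap
    WHERE { ?c a ex:Compound ; ex:formula ?formula ; ex:bandGap ?gap .
            FILTER (?gap > 2.5) }
""")
for row in results:
    print(row.formula, row.gap)
```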
SQL is Dead; Long-live SQL: Relational Database Technology in Science Contexts
NASA Astrophysics Data System (ADS)
Howe, B.; Halperin, D.
2014-12-01
Relational databases are often perceived as a poor fit in science contexts: rigid schemas, poor support for complex analytics, unpredictable performance, and significant maintenance and tuning requirements are idiosyncrasies that often make databases unattractive in science contexts characterized by heterogeneous data sources, complex analysis tasks, rapidly changing requirements, and limited IT budgets. In this talk, I'll argue that although the value proposition of typical relational database systems is weak in science, the core ideas that power relational databases have become incredibly prolific in open source science software, and are emerging as a universal abstraction for both big data and small data. In addition, I'll talk about two open source systems we are building to "jailbreak" the core technology of relational databases and adapt them for use in science. The first is SQLShare, a Database-as-a-Service system supporting collaborative data analysis and exchange by reducing database use to an Upload-Query-Share workflow with no installation, schema design, or configuration required. The second is Myria, a service that supports much larger scale data and complex analytics, and supports multiple back-end systems. Finally, I'll describe some of the ways our collaborators in oceanography, astronomy, biology, fisheries science, and more are using these systems to replace script-based workflows for reasons of performance, flexibility, and convenience.
Clinical results of HIS, RIS, PACS integration using data integration CASE tools
NASA Astrophysics Data System (ADS)
Taira, Ricky K.; Chan, Hing-Ming; Breant, Claudine M.; Huang, Lu J.; Valentino, Daniel J.
1995-05-01
Current infrastructure research in PACS is dominated by the development of communication networks (local area networks, teleradiology, ATM networks, etc.), multimedia display workstations, and hierarchical image storage architectures. However, limited work has been performed on developing flexible, expansible, and intelligent information processing architectures for the vast decentralized image and text data repositories prevalent in healthcare environments. Patient information is often distributed among multiple data management systems. Current large-scale efforts to integrate medical information and knowledge sources have been costly, with limited retrieval functionality. Software integration strategies to unify distributed data and knowledge sources are still lacking commercially. Systems heterogeneity (i.e., differences in hardware platforms, communication protocols, database management software, nomenclature, etc.) is at the heart of the problem and is unlikely to be standardized in the near future. In this paper, we demonstrate the use of newly available CASE (computer-aided software engineering) tools to rapidly integrate HIS, RIS, and PACS information systems. The advantages of these tools include fast development time (low-level code is generated from graphical specifications) and easy system maintenance (excellent documentation, easy to perform changes, and a centralized code repository in an object-oriented database). The CASE tools are used to develop and manage the 'middle-ware' in our client-mediator-server architecture for systems integration. Our architecture is scalable and can accommodate heterogeneous database and communication protocols.
Experiments and Analysis on a Computer Interface to an Information-Retrieval Network.
ERIC Educational Resources Information Center
Marcus, Richard S.; Reintjes, J. Francis
A primary goal of this project was to develop an interface that would provide direct access for inexperienced users to existing online bibliographic information retrieval networks. The experiment tested the concept of a virtual-system mode of access to a network of heterogeneous interactive retrieval systems and databases. An experimental…
BioWarehouse: a bioinformatics database warehouse toolkit
Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D
2006-01-01
Background: This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results: We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus not only enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and Java languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion: BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315
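The sketch below shows the style of cross-source SQL query a warehouse approach enables once several sources share one schema, in the spirit of the "EC numbers without sequences" example above. The table and column names are simplified illustrations, not the actual BioWarehouse schema.

```python
# Sketch of a multi-database query once heterogeneous sources are loaded into one schema.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE enzyme  (ec_number TEXT, activity_name TEXT);            -- e.g. loaded from ENZYME
CREATE TABLE protein (protein_id TEXT, ec_number TEXT, source TEXT);  -- e.g. loaded from UniProt
INSERT INTO enzyme  VALUES ('1.1.1.1', 'alcohol dehydrogenase'), ('9.9.9.9', 'orphan activity');
INSERT INTO protein VALUES ('P00330', '1.1.1.1', 'UniProt');
""")

-- = None  # (placeholder removed)
# Enzyme activities with no sequence in any loaded sequence database.
rows = con.execute("""
    SELECT e.ec_number, e.activity_name
    FROM enzyme e LEFT JOIN protein p ON p.ec_number = e.ec_number
    WHERE p.protein_id IS NULL
""").fetchall()
print(rows)   # [('9.9.9.9', 'orphan activity')]
```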
Integrating Scientific Array Processing into Standard SQL
NASA Astrophysics Data System (ADS)
Misev, Dimitar; Bachhuber, Johannes; Baumann, Peter
2014-05-01
We live in a time that is dominated by data. Data storage is cheap, and more applications than ever accrue vast amounts of data. Storing the emerging multidimensional data sets efficiently, however, and allowing them to be queried by their inherent structure, is a challenge many databases have to face today. Despite the fact that multidimensional array data is almost always linked to additional, non-array information, array databases have mostly developed separately from relational systems, resulting in a disparity between the two database categories. The current SQL standard and SQL DBMSs support arrays - and, in an extension, also multidimensional arrays - but do so in a very rudimentary and inefficient way. This poster demonstrates the practicality of an SQL extension for array processing, implemented in a proof-of-concept multi-faceted system that manages a federation of array and relational database systems, providing transparent, efficient and scalable access to the heterogeneous data in them.
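For intuition, the sketch below expresses the kind of array subsetting and aggregation such an SQL extension targets, here with NumPy; the commented SQL is a stylized example, not the exact syntax of the proposed extension.

```python
# Array operations an SQL array extension aims to express declaratively, shown with NumPy.
import numpy as np

# A 3-D datacube: time x latitude x longitude, e.g. a satellite-derived temperature field.
cube = np.random.rand(365, 180, 360).astype(np.float32)

# Roughly analogous to a stylized query such as:
#   SELECT AVG(temp[t, 60:90, 0:30]) FROM climate_cube GROUP BY t
region = cube[:, 60:90, 0:30]          # spatial subsetting by array index
daily_mean = region.mean(axis=(1, 2))  # aggregation over the sliced dimensions

print(daily_mean.shape)  # (365,)
```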
Database interfaces on NASA's heterogeneous distributed database system
NASA Technical Reports Server (NTRS)
Huang, Shou-Hsuan Stephen
1987-01-01
The purpose of the Distributed Access View Integrated Database (DAVID) interface module (Module 9: Resident Primitive Processing Package) is to provide data transfer between local DAVID systems and resident Data Base Management Systems (DBMSs). The results of the current research are summarized, and a detailed description of the interface module is provided. Several Pascal templates were constructed. The Resident Processor program was also developed. Even though it is designed for the Pascal templates, it can be modified for templates in other languages, such as C, without much difficulty. The Resident Processor itself can be written in any programming language. Since the Module 5 routines are not ready yet, there is no way to test the interface module. However, simulation shows that the database access programs produced by the Resident Processor work according to the specifications.
ARIANE: integration of information databases within a hospital intranet.
Joubert, M; Aymard, S; Fieschi, D; Volot, F; Staccini, P; Robert, J J; Fieschi, M
1998-05-01
Large information systems handle massive volumes of data stored in heterogeneous sources. Each server has its own model for representing concepts, shaped by its aims. One of the main problems end-users encounter when accessing different servers is matching their own viewpoint on biomedical concepts with the various representations used in the database servers. The aim of the ARIANE project is to provide end-users with easy-to-use and natural means to access and query heterogeneous information databases. The objectives of this research work are to build a conceptual interface using Internet technology inside an enterprise Intranet and to propose a method for realizing it. This method is based on the knowledge sources provided by the Unified Medical Language System (UMLS) project of the US National Library of Medicine. Experiments concern queries to three different information servers: PubMed, a Medline server of the NLM; Thériaque, a French database on drugs implemented in the hospital Intranet; and a Web site dedicated to Internet resources in gastroenterology and nutrition, located at the Faculty of Medicine of Nice (France). Access to each of these servers differs according to the kind of information delivered and the technology used to query it. With the healthcare professional workstation in mind, the authors introduced quality criteria into the ARIANE project in order to provide a homogeneous and efficient way to build a query system that can be integrated into existing information systems and that can integrate existing and new information sources.
Generic Entity Resolution in Relational Databases
NASA Astrophysics Data System (ADS)
Sidló, Csaba István
Entity Resolution (ER) covers the problem of identifying distinct representations of real-world entities in heterogeneous databases. We consider the generic formulation of ER problems (GER) with exact outcome. In practice, input data usually resides in relational databases and can grow to huge volumes. Yet, typical solutions described in the literature employ standalone, memory-resident algorithms. In this paper we utilize the facilities of standard, unmodified relational database management systems (RDBMS) to enhance the efficiency of GER algorithms. We study and revise the problem formulation, and propose practical and efficient algorithms optimized for RDBMS external memory processing. We outline a real-world scenario and demonstrate the advantage of the algorithms by performing experiments on insurance customer data.
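A minimal sketch of pushing entity-resolution work into the RDBMS rather than a memory-resident program: candidate pairs are generated with a self-join on a blocking key plus a cheap similarity filter, all in SQL. The schema, data, and similarity test are illustrative, not the algorithms proposed in the paper.

```python
# Candidate-pair generation for entity resolution expressed as SQL inside the database.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (id INTEGER, name TEXT, zip TEXT);
INSERT INTO customer VALUES
  (1, 'John Smith',   '1051'),
  (2, 'John Smyth',   '1051'),
  (3, 'Maria Kovacs', '1051');
""")

# Blocking on ZIP code keeps the self-join small; the shared name prefix stands in
# for a proper string-distance function (e.g. a UDF) used in a real system.
pairs = con.execute("""
    SELECT a.id, b.id
    FROM customer a JOIN customer b
      ON a.zip = b.zip AND a.id < b.id
    WHERE substr(a.name, 1, 3) = substr(b.name, 1, 3)
""").fetchall()
print(pairs)   # [(1, 2)]
```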
Cardiological database management system as a mediator to clinical decision support.
Pappas, C; Mavromatis, A; Maglaveras, N; Tsikotis, A; Pangalos, G; Ambrosiadou, V
1996-03-01
An object-oriented medical database management system is presented for a typical cardiologic center, facilitating epidemiological trials. Object-oriented analysis and design were used for the system design, offering advantages for the integrity and extendibility of medical information systems. The system was developed using object-oriented design and programming methodology, the C++ language, and the Borland Paradox relational database management system in an MS-Windows NT environment. Particular attention was paid to system compatibility, portability, ease of use, and the suitable design of the patient record so as to support the decisions of medical personnel in cardiovascular centers. The system was designed to accept complex, heterogeneous, distributed data in various formats and from different kinds of examinations such as Holter, Doppler and electrocardiography.
NASA Astrophysics Data System (ADS)
WANG, Qingrong; ZHU, Changfeng
2017-06-01
Integration of distributed heterogeneous data sources is a key issue in big data applications. In this paper, the strategy of variable precision is introduced into the concept lattice, and a one-to-one mapping between the variable precision concept lattice and the ontology concept lattice is constructed: a local ontology is produced by building a variable precision concept lattice for each subsystem. Drawing on the close relationship between concept lattices and ontology construction, a distributed generation algorithm for variable precision concept lattices over ontology-based heterogeneous databases is proposed. Finally, taking the main concept lattice generated from the existing heterogeneous databases as the standard, a case study is carried out to verify the feasibility and validity of the algorithm, and the differences between the main concept lattice and the standard concept lattice are compared. The results show that the algorithm can automatically construct distributed concept lattices over heterogeneous data sources.
Tassy, Olivier; Dauga, Delphine; Daian, Fabrice; Sobral, Daniel; Robin, François; Khoueiry, Pierre; Salgado, David; Fox, Vanessa; Caillol, Danièle; Schiappa, Renaud; Laporte, Baptiste; Rios, Anne; Luxardi, Guillaume; Kusakabe, Takehiro; Joly, Jean-Stéphane; Darras, Sébastien; Christiaen, Lionel; Contensin, Magali; Auger, Hélène; Lamy, Clément; Hudson, Clare; Rothbächer, Ute; Gilchrist, Michael J; Makabe, Kazuhiro W; Hotta, Kohji; Fujiwara, Shigeki; Satoh, Nori; Satou, Yutaka; Lemaire, Patrick
2010-10-01
Developmental biology aims to understand how the dynamics of embryonic shapes and organ functions are encoded in linear DNA molecules. Thanks to recent progress in genomics and imaging technologies, systemic approaches are now used in parallel with small-scale studies to establish links between genomic information and phenotypes, often described at the subcellular level. Current model organism databases, however, do not integrate heterogeneous data sets at different scales into a global view of the developmental program. Here, we present a novel, generic digital system, NISEED, and its implementation, ANISEED, to ascidians, which are invertebrate chordates suitable for developmental systems biology approaches. ANISEED hosts an unprecedented combination of anatomical and molecular data on ascidian development. This includes the first detailed anatomical ontologies for these embryos, and quantitative geometrical descriptions of developing cells obtained from reconstructed three-dimensional (3D) embryos up to the gastrula stages. Fully annotated gene model sets are linked to 30,000 high-resolution spatial gene expression patterns in wild-type and experimentally manipulated conditions and to 528 experimentally validated cis-regulatory regions imported from specialized databases or extracted from 160 literature articles. This highly structured data set can be explored via a Developmental Browser, a Genome Browser, and a 3D Virtual Embryo module. We show how integration of heterogeneous data in ANISEED can provide a system-level understanding of the developmental program through the automatic inference of gene regulatory interactions, the identification of inducing signals, and the discovery and explanation of novel asymmetric divisions.
KA-SB: from data integration to large scale reasoning
Roldán-García, María del Mar; Navas-Delgado, Ismael; Kerzazi, Amine; Chniber, Othmane; Molina-Castro, Joaquín; Aldana-Montes, José F
2009-01-01
Background: The analysis of information in the biological domain is usually focused on the analysis of data from single on-line data sources. Unfortunately, studying a biological process requires access to dispersed, heterogeneous, autonomous data sources. In this context, an analysis of the information is not possible without the integration of such data. Methods: KA-SB is a querying and analysis system for final users based on combining a data integration solution with a reasoner. Thus, the tool has been created with a process divided into two steps: 1) KOMF, the Khaos Ontology-based Mediator Framework, is used to retrieve information from heterogeneous and distributed databases; 2) the integrated information is crystallized in a (persistent and high-performance) reasoner (DBOWL). This information can then be further analyzed (by means of querying and reasoning). Results: In this paper we present a novel system that combines the use of a mediation system with the reasoning capabilities of a large-scale reasoner to provide a way of finding new knowledge and of analyzing the integrated information from different databases, which is retrieved as a set of ontology instances. The tool uses a graphical query interface that shows a graphical representation of the ontology and allows users to build queries by clicking on the ontology concepts. Conclusion: These kinds of systems (based on KOMF) will provide users with very large amounts of information (interpreted as ontology instances once retrieved), which cannot be managed using traditional main-memory-based reasoners. We propose a process for creating persistent and scalable knowledge bases from sets of OWL instances obtained by integrating heterogeneous data sources with KOMF. This process has been applied to develop a demo tool, which uses the BioPax Level 3 ontology as the integration schema, and integrates the UNIPROT, KEGG, CHEBI, BRENDA and SABIORK databases. PMID:19796402
Biological data integration: wrapping data and tools.
Lacroix, Zoé
2002-06-01
Nowadays scientific data is inevitably digital and stored in a wide variety of formats in heterogeneous systems. Scientists need to access an integrated view of remote or local heterogeneous data sources with advanced data accessing, analyzing, and visualization tools. Building a digital library for scientific data requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web, as well as data generated by software. We present an approach to wrapping web data sources, databases, flat files, or data generated by tools through a database view mechanism. Generally, a wrapper has two tasks: it first sends a query to the source to retrieve data and, second, builds the expected output with respect to the virtual structure. Our wrappers are composed of a retrieval component based on an intermediate object view mechanism called search views, mapping the source capabilities to attributes, and an eXtensible Markup Language (XML) engine, respectively, to perform these two tasks. The originality of the approach consists of: 1) a generic view mechanism to seamlessly access data sources with limited capabilities and 2) the ability to wrap data sources as well as the useful specific tools they may provide. Our approach has been developed and demonstrated as part of a multidatabase system supporting queries via uniform Object Protocol Model (OPM) interfaces.
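The two wrapper tasks described above can be sketched as follows: a retrieval component standing in for the source's limited query capability, and an XML step that restructures the answer into the exported virtual view. The source, record fields, and XML tags are made up for illustration.

```python
# Minimal wrapper sketch: (1) retrieve from a source, (2) restructure into an XML view.
import xml.etree.ElementTree as ET

def search_view(gene_symbol):
    """Stand-in for the retrieval component: a flat file, web form, or tool call."""
    return [{"symbol": gene_symbol, "organism": "H. sapiens", "length": 1542}]

def wrap(gene_symbol):
    """XML engine: map raw records onto the exported virtual structure."""
    root = ET.Element("genes")
    for rec in search_view(gene_symbol):
        gene = ET.SubElement(root, "gene", symbol=rec["symbol"])
        ET.SubElement(gene, "organism").text = rec["organism"]
        ET.SubElement(gene, "length").text = str(rec["length"])
    return ET.tostring(root, encoding="unicode")

print(wrap("TP53"))
```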
FRED, a Front End for Databases.
ERIC Educational Resources Information Center
Crystal, Maurice I.; Jakobson, Gabriel E.
1982-01-01
FRED (a Front End for Databases) was conceived to alleviate data access difficulties posed by the heterogeneous nature of online databases. A hardware/software layer interposed between users and databases, it consists of three subsystems: user-interface, database-interface, and knowledge base. Architectural alternatives for this database machine…
Ho, Lap; Cheng, Haoxiang; Wang, Jun; Simon, James E; Wu, Qingli; Zhao, Danyue; Carry, Eileen; Ferruzzi, Mario G; Faith, Jeremiah; Valcarcel, Breanna; Hao, Ke; Pasinetti, Giulio M
2018-03-05
The development of a given botanical preparation for eventual clinical application requires extensive, detailed characterizations of the chemical composition, as well as the biological availability, biological activity, and safety profiles of the botanical. These issues are typically addressed using diverse experimental protocols and model systems. Based on this consideration, in this study we established a comprehensive database and analysis framework for the collection, collation, and integrative analysis of diverse, multiscale data sets. Using this framework, we conducted an integrative analysis of heterogeneous data from in vivo and in vitro investigation of a complex bioactive dietary polyphenol-rich preparation (BDPP) and built an integrated network linking data sets generated from this multitude of diverse experimental paradigms. We established a comprehensive database and analysis framework as well as a systematic and logical means to catalogue and collate the diverse array of information gathered, which is securely stored and added to in a standardized manner to enable fast query. We demonstrated the utility of the database in (1) a statistical ranking scheme to prioritize responses to treatments and (2) in-depth reconstruction of functionality studies. By examination of these data sets, the system allows analytical querying of heterogeneous data and access to information related to interactions, mechanisms of action, functions, etc., which ultimately provides a global overview of complex biological responses. Collectively, we present an integrative analysis framework that leads to novel insights on the biological activities of a complex botanical such as BDPP, based on data-driven characterizations of interactions between BDPP-derived phenolic metabolites and their mechanisms of action, as well as synergism and/or potential cancellation of biological functions. Our integrative analytical approach provides novel means for a systematic integrative analysis of heterogeneous data types in the development of complex botanicals such as polyphenols for eventual clinical and translational applications.
Design and implementation of a CORBA-based genome mapping system prototype.
Hu, J; Mungall, C; Nicholson, D; Archibald, A L
1998-01-01
CORBA (Common Object Request Broker Architecture), as an open standard, is considered to be a good solution for the development and deployment of applications in distributed heterogeneous environments. This technology can be applied in the bioinformatics area to enhance utilization, management and interoperation between biological resources. This paper investigates issues in developing CORBA applications for genome mapping information systems in the Internet environment with emphasis on database connectivity and graphical user interfaces. The design and implementation of a CORBA prototype for an animal genome mapping database are described. The prototype demonstration is available via: http://www.ri.bbsrc.ac.uk/ark_corba/. jian.hu@bbsrc.ac.uk
Monitoring tools of COMPASS experiment at CERN
NASA Astrophysics Data System (ADS)
Bodlak, M.; Frolov, V.; Huber, S.; Jary, V.; Konorov, I.; Levit, D.; Novy, J.; Salac, R.; Tomsa, J.; Virius, M.
2015-12-01
This paper briefly introduces the data acquisition system of the COMPASS experiment and focuses mainly on the part responsible for monitoring the nodes in the whole newly developed data acquisition system of this experiment. COMPASS is a high-energy fixed-target particle physics experiment located at the SPS of the CERN laboratory in Geneva, Switzerland. The hardware of the data acquisition system has been upgraded to use FPGA cards that are responsible for data multiplexing and event building. The software counterpart of the system includes several processes deployed in a heterogeneous network environment. Two processes, namely the Message Logger and the Message Browser, take care of monitoring. These tools handle messages generated by nodes in the system. While the Message Logger collects and saves messages to the database, the Message Browser serves as a graphical interface over the database containing these messages. For better performance, certain database optimizations have been used. Lastly, results of performance tests are presented.
Integrating Distributed Homogeneous and Heterogeneous Databases: Prototypes. Volume 3.
1987-12-01
Knowledge-Based Integrated Information Systems Engineering (KBIISE) prototypes. Transportation Systems Center, Broadway, MA 02142, December 1987. 218 pages. Unclassified.
Spatial cyberinfrastructures, ontologies, and the humanities.
Sieber, Renee E; Wellen, Christopher C; Jin, Yuan
2011-04-05
We report on research into building a cyberinfrastructure for Chinese biographical and geographic data. Our cyberinfrastructure contains (i) the McGill-Harvard-Yenching Library Ming Qing Women's Writings database (MQWW), the only online database on historical Chinese women's writings, (ii) the China Biographical Database, the authority for Chinese historical people, and (iii) the China Historical Geographical Information System, one of the first historical geographic information systems. Key to this integration is that linked databases retain separate identities as bases of knowledge, while they possess sufficient semantic interoperability to allow for multidatabase concepts and to support cross-database queries on an ad hoc basis. Computational ontologies create underlying semantics for database access. This paper focuses on the spatial component in a humanities cyberinfrastructure, which includes issues of conflicting data, heterogeneous data models, disambiguation, and geographic scale. First, we describe the methodology for integrating the databases. Then we detail the system architecture, which includes a tier of ontologies and schema. We describe the user interface and applications that allow for cross-database queries. For instance, users should be able to analyze the data, examine hypotheses on spatial and temporal relationships, and generate historical maps with datasets from MQWW for research, teaching, and publication on Chinese women writers, their familial relations, publishing venues, and the literary and social communities. Last, we discuss the social side of cyberinfrastructure development, as people are considered to be as critical as the technical components for its success.
An integrated approach to reservoir modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Donaldson, K.
1993-08-01
The purpose of this research is to evaluate the usefulness of the following procedural and analytical methods in investigating the heterogeneity of the oil reserve for the Mississippian Big Injun Sandstone of the Granny Creek field, Clay and Roane counties, West Virginia: (1) relational database, (2) two-dimensional cross sections, (3) true three-dimensional modeling, (4) geohistory analysis, (5) a rule-based expert system, and (6) geographical information systems. The large data set could not be effectively integrated and interpreted without this approach. A relational database was designed to fully integrate three- and four-dimensional data. The database provides an effective means for maintaining and manipulating the data. A two-dimensional cross section program was designed to correlate stratigraphy, depositional environments, porosity, permeability, and petrographic data. This flexible design allows for additional four-dimensional data. Dynamic Graphics…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ball, G.; Kuznetsov, V.; Evans, D.
We present the Data Aggregation System, a system for information retrieval and aggregation from heterogeneous sources of relational and non-relational data for the Compact Muon Solenoid experiment on the CERN Large Hadron Collider. The experiment currently has a number of organically-developed data sources, including front-ends to a number of different relational databases and non-database data services which do not share common data structures or APIs (Application Programming Interfaces), and cannot at this stage be readily converged. DAS provides a single interface for querying all these services, a caching layer to speed up access to expensive underlying calls, and the ability to merge records from different data services pertaining to a single primary key.
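The two ideas highlighted above, caching expensive service calls and merging per-service records on a shared primary key, can be sketched as follows. The service functions and record fields are illustrative, not the actual CMS data-service APIs.

```python
# Sketch of aggregation with a cache in front of expensive underlying calls.
from functools import lru_cache

@lru_cache(maxsize=1024)                 # caching layer for expensive underlying calls
def location_service(dataset):           # hypothetical data-location service
    return {"dataset": dataset, "sites": ["T2_CH_CERN", "T1_US_FNAL"]}

@lru_cache(maxsize=1024)
def bookkeeping_service(dataset):        # hypothetical bookkeeping service
    return {"dataset": dataset, "nevents": 1_200_000}

def aggregate(dataset):
    """Merge records from all services keyed on the same primary key."""
    merged = {"dataset": dataset}
    for service in (location_service, bookkeeping_service):
        rec = service(dataset)
        merged.update({k: v for k, v in rec.items() if k != "dataset"})
    return merged

print(aggregate("/Mu/Run2010A-v1/RAW"))
```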
Use of Graph Database for the Integration of Heterogeneous Biological Data.
Yoon, Byoung-Ha; Kim, Seon-Kyu; Kim, Seon-Young
2017-03-01
Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
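As a rough sketch of the comparison above, the same multi-hop question can be asked as a single Cypher pattern in Neo4j, whereas the relational phrasing needs one join per hop. The node labels, relationship types, and table names are illustrative assumptions, not the schema used in the study.

```python
# Multi-hop biological query in Cypher (Neo4j) versus its multi-join SQL counterpart.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

cypher = """
MATCH (d:Drug)-[:TARGETS]->(g:Gene)-[:ASSOCIATED_WITH]->(dis:Disease {name: $disease})
RETURN d.name, g.symbol
"""
with driver.session() as session:
    for record in session.run(cypher, disease="glioblastoma"):
        print(record["d.name"], record["g.symbol"])

# The equivalent relational phrasing needs one join per hop, which is what degrades
# response time as queries grow:
#   SELECT d.name, g.symbol
#   FROM drug d
#   JOIN drug_target  dt  ON dt.drug_id   = d.id
#   JOIN gene         g   ON g.id         = dt.gene_id
#   JOIN gene_disease gd  ON gd.gene_id   = g.id
#   JOIN disease      dis ON dis.id       = gd.disease_id
#   WHERE dis.name = 'glioblastoma';
```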
A Machine Reading System for Assembling Synthetic Paleontological Databases
Peters, Shanan E.; Zhang, Ce; Livny, Miron; Ré, Christopher
2014-01-01
Many aspects of macroevolutionary theory and our understanding of biotic responses to global environmental change derive from literature-based compilations of paleontological data. Existing manually assembled databases are, however, incomplete and difficult to assess and enhance with new data types. Here, we develop and validate the quality of a machine reading system, PaleoDeepDive, that automatically locates and extracts data from heterogeneous text, tables, and figures in publications. PaleoDeepDive performs comparably to humans in several complex data extraction and inference tasks and generates congruent synthetic results that describe the geological history of taxonomic diversity and genus-level rates of origination and extinction. Unlike traditional databases, PaleoDeepDive produces a probabilistic database that systematically improves as information is added. We show that the system can readily accommodate sophisticated data types, such as morphological data in biological illustrations and associated textual descriptions. Our machine reading approach to scientific data integration and synthesis brings within reach many questions that are currently underdetermined and does so in ways that may stimulate entirely new modes of inquiry. PMID:25436610
An XML-based Generic Tool for Information Retrieval in Solar Databases
NASA Astrophysics Data System (ADS)
Scholl, Isabelle F.; Legay, Eric; Linsolas, Romain
This paper presents the current architecture of the 'Solar Web Project', now in its development phase. This tool will provide scientists interested in solar data with a single web-based interface for browsing distributed and heterogeneous catalogs of solar observations. The main goal is to have a generic application that can be easily extended to new sets of data or to new missions with a low level of maintenance. It is developed in Java, and XML is used as a powerful configuration language. The server, independent of any database schema, can communicate with a client (the user interface) and several local or remote archive access systems (such as existing web pages, ftp sites or SQL databases). Archive access systems are externally described in XML files. The user interface is also dynamically generated from an XML file containing the window-building rules and a simplified database description. This project is developed at MEDOC (Multi-Experiment Data and Operations Centre), located at the Institut d'Astrophysique Spatiale (Orsay, France). Successful tests have been conducted with other solar archive access systems.
Adding Hierarchical Objects to Relational Database General-Purpose XML-Based Information Managements
NASA Technical Reports Server (NTRS)
Lin, Shu-Chun; Knight, Chris; La, Tracy; Maluf, David; Bell, David; Tran, Khai Peter; Gawdiak, Yuri
2006-01-01
NETMARK is a flexible, high-throughput software system for managing, storing, and rapidly searching unstructured and semi-structured documents. NETMARK transforms such documents from their original highly complex, constantly changing, heterogeneous data formats into well-structured, common data formats using Hypertext Markup Language (HTML) and/or Extensible Markup Language (XML). The software implements an object-relational database system that combines the best practices of the relational model utilizing Structured Query Language (SQL) with those of the object-oriented, semantic database model for creating complex data. In particular, NETMARK takes advantage of the Oracle 8i object-relational database model, using physical-address data types for very efficient keyword searches of records across both context and content. NETMARK also supports multiple international standards such as WebDAV for drag-and-drop file management and SOAP for integrated information management using Web services. The document-organization and -searching capabilities afforded by NETMARK are likely to make this software attractive for use in disciplines as diverse as science, auditing, and law enforcement.
Combining computational models, semantic annotations and simulation experiments in a graph database
Henkel, Ron; Wolkenhauer, Olaf; Waltemath, Dagmar
2015-01-01
Model repositories such as the BioModels Database, the CellML Model Repository or JWS Online are frequently accessed to retrieve computational models of biological systems. However, their storage concepts support only restricted types of queries, and not all data inside the repositories can be retrieved. In this article we present a storage concept that meets this challenge. It is grounded in a graph database, reflects the models' structure, incorporates semantic annotations and simulation descriptions, and ultimately connects different types of model-related data. The connections between heterogeneous model-related data and bio-ontologies enable efficient search via biological facts and grant access to new model features. The introduced concept notably improves access to computational models and associated simulations in a model repository. This has positive effects on tasks such as model search, retrieval, ranking, matching and filtering. Furthermore, our work for the first time enables CellML- and Systems Biology Markup Language-encoded models to be effectively maintained in one database. We show how these models can be linked via annotations and queried. Database URL: https://sems.uni-rostock.de/projects/masymos/ PMID:25754863
Numerical Model Sensitivity to Heterogeneous Satellite Derived Vegetation Roughness
NASA Technical Reports Server (NTRS)
Jasinski, Michael; Eastman, Joseph; Borak, Jordan
2011-01-01
The sensitivity of a mesoscale weather prediction model to a 1 km satellite-based vegetation roughness initialization is investigated for a domain within the south central United States. Three different roughness databases are employed: i) a control or standard lookup-table roughness that is a function only of land cover type, ii) a spatially heterogeneous roughness database, specific to the domain, that was previously derived using a physically based procedure and Moderate Resolution Imaging Spectroradiometer (MODIS) imagery, and iii) a MODIS climatologic roughness database that, like (i), is a function only of land cover type but possesses domain-specific mean values from (ii). The model used is the Weather Research and Forecast Model (WRF) coupled to the Community Land Model within the Land Information System (LIS). For each simulation, a statistical comparison is made between modeled results and ground observations within a domain including Oklahoma, eastern Arkansas, and northwest Louisiana during a 4-day period within IHOP 2002. The sensitivity analysis compares the impact of the three roughness initializations on time-series temperature, precipitation probability of detection (POD), average wind speed, boundary layer height, and turbulent kinetic energy (TKE). Overall, the results indicate that, for the current investigation, replacement of the standard lookup-table values with the satellite-derived values statistically improves model performance for most observed variables. Such natural roughness heterogeneity enhances the surface wind speed, PBL height, and TKE production by up to 10 percent, with a lesser effect over grassland and a greater effect over mixed land cover domains.
NCBI2RDF: enabling full RDF-based access to NCBI databases.
Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor
2013-01-01
RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting the query results to users in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort for biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments.
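For orientation, the sketch below shows the style of Entrez E-utilities call that a resolved SPARQL sub-query is ultimately forwarded to; the SPARQL decomposition itself is not reproduced here, and the search term is only an example.

```python
# A plain Entrez esearch call of the kind the mediated SPARQL queries resolve into.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {
    "db": "pubmed",
    "term": "heterogeneous databases AND integration",
    "retmode": "json",
    "retmax": 5,
}
ids = requests.get(EUTILS, params=params, timeout=30).json()["esearchresult"]["idlist"]
print(ids)   # PubMed identifiers that a mediator could join with other repositories
```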
NASA Astrophysics Data System (ADS)
Shahini Shamsabadi, Salar
A web-based PAVEment MONitoring system, PAVEMON, is a GIS-oriented platform for accommodating, representing, and leveraging data from a multi-modal mobile sensor system. The stated sensor system consists of acoustic, optical, electromagnetic, and GPS sensors and is capable of producing as much as 1 terabyte of data per day. Multi-channel raw sensor data (microphone, accelerometer, tire pressure sensor, video) and processed results (road profile, crack density, international roughness index, micro texture depth, etc.) are outputs of this sensor system. By correlating the sensor measurements and positioning data collected in tight time synchronization, PAVEMON attaches a spatial component to all the datasets. These spatially indexed outputs are placed into an Oracle database which integrates seamlessly with PAVEMON's web-based system. The web-based system of PAVEMON consists of two major modules: 1) a GIS module for visualizing and spatially analyzing pavement condition information layers, and 2) a decision-support module for managing maintenance and repair (M&R) activities and predicting future budget needs. PAVEMON weaves together sensor data with third-party climate and traffic information from the National Oceanic and Atmospheric Administration (NOAA) and Long Term Pavement Performance (LTPP) databases for an organized, data-driven approach to pavement management activities. PAVEMON deals with heterogeneous and redundant observations by fusing them into jointly derived, higher-confidence results. A prominent example of the fusion algorithms developed within PAVEMON is a data fusion algorithm used for estimating overall pavement conditions in terms of ASTM's Pavement Condition Index (PCI). PAVEMON predicts PCI by undertaking a statistical fusion approach and selecting a subset of all the sensor measurements. Other fusion algorithms include noise-removal algorithms that remove false negatives in the sensor data, in addition to fusion algorithms developed for identifying features on the road. PAVEMON offers an ideal research and monitoring platform for rapid, intelligent and comprehensive evaluation of tomorrow's transportation infrastructure based on up-to-date data from heterogeneous sensor systems.
Competitive-Cooperative Automated Reasoning from Distributed and Multiple Source of Data
NASA Astrophysics Data System (ADS)
Fard, Amin Milani
Knowledge extraction from distributed database systems has been investigated during the past decade in order to analyze billions of information records. In this work, a competitive deduction approach for a heterogeneous data grid environment is proposed using classic data mining and statistical methods. By applying a game-theoretic concept in a multi-agent model, we design a policy for hierarchical knowledge discovery and inference fusion. To demonstrate the system, a sample multi-expert system has also been developed.
Astaras, Alexander; Arvanitidou, Marina; Chouvarda, Ioanna; Kilintzis, Vassilis; Koutkias, Vassilis; Sanchez, Eduardo Monton; Stalidis, George; Triantafyllidis, Andreas; Maglaveras, Nicos
2008-01-01
A flexible, scaleable and cost-effective medical telemetry system is described for monitoring sleep-related disorders in the home environment. The system was designed and built for real-time data acquisition and processing, allowing for additional use in intensive care unit scenarios where rapid medical response is required in case of emergency. It comprises a wearable body area network of Zigbee-compatible wireless sensors worn by the subject, a central database repository residing in the medical centre and thin client workstations located at the subject's home and in the clinician's office. The system supports heterogeneous setup configurations, involving a variety of data acquisition sensors to suit several medical applications. All telemetry data is securely transferred and stored in the central database under the clinicians' ownership and control.
A Support Database System for Integrated System Health Management (ISHM)
NASA Technical Reports Server (NTRS)
Schmalzel, John; Figueroa, Jorge F.; Turowski, Mark; Morris, John
2007-01-01
The development, deployment, operation and maintenance of Integrated Systems Health Management (ISHM) applications require the storage and processing of tremendous amounts of low-level data. This data must be shared in a secure and cost-effective manner between developers, and processed within several heterogeneous architectures. Modern database technology allows this data to be organized efficiently, while ensuring the integrity and security of the data. The extensibility and interoperability of the current database technologies also allows for the creation of an associated support database system. A support database system provides additional capabilities by building applications on top of the database structure. These applications can then be used to support the various technologies in an ISHM architecture. This presentation and paper propose a detailed structure and application description for a support database system, called the Health Assessment Database System (HADS). The HADS provides a shared context for organizing and distributing data as well as a definition of the applications that provide the required data-driven support to ISHM. This approach provides another powerful tool for ISHM developers, while also enabling novel functionality. This functionality includes: automated firmware updating and deployment, algorithm development assistance and electronic datasheet generation. The architecture for the HADS has been developed as part of the ISHM toolset at Stennis Space Center for rocket engine testing. A detailed implementation has begun for the Methane Thruster Testbed Project (MTTP) in order to assist in developing health assessment and anomaly detection algorithms for ISHM. The structure of this implementation is shown in Figure 1. The database structure consists of three primary components: the system hierarchy model, the historical data archive and the firmware codebase. The system hierarchy model replicates the physical relationships between system elements to provide the logical context for the database. The historical data archive provides a common repository for sensor data that can be shared between developers and applications. The firmware codebase is used by the developer to organize the intelligent element firmware into atomic units which can be assembled into complete firmware for specific elements.
Using Web Ontology Language to Integrate Heterogeneous Databases in the Neurosciences
Lam, Hugo Y.K.; Marenco, Luis; Shepherd, Gordon M.; Miller, Perry L.; Cheung, Kei-Hoi
2006-01-01
Integrative neuroscience involves the integration and analysis of diverse types of neuroscience data involving many different experimental techniques. This data will increasingly be distributed across many heterogeneous databases that are web-accessible. Currently, these databases do not expose their schemas (database structures) and their contents to web applications/agents in a standardized, machine-friendly way. This limits database interoperation. To address this problem, we describe a pilot project that illustrates how neuroscience databases can be expressed using the Web Ontology Language, which is a semantically-rich ontological language, as a common data representation language to facilitate complex cross-database queries. In this pilot project, an existing tool called “D2RQ” was used to translate two neuroscience databases (NeuronDB and CoCoDat) into OWL, and the resulting OWL ontologies were then merged. An OWL-based reasoner (Racer) was then used to provide a sophisticated query language (nRQL) to perform integrated queries across the two databases based on the merged ontology. This pilot project is one step toward exploring the use of semantic web technologies in the neurosciences. PMID:17238384
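As a loose analogue of the merge-then-query step in the pilot above, the sketch below combines two OWL exports in rdflib and runs a cross-source query. It does not reproduce the D2RQ/Racer/nRQL tool chain of the paper, and the file names, namespace, and properties are hypothetical stand-ins for the NeuronDB and CoCoDat content.

```python
# Merging two hypothetical OWL exports into one graph and querying across them.
from rdflib import Graph

merged = Graph()
merged.parse("neurondb.owl")   # hypothetical OWL export of database 1
merged.parse("cocodat.owl")    # hypothetical OWL export of database 2

q = """
PREFIX ns: <http://example.org/neuro#>
SELECT ?neuron ?current ?property
WHERE {
    ?neuron a ns:NeuronType ;
            ns:hasMembraneCurrent ?current .                 # contributed by one source
    OPTIONAL { ?neuron ns:hasSynapticProperty ?property . }  # contributed by the other
}
"""
for row in merged.query(q):
    print(row)
```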
Bravo, Carlos; Suarez, Carlos; González, Carolina; López, Diego; Blobel, Bernd
2014-01-01
Healthcare information is distributed across multiple heterogeneous and autonomous systems. Access to, and sharing of, distributed information sources is a challenging task. To contribute to meeting this challenge, this paper presents a formal, complete and semi-automatic transformation service from relational databases to the Web Ontology Language. The proposed service makes use of an algorithm that can transform several data models from different domains by deploying mainly inheritance rules. The paper emphasizes the relevance of integrating the proposed approach into an ontology-based interoperability service to achieve semantic interoperability.
Glance Information System for ATLAS Management
NASA Astrophysics Data System (ADS)
Grael, F. F.; Maidantchik, C.; Évora, L. H. R. A.; Karam, K.; Moraes, L. O. F.; Cirilli, M.; Nessi, M.; Pommès, K.; ATLAS Collaboration
2011-12-01
The ATLAS Experiment is an international collaboration in which more than 37 countries, 172 institutes and laboratories, 2900 physicists, engineers, and computer scientists, plus 700 students participate. The management of this teamwork involves several aspects, such as institute contributions, employment records, members' appointments, authors' lists, preparation and publication of papers, and speaker nominations. Previously, most of the information was accessible only to a limited group, and developers had to face problems such as different terminology, diverse data modeling, heterogeneous databases, and differing user needs. Moreover, the systems were not designed to handle new requirements. Maintenance has to be an easy task because of the experiment's long lifetime and the turnover of professionals. The Glance system, a generic mechanism for accessing any database, acts as an intermediate layer isolating the user from the particularities of each database. It retrieves, inserts and updates the database independently of its technology and modeling. Relying on Glance, a group of systems was built to support the ATLAS management and operation aspects: ATLAS Membership, ATLAS Appointments, ATLAS Speakers, ATLAS Analysis Follow-Up, ATLAS Conference Notes, ATLAS Thesis, ATLAS Traceability and DSS Alarms Viewer. This paper presents an overview of the Glance information framework and describes the privilege mechanism developed to grant different levels of access for each member and system.
Testing in Service-Oriented Environments
2010-03-01
software releases (versions, service packs, vulnerability patches) for one common ESB during the 13-month period from January 1, 2008 through ... impact on quality of service: unlike traditional software components, a single instance of a web service can be used by multiple consumers. Since the ... distributed, with heterogeneous hardware and software (SOA infrastructure, services, operating systems, and databases). Because of cost and security, it ...
Bernstein, Inge T; Lindorff-Larsen, Karen; Timshel, Susanne; Brandt, Carsten A; Dinesen, Birger; Fenger, Mogens; Gerdes, Anne-Marie; Iversen, Lene H; Madsen, Mogens R; Okkels, Henrik; Sunde, Lone; Rahr, Hans B; Wikman, Friedrick P; Rossing, Niels
2011-05-01
The Danish HNPCC register is a publicly financed national database. The register gathers epidemiological and genomic data on HNPCC families to improve prognosis by screening and identifying family members at risk. Diagnostic data are generated throughout the country and have been collected over several decades. Until recently, paper-based reports were sent to the register and typed into the database. In the EC co-funded INFOBIOMED network of excellence, the register was a model for electronic exchange of epidemiological and genomic data between diagnosing/treating departments and the central database. The aim of digitization was to optimize the organization of screening by facilitating the combination of genotype-phenotype information, and to generate IT tools sufficiently usable and generic to be implemented in other countries and for other oncogenetic diseases. The focus was on the integration of heterogeneous data, the elaboration and dissemination of classification systems, and the development of communication standards. At the conclusion of the EU project in 2007, the system was implemented in 12 pilot departments. In the surgical departments this resulted in a 192% increase in reports to the database. Several gaps were identified: lack of standards for the data to be exchanged, lack of local databases suitable for direct communication, and reporting that was time-consuming and dependent on interest and feedback.
A Scalable Data Access Layer to Manage Structured Heterogeneous Biomedical Data.
Delussu, Giovanni; Lianas, Luca; Frexia, Francesca; Zanetti, Gianluigi
2016-01-01
This work presents a scalable data access layer, called PyEHR, designed to support the implementation of data management systems for secondary use of structured heterogeneous biomedical and clinical data. PyEHR adopts the openEHR's formalisms to guarantee the decoupling of data descriptions from implementation details and exploits structure indexing to accelerate searches. Data persistence is guaranteed by a driver layer with a common driver interface. Interfaces for two NoSQL Database Management Systems are already implemented: MongoDB and Elasticsearch. We evaluated the scalability of PyEHR experimentally through two types of tests, called "Constant Load" and "Constant Number of Records", with queries of increasing complexity on synthetic datasets of ten million records each, containing very complex openEHR archetype structures, distributed on up to ten computing nodes.
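The "common driver interface" idea described above can be sketched as an abstract interface with interchangeable NoSQL back ends. The method names and record layout below are illustrative assumptions, not the actual PyEHR driver API.

```python
# Sketch of a persistence layer with a common driver interface and a MongoDB back end.
from abc import ABC, abstractmethod
from pymongo import MongoClient

class DriverInterface(ABC):
    @abstractmethod
    def add_record(self, record: dict) -> str: ...
    @abstractmethod
    def get_record(self, record_id: str) -> dict: ...

class MongoDriver(DriverInterface):
    def __init__(self, uri="mongodb://localhost:27017", db="ehr", collection="records"):
        self.coll = MongoClient(uri)[db][collection]

    def add_record(self, record):
        return str(self.coll.insert_one(record).inserted_id)

    def get_record(self, record_id):
        from bson import ObjectId
        return self.coll.find_one({"_id": ObjectId(record_id)})

# An Elasticsearch driver would implement the same two methods, so queries and
# structure indexing above the driver layer stay unchanged when the back end is swapped.
```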
Advanced techniques for the storage and use of very large, heterogeneous spatial databases
NASA Technical Reports Server (NTRS)
Peuquet, Donna J.
1987-01-01
Progress is reported in the development of a prototype knowledge-based geographic information system. The overall purpose of this project is to investigate and demonstrate the use of advanced methods in order to greatly improve the capabilities of geographic information system technology in the handling of large, multi-source collections of spatial data in an efficient manner, and to make these collections of data more accessible and usable for the Earth scientist.
Interoperability of GADU in using heterogeneous Grid resources for bioinformatics applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sulakhe, D.; Rodriguez, A.; Wilde, M.
2008-03-01
Bioinformatics tools used for efficient and computationally intensive analysis of genetic sequences require large-scale computational resources to accommodate the growing data. Grid computational resources such as the Open Science Grid and TeraGrid have proved useful for scientific discovery. The genome analysis and database update system (GADU) is a high-throughput computational system developed to automate the steps involved in accessing the Grid resources for running bioinformatics applications. This paper describes the requirements for building an automated scalable system such as GADU that can run jobs on different Grids. The paper describes the resource-independent configuration of GADU using the Pegasus-based virtual datamore » system that makes high-throughput computational tools interoperable on heterogeneous Grid resources. The paper also highlights the features implemented to make GADU a gateway to computationally intensive bioinformatics applications on the Grid. The paper will not go into the details of problems involved or the lessons learned in using individual Grid resources as it has already been published in our paper on genome analysis research environment (GNARE) and will focus primarily on the architecture that makes GADU resource independent and interoperable across heterogeneous Grid resources.« less
Informatics in radiology: use of CouchDB for document-based storage of DICOM objects.
Rascovsky, Simón J; Delgado, Jorge A; Sanz, Alexander; Calvo, Víctor D; Castrillón, Gabriel
2012-01-01
Picture archiving and communication systems traditionally have depended on schema-based Structured Query Language (SQL) databases for imaging data management. To optimize database size and performance, many such systems store a reduced set of Digital Imaging and Communications in Medicine (DICOM) metadata, discarding informational content that might be needed in the future. As an alternative to traditional database systems, document-based key-value stores recently have gained popularity. These systems store documents containing key-value pairs that facilitate data searches without predefined schemas. Document-based key-value stores are especially suited to archive DICOM objects because DICOM metadata are highly heterogeneous collections of tag-value pairs conveying specific information about imaging modalities, acquisition protocols, and vendor-supported postprocessing options. The authors used an open-source document-based database management system (Apache CouchDB) to create and test two such databases; CouchDB was selected for its overall ease of use, capability for managing attachments, and reliance on HTTP and Representational State Transfer standards for accessing and retrieving data. A large database was created first in which the DICOM metadata from 5880 anonymized magnetic resonance imaging studies (1,949,753 images) were loaded by using a Ruby script. To provide the usual DICOM query functionality, several predefined "views" (standard queries) were created by using JavaScript. For performance comparison, the same queries were executed in both the CouchDB database and a SQL-based DICOM archive. The capabilities of CouchDB for attachment management and database replication were separately assessed in tests of a similar, smaller database. Results showed that CouchDB allowed efficient storage and interrogation of all DICOM objects; with the use of information retrieval algorithms such as map-reduce, all the DICOM metadata stored in the large database were searchable with only a minimal increase in retrieval time over that with the traditional database management system. Results also indicated possible uses for document-based databases in data mining applications such as dose monitoring, quality assurance, and protocol optimization. RSNA, 2012
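To make the document-store approach concrete, the sketch below uses CouchDB's standard HTTP API from Python to store one DICOM metadata document and query it through a map/reduce view. The database name, credentials, document fields, and view definition are invented for illustration and are not those used by the authors; a local CouchDB server and the requests package are assumed.

# Illustrative use of CouchDB's HTTP API for schema-free DICOM metadata.
import requests

COUCH = "http://localhost:5984"          # assumes a local CouchDB instance
DB = f"{COUCH}/dicom_metadata"
AUTH = ("admin", "password")             # placeholder credentials

requests.put(DB, auth=AUTH)              # create the database (error handling omitted)

# Store one DICOM object's metadata as a schema-free JSON document.
doc = {"SOPInstanceUID": "1.2.840.113619.2.1.1",
       "Modality": "MR", "StudyDate": "20120101", "SeriesDescription": "T1 axial"}
requests.put(f"{DB}/{doc['SOPInstanceUID']}", json=doc, auth=AUTH)

# A design document with a map/reduce view that counts documents per modality.
design = {"views": {"by_modality": {
    "map": "function (doc) { if (doc.Modality) { emit(doc.Modality, 1); } }",
    "reduce": "_count"}}}
requests.put(f"{DB}/_design/queries", json=design, auth=AUTH)

# Query the view, grouping results by modality.
resp = requests.get(f"{DB}/_design/queries/_view/by_modality",
                    params={"group": "true"}, auth=AUTH)
print(resp.json())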
BioCarian: search engine for exploratory searches in heterogeneous biological databases.
Zaki, Nazar; Tennakoon, Chandana
2017-10-02
There are a large number of biological databases publicly available to scientists on the web. There are also many private databases generated in the course of research projects. These databases exist in a wide variety of formats. Web standards have evolved in recent times, and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Therefore, integration and querying of biological databases can be facilitated by techniques used in the semantic web. Heterogeneous databases can be converted into Resource Description Framework (RDF) form and queried using the SPARQL language. Running exact queries against these databases is trivial. However, exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form. We first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets: it allows complex queries to be constructed and has additional features, such as ranking facet values based on several criteria, visually indicating the relevance of a facet value, and presenting the most important facet values when a large number of choices are available. For advanced users, SPARQL queries can be run directly on the databases; using this feature, users can incorporate federated searches of SPARQL endpoints. We used the search engine to perform an exploratory search on previously published viral integration data and were able to deduce the main conclusions of the original publication. BioCarian is accessible via http://www.biocarian.com . We have developed a search engine to explore RDF databases that can be used by both novice and advanced users.
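For readers unfamiliar with SPARQL over RDF stores, the following Python sketch shows the kind of aggregate query a facet interface might generate and run against an endpoint. The endpoint URL, the example.org vocabulary, and the property names are hypothetical; the SPARQLWrapper package is assumed to be installed, and this is not BioCarian's own code.

# Sketch of a SPARQL query over an RDF endpoint, using SPARQLWrapper.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://localhost:3030/biodb/sparql")  # placeholder endpoint
endpoint.setQuery("""
PREFIX ex: <http://example.org/bio#>
SELECT ?gene ?site (COUNT(?sample) AS ?n)
WHERE {
  ?integration ex:gene ?gene ;
               ex:site ?site ;
               ex:sample ?sample .
}
GROUP BY ?gene ?site
ORDER BY DESC(?n)
LIMIT 10
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["gene"]["value"], row["site"]["value"], row["n"]["value"])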
Corwin, John; Silberschatz, Avi; Miller, Perry L; Marenco, Luis
2007-01-01
Data sparsity and schema evolution issues affecting clinical informatics and bioinformatics communities have led to the adoption of vertical or object-attribute-value-based database schemas to overcome limitations posed when using conventional relational database technology. This paper explores these issues and discusses why biomedical data are difficult to model using conventional relational techniques. The authors propose a solution to these obstacles based on a relational database engine using a sparse, column-store architecture. The authors provide benchmarks comparing the performance of queries and schema-modification operations using three different strategies: (1) the standard conventional relational design; (2) past approaches used by biomedical informatics researchers; and (3) their sparse, column-store architecture. The performance results show that their architecture is a promising technique for storing and processing many types of data that are not handled well by the other two semantic data models.
Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach.
Han, Hu; K Jain, Anil; Shan, Shiguang; Chen, Xilin
2017-08-10
Face attribute estimation has many potential applications in video surveillance, face retrieval, and social media. While a number of methods have been proposed for face attribute estimation, most of them did not explicitly consider attribute correlation and heterogeneity (e.g., ordinal vs. nominal and holistic vs. local) during feature representation learning. In this paper, we present a Deep Multi-Task Learning (DMTL) approach to jointly estimate multiple heterogeneous attributes from a single face image. In DMTL, we tackle attribute correlation and heterogeneity with convolutional neural networks (CNNs) consisting of shared feature learning for all the attributes and category-specific feature learning for heterogeneous attributes. We also introduce an unconstrained face database (LFW+), an extension of the public-domain LFW, with heterogeneous demographic attributes (age, gender, and race) obtained via crowdsourcing. Experimental results on benchmarks with multiple face attributes (MORPH II, LFW+, CelebA, LFWA, and FotW) show that the proposed approach has superior performance compared to the state of the art. Finally, evaluations on a public-domain face database (LAP) with a single attribute show that the proposed approach has excellent generalization ability.
NCBI2RDF: Enabling Full RDF-Based Access to NCBI Databases
Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor
2013-01-01
RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments. PMID:23984425
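The system described above ultimately resolves SPARQL queries into calls to NCBI's E-utilities. The sketch below shows only those underlying E-utilities requests (esearch and esummary, which are publicly documented NCBI endpoints), not the NCBI2RDF API itself; the search term is arbitrary and the requests package is assumed.

# Calling NCBI E-utilities directly: search PubMed, then fetch summaries.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# 1. esearch: find PubMed identifiers matching a term.
ids = requests.get(f"{EUTILS}/esearch.fcgi",
                   params={"db": "pubmed", "term": "heterogeneous databases RDF",
                           "retmode": "json", "retmax": 5}).json()["esearchresult"]["idlist"]

# 2. esummary: retrieve document summaries for those identifiers.
summaries = requests.get(f"{EUTILS}/esummary.fcgi",
                         params={"db": "pubmed", "id": ",".join(ids),
                                 "retmode": "json"}).json()

for uid in ids:
    print(uid, summaries["result"][uid]["title"])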
Ontology-based knowledge representation for resolution of semantic heterogeneity in GIS
NASA Astrophysics Data System (ADS)
Liu, Ying; Xiao, Han; Wang, Limin; Han, Jialing
2017-07-01
Lack of semantic interoperability in geographical information systems has been identified as the main obstacle to data sharing and database integration. New methods are needed to overcome the problems of semantic heterogeneity. Ontologies are considered one approach to supporting geographic information sharing. This paper presents an ontology-driven integration approach that helps in detecting and, where possible, resolving semantic conflicts. Its originality is that each data source participating in the integration process contains an ontology that defines the meaning of its own data. This approach automates the integration through a semantic integration algorithm. Finally, land classification in a field GIS is described as an example.
Towards a Global Service Registry for the World-Wide LHC Computing Grid
NASA Astrophysics Data System (ADS)
Field, Laurence; Alandes Pradillo, Maria; Di Girolamo, Alessandro
2014-06-01
The World-Wide LHC Computing Grid encompasses a set of heterogeneous information systems, from central portals such as the Open Science Grid's Information Management System and the Grid Operations Centre Database to the WLCG information system, where the information sources are the Grid services themselves. Providing a consistent view of the information, which involves synchronising all these information systems, is a challenging activity that has led the LHC virtual organisations to create their own configuration databases. This experience, whereby each virtual organisation's configuration database interfaces with multiple information systems, has resulted in a duplication of effort, especially relating to the use of manual checks for the handling of inconsistencies. The Global Service Registry aims to address this issue by providing a centralised service that aggregates information from multiple information systems. It shows both information on registered resources (i.e. what should be there) and on available resources (i.e. what is there). The main purpose is to simplify the synchronisation of the virtual organisations' own configuration databases, which are used for job submission and data management, through the provision of a single interface for obtaining all the information. By centralising the information, automated consistency and validation checks can be performed to improve the overall quality of the information provided. Although internally the GLUE 2.0 information model is used for the purpose of integration, the Global Service Registry is not dependent on any particular information model for ingestion or dissemination. The intention is to allow the virtual organisations' configuration databases to be decoupled from the underlying information systems in a transparent way and hence simplify any possible future migration due to the evolution of those systems. This paper presents the Global Service Registry architecture, its advantages compared to the current situation, and how it can support the evolution of information systems.
Bockholt, Henry J.; Scully, Mark; Courtney, William; Rachakonda, Srinivas; Scott, Adam; Caprihan, Arvind; Fries, Jill; Kalyanam, Ravi; Segall, Judith M.; de la Garza, Raul; Lane, Susan; Calhoun, Vince D.
2009-01-01
A neuroinformatics (NI) system is critical to brain imaging research in order to shorten the time between study conception and results. Such an NI system is required to scale well when large numbers of subjects are studied. Further, when multiple sites participate in research projects, organizational issues become increasingly difficult. Optimized NI applications mitigate these problems. Additionally, NI software enables coordination across multiple studies, leveraging shared advantages and potentially multiplying research discoveries. The web-based Mind Research Network (MRN) database system has been designed and improved through our experience with 200 research studies and 250 researchers from seven different institutions. The MRN tools permit the collection, management, reporting, and efficient use of large-scale, heterogeneous data sources, e.g., multiple institutions, multiple principal investigators, multiple research programs and studies, and multimodal acquisitions. We have collected and analyzed data sets on thousands of research participants and have set up a framework to automatically analyze the data, thereby making efficient, practical data mining of this vast resource possible. This paper presents a comprehensive framework for capturing and analyzing heterogeneous neuroscience research data sources that has been fully optimized for end-users to perform novel data mining. PMID:20461147
Design of a Multi Dimensional Database for the Archimed DataWarehouse.
Bréant, Claudine; Thurler, Gérald; Borst, François; Geissbuhler, Antoine
2005-01-01
The Archimed data warehouse project started in 1993 at the Geneva University Hospital. It has progressively integrated seven data marts (or domains of activity) archiving medical data such as Admission/Discharge/Transfer (ADT) data, laboratory results, radiology exams, diagnoses, and procedure codes. The objective of the Archimed data warehouse is to facilitate access to an integrated and coherent view of patient medical data in order to support analytical activities such as medical statistics, clinical studies, retrieval of similar cases, and data mining processes. This paper discusses three principal design aspects relating to the conception of the data warehouse database: 1) the granularity of the database, which refers to the level of detail or summarization of the data; 2) the database model and architecture, describing how data are presented to end users and how new data are integrated; 3) the life cycle of the database, to ensure long-term scalability of the environment. Both the organization of patient medical data using a standardized elementary fact representation and the use of the multidimensional model have proved to be powerful design tools for integrating data coming from the multiple heterogeneous database systems that form part of the transactional Hospital Information System (HIS). Concurrently, building the data warehouse in an incremental way has helped to control the evolution of the data content. These three design aspects bring clarity and performance regarding data access. They also provide long-term scalability to the system and resilience to further changes that may occur in the source systems feeding the data warehouse.
Building Community Around Hydrologic Data Models Within CUAHSI
NASA Astrophysics Data System (ADS)
Maidment, D.
2007-12-01
The Consortium of Universities for the Advancement of Hydrologic Science, Inc (CUAHSI) has a Hydrologic Information Systems project which aims to provide better data access and capacity for data synthesis for the nation's water information, both that collected by academic investigators and that collected by water agencies. These data include observations of streamflow, water quality, groundwater levels, weather and climate and aquatic biology. Each water agency or research investigator has a unique method of formatting their data (syntactic heterogeneity) and describing their variables (semantic heterogeneity). The result is a large agglomeration of data in many formats and descriptions whose full content is hard to interpret and analyze. CUAHSI is helping to resolve syntactic heterogeneity through the development of WaterML, a standard XML markup language for communicating water observations data through web services, and a standard relational database structure for archiving data called the Observations Data Model. Variables in these data archiving and communicating systems are indexed against a controlled vocabulary of descriptive terms to provide the capacity to synthesize common data types from disparate data sources.
Engineering the object-relation database model in O-Raid
NASA Technical Reports Server (NTRS)
Dewan, Prasun; Vikram, Ashish; Bhargava, Bharat
1989-01-01
Raid is a distributed database system based on the relational model. O-Raid is an extension of the Raid system that will support complex data objects. The design of O-Raid is evolutionary and retains all the features of relational database systems and those of a general-purpose object-oriented programming language. O-Raid has several novel properties. Objects, classes, and inheritance are supported together with a predicate-based relational query language. O-Raid objects are compatible with C++ objects and may be read and manipulated by a C++ program without any 'impedance mismatch'. Relations and columns within relations may themselves be treated as objects with associated variables and methods. Relations may contain heterogeneous objects, that is, objects of more than one class in a certain column, which can individually evolve by being reclassified. Special facilities are provided to reduce the data search in a relation containing complex objects.
Pantazatos, Spiro P.; Li, Jianrong; Pavlidis, Paul; Lussier, Yves A.
2009-01-01
An approach towards heterogeneous neuroscience dataset integration is proposed that uses Natural Language Processing (NLP) and a knowledge-based phenotype organizer system (PhenOS) to link ontology-anchored terms to underlying data from each database, and then maps these terms based on a computable model of disease (SNOMED CT®). The approach was implemented using sample datasets from fMRIDC, GEO, The Whole Brain Atlas and Neuronames, and allowed for complex queries such as “List all disorders with a finding site of brain region X, and then find the semantically related references in all participating databases based on the ontological model of the disease or its anatomical and morphological attributes”. Precision of the NLP-derived coding of the unstructured phenotypes in each dataset was 88% (n = 50), and precision of the semantic mapping between these terms across datasets was 98% (n = 100). To our knowledge, this is the first example of the use of both semantic decomposition of disease relationships and hierarchical information found in ontologies to integrate heterogeneous phenotypes across clinical and molecular datasets. PMID:20495688
Gaeta, M; Campanella, F; Gentile, L; Schifino, G M; Capasso, L; Bandera, F; Banfi, G; Arpesella, M; Ricci, C
2017-01-01
Circulatory diseases, in particular ischemic heart diseases and stroke, represent the main causes of death worldwide, both in high-income and in middle- and low-income countries. Our aim is to provide a comprehensive report depicting circulatory disease mortality in Europe over the last 30 years and to address the sources of heterogeneity among different countries. Our study was performed using the WHO statistical information system (mortality database) and was restricted to the 28 countries belonging to the European Union (EU-28). We evaluated gender and age time series of all circulatory disease mortality, ischemic heart diseases, cerebrovascular diseases, and pulmonary and other circulatory diseases, and then produced forecasts for 2016. Mortality heterogeneity across countries was evaluated using the Cochran Q statistic and the I-squared index. Between 1985 and 2011 the SDR for deaths attributable to all circulatory system diseases decreased from 440.9 to 212.0 per 100,000 in the EU-28, and a clear uniform reduction was observed. Heterogeneity among countries was found to be substantial; therefore, separate analyses were carried out by geographical area. We forecast a reduction in European cardiovascular mortality. Heterogeneity among countries could only in part be explained by geographical and health expenditure factors.
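The two heterogeneity statistics named above are simple to compute from country-level estimates and their variances. The following Python sketch implements the standard fixed-effect Cochran Q and the derived I-squared index; the input numbers are invented for illustration and are not the study's data.

# Minimal sketch of Cochran's Q and the I-squared heterogeneity index.
import numpy as np

def q_and_i_squared(estimates, variances):
    """Fixed-effect Cochran's Q and the I-squared index (in percent)."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = 1.0 / variances
    pooled = np.sum(weights * estimates) / np.sum(weights)
    q = np.sum(weights * (estimates - pooled) ** 2)
    dof = len(estimates) - 1
    i2 = max(0.0, (q - dof) / q) * 100.0 if q > 0 else 0.0
    return q, i2

# Hypothetical standardized death rates (per 100,000) and their variances.
q, i2 = q_and_i_squared([210.0, 350.5, 180.2, 420.8], [25.0, 40.0, 30.0, 55.0])
print(f"Q = {q:.1f}, I^2 = {i2:.1f}%")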
LHCb Conditions database operation assistance systems
NASA Astrophysics Data System (ADS)
Clemencic, M.; Shapoval, I.; Cattaneo, M.; Degaudenzi, H.; Santinelli, R.
2012-12-01
The Conditions Database (CondDB) of the LHCb experiment provides versioned, time-dependent geometry and conditions data for all LHCb data processing applications (simulation, high level trigger (HLT), reconstruction, analysis) in a heterogeneous computing environment ranging from user laptops to the HLT farm and the Grid. These different use cases impose front-end support for multiple database technologies (Oracle and SQLite are used). Sophisticated distribution tools are required to ensure timely and robust delivery of updates to all environments. The content of the database has to be managed to ensure that updates are internally consistent and externally compatible with multiple versions of the physics application software. In this paper we describe three systems that we have developed to address these issues. The first system is a CondDB state tracking extension to the Oracle 3D Streams replication technology, used to trap cases in which the CondDB replication has been corrupted. The second is an automated distribution system for the SQLite-based CondDB, which also provides smart backup and checkout mechanisms for the CondDB managers and LHCb users, respectively. Finally, a third system verifies and monitors internal (CondDB self-consistency) and external (LHCb physics software vs. CondDB) compatibility. The former two systems are used in production in the LHCb experiment and have achieved the desired goal of higher flexibility and robustness for the management and operation of the CondDB. The latter has been fully designed and is currently moving to the implementation stage.
Wendling, T; Jung, K; Callahan, A; Schuler, A; Shah, N H; Gallego, B
2018-06-03
There is growing interest in using routinely collected data from health care databases to study the safety and effectiveness of therapies in "real-world" conditions, as it can provide complementary evidence to that of randomized controlled trials. Causal inference from health care databases is challenging because the data are typically noisy, high dimensional, and most importantly, observational. It requires methods that can estimate heterogeneous treatment effects while controlling for confounding in high dimensions. Bayesian additive regression trees, causal forests, causal boosting, and causal multivariate adaptive regression splines are off-the-shelf methods that have shown good performance for estimation of heterogeneous treatment effects in observational studies of continuous outcomes. However, it is not clear how these methods would perform in health care database studies where outcomes are often binary and rare and data structures are complex. In this study, we evaluate these methods in simulation studies that recapitulate key characteristics of comparative effectiveness studies. We focus on the conditional average effect of a binary treatment on a binary outcome using the conditional risk difference as an estimand. To emulate health care database studies, we propose a simulation design where real covariate and treatment assignment data are used and only outcomes are simulated based on nonparametric models of the real outcomes. We apply this design to 4 published observational studies that used records from 2 major health care databases in the United States. Our results suggest that Bayesian additive regression trees and causal boosting consistently provide low bias in conditional risk difference estimates in the context of health care database studies. Copyright © 2018 John Wiley & Sons, Ltd.
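The outcome-simulation design described above can be sketched in a few lines: real covariates and treatment assignments are retained, a flexible model of the observed outcome is fitted, and binary outcomes are then simulated from that model so that the true conditional risk difference is known by construction. The sketch below uses gradient boosting from scikit-learn as the nonparametric outcome model and random stand-in data; the model choice, variable names, and data are ours, not the authors', and this is not their simulation code.

# Rough sketch of an outcome-simulation design with a known conditional risk difference.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Stand-ins for real covariate, treatment, and outcome data.
X = rng.normal(size=(5000, 10))
T = rng.integers(0, 2, size=5000)
y_observed = rng.integers(0, 2, size=5000)   # placeholder for the real binary outcome

# Nonparametric model of the observed outcome given covariates and treatment.
outcome_model = GradientBoostingClassifier().fit(np.column_stack([X, T]), y_observed)

# The "true" conditional risk difference is the difference in predicted risk
# under treatment versus no treatment for each covariate profile.
p1 = outcome_model.predict_proba(np.column_stack([X, np.ones_like(T)]))[:, 1]
p0 = outcome_model.predict_proba(np.column_stack([X, np.zeros_like(T)]))[:, 1]
true_crd = p1 - p0

# Simulate outcomes from the fitted model, conditional on the actual treatment.
p_obs = np.where(T == 1, p1, p0)
y_simulated = rng.binomial(1, p_obs)
print("mean true conditional risk difference:", true_crd.mean())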
La Gamba, Fabiola; Corrao, Giovanni; Romio, Silvana; Sturkenboom, Miriam; Trifirò, Gianluca; Schink, Tania; de Ridder, Maria
2017-10-01
Clustering of patients in databases is usually ignored in one-stage meta-analysis of multi-database studies using matched case-control data. The aim of this study was to compare bias and efficiency of such a one-stage meta-analysis with a two-stage meta-analysis. First, we compared the approaches by generating matched case-control data under 5 simulated scenarios, built by varying: (1) the exposure-outcome association; (2) its variability among databases; (3) the confounding strength of one covariate on this association; (4) its variability; and (5) the (heterogeneous) confounding strength of two covariates. Second, we made the same comparison using empirical data from the ARITMO project, a multiple database study investigating the risk of ventricular arrhythmia following the use of medications with arrhythmogenic potential. In our study, we specifically investigated the effect of current use of promethazine. Bias increased for one-stage meta-analysis with increasing (1) between-database variance of exposure effect and (2) heterogeneous confounding generated by two covariates. The efficiency of one-stage meta-analysis was slightly lower than that of two-stage meta-analysis for the majority of investigated scenarios. Based on ARITMO data, there were no evident differences between one-stage (OR = 1.50, CI = [1.08; 2.08]) and two-stage (OR = 1.55, CI = [1.12; 2.16]) approaches. When the effect of interest is heterogeneous, a one-stage meta-analysis ignoring clustering gives biased estimates. Two-stage meta-analysis generates estimates at least as accurate and precise as one-stage meta-analysis. However, in a study using small databases and rare exposures and/or outcomes, a correct one-stage meta-analysis becomes essential. Copyright © 2017 John Wiley & Sons, Ltd.
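For context, a two-stage meta-analysis of the kind compared above first estimates an effect within each database and then pools the per-database estimates. The Python sketch below shows standard fixed-effect inverse-variance pooling of log odds ratios; the numbers are invented and this is not the ARITMO analysis itself.

# Sketch of fixed-effect, two-stage pooling of per-database log odds ratios.
import numpy as np

def pool_fixed_effect(log_ors, std_errs):
    log_ors = np.asarray(log_ors, dtype=float)
    std_errs = np.asarray(std_errs, dtype=float)
    weights = 1.0 / std_errs ** 2
    pooled = np.sum(weights * log_ors) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))
    lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
    return np.exp(pooled), (np.exp(lo), np.exp(hi))

# Hypothetical per-database estimates for a drug-outcome association.
or_pooled, (lo, hi) = pool_fixed_effect(
    log_ors=[np.log(1.4), np.log(1.8), np.log(1.3)],
    std_errs=[0.20, 0.25, 0.30])
print(f"pooled OR = {or_pooled:.2f} (95% CI {lo:.2f}-{hi:.2f})")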
NASA Astrophysics Data System (ADS)
Valach, J.; Štefcová, P.; Bruna, R.; Zemánek, P.
2017-08-01
This paper outlines a recently started project dedicated to the creation and development of an information system for cuneiform tablets. The contribution deals with the architecture of a virtual collection of cuneiform tablets, conceived as a complex system combining and integrating several domains of information obtained from various types of analyses. The research team brings together experts in collection conservation, philologists, and researchers in 3D scanning and physical measurement. Multidisciplinary databases like the one described represent a new tool in the digital humanities and help to improve the accessibility of collections to the public and to researchers.
Chen, Po-Hao; Loehfelm, Thomas W; Kamer, Aaron P; Lemmon, Andrew B; Cook, Tessa S; Kohli, Marc D
2016-12-01
The residency review committee of the Accreditation Council for Graduate Medical Education (ACGME) collects data on resident exam volume and sets minimum requirements. However, these data are not made readily available, and the ACGME does not share its tools or methodology. It is therefore difficult to assess the integrity of the data and determine whether they truly reflect relevant aspects of the resident experience. This manuscript describes our experience creating a multi-institutional case log, incorporating data from three American diagnostic radiology residency programs. Each of the three sites independently established automated query pipelines from the various radiology information systems in their respective hospital groups, thereby creating a resident-specific database. The three institutional resident case log databases were then aggregated into a single centralized database schema. Three hundred thirty residents and 2,905,923 radiologic examinations over a 4-year span were catalogued using 11 ACGME categories. Our experience highlights big-data challenges, including internal data heterogeneity and external data discrepancies, faced by informatics researchers.
Integrating the Allen Brain Institute Cell Types Database into Automated Neuroscience Workflow.
Stockton, David B; Santamaria, Fidel
2017-10-01
We developed software tools to download, extract features, and organize the Cell Types Database from the Allen Brain Institute (ABI) in order to integrate its whole cell patch clamp characterization data into the automated modeling/data analysis cycle. To expand the potential user base we employed both Python and MATLAB. The basic set of tools downloads selected raw data and extracts cell, sweep, and spike features, using ABI's feature extraction code. To facilitate data manipulation we added a tool to build a local specialized database of raw data plus extracted features. Finally, to maximize automation, we extended our NeuroManager workflow automation suite to include these tools plus a separate investigation database. The extended suite allows the user to integrate ABI experimental and modeling data into an automated workflow deployed on heterogeneous computer infrastructures, from local servers, to high performance computing environments, to the cloud. Since our approach is focused on workflow procedures our tools can be modified to interact with the increasing number of neuroscience databases being developed to cover all scales and properties of the nervous system.
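The workflow above builds on the Allen Institute's public allensdk package. The sketch below shows the typical cache-based access pattern for the Cell Types Database; the exact method names and return structures are version-dependent, so treat this as an assumption-laden illustration rather than the authors' tool code, and it requires network access on first run.

# Illustrative use of the allensdk Cell Types cache (API may vary by version).
from allensdk.core.cell_types_cache import CellTypesCache

# The cache downloads data on demand and stores it under the manifest path.
ctc = CellTypesCache(manifest_file="cell_types/manifest.json")

cells = ctc.get_cells()                      # metadata for all characterized cells
features = ctc.get_ephys_features()          # precomputed electrophysiology features
print(f"{len(cells)} cells, {len(features)} feature records")

# Pull the raw patch-clamp sweeps for one specimen for local feature extraction.
specimen_id = cells[0]["id"]
data_set = ctc.get_ephys_data(specimen_id)
sweep = data_set.get_sweep(data_set.get_sweep_numbers()[0])
print("sampling rate (Hz):", sweep["sampling_rate"])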
CROPPER: a metagene creator resource for cross-platform and cross-species compendium studies.
Paananen, Jussi; Storvik, Markus; Wong, Garry
2006-09-22
Current genomic research methods provide researchers with enormous amounts of data. Combining data from different high-throughput research technologies commonly available in biological databases can lead to novel findings and increase research efficiency. However, combining data from different heterogeneous sources is often a very arduous task. These sources can be different microarray technology platforms, genomic databases, or experiments performed on various species. Our aim was to develop a software program that could facilitate the combining of data from heterogeneous sources, and thus allow researchers to perform genomic cross-platform/cross-species studies and to use existing experimental data for compendium studies. We have developed a web-based software resource, called CROPPER that uses the latest genomic information concerning different data identifiers and orthologous genes from the Ensembl database. CROPPER can be used to combine genomic data from different heterogeneous sources, allowing researchers to perform cross-platform/cross-species compendium studies without the need for complex computational tools or the requirement of setting up one's own in-house database. We also present an example of a simple cross-platform/cross-species compendium study based on publicly available Parkinson's disease data derived from different sources. CROPPER is a user-friendly and freely available web-based software resource that can be successfully used for cross-species/cross-platform compendium studies.
A service-based framework for pharmacogenomics data integration
NASA Astrophysics Data System (ADS)
Wang, Kun; Bai, Xiaoying; Li, Jing; Ding, Cong
2010-08-01
Data are central to scientific research and practice. The advance of experimental methods and information retrieval technologies has led to explosive growth of scientific data and databases. However, due to heterogeneity in data formats, structures, and semantics, it is hard to integrate the diverse data that grow explosively and to analyse them comprehensively. As more and more public databases become accessible through standard protocols such as programmable interfaces and Web portals, Web-based data integration has become a major trend for managing and synthesising data that are stored in distributed locations. Mashup, a Web 2.0 technique, presents a new way to compose content and software from multiple resources. The paper proposes a layered framework for integrating pharmacogenomics data in a service-oriented approach using mashup technology. The framework separates the integration concerns into three perspectives: data, process, and Web-based user interface. Each layer encapsulates the heterogeneity issues of one aspect. To facilitate the mapping and convergence of data, an ontology mechanism is introduced to provide consistent conceptual models across different databases and experimental platforms. To support user-interactive and iterative service orchestration, a context model is defined to capture information about users, tasks, and services, which can be used for service selection and recommendation during a dynamic service composition process. A prototype system is implemented, and case studies are presented to illustrate the promising capabilities of the proposed approach.
Ran, Xia; Cai, Wei-Jun; Huang, Xiu-Feng; Liu, Qi; Lu, Fan; Qu, Jia; Wu, Jinyu; Jin, Zi-Bing
2014-01-01
Inherited retinal degeneration (IRD), a leading cause of human blindness worldwide, is exceptionally heterogeneous, both clinically and genetically. During the past decades, tremendous efforts have been made to explore this complex heterogeneity, and numerous mutations have been identified in the different genes underlying IRD, aided by significant advances in sequencing technology. In this study, we developed a comprehensive database, 'RetinoGenetics', which contains informative knowledge about all known IRD-related genes and mutations. 'RetinoGenetics' currently contains 4270 mutations in 186 genes, with detailed information associated with 164 phenotypes from 934 publications and various types of functional annotations. Extensive annotations were then performed for each gene using various resources, including Gene Ontology, KEGG pathways, protein-protein interactions, mutational annotations, and a gene-disease network. Furthermore, through its search functions, convenient browsing, and intuitive graphical displays, 'RetinoGenetics' can serve as a valuable resource for unveiling the genetic basis of IRD. Taken together, 'RetinoGenetics' is an integrative, informative, and updatable resource for IRD-related genetic predispositions. Database URL: http://www.retinogenetics.org/. © The Author(s) 2014. Published by Oxford University Press.
LAILAPS: the plant science search engine.
Esch, Maria; Chen, Jinbo; Colmsee, Christian; Klapperstück, Matthias; Grafahrend-Belau, Eva; Scholz, Uwe; Lange, Matthias
2015-01-01
With the number of sequenced plant genomes growing, the number of predicted genes and functional annotations is also increasing. The association between genes and phenotypic traits is currently of great interest. Unfortunately, the information available today is widely scattered over a number of different databases. Information retrieval (IR) has become an all-encompassing bioinformatics methodology for extracting knowledge from complex, heterogeneous, and distributed databases, and can therefore be a useful tool for obtaining a comprehensive view of plant genomics, from genes to traits. Here we describe LAILAPS (http://lailaps.ipk-gatersleben.de), an IR system designed to link plant genomic data in the context of phenotypic attributes for detailed forward genetic research. LAILAPS comprises around 65 million indexed documents, encompassing >13 major life science databases with around 80 million links to plant genomic resources. The LAILAPS search engine allows fuzzy querying for candidate genes linked to specific traits over a loosely integrated system of indexed and interlinked genome databases. Query assistance and an evidence-based annotation system enable time-efficient and comprehensive information retrieval. An artificial neural network incorporating user feedback and behavior tracking allows relevance sorting of results. We fully describe LAILAPS's functionality and capabilities by comparing this system's performance with other widely used systems and by reporting both a validation in maize and a knowledge discovery use case focusing on candidate genes in barley. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.
Calibration of an Unsteady Groundwater Flow Model for a Complex, Strongly Heterogeneous Aquifer
NASA Astrophysics Data System (ADS)
Curtis, Z. K.; Liao, H.; Li, S. G.; Phanikumar, M. S.; Lusch, D.
2016-12-01
Modeling of groundwater systems characterized by complex three-dimensional structure and heterogeneity remains a significant challenge. Most of today's groundwater models are developed from relatively simple conceptual representations in order to keep the model calibratable. As more complexity is modeled, e.g., by adding more layers and/or zones or by introducing transient processes, more parameters have to be estimated, and issues related to ill-posed groundwater problems and non-unique calibration arise. Here, we explore the use of an alternative conceptual representation for groundwater modeling that is fully three-dimensional and can capture complex 3D heterogeneity (both systematic and "random") without over-parameterizing the aquifer system. In particular, we apply Transition Probability (TP) geostatistics to high-resolution borehole data from a water well database to characterize the complex 3D geology. Different aquifer material classes, e.g., `AQ' (aquifer material), `MAQ' (marginal aquifer material), `PCM' (partially confining material), and `CM' (confining material), are simulated, with the hydraulic properties of each material type as tuning parameters during calibration. The TP-based approach is applied to simulate unsteady groundwater flow in a large, complex, and strongly heterogeneous glacial aquifer system in Michigan across multiple spatial and temporal scales. The resulting model is calibrated to observed static water level data over a time span of 50 years. The results show that the TP-based conceptualization enables much more accurate and robust calibration and simulation than conventional deterministic layer/zone-based conceptual representations.
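At the core of TP geostatistics is a continuous-lag Markov chain: given a transition-rate matrix R among the material classes, the matrix of transition probabilities over a lag h is the matrix exponential expm(R*h). The Python sketch below illustrates only that relationship; the rate values are invented, not calibrated to any borehole data, and this is not the authors' model.

# Continuous-lag Markov chain sketch for transition-probability geostatistics.
import numpy as np
from scipy.linalg import expm

classes = ["AQ", "MAQ", "PCM", "CM"]

# Hypothetical vertical transition-rate matrix (1/m); each row sums to zero.
R = np.array([[-0.50,  0.20,  0.20,  0.10],
              [ 0.30, -0.80,  0.30,  0.20],
              [ 0.20,  0.30, -0.70,  0.20],
              [ 0.10,  0.20,  0.30, -0.60]])

for h in (0.5, 2.0, 10.0):               # lag distances in metres
    T = expm(R * h)                      # transition probabilities over lag h
    print(f"h = {h:>4} m, P(AQ -> AQ) = {T[0, 0]:.3f}")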
Evaluating the Impact of Database Heterogeneity on Observational Study Results
Madigan, David; Ryan, Patrick B.; Schuemie, Martijn; Stang, Paul E.; Overhage, J. Marc; Hartzema, Abraham G.; Suchard, Marc A.; DuMouchel, William; Berlin, Jesse A.
2013-01-01
Clinical studies that use observational databases to evaluate the effects of medical products have become commonplace. Such studies begin by selecting a particular database, a decision that published papers invariably report but do not discuss. Studies of the same issue in different databases, however, can and do generate different results, sometimes with strikingly different clinical implications. In this paper, we systematically study heterogeneity among databases, holding other study methods constant, by exploring relative risk estimates for 53 drug-outcome pairs and 2 widely used study designs (cohort studies and self-controlled case series) across 10 observational databases. When holding the study design constant, our analysis shows that estimated relative risks range from a statistically significant decreased risk to a statistically significant increased risk in 11 of 53 (21%) of drug-outcome pairs that use a cohort design and 19 of 53 (36%) of drug-outcome pairs that use a self-controlled case series design. This exceeds the proportion of pairs that were consistent across databases in both direction and statistical significance, which was 9 of 53 (17%) for cohort studies and 5 of 53 (9%) for self-controlled case series. Our findings show that clinical studies that use observational databases can be sensitive to the choice of database. More attention is needed to consider how the choice of data source may be affecting results. PMID:23648805
Large scale validation of the M5L lung CAD on heterogeneous CT datasets.
Torres, E Lopez; Fiorina, E; Pennazio, F; Peroni, C; Saletta, M; Camarlinghi, N; Fantacci, M E; Cerello, P
2015-04-01
M5L, a fully automated computer-aided detection (CAD) system for the detection and segmentation of lung nodules in thoracic computed tomography (CT), is presented and validated on several image datasets. M5L is the combination of two independent subsystems, based on the Channeler Ant Model as a segmentation tool [lung channeler ant model (lungCAM)] and on a voxel-based neural approach. The lungCAM was upgraded with a scan equalization module and a new procedure to recover nodules connected to other lung structures; its classification module, which makes use of a feed-forward neural network, is based on a small number of features (13), so as to minimize the risk of poor generalization, which could arise given the large difference between the sizes of the training and testing datasets, which contain 94 and 1019 CTs, respectively. The lungCAM (standalone) and M5L (combined) performance was extensively tested on 1043 CT scans from three independent datasets, including a detailed analysis of the full Lung Image Database Consortium/Image Database Resource Initiative database, which is not yet found in the literature. The lungCAM and M5L performance is consistent across the databases, with sensitivities of about 70% and 80%, respectively, at eight false positive findings per scan, despite the variable annotation criteria and acquisition and reconstruction conditions. A reduced sensitivity is found for subtle nodules and ground-glass opacity (GGO) structures. A comparison with other CAD systems is also presented. The M5L performance on a large and heterogeneous dataset is stable and satisfactory, although the development of a dedicated module for GGO detection could further improve it, as could an iterative optimization of the training procedure. The main aim of the present study was accomplished: M5L results do not deteriorate as the dataset size increases, making it a candidate for supporting radiologists in large-scale screenings and clinical programs.
Optimizing acupuncture treatment for dry eye syndrome: a systematic review.
Kim, Bong Hyun; Kim, Min Hee; Kang, Se Hyun; Nam, Hae Jeong
2018-05-03
In a previous meta-analysis, acupuncture was considered a potentially effective treatment for dry eye syndrome (DES), but there was heterogeneity among the outcomes. We updated the meta-analysis and conducted subgroup analyses to reduce the heterogeneity and to suggest the most effective acupuncture method based on clinical trials. We searched for randomized controlled trials (RCTs) in 10 databases (MEDLINE, EMBASE, CENTRAL, AMED, SCOPUS, CNKI, Wanfang database, Oriental Medicine Advanced Searching Integrated System (OASIS), Koreamed, J-stage) and searched by hand to compare the effects of acupuncture and artificial tears (AT). We also conducted subgroup analyses by (1) method of intervention (acupuncture only or acupuncture plus AT), (2) intervention frequency (less than 3 times a week or more than 3 times a week), (3) period of treatment (less than 4 weeks or more than 4 weeks), and (4) acupoints (BL1, BL2, ST1, ST2, TE23, Ex-HN5). The Bucher method was used for subgroup comparisons. Nineteen studies with 1126 patients were included. Significant improvements on the Schirmer test (weighted mean difference [WMD], 2.14; 95% confidence interval [CI], 0.93 to 3.34; p = 0.0005) and break-up time (BUT) (WMD, 0.98; 95% CI, 0.79 to 1.18; p < 0.00001) were reported. In the subgroup analysis, acupuncture plus AT treatment had a weaker effect on BUT but a stronger effect on the Schirmer test and a better overall effect than acupuncture alone. For treatment duration, treatment longer than 1 month was more effective than shorter treatment. With regard to treatment frequency, treatment less than three times a week was more effective than more frequent treatment. In the acupoint analysis, acupuncture treatment including the BL2 and ST1 acupoints was less effective than treatment that did not include them. None of these factors reduced the heterogeneity. Acupuncture was more effective than AT in treating DES but showed high heterogeneity. Intervention differences did not influence the heterogeneity.
Chen, R S; Nadkarni, P; Marenco, L; Levin, F; Erdos, J; Miller, P L
2000-01-01
The entity-attribute-value representation with classes and relationships (EAV/CR) provides a flexible and simple database schema to store heterogeneous biomedical data. In certain circumstances, however, the EAV/CR model is known to retrieve data less efficiently than conventionally designed database schemas. To perform a pilot study that systematically quantifies performance differences for database queries directed at real-world microbiology data modeled with EAV/CR and conventional representations, and to explore the relative merits of different EAV/CR query implementation strategies. Clinical microbiology data obtained over a ten-year period were stored using both database models. Query execution times were compared for four clinically oriented attribute-centered and entity-centered queries operating under varying conditions of database size and system memory. The performance characteristics of three different EAV/CR query strategies were also examined. Performance was similar for entity-centered queries in the two database models. Performance in the EAV/CR model was approximately three to five times less efficient than its conventional counterpart for attribute-centered queries. The differences in query efficiency became slightly greater as database size increased, although they were reduced with the addition of system memory. The authors found that EAV/CR queries formulated as multiple simple SQL statements executed in batch were more efficient than single large SQL statements. This paper describes a pilot project to explore issues in and compare query performance for EAV/CR and conventional database representations. Although attribute-centered queries were less efficient in the EAV/CR model, these inefficiencies may be addressable, at least in part, by the use of more powerful hardware, more memory, or both.
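To make the attribute-centered query problem concrete, the toy SQLite example below stores a few rows in an EAV table and retrieves the same answer two ways: one larger self-joining statement, and several simple statements whose entity sets are intersected in the client, mirroring the two strategies compared in the study. The schema and data are invented; this is not the authors' microbiology database.

# Toy EAV table illustrating attribute-centered query strategies.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE eav (entity INTEGER, attribute TEXT, value TEXT)")
con.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    (1, "organism", "E. coli"), (1, "specimen", "urine"),
    (2, "organism", "E. coli"), (2, "specimen", "blood"),
    (3, "organism", "S. aureus"), (3, "specimen", "blood"),
])

# Strategy A: a single statement that self-joins the EAV table per attribute.
rows = con.execute("""
    SELECT a.entity FROM eav a JOIN eav b ON a.entity = b.entity
    WHERE a.attribute = 'organism' AND a.value = 'E. coli'
      AND b.attribute = 'specimen' AND b.value = 'blood'
""").fetchall()
print("single statement:", rows)

# Strategy B: several simple statements, intersecting entity sets in the client.
ecoli = {e for (e,) in con.execute(
    "SELECT entity FROM eav WHERE attribute = 'organism' AND value = 'E. coli'")}
blood = {e for (e,) in con.execute(
    "SELECT entity FROM eav WHERE attribute = 'specimen' AND value = 'blood'")}
print("batched statements:", sorted(ecoli & blood))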
Research on distributed heterogeneous data PCA algorithm based on cloud platform
NASA Astrophysics Data System (ADS)
Zhang, Jin; Huang, Gang
2018-05-01
Principal component analysis (PCA) of heterogeneous data sets can address the limited scalability of centralized data analysis. In order to reduce the generation of intermediate data and the error components of distributed heterogeneous data sets, a principal component analysis algorithm for heterogeneous data sets on a cloud platform is proposed. The algorithm processes eigenvalues using Householder tridiagonalization and QR factorization, and calculates the error component of the heterogeneous database associated with the public key to obtain the intermediate data set and the lost information. Experiments on distributed DBM heterogeneous datasets show that the method is feasible and reliable in terms of execution time and accuracy.
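For readers who want the underlying linear algebra spelled out, the sketch below performs plain single-node PCA by eigendecomposition of the sample covariance matrix. It is shown only to make the computation concrete; it does not reproduce the paper's distributed Householder/QR scheme, and the data are random stand-ins.

# Plain PCA via eigendecomposition of the covariance matrix (NumPy).
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))            # stand-in for one site's data block

Xc = X - X.mean(axis=0)                  # center the columns
cov = (Xc.T @ Xc) / (Xc.shape[0] - 1)    # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric eigendecomposition
order = np.argsort(eigvals)[::-1]        # sort components by explained variance
components = eigvecs[:, order[:3]]       # keep the top three principal axes

scores = Xc @ components                 # project the data onto the components
print("explained variance ratio:", (eigvals[order[:3]] / eigvals.sum()).round(3))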
Iavindrasana, Jimison; Depeursinge, Adrien; Ruch, Patrick; Spahni, Stéphane; Geissbuhler, Antoine; Müller, Henning
2007-01-01
The diagnostic and therapeutic processes, as well as the development of new treatments, are hindered by the fragmentation of the information that underlies them. In a multi-institutional research study database, the clinical information system (CIS) contains the primary data input. A substantial part of the budget of large-scale clinical studies is often spent on data creation and maintenance. The objective of this work is to design a decentralized, scalable, reusable database architecture with lower maintenance costs for managing and integrating the distributed heterogeneous data required as the basis for a large-scale research project. Technical and legal aspects are taken into account based on various use case scenarios. The architecture contains four layers: data storage and access, decentralized at their production source; a connector acting as a proxy between the CIS and the external world; an information mediator serving as a data access point; and the client side. The proposed design will be implemented in six clinical centers participating in the @neurIST project as part of a larger system for data integration and reuse in aneurysm treatment.
Huang, Haiyan; Liu, Chun-Chi; Zhou, Xianghong Jasmine
2010-04-13
The rapid accumulation of gene expression data has offered unprecedented opportunities to study human diseases. The National Center for Biotechnology Information Gene Expression Omnibus is currently the largest database that systematically documents the genome-wide molecular basis of diseases. However, thus far, this resource has been far from fully utilized. This paper describes the first study to transform public gene expression repositories into an automated disease diagnosis database. Particularly, we have developed a systematic framework, including a two-stage Bayesian learning approach, to achieve the diagnosis of one or multiple diseases for a query expression profile along a hierarchical disease taxonomy. Our approach, including standardizing cross-platform gene expression data and heterogeneous disease annotations, allows analyzing both sources of information in a unified probabilistic system. A high level of overall diagnostic accuracy was shown by cross validation. It was also demonstrated that the power of our method can increase significantly with the continued growth of public gene expression repositories. Finally, we showed how our disease diagnosis system can be used to characterize complex phenotypes and to construct a disease-drug connectivity map.
In-Memory Graph Databases for Web-Scale Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Morari, Alessandro; Weaver, Jesse R.
RDF databases have emerged as one of the most relevant ways of organizing, integrating, and managing exponentially growing, often heterogeneous, and not rigidly structured data for a variety of scientific and commercial fields. In this paper we discuss the solutions integrated in GEMS (Graph database Engine for Multithreaded Systems), a software framework for implementing RDF databases on commodity, distributed-memory high-performance clusters. Unlike the majority of current RDF databases, GEMS has been designed from the ground up to primarily employ graph-based methods. This is reflected in all the layers of its stack. The GEMS framework is composed of a SPARQL-to-C++ compiler, a library of data structures and related methods to access and modify them, and a custom runtime providing lightweight software multithreading, network message aggregation, and a partitioned global address space. We provide an overview of the framework, detailing its components and how they have been designed and customized to address the issues of graph methods applied to large-scale datasets on clusters. We discuss in detail the principles that enable automatic translation of queries (expressed in SPARQL, the query language of choice for RDF databases) to graph methods, and identify differences with respect to other RDF databases.
Using RDF to Model the Structure and Process of Systems
NASA Astrophysics Data System (ADS)
Rodriguez, Marko A.; Watkins, Jennifer H.; Bollen, Johan; Gershenson, Carlos
Many systems can be described in terms of networks of discrete elements and their various relationships to one another. A semantic network, or multi-relational network, is a directed labeled graph consisting of a heterogeneous set of entities connected by a heterogeneous set of relationships. Semantic networks serve as a promising general-purpose modeling substrate for complex systems. Various standardized formats and tools are now available to support practical, large-scale semantic network models. First, the Resource Description Framework (RDF) offers a standardized semantic network data model that can be further formalized by ontology modeling languages such as RDF Schema (RDFS) and the Web Ontology Language (OWL). Second, the recent introduction of highly performant triple-stores (i.e. semantic network databases) allows semantic network models on the order of 10^9 edges to be efficiently stored and manipulated. RDF and its related technologies are currently used extensively in the domains of computer science, digital library science, and the biological sciences. This article will provide an introduction to RDF/RDFS/OWL and an examination of its suitability to model discrete element complex systems.
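The following Python sketch uses the rdflib package to build a tiny RDF graph of entities and relationships and query it with SPARQL, simply to make the triple (subject-predicate-object) model concrete. The example.org vocabulary and node names are invented and are not drawn from the article.

# Small RDF semantic network built and queried with rdflib.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/system#")
g = Graph()
g.bind("ex", EX)

# Entities and heterogeneous relationships as subject-predicate-object triples.
g.add((EX.nodeA, RDF.type, EX.Sensor))
g.add((EX.nodeB, RDF.type, EX.Controller))
g.add((EX.nodeA, EX.sendsDataTo, EX.nodeB))
g.add((EX.nodeA, RDFS.label, Literal("temperature sensor")))

# Query the semantic network with SPARQL.
for row in g.query("""
    PREFIX ex: <http://example.org/system#>
    SELECT ?src ?dst WHERE { ?src ex:sendsDataTo ?dst . }
"""):
    print(row.src, "->", row.dst)

# Serialize the graph (returns a Turtle string in recent rdflib versions).
print(g.serialize(format="turtle"))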
Update on Genomic Databases and Resources at the National Center for Biotechnology Information.
Tatusova, Tatiana
2016-01-01
The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expression, gene variation, gene families, proteins, and protein domains are integrated with the analytical, search, and retrieval resources through the NCBI website; the text-based search and retrieval system provides a fast and easy way to navigate across diverse biological databases. Comparative genome analysis tools lead to further understanding of evolutionary processes, quickening the pace of discovery. Recent technological innovations have ignited an explosion in genome sequencing that has fundamentally changed our understanding of the biology of living organisms. This huge increase in DNA sequence data presents new challenges for the information management system and the visualization tools. New strategies have been designed to bring order to this genome sequence shockwave and improve the usability of the associated data.
NASA Astrophysics Data System (ADS)
Michel, L.; Motch, C.; Pineau, F. X.
2009-05-01
As a member of the Survey Science Consortium of the XMM-Newton mission, the Strasbourg Observatory is in charge of the real-time cross-correlation of X-ray data with archival catalogs. We are also committed to providing specific tools to handle these cross-correlations and propose identifications at other wavelengths. In order to do so, we developed a database generator (Saada) managing persistent links and supporting heterogeneous input datasets. This system makes it easy to build an archive containing numerous and complex links between individual items [1]. It also offers a powerful query engine able to select sources on the basis of the properties (existence, distance, colours) of the X-ray-archival associations. We present such a database in operation for the 2XMMi catalogue. This system is flexible enough to provide both a public data interface and a servicing interface which could be used in the framework of the Simbol-X ground segment.
Toward a Bio-Medical Thesaurus: Building the Foundation of the UMLS
Tuttle, Mark S.; Blois, Marsden S.; Erlbaum, Mark S.; Nelson, Stuart J.; Sherertz, David D.
1988-01-01
The Unified Medical Language System (UMLS) is being designed to provide a uniform user interface to heterogeneous machine-readable bio-medical information resources, such as bibliographic databases, genetic databases, expert systems and patient records.1 Such an interface will have to recognize different ways of saying the same thing, and provide links to ways of saying related things. One way to represent the necessary associations is via a domain thesaurus. As no such thesaurus exists, and because, once built, it will be both sizable and in need of continuous maintenance, its design should include a methodology for building and maintaining it. We propose a methodology, utilizing lexically expanded schema inversion, and a design, called T. Lex, which together form one approach to the problem of defining and building a bio-medical thesaurus. We argue that the semantic locality implicit in such a thesaurus will support model-based reasoning in bio-medicine.2
Tsou, Ann-Ping; Sun, Yi-Ming; Liu, Chia-Lin; Huang, Hsien-Da; Horng, Jorng-Tzong; Tsai, Meng-Feng; Liu, Baw-Juine
2006-07-01
Identification of transcriptional regulatory sites plays an important role in the investigation of gene regulation. For this purpose, we designed and implemented a data warehouse to integrate multiple heterogeneous biological data sources with data types such as text-file, XML, image, MySQL database model, and Oracle database model. The utility of the biological data warehouse in predicting transcriptional regulatory sites of coregulated genes was explored using a synexpression group derived from a microarray study. Both the binding sites of known transcription factors and predicted over-represented (OR) oligonucleotides were demonstrated for the gene group. The potential biological roles of both known nucleotides and one OR nucleotide were demonstrated using bioassays. Therefore, the results from the wet-lab experiments reinforce the power and utility of the data warehouse as an approach to the genome-wide search for important transcription regulatory elements that are the key to many complex biological systems.
Modeling and Databases for Teaching Petrology
NASA Astrophysics Data System (ADS)
Asher, P.; Dutrow, B.
2003-12-01
With the widespread availability of high-speed computers with massive storage and ready transport capability of large amounts of data, computational and petrologic modeling and the use of databases provide new tools with which to teach petrology. Modeling can be used to gain insights into a system, predict system behavior, describe a system's processes, compare with a natural system or simply to be illustrative. These aspects result from data-driven or empirical, analytical or numerical models or the concurrent examination of multiple lines of evidence. At the same time, use of models can enhance core foundations of the geosciences by improving critical thinking skills and by reinforcing prior knowledge gained. However, the use of modeling to teach petrology is dictated by the level of expectation we have for students and their facility with modeling approaches. For example, do we expect students to push buttons and navigate a program, understand the conceptual model, and/or evaluate the results of a model? Whatever the desired level of sophistication, specific elements of design should be incorporated into a modeling exercise for effective teaching. These include, but are not limited to: use of the scientific method, use of prior knowledge, a clear statement of purpose and goals, attainable goals, a connection to the natural/actual system, a demonstration that complex heterogeneous natural systems are amenable to analyses by these techniques and, ideally, connections to other disciplines and the larger earth system. Databases offer another avenue with which to explore petrology. Large datasets are available that allow integration of multiple lines of evidence to attack a petrologic problem or understand a petrologic process. These are collected into a database that offers a tool for exploring, organizing and analyzing the data. For example, datasets may be geochemical, mineralogic, experimental and/or visual in nature, covering global, regional to local scales. These datasets provide students with access to large amounts of related data through space and time. Goals of the database working group include educating earth scientists about information systems in general, about the importance of metadata, about ways of using databases and datasets as educational tools, and about the availability of existing datasets and databases. The modeling and databases groups hope to create additional petrologic teaching tools using these aspects and invite the community to contribute to the effort.
ARIADNE: a Tracking System for Relationships in LHCb Metadata
NASA Astrophysics Data System (ADS)
Shapoval, I.; Clemencic, M.; Cattaneo, M.
2014-06-01
The data processing model of the LHCb experiment implies handling of an evolving set of heterogeneous metadata entities and relationships between them. The entities range from software and database states to architecture specifications and software/data deployment locations. For instance, there is an important relationship between the LHCb Conditions Database (CondDB), which provides versioned, time-dependent geometry and conditions data, and the LHCb software, which comprises the data processing applications (used for simulation, high level triggering, reconstruction and analysis of physics data). The evolution of CondDB and of the LHCb applications is a weakly-homomorphic process. This means that relationships between a CondDB state and an LHCb application state may not be preserved across different database and application generations. These issues may lead to various kinds of problems in the LHCb production, ranging from unexpected application crashes to incorrect data processing results. In this paper we present Ariadne - a generic metadata relationships tracking system based on the novel NoSQL Neo4j graph database. Its aim is to track and analyze many thousands of evolving relationships for cases such as the one described above, and several others, which would otherwise remain unmanaged and potentially harmful. The highlights of the paper include the system's implementation and management details, infrastructure needed for running it, security issues, first experience of usage in the LHCb production and potential of the system to be applied to a wider set of LHCb tasks.
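As a rough illustration of how such metadata relationships can be stored and traversed in Neo4j, the following sketch uses the official Python driver; the node labels, properties, connection details and example values are invented and do not reflect Ariadne's actual schema:

```python
# Rough illustration only: storing and querying evolving metadata relationships
# in Neo4j with the official Python driver (pip install neo4j). Labels,
# properties, credentials and values are invented, not Ariadne's schema.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

with driver.session() as session:
    # Record that one application version is known to work with one CondDB tag.
    session.run(
        "MERGE (c:CondDBState {tag: $tag}) "
        "MERGE (a:Application {version: $version}) "
        "MERGE (a)-[:COMPATIBLE_WITH]->(c)",
        tag="cond-2014-06", version="recon-app v1.2",
    )
    # Later: which CondDB tags are recorded as compatible with this version?
    result = session.run(
        "MATCH (a:Application {version: $version})-[:COMPATIBLE_WITH]->(c:CondDBState) "
        "RETURN c.tag AS tag",
        version="recon-app v1.2",
    )
    print([record["tag"] for record in result])

driver.close()
```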
E-health and healthcare enterprise information system leveraging service-oriented architecture.
Hsieh, Sung-Huai; Hsieh, Sheau-Ling; Cheng, Po-Hsun; Lai, Feipei
2012-04-01
To present the successful experiences of an integrated, collaborative, distributed, large-scale enterprise healthcare information system over a wired and wireless infrastructure in National Taiwan University Hospital (NTUH). In order to smoothly and sequentially transfer from the complex relations among the old (legacy) systems to the new-generation enterprise healthcare information system, we adopted the multitier framework based on service-oriented architecture to integrate the heterogeneous systems as well as to interoperate among many other components and multiple databases. We also present mechanisms of a logical layer reusability approach and data (message) exchange flow via Health Level 7 (HL7) middleware, DICOM standard, and the Integrating the Healthcare Enterprise workflow. The architecture and protocols of the NTUH enterprise healthcare information system, especially in the Inpatient Information System (IIS), are discussed in detail. The NTUH Inpatient Healthcare Information System is designed and deployed on service-oriented architecture middleware frameworks. The mechanisms of integration as well as interoperability among the components and the multiple databases apply the HL7 standards for data exchanges, which are embedded in XML formats, and Microsoft .NET Web services to integrate heterogeneous platforms. The preliminary performance of the current operation IIS is evaluated and analyzed to verify the efficiency and effectiveness of the designed architecture; it shows reliability and robustness in the highly demanding traffic environment of NTUH. The newly developed NTUH IIS provides an open and flexible environment not only to share medical information easily among other branch hospitals, but also to reduce the cost of maintenance. The HL7 message standard is widely adopted to cover all data exchanges in the system. All services are independent modules that enable the system to be deployed and configured to the highest degree of flexibility. Furthermore, we can conclude that the multitier Inpatient Healthcare Information System has been designed successfully and in a collaborative manner, based on the index of performance evaluations, central processing unit, and memory utilizations.
A Hybrid Approach to Protect Palmprint Templates
Sun, Dongmei; Xiong, Ke; Qiu, Zhengding
2014-01-01
Biometric template protection is indispensable to protect personal privacy in large-scale deployment of biometric systems. Accuracy, changeability, and security are three critical requirements for template protection algorithms. However, existing template protection algorithms cannot satisfy all these requirements well. In this paper, we propose a hybrid approach that combines random projection and fuzzy vault to improve performance on these three points. A heterogeneous space is designed for combining random projection and fuzzy vault properly in the hybrid scheme. A new chaff point generation method is also proposed to enhance the security of the heterogeneous vault. Theoretical analyses of the proposed hybrid approach in terms of accuracy, changeability, and security are given in this paper. Experimental results on a palmprint database support the theoretical analyses and demonstrate the effectiveness of the proposed hybrid approach. PMID:24982977
A hybrid approach to protect palmprint templates.
Liu, Hailun; Sun, Dongmei; Xiong, Ke; Qiu, Zhengding
2014-01-01
Biometric template protection is indispensable to protect personal privacy in large-scale deployment of biometric systems. Accuracy, changeability, and security are three critical requirements for template protection algorithms. However, existing template protection algorithms cannot satisfy all these requirements well. In this paper, we propose a hybrid approach that combines random projection and fuzzy vault to improve performance on these three points. A heterogeneous space is designed for combining random projection and fuzzy vault properly in the hybrid scheme. A new chaff point generation method is also proposed to enhance the security of the heterogeneous vault. Theoretical analyses of the proposed hybrid approach in terms of accuracy, changeability, and security are given in this paper. Experimental results on a palmprint database support the theoretical analyses and demonstrate the effectiveness of the proposed hybrid approach.
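The random-projection half of such hybrid schemes can be illustrated in a few lines of numpy; the fuzzy-vault and chaff-point steps are omitted, and the dimensions and seeds below are arbitrary illustrative choices rather than the authors' parameters:

```python
# Minimal numpy sketch of the random-projection step used in hybrid template
# protection (fuzzy vault and chaff points omitted). Values are illustrative.
import numpy as np

def random_projection_matrix(out_dim, in_dim, seed):
    """User-specific projection; issuing a new seed yields a changeable template."""
    rng = np.random.default_rng(seed)
    # Gaussian random matrix, scaled to roughly preserve distances
    # (Johnson-Lindenstrauss style).
    return rng.standard_normal((out_dim, in_dim)) / np.sqrt(out_dim)

def protect(feature_vector, seed, out_dim=64):
    R = random_projection_matrix(out_dim, feature_vector.size, seed)
    return R @ feature_vector

palm_features = np.random.rand(256)          # stand-in for a palmprint feature vector
template_a = protect(palm_features, seed=1)  # enrolled template
template_b = protect(palm_features, seed=2)  # re-issued after compromise
print(template_a.shape, np.allclose(template_a, template_b))  # (64,) False
```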
NASA Astrophysics Data System (ADS)
Bhanumurthy, V.; Venugopala Rao, K.; Srinivasa Rao, S.; Ram Mohan Rao, K.; Chandra, P. Satya; Vidhyasagar, J.; Diwakar, P. G.; Dadhwal, V. K.
2014-11-01
Geographical Information Science (GIS) has now graduated from traditional desktop systems to Internet systems. Internet GIS is emerging as one of the most promising technologies for addressing Emergency Management. Web services with different privileges play an important role in disseminating emergency services to decision makers. A spatial database is one of the most important components in the successful implementation of Emergency Management. It contains spatial data in raster and vector form, linked with non-spatial information. Comprehensive data are required to handle emergency situations in different phases. These database elements comprise core data, hazard-specific data, corresponding attribute data, and live data coming from remote locations. Core data sets are the minimum required data, including base, thematic and infrastructure layers, to handle disasters. Disaster-specific information is required to handle a particular disaster situation such as flood, cyclone, forest fire, earthquake, landslide or drought. In addition, Emergency Management requires many types of data with spatial and temporal attributes that should be made available to the key players in the right format at the right time. The vector database needs to be complemented with satellite imagery of the required resolution for visualisation and analysis in disaster management. Therefore, the database is interconnected and comprehensive to meet the requirements of Emergency Management. This kind of integrated, comprehensive and structured database with appropriate information is required to deliver the right information at the right time to the right people. However, building a spatial database for Emergency Management is a challenging task because of key issues such as availability of data, sharing policies, compatible geospatial standards, data interoperability etc. Therefore, to facilitate using, sharing, and integrating the spatial data, there is a need to define standards to build emergency database systems. These include aspects such as i) data integration procedures, namely a standard coding scheme, schema, metadata format and spatial format; ii) a database organisation mechanism covering data management, catalogues and data models; and iii) database dissemination through a suitable environment, as a standard service for effective service dissemination. The National Database for Emergency Management (NDEM) is such a comprehensive database for addressing disasters in India at the national level. This paper explains standards for integrating and organising multi-scale and multi-source data for effective emergency response using customized user interfaces for NDEM. It presents a standard procedure for building comprehensive emergency information systems that enable emergency-specific functions through geospatial technologies.
Organization of Heterogeneous Scientific Data Using the EAV/CR Representation
Nadkarni, Prakash M.; Marenco, Luis; Chen, Roland; Skoufos, Emmanouil; Shepherd, Gordon; Miller, Perry
1999-01-01
Entity-attribute-value (EAV) representation is a means of organizing highly heterogeneous data using a relatively simple physical database schema. EAV representation is widely used in the medical domain, most notably in the storage of data related to clinical patient records. Its potential strengths suggest its use in other biomedical areas, in particular research databases whose schemas are complex as well as constantly changing to reflect evolving knowledge in rapidly advancing scientific domains. When deployed for such purposes, the basic EAV representation needs to be augmented significantly to handle the modeling of complex objects (classes) as well as to manage interobject relationships. The authors refer to their modification of the basic EAV paradigm as EAV/CR (EAV with classes and relationships). They describe EAV/CR representation with examples from two biomedical databases that use it. PMID:10579606
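The basic EAV layout (without the class and relationship extensions that EAV/CR adds) can be sketched with Python's built-in sqlite3 module; the table, entities and attributes below are invented for illustration:

```python
# Minimal sketch of a plain EAV table (the class/relationship machinery of
# EAV/CR is not shown), using Python's built-in sqlite3 module.
# Table, entity and attribute names are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE eav (
        entity    TEXT NOT NULL,   -- e.g. a patient or experiment identifier
        attribute TEXT NOT NULL,   -- attribute name, usually defined in metadata tables
        value     TEXT             -- value stored as text; typed variants are common
    )
""")

rows = [
    ("patient:17", "systolic_bp",      "128"),
    ("patient:17", "diagnosis",        "hypertension"),
    ("neuron:42",  "cell_type",        "mitral"),   # a very different kind of entity,
    ("neuron:42",  "soma_diameter_um", "20"),       # same physical schema
]
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)

# Reassemble one entity's sparse 'row' from its attribute-value pairs.
for attribute, value in conn.execute(
        "SELECT attribute, value FROM eav WHERE entity = ?", ("neuron:42",)):
    print(attribute, "=", value)
```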
Logical optimization for database uniformization
NASA Technical Reports Server (NTRS)
Grant, J.
1984-01-01
Data base uniformization refers to the building of a common user interface facility to support uniform access to any or all of a collection of distributed heterogeneous data bases. Such a system should enable a user, situated anywhere along a set of distributed data bases, to access all of the information in the data bases without having to learn the various data manipulation languages. Furthermore, such a system should leave intact the component data bases, and in particular, their already existing software. A survey of various aspects of the data base uniformization problem and a proposed solution are presented.
A scalable database model for multiparametric time series: a volcano observatory case study
NASA Astrophysics Data System (ADS)
Montalto, Placido; Aliotta, Marco; Cassisi, Carmelo; Prestifilippo, Michele; Cannata, Andrea
2014-05-01
The variables collected by a sensor network constitute a heterogeneous data source that needs to be properly organized in order to be used in research and geophysical monitoring. With the time series term we refer to a set of observations of a given phenomenon acquired sequentially in time. When the time intervals are equally spaced one speaks of period or sampling frequency. Our work describes in detail a possible methodology for storage and management of time series using a specific data structure. We designed a framework, hereinafter called TSDSystem (Time Series Database System), in order to acquire time series from different data sources and standardize them within a relational database. The operation of standardization provides the ability to perform operations, such as query and visualization, of many measures synchronizing them using a common time scale. The proposed architecture follows a multiple layer paradigm (Loaders layer, Database layer and Business Logic layer). Each layer is specialized in performing particular operations for the reorganization and archiving of data from different sources such as ASCII, Excel, ODBC (Open DataBase Connectivity), file accessible from the Internet (web pages, XML). In particular, the loader layer performs a security check of the working status of each running software through an heartbeat system, in order to automate the discovery of acquisition issues and other warning conditions. Although our system has to manage huge amounts of data, performance is guaranteed by using a smart partitioning table strategy, that keeps balanced the percentage of data stored in each database table. TSDSystem also contains modules for the visualization of acquired data, that provide the possibility to query different time series on a specified time range, or follow the realtime signal acquisition, according to a data access policy from the users.
A multidisciplinary database for geophysical time series management
NASA Astrophysics Data System (ADS)
Montalto, P.; Aliotta, M.; Cassisi, C.; Prestifilippo, M.; Cannata, A.
2013-12-01
The variables collected by a sensor network constitute a heterogeneous data source that needs to be properly organized in order to be used in research and geophysical monitoring. With the time series term we refer to a set of observations of a given phenomenon acquired sequentially in time. When the time intervals are equally spaced one speaks of period or sampling frequency. Our work describes in detail a possible methodology for storage and management of time series using a specific data structure. We designed a framework, hereinafter called TSDSystem (Time Series Database System), in order to acquire time series from different data sources and standardize them within a relational database. The operation of standardization provides the ability to perform operations, such as query and visualization, of many measures synchronizing them using a common time scale. The proposed architecture follows a multiple layer paradigm (Loaders layer, Database layer and Business Logic layer). Each layer is specialized in performing particular operations for the reorganization and archiving of data from different sources such as ASCII, Excel, ODBC (Open DataBase Connectivity), file accessible from the Internet (web pages, XML). In particular, the loader layer performs a security check of the working status of each running software through an heartbeat system, in order to automate the discovery of acquisition issues and other warning conditions. Although our system has to manage huge amounts of data, performance is guaranteed by using a smart partitioning table strategy, that keeps balanced the percentage of data stored in each database table. TSDSystem also contains modules for the visualization of acquired data, that provide the possibility to query different time series on a specified time range, or follow the realtime signal acquisition, according to a data access policy from the users.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harber, K.S.
1993-05-01
This report contains the following papers: Implications in vivid logic; a self-learning Bayesian expert system; a natural language generation system for a heterogeneous distributed database system; "competence-switching" managed by intelligent systems; strategy acquisition by an artificial neural network: Experiments in learning to play a stochastic game; viewpoints and selective inheritance in object-oriented modeling; multivariate discretization of continuous attributes for machine learning; utilization of the case-based reasoning method to resolve dynamic problems; formalization of an ontology of ceramic science in CLASSIC; linguistic tools for intelligent systems; an application of rough sets in knowledge synthesis; and a relational model for imprecise queries. These papers have been indexed separately.
Cai, Yi; Du, Jingcheng; Huang, Jing; Ellenberg, Susan S; Hennessy, Sean; Tao, Cui; Chen, Yong
2017-07-05
To identify safety signals by manual review of individual reports in large surveillance databases is time consuming; such an approach is very unlikely to reveal complex relationships between medications and adverse events. Since the late 1990s, efforts have been made to develop data mining tools to systematically and automatically search for safety signals in surveillance databases. Influenza vaccines present special challenges to safety surveillance because the vaccine changes every year in response to the influenza strains predicted to be prevalent that year. Therefore, it may be expected that reporting rates of adverse events following flu vaccines (number of reports for a specific vaccine-event combination/number of reports for all vaccine-event combinations) may vary substantially across reporting years. Current surveillance methods seldom consider these variations in signal detection, and reports from different years are typically collapsed together to conduct safety analyses. However, merging reports from different years ignores the potential heterogeneity of reporting rates across years and may miss important safety signals. Reports of adverse events between 1990 and 2013 were extracted from the Vaccine Adverse Event Reporting System (VAERS) database and formatted into a three-dimensional data array with types of vaccine, groups of adverse events and reporting time as the three dimensions. We propose a random effects model to test the heterogeneity of reporting rates for a given vaccine-event combination across reporting years. The proposed method provides a rigorous statistical procedure to detect differences of reporting rates among years. We also introduce a new visualization tool to summarize the result of the proposed method when applied to multiple vaccine-adverse event combinations. We applied the proposed method to detect safety signals of FLU3, an influenza vaccine containing three flu strains, in the VAERS database. We showed that it had high statistical power to detect the variation in reporting rates across years. The identified vaccine-event combinations with significantly different reporting rates over years suggested potential safety issues due to changes in vaccines, which require further investigation. We developed a statistical model to detect safety signals arising from heterogeneity of reporting rates of given vaccine-event combinations across reporting years. This method detects variation in reporting rates over years with high power. The temporal trend of reporting rates across years may reveal the impact of vaccine updates on the occurrence of adverse events and provide evidence for further investigations.
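As a much simpler stand-in for the authors' random effects model, the following sketch tests the homogeneity of yearly reporting proportions for one vaccine-event combination with a chi-square test; all counts are invented:

```python
# Simplified stand-in for the idea (NOT the authors' random-effects model):
# test whether the reporting rate of one vaccine-event combination is
# homogeneous across reporting years with a chi-square test. Counts invented.
import numpy as np
from scipy.stats import chi2_contingency

# Per-year counts: reports of the specific vaccine-event combination vs.
# all other reports in that year.
years         = [2009, 2010, 2011, 2012, 2013]
combo_reports = np.array([ 12,  30,   9,  11,  40])
other_reports = np.array([980, 950, 990, 985, 930])

table = np.vstack([combo_reports, other_reports])
chi2, p_value, dof, _ = chi2_contingency(table)

rates = combo_reports / (combo_reports + other_reports)
for y, r in zip(years, rates):
    print(f"{y}: reporting rate = {r:.3%}")
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p_value:.2g}")
# A small p-value flags heterogeneity across years, prompting closer review.
```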
Impact of IPAD on CAD/CAM database university research
NASA Technical Reports Server (NTRS)
Leach, L. M.; Wozny, M. J.
1984-01-01
The IPAD program has provided direction, focus and software products which impacted CAD/CAM data base research and follow-on research. The relationship of IPAD to research projects which involve the storage of geometric data in common data base facilities such as data base machines, the exchange of data between heterogeneous data bases, the development of IGES processors, the migration of large CAD/CAM data base management systems to noncompatible hosts, and the value of RIM as a research tool is described.
Brief Report: The Negev Hospital-University-Based (HUB) Autism Database
ERIC Educational Resources Information Center
Meiri, Gal; Dinstein, Ilan; Michaelowski, Analya; Flusser, Hagit; Ilan, Michal; Faroy, Michal; Bar-Sinai, Asif; Manelis, Liora; Stolowicz, Dana; Yosef, Lili Lea; Davidovitch, Nadav; Golan, Hava; Arbelle, Shosh; Menashe, Idan
2017-01-01
Elucidating the heterogeneous etiologies of autism will require investment in comprehensive longitudinal data acquisition from large community based cohorts. With this in mind, we have established a hospital-university-based (HUB) database of autism which incorporates prospective and retrospective data from a large and ethnically diverse…
Semi-Automated Annotation of Biobank Data Using Standard Medical Terminologies in a Graph Database.
Hofer, Philipp; Neururer, Sabrina; Goebel, Georg
2016-01-01
Data describing biobank resources frequently contains unstructured free-text information or insufficient coding standards. (Bio-) medical ontologies like Orphanet Rare Diseases Ontology (ORDO) or the Human Disease Ontology (DOID) provide a high number of concepts, synonyms and entity relationship properties. Such standard terminologies increase quality and granularity of input data by adding comprehensive semantic background knowledge from validated entity relationships. Moreover, cross-references between terminology concepts facilitate data integration across databases using different coding standards. In order to encourage the use of standard terminologies, our aim is to identify and link relevant concepts with free-text diagnosis inputs within a biobank registry. Relevant concepts are selected automatically by lexical matching and SPARQL queries against an RDF triplestore. To ensure correctness of annotations, proposed concepts have to be confirmed by medical data administration experts before they are entered into the registry database. Relevant (bio-) medical terminologies describing diseases and phenotypes were identified and stored in a graph database which was tied to a local biobank registry. Concept recommendations during data input trigger a structured description of medical data and facilitate data linkage between heterogeneous systems.
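The kind of lookup described, lexical matching of free-text diagnoses against ontology labels and synonyms via SPARQL, might look roughly like the following sketch; the endpoint URL is hypothetical and the query is illustrative only:

```python
# Hedged sketch: match a free-text diagnosis against ontology labels and
# synonyms in an RDF triplestore via SPARQL. The endpoint URL is hypothetical;
# SPARQLWrapper is assumed installed (pip install SPARQLWrapper).
from SPARQLWrapper import SPARQLWrapper, JSON

def candidate_concepts(free_text, endpoint="http://localhost:3030/terminologies/sparql"):
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
        SELECT DISTINCT ?concept ?label WHERE {{
            {{ ?concept rdfs:label ?label }}
            UNION
            {{ ?concept oboInOwl:hasExactSynonym ?label }}
            FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{free_text}")))
        }}
        LIMIT 20
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [(b["concept"]["value"], b["label"]["value"])
            for b in results["results"]["bindings"]]

# Proposed annotations would then be shown to a curator for confirmation,
# as in the workflow described above.
print(candidate_concepts("marfan"))
```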
Kobayashi, Norio; Ishii, Manabu; Takahashi, Satoshi; Mochizuki, Yoshiki; Matsushima, Akihiro; Toyoda, Tetsuro
2011-07-01
Global cloud frameworks for bioinformatics research databases become huge and heterogeneous; solutions face various diametric challenges comprising cross-integration, retrieval, security and openness. To address this, as of March 2011 organizations including RIKEN published 192 mammalian, plant and protein life sciences databases having 8.2 million data records, integrated as Linked Open or Private Data (LOD/LPD) using SciNetS.org, the Scientists' Networking System. The huge quantity of linked data this database integration framework covers is based on the Semantic Web, where researchers collaborate by managing metadata across public and private databases in a secured data space. This outstripped the data query capacity of existing interface tools like SPARQL. Actual research also requires specialized tools for data analysis using raw original data. To solve these challenges, in December 2009 we developed the lightweight Semantic-JSON interface to access each fragment of linked and raw life sciences data securely under the control of programming languages popularly used by bioinformaticians such as Perl and Ruby. Researchers successfully used the interface across 28 million semantic relationships for biological applications including genome design, sequence processing, inference over phenotype databases, full-text search indexing and human-readable contents like ontology and LOD tree viewers. Semantic-JSON services of SciNetS.org are provided at http://semanticjson.org.
Integrative Systems Biology for Data Driven Knowledge Discovery
Greene, Casey S.; Troyanskaya, Olga G.
2015-01-01
Integrative systems biology is an approach that brings together diverse high throughput experiments and databases to gain new insights into biological processes or systems at molecular through physiological levels. These approaches rely on diverse high-throughput experimental techniques that generate heterogeneous data by assaying varying aspects of complex biological processes. Computational approaches are necessary to provide an integrative view of these experimental results and enable data-driven knowledge discovery. Hypotheses generated from these approaches can direct definitive molecular experiments in a cost effective manner. Using integrative systems biology approaches, we can leverage existing biological knowledge and large-scale data to improve our understanding of yet unknown components of a system of interest and how its malfunction leads to disease. PMID:21044756
A broadband multimedia TeleLearning system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Ruiping; Karmouch, A.
1996-12-31
In this paper we discuss a broadband multimedia TeleLearning system under development in the Multimedia Information Research Laboratory at the University of Ottawa. The system aims at providing a seamless environment for TeleLearning using the latest telecommunication and multimedia information processing technology. It basically consists of a media production center, a courseware author site, a courseware database, a courseware user site, and an on-line facilitator site. All these components are distributed over an ATM network and work together to offer a multimedia interactive courseware service. An MHEG-based model is exploited in designing the system architecture to achieve real-time, interactive, and reusable information interchange through heterogeneous platforms. The system architecture, courseware processing strategies, and courseware document models are presented.
Turner, Rebecca M; Davey, Jonathan; Clarke, Mike J; Thompson, Simon G; Higgins, Julian PT
2012-01-01
Background Many meta-analyses contain only a small number of studies, which makes it difficult to estimate the extent of between-study heterogeneity. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, and offers advantages over conventional random-effects meta-analysis. To assist in this, we provide empirical evidence on the likely extent of heterogeneity in particular areas of health care. Methods Our analyses included 14 886 meta-analyses from the Cochrane Database of Systematic Reviews. We classified each meta-analysis according to the type of outcome, type of intervention comparison and medical specialty. By modelling the study data from all meta-analyses simultaneously, using the log odds ratio scale, we investigated the impact of meta-analysis characteristics on the underlying between-study heterogeneity variance. Predictive distributions were obtained for the heterogeneity expected in future meta-analyses. Results Between-study heterogeneity variances for meta-analyses in which the outcome was all-cause mortality were found to be on average 17% (95% CI 10–26) of variances for other outcomes. In meta-analyses comparing two active pharmacological interventions, heterogeneity was on average 75% (95% CI 58–95) of variances for non-pharmacological interventions. Meta-analysis size was found to have only a small effect on heterogeneity. Predictive distributions are presented for nine different settings, defined by type of outcome and type of intervention comparison. For example, for a planned meta-analysis comparing a pharmacological intervention against placebo or control with a subjectively measured outcome, the predictive distribution for heterogeneity is a log-normal(−2.13, 1.58^2) distribution, which has a median value of 0.12. In an example of meta-analysis of six studies, incorporating external evidence led to a smaller heterogeneity estimate and a narrower confidence interval for the combined intervention effect. Conclusions Meta-analysis characteristics were strongly associated with the degree of between-study heterogeneity, and predictive distributions for heterogeneity differed substantially across settings. The informative priors provided will be very beneficial in future meta-analyses including few studies. PMID:22461129
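The quoted predictive distribution can be checked numerically in a few lines; this is only an illustration of the reported numbers, not a reproduction of the authors' model:

```python
# Quick numerical check of the quoted predictive distribution for the
# between-study heterogeneity variance, log-normal(-2.13, 1.58^2).
import numpy as np

rng = np.random.default_rng(0)
tau_squared = rng.lognormal(mean=-2.13, sigma=1.58, size=1_000_000)

print(np.exp(-2.13))           # analytic median ~ 0.119, i.e. ~0.12 as quoted
print(np.median(tau_squared))  # simulated median, close to the same value
# Such draws can serve as an informative prior for the heterogeneity variance
# when a new meta-analysis in this setting contains only a few studies.
```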
NASA Astrophysics Data System (ADS)
Velazquez, Enrique Israel
Improvements in medical and genomic technologies have dramatically increased the production of electronic data over the last decade. As a result, data management is rapidly becoming a major determinant, and urgent challenge, for the development of Precision Medicine. Although successful data management is achievable using Relational Database Management Systems (RDBMS), exponential data growth is a significant contributor to failure scenarios. Growing amounts of data can also be observed in other sectors, such as economics and business, which, together with the previous facts, suggests that alternate database approaches (NoSQL) may soon be required for efficient storage and management of big databases. However, this hypothesis has been difficult to test in the Precision Medicine field since alternate database architectures are complex to assess and means to integrate heterogeneous electronic health records (EHR) with dynamic genomic data are not easily available. In this dissertation, we present a novel set of experiments for identifying NoSQL database approaches that enable effective data storage and management in Precision Medicine using patients' clinical and genomic information from The Cancer Genome Atlas (TCGA). The first experiment evaluates performance and scalability using biologically meaningful queries with differing complexity and database sizes. The second experiment measures performance and scalability in database updates without schema changes. The third experiment assesses performance and scalability in database updates with schema modifications due to dynamic data. We have identified two NoSQL approaches, based on Cassandra and Redis, which seem to be ideal database management systems for our precision medicine queries in terms of performance and scalability. We present NoSQL approaches and show how they can be used to manage clinical and genomic big data. Our research is relevant to public health since we are focusing on one of the main challenges to the development of Precision Medicine and, consequently, investigating a potential solution to the progressively increasing demands on health care.
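Purely as an illustration of the kind of NoSQL storage evaluated (not the dissertation's benchmark code), a merged clinical-genomic record might be stored as a Redis hash with the redis-py client; host, key and field names are invented:

```python
# Illustrative only: storing a merged clinical + genomic record as a Redis
# hash with redis-py (pip install redis). Host, key and fields are invented.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

r.hset("patient:TCGA-XX-0001", mapping={
    "diagnosis": "lung adenocarcinoma",
    "age_at_diagnosis": "61",
    "tp53_variant": "p.R273H",
    "tp53_expression_z": "-1.2",
})

record = r.hgetall("patient:TCGA-XX-0001")
print(record["diagnosis"], record["tp53_variant"])
```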
An adaptable architecture for patient cohort identification from diverse data sources.
Bache, Richard; Miles, Simon; Taweel, Adel
2013-12-01
We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. The architecture has the key feature that queries defined according to the query model are both pre- and post-processed, and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a database management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses, thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity.
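The "query model implemented once, lightweight adaptor per source" idea might be sketched as follows; all class, field and criterion names are invented and do not correspond to the authors' implementation:

```python
# Minimal sketch of the pattern described: eligibility criteria are expressed
# once against an abstract query model, and each data source only supplies a
# lightweight adaptor that maps its own records onto that model.
# All class, field and criterion names are invented.
from abc import ABC, abstractmethod
from datetime import date

class SourceAdaptor(ABC):
    """One adaptor per heterogeneous data source."""
    @abstractmethod
    def facts(self, patient_id):
        """Return normalized clinical facts as (code, value, date) tuples."""

class WarehouseAAdaptor(SourceAdaptor):
    def facts(self, patient_id):
        # In reality this would query the source's own schema and terminology.
        return [("dx:diabetes_type2", True, date(2012, 3, 1)),
                ("lab:hba1c", 8.1, date(2013, 5, 20))]

def eligible(facts, as_of):
    """Criterion (invented): type 2 diabetes AND HbA1c > 7.5 within the last year."""
    has_dx = any(c == "dx:diabetes_type2" and v for c, v, _ in facts)
    recent_hba1c = any(c == "lab:hba1c" and v > 7.5 and (as_of - d).days <= 365
                       for c, v, d in facts)
    return has_dx and recent_hba1c

adaptor = WarehouseAAdaptor()
print(eligible(adaptor.facts("p001"), as_of=date(2013, 12, 1)))  # True
```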
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
This report contains papers on the following topics: NREN Security Issues: Policies and Technologies; Layer Wars: Protect the Internet with Network Layer Security; Electronic Commission Management; Workflow 2000 - Electronic Document Authorization in Practice; Security Issues of a UNIX PEM Implementation; Implementing Privacy Enhanced Mail on VMS; Distributed Public Key Certificate Management; Protecting the Integrity of Privacy-enhanced Electronic Mail; Practical Authorization in Large Heterogeneous Distributed Systems; Security Issues in the Truffles File System; Issues surrounding the use of Cryptographic Algorithms and Smart Card Applications; Smart Card Augmentation of Kerberos; and An Overview of the Advanced Smart Card Access Control System. Selected papers were processed separately for inclusion in the Energy Science and Technology Database.
NASA Astrophysics Data System (ADS)
Meyer, Hanna; Authmann, Christian; Dreber, Niels; Hess, Bastian; Kellner, Klaus; Morgenthal, Theunis; Nauss, Thomas; Seeger, Bernhard; Tsvuura, Zivanai; Wiegand, Kerstin
2017-04-01
Bush encroachment is a syndrome of land degradation that occurs in many savannas including those of southern Africa. The increase in density, cover or biomass of woody vegetation often has negative effects on a range of ecosystem functions and services, which are hardly reversible. However, despite its importance, neither the causes of bush encroachment, nor the consequences of different resource management strategies to combat or mitigate related shifts in savanna states are fully understood. The project "IDESSA" (An Integrative Decision Support System for Sustainable Rangeland Management in Southern African Savannas) aims to improve the understanding of the complex interplays between land use, climate patterns and vegetation dynamics and to implement an integrative monitoring and decision-support system for the sustainable management of different savanna types. For this purpose, IDESSA follows an innovative approach that integrates local knowledge, botanical surveys, remote-sensing and machine-learning based time-series of atmospheric and land-cover dynamics, spatially explicit simulation modeling and analytical database management. The integration of the heterogeneous data will be implemented in a user oriented database infrastructure and scientific workflow system. Accessible via web-based interfaces, this database and analysis system will allow scientists to manage and analyze monitoring data and scenario computations, as well as allow stakeholders (e. g. land users, policy makers) to retrieve current ecosystem information and seasonal outlooks. We present the concept of the project and show preliminary results of the realization steps towards the integrative savanna management and decision-support system.
The customization of APACHE II for patients receiving orthotopic liver transplants
Moreno, Rui
2002-01-01
General outcome prediction models developed for use with large, multicenter databases of critically ill patients may not correctly estimate mortality if applied to a particular group of patients that was under-represented in the original database. The development of new diagnostic weights has been proposed as a method of adapting the general model – the Acute Physiology and Chronic Health Evaluation (APACHE) II in this case – to a new group of patients. Such customization must be empirically tested, because the original model cannot contain an appropriate set of predictive variables for the particular group. In this issue of Critical Care, Arabi and co-workers present the results of the validation of a modified model of the APACHE II system for patients receiving orthotopic liver transplants. The use of a highly heterogeneous database for which not all important variables were taken into account and of a sample too small to use the Hosmer–Lemeshow goodness-of-fit test appropriately makes their conclusions uncertain. PMID:12133174
Scientific information repository assisting reflectance spectrometry in legal medicine.
Belenki, Liudmila; Sterzik, Vera; Bohnert, Michael; Zimmermann, Klaus; Liehr, Andreas W
2012-06-01
Reflectance spectrometry is a fast and reliable method for the characterization of human skin if the spectra are analyzed with respect to a physical model describing the optical properties of human skin. For a field study performed at the Institute of Legal Medicine and the Freiburg Materials Research Center of the University of Freiburg, a scientific information repository has been developed, which is a variant of an electronic laboratory notebook and assists in the acquisition, management, and high-throughput analysis of reflectance spectra in heterogeneous research environments. At the core of the repository is a database management system hosting the master data. It is filled with primary data via a graphical user interface (GUI) programmed in Java, which also enables the user to browse the database and access the results of data analysis. The latter is carried out via Matlab, Python, and C programs, which retrieve the primary data from the scientific information repository, perform the analysis, and store the results in the database for further usage.
TR32DB - Management of Research Data in a Collaborative, Interdisciplinary Research Project
NASA Astrophysics Data System (ADS)
Curdt, Constanze; Hoffmeister, Dirk; Waldhoff, Guido; Lang, Ulrich; Bareth, Georg
2015-04-01
The management of research data in a well-structured and documented manner is essential in the context of collaborative, interdisciplinary research environments (e.g. across various institutions). Consequently, set-up and use of a research data management (RDM) system like a data repository or project database is necessary. These systems should accompany and support scientists during the entire research life cycle (e.g. data collection, documentation, storage, archiving, sharing, publishing) and operate cross-disciplinary in interdisciplinary research projects. Challenges and problems of RDM are well-know. Consequently, the set-up of a user-friendly, well-documented, sustainable RDM system is essential, as well as user support and further assistance. In the framework of the Transregio Collaborative Research Centre 32 'Patterns in Soil-Vegetation-Atmosphere Systems: Monitoring, Modelling, and Data Assimilation' (CRC/TR32), funded by the German Research Foundation (DFG), a RDM system was self-designed and implemented. The CRC/TR32 project database (TR32DB, www.tr32db.de) is operating online since early 2008. The TR32DB handles all data, which are created by the involved project participants from several institutions (e.g. Universities of Cologne, Bonn, Aachen, and the Research Centre Jülich) and research fields (e.g. soil and plant sciences, hydrology, geography, geophysics, meteorology, remote sensing). Very heterogeneous research data are considered, which are resulting from field measurement campaigns, meteorological monitoring, remote sensing, laboratory studies and modelling approaches. Furthermore, outcomes like publications, conference contributions, PhD reports and corresponding images are regarded. The TR32DB project database is set-up in cooperation with the Regional Computing Centre of the University of Cologne (RRZK) and also located in this hardware environment. The TR32DB system architecture is composed of three main components: (i) a file-based data storage including backup, (ii) a database-based storage for administrative data and metadata, and (iii) a web-interface for user access. The TR32DB offers common features of RDM systems. These include data storage, entry of corresponding metadata by a user-friendly input wizard, search and download of data depending on user permission, as well as secure internal exchange of data. In addition, a Digital Object Identifier (DOI) can be allocated for specific datasets and several web mapping components are supported (e.g. Web-GIS and map search). The centrepiece of the TR32DB is the self-provided and implemented CRC/TR32 specific metadata schema. This enables the documentation of all involved, heterogeneous data with accurate, interoperable metadata. The TR32DB Metadata Schema is set-up in a multi-level approach and supports several metadata standards and schemes (e.g. Dublin Core, ISO 19115, INSPIRE, DataCite). Furthermore, metadata properties with focus on the CRC/TR32 background (e.g. CRC/TR32 specific keywords) and the supported data types are complemented. Mandatory, optional and automatic metadata properties are specified. Overall, the TR32DB is designed and implemented according to the needs of the CRC/TR32 (e.g. huge amount of heterogeneous data) and demands of the DFG (e.g. cooperation with a computing centre). The application of a self-designed, project-specific, interoperable metadata schema enables the accurate documentation of all CRC/TR32 data. 
The implementation of the TR32DB in the hardware environment of the RRZK ensures access to the data after the end of the CRC/TR32 funding in 2018.
Niu, Heng; Yang, Jingyu; Yang, Kunxian; Huang, Yingze
2017-11-01
DNA promoter methylation can suppress gene expression and plays an important role in the biological functions of Ras association domain family 1A (RASSF1A). Many studies have been performed to elucidate the role of RASSF1A promoter methylation in thyroid carcinoma, but the results were conflicting and heterogeneous. Here, we analyzed data from public databases to determine the relationship between RASSF1A promoter methylation and thyroid carcinoma. We used data from 14 cancer-normal studies and the Gene Expression Omnibus (GEO) database to analyze RASSF1A promoter methylation in thyroid carcinoma susceptibility. Data from The Cancer Genome Atlas (TCGA) database were used to analyze the relationship between RASSF1A promoter methylation and thyroid carcinoma susceptibility, clinical characteristics, and prognosis. Odds ratios were estimated for thyroid carcinoma susceptibility and hazard ratios were estimated for thyroid carcinoma prognosis. The between-study heterogeneity of the meta-analysis was explored using the H and I2 statistics and meta-regression. We adopted quality criteria to classify the studies of the meta-analysis. Subgroup analyses were done for thyroid carcinoma susceptibility according to ethnicity, methods, and primers. The result of the meta-analysis indicated that RASSF1A promoter methylation is associated with higher susceptibility to thyroid carcinoma with small heterogeneity. Similarly, the result from the GEO database also showed a significant association between RASSF1A gene promoter methylation and thyroid carcinoma susceptibility. For the results of the TCGA database, we found that RASSF1A promoter methylation is associated with susceptibility and poor disease-free survival (DFS) of thyroid carcinoma. In addition, we also found a close association between RASSF1A promoter methylation and patient tumor stage and age, but not in patients of different genders. The methylation status of the RASSF1A promoter is strongly associated with thyroid carcinoma susceptibility and DFS. The RASSF1A promoter methylation test can be applied in the clinical diagnosis of thyroid carcinoma.
NASA Astrophysics Data System (ADS)
van Acken, D.; Luguet, A.; Pearson, D. G.; Nowell, G. M.; Fonseca, R. O. C.; Nagel, T. J.; Schulz, T.
2017-04-01
Highly siderophile element (HSE) concentration and 187Os/188Os isotopic heterogeneity has been observed on various scales in the Earth's mantle. Interaction of residual mantle peridotite with infiltrating melts has been suggested to overprint primary bulk rock HSE signatures originating from partial melting, contributing to the heterogeneity seen in the global peridotite database. Here we present a detailed study of harzburgitic xenolith 474527 from the Kangerlussuaq suite, West Greenland, coupling the Re-Os isotope geochemistry with petrography of both base metal sulfides (BMS) and silicates to assess the impact of overprint induced by melt-rock reaction on the Re-Os isotope system. Garnet harzburgite sample 474527 shows considerable heterogeneity in the composition of its major phases, most notably olivine and Cr-rich garnet, suggesting formation through multiple stages of partial melting and subsequent metasomatic events. The major BMS phases show a fairly homogeneous pentlandite-rich composition typical for BMS formed via metasomatic reaction, whereas the 187Os/188Os compositions determined for 17 of these BMS are extremely heterogeneous ranging between 0.1037 and 0.1981. Analyses by LA-ICP-MS reveal at least two populations of BMS grains characterized by contrasting HSE patterns. One type of pattern is strongly enriched in the more compatible HSE Os, Ir, and Ru over the typically incompatible Pt, Pd, and Re, while the other type shows moderate enrichment of the more incompatible HSE and has overall lower compatible HSE/incompatible HSE composition. The small-scale heterogeneity observed in these BMS highlights the need for caution when utilizing the Re-Os system to date mantle events, as even depleted harzburgite samples such as 474527 are likely to have experienced a complex history of metasomatic overprinting, with uncertain effects on the HSE.
NASA's Aviation Safety and Modeling Project
NASA Technical Reports Server (NTRS)
Chidester, Thomas R.; Statler, Irving C.
2006-01-01
The Aviation Safety Monitoring and Modeling (ASMM) Project of NASA's Aviation Safety program is cultivating sources of data and developing automated computer hardware and software to facilitate efficient, comprehensive, and accurate analyses of the data collected from large, heterogeneous databases throughout the national aviation system. The ASMM addresses the need to provide means for increasing safety by enabling the identification and correction of predisposing conditions that could lead to accidents or to incidents that pose aviation risks. A major component of the ASMM Project is the Aviation Performance Measuring System (APMS), which is developing the next generation of software tools for analyzing and interpreting flight data.
TheHiveDB image data management and analysis framework.
Muehlboeck, J-Sebastian; Westman, Eric; Simmons, Andrew
2014-01-06
The hive database system (theHiveDB) is a web-based brain imaging database, collaboration, and activity system which has been designed as an imaging workflow management system capable of handling cross-sectional and longitudinal multi-center studies. It can be used to organize and integrate existing data from heterogeneous projects as well as data from ongoing studies. It has been conceived to guide and assist the researcher throughout the entire research process, integrating all relevant types of data across modalities (e.g., brain imaging, clinical, and genetic data). TheHiveDB is a modern activity and resource management system capable of scheduling image processing on both private compute resources and the cloud. The activity component supports common image archival and management tasks as well as established pipeline processing (e.g., Freesurfer for extraction of scalar measures from magnetic resonance images). Furthermore, via theHiveDB activity system algorithm developers may grant access to virtual machines hosting versioned releases of their tools to collaborators and the imaging community. The application of theHiveDB is illustrated with a brief use case based on organizing, processing, and analyzing data from the publically available Alzheimer Disease Neuroimaging Initiative.
TheHiveDB image data management and analysis framework
Muehlboeck, J-Sebastian; Westman, Eric; Simmons, Andrew
2014-01-01
The hive database system (theHiveDB) is a web-based brain imaging database, collaboration, and activity system which has been designed as an imaging workflow management system capable of handling cross-sectional and longitudinal multi-center studies. It can be used to organize and integrate existing data from heterogeneous projects as well as data from ongoing studies. It has been conceived to guide and assist the researcher throughout the entire research process, integrating all relevant types of data across modalities (e.g., brain imaging, clinical, and genetic data). TheHiveDB is a modern activity and resource management system capable of scheduling image processing on both private compute resources and the cloud. The activity component supports common image archival and management tasks as well as established pipeline processing (e.g., Freesurfer for extraction of scalar measures from magnetic resonance images). Furthermore, via theHiveDB activity system algorithm developers may grant access to virtual machines hosting versioned releases of their tools to collaborators and the imaging community. The application of theHiveDB is illustrated with a brief use case based on organizing, processing, and analyzing data from the publically available Alzheimer Disease Neuroimaging Initiative. PMID:24432000
Tuberculosis treatment outcome monitoring in European Union countries: systematic review
van Hest, Rob; Ködmön, Csaba; Verver, Suzanne; Erkens, Connie G.M.; Straetemans, Masja; Manissero, Davide; de Vries, Gerard
2013-01-01
Treatment success measured by treatment outcome monitoring (TOM) is a key programmatic output of tuberculosis (TB) control programmes. We performed a systematic literature review on national-level TOM in the 30 European Union (EU)/European Economic Areas (EEA) countries to summarise methods used to collect and report data on TOM. Online reference bibliographic databases PubMed/MEDLINE and EMBASE were searched to identify relevant indexed and non-indexed literature published between January 2000 and August 2010. The search strategy resulted in 615 potentially relevant indexed citations, of which 27 full-text national studies (79 data sets) were included for final analysis. The selected studies were performed in 10 EU/EEA countries and gave a fragmented impression of TOM in the EU/EEA. Publication year, study period, sample size, databases, definitions, variables, patient and outcome categories, and population subgroups varied widely, portraying a very heterogeneous picture. This review confirmed previous reports of considerable heterogeneity in publications of TOM results across EU/EEA countries. PubMed/MEDLINE and EMBASE indexed studies are not a suitable instrument to measure representative TOM results for the 30 EU/EEA countries. Uniform and complete reporting to the centralised European Surveillance System will produce the most timely and reliable results of TB treatment outcomes in the EU/EEA. PMID:22790913
Shea, S; Sengupta, S; Crosswell, A; Clayton, P D
1992-01-01
The developing Integrated Academic Information System (IAIMS) at Columbia-Presbyterian Medical Center provides data sharing links between two separate corporate entities, namely Columbia University Medical School and The Presbyterian Hospital, using a network-based architecture. Multiple database servers with heterogeneous user authentication protocols are linked to this network. "One-stop information shopping" implies one log-on procedure per session, not separate log-on and log-off procedures for each server or application used during a session. These circumstances create policy-level and technical challenges for securing data at the network level and for ensuring smooth information access for end users of these network-based services. Five activities being conducted as part of our security project are described: (1) policy development; (2) an authentication server for the network; (3) Kerberos as a tool for providing mutual authentication, encryption, and time stamping of authentication messages; (4) a prototype interface using Kerberos services to authenticate users accessing a network database server; and (5) a Kerberized electronic signature.
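The value of mutually authenticated, time-stamped tickets, as Kerberos provides, can be illustrated with a minimal sketch; this is not the Kerberos protocol itself nor the IAIMS implementation, and the shared key and field names are invented:

    import hmac, hashlib, json, time

    SHARED_KEY = b"auth-server-and-db-server-secret"   # hypothetical pre-shared key

    def issue_ticket(user, service, key=SHARED_KEY, lifetime=300):
        """Authentication server: sign a time-stamped ticket for a user/service pair."""
        ticket = {"user": user, "service": service,
                  "issued": time.time(), "lifetime": lifetime}
        payload = json.dumps(ticket, sort_keys=True).encode()
        mac = hmac.new(key, payload, hashlib.sha256).hexdigest()
        return payload, mac

    def verify_ticket(payload, mac, key=SHARED_KEY):
        """Database server: check the signature and the expiry before granting access."""
        expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(mac, expected):
            return False
        ticket = json.loads(payload)
        return time.time() < ticket["issued"] + ticket["lifetime"]

    payload, mac = issue_ticket("clinician01", "clinical-db")
    print(verify_ticket(payload, mac))   # True while the ticket is still fresh

A single log-on then amounts to obtaining one such ticket and presenting it to every participating server for the rest of the session.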
NASA Astrophysics Data System (ADS)
Taira, Ricky K.; Wong, Clement; Johnson, David; Bhushan, Vikas; Rivera, Monica; Huang, Lu J.; Aberle, Denise R.; Cardenas, Alfonso F.; Chu, Wesley W.
1995-05-01
With the increase in the volume and distribution of images and text available in PACS and medical electronic health-care environments, it becomes increasingly important to maintain indexes that summarize the content of these multi-media documents. Such indices are necessary to quickly locate relevant patient cases for research, patient management, and teaching. The goal of this project is to develop an intelligent document retrieval system that allows researchers to request patient cases based on document content. Thus we wish to retrieve patient cases from electronic information archives using requests that may combine patient demographics, low-level radiologic findings (size, shape, number), intermediate-level radiologic findings (e.g., atelectasis, infiltrates, etc.) and/or high-level pathology constraints (e.g., well-differentiated small cell carcinoma). The cases could be distributed among multiple heterogeneous databases such as PACS, RIS, and HIS. Content-based retrieval systems go beyond the capabilities of simple key-word or string-based retrieval matching systems. These systems require a knowledge base to comprehend the generality/specificity of a concept (thus knowing the subclasses or related concepts of a given concept) and knowledge of the various string representations for each concept (i.e., synonyms, lexical variants, etc.). We have previously reported on a data integration mediation layer that allows transparent access to multiple heterogeneous distributed medical databases (HIS, RIS, and PACS). The data access layer of our architecture currently has limited query processing capabilities. Given a patient hospital identification number, the access mediation layer collects all documents in RIS and HIS and returns this information to a specified workstation location. In this paper we report on our efforts to extend the query processing capabilities of the system through the creation of custom query interfaces, an intelligent query processing engine, and a document-content index that can be generated automatically (i.e., with no manual authoring or changes to the normal clinical protocols).
Groundwater modeling in integrated water resources management--visions for 2020.
Refsgaard, Jens Christian; Højberg, Anker Lajer; Møller, Ingelise; Hansen, Martin; Søndergaard, Verner
2010-01-01
Groundwater modeling is undergoing a change from traditional stand-alone studies toward being an integrated part of holistic water resources management procedures. This is illustrated by the development in Denmark, where comprehensive national databases for geologic borehole data, groundwater-related geophysical data, geologic models, as well as a national groundwater-surface water model have been established and integrated to support water management. This has enhanced the benefits of using groundwater models. Based on insight gained from this Danish experience, a scientifically realistic scenario for the use of groundwater modeling in 2020 has been developed, in which groundwater models will be a part of sophisticated databases and modeling systems. The databases and numerical models will be seamlessly integrated, and the tasks of monitoring and modeling will be merged. Numerical models for atmospheric, surface water, and groundwater processes will be coupled in one integrated modeling system that can operate at a wide range of spatial scales. Furthermore, the management systems will be constructed with a focus on building credibility of model and data use among all stakeholders and on facilitating a learning process whereby data and models, as well as stakeholders' understanding of the system, are updated to currently available information. The key scientific challenges for achieving this are (1) developing new methodologies for integration of statistical and qualitative uncertainty; (2) mapping geological heterogeneity and developing scaling methodologies; (3) developing coupled model codes; and (4) developing integrated information systems, including quality assurance and uncertainty information that facilitate active stakeholder involvement and learning.
In Silico Approaches and the Role of Ontologies in Aging Research
Boerries, Melanie; Busch, Hauke; de Grey, Aubrey; Hahn, Udo; Hiller, Thomas; Hoeflich, Andreas; Jansen, Ludger; Janssens, Georges E.; Kaleta, Christoph; Meinema, Anne C.; Schäuble, Sascha; Simm, Andreas; Schofield, Paul N.; Smith, Barry; Sühnel, Juergen; Vera, Julio; Wagner, Wolfgang; Wönne, Eva C.; Wuttke, Daniel
2013-01-01
The 2013 Rostock Symposium on Systems Biology and Bioinformatics in Aging Research was again dedicated to dissecting the aging process using in silico means. A particular focus was on ontologies, because these are a key technology to systematically integrate heterogeneous information about the aging process. Related topics were databases and data integration. Other talks tackled modeling issues and applications, the latter including talks focused on marker development and cellular stress as well as on diseases, in particular on diseases of kidney and skin. PMID:24188080
Kobayashi, Norio; Ishii, Manabu; Takahashi, Satoshi; Mochizuki, Yoshiki; Matsushima, Akihiro; Toyoda, Tetsuro
2011-01-01
As global cloud frameworks for bioinformatics research databases become huge and heterogeneous, solutions face various conflicting challenges comprising cross-integration, retrieval, security and openness. To address this, as of March 2011 organizations including RIKEN published 192 mammalian, plant and protein life sciences databases holding 8.2 million data records, integrated as Linked Open or Private Data (LOD/LPD) using SciNetS.org, the Scientists' Networking System. The huge quantity of linked data this database integration framework covers is based on the Semantic Web, where researchers collaborate by managing metadata across public and private databases in a secured data space. This outstripped the data query capacity of existing interface tools like SPARQL. Actual research also requires specialized tools for data analysis using raw original data. To solve these challenges, in December 2009 we developed the lightweight Semantic-JSON interface to access each fragment of linked and raw life sciences data securely under the control of programming languages popularly used by bioinformaticians, such as Perl and Ruby. Researchers successfully used the interface across 28 million semantic relationships for biological applications including genome design, sequence processing, inference over phenotype databases, full-text search indexing and human-readable contents like ontology and LOD tree viewers. Semantic-JSON services of SciNetS.org are provided at http://semanticjson.org. PMID:21632604
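The idea of retrieving individual fragments of linked data as JSON from within a scripting language can be sketched as follows; the endpoint path, resource identifiers and response structure are invented for illustration and do not reflect the actual semanticjson.org API:

    import json
    from urllib.request import urlopen

    BASE = "http://semanticjson.org/api"   # hypothetical endpoint layout

    def fetch(resource_id):
        """Retrieve one fragment of linked data as plain JSON."""
        with urlopen(f"{BASE}/{resource_id}") as resp:
            return json.load(resp)

    def follow(resource_id, predicate, depth=2):
        """Walk subject -> predicate -> object links to a fixed depth."""
        if depth == 0:
            return []
        neighbours = fetch(resource_id).get(predicate, [])
        found = list(neighbours)
        for n in neighbours:
            found += follow(n, predicate, depth - 1)
        return found

    # e.g. collect everything two hops away from a (hypothetical) gene record:
    # print(follow("gene:Q9XYZ1", "interacts_with"))

Because the response is ordinary JSON, the same pattern works equally well from Perl or Ruby.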
Managing hydrological measurements for small and intermediate projects: RObsDat
NASA Astrophysics Data System (ADS)
Reusser, Dominik E.
2014-05-01
Hydrological measurements need good management for the data not to be lost. Multiple, often overlapping files from various loggers with heterogeneous formats need to be merged. Data need to be validated and cleaned and subsequently converted to the format required by the hydrological target application. Preferably, all these steps should be easily traceable. RObsDat is an R package designed to support such data management. It comes with a command line user interface that helps hydrologists enter and adjust their data in a database following the Observations Data Model (ODM) standard by CUAHSI. RObsDat helps in the setup of the database within one of the free database engines MySQL, PostgreSQL or SQLite. It imports the controlled water vocabulary from the CUAHSI web service and provides a smart interface between the hydrologist and the database: already existing data entries are detected and duplicates avoided. The data import function converts different data table designs to make import simple. Cleaning and modification of data are handled with a simple version control system. Variable and location names are treated in a user-friendly way, accepting and processing multiple versions. A new development is the use of spacetime objects for subsequent processing.
Steyaert, Louis T.; Loveland, Thomas R.; Brown, Jesslyn F.; Reed, Bradley C.
1993-01-01
Environmental modelers are testing and evaluating a prototype land cover characteristics database for the conterminous United States developed by the EROS Data Center of the U.S. Geological Survey and the University of Nebraska Center for Advanced Land Management Information Technologies. This database was developed from multitemporal, 1-kilometer advanced very high resolution radiometer (AVHRR) data for 1990 and various ancillary data sets such as elevation, ecological regions, and selected climatic normals. Several case studies using this database were analyzed to illustrate the integration of satellite remote sensing and geographic information systems technologies with land-atmosphere interactions models at a variety of spatial and temporal scales. The case studies are representative of contemporary environmental simulation modeling at local to regional levels in global change research, land and water resource management, and environmental risk assessment. The case studies feature land surface parameterizations for atmospheric mesoscale and global climate models; biogenic-hydrocarbon emissions models; distributed parameter watershed and other hydrological models; and various ecological models such as ecosystem dynamics, biogeochemical cycles, ecotone variability, and equilibrium vegetation models. The case studies demonstrate the importance of multitemporal AVHRR data for developing and maintaining a flexible, near-realtime land cover characteristics database. Moreover, such a flexible database is needed to derive various vegetation classification schemes, to aggregate data for nested models, to develop remote sensing algorithms, and to provide data on dynamic landscape characteristics. The case studies illustrate how such a database supports research on spatial heterogeneity, land use, sensitivity analysis, and scaling issues involving regional extrapolations and parameterizations of dynamic land processes within simulation models.
NASA Astrophysics Data System (ADS)
Barette, Florian; Poppe, Sam; Smets, Benoît; Benbakkar, Mhammed; Kervyn, Matthieu
2017-10-01
We present an integrated, spatially-explicit database of existing geochemical major-element analyses available from (post-) colonial scientific reports, PhD Theses and international publications for the Virunga Volcanic Province, located in the western branch of the East African Rift System. This volcanic province is characterised by alkaline volcanism, including silica-undersaturated, alkaline and potassic lavas. The database contains a total of 908 geochemical analyses of eruptive rocks for the entire volcanic province with a localisation for most samples. A preliminary analysis of the overall consistency of the database, using statistical techniques on sets of geochemical analyses with contrasted analytical methods or dates, demonstrates that the database is consistent. We applied a principal component analysis and cluster analysis on whole-rock major element compositions included in the database to study the spatial variation of the chemical composition of eruptive products in the Virunga Volcanic Province. These statistical analyses identify spatially distributed clusters of eruptive products. The known geochemical contrasts are highlighted by the spatial analysis, such as the unique geochemical signature of Nyiragongo lavas compared to other Virunga lavas, the geochemical heterogeneity of the Bulengo area, and the trachyte flows of Karisimbi volcano. Most importantly, we identified separate clusters of eruptive products which originate from primitive magmatic sources. These lavas of primitive composition are preferentially located along NE-SW inherited rift structures, often at distance from the central Virunga volcanoes. Our results illustrate the relevance of a spatial analysis on integrated geochemical data for a volcanic province, as a complement to classical petrological investigations. This approach indeed helps to characterise geochemical variations within a complex of magmatic systems and to identify specific petrologic and geochemical investigations that should be tackled within a study area.
NASA Astrophysics Data System (ADS)
Hsu, L.; Lehnert, K. A.; Carbotte, S. M.; Arko, R. A.; Ferrini, V.; O'hara, S. H.; Walker, J. D.
2012-12-01
The Integrated Earth Data Applications (IEDA) facility maintains multiple data systems with a wide range of solid earth data types from the marine, terrestrial, and polar environments. Examples of the different data types include syntheses of ultra-high resolution seafloor bathymetry collected on large collaborative cruises and analytical geochemistry measurements collected by single investigators in small, unique projects. These different data types have historically been channeled into separate, discipline-specific databases with search and retrieval tailored for the specific data type. However, a current major goal is to integrate data from different systems to allow interdisciplinary data discovery and scientific analysis. To increase discovery and access across these heterogeneous systems, IEDA employs several unique IDs, including sample IDs (International Geo Sample Number, IGSN), person IDs (GeoPass ID), funding award IDs (NSF Award Number), cruise IDs (from the Marine Geoscience Data System Expedition Metadata Catalog), dataset IDs (DOIs), and publication IDs (DOIs). These IDs allow linking of a sample registry (System for Earth SAmple Registration), data libraries and repositories (e.g. Geochemical Research Library, Marine Geoscience Data System), integrated synthesis databases (e.g. EarthChem Portal, PetDB), and investigator services (IEDA Data Compliance Tool). The linked systems allow efficient discovery of related data across different levels of granularity. In addition, IEDA data systems maintain links with several external data systems, including digital journal publishers. Links have been established between the EarthChem Portal and ScienceDirect through publication DOIs, returning sample-level objects and geochemical analyses for a particular publication. Linking IEDA-hosted data to digital publications with IGSNs at the sample level and with IEDA-allocated dataset DOIs are under development. As an example, an individual investigator could sign up for a GeoPass account ID, write a proposal to NSF and create a data plan using the IEDA Data Management Plan Tool. Having received the grant, the investigator then collects rock samples on a scientific cruise from dredges and registers the samples with IGSNs. The investigator then performs analytical geochemistry on the samples, and submits the full dataset to the Geochemical Resource Library for a dataset DOI. Finally, the investigator writes an article that is published in Science Direct. Knowing any of the following IDs: Investigator GeoPass ID, NSF Award Number, Cruise ID, Sample IGSNs, dataset DOI, or publication DOI, a user would be able to navigate to all samples, datasets, and publications in IEDA and external systems. Use of persistent identifiers to link heterogeneous data systems in IEDA thus increases access, discovery, and proper citation of hard-earned investigator datasets.
An adaptable architecture for patient cohort identification from diverse data sources
Bache, Richard; Miles, Simon; Taweel, Adel
2013-01-01
Objective We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. Method The architecture has the key feature that queries defined according to the query model are both pre- and post-processed, and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. Results We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Discussion Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a database management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. Conclusions The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses, thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity. PMID:24064442
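The separation between a representation-independent query model and lightweight per-source adaptors can be sketched roughly as below; the table layout, field names and criterion are invented, so this is only an illustration of the pattern, not the authors' implementation:

    import sqlite3
    from abc import ABC, abstractmethod
    from datetime import date

    class EligibilityQuery:
        """Representation-independent criterion: minimum age plus a diagnosis
        code that must have been recorded before a cut-off date."""
        def __init__(self, min_age, diagnosis, before):
            self.min_age, self.diagnosis, self.before = min_age, diagnosis, before

    class SourceAdapter(ABC):
        @abstractmethod
        def count_matching(self, query: EligibilityQuery) -> int: ...

    class SqlWarehouseAdapter(SourceAdapter):
        """Pre-processing step: translate the abstract query into this source's own schema."""
        def __init__(self, conn):
            self.conn = conn
        def count_matching(self, query):
            sql = ("SELECT COUNT(DISTINCT patient_id) FROM diagnoses "
                   "WHERE code = ? AND recorded < ? AND age >= ?")
            return self.conn.execute(sql, (query.diagnosis, query.before.isoformat(),
                                           query.min_age)).fetchone()[0]

    def cohort_counts(query, adapters):
        """Post-processing step: aggregate per-source counts into a single answer."""
        return {name: a.count_matching(query) for name, a in adapters.items()}

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE diagnoses (patient_id, code, recorded, age)")
    conn.execute("INSERT INTO diagnoses VALUES (1, 'E11', '2012-05-01', 67)")
    q = EligibilityQuery(min_age=50, diagnosis="E11", before=date(2013, 1, 1))
    print(cohort_counts(q, {"warehouse_a": SqlWarehouseAdapter(conn)}))   # {'warehouse_a': 1}

Adding a further source then only means writing another small adapter class; the query model itself is untouched.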
Honoré, Paul; Granjeaud, Samuel; Tagett, Rebecca; Deraco, Stéphane; Beaudoing, Emmanuel; Rougemont, Jacques; Debono, Stéphane; Hingamp, Pascal
2006-01-01
Background High throughput gene expression profiling (GEP) is becoming a routine technique in life science laboratories. With experimental designs that repeatedly span thousands of genes and hundreds of samples, relying on a dedicated database infrastructure is no longer an option. GEP technology is a fast moving target, with new approaches constantly broadening the field diversity. This technology heterogeneity, compounded by the informatics complexity of GEP databases, means that software developments have so far focused on mainstream techniques, leaving less typical yet established techniques such as Nylon microarrays at best partially supported. Results MAF (MicroArray Facility) is the laboratory database system we have developed for managing the design, production and hybridization of spotted microarrays. Although it can support the widely used glass microarrays and oligo-chips, MAF was designed with the specific idiosyncrasies of Nylon based microarrays in mind. Notably single channel radioactive probes, microarray stripping and reuse, vector control hybridizations and spike-in controls are all natively supported by the software suite. MicroArray Facility is MIAME supportive and dynamically provides feedback on missing annotations to help users estimate effective MIAME compliance. Genomic data such as clone identifiers and gene symbols are also directly annotated by MAF software using standard public resources. The MAGE-ML data format is implemented for full data export. Journalized database operations (audit tracking), data anonymization, material traceability and user/project level confidentiality policies are also managed by MAF. Conclusion MicroArray Facility is a complete data management system for microarray producers and end-users. Particular care has been devoted to adequately model Nylon based microarrays. The MAF system, developed and implemented in both private and academic environments, has proved a robust solution for shared facilities and industry service providers alike. PMID:16987406
Experimental validation of a new heterogeneous mechanical test design
NASA Astrophysics Data System (ADS)
Aquino, J.; Campos, A. Andrade; Souto, N.; Thuillier, S.
2018-05-01
Standard material parameter identification strategies generally use an extensive number of classical tests for collecting the required experimental data. However, a great effort has been made recently by the scientific and industrial communities to base this experimental database on heterogeneous tests. These tests can provide richer information on the material behavior, allowing the identification of a more complete set of material parameters. This is a result of the recent development of full-field measurement techniques, like digital image correlation (DIC), that can capture the heterogeneous deformation fields on the specimen surface during the test. Recently, new specimen geometries were designed to enhance the richness of the strain field and capture supplementary strain states. The butterfly specimen is an example of these new geometries, designed through a numerical optimization procedure driven by an indicator capable of evaluating the heterogeneity and the richness of the strain information. However, no experimental validation had yet been performed. The aim of this work is to experimentally validate the heterogeneous butterfly mechanical test in the parameter identification framework. To this end, the DIC technique and a Finite Element Model Updating inverse strategy are used together for the parameter identification of a DC04 steel, as well as for the calculation of the indicator. The experimental tests are carried out in a universal testing machine with the ARAMIS measuring system to provide the strain states on the specimen surface. The identification strategy is accomplished with the data obtained from the experimental tests, and the results are compared to a reference numerical solution.
A Collaborative Reasoning Maintenance System for a Reliable Application of Legislations
NASA Astrophysics Data System (ADS)
Tamisier, Thomas; Didry, Yoann; Parisot, Olivier; Feltz, Fernand
Decision support systems are nowadays used to disentangle all kinds of intricate situations and to perform sophisticated analyses. Moreover, they are applied in areas where the knowledge can be heterogeneous, partially un-formalized, implicit, or diffuse. The representation and management of this knowledge become the key point in ensuring the proper functioning of the system and keeping an intuitive view of its expected behavior. This paper presents a generic architecture for implementing knowledge-based systems used in collaborative business, where the knowledge is organized into different databases according to the usage, persistence and quality of the information. This approach is illustrated with Cadral, a customizable automated tool built on this architecture and used for processing family benefits applications at the National Family Benefits Fund of the Grand-Duchy of Luxembourg.
NASA Technical Reports Server (NTRS)
Donnellan, Andrea; Parker, Jay W.; Lyzenga, Gregory A.; Granat, Robert A.; Norton, Charles D.; Rundle, John B.; Pierce, Marlon E.; Fox, Geoffrey C.; McLeod, Dennis; Ludwig, Lisa Grant
2012-01-01
QuakeSim 2.0 improves understanding of earthquake processes by providing modeling tools and integrating model applications and various heterogeneous data sources within a Web services environment. QuakeSim is a multisource, synergistic, data-intensive environment for modeling the behavior of earthquake faults individually, and as part of complex interacting systems. Remotely sensed geodetic data products may be explored, compared with faults and landscape features, mined by pattern analysis applications, and integrated with models and pattern analysis applications in a rich Web-based and visualization environment. Integration of heterogeneous data products with pattern informatics tools enables efficient development of models. Federated database components and visualization tools allow rapid exploration of large datasets, while pattern informatics enables identification of subtle, but important, features in large data sets. QuakeSim is valuable for earthquake investigations and modeling in its current state, and also serves as a prototype and nucleus for broader systems under development. The framework provides access to physics-based simulation tools that model the earthquake cycle and related crustal deformation. Spaceborne GPS and Interferometric Synthetic Aperture Radar (InSAR) data provide information on near-term crustal deformation, while paleoseismic geologic data provide longer-term information on earthquake fault processes. These data sources are integrated into QuakeSim's QuakeTables database system, and are accessible by users or various model applications. UAVSAR repeat pass interferometry data products are added to the QuakeTables database, and are available through a browseable map interface or Representational State Transfer (REST) interfaces. Model applications can retrieve data from QuakeTables, or from third-party GPS velocity data services; alternatively, users can manually input parameters into the models. Pattern analysis of GPS and seismicity data has proved useful for mid-term forecasting of earthquakes, and for detecting subtle changes in crustal deformation. The GPS time series analysis has also proved useful as a data-quality tool, enabling the discovery of station anomalies and data processing and distribution errors. Improved visualization tools enable more efficient data exploration and understanding. Tools provide flexibility to science users for exploring data in new ways through download links, but also facilitate standard, intuitive, and routine uses for science users and end users such as emergency responders.
NASA Astrophysics Data System (ADS)
Liu, G.; Wu, C.; Li, X.; Song, P.
2013-12-01
The 3D urban geological information system has been a major part of the national urban geological survey project of the China Geological Survey in recent years. Large amounts of multi-source and multi-subject data are to be stored in urban geological databases. Various data models and vocabularies have been drafted and applied by industrial companies for urban geological data. Issues such as duplicate and ambiguous term definitions and differing coding structures increase the difficulty of information sharing and data integration. To solve this problem, we proposed a national standard-driven information classification and coding method to effectively store and integrate urban geological data, and we applied data dictionary technology to achieve structured and standardized data storage. The overall purpose of this work is to set up a common data platform that provides information sharing services. Research progress is as follows: (1) A unified classification and coding method for multi-source data based on national standards. The underlying national standards include GB 9649-88 for geology and GB/T 13923-2006 for geography. Current industrial models are compared with the national standards to build a mapping table. The attributes of the various urban geological data entity models are reduced to several categories according to their application phases and domains. A logical data model is then set up as a standard format to design data file structures for a relational database. (2) A multi-level data dictionary for enforcing data standardization. Three levels of data dictionary are designed: the model data dictionary manages system database files and eases maintenance of the whole database system; the attribute dictionary organizes the fields used in database tables; the term and code dictionary provides a standard for the urban information system by adopting appropriate classification and coding methods; and a comprehensive data dictionary manages system operation and security. (3) An extension of the system's data management functions based on the data dictionary. The data item constraint input function uses the standard term and code dictionary to obtain standardized input. The attribute dictionary organizes all the fields of an urban geological information database to ensure consistent use of terms for fields. The model dictionary is used to generate a database operation interface automatically, with standard semantic content supplied via the term and code dictionary. The above method and technology have been applied to the construction of the Fuzhou Urban Geological Information System in South-East China with satisfactory results.
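The way a term-and-code dictionary constrains data entry can be sketched as follows; the field names and codes are invented for illustration and are not those of the Fuzhou system or the GB standards:

    # Minimal sketch of a term-and-code dictionary used to constrain data entry.
    TERM_DICTIONARY = {
        "lithology": {"01": "clay", "02": "silt", "03": "sand", "04": "gravel"},
        "aquifer_type": {"U": "unconfined", "C": "confined"},
    }

    def validate_record(record):
        """Reject any field whose code is not registered in the dictionary."""
        errors = []
        for field, code in record.items():
            allowed = TERM_DICTIONARY.get(field)
            if allowed is None:
                errors.append(f"unknown field: {field}")
            elif code not in allowed:
                errors.append(f"{field}: code {code!r} not in {sorted(allowed)}")
        return errors

    print(validate_record({"lithology": "03", "aquifer_type": "X"}))
    # ["aquifer_type: code 'X' not in ['C', 'U']"]

A database operation interface can be generated from the same dictionary, so that every entry form and every stored value passes through the same standard vocabulary.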
Wei, C P; Hu, P J; Sheng, O R
2001-03-01
When performing a primary reading of a newly taken radiological examination, a radiologist often needs to reference relevant prior images of the same patient for confirmation or comparison purposes. Support of such image references is of clinical importance and may have significant effects on radiologists' examination reading efficiency, service quality, and work satisfaction. To effectively support such image reference needs, we proposed and developed a knowledge-based patient image pre-fetching system, addressing several challenging requirements of the application that include representation and learning of image reference heuristics and management of data-intensive knowledge inferencing. Moreover, the system demands an extensible and maintainable architecture design capable of effectively adapting to a dynamic environment characterized by heterogeneous and autonomous data source systems. In this paper, we developed a synthesized object-oriented entity-relationship model, a conceptual model appropriate for representing radiologists' prior image reference heuristics, which are heuristic-oriented and data-intensive. We detail the system architecture and design of the knowledge-based patient image pre-fetching system. Our architecture design is based on a client-mediator-server framework, capable of coping with a dynamic environment characterized by distributed, heterogeneous, and highly autonomous data source systems. To adapt to changes in radiologists' prior image reference heuristics, ID3-based multi-decision-tree induction and CN2-based multi-decision induction learning techniques were developed and evaluated. Experimentally, we examined the effects of the pre-fetching system on radiologists' examination readings. Preliminary results show that the knowledge-based patient image pre-fetching system supports radiologists' prior image reference needs more accurately than the current practice adopted at the study site, and that radiologists may become more efficient, consultatively effective, and better satisfied when supported by the pre-fetching system than when relying on the study site's pre-fetching practice.
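The learning component can be pictured as fitting a decision tree over features of candidate prior images and pre-fetching only those predicted to be referenced; this sketch uses a CART-style tree from scikit-learn rather than the ID3/CN2 techniques of the paper, and the features and training rows are invented:

    from sklearn.tree import DecisionTreeClassifier

    # Each row describes a candidate prior image:
    # [same_modality, same_body_part, days_since_prior, prior_was_abnormal]
    X = [[1, 1,  30, 1],
         [1, 1, 400, 0],
         [0, 1,  60, 1],
         [0, 0, 700, 0],
         [1, 0,  10, 1],
         [0, 1, 900, 0]]
    y = [1, 0, 1, 0, 1, 0]   # 1 = the radiologist actually referenced this prior image

    model = DecisionTreeClassifier(max_depth=3).fit(X, y)

    candidates = [[1, 1, 45, 1], [0, 0, 800, 0]]
    print(model.predict(candidates))   # pre-fetch only the priors predicted as 1

Retraining the tree as new reading sessions are logged is one way to keep the pre-fetch heuristics aligned with changing reference behaviour.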
Integrative Functional Genomics for Systems Genetics in GeneWeaver.org.
Bubier, Jason A; Langston, Michael A; Baker, Erich J; Chesler, Elissa J
2017-01-01
The abundance of existing functional genomics studies permits an integrative approach to interpreting and resolving the results of diverse systems genetics studies. However, a major challenge lies in assembling and harmonizing heterogeneous data sets across species for facile comparison to the positional candidate genes and coexpression networks that come from systems genetics studies. GeneWeaver is an online database and suite of tools at www.geneweaver.org that allows for fast aggregation and analysis of gene set-centric data. GeneWeaver contains curated experimental data together with resource-level data such as GO annotations, MP annotations, and KEGG pathways, along with persistent stores of user-entered data sets. These can be entered directly into GeneWeaver or transferred from widely used resources such as GeneNetwork.org. Data are analyzed using statistical tools and advanced graph algorithms to discover new relations, prioritize candidate genes, and generate functional hypotheses. Here we use GeneWeaver to find genes common to multiple gene sets, prioritize candidate genes from a quantitative trait locus, and characterize a set of differentially expressed genes. Coupling a large multispecies repository of curated and empirical functional genomics data to fast computational tools allows for the rapid integrative analysis of heterogeneous data for interpreting and extrapolating systems genetics results.
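Finding genes common to multiple gene sets and ranking candidates by how often they recur reduces, at its simplest, to set operations; the gene symbols and set names below are invented, and the sketch ignores GeneWeaver's statistics and graph algorithms:

    from collections import Counter

    # Hypothetical gene sets: a QTL candidate list and two differential-expression lists.
    gene_sets = {
        "qtl_chr2_candidates": {"Gabra2", "Chrna7", "Slc6a4", "Comt", "Drd2"},
        "de_hippocampus":      {"Gabra2", "Drd2", "Bdnf", "Fos"},
        "de_striatum":         {"Drd2", "Gabra2", "Penk", "Fos"},
    }

    # Genes present in every set.
    common = set.intersection(*gene_sets.values())

    # Crude prioritization: score each gene by the number of sets it appears in.
    score = Counter(g for s in gene_sets.values() for g in s)
    ranking = sorted(score.items(), key=lambda kv: kv[1], reverse=True)

    print(common)        # {'Gabra2', 'Drd2'} (set order may vary)
    print(ranking[:3])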
You, Leiming; Wu, Jiexin; Feng, Yuchao; Fu, Yonggui; Guo, Yanan; Long, Liyuan; Zhang, Hui; Luan, Yijie; Tian, Peng; Chen, Liangfu; Huang, Guangrui; Huang, Shengfeng; Li, Yuxin; Li, Jie; Chen, Chengyong; Zhang, Yaqing; Chen, Shangwu; Xu, Anlong
2015-01-01
Increasing numbers of genes have been shown to utilize alternative polyadenylation (APA) 3′-processing sites depending on the cell and tissue type and/or physiological and pathological conditions at the time of processing, and the construction of a genome-wide database of APA is urgently needed for a better understanding of poly(A) site selection and APA-directed gene expression regulation in a given biological setting. Here we present a web-accessible database, named APASdb (http://mosas.sysu.edu.cn/utr), which can visualize the precise map and usage quantification of different APA isoforms for all genes. The datasets are deeply profiled by the sequencing alternative polyadenylation sites (SAPAS) method, which is capable of high-throughput sequencing of the 3′-ends of polyadenylated transcripts. Thus, APASdb details all the heterogeneous cleavage sites downstream of poly(A) signals, and maintains near-complete coverage of APA sites, much better than previous databases built with conventional methods. Furthermore, APASdb provides the quantification of a given APA variant among transcripts with different APA sites by computing their corresponding normalized reads, making the database more useful. In addition, APASdb supports URL-based retrieval, browsing and display of exon-intron structure, poly(A) signals, poly(A) site location and usage reads, and 3′-untranslated regions (3′-UTRs). Currently, APASdb covers APA in various biological processes and diseases in human, mouse and zebrafish. PMID:25378337
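Usage quantification of APA isoforms amounts to expressing the reads supporting each poly(A) site as a fraction of all reads for that gene; the counts and coordinates below are invented:

    # Hypothetical read counts per poly(A) site, keyed by (gene, site position).
    site_reads = {
        ("geneA", 7577000): 120,
        ("geneA", 7578500): 40,
        ("geneB", 5566700): 900,
        ("geneB", 5567900): 100,
    }

    def apa_usage(site_reads):
        """Express each site's reads as a fraction of its gene's total reads."""
        totals = {}
        for (gene, _), reads in site_reads.items():
            totals[gene] = totals.get(gene, 0) + reads
        return {site: reads / totals[site[0]] for site, reads in site_reads.items()}

    for (gene, pos), frac in sorted(apa_usage(site_reads).items()):
        print(f"{gene} site {pos}: {frac:.0%} of the gene's polyadenylated transcripts")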
Self-concept of left-behind children in China: a systematic review of the literature.
Wang, X; Ling, L; Su, H; Cheng, J; Jin, L; Sun, Y-H
2015-05-01
The aim of our study was to systematically review studies which had compared self-concept in left-behind children with the general population of children in China. Relevant studies about self-concept of left-behind children in China published from 2004 to 2014 were sought by searching online databases including the Chinese Biological Medicine Database (CBM), Chinese National Knowledge Infrastructure (CNKI), Wanfang Database, Vip Database, PubMed, Google Scholar and Web of Science. The methodological quality of the articles was assessed using the Newcastle-Ottawa Scale (NOS). Pooled effect sizes and associated 95% confidence intervals (CI) were calculated using the random effects model. Cochran's Q was used to test for heterogeneity and the I2 index was used to determine the degree of heterogeneity. Nineteen studies involving 7758 left-behind children met the inclusion criteria and 15 studies were included in a meta-analysis. The results indicated that the left-behind group had a lower self-concept score and more psychological problems than the control group. The factors associated with self-concept in left-behind children were gender, age, grade and the relationships with parents, guardians and teachers. Left-behind children had lower self-concept and more mental health problems compared with the general population of children. The development of self-concept may be an important channel for promoting the mental health of left-behind children. © 2014 John Wiley & Sons Ltd.
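The random-effects pooling and the heterogeneity statistics used in such reviews can be reproduced with a short calculation (DerSimonian-Laird estimator); the per-study effects and variances below are invented, not values from the review:

    import math

    # Hypothetical per-study effect sizes (e.g. standardized mean differences) and variances.
    effects   = [-0.42, -0.30, -0.55, -0.18, -0.37]
    variances = [0.020, 0.015, 0.030, 0.025, 0.018]

    w = [1 / v for v in variances]                      # fixed-effect (inverse-variance) weights
    pooled_fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)

    # Cochran's Q and the I2 index quantify between-study heterogeneity.
    Q = sum(wi * (e - pooled_fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0

    # DerSimonian-Laird between-study variance, then random-effects weights.
    C = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (Q - df) / C)
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    print(f"pooled = {pooled:.3f}, "
          f"95% CI = ({pooled - 1.96 * se:.3f}, {pooled + 1.96 * se:.3f}), "
          f"Q = {Q:.2f}, I2 = {I2:.0f}%")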
Karadimas, H.; Hemery, F.; Roland, P.; Lepage, E.
2000-01-01
In medical software development, the use of databases plays a central role. However, most databases have heterogeneous encodings and data models. Dealing with these variations directly in the application code is error-prone and reduces the potential reuse of the produced software. Several approaches to overcome these limitations have been proposed in the medical database literature and are presented here. We present a simple solution based on a Java library and a central metadata description file in XML. This development approach presents several benefits in software design and development cycles, the main one being simplicity of maintenance. PMID:11079915
Masseroli, M; Bonacina, S; Pinciroli, F
2004-01-01
Current developments in distributed information technologies and Java programming make it possible to employ them in the medical arena as well, to support the retrieval, integration and evaluation of heterogeneous data and multimodal images in a web browser environment. With this aim, we used them to implement a client-server architecture based on software agents. The client side is a Java applet running in a web browser and providing a friendly medical user interface to browse and visualize different patient and medical test data, integrating them properly. The server side manages secure connections and queries to heterogeneous remote databases and file systems containing patient personal and clinical data. Based on the Java Advanced Imaging API, processing and analysis tools were developed to support the evaluation of remotely retrieved bioimages through the quantification of their features in different regions of interest. The Java platform-independence allows the centralized management of the implemented prototype and its deployment to each site where an intranet or internet connection is available. By giving healthcare providers effective support for comprehensively browsing, visualizing and evaluating medical images and records located in different remote repositories, the developed prototype can represent an important aid in providing more efficient diagnoses and medical treatments.
Saada: A Generator of Astronomical Database
NASA Astrophysics Data System (ADS)
Michel, L.
2011-11-01
Saada transforms a set of heterogeneous FITS files or VOtables of various categories (images, tables, spectra, etc.) into a powerful database deployed on the Web. Databases are located on your host and stay independent of any external server. This job doesn't require writing code. Saada can mix data of various categories in multiple collections. Data collections can be linked to each other, making relevant browsing paths and allowing data-mining oriented queries. Saada supports 4 VO services (spectra, images, sources and TAP). Data collections can be published immediately after the deployment of the Web interface.
Addressing the Heterogeneity of Subject Indexing in the ADS Databases
NASA Astrophysics Data System (ADS)
Dubin, David S.
A drawback of the current document representation scheme in the ADS abstract service is its heterogeneous subject indexing. Several related but inconsistent indexing languages are represented in ADS. A method of reconciling some indexing inconsistencies is described. Using lexical similarity alone, one out of six ADS descriptors can be automatically mapped to some other descriptor. Analysis of postings data can direct administrators to those mergings it is most important to check for errors.
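Mapping a descriptor from one indexing vocabulary to its lexically closest counterpart in another can be sketched with a standard string-similarity ratio; the descriptors and the 0.6 threshold below are invented for illustration and are not the ADS vocabulary or the method's actual parameters:

    from difflib import SequenceMatcher

    controlled = ["GALAXIES: SPIRAL", "GALAXIES: ACTIVE", "STARS: VARIABLE",
                  "INTERSTELLAR MEDIUM", "COSMOLOGY: OBSERVATIONS"]
    free_terms = ["galaxies spiral", "interstellar matter", "stars variable"]

    def best_match(term, vocabulary, threshold=0.6):
        """Return the lexically closest descriptor, or None if nothing is similar enough."""
        score, candidate = max((SequenceMatcher(None, term.lower(), v.lower()).ratio(), v)
                               for v in vocabulary)
        return candidate if score >= threshold else None

    for t in free_terms:
        print(t, "->", best_match(t, controlled))

Mappings that fall below the threshold are exactly the cases an administrator would be directed to check by hand.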
CoReCG: a comprehensive database of genes associated with colon-rectal cancer
Agarwal, Rahul; Kumar, Binayak; Jayadev, Msk; Raghav, Dhwani; Singh, Ashutosh
2016-01-01
Cancer of the large intestine is commonly referred to as colorectal cancer, which is also the third most frequently occurring neoplasm across the globe. Although much work has been carried out to understand the mechanism of carcinogenesis and the progression of this disease, fewer studies have been performed to collate the scattered information on alterations in tumorigenic cells such as genes, mutations, expression changes, epigenetic alterations, post-translational modifications and genetic heterogeneity. Earlier efforts mostly focused on understanding the etiology of colorectal carcinogenesis, with less emphasis on comprehensively reviewing the existing findings of individual studies, which can provide better diagnostics based on the markers suggested in discrete studies. The Colon Rectal Cancer Gene Database (CoReCG) contains information on 2056 colorectal cancer genes involved in distinct colorectal cancer stages, sourced from published literature, with an effective knowledge-based information retrieval system. Additionally, an interactive web interface enriched with various browsing sections and augmented with an advanced search facility for querying the database provides user-friendly browsing, while online tools for sequence similarity searches and a knowledge-based schema ensure a researcher-friendly information retrieval mechanism. The colorectal cancer gene database (CoReCG) is expected to be a single-point source for the identification of colorectal cancer-related genes, thereby helping with the improvement of classification, diagnosis and treatment of human cancers. Database URL: lms.snu.edu.in/corecg PMID:27114494
Prevalence of physical inactivity in Iran: a systematic review.
Fakhrzadeh, Hossein; Djalalinia, Shirin; Mirarefin, Mojdeh; Arefirad, Tahereh; Asayesh, Hamid; Safiri, Saeid; Samami, Elham; Mansourian, Morteza; Shamsizadeh, Morteza; Qorbani, Mostafa
2016-01-01
Introduction: Physical inactivity is one of the most important risk factors for chronic diseases, including cardiovascular disease, cancer, and stroke. We aimed to conduct a systematic review of the prevalence of physical inactivity in Iran. Methods: We searched the international databases ISI, PubMed/Medline and Scopus, and the national databases Irandoc, the Barakat knowledge network system, and the Scientific Information Database (SID). We collected data for the outcome measure of prevalence of physical inactivity by sex, age, province, and year. Quality assessment and data extraction were conducted independently by two research experts. There were no restrictions on time or language. Results: We analyzed data on the prevalence of physical inactivity in the Iranian population. Our search strategy yielded 254 records, of which 185 were from international databases and the remaining 69 were obtained from national databases. After refining the data, 34 articles that met the eligibility criteria remained for data extraction. Of these, 9, 20, 2 and 3 studies were at the national, provincial, regional and local levels, respectively. The estimates for inactivity ranged from approximately 30% to almost 70% and varied considerably between sexes and studied sub-groups. Conclusion: In Iran, most studies reported a high prevalence of physical inactivity. Our findings reveal a heterogeneity of reported values, often arising from differences in study design, measurement tools and methods, different target groups and sub-population sampling. These data do not provide the possibility of aggregating the data for a comprehensive inference.
Du, G D; Ma, L; Lv, Y H; Huang, L H; Fan, C Y; Xiang, Y; Lei, Q; Hu, R
2016-10-20
Objective: To assess the correlation between obstructive sleep apnea hypopnea syndrome (OSAHS) and chronic obstructive pulmonary disease (COPD). Method: Databases including the Chinese Biomedical Literature Database, PubMed, Chinese Academic Journals full-text database, Wanfang Resource Database and Chongqing VIP were searched to collect literature on the relationship between OSAHS and COPD. Literature in conference proceedings and certain unpublished articles were also retrieved manually. RCTs meeting the inclusion criteria were evaluated according to literature assessment standards, and the data were extracted. The RevMan 5.3 software was used to carry out the meta-analysis. Result: In total, 19 articles were included, and the meta-analysis revealed that the apnea hypopnea index of overlap syndrome (OS) patients is significantly higher than that of OSAHS patients [WMD=7.56, 95% CI (4.19, 10.94), P<0.01]; the LSaO₂ of OS patients is significantly lower than that of OSAHS patients [WMD=-10.50, 95% CI (-11.58, -6.08), P<0.01]; and the FEV₁/FVC of OS patients is significantly lower than that of COPD patients [WMD=4.65, 95% CI (1.15, 8.15), P<0.01]. Subgroup analyses comparing OS and OSAHS patients by sample size, age, body mass index (BMI) and FEV₁/FVC showed heterogeneity, but no heterogeneity remained when the analysis was based on the ESS score. Similarly, subgroup analyses comparing OS and COPD patients by sample size, BMI, AHI, LSaO₂ and the time with oxygen saturation below 90% (T90) showed heterogeneity, which did not persist when subgroups were analysed by neck circumference. The funnel plot was nearly symmetric, indicating little bias. Conclusion: The results indicate that OSAHS is significantly associated with COPD, and the two may be mutual risk factors for each other. Copyright© by the Editorial Department of Journal of Clinical Otorhinolaryngology Head and Neck Surgery.
NASA Technical Reports Server (NTRS)
Peuquet, Donna J.
1987-01-01
A new approach to building geographic data models that is based on the fundamental characteristics of the data is presented. An overall theoretical framework for representing geographic data is proposed. An example of utilizing this framework in a Geographic Information System (GIS) context by combining artificial intelligence techniques with recent developments in spatial data processing techniques is given. Elements of data representation discussed include hierarchical structure, separation of locational and conceptual views, and the ability to store knowledge at variable levels of completeness and precision.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hurd, J.R.; Bonner, C.A.; Ostenak, C.A.
1989-01-01
ROBOCAL, which is presently being developed and tested at Los Alamos National Laboratory, is a full-scale, prototypical robotic system for remote calorimetric and gamma-ray analysis of special nuclear materials. It integrates a fully automated, multi-drawer, vertical stacker-retriever system for staging unmeasured nuclear materials, and a fully automated gantry robot for computer-based selection and transfer of nuclear materials to calorimetric and gamma-ray measurement stations. Since ROBOCAL is designed for minimal operator intervention, a completely programmed user interface and database system are provided to interact with the automated mechanical and assay systems. The assay system is designed to completely integrate calorimetric and gamma-ray data acquisition and to perform state-of-the-art analyses on both homogeneous and heterogeneous distributions of nuclear materials in a wide variety of matrices. 10 refs., 10 figs., 4 tabs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Begoli, Edmon; Bates, Jack; Kistler, Derek E
The Polystore architecture revisits the federated approach to accessing and querying standalone, independent databases in a uniform and optimized fashion, but this time in the context of heterogeneous data and specialized analyses. In the light of this architectural philosophy, and in the light of the major data architecture development efforts at the US Department of Veterans Affairs (VA), we discuss the need for a heterogeneous data store consisting of a large relational data warehouse, an image and text datastore, and a peta-scale genomic repository. The VA's heterogeneous datastore would, to a larger or smaller degree, follow the architectural blueprint proposed by the polystore architecture. To this end, we discuss the current state of the data architecture at the VA, architectural alternatives for development of the heterogeneous datastore, the anticipated challenges, and the drawbacks and benefits of adopting the polystore architecture.
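A polystore-style query spans storage engines with different models and merges the partial results in the application layer; the sketch below combines a relational store (SQLite) with a stand-in document store, using invented table names, fields and values rather than anything from the VA systems:

    import sqlite3

    # Relational store: structured laboratory observations.
    rel = sqlite3.connect(":memory:")
    rel.execute("CREATE TABLE labs (patient_id TEXT, test TEXT, value REAL)")
    rel.executemany("INSERT INTO labs VALUES (?, ?, ?)",
                    [("p1", "HbA1c", 8.2), ("p2", "HbA1c", 5.4)])

    # Document store stand-in: free-text notes keyed by patient.
    notes = {"p1": ["retinopathy noted on exam"], "p2": ["routine follow-up"]}

    def federated_query(min_hba1c, keyword):
        """Combine a SQL predicate with a text predicate across the two stores."""
        rows = rel.execute(
            "SELECT patient_id, value FROM labs WHERE test = 'HbA1c' AND value >= ?",
            (min_hba1c,)).fetchall()
        return [(pid, value) for pid, value in rows
                if any(keyword in note for note in notes.get(pid, []))]

    print(federated_query(7.0, "retinopathy"))   # [('p1', 8.2)]

Each engine keeps answering the part of the query it is best at; the federation layer only routes sub-queries and merges results.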
Integrating hospital information systems in healthcare institutions: a mediation architecture.
El Azami, Ikram; Cherkaoui Malki, Mohammed Ouçamah; Tahon, Christian
2012-10-01
Many studies have examined the integration of information systems into healthcare institutions, leading to several standards in the healthcare domain (CORBAmed: Common Object Request Broker Architecture in Medicine; HL7: Health Level Seven International; DICOM: Digital Imaging and Communications in Medicine; and IHE: Integrating the Healthcare Enterprise). Due to the existence of a wide diversity of heterogeneous systems, three essential factors are necessary to fully integrate a system: data, functions and workflow. However, most of the previous studies have dealt with only one or two of these factors, and this makes the system integration unsatisfactory. In this paper, we propose a flexible, scalable architecture for Hospital Information Systems (HIS). Our main purpose is to provide a practical solution to ensure HIS interoperability so that healthcare institutions can communicate without being obliged to change their local information systems and without altering the tasks of the healthcare professionals. Our architecture is a mediation architecture with 3 levels: 1) a database level, 2) a middleware level and 3) a user interface level. The mediation is based on two central components: the Mediator and the Adapter. Using the XML format allows us to establish a structured, secured exchange of healthcare data. The notion of medical ontology is introduced to solve semantic conflicts and to unify the language used for the exchange. Our mediation architecture provides an effective, promising model that promotes the integration of hospital information systems that are autonomous, heterogeneous, semantically interoperable and platform-independent.
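The Mediator/Adapter split with an XML exchange format can be sketched as below; the tag names and the two source formats are invented and do not correspond to HL7 or to the architecture's actual messages:

    import xml.etree.ElementTree as ET

    # Two source systems expose patient identity in different local formats.
    source_a = {"id": "123", "family_name": "Doe", "dob": "1970-04-01"}
    source_b = "<pat><num>456</num><nom>Martin</nom><naissance>1965-11-23</naissance></pat>"

    def adapter_a(record):
        """Adapter: map source A's dictionary fields onto the common XML format."""
        p = ET.Element("patient")
        ET.SubElement(p, "identifier").text = record["id"]
        ET.SubElement(p, "name").text = record["family_name"]
        ET.SubElement(p, "birthDate").text = record["dob"]
        return p

    def adapter_b(xml_text):
        """Adapter: map source B's local XML tags onto the same common format."""
        src = ET.fromstring(xml_text)
        p = ET.Element("patient")
        ET.SubElement(p, "identifier").text = src.findtext("num")
        ET.SubElement(p, "name").text = src.findtext("nom")
        ET.SubElement(p, "birthDate").text = src.findtext("naissance")
        return p

    def mediator(adapted):
        """Mediator: aggregate the unified records into one exchange document."""
        root = ET.Element("patients")
        root.extend(adapted)
        return ET.tostring(root, encoding="unicode")

    print(mediator([adapter_a(source_a), adapter_b(source_b)]))

Semantic conflicts (for instance, two sources coding the same concept differently) would additionally be resolved against the shared medical ontology before the mediator merges the records.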
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gong, Zhenhuan; Boyuka, David; Zou, X
The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. The growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induces heterogeneous access patterns that stress the performance of the underlying storage system. Previous work addresses data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs. We present PARLO, a parallel run-time layout optimization framework, to achieve multi-level data layout optimization for scientific applications at run-time, before data is written to storage. The layout schemes optimize for heterogeneous access patterns with user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to achieve user-transparent, light-weight layout optimization for scientific datasets. It offers simple XML-based configuration for users to achieve flexible layout optimization without the need to modify or recompile application codes. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its underlying run-time layout optimization achieves a 56% savings in processing time and a reduction in storage overhead of up to 50%. PARLO also exhibits a low run-time resource requirement, while limiting the performance impact on running applications to a reasonable level.
Gálvez, Sergio; Ferusic, Adis; Esteban, Francisco J; Hernández, Pilar; Caballero, Juan A; Dorado, Gabriel
2016-10-01
The Smith-Waterman algorithm has great sensitivity when used for biological sequence-database searches, but at the expense of high computing-power requirements. To overcome this problem, there are implementations in the literature that exploit the different hardware architectures available in a standard PC, such as the GPU, CPU, and coprocessors. We introduce an application that splits the original database-search problem into smaller parts, resolves each of them by executing the most efficient implementation of the Smith-Waterman algorithm on the corresponding hardware architecture, and finally unifies the generated results. Using non-overlapping hardware allows simultaneous execution, and up to a 2.58-fold performance gain when compared with any other algorithm for searching sequence databases. Even the performance of the popular BLAST heuristic is exceeded in 78% of the tests. The application has been tested with standard hardware: an Intel i7-4820K CPU, Intel Xeon Phi 31S1P coprocessors, and nVidia GeForce GTX 960 graphics cards. An important increase in performance has been obtained in a wide range of situations, effectively exploiting the available hardware.
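The kernel that all of those hardware-specific implementations accelerate is the same dynamic-programming recurrence; a plain (and deliberately slow) reference version, with invented scoring parameters and toy sequences, looks like this:

    def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
        """Smith-Waterman local alignment score between sequences a and b."""
        H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        best = 0
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
                H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
                best = max(best, H[i][j])
        return best

    # Splitting a database search: each partition can be scored independently
    # (on whichever device is free) and the per-sequence scores merged at the end.
    database = {"seq1": "ACACACTA", "seq2": "TTTTGGGG", "seq3": "AGCACACA"}
    query = "AGCACACA"
    hits = sorted(((smith_waterman(query, s), name) for name, s in database.items()),
                  reverse=True)
    print(hits)

Because alignments are local to a single target sequence, the database can be partitioned across the CPU, GPU and coprocessors with no cross-partition communication, which is what makes the split-and-merge strategy effective.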
NASA Astrophysics Data System (ADS)
Patel, M. N.; Looney, P.; Young, K.; Halling-Brown, M. D.
2014-03-01
Radiological imaging is fundamental within the healthcare industry and has become routinely adopted for diagnosis, disease monitoring and treatment planning. Over the past two decades both diagnostic and therapeutic imaging have undergone rapid growth, and the ability to harness this large influx of medical images can provide an essential resource for research and training. Traditionally, the systematic collection of medical images for research from heterogeneous sites has not been commonplace within the NHS and is fraught with challenges including data acquisition, storage, secure transfer and correct anonymisation. Here, we describe a semi-automated system which comprehensively oversees the collection of both unprocessed and processed medical images from acquisition to a centralised database. The provision of unprocessed images within our repository enables a multitude of potential research possibilities that utilise the images. Furthermore, we have developed systems and software to integrate these data with their associated clinical data and annotations, providing a centralised dataset for research. Currently we regularly collect digital mammography images from two sites and partially collect from a further three, with efforts to expand into other modalities and sites currently ongoing. At present we have collected 34,014 2D images from 2623 individuals. In this paper we describe our medical image collection system for research and discuss the wide spectrum of challenges faced during the design and implementation of such systems.
NASA Technical Reports Server (NTRS)
Abiteboul, Serge
1997-01-01
The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in file systems to highly structured data in relational database systems. Data is accessible through a variety of interfaces, including Web browsers, database query languages, application-specific interfaces, and data exchange formats. Some of this data is raw data, e.g., images or sound. Some of it has structure, even if the structure is often implicit and not as rigid or regular as that found in standard database systems. Sometimes the structure exists but has to be extracted from the data. Sometimes also it exists but we prefer to ignore it for certain purposes such as browsing. We use the term semi-structured data for data that is (from a particular viewpoint) neither raw data nor strictly typed, i.e., not table-oriented as in a relational model or sorted-graph as in object databases. As will be seen later, when the notion of semi-structured data is more precisely defined, the need for semi-structured data arises naturally in the context of data integration, even when the data sources are themselves well-structured. Although data integration is an old topic, the need to integrate a wider variety of data formats (e.g., SGML or ASN.1 data) and data found on the Web has brought the topic of semi-structured data to the forefront of research. The main purpose of the paper is to isolate the essential aspects of semi-structured data. We also survey some proposals of models and query languages for semi-structured data. In particular, we consider recent work at Stanford U. and U. Penn on semi-structured data. In both cases, the motivation is found in the integration of heterogeneous data.
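A small illustrative example (not taken from the paper) of what "neither raw nor strictly typed" looks like in practice: records that share a rough shape but differ in fields and nesting, so any query has to tolerate missing or irregularly structured attributes.

```python
# Illustrative sketch: two "person" records with irregular, self-describing
# structure -- typical semi-structured data with no fixed schema.
records = [
    {"name": "A. Smith", "affiliation": "Stanford U.",
     "publications": [{"title": "Querying graphs", "year": 1996}]},
    {"name": {"first": "B", "last": "Jones"},   # name is nested here
     "email": "bjones@example.org"},            # extra field, no publications
]

def field(record, key, default=None):
    """Tolerant access: semi-structured queries cannot assume a fixed schema."""
    return record.get(key, default)

for r in records:
    print(field(r, "name"), field(r, "publications", []))
```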
GenoQuery: a new querying module for functional annotation in a genomic warehouse
Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine
2008-01-01
Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improve the functional annotation of genomes. This requires designing elaborate integration systems, such as warehouses, for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting the complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731
Curating and Preserving the Big Canopy Database System: an Active Curation Approach using SEAD
NASA Astrophysics Data System (ADS)
Myers, J.; Cushing, J. B.; Lynn, P.; Weiner, N.; Ovchinnikova, A.; Nadkarni, N.; McIntosh, A.
2015-12-01
Modern research is increasingly dependent upon highly heterogeneous data and on the associated cyberinfrastructure developed to organize, analyze, and visualize that data. However, due to the complexity and custom nature of such combined data-software systems, it can be very challenging to curate and preserve them for the long term at reasonable cost and in a way that retains their scientific value. In this presentation, we describe how this challenge was met in preserving the Big Canopy Database (CanopyDB) system using an agile approach and leveraging the Sustainable Environment - Actionable Data (SEAD) DataNet project's hosted data services. The CanopyDB system was developed over more than a decade at Evergreen State College to address the needs of forest canopy researchers. It is an early yet sophisticated exemplar of the type of system that has become common in biological research and science in general, including multiple relational databases for different experiments, a custom database generation tool used to create them, an image repository, and desktop and web tools to access, analyze, and visualize this data. SEAD provides secure project spaces with a semantic content abstraction (typed content with arbitrary RDF metadata statements and relationships to other content), combined with a standards-based curation and publication pipeline resulting in packaged research objects with Digital Object Identifiers. Using SEAD, our cross-project team was able to incrementally ingest CanopyDB components (images, datasets, software source code, documentation, executables, and virtualized services) and to iteratively define and extend the metadata and relationships needed to document them. We believe that both the process, and the richness of the resultant standards-based (OAI-ORE) preservation object, hold lessons for the development of best-practice solutions for preserving scientific data in association with the tools and services needed to derive value from it.
The Web Based Monitoring Project at the CMS Experiment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lopez-Perez, Juan Antonio; Badgett, William; Behrens, Ulf
The Compact Muon Solenoid is a large and complex general purpose experiment at the CERN Large Hadron Collider (LHC), built and maintained by many collaborators from around the world. Efficient operation of the detector requires widespread and timely access to a broad range of monitoring and status information. To that end the Web Based Monitoring (WBM) system was developed to present data to users located anywhere from many underlying heterogeneous sources, from real time messaging systems to relational databases. This system provides the power to combine and correlate data in both graphical and tabular formats of interest to the experimenters, including data such as beam conditions, luminosity, trigger rates, detector conditions, and many others, allowing for flexibility on the user's side. This paper describes the WBM system architecture and describes how the system has been used from the beginning of data taking until now (Run 1 and Run 2).
The web based monitoring project at the CMS experiment
NASA Astrophysics Data System (ADS)
Lopez-Perez, Juan Antonio; Badgett, William; Behrens, Ulf; Chakaberia, Irakli; Jo, Youngkwon; Maeshima, Kaori; Maruyama, Sho; Patrick, James; Rapsevicius, Valdas; Soha, Aron; Stankevicius, Mantas; Sulmanas, Balys; Toda, Sachiko; Wan, Zongru
2017-10-01
The Compact Muon Solenoid is a large and complex general purpose experiment at the CERN Large Hadron Collider (LHC), built and maintained by many collaborators from around the world. Efficient operation of the detector requires widespread and timely access to a broad range of monitoring and status information. To that end the Web Based Monitoring (WBM) system was developed to present data to users located anywhere from many underlying heterogeneous sources, from real time messaging systems to relational databases. This system provides the power to combine and correlate data in both graphical and tabular formats of interest to the experimenters, including data such as beam conditions, luminosity, trigger rates, detector conditions, and many others, allowing for flexibility on the user's side. This paper describes the WBM system architecture and describes how the system has been used from the beginning of data taking until now (Run 1 and Run 2).
Web Based Monitoring in the CMS Experiment at CERN
DOE Office of Scientific and Technical Information (OSTI.GOV)
Badgett, William; Borrello, Laura; Chakaberia, Irakli
2014-09-03
The Compact Muon Solenoid (CMS) is a large and complex general purpose experiment at the CERN Large Hadron Collider (LHC), built and maintained by many collaborators from around the world. Efficient operation of the detector requires widespread and timely access to a broad range of monitoring and status information. To this end the Web Based Monitoring (WBM) system was developed to present data to users located anywhere from many underlying heterogeneous sources, from real time messaging systems to relational databases. This system provides the power to combine and correlate data in both graphical and tabular formats of interest to the experimenters, including data such as beam conditions, luminosity, trigger rates, detector conditions, and many others, allowing for flexibility on the user side. This paper describes the WBM system architecture and describes how the system was used during the first major data taking run of the LHC.
NASA Technical Reports Server (NTRS)
Lin, Risheng; Afjeh, Abdollah A.
2003-01-01
Crucial to an efficient aircraft simulation-based design is a robust data modeling methodology for both recording the information and providing data transfer readily and reliably. To meet this goal, data modeling issues involved in multidisciplinary aircraft design are first analyzed in this study. Next, an XML-based, extensible data object model for multidisciplinary aircraft design is constructed and implemented. The implementation of the model through aircraft data binding allows the design applications to access and manipulate any disciplinary data with a lightweight and easy-to-use API. In addition, the language-independent representation of aircraft disciplinary data in the model fosters interoperability amongst heterogeneous systems, thereby facilitating data sharing and exchange between various design tools and systems.
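A minimal sketch of the data-binding idea, assuming a hypothetical XML layout for one discipline; the tag names (aircraft, wing, span, area) are illustrative and are not the schema used in the paper. The point is that design tools work with typed objects rather than raw XML.

```python
# Hedged sketch: bind a hypothetical disciplinary XML fragment to a typed object.
import xml.etree.ElementTree as ET
from dataclasses import dataclass

XML = """<aircraft name="demo">
  <wing><span units="m">35.8</span><area units="m2">122.6</area></wing>
</aircraft>"""

@dataclass
class Wing:
    span_m: float
    area_m2: float

def bind_wing(doc: str) -> Wing:
    root = ET.fromstring(doc)
    wing = root.find("wing")
    return Wing(span_m=float(wing.findtext("span")),
                area_m2=float(wing.findtext("area")))

print(bind_wing(XML))   # downstream tools see Wing(span_m=35.8, area_m2=122.6)
```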
NASA Astrophysics Data System (ADS)
Land, Walker H., Jr.; Lewis, Michael; Sadik, Omowunmi; Wong, Lut; Wanekaya, Adam; Gonzalez, Richard J.; Balan, Arun
2004-04-01
This paper extends the classification approaches described in reference [1] in the following ways: (1) developing and evaluating a new method for evolving organophosphate nerve agent Support Vector Machine (SVM) classifiers using Evolutionary Programming, (2) conducting research experiments using a larger database of organophosphate nerve agents, and (3) upgrading the architecture to an object-based grid system for evaluating the classification of EP-derived SVMs. Due to the increased threat of chemical and biological weapons of mass destruction (WMD) posed by international terrorist organizations, a significant effort is underway to develop tools that can be used to detect and effectively combat biochemical warfare. This paper reports the integration of multi-array sensors with Support Vector Machines (SVMs) for the detection of organophosphate nerve agents using a grid computing system called Legion. Grid computing is the use of large collections of heterogeneous, distributed resources (including machines, databases, devices, and users) to support large-scale computations and wide-area data access. Finally, preliminary results using EP-derived support vector machines designed to operate on distributed systems have provided accurate classification results. In addition, distributed training architectures are 50 times faster when compared to standard iterative training methods.
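To make the "EP-derived SVM" idea concrete, here is a hedged sketch of the general technique: evolving SVM hyperparameters with a simple mutate-and-select loop. It uses scikit-learn and synthetic data, and does not reproduce the authors' actual EP operators, sensor features, or grid deployment.

```python
# Hedged sketch: evolve SVM hyperparameters (C, gamma) with an
# evolutionary-programming-style loop; synthetic data only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=12, random_state=0)

def fitness(ind):
    C, gamma = ind
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

# initial population of (C, gamma) pairs, drawn log-uniformly
pop = [(10 ** rng.uniform(-2, 3), 10 ** rng.uniform(-4, 1)) for _ in range(8)]
for generation in range(10):
    # mutate every parent with a multiplicative log-normal perturbation
    children = [(C * 10 ** rng.normal(0, 0.3), g * 10 ** rng.normal(0, 0.3))
                for C, g in pop]
    # (mu + lambda) selection: keep the best individuals of parents + children
    pop = sorted(pop + children, key=fitness, reverse=True)[:8]

best = pop[0]
print("best (C, gamma):", best, "CV accuracy:", round(fitness(best), 3))
```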
Spatial Data Integration Using Ontology-Based Approach
NASA Astrophysics Data System (ADS)
Hasani, S.; Sadeghi-Niaraki, A.; Jelokhani-Niaraki, M.
2015-12-01
In today's world, the need for spatial data has become so crucial that many organizations have begun to produce spatial data themselves. In some circumstances, the need to obtain integrated data in real time requires a sustainable mechanism for real-time integration; a case in point is disaster management, which requires obtaining real-time data from various sources of information. One of the problematic challenges in such situations is the high degree of heterogeneity between different organizations' data. To solve this issue, we introduce an ontology-based method to provide sharing and integration capabilities for existing databases. In addition to resolving semantic heterogeneity, better access to information is also provided by our proposed method. Our approach consists of three steps. In the first step, the objects in a relational database are identified, the semantic relationships between them are modelled and, subsequently, the ontology of each database is created. In the second step, the resulting ontology is inserted into the database and the relationship of each ontology class is inserted into a newly created column in the database tables. The last step consists of a platform based on a service-oriented architecture, which allows integration of the data by using the concept of ontology mapping. The proposed approach, in addition to being fast and low cost, makes the process of data integration easy; the data remain unchanged, so the existing legacy applications can still be used.
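A minimal sketch of the flavour of the first step (relational objects expressed as ontology individuals), assuming a hypothetical "roads" table; the namespace, class, and property names are illustrative only and are not the ontology used in the paper.

```python
# Hedged sketch: turn relational rows into ontology individuals with rdflib.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/spatial#")
rows = [  # stand-in for a cursor over a relational "roads" table
    {"id": 1, "name": "Main St", "length_km": 2.4},
    {"id": 2, "name": "River Rd", "length_km": 5.1},
]

g = Graph()
g.bind("ex", EX)
for row in rows:
    subject = EX[f"road/{row['id']}"]
    g.add((subject, RDF.type, EX.Road))                # class membership
    g.add((subject, EX.name, Literal(row["name"])))     # datatype properties
    g.add((subject, EX.lengthKm, Literal(row["length_km"])))

print(g.serialize(format="turtle"))
```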
NASA Astrophysics Data System (ADS)
Carniel, Roberto; Di Cecca, Mauro; Jaquet, Olivier
2006-05-01
In the framework of the EU-funded project "Multi-disciplinary monitoring, modelling and forecasting of volcanic hazard" (MULTIMO), multiparametric data have been recorded at the MULTIMO station in Montserrat. Moreover, several other long time series, recorded at Montserrat and at other volcanoes, have been acquired in order to test stochastic and deterministic methodologies under development. Creating a general framework to handle data efficiently is a considerable task even for homogeneous data. In the case of heterogeneous data, this becomes a major issue. A need for a consistent way of browsing such a heterogeneous dataset in a user-friendly way therefore arose. Additionally, a framework for applying the calculation of the developed dynamical parameters on the data series was also needed in order to easily keep these parameters under control, e.g. for monitoring, research or forecasting purposes. The solution which we present is completely based on Open Source software, including the Linux operating system, MySQL database management system, Apache web server, Zope application server, Scilab math engine, Plone content management framework, and Unified Modelling Language. From the user point of view the main advantage is the possibility of browsing through datasets recorded on different volcanoes, with different instruments, with different sampling frequencies, stored in different formats, all via a consistent, user-friendly interface that transparently runs queries to the database, gets the data from the main storage units, generates the graphs and produces dynamically generated web pages to interact with the user. The involvement of third parties for continuing the development in the Open Source philosophy and/or extending the application fields is now sought.
NASA Astrophysics Data System (ADS)
Seufert, V.; Wood, S.; Reid, A.; Gonzalez, A.; Rhemtulla, J.; Ramankutty, N.
2014-12-01
The most important current driver of biodiversity loss is the conversion of natural habitats for human land uses, mostly for the purpose of food production. However, by causing this biodiversity loss, food production is eroding the very same ecosystem services (e.g. pollination and soil fertility) that it depends on. We therefore need to adopt more wildlife-friendly agricultural practices that can contribute to preserving biodiversity. Organic farming has been shown to typically host higher biodiversity than conventional farming. But how is the biodiversity benefit of organic management dependent on the landscape context farms are situated in? To implement organic farming as an effective means for protecting biodiversity and enhancing ecosystem services we need to understand better under what conditions organic management is most beneficial for species. We conducted a meta-analysis of the literature to answer this question, compiling the most comprehensive database to date of studies that monitored biodiversity in organic vs. conventional fields. We also collected information about the landscape surrounding these fields from remote sensing products. Our database consists of 348 study sites across North America and Europe. Our analysis shows that organic management can improve biodiversity in agricultural fields substantially. It is especially effective at preserving biodiversity in homogeneous landscapes that are structurally simplified and dominated by either cropland or pasture. In heterogeneous landscapes conventional agriculture might instead already hold high biodiversity, and organic management does not appear to provide as much of a benefit for species richness as in simplified landscapes. Our results suggest that strategies to maintain biodiversity-dependent ecosystem services should include a combination of pristine natural habitats, wildlife-friendly farming systems like organic farming, and high-yielding conventional systems, interspersed in structurally diverse, heterogeneous landscapes.
Stotler, R.L.; Frape, S.K.; El Mugammar, H.T.; Johnston, C.; Judd-Henrey, I.; Harvey, F.E.; Drimmie, R.; Jones, J.P.
2011-01-01
The Waterloo Moraine is a stratigraphically complex system and is the major water supply to the cities of Kitchener and Waterloo in Ontario, Canada. Despite over 30 years of investigation, no attempt has been made to unify existing geochemical data into a single database. A composite view of the moraine geochemistry has been created using the available geochemical information, and a framework created for geochemical data synthesis of other similar flow systems. Regionally, fluid chemistry is highly heterogeneous, with large variations in both water type and total dissolved solids content. Locally, upper aquifer units are affected by nitrate and chloride from fertilizer and road salt. Typical upper-aquifer fluid chemistry is dominated by calcium, magnesium, and bicarbonate, a result of calcite and dolomite dissolution. Evidence also suggests that ion exchange and diffusion from tills and bedrock units account for some elevated sodium concentrations. Locally, hydraulic "windows" cross-connect upper and lower aquifer units, which are typically separated by a clay till. Lower aquifer units are also affected by dedolomitization, mixing with bedrock water, and locally, upward diffusion of solutes from the bedrock aquifers. A map of areas where aquifer units are geochemically similar was constructed to highlight areas with potential hydraulic windows. © 2010 Springer-Verlag.
Coordinating complex decision support activities across distributed applications
NASA Technical Reports Server (NTRS)
Adler, Richard M.
1994-01-01
Knowledge-based technologies have been applied successfully to automate planning and scheduling in many problem domains. Automation of decision support can be increased further by integrating task-specific applications with supporting database systems, and by coordinating interactions between such tools to facilitate collaborative activities. Unfortunately, the technical obstacles that must be overcome to achieve this vision of transparent, cooperative problem-solving are daunting. Intelligent decision support tools are typically developed for standalone use, rely on incompatible, task-specific representational models and application programming interfaces (APIs), and run on heterogeneous computing platforms. Getting such applications to interact freely calls for platform-independent capabilities for distributed communication, as well as tools for mapping information across disparate representations. Symbiotics is developing a layered set of software tools (called NetWorks!) for integrating and coordinating heterogeneous distributed applications. The top layer of tools consists of an extensible set of generic, programmable coordination services. Developers access these services via high-level APIs to implement the desired interactions between distributed applications.
NASA Astrophysics Data System (ADS)
Michel, L.; Motch, C.; Nguyen Ngoc, H.; Pineau, F. X.
2009-09-01
Saada (http://amwdb.u-strasbg.fr/saada) is a tool for helping astronomers build local archives without writing any code (Michel et al. 2004). Databases created by Saada can host collections of heterogeneous data files. These data collections can also be published in the VO. An overview of the main Saada features is presented in this demo: creation of a basic database, creation of relationships, data searches using SaadaQL, metadata tagging, and use of VO services.
Rare Diseases Leading to Childhood Glaucoma: Epidemiology, Pathophysiogenesis, and Management.
Abdolrahimzadeh, Solmaz; Fameli, Valeria; Mollo, Roberto; Contestabile, Maria Teresa; Perdicchi, Andrea; Recupero, Santi Maria
2015-01-01
Noteworthy heterogeneity exists in the rare diseases associated with childhood glaucoma. Primary congenital glaucoma is mostly sporadic; however, 10% to 40% of cases are familial. CYP1B1 gene mutations seem to account for 87% of familial cases and 27% of sporadic cases. Childhood glaucoma is classified into primary and secondary congenital glaucoma, further divided as glaucoma arising in dysgenesis associated with neural crest anomalies, phakomatoses, metabolic disorders, mitotic diseases, congenital disorders, and acquired conditions. Neural crest alterations lead to the wide spectrum of iridocorneal trabeculodysgenesis. Systemic diseases associated with childhood glaucoma include the heterogeneous group of phakomatoses, where glaucoma is frequently encountered in the Sturge-Weber syndrome and its variants, in phakomatosis pigmentovascularis associated with oculodermal melanocytosis, and more rarely in neurofibromatosis type 1. Childhood glaucoma is also described in systemic disorders of mitotic and metabolic activity. Acquired secondary glaucoma has been associated with uveitis, trauma, drugs, and neoplastic diseases. A database search revealed reports of childhood glaucoma in rare diseases that do not include glaucoma among their typical manifestations. These are otopalatodigital syndrome, complete androgen insensitivity, pseudotrisomy 13, Brachmann-de Lange syndrome, acrofrontofacionasal dysostosis, caudal regression syndrome, and Wolf-Hirschhorn syndrome.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sharma, G.D.
1993-09-01
The Alaskan North Slope comprises one of the Nation's and the world's most prolific oil provinces. Original oil in place (OOIP) is estimated at nearly 70 BBL (Kamath and Sharma, 1986). Generalized reservoir descriptions have been completed by the University of Alaska's Petroleum Development Laboratory over the North Slope's major fields. These fields include West Sak (20 BBL OOIP), Ugnu (15 BBL OOIP), Prudhoe Bay (23 BBL OOIP), Kuparuk (5.5 BBL OOIP), Milne Point (3 BBL OOIP), and Endicott (1 BBL OOIP). Reservoir description has included the acquisition of open hole log data from the Alaska Oil and Gas Conservation Commission (AOGCC), computerized well log analysis using state-of-the-art computers, and integration of geologic and logging data. The studies pertaining to fluid characterization described in this report include: an experimental study of asphaltene precipitation for enriched gases, CO2 and the West Sak crude system; modeling of asphaltene equilibria, including homogeneous as well as polydispersed thermodynamic models; the effect of asphaltene deposition on rock-fluid properties; and fluid properties of some Alaskan North Slope reservoirs. Finally, the last chapter summarizes the reservoir heterogeneity classification system for TORIS and the TORIS database.
Olofsson, Per; Norén, Håkan; Carlsson, Ann
2018-02-01
The updated intrapartum cardiotocography (CTG) classification system by FIGO in 2015 (FIGO2015) and the FIGO2015-approached classification by the Swedish Society of Obstetricians and Gynecologist in 2017 (SSOG2017) are not harmonized with the fetal ECG ST analysis (STAN) algorithm from 2007 (STAN2007). The study aimed to reveal homogeneity and agreement between the systems in classifying CTG and ST events, and relate them to maternal and perinatal outcomes. Among CTG traces with ST events, 100 traces originally classified as normal, 100 as suspicious and 100 as pathological were randomly selected from a STAN database and classified by two experts in consensus. Homogeneity and agreement statistics between the CTG classifications were performed. Maternal and perinatal outcomes were evaluated in cases with clinically hidden ST data (n = 151). A two-tailed p < 0.05 was regarded as significant. For CTG classes, the heterogeneity was significant between the old and new systems, and agreements were moderate to strong (proportion of agreement, kappa index 0.70-0.86). Between the new classifications, heterogeneity was significant and agreements strong (0.90, 0.92). For significant ST events, heterogeneities were significant and agreements moderate to almost perfect (STAN2007 vs. FIGO2015 0.86, 0.72; STAN2007 vs. SSOG2017 0.92, 0.84; FIGO2015 vs. SSOG2017 0.94, 0.87). Significant ST events occurred more often combined with STAN2007 than with FIGO2015 classification, but not with SSOG2017; correct identification of adverse outcomes was not significantly different between the systems. There are discrepancies in the classification of CTG patterns and significant ST events between the old and new systems. The clinical relevance of the findings remains to be shown. © 2017 The Authors. Acta Obstetricia et Gynecologica Scandinavica published by John Wiley & Sons Ltd on behalf of Nordic Federation of Societies of Obstetrics and Gynecology (NFOG).
An Information System for European culture collections: the way forward.
Casaregola, Serge; Vasilenko, Alexander; Romano, Paolo; Robert, Vincent; Ozerskaya, Svetlana; Kopf, Anna; Glöckner, Frank O; Smith, David
2016-01-01
Culture collections contain indispensable information about the microorganisms preserved in their repositories, such as taxonomical descriptions, origins, physiological and biochemical characteristics, bibliographic references, etc. However, information currently accessible in databases rarely adheres to common standard protocols. The resultant heterogeneity between culture collections, in terms of both content and format, notably hampers microorganism-based research and development (R&D). The optimized exploitation of these resources thus requires standardized, and simplified, access to the associated information. To this end, and in the interest of supporting R&D in the fields of agriculture, health and biotechnology, a pan-European distributed research infrastructure, MIRRI, including over 40 public culture collections and research institutes from 19 European countries, was established. A prime objective of MIRRI is to unite and provide universal access to the fragmented, and untapped, resources, information and expertise available in European public collections of microorganisms; a key component of which is to develop a dynamic Information System. For the first time, both culture collection curators as well as their users have been consulted and their feedback, concerning the needs and requirements for collection databases and data accessibility, utilised. Users primarily noted that databases were not interoperable, thus rendering a global search of multiple databases impossible. Unreliable or out-of-date and, in particular, non-homogenous, taxonomic information was also considered to be a major obstacle to searching microbial data efficiently. Moreover, complex searches are rarely possible in online databases thus limiting the extent of search queries. Curators also consider that overall harmonization-including Standard Operating Procedures, data structure, and software tools-is necessary to facilitate their work and to make high-quality data easily accessible to their users. Clearly, the needs of culture collection curators coincide with those of users on the crucial point of database interoperability. In this regard, and in order to design an appropriate Information System, important aspects on which the culture collection community should focus include: the interoperability of data sets with the ontologies to be used; setting best practice in data management, and the definition of an appropriate data standard.
Zhang, Liming; Yu, Dongsheng; Shi, Xuezheng; Xu, Shengxiang; Xing, Shihe; Zhao, Yongcong
2014-01-01
Soil organic carbon (SOC) models are often applied to regions with high heterogeneity but limited spatially differentiated soil information and simulation unit resolution. This study, carried out in the Tai-Lake region of China, defined the uncertainty derived from application of the DeNitrification-DeComposition (DNDC) biogeochemical model in an area with heterogeneous soil properties and different simulation units. Three soil attribute databases of different resolution, a polygonal capture of mapping units at 1:50,000 (P5), a county-based database at 1:50,000 (C5) and a county-based database at 1:14,000,000 (C14), were used as inputs for regional DNDC simulation. The P5 and C5 databases were combined with the 1:50,000 digital soil map, which is the most detailed soil database for the Tai-Lake region. The C14 database was combined with the 1:14,000,000 digital soil map, a coarse database often used for modeling at a national or regional scale in China. The soil polygons of the P5 database and the county boundaries of the C5 and C14 databases were used as basic simulation units. The results project that from 1982 to 2000, total SOC change in the top layer (0–30 cm) of the 2.3 M ha of paddy soil in the Tai-Lake region was +1.48 Tg C, −3.99 Tg C and −15.38 Tg C based on the P5, C5 and C14 databases, respectively. Taking the total SOC change modeled with the P5 inputs as the baseline, which has the advantage of using a detailed, polygon-based soil dataset, the relative deviations of C5 and C14 were 368% and 1126%, respectively. The comparison illustrates that DNDC simulation is strongly influenced by the choice of fundamental geographic resolution as well as the detail of the input soil attributes. The results also indicate that improving the framework of DNDC is essential in creating accurate models of the soil carbon cycle. PMID:24523922
Veneman, Jolien B; Saetnan, Eli R; Clare, Amanda J; Newbold, Charles J
2016-12-01
The body of peer-reviewed papers on enteric methane mitigation strategies in ruminants is rapidly growing and allows for better estimation of the true effect of each strategy through the use of meta-analysis methods. Here we present the development of an online database of measured methane mitigation strategies called MitiGate, currently comprising 412 papers. The database is accessible through an online user-friendly interface that allows data extraction with various levels of aggregation on the one hand and data uploading for submission to the database on the other, allowing for future refinement and updates of mitigation estimates as well as providing easy access to relevant data for integration into modelling efforts or policy recommendations. To demonstrate and verify the usefulness of the MitiGate database, those studies where methane emissions were expressed per unit of intake (293 papers resulting in 845 treatment comparisons) were used in a meta-analysis. The meta-analysis of the current database estimated the effect size of each of the mitigation strategies as well as the associated variance and a measure of heterogeneity. Currently, under-representation of certain strategies, geographic regions and long-term studies is the main limitation in providing an accurate quantitative estimation of the mitigation potential of each strategy under varying animal production systems. We have thus implemented the facility for researchers to upload meta-data of their peer-reviewed research through a simple input form, in the hope that MitiGate will grow into a fully inclusive resource for those wishing to model methane mitigation strategies in ruminants. Copyright © 2016 Elsevier B.V. All rights reserved.
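For readers unfamiliar with the pooling step such a meta-analysis relies on, here is a hedged sketch of a standard random-effects (DerSimonian-Laird) calculation: pooled effect, between-study variance, and heterogeneity. The effect sizes and variances below are made up and are not MitiGate data.

```python
# Hedged sketch: DerSimonian-Laird random-effects pooling with heterogeneity.
import numpy as np

y = np.array([-0.30, -0.15, -0.45, -0.10])   # per-comparison effect sizes (made up)
v = np.array([0.02, 0.05, 0.03, 0.04])       # their within-study variances

w = 1.0 / v
fixed = np.sum(w * y) / np.sum(w)
Q = np.sum(w * (y - fixed) ** 2)              # Cochran's Q
df = len(y) - 1
C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - df) / C)                 # between-study variance
I2 = max(0.0, (Q - df) / Q) * 100             # heterogeneity as a percentage

w_star = 1.0 / (v + tau2)
pooled = np.sum(w_star * y) / np.sum(w_star)
se = np.sqrt(1.0 / np.sum(w_star))
print(f"pooled effect {pooled:.3f} ± {1.96*se:.3f}, tau^2 {tau2:.3f}, I^2 {I2:.0f}%")
```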
Dynamic taxonomies applied to a web-based relational database for geo-hydrological risk mitigation
NASA Astrophysics Data System (ADS)
Sacco, G. M.; Nigrelli, G.; Bosio, A.; Chiarle, M.; Luino, F.
2012-02-01
In its 40 years of activity, the Research Institute for Geo-hydrological Protection of the Italian National Research Council has amassed a vast and varied collection of historical documentation on landslides, muddy-debris flows, and floods in northern Italy from 1600 to the present. Since 2008, the archive resources have been maintained through a relational database management system. The database is used for routine study and research purposes as well as for providing support during geo-hydrological emergencies, when data need to be quickly and accurately retrieved. Retrieval speed and accuracy are the main objectives of an implementation based on a dynamic taxonomies model. Dynamic taxonomies are a general knowledge management model for configuring complex, heterogeneous information bases that support exploratory searching. At each stage of the process, the user can explore or browse the database in a guided yet unconstrained way by selecting the alternatives suggested for further refining the search. Dynamic taxonomies have been successfully applied to such diverse and apparently unrelated domains as e-commerce and medical diagnosis. Here, we describe the application of dynamic taxonomies to our database and compare it to traditional relational database query methods. The dynamic taxonomy interface, essentially a point-and-click interface, is considerably faster and less error-prone than traditional form-based query interfaces that require the user to remember and type in the "right" search keywords. Finally, dynamic taxonomy users have confirmed that one of the principal benefits of this approach is the confidence of having considered all the relevant information. Dynamic taxonomies and relational databases work in synergy to provide fast and precise searching: one of the most important factors in timely response to emergencies.
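A hedged sketch of the guided-refinement idea behind dynamic taxonomies: after each selection the system shows, for every facet value, how many records remain, so users only choose refinements that lead somewhere. The facets used here (event_type, region, century) are illustrative and not the institute's actual schema.

```python
# Hedged sketch: faceted refinement with per-value counts, the core mechanism
# of a dynamic-taxonomy interface.
from collections import Counter

records = [
    {"event_type": "flood", "region": "Piedmont", "century": 1800},
    {"event_type": "landslide", "region": "Liguria", "century": 1900},
    {"event_type": "flood", "region": "Liguria", "century": 1900},
    {"event_type": "debris flow", "region": "Piedmont", "century": 2000},
]

def refine(items, **selected):
    """Keep only records matching every selected facet value."""
    return [r for r in items if all(r[k] == v for k, v in selected.items())]

def facet_counts(items, facet):
    """What the interface would display next to each value of a facet."""
    return Counter(r[facet] for r in items)

current = refine(records, event_type="flood")
print(facet_counts(current, "region"))   # Counter({'Piedmont': 1, 'Liguria': 1})
```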
Geomasking sensitive health data and privacy protection: an evaluation using an E911 database.
Allshouse, William B; Fitch, Molly K; Hampton, Kristen H; Gesink, Dionne C; Doherty, Irene A; Leone, Peter A; Serre, Marc L; Miller, William C
2010-10-01
Geomasking is used to provide privacy protection for individual address information while maintaining spatial resolution for mapping purposes. Donut geomasking and other random perturbation geomasking algorithms rely on the assumption of a homogeneously distributed population to calculate displacement distances, leading to possible under-protection of individuals when this condition is not met. Using household data from 2007, we evaluated the performance of donut geomasking in Orange County, North Carolina. We calculated the estimated k-anonymity for every household based on the assumption of uniform household distribution. We then determined the actual k-anonymity by revealing household locations contained in the county E911 database. Census block groups in mixed-use areas with high population distribution heterogeneity were the most likely to have privacy protection below selected criteria. For heterogeneous populations, we suggest tripling the minimum displacement area in the donut to protect privacy with a less than 1% error rate.
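A minimal sketch of the donut-geomasking step described above, assuming projected coordinates in metres; the radii are illustrative, and a real implementation would size the annulus from local population density to reach a target k-anonymity rather than use fixed values.

```python
# Hedged sketch: displace a point by a random bearing and a random distance
# between an inner and an outer radius (uniform over the annulus area), so the
# true location is hidden but never moved too little or too far.
import math
import random

def donut_mask(x, y, r_min, r_max, rng=random):
    theta = rng.uniform(0.0, 2.0 * math.pi)           # random bearing
    r = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))  # uniform over annulus area
    return x + r * math.cos(theta), y + r * math.sin(theta)

random.seed(1)
print(donut_mask(1000.0, 2000.0, r_min=50.0, r_max=250.0))  # masked coordinates (m)
```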
Geomasking sensitive health data and privacy protection: an evaluation using an E911 database
Allshouse, William B; Fitch, Molly K; Hampton, Kristen H; Gesink, Dionne C; Doherty, Irene A; Leone, Peter A; Serre, Marc L; Miller, William C
2010-01-01
Geomasking is used to provide privacy protection for individual address information while maintaining spatial resolution for mapping purposes. Donut geomasking and other random perturbation geomasking algorithms rely on the assumption of a homogeneously distributed population to calculate displacement distances, leading to possible under-protection of individuals when this condition is not met. Using household data from 2007, we evaluated the performance of donut geomasking in Orange County, North Carolina. We calculated the estimated k-anonymity for every household based on the assumption of uniform household distribution. We then determined the actual k-anonymity by revealing household locations contained in the county E911 database. Census block groups in mixed-use areas with high population distribution heterogeneity were the most likely to have privacy protection below selected criteria. For heterogeneous populations, we suggest tripling the minimum displacement area in the donut to protect privacy with a less than 1% error rate. PMID:20953360
Yun Chen; Hui Yang
2014-01-01
The rapid advancements of biomedical instrumentation and healthcare technology have resulted in data-rich environments in hospitals. However, the meaningful information extracted from these rich datasets is limited. There is a dire need to go beyond current medical practices and develop data-driven methods and tools that will enable and help (i) the handling of big data, (ii) the extraction of data-driven knowledge, and (iii) the exploitation of acquired knowledge for optimizing clinical decisions. The present study focuses on the prediction of mortality rates in Intensive Care Units (ICU) using patient-specific healthcare recordings. It is worth mentioning that postsurgical monitoring in the ICU leads to massive datasets with unique properties, e.g., variable heterogeneity, patient heterogeneity, and time asynchronization. To cope with the challenges in ICU datasets, we developed a postsurgical decision support system with a series of analytical tools, including data categorization, data pre-processing, feature extraction, feature selection, and predictive modeling. Experimental results show that the proposed data-driven methodology outperforms traditional approaches and yields better results based on the evaluation of real-world ICU data from 4000 subjects in the database. This research shows great potential for the use of data-driven analytics to improve the quality of healthcare services.
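A hedged sketch of the kind of pipeline the abstract enumerates (pre-processing, feature selection, predictive modeling) applied to mortality prediction; synthetic data stands in for the ICU recordings, and nothing here reproduces the authors' actual features or model.

```python
# Hedged sketch: imputation -> scaling -> feature selection -> classifier,
# evaluated with cross-validated AUC on imbalanced synthetic data.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=30, weights=[0.9, 0.1],
                           random_state=0)           # imbalanced, like mortality
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),     # handle missing vitals/labs
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),         # feature selection
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
print("AUC:", cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean().round(3))
```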
Odronitz, Florian; Kollmar, Martin
2006-11-29
Annotation of the protein sequences of eukaryotic organisms is crucial for understanding their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relation and the assignment of function involves information from various sources. This often leads to a collection of heterogeneous data, which is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. To date, there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database-driven online working environment for the analysis of manually annotated protein sequences and their relationships. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations and sequencing projects, as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content. We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that are generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein.
NASA Astrophysics Data System (ADS)
Bikakis, Nikos; Gioldasis, Nektarios; Tsinaraki, Chrisa; Christodoulakis, Stavros
SPARQL is today the standard access language for Semantic Web data. In recent years XML databases have also acquired industrial importance due to the widespread applicability of XML on the Web. In this paper we present a framework that bridges the heterogeneity gap and creates an interoperable environment where SPARQL queries are used to access XML databases. Our approach assumes that fairly generic mappings between ontology constructs and XML Schema constructs have been automatically derived or manually specified. The mappings are used to automatically translate SPARQL queries into semantically equivalent XQuery queries, which are used to access the XML databases. We present the algorithms and the implementation of the SPARQL2XQuery framework, which is used for answering SPARQL queries over XML databases.
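To give a feel for the mapping-driven translation, here is a deliberately simplified sketch: a single SPARQL triple pattern is rewritten into an XQuery FLWOR expression using a mapping from an ontology property to XPath locations. The mapping entries are hypothetical, and a real SPARQL-to-XQuery translator (such as the framework above) handles full graph patterns, filters, and schema variants.

```python
# Hedged sketch: rewrite one triple pattern ?s foaf:name ?n into XQuery,
# given a toy ontology-to-XPath mapping.
mapping = {
    # ontology property -> (elements bound to the subject, path to the value)
    "foaf:name": ("//person", "name/text()"),
}

def triple_to_xquery(prop):
    element_path, value_path = mapping[prop]
    return (f"for $s in {element_path}\n"
            f"return <result>{{$s/{value_path}}}</result>")

print(triple_to_xquery("foaf:name"))
```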
Patel, Tejas K; Patel, Parvati B
2018-06-01
The aim of this study was to estimate the prevalence of mortality among patients due to adverse drug reactions that lead to hospitalisation (fatal ADRAd), to explore the heterogeneity in its estimation through subgroup analysis of study characteristics, and to identify system-organ classes involved and causative drugs for fatal ADRAd. We identified prospective ADRAd-related studies via screening of the PubMed and Google Scholar databases with appropriate key terms. We estimated the prevalence of fatal ADRAd using a double arcsine method and explored heterogeneity using the following study characteristics: age groups, wards, study region, ADR definitions, ADR identification methods, study duration and sample size. We examined patterns of fatal ADRAd and causative drugs. Among 312 full-text articles assessed, 49 studies satisfied the selection criteria and were included in the analysis. The mean prevalence of fatal ADRAd was 0.20% (95% CI: 0.13-0.27%; I² = 93%). The age groups and study wards were the important heterogeneity modifiers. The mean fatal ADRAd prevalence varied from 0.01% in paediatric patients to 0.44% in the elderly. Subgroup analysis showed a higher prevalence of fatal ADRAd in intensive care units, emergency departments, multispecialty wards and whole hospitals. Computer-based monitoring systems in combination with other methods detected higher mortality. Intracranial haemorrhage, renal failure and gastrointestinal bleeding accounted for more than 50% of fatal ADRAd cases. Warfarin, aspirin, renin-angiotensin system (RAS) inhibitors and digoxin accounted for 60% of fatal ADRAd. ADRAd is an important cause of mortality. Strategies targeting the safer use of warfarin, aspirin, RAS inhibitors and digoxin could reduce the large number of fatal ADRAd cases.
Chiba, Hirokazu; Nishide, Hiroyo; Uchiyama, Ikuo
2015-01-01
Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using the Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While the OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of Resource Description Framework (RDF) and made it available through the SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on the OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. Besides, the ortholog information from different data sources can be compared with each other using the OrthO as a shared ontology. Here we show some examples demonstrating that the ortholog information described in RDF can be used to link various biological data such as taxonomy information and Gene Ontology. Thus, the ortholog database using the Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis.
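A hedged sketch of how such an RDF ortholog resource might be queried from a script via a SPARQL endpoint; the endpoint URL and the property names below are placeholders, not the actual MBGD endpoint or OrthO vocabulary.

```python
# Hedged sketch: query a (placeholder) ortholog SPARQL endpoint with SPARQLWrapper.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://example.org/sparql")   # placeholder endpoint
endpoint.setQuery("""
PREFIX orth: <http://example.org/ortho#>
SELECT ?gene ?orthologGroup WHERE {
  ?orthologGroup a orth:OrthologGroup ;
                 orth:member ?gene .
} LIMIT 10
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["gene"]["value"], row["orthologGroup"]["value"])
```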
Biomedical data integration in computational drug design and bioinformatics.
Seoane, Jose A; Aguiar-Pulido, Vanessa; Munteanu, Cristian R; Rivero, Daniel; Rabunal, Juan R; Dorado, Julian; Pazos, Alejandro
2013-03-01
In recent years, in the post genomic era, more and more data is being generated by biological high throughput technologies, such as proteomics and transcriptomics. This omics data can be very useful, but the real challenge is to analyze all this data, as a whole, after integrating it. Biomedical data integration enables making queries to different, heterogeneous and distributed biomedical data sources. Data integration solutions can be very useful not only in the context of drug design, but also in biomedical information retrieval, clinical diagnosis, system biology, etc. In this review, we analyze the most common approaches to biomedical data integration, such as federated databases, data warehousing, multi-agent systems and semantic technology, as well as the solutions developed using these approaches in the past few years.
A local space time kriging approach applied to a national outpatient malaria data set
NASA Astrophysics Data System (ADS)
Gething, P. W.; Atkinson, P. M.; Noor, A. M.; Gikandi, P. W.; Hay, S. I.; Nixon, M. S.
2007-10-01
Increases in the availability of reliable health data are widely recognised as essential for efforts to strengthen health-care systems in resource-poor settings worldwide. Effective health-system planning requires comprehensive and up-to-date information on a range of health metrics and this requirement is generally addressed by a Health Management Information System (HMIS) that coordinates the routine collection of data at individual health facilities and their compilation into national databases. In many resource-poor settings, these systems are inadequate and national databases often contain only a small proportion of the expected records. In this paper, we take an important health metric in Kenya (the proportion of outpatient treatments for malaria (MP)) from the national HMIS database and predict the values of MP at facilities where monthly records are missing. The available MP data were densely distributed across a spatiotemporal domain and displayed second-order heterogeneity. We used three different kriging methodologies to make cross-validation predictions of MP in order to test the effect on prediction accuracy of (a) the extension of a spatial-only to a space-time prediction approach, and (b) the replacement of a globally stationary with a locally varying random function model. Space-time kriging was found to produce predictions with 98.4% less mean bias and 14.8% smaller mean imprecision than conventional spatial-only kriging. A modification of space-time kriging that allowed space-time variograms to be recalculated for every prediction location within a spatially local neighbourhood resulted in a larger decrease in mean imprecision over ordinary kriging (18.3%) although the mean bias was reduced less (87.5%).
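To make the kriging mechanics concrete, here is a hedged sketch of ordinary kriging with a crude combined space-time distance. It shows the general recipe (variogram, kriging system, weights, prediction); it does not reproduce the specific space-time variogram models or local-neighbourhood scheme compared in the paper, and the data are made up.

```python
# Hedged sketch: ordinary kriging prediction with an exponential variogram and
# a simple space-time distance metric.
import numpy as np

def gamma(h, sill=1.0, rng=3.0):
    """Exponential variogram."""
    return sill * (1.0 - np.exp(-h / rng))

def st_dist(p, q, time_scale=0.5):
    """Combine spatial lag and a rescaled temporal lag into one distance."""
    dx, dy, dt = p[0] - q[0], p[1] - q[1], p[2] - q[2]
    return np.sqrt(dx**2 + dy**2 + (time_scale * dt)**2)

def ordinary_kriging(samples, values, target):
    n = len(samples)
    A = np.ones((n + 1, n + 1)); A[n, n] = 0.0       # bordered kriging matrix
    for i in range(n):
        for j in range(n):
            A[i, j] = gamma(st_dist(samples[i], samples[j]))
    b = np.ones(n + 1)
    b[:n] = [gamma(st_dist(s, target)) for s in samples]
    weights = np.linalg.solve(A, b)[:n]              # last entry is the Lagrange multiplier
    return float(weights @ values)

pts = [(0, 0, 0), (1, 0, 1), (0, 2, 2), (2, 1, 0)]    # (x, y, month)
vals = np.array([0.30, 0.25, 0.40, 0.20])             # e.g. malaria proportion MP
print(round(ordinary_kriging(pts, vals, (1, 1, 1)), 3))
```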
A local space–time kriging approach applied to a national outpatient malaria data set
Gething, P.W.; Atkinson, P.M.; Noor, A.M.; Gikandi, P.W.; Hay, S.I.; Nixon, M.S.
2007-01-01
Increases in the availability of reliable health data are widely recognised as essential for efforts to strengthen health-care systems in resource-poor settings worldwide. Effective health-system planning requires comprehensive and up-to-date information on a range of health metrics and this requirement is generally addressed by a Health Management Information System (HMIS) that coordinates the routine collection of data at individual health facilities and their compilation into national databases. In many resource-poor settings, these systems are inadequate and national databases often contain only a small proportion of the expected records. In this paper, we take an important health metric in Kenya (the proportion of outpatient treatments for malaria (MP)) from the national HMIS database and predict the values of MP at facilities where monthly records are missing. The available MP data were densely distributed across a spatiotemporal domain and displayed second-order heterogeneity. We used three different kriging methodologies to make cross-validation predictions of MP in order to test the effect on prediction accuracy of (a) the extension of a spatial-only to a space–time prediction approach, and (b) the replacement of a globally stationary with a locally varying random function model. Space–time kriging was found to produce predictions with 98.4% less mean bias and 14.8% smaller mean imprecision than conventional spatial-only kriging. A modification of space–time kriging that allowed space–time variograms to be recalculated for every prediction location within a spatially local neighbourhood resulted in a larger decrease in mean imprecision over ordinary kriging (18.3%) although the mean bias was reduced less (87.5%). PMID:19424510
XML-based approaches for the integration of heterogeneous bio-molecular data.
Mesiti, Marco; Jiménez-Ruiz, Ernesto; Sanz, Ismael; Berlanga-Llavori, Rafael; Perlasca, Paolo; Valentini, Giorgio; Manset, David
2009-10-15
Today's public database infrastructure spans a very large collection of heterogeneous biological data, opening new opportunities for molecular biology, bio-medical and bioinformatics research, but also raising new problems for their integration and computational processing. In this paper we survey the most interesting and novel approaches for the representation, integration and management of different kinds of biological data by exploiting XML and the related recommendations and approaches. Moreover, we present new and interesting cutting-edge approaches for the appropriate management of heterogeneous biological data represented through XML. XML has succeeded in the integration of heterogeneous biomolecular information, and has established itself as the syntactic glue for biological data sources. Nevertheless, a large variety of XML-based data formats have been proposed, making an effective integration of bioinformatics data schemes difficult. The adoption of a few semantically rich standard formats is urgent to achieve a seamless integration of the current biological resources.
Rhodes, Kirsty M; Turner, Rebecca M; Higgins, Julian P T
2015-01-01
Estimation of between-study heterogeneity is problematic in small meta-analyses. Bayesian meta-analysis is beneficial because it allows incorporation of external evidence on heterogeneity. To facilitate this, we provide empirical evidence on the likely heterogeneity between studies in meta-analyses relating to specific research settings. Our analyses included 6,492 continuous-outcome meta-analyses within the Cochrane Database of Systematic Reviews. We investigated the influence of meta-analysis settings on heterogeneity by modeling study data from all meta-analyses on the standardized mean difference scale. Meta-analysis setting was described according to outcome type, intervention comparison type, and medical area. Predictive distributions for between-study variance expected in future meta-analyses were obtained, which can be used directly as informative priors. Among outcome types, heterogeneity was found to be lowest in meta-analyses of obstetric outcomes. Among intervention comparison types, heterogeneity was lowest in meta-analyses comparing two pharmacologic interventions. Predictive distributions are reported for different settings. In two example meta-analyses, incorporating external evidence led to a more precise heterogeneity estimate. Heterogeneity was influenced by meta-analysis characteristics. Informative priors for between-study variance were derived for each specific setting. Our analyses thus assist the incorporation of realistic prior information into meta-analyses including few studies. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Newborn screening healthcare information system based on service-oriented architecture.
Hsieh, Sung-Huai; Hsieh, Sheau-Ling; Chien, Yin-Hsiu; Weng, Yung-Ching; Hsu, Kai-Ping; Chen, Chi-Huang; Tu, Chien-Ming; Wang, Zhenyu; Lai, Feipei
2010-08-01
In this paper, we established a newborn screening system under the HL7/Web Services frameworks. We rebuilt the NTUH Newborn Screening Laboratory's original standalone architecture, in which various heterogeneous systems operated individually, and restructured it into a Service-Oriented Architecture (SOA) distributed platform for further integrity and enhancement of sample collection, testing, diagnosis, evaluation, treatment and follow-up services, and screening database management, as well as collaboration and communication among hospitals; decision support and improvement of screening accuracy over the Taiwan neonatal systems are also addressed. In addition, the new system not only integrates the newborn screening procedures among phlebotomy clinics, referral hospitals, and the newborn screening center in Taiwan, but also introduces new models of screening procedures for the associated medical practitioners. Furthermore, it reduces the burden of manual operations, especially the reporting services, which were heavily relied upon previously. The new system accelerates the whole procedure effectively and efficiently, and improves the accuracy and reliability of screening by ensuring quality control throughout processing.
Heterogenous database integration in a physician workstation.
Annevelink, J; Young, C Y; Tang, P C
1991-01-01
We discuss the integration of a variety of data and information sources in a Physician Workstation (PWS), focusing on the integration of data from DHCP, the Veterans Administration's Distributed Hospital Computer Program. We designed a logically centralized, object-oriented data schema, used by end users and applications to explore the data accessible through an object-oriented database using a declarative query language. We emphasize the use of procedural abstraction to transparently integrate a variety of information sources into the data schema.
Heterogenous database integration in a physician workstation.
Annevelink, J.; Young, C. Y.; Tang, P. C.
1991-01-01
We discuss the integration of a variety of data and information sources in a Physician Workstation (PWS), focusing on the integration of data from DHCP, the Veterans Administration's Distributed Hospital Computer Program. We designed a logically centralized, object-oriented data schema, used by end users and applications to explore the data accessible through an object-oriented database using a declarative query language. We emphasize the use of procedural abstraction to transparently integrate a variety of information sources into the data schema. PMID:1807624
The French network of hydrogeological sites H+
NASA Astrophysics Data System (ADS)
Davy, P.; Le Borgne, T.; Bour, O.; Gautier, S.; Porel, G.; Bodin, J.; de Dreuzy, J.; Pezard, P.
2008-12-01
For groundwater issues (potential leakages in waste repositories, aquifer management, etc.), the development of modeling techniques is far ahead of the actual knowledge of aquifers. This raises two fundamental issues: 1) which data, and how much, are necessary to make predictions accurate enough for aquifer management; 2) which models remain relevant to describe the heterogeneity and complexity of geological systems. The French observatory H+ was created in 2002 with the twofold motivation of acquiring a large database for validating models of heterogeneous aquifers and of surveying the evolution of groundwater quality in the context of environmental changes. H+ is a network of four sites (Ploemeur, Brittany, France; HES Poitiers, France; Cadarache, France; Campos, Mallorca, Spain) with different geological, climatic, and economic contexts. All of them are characterized by a highly heterogeneous structure (fractured crystalline basement at Ploemeur; karstified and fractured limestone at Poitiers, Cadarache and Mallorca), which is far from being captured by basic models. Ploemeur has been exploited as a tap-water supply for a medium-size coastal city (15,000 inhabitants) for 20 years. Each site is developed for long-term investigation and monitoring. They involve a dense network of boreholes, detailed geological and geophysical surveys, periodic campaigns and/or permanent measurements of groundwater flow, water chemistry, geophysical signals (including ground motions), climatic parameters, etc. Several large-scale flow experiments are scheduled per year to investigate the aquifer structure with combined geophysical, hydrogeological, and geochemical instruments. All this information is recorded in a database that has been developed to improve the sustainability and quality of the data, and to be used as a collaborative tool for both site researchers and modelers. This project has now run for 5 years. That is a short time in which to collect the amount of information necessary to apprehend the complexity of aquifers, but it is already enough to obtain a few important scientific results about the very nature of flow heterogeneity, the origin and residence time of water elements, the kinetics of geochemical processes, etc. We have also developed new methods to investigate aquifers (in-situ flow measurements, flow experiment designs, groundwater dating, versatile in-situ probes, etc.). This experience, aimed at building up long-term knowledge, appears extremely useful for addressing critical issues related to groundwater aquifers: the structure and occurrence of productive aquifers in crystalline basement, the assessment of aquifer protection areas in the context of highly heterogeneous flow, biochemical reactivity processes, and the long-term evolution of both water quantity and quality in the context of significant environmental changes.
Device Data Ingestion for Industrial Big Data Platforms with a Case Study †
Ji, Cun; Shao, Qingshi; Sun, Jiao; Liu, Shijun; Pan, Li; Wu, Lei; Yang, Chenglei
2016-01-01
Despite having played a significant role in the Industry 4.0 era, the Internet of Things is currently faced with the challenge of how to ingest large-scale heterogeneous and multi-type device data. In response to this problem, we present a heterogeneous device data ingestion model for an industrial big data platform. The model includes device templates and four strategies for data synchronization, data slicing, data splitting, and data indexing. With this heterogeneous device data ingestion model, we can ingest device data from multiple sources; the model has been verified on our industrial big data platform. In addition, we present a case study on device data-based scenario analysis of industrial big data. PMID:26927121
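The slicing and splitting strategies named in this abstract can be sketched in a few lines; the template format and field names below are assumptions for illustration only, not the paper's actual schema.

```python
# Toy sketch of data slicing (time windows) and data splitting (per-variable series).
from collections import defaultdict

DEVICE_TEMPLATE = {"device_type": "press", "fields": ["ts", "temperature", "vibration"]}

def slice_by_window(records, window_s=60):
    """Group raw readings into fixed time windows (data slicing)."""
    slices = defaultdict(list)
    for r in records:
        slices[r["ts"] // window_s].append(r)
    return slices

def split_by_field(records, fields=("temperature", "vibration")):
    """Split multi-variable readings into per-variable series (data splitting)."""
    return {f: [(r["ts"], r[f]) for r in records if f in r] for f in fields}

records = [{"ts": 0, "temperature": 71.2, "vibration": 0.3},
           {"ts": 61, "temperature": 72.0, "vibration": 0.4}]
index = {window: len(rows) for window, rows in slice_by_window(records).items()}
print(split_by_field(records), index)
```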
dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts
Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre
2013-01-01
The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284
A scalable healthcare information system based on a service-oriented architecture.
Yang, Tzu-Hsiang; Sun, Yeali S; Lai, Feipei
2011-06-01
Many existing healthcare information systems are composed of a number of heterogeneous systems and face the important issue of system scalability. This paper first describes the comprehensive healthcare information systems used in National Taiwan University Hospital (NTUH) and then presents a service-oriented architecture (SOA)-based healthcare information system (HIS) based on the HL7 service standard. The proposed architecture focuses on system scalability, in terms of both hardware and software. Moreover, we describe how scalability is implemented in rightsizing, service groups, databases, and hardware scalability. Although SOA-based systems sometimes display poor performance, a performance evaluation of our SOA-based HIS shows that the average response times for the outpatient, inpatient, and emergency HL7 Central systems are 0.035, 0.04, and 0.036 s, respectively. The outpatient, inpatient, and emergency WebUI average response times are 0.79, 1.25, and 0.82 s. The scalability of the rightsizing project and our evaluation results provide evidence that the proposed SOA HIS can deliver system scalability and sustainability in a highly demanding healthcare information system.
A collaborative computer auditing system under SOA-based conceptual model
NASA Astrophysics Data System (ADS)
Cong, Qiushi; Huang, Zuoming; Hu, Jibing
2013-03-01
Some of the current challenges of computer auditing are the obstacles to retrieving, converting, and translating data from different database schemas. During the last few years, many data exchange standards have been under continuous development, such as the Extensible Business Reporting Language (XBRL). These XML document standards can be used for data exchange among companies, financial institutions, and audit firms. However, for many companies it is still expensive and time-consuming to translate and provide XML messages with commercial application packages, because it is complicated and laborious to search and transform data from thousands of tables in ERP databases. How to transfer transaction documents to support continuous auditing or real-time auditing between audit firms and their client companies is an important topic. In this paper, a collaborative computer auditing system under an SOA-based conceptual model is proposed. By utilizing the widely used XML document standards and existing data transformation applications developed by different companies and software vendors, we can wrap these applications as commercial web services that are easily implemented under the forthcoming application environment: service-oriented architecture (SOA). Under SOA environments, the multi-agency mechanism will help data assurance services over the Internet mature and gain popularity. By wrapping data transformation components with heterogeneous databases or platforms, new component markets will emerge, composed of many software vendors and assurance service companies providing data assurance services for audit firms, regulators, or third parties.
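The wrapping idea in this abstract (exposing an existing data-transformation component as a web service) could look roughly like the sketch below. The endpoint name, payload format, and XBRL-like element names are assumptions for illustration; a production SOA deployment would use SOAP/WSDL or a managed service bus rather than a bare Flask app.

```python
# Hypothetical sketch: wrap a transformation component as a small web service.
from flask import Flask, request, Response

app = Flask(__name__)

def to_xbrl_fragment(rows):
    # Toy transformation: ERP-style rows -> a simplified XBRL-like XML fragment.
    items = "".join(
        f'<item account="{r["account"]}" amount="{r["amount"]}"/>' for r in rows
    )
    return f"<xbrl-fragment>{items}</xbrl-fragment>"

@app.route("/transform", methods=["POST"])
def transform():
    rows = request.get_json()          # e.g. [{"account": "4000", "amount": 120.5}]
    return Response(to_xbrl_fragment(rows), mimetype="application/xml")

if __name__ == "__main__":
    app.run(port=8080)  # audit clients would invoke this service over the SOA bus
```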
Integration of Schemas on the Pre-Design Level Using the KCPM-Approach
NASA Astrophysics Data System (ADS)
Vöhringer, Jürgen; Mayr, Heinrich C.
Integration is a central research and operational issue in information system design and development. It can be conducted at the system, schema, view, or data level. At the system level, integration deals with the progressive linking and testing of system components to merge their functional and technical characteristics and behavior into a comprehensive, interoperable system. Schema integration comprises the comparison and merging of two or more schemas, usually conceptual database schemas. Data integration deals with merging the contents of multiple sources of related data. View integration is similar to schema integration but focuses on views and the queries defined on them rather than on the schemas themselves. All these types of integration have in common that two or more sources are first compared, in order to identify matches and mismatches as well as conflicts and inconsistencies, and then merged. The sources may stem from heterogeneous companies, organizational units, or projects. Integration enables the reuse and combined use of source components.
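The compare-then-merge step common to all the integration types listed above can be sketched as follows. The schema representation (attribute-to-type dictionaries), the matching rule (simple name normalization), and the conflict-resolution policy are illustrative assumptions, not the KCPM approach itself.

```python
# Toy compare-then-merge over two simple schemas (assumed representation).
def normalize(name):
    return name.lower().replace("_", "")

def compare_schemas(a, b):
    matches, conflicts = [], []
    for na, ta in a.items():
        for nb, tb in b.items():
            if normalize(na) == normalize(nb):
                (matches if ta == tb else conflicts).append((na, nb, ta, tb))
    return matches, conflicts

def merge_schemas(a, b):
    matches, conflicts = compare_schemas(a, b)
    covered = {nb for _, nb, _, _ in matches + conflicts}
    merged = dict(a)  # on conflict, keep a's definition (a naive resolution policy)
    merged.update({n: t for n, t in b.items() if n not in covered})
    return merged

customer = {"CustomerName": "string", "birth_date": "date"}
client = {"customername": "string", "BirthDate": "string"}  # type conflict on birth date
print(compare_schemas(customer, client))
print(merge_schemas(customer, client))
```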
Heterogeneity of long-history migration predicts emotion recognition accuracy.
Wood, Adrienne; Rychlowska, Magdalena; Niedenthal, Paula M
2016-06-01
Recent work (Rychlowska et al., 2015) demonstrated the power of a relatively new cultural dimension, historical heterogeneity, in predicting cultural differences in the endorsement of emotion expression norms. Historical heterogeneity describes the number of source countries that have contributed to a country's present-day population over the last 500 years. People in cultures originating from a large number of source countries may have historically benefited from greater and clearer emotional expressivity, because they lacked a common language and well-established social norms. We therefore hypothesized that in addition to endorsing more expressive display rules, individuals from heterogeneous cultures will also produce facial expressions that are easier to recognize by people from other cultures. By reanalyzing cross-cultural emotion recognition data from 92 papers and 82 cultures, we show that emotion expressions of people from heterogeneous cultures are more easily recognized by observers from other cultures than are the expressions produced in homogeneous cultures. Heterogeneity influences expression recognition rates alongside the individualism-collectivism of the perceivers' culture, as more individualistic cultures were more accurate in emotion judgments than collectivistic cultures. This work reveals the present-day behavioral consequences of long-term historical migration patterns and demonstrates the predictive power of historical heterogeneity. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Atlas - a data warehouse for integrative bioinformatics.
Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire M S; Ling, John; Ouellette, B F Francis
2005-02-21
We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/
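The loader/retrieval division of labor described for Atlas can be illustrated with a toy relational sketch. The table layout, accession values, and function names below are simplified assumptions, not the actual Atlas schema or API (Atlas provides C++, Java, and Perl APIs over its own relational models).

```python
# Toy loader and retrieval sketch in the spirit of the Atlas design.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence (acc TEXT PRIMARY KEY, organism TEXT, seq TEXT)")
conn.execute("CREATE TABLE interaction (acc_a TEXT, acc_b TEXT, source TEXT)")

def load_sequences(records):
    """Loader application: parse source records and insert them."""
    conn.executemany("INSERT INTO sequence VALUES (?, ?, ?)",
                     [(r["acc"], r["organism"], r["seq"]) for r in records])

def interactions_for(acc):
    """Toolbox/API call: retrieve interactions touching an accession."""
    cur = conn.execute(
        "SELECT acc_a, acc_b, source FROM interaction WHERE acc_a = ? OR acc_b = ?",
        (acc, acc))
    return cur.fetchall()

load_sequences([{"acc": "NM_000546", "organism": "Homo sapiens", "seq": "ATG..."}])
conn.execute("INSERT INTO interaction VALUES ('NM_000546', 'NM_002524', 'BIND')")
print(interactions_for("NM_000546"))
```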
Atlas – a data warehouse for integrative bioinformatics
Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire MS; Ling, John; Ouellette, BF Francis
2005-01-01
Background We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. Description The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. Conclusion The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: PMID:15723693
SLIVISU, an Interactive Visualisation Framework for Analysis of Geological Sea-Level Indicators
NASA Astrophysics Data System (ADS)
Klemann, V.; Schulte, S.; Unger, A.; Dransch, D.
2011-12-01
Complementing data analysis in earth system sciences with advanced visualisation tools is increasingly important given the rising complexity, amount, and variety of available data. The analysis of sea-level indicators (SLIs) in earth-system applications, such as modelling and simulation on regional or global scales, demands the consideration of large amounts of data - thousands of SLIs - and therefore requires going beyond the analysis of single sea-level curves. On the other hand, a gross analysis by means of statistical methods is hindered by the often heterogeneous and individual character of the single SLIs, i.e., the spatio-temporal context and the often heterogeneous information are difficult to handle or to represent in an objective way. Therefore a concept integrating automated analysis and visualisation is mandatory; this is provided by visual analytics. As an implementation of this concept, we present the visualisation framework SLIVISU, developed at GFZ, which is based on multiple linked views and provides a synoptic analysis of observational data, model configurations, model outputs, and results of automated analysis in glacial isostatic adjustment. Starting as a visualisation tool for an existing database of SLIs, it now serves as an analysis tool for the evaluation of model simulations in studies of glacial-isostatic adjustment.
Systems heterogeneity: An integrative way to understand cancer heterogeneity.
Wang, Diane Catherine; Wang, Xiangdong
2017-04-01
The concept of systems heterogeneity was first coined and explained in this Special Issue as a new alternative for understanding the importance and complexity of heterogeneity in cancer. Systems heterogeneity can offer a full image of heterogeneity across multi-dimensional functions and multi-omics by integrating gene or protein expression, epigenetics, sequencing, phosphorylation, transcription, pathways, or interactions. The Special Issue starts with the roles of epigenetics in the initiation and development of cancer heterogeneity through the interaction between permanent genetic mutations and dynamic epigenetic alterations. Cell heterogeneity is defined as the difference in biological function and phenotype between cells in the same organ/tissue or in different organs, along with the associated challenges, as exemplified by telocytes. Single-cell heterogeneity has value for identifying diagnostic biomarkers and therapeutic targets, and single-cell systems heterogeneity has clinical potential in oncology. A number of signaling pathways and factors contribute to the development of systems heterogeneity. Proteomic heterogeneity can change the strategy and thinking of drug discovery and development by clarifying the interactions between proteins, or between proteins and drugs, in order to optimize drug efficacy and safety. The association of cancer heterogeneity with cancer cell evolution and metastasis is also reviewed as a new alternative source of diagnostic biomarkers and therapeutic targets in clinical application. Copyright © 2016 Elsevier Ltd. All rights reserved.
Bifurcation analysis of a heterogeneous traffic flow model
NASA Astrophysics Data System (ADS)
Wang, Yu-Qing; Yan, Bo-Wen; Zhou, Chao-Fan; Li, Wei-Kang; Jia, Bin
2018-03-01
In this work, a heterogeneous traffic flow model coupled with a periodic boundary condition is proposed. Building on previous models, a heterogeneous system composed of more than one kind of vehicle is considered. Through bifurcation analysis, the bifurcation patterns of the heterogeneous system are discussed in detail for three situations and illustrated by bifurcation diagrams. In addition, a stability analysis of the heterogeneous system is performed to test its anti-interference ability, and the relationship between the number of vehicles and stability is obtained. Furthermore, attractor analysis is applied to investigate the behavior of the heterogeneous system near its steady-state neighborhood. Phase diagrams of the process by which the heterogeneous system evolves from its initial state to the equilibrium state are presented.
NASA Astrophysics Data System (ADS)
Elag, M.; Kumar, P.
2014-12-01
Often, scientists and small research groups collect data that target specific issues and have limited geographic or temporal range. A large number of such collections together constitute a large database that is of immense value to Earth Science studies. The complexity of integrating these data includes heterogeneity in dimensions, coordinate systems, scales, variables, providers, users, and contexts. They have been defined as long-tail data. Similarly, we use "long-tail models" to characterize a heterogeneous collection of models and/or modules developed for targeted problems by individuals and small groups, which together provide a large and valuable collection. The complexity of integrating across these models includes differing variable names and units for the same concept, model runs at different time steps and spatial resolutions, use of differing naming and reference conventions, etc. The ability to integrate long-tail models and data will provide an opportunity for the interoperability and reusability of communities' resources, where not only can models be combined in a workflow, but each model will be able to discover and (re)use data in the application-specific context of space, time, and questions. This capability is essential to represent, understand, predict, and manage heterogeneous and interconnected processes and activities by harnessing the complex, heterogeneous, and extensive set of distributed resources. Because of the staggering production rate of long-tail models and data resulting from advances in computational, sensing, and information technologies, an important challenge arises: how can geoinformatics bring together these resources seamlessly, given the inherent complexity among model and data resources that span various domains? We will present a semantic-based framework to support integration of "long-tail" models and data. This builds on existing technologies including: (i) SEAD (Sustainable Environmental Actionable Data), which supports curation and preservation of long-tail data during its life-cycle; (ii) BrownDog, which enhances the machine interpretability of large unstructured and uncurated data; and (iii) CSDMS (Community Surface Dynamics Modeling System), which "componentizes" models by providing a plug-and-play environment for model integration.
Acknowledging patient heterogeneity in economic evaluation : a systematic literature review.
Grutters, Janneke P C; Sculpher, Mark; Briggs, Andrew H; Severens, Johan L; Candel, Math J; Stahl, James E; De Ruysscher, Dirk; Boer, Albert; Ramaekers, Bram L T; Joore, Manuela A
2013-02-01
Patient heterogeneity is the part of variability that can be explained by certain patient characteristics (e.g. age, disease stage). Population reimbursement decisions that acknowledge patient heterogeneity could potentially save money and increase population health. To date, however, economic evaluations pay only limited attention to patient heterogeneity. The objective of the present paper is to provide a comprehensive overview of the current knowledge regarding patient heterogeneity within economic evaluation of healthcare programmes. A systematic literature review was performed to identify methodological papers on the topic of patient heterogeneity in economic evaluation. Data were obtained using a keyword search of the PubMed database and manual searches. Handbooks were also included. Relevant data were extracted regarding potential sources of patient heterogeneity, in which of the input parameters of an economic evaluation these occur, methods to acknowledge patient heterogeneity and specific concerns associated with this acknowledgement. A total of 20 articles and five handbooks were included. The relevant sources of patient heterogeneity (demographics, preferences and clinical characteristics) and the input parameters where they occurred (baseline risk, treatment effect, health state utility and resource utilization) were combined in a framework. Methods were derived for the design, analysis and presentation phases of an economic evaluation. Concerns related mainly to the danger of false-positive results and equity issues. By systematically reviewing current knowledge regarding patient heterogeneity within economic evaluations of healthcare programmes, we provide guidance for future economic evaluations. Guidance is provided on which sources of patient heterogeneity to consider, how to acknowledge them in economic evaluation and potential concerns. The improved acknowledgement of patient heterogeneity in future economic evaluations may well improve the efficiency of healthcare.
Odronitz, Florian; Kollmar, Martin
2006-01-01
Background Annotation of the protein sequences of eukaryotic organisms is crucial for understanding their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relations, and the assignment of function involve information from various sources. This often leads to a collection of heterogeneous data that is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. To date, there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Description Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database-driven online working environment for the analysis of manually annotated protein sequences and their relationships. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations, and sequencing projects, as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content. Conclusion We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that is generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein. PMID:17134497
A Chronostratigraphic Relational Database Ontology
NASA Astrophysics Data System (ADS)
Platon, E.; Gary, A.; Sikora, P.
2005-12-01
A chronostratigraphic research database was donated by British Petroleum to the Stratigraphy Group at the Energy and Geoscience Institute (EGI), University of Utah. These data consist of over 2,000 measured sections representing over three decades of research into the application of the graphic correlation method. The data are global and include both microfossil (foraminifera, calcareous nannoplankton, spores, pollen, dinoflagellate cysts, etc.) and macrofossil data. The objective of the donation was to make the research data available to the public in order to encourage additional chronostratigraphy studies, specifically regarding graphic correlation. As part of the National Science Foundation's Cyberinfrastructure for the Geosciences (GEON) initiative, these data have been made available to the public at http://css.egi.utah.edu. To encourage further research using the graphic correlation method, EGI has developed a software package, StrataPlot, that will soon be publicly available from the GEON website as a standalone software download. The EGI chronostratigraphy research database, although relatively large, has many data holes relative to some paleontological disciplines and geographical areas, so the challenge becomes how to expand the data available for chronostratigraphic studies using graphic correlation. There are several public or soon-to-be-public databases available for chronostratigraphic research, but they have their own data structures and modes of presentation. The heterogeneous nature of these database schemas hinders their integration and makes it difficult for the user to retrieve and consolidate potentially valuable chronostratigraphic data. The integration of these data sources would facilitate rapid and comprehensive data searches, thus helping advance studies in chronostratigraphy. The GEON project will host a number of databases within the geology domain, some of which contain biostratigraphic data. Ontologies are being developed to provide an integrated query system for searching across GEON's biostratigraphy databases, as well as databases available in the public domain. Although creating an ontology directly from the existing database metadata would have been effective and straightforward, our effort was directed towards creating a more efficient representation of our database, as well as a general representation of the biostratigraphic domain.
Jambusaria, Ankit; Klomp, Jeff; Hong, Zhigang; Rafii, Shahin; Dai, Yang; Malik, Asrar B; Rehman, Jalees
2018-06-07
The heterogeneity of cells across tissue types represents a major challenge for studying biological mechanisms as well as for the therapeutic targeting of distinct tissues. Computational prediction of tissue-specific gene regulatory networks may provide important insights into the mechanisms underlying the cellular heterogeneity of cells in distinct organs and tissues. Using three pathway analysis techniques, gene set enrichment analysis (GSEA), parametric analysis of gene set enrichment (PGSEA), and our novel model (HeteroPath), which assesses heterogeneously upregulated and downregulated genes within the context of pathways, we generated distinct tissue-specific gene regulatory networks. We analyzed gene expression data derived from freshly isolated heart, brain, and lung endothelial cells and populations of neurons in the hippocampus, cingulate cortex, and amygdala. In both datasets, we found that HeteroPath segregated the distinct cellular populations by identifying regulatory pathways that were not identified by GSEA or PGSEA. Using simulated datasets, HeteroPath demonstrated robustness comparable to that of existing gene set enrichment methods. Furthermore, we generated tissue-specific gene regulatory networks involved in vascular heterogeneity and neuronal heterogeneity by performing motif enrichment of the heterogeneous genes identified by HeteroPath and linking the enriched motifs to regulatory transcription factors in the ENCODE database. HeteroPath assesses contextual bidirectional gene expression within pathways and thus allows for transcriptomic assessment of cellular heterogeneity. Unraveling tissue-specific heterogeneity of gene expression can lead to a better understanding of the molecular underpinnings of tissue-specific phenotypes.
Franz, D; Mrosek, M; Mrosek, S; Helbig, H; Framme, C
2012-01-01
Patients with penetrating eye injuries are a very heterogeneous group, both medically and economically. Since 2009, treatments involving sutures for open eye injuries and cases requiring amniotic membrane transplantation (AMT) have been allocated to DRG C01B of the German diagnosis-related group system. However, given the significant clinical differences between these treatments, a mismatch between costs and performance is postulated. This analysis describes case allocation problems within the G-DRG C01B category and presents solutions. A retrospective analysis was conducted on the standardized G-DRG data of 277 patients with open eye injuries and AMT between 2007 and 2008, grouped under the 2008 version of the G-DRG system into the G-DRG C01Z category. These data were provided by the Department of Ophthalmology at the University Hospital Regensburg. Additionally, case-based data on length of surgery, duration of anesthesia, and intensity of patient care were added. Fixed and variable costs were determined for surgery and other inpatient treatment. Finally, an analysis of the heterogeneity of costs within the G-DRG C01B category of the 2009 G-DRG system was performed. Inhomogeneity was evident within G-DRG C01B of the 2009 G-DRG system between the two groups (suture of open eye injuries and AMT) with respect to length of stay, proportion of high outliers, and cost per case. Multiple surgeries during an inpatient stay led to an extended length of stay and increased costs, especially within the AMT group. Intensity of patient care and consideration of patient comorbidity did not yield relevant differences. The quality of the G-DRG system is measured by its ability to obtain adequate funding for highly complex and heterogeneous cases. Specific modifications of the G-DRG structures could increase the appropriateness of case allocation for patients with open eye injuries within G-DRG C01B of the 2009 German DRG system. As a result of the present study, cases with amniotic membrane transplantation should not be allocated to G-DRG C01B. A petition has been presented by the German Association of Ophthalmology (DOG) to the German DRG Institute to restructure G-DRG C01B. Data-based analysis is an essential prerequisite for the constructive development of the G-DRG system and a necessary tool for the active participation of medical societies in this process.
Grühn, Daniel; Scheibe, Susanne; Baltes, Paul B
2007-09-01
Using the heterogeneity-homogeneity list paradigm, the authors investigated 48 young adults' (20-30 years) and 48 older adults' (65-75 years) recognition memory for emotional pictures. The authors obtained no evidence for a positivity bias in older adults' memory: Age differences were primarily driven by older adults' diminished ability to remember negative pictures. The authors further found a strong effect of list types: Pictures, particularly neutral ones, were better recognized in homogeneous (blocked) lists than in heterogeneous (mixed) ones. Results confirm those of a previous study by D. Grühn, J. Smith, and P. B. Baltes (2005) that used a different type of to-be-remembered material, that is, pictures instead of words. (PsycINFO Database Record (c) 2007 APA, all rights reserved).
Global distribution of minerals in arid soils as lower boundary condition in dust models
NASA Astrophysics Data System (ADS)
Nickovic, Slobodan
2010-05-01
Mineral dust eroded from arid soils affects the radiation budget of the Earth system, modifies ocean bioproductivity, and influences human health. Dust aerosol is a complex mixture of minerals, and its mineral composition has several potentially important impacts on the environment and society. Iron and phosphorus embedded in mineral aerosol are essential for primary marine productivity when dust deposits over the open ocean. Dust also acts as an efficient agent for heterogeneous ice nucleation, and this process depends on the mineralogical structure of the dust. Recent findings in medical geology indicate a possible role of minerals in human health. In this study, a new 1-km global database was developed for several minerals (Illite, Kaolinite, Smectite, Calcite, Quartz, Feldspar, Hematite and Gypsum) embedded in the clay and silt populations of arid soils. For the database generation, high-resolution data sets on soil textures, soil types, and land cover were used. In addition to the selected minerals, phosphorus was also included, with its geographical distribution specified from compiled literature and data on soil types. The developed global database was used to specify sources of mineral fractions in the DREAM dust model and to simulate the atmospheric paths of minerals and their potential impacts on marine biochemistry and tropospheric ice nucleation.
Machine Learning and Decision Support in Critical Care
Johnson, Alistair E. W.; Ghassemi, Mohammad M.; Nemati, Shamim; Niehaus, Katherine E.; Clifton, David A.; Clifford, Gari D.
2016-01-01
Clinical data management systems typically provide caregiver teams with useful information, derived from large, sometimes highly heterogeneous, data sources that are often changing dynamically. Over the last decade there has been a significant surge in interest in using these data sources, from simply re-using the standard clinical databases for event prediction or decision support, to including dynamic and patient-specific information into clinical monitoring and prediction problems. However, in most cases, commercial clinical databases have been designed to document clinical activity for reporting, liability and billing reasons, rather than for developing new algorithms. With increasing excitement surrounding “secondary use of medical records” and “Big Data” analytics, it is important to understand the limitations of current databases and what needs to change in order to enter an era of “precision medicine.” This review article covers many of the issues involved in the collection and preprocessing of critical care data. The three challenges in critical care are considered: compartmentalization, corruption, and complexity. A range of applications addressing these issues are covered, including the modernization of static acuity scoring; on-line patient tracking; personalized prediction and risk assessment; artifact detection; state estimation; and incorporation of multimodal data sources such as genomic and free text data. PMID:27765959
Muffly, Matthew K; Muffly, Tyler M; Weterings, Robbie; Singleton, Mark; Honkanen, Anita
2016-07-01
There is no comprehensive database of pediatric anesthesiologists, their demographic characteristics, or geographic location in the United States. We endeavored to create a comprehensive database of pediatric anesthesiologists by merging individuals identified as US pediatric anesthesiologists by the American Board of Anesthesiology, National Provider Identifier registry, Healthgrades.com database, and the Society for Pediatric Anesthesia membership list as of November 5, 2015. Professorial rank was accessed via the Association of American Medical Colleges and other online sources. Descriptive statistics characterized pediatric anesthesiologists' demographics. Pediatric anesthesiologists' locations at the city and state level were geocoded and mapped with the use of ArcGIS Desktop 10.1 mapping software (Redlands, CA). We identified 4048 pediatric anesthesiologists in the United States, which is approximately 8.8% of the physician anesthesiology workforce (n = 46,000). The median age of pediatric anesthesiologists was 49 years (interquartile range, 40-57 years), and the majority (56.4%) were men. Approximately two-thirds of identified pediatric anesthesiologists were subspecialty board certified in pediatric anesthesiology, and 33% of pediatric anesthesiologists had an identified academic affiliation. There is substantial heterogeneity in the geographic distribution of pediatric anesthesiologists by state and US Census Division with urban clustering. This description of pediatric anesthesiologists' demographic characteristics and geographic distribution fills an important gap in our understanding of pediatric anesthesia systems of care.
A proposal for cervical screening information systems in developing countries.
Marrett, Loraine D; Robles, Sylvia; Ashbury, Fredrick D; Green, Bo; Goel, Vivek; Luciani, Silvana
2002-11-20
The effective and efficient delivery of cervical screening programs requires information for planning, management, delivery and evaluation. Specially designed systems are generally required to meet these needs. In many developing countries, lack of information systems constitutes an important barrier to development of comprehensive screening programs and the effective control of cervical cancer. Our report outlines a framework for creating such systems in developing countries and describes a conceptual model for a cervical screening information system. The proposed system is modular, recognizing that there will be considerable between-region heterogeneity in current status and priorities. The proposed system is centered on modules that would allow for the assembly and computerization of data on Pap tests, since these represent the main screening modality at the present time. Additional modules would process data and create and maintain a screening database (e.g., standardize, edit, link and update modules) and allow for the integration of other types of data, such as cervical histopathology results. An open systems development model is proposed, since it is most compatible with the goals of local stakeholder involvement and capacity-building. Copyright 2002 Wiley-Liss, Inc.
Gioutlakis, Aris; Klapa, Maria I.
2017-01-01
It has been acknowledged that source databases recording experimentally supported human protein-protein interactions (PPIs) exhibit limited overlap. Thus, the reconstruction of a comprehensive PPI network requires appropriate integration of multiple heterogeneous primary datasets, presenting the PPIs at various genetic reference levels. Existing PPI meta-databases perform integration via normalization; namely, PPIs are merged after converted to a certain target level. Hence, the node set of the integrated network depends each time on the number and type of the combined datasets. Moreover, the irreversible a priori normalization process hinders the identification of normalization artifacts in the integrated network, which originate from the nonlinearity characterizing the genetic information flow. PICKLE (Protein InteraCtion KnowLedgebasE) 2.0 implements a new architecture for this recently introduced human PPI meta-database. Its main novel feature over the existing meta-databases is its approach to primary PPI dataset integration via genetic information ontology. Building upon the PICKLE principles of using the reviewed human complete proteome (RHCP) of UniProtKB/Swiss-Prot as the reference protein interactor set, and filtering out protein interactions with low probability of being direct based on the available evidence, PICKLE 2.0 first assembles the RHCP genetic information ontology network by connecting the corresponding genes, nucleotide sequences (mRNAs) and proteins (UniProt entries) and then integrates PPI datasets by superimposing them on the ontology network without any a priori transformations. Importantly, this process allows the resulting heterogeneous integrated network to be reversibly normalized to any level of genetic reference without loss of the original information, the latter being used for identification of normalization biases, and enables the appraisal of potential false positive interactions through PPI source database cross-checking. The PICKLE web-based interface (www.pickle.gr) allows for the simultaneous query of multiple entities and provides integrated human PPI networks at either the protein (UniProt) or the gene level, at three PPI filtering modes. PMID:29023571
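The integration-via-ontology idea described for PICKLE 2.0 can be illustrated with a small sketch: entities at different genetic reference levels are linked, raw PPI records are attached at their native level, and projection to the gene level is an on-demand traversal that keeps the original evidence. The identifiers and mapping table below are illustrative assumptions, not PICKLE's actual data structures.

```python
# Sketch of level-preserving integration with on-demand normalization (assumed IDs).
gene_of = {"P04637": "TP53", "NM_000546": "TP53",      # protein/mRNA -> gene
           "Q00987": "MDM2", "NM_002392": "MDM2"}

raw_ppis = [("P04637", "Q00987", "IntAct"),            # protein-level record
            ("NM_000546", "NM_002392", "HPRD")]        # transcript-level record

def normalize_to_gene(ppis):
    """Project interactions onto the gene level without discarding originals."""
    projected = {}
    for a, b, src in ppis:
        key = tuple(sorted((gene_of[a], gene_of[b])))
        projected.setdefault(key, []).append((a, b, src))
    return projected

for genes, evidence in normalize_to_gene(raw_ppis).items():
    print(genes, "supported by", evidence)   # both records map to (MDM2, TP53)
```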
Eronen, Lauri; Toivonen, Hannu
2012-06-06
Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.
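Ranking candidate links by a proximity measure on a weighted heterogeneous graph, as described above, can be sketched as follows. The edge reliabilities, node labels, and the best-path-reliability measure are illustrative choices, not Biomine's actual weighting scheme.

```python
# Toy best-path proximity on a weighted heterogeneous graph (assumed weights).
import heapq

edges = {  # node -> [(neighbor, reliability in (0, 1])]
    "GENE:BRCA1": [("GO:0006281", 0.9), ("PPI:BARD1", 0.8)],
    "PPI:BARD1": [("DISEASE:breast_cancer", 0.6)],
    "GO:0006281": [("DISEASE:breast_cancer", 0.5)],
    "DISEASE:breast_cancer": [],
}

def proximity(src, dst):
    """Best-path proximity: maximum product of edge reliabilities."""
    best = {src: 1.0}
    heap = [(-1.0, src)]
    while heap:
        neg_p, node = heapq.heappop(heap)
        if node == dst:
            return -neg_p
        for nbr, w in edges.get(node, []):
            p = -neg_p * w
            if p > best.get(nbr, 0.0):
                best[nbr] = p
                heapq.heappush(heap, (-p, nbr))
    return 0.0

print(proximity("GENE:BRCA1", "DISEASE:breast_cancer"))  # 0.48, via BARD1
```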
Multi-source and ontology-based retrieval engine for maize mutant phenotypes
USDA-ARS?s Scientific Manuscript database
In the midst of this genomics era, major plant genome databases are collecting massive amounts of heterogeneous information, including sequence data, gene product information, images of mutant phenotypes, etc., as well as textual descriptions of many of these entities. While basic browsing and sear...
Analytics to Better Interpret and Use Large Amounts of Heterogeneous Data
NASA Astrophysics Data System (ADS)
Mathews, T. J.; Baskin, W. E.; Rinsland, P. L.
2014-12-01
Data scientists at NASA's Atmospheric Science Data Center (ASDC) are seasoned software application developers who have worked with the creation, archival, and distribution of large datasets (multiple terabytes and larger). In order for ASDC data scientists to effectively implement the most efficient processes for cataloging and organizing data access applications, they must be intimately familiar with the data contained in the datasets with which they are working. Key technologies that are critical components of the background of ASDC data scientists include: large RDBMSs (relational database management systems) and NoSQL databases; web services; service-oriented architectures; structured and unstructured data access; as well as processing algorithms. However, as the prices of data storage and processing decrease, sources of data increase, and technologies advance - granting more people access to data at real or near-real time - data scientists are being pressured to accelerate their ability to identify and analyze vast amounts of data. With existing tools this is becoming exceedingly challenging to accomplish. For example, NASA's Earth Science Data and Information System (ESDIS) alone grew from having just over 4 PBs of data in 2009 to nearly 6 PBs of data in 2011. This amount then increased to roughly 10 PBs of data in 2013. With data from at least ten new missions to be added to the ESDIS holdings by 2017, the current volume will continue to grow exponentially and drive the need to analyze more data even faster. Though there are many highly efficient, off-the-shelf analytics tools available, these tools mainly cater to business data, which is predominantly unstructured. Consequently, there are very few known analytics tools that interface well with archived Earth science data, which is predominantly heterogeneous and structured. This presentation will identify use cases for data analytics from an Earth science perspective in order to begin to identify specific tools that may be able to address those challenges.
NASA Astrophysics Data System (ADS)
Armigliato, Alberto; Pagnoni, Gianluca; Zaniboni, Filippo; Tinti, Stefano
2013-04-01
TRIDEC is an EU-FP7 project whose main goal is, in general terms, to develop suitable strategies for the management of crises that may arise in the Earth management field. The general paradigms adopted by TRIDEC to develop those strategies include intelligent information management, the capability of managing dynamically increasing volumes and dimensionality of information in complex events, and collaborative decision making in systems that are typically very loosely coupled. The two areas where TRIDEC applies and tests its strategies are tsunami early warning and industrial subsurface development. In the field of tsunami early warning, TRIDEC aims at developing a Decision Support System (DSS) that integrates 1) a set of seismic, geodetic and marine sensors devoted to the detection and characterisation of possible tsunamigenic sources and to monitoring the time and space evolution of the generated tsunami, 2) large-volume databases of pre-computed numerical tsunami scenarios, and 3) a proper overall system architecture. Two test areas are dealt with in TRIDEC: the western Iberian margin and the eastern Mediterranean. In this study, we focus on the western Iberian margin with special emphasis on the Portuguese coasts. The strategy adopted in TRIDEC is to populate two different databases, called the "Virtual Scenario Database" (VSDB) and the "Matching Scenario Database" (MSDB), both of which deal only with earthquake-generated tsunamis. In the VSDB we simulate numerically a few large-magnitude events generated by the major known tectonic structures in the study area. Heterogeneous slip distributions on the earthquake faults are introduced to simulate events as "realistically" as possible. The members of the VSDB represent the unknowns that the TRIDEC platform must be able to recognise and match during the early crisis management phase. On the other hand, the MSDB contains a very large number (on the order of thousands) of tsunami simulations performed starting from many different simple earthquake sources of different magnitudes located in the vicinity of the virtual scenario earthquake. From the DSS perspective, the members of the MSDB have to be suitably combined based on the information coming from the sensor networks, and the results are used during the crisis evolution phase to forecast the degree of exposure of different coastal areas. We provide examples from both databases, whose members are computed by means of the in-house software UBO-TSUFD, which implements the non-linear shallow-water equations and solves them over a set of nested grids that guarantee a suitable spatial resolution (a few tens of metres) in specific, suitably chosen coastal areas.
NASA Astrophysics Data System (ADS)
Li, Yung-Hui; Zheng, Bo-Ren; Ji, Dai-Yan; Tien, Chung-Hao; Liu, Po-Tsun
2014-09-01
Cross-sensor iris matching may seriously degrade recognition performance because of the sensor mismatch between iris images acquired at the enrollment and test stages. In this paper, we propose two novel patch-based heterogeneous dictionary learning methods to attack this problem. The first method applies the latest sparse representation theory, while the second tries to learn the correspondence relationship through PCA in a heterogeneous patch space. Both methods learn the basic atoms of iris textures across different image sensors and build connections between them. Once such connections are built, at the test stage it is possible to hallucinate (synthesize) iris images across different sensors. By matching training images with hallucinated images, the recognition rate can be successfully enhanced. The experimental results are satisfactory both visually and in terms of recognition rate. Experimenting with an iris database consisting of 3015 images, we show that the proposed method reduces the EER by 39.4% in relative terms.
Accounting for heterogeneity in meta-analysis using a multiplicative model-an empirical study.
Mawdsley, David; Higgins, Julian P T; Sutton, Alex J; Abrams, Keith R
2017-03-01
In meta-analysis, the random-effects model is often used to account for heterogeneity. The model assumes that heterogeneity has an additive effect on the variance of effect sizes. An alternative model, which assumes multiplicative heterogeneity, has been little used in the medical statistics community, but is widely used by particle physicists. In this paper, we compare the two models using a random sample of 448 meta-analyses drawn from the Cochrane Database of Systematic Reviews. In general, differences in goodness of fit are modest. The multiplicative model tends to give results that are closer to the null, with a narrower confidence interval. Both approaches make different assumptions about the outcome of the meta-analysis. In our opinion, the selection of the more appropriate model will often be guided by whether the multiplicative model's assumption of a single effect size is plausible. Copyright © 2016 John Wiley & Sons, Ltd.
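For readers unfamiliar with the two heterogeneity models being compared, the standard forms can be written as follows; the notation here is ours, not necessarily the paper's.

```latex
% Additive (random-effects) versus multiplicative heterogeneity, standard forms:
\[
\text{additive: } y_i \sim \mathcal{N}\!\left(\mu,\; \sigma_i^2 + \tau^2\right)
\qquad
\text{multiplicative: } y_i \sim \mathcal{N}\!\left(\mu,\; \phi\,\sigma_i^2\right),
\]
% where $y_i$ and $\sigma_i^2$ are the study-level effect estimate and its
% within-study variance, $\tau^2$ is the between-study variance added by the
% random-effects model, and $\phi$ inflates (or deflates) all within-study
% variances by a common factor in the multiplicative model.
```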
NASA Astrophysics Data System (ADS)
Willmes, C.
2017-12-01
Within the frame of the Collaborative Research Centre 806 (CRC 806), an interdisciplinary research project that needs to manage data, information, and knowledge from heterogeneous domains such as archeology, cultural sciences, and the geosciences, a collaborative internal knowledge base system was developed. The system is based on the open source MediaWiki software, best known as the software behind Wikipedia, which provides a web-based collaborative knowledge and information management platform. This software is additionally enhanced with the Semantic MediaWiki (SMW) extension, which allows structured data to be stored and managed within the wiki platform and provides complex query and API interfaces to the structured data stored in the SMW database. Using an additional open source tool called mobo, it is possible to improve the data model development process, as well as to automate data imports, from small spreadsheets to large relational databases. Mobo is a command-line tool that helps build and deploy SMW structure in an agile, schema-driven development way and allows the data model formalizations, expressed in JSON-Schema format, to be managed and collaboratively developed using version control systems like git. The combination of the well-equipped collaborative web platform provided by MediaWiki, the possibility to store and query structured data in this collaborative database provided by SMW, and the automated data import and data model development enabled by mobo results in a powerful yet flexible system for building and developing a collaborative knowledge base. Furthermore, SMW allows the application of Semantic Web technology: the structured data can be exported to RDF, so it is possible to set up a triple store, including a SPARQL endpoint, on top of the database. The JSON-Schema-based data models can be enhanced into JSON-LD to profit from the possibilities of Linked Data technology.
Bellos, Christos; Papadopoulos, Athanassios; Rosso, Roberto; Fotiadis, Dimitrios I
2011-01-01
The CHRONIOUS system is an integrated platform aimed at the management of chronic disease patients. One of the most important components of the system is a Decision Support System (DSS) that has been developed on a Smart Device (SD). This component decides on the patient's current health status by combining several data, which are either acquired by wearable sensors, manually input by the patient, or retrieved from the dedicated database. If no abnormal situation has been detected, the DSS takes no action and remains deactivated until the next pack of abnormal data is acquired or the next scheduled data transmission occurs. The DSS that has been implemented is an integrated classification system with two parallel classifiers, combining an expert system (rule-based system) and a supervised classifier, such as Support Vector Machines (SVM), Random Forests, artificial Neural Networks (aNN, like the Multi-Layer Perceptron), Decision Trees, or Naïve Bayes. The resulting classification system is useful for providing critical information about the health status of the patient.
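The two-branch design described above (a rule-based expert check run in parallel with a supervised classifier, with the outputs fused) can be sketched as follows. The features, thresholds, training data, and fusion rule are assumptions for illustration, not the CHRONIOUS configuration.

```python
# Illustrative parallel rule-based + SVM decision sketch (assumed thresholds/data).
from sklearn.svm import SVC
import numpy as np

def expert_rules(sample):
    """Rule-based branch: flag clearly abnormal vital signs."""
    spo2, resp_rate = sample
    return spo2 < 90 or resp_rate > 30   # True = abnormal

# Toy training data: [SpO2, respiratory rate] -> 0 normal, 1 abnormal
X = np.array([[97, 16], [96, 18], [88, 32], [85, 35], [95, 20], [87, 30]])
y = np.array([0, 0, 1, 1, 0, 1])
svm_branch = SVC().fit(X, y)

def dss_decision(sample):
    rule_flag = expert_rules(sample)
    ml_flag = bool(svm_branch.predict([sample])[0])
    return "alert" if (rule_flag or ml_flag) else "no action"

print(dss_decision([89, 31]), dss_decision([97, 15]))
```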
NASA Astrophysics Data System (ADS)
Myre, Joseph M.
Heterogeneous computing systems have recently come to the forefront of the High-Performance Computing (HPC) community's interest. HPC computer systems that incorporate special purpose accelerators, such as Graphics Processing Units (GPUs), are said to be heterogeneous. Large scale heterogeneous computing systems have consistently ranked highly on the Top500 list since the beginning of the heterogeneous computing trend. By using heterogeneous computing systems that consist of both general-purpose processors and special-purpose accelerators, the speed and problem size of many simulations could be dramatically increased. Ultimately this results in enhanced simulation capabilities that allow, in some cases for the first time, the execution of parameter space and uncertainty analyses, model optimizations, and other inverse modeling techniques that are critical for scientific discovery and engineering analysis. However, simplifying the usage and optimization of codes for heterogeneous computing systems remains a challenge. This is particularly true for scientists and engineers for whom understanding HPC architectures and undertaking performance analysis may not be primary research objectives. To enable scientists and engineers to remain focused on their primary research objectives, a modular environment for geophysical inversion and run-time autotuning on heterogeneous computing systems is presented. This environment is composed of three major components: 1) CUSH, a framework for reducing the complexity of programming heterogeneous computer systems, 2) geophysical inversion routines which can be used to characterize physical systems, and 3) run-time autotuning routines designed to determine configurations of heterogeneous computing systems in an attempt to maximize the performance of scientific and engineering codes. Using three case studies, a lattice-Boltzmann method, a non-negative least squares inversion, and a finite-difference fluid flow method, it is shown that this environment provides scientists and engineers with means to reduce the programmatic complexity of their applications, to perform geophysical inversions for characterizing physical systems, and to determine high-performing run-time configurations of heterogeneous computing systems using a run-time autotuner.
NASA Astrophysics Data System (ADS)
Hsu, Charles; Viazanko, Michael; O'Looney, Jimmy; Szu, Harold
2009-04-01
The Modularity Biometric System (MBS) is an approach to support aided target recognition (AiTR) of cooperative and/or non-cooperative standoff biometrics in area persistent surveillance. Advanced active and passive EOIR and RF sensor suites are not considered here, nor do we consider the ROC (PD vs. FAR) versus the standoff POT in this paper. Our goal is to identify the roughly two dozen "most wanted" (MW) individuals, further separated ad hoc into a woman MW class and a man MW class, from a sparse archival database of frontal face images, using new instantaneous inputs called probing faces. We present an advanced algorithm, a mini-Max classifier: a sparse-sample realization of the Cramer-Rao Fisher bound of the Maximum Likelihood classifier that minimizes the dispersion within the same woman classes and maximizes the separation among different man and woman classes, based on the simple feature space of MIT Pentland eigenfaces. The original aspect consists of a modular, structured design approach at the system level, with multi-level architectures, multiple computing paradigms, and adaptable/evolvable techniques, so as to achieve a scalable structure in terms of biometric algorithms, identification quality, sensors, database complexity, database integration, and component heterogeneity. MBS consists of a number of biometric technologies, including fingerprints, vein maps, and voice and face recognition with innovative DSP algorithms, and their hardware implementations, for example using Field-Programmable Gate Arrays (FPGAs). Biometric technologies and the composed modularity biometric system are significant for governmental agencies, enterprises, banks, and other organizations that must protect people or control access to critical resources.
Distributed spatial information integration based on web service
NASA Astrophysics Data System (ADS)
Tong, Hengjian; Zhang, Yun; Shao, Zhenfeng
2008-10-01
Spatial information systems and spatial information in different geographic locations usually belong to different organizations. They are distributed, often heterogeneous, and independent from each other. As a result, many isolated spatial information islands are formed, reducing the efficiency of information utilization. In order to address this issue, we present a method for effective spatial information integration based on web services. The method applies asynchronous invocation and dynamic invocation of web services to implement distributed, parallel execution of web map services. All isolated information islands are connected by the web service dispatcher and its registration database to form a uniform collaborative system. According to the web service registration database, the dispatcher can dynamically invoke each web map service through an asynchronous delegating mechanism, so all of the web map services can be executed at the same time. When each web map service completes, an image is returned to the dispatcher. After all of the web services are done, the images are transparently overlaid together in the dispatcher, and users can browse and analyze the integrated spatial information. Experiments demonstrate that the utilization rate of spatial information resources is significantly raised through the proposed method of distributed spatial information integration.
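A loose analogue of the dispatcher described above, invoking several registered web map services in parallel and overlaying the returned images, can be sketched in Python with `concurrent.futures`, `requests`, and Pillow. The service URLs and request parameters are hypothetical, and the overlay assumes all layers share the same extent and pixel size.

```python
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO

import requests
from PIL import Image

# Hypothetical registered web map services (the registration database of the paper).
SERVICE_URLS = [
    "https://maps.example.org/wms/roads",
    "https://maps.example.org/wms/hydrology",
    "https://maps.example.org/wms/landuse",
]

def fetch_layer(url):
    """Invoke one web map service and return its rendered image as RGBA."""
    # Real WMS requests carry bbox/size/CRS parameters; omitted in this sketch.
    response = requests.get(url, params={"format": "png"}, timeout=60)
    response.raise_for_status()
    return Image.open(BytesIO(response.content)).convert("RGBA")

# Dispatcher: invoke all registered services concurrently, then overlay the results.
with ThreadPoolExecutor(max_workers=len(SERVICE_URLS)) as pool:
    layers = list(pool.map(fetch_layer, SERVICE_URLS))

composite = layers[0]
for layer in layers[1:]:
    # Assumes all layers have identical dimensions.
    composite = Image.alpha_composite(composite, layer)
composite.save("integrated_map.png")
```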
Ezra Tsur, Elishai
2017-01-01
Databases are imperative for research in bioinformatics and computational biology. Current challenges in database design include data heterogeneity and context-dependent interconnections between data entities. These challenges drove the development of unified data interfaces and specialized databases. The curation of specialized databases is an ever-growing challenge due to the introduction of new data sources and the emergence of new relational connections between established datasets. Here, an open-source framework for the curation of specialized databases is proposed. The framework supports user-designed models of data encapsulation, object persistency, and structured interfaces to local and external data sources such as MalaCards, BioModels, and the National Centre for Biotechnology Information (NCBI) databases. The proposed framework was implemented using Java as the development environment, EclipseLink as the data persistency agent, and Apache Derby as the database manager. Syntactic analysis was based on the J3D, jsoup, Apache Commons, and w3c.dom open libraries. Finally, the construction of a specialized database for aneurysm-associated vascular diseases is demonstrated. This database contains 3-dimensional geometries of aneurysms, patients' clinical information, articles, biological models, related diseases, and our recently published model of aneurysm risk of rupture. The framework is available at http://nbel-lab.com.
Metagenomic Taxonomy-Guided Database-Searching Strategy for Improving Metaproteomic Analysis.
Xiao, Jinqiu; Tanca, Alessandro; Jia, Ben; Yang, Runqing; Wang, Bo; Zhang, Yu; Li, Jing
2018-04-06
Metaproteomics provides a direct measure of the functional information by investigating all proteins expressed by a microbiota. However, due to the complexity and heterogeneity of microbial communities, it is very hard to construct a sequence database suitable for a metaproteomic study. Using a public database, researchers might not be able to identify proteins from poorly characterized microbial species, while a sequencing-based metagenomic database may not provide adequate coverage for all potentially expressed protein sequences. To address this challenge, we propose a metagenomic taxonomy-guided database-search strategy (MT), in which a merged database is employed, consisting of both taxonomy-guided reference protein sequences from public databases and proteins from metagenome assembly. By applying our MT strategy to a mock microbial mixture, about two times as many peptides were detected as with the metagenomic database only. According to the evaluation of the reliability of taxonomic attribution, the rate of misassignments was comparable to that obtained using an a priori matched database. We also evaluated the MT strategy with a human gut microbial sample, and we found 1.7 times as many peptides as using a standard metagenomic database. In conclusion, our MT strategy allows the construction of databases able to provide high sensitivity and precision in peptide identification in metaproteomic studies, enabling the detection of proteins from poorly characterized species within the microbiota.
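The merged database at the heart of the MT strategy can be illustrated, in a much-simplified form, as concatenating the taxonomy-guided reference proteins with the metagenome-assembly proteins while dropping exact duplicate sequences. The file names below are hypothetical, and real pipelines would handle header collisions and decoy generation as well.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, seq = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            elif line:
                seq.append(line)
        if header is not None:
            yield header, "".join(seq)

# Hypothetical inputs: taxonomy-guided reference proteins and metagenome-assembly proteins.
sources = ["taxonomy_guided_reference.faa", "metagenome_assembly.faa"]

seen = set()
with open("merged_MT_database.faa", "w") as out:
    for path in sources:
        for header, seq in read_fasta(path):
            if seq not in seen:          # drop exact duplicate sequences
                seen.add(seq)
                out.write(f">{header}\n{seq}\n")
```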
de Groot, Mark C H; Klungel, Olaf H; Leufkens, Hubert G M; van Dijk, Liset; Grobbee, Diederick E; van de Garde, Ewoudt M W
2014-10-01
The heterogeneity in case-control studies on the associations between community-acquired pneumonia (CAP) and ACE-inhibitors (ACEi), statins, and proton pump inhibitors (PPI) hampers translation to clinical practice. Our objective is to explore sources of this heterogeneity by applying a common protocol in different data settings. We conducted ten case-control studies using data from five different health care databases. Databases varied on type of patients (hospitalised vs. GP), level of case validity, and mode of exposure ascertainment (prescription or dispensing based). Identified CAP patients and controls were matched on age, gender, and calendar year. Conditional logistic regression was used to calculate odds ratios (OR) for the associations between the drugs of interest and CAP. Associations were adjusted by a common set of potential confounders. Data of 38,742 cases and 118,019 controls were studied. Comparable patterns of variation between case-control studies were observed for ACEi, statins and PPI use and pneumonia risk with adjusted ORs varying from 1.04 to 1.49, 0.82 to 1.50 and 1.16 to 2.71, respectively. Overall, higher ORs were found for hospitalised CAP patients matched to population controls versus GP CAP patients matched to population controls. Prevalence of drug exposure was higher in dispensing data versus prescription data. We show that case-control selection and methods of exposure ascertainment induce bias that cannot be adjusted for and to a considerable extent explain the heterogeneity in results obtained in case-control studies on statins, ACEi and PPIs and CAP. The common protocol approach helps to better understand sources of variation in observational studies.
2011-01-01
Background: Monitoring the time course of mortality by cause is a key public health issue. However, several mortality data production changes may affect cause-specific time trends, thus altering the interpretation. This paper proposes a statistical method that detects abrupt changes ("jumps") and estimates correction factors that may be used for further analysis. Methods: The method was applied to a subset of the AMIEHS (Avoidable Mortality in the European Union, toward better Indicators for the Effectiveness of Health Systems) project mortality database and considered for six European countries and 13 selected causes of deaths. For each country and cause of death, an automated jump detection method called Polydect was applied to the log mortality rate time series. The plausibility of a data production change associated with each detected jump was evaluated through literature search or feedback obtained from the national data producers. For each plausible jump position, the statistical significance of the between-age and between-gender jump amplitude heterogeneity was evaluated by means of a generalized additive regression model, and correction factors were deduced from the results. Results: Forty-nine jumps were detected by the Polydect method from 1970 to 2005. Most of the detected jumps were found to be plausible. The age- and gender-specific amplitudes of the jumps were estimated when they were statistically heterogeneous, and they showed greater by-age heterogeneity than by-gender heterogeneity. Conclusion: The method presented in this paper was successfully applied to a large set of causes of death and countries. The method appears to be an alternative to bridge coding methods when the latter are not systematically implemented because they are time- and resource-consuming. PMID:21929756
Towards Direct Manipulation and Remixing of Massive Data: The EarthServer Approach
NASA Astrophysics Data System (ADS)
Baumann, P.
2012-04-01
Complex analytics on "big data" is one of the core challenges of current Earth science, generating strong requirements for on-demand processing and filtering of massive data sets. Issues under discussion include flexibility, performance, scalability, and the heterogeneity of the information types involved. In other domains, high-level query languages (such as those offered by database systems) have proven successful in the quest for flexible, scalable data access interfaces to massive amounts of data. However, due to the lack of support for many of the Earth science data structures, database systems are only used for registries and catalogs, but not for the bulk of spatio-temporal data. One core information category in this field is coverage data. ISO 19123 defines coverages, simplifying somewhat, as representations of a "space-time varying phenomenon". This model can express a large class of Earth science data structures, including rectified and non-rectified rasters, curvilinear grids, point clouds, TINs, general meshes, trajectories, surfaces, and solids. This abstract definition, which is too high-level to establish interoperability, is concretized by the OGC GML 3.2.1 Application Schema for Coverages standard into an interoperable representation. The OGC Web Coverage Processing Service (WCPS) standard defines a declarative query language on multi-dimensional raster-type coverages, such as 1D in-situ sensor time series, 2D EO imagery, 3D x/y/t image time series and x/y/z geophysical data, and 4D x/y/z/t climate and ocean data. Hence, important ingredients for versatile coverage retrieval are in place; however, this potential has not been fully unleashed by service architectures up to now. The EU FP7-INFRA project EarthServer, launched in September 2011, aims at enabling standards-based on-demand analytics over the Web for Earth science data, based on an integration of W3C XQuery for alphanumeric data and OGC WCPS for raster data. Ultimately, EarthServer will support all OGC coverage types. The platform used by EarthServer is the rasdaman raster database system. To exploit heterogeneous multi-parallel platforms, automatic request distribution and orchestration is being established. Client toolkits are under development that will allow bespoke interactive clients to be composed quickly, ranging from mobile devices over Web clients to high-end immersive virtual reality. The EarthServer platform has been deployed in six large-scale data centres with the aim of setting up Lighthouse Applications addressing all Earth sciences, including satellite and airborne Earth observation as well as use cases from atmosphere, ocean, snow, and ice monitoring, and geology on Earth and Mars. These services, each of which will ultimately host at least 100 TB, will form a peer cloud with distributed query processing for arbitrarily mixing database and in-situ access. With its ability to directly manipulate, analyze, and remix massive data, the goal of EarthServer is to lift the data providers' semantic level from data stewardship to service stewardship.
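As a loose illustration of submitting a declarative WCPS query over HTTP from client code (not EarthServer's actual service URL or coverage names), one might do something like the following in Python. The endpoint, coverage name, query text, and request parameter names (modeled on the OGC WCS processing extension as exposed by rasdaman deployments) are assumptions and may differ per installation.

```python
import requests

# Hypothetical rasdaman/WCPS endpoint and coverage name.
ENDPOINT = "https://earthserver.example.org/rasdaman/ows"

# Illustrative WCPS query: average of a toy temperature coverage over one time slice.
wcps_query = """
for $c in (AvgTemperature)
return encode(avg($c[ansi("2011-09")]), "text/csv")
"""

response = requests.post(
    ENDPOINT,
    data={"service": "WCS", "version": "2.0.1",
          "request": "ProcessCoverages", "query": wcps_query},
    timeout=120,
)
response.raise_for_status()
print(response.text)   # the aggregated value encoded as CSV
```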
Distributed consensus for discrete-time heterogeneous multi-agent systems
NASA Astrophysics Data System (ADS)
Zhao, Huanyu; Fei, Shumin
2018-06-01
This paper studies the consensus problem for a class of discrete-time heterogeneous multi-agent systems. Two kinds of consensus algorithms will be considered. The heterogeneous multi-agent systems considered are converted into equivalent error systems by a model transformation. Then we analyse the consensus problem of the original systems by analysing the stability problem of the error systems. Some sufficient conditions for consensus of heterogeneous multi-agent systems are obtained by applying algebraic graph theory and matrix theory. Simulation examples are presented to show the usefulness of the results.
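A minimal numerical sketch of a discrete-time consensus update of the kind analysed in such work (a plain first-order protocol, not the specific heterogeneous algorithms of this article) is shown below, assuming a fixed undirected communication graph and an invented initial condition.

```python
import numpy as np

# Adjacency matrix of a fixed, undirected 4-agent communication graph (assumed).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

epsilon = 0.2                         # step size, chosen below 1/(max degree)
x = np.array([1.0, 4.0, -2.0, 7.0])   # initial agent states

for _ in range(100):
    # First-order consensus update: x_i <- x_i + eps * sum_j a_ij (x_j - x_i)
    x = x + epsilon * (A @ x - A.sum(axis=1) * x)

print(x)   # all states converge towards the average of the initial values
```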
IntegromeDB: an integrated system and biological search engine.
Baitaluk, Michael; Kozhenkov, Sergey; Dubinina, Yulia; Ponomarenko, Julia
2012-01-19
With the growth of biological data in volume and heterogeneity, web search engines become key tools for researchers. However, general-purpose search engines are not specialized for the search of biological data. Here, we present an approach to developing a biological web search engine based on Semantic Web technologies and demonstrate its implementation for retrieving gene- and protein-centered knowledge. The engine is available at http://www.integromedb.org. The IntegromeDB search engine allows scanning data on gene regulation, gene expression, protein-protein interactions, pathways, metagenomics, mutations, diseases, and other gene- and protein-related data that are automatically retrieved from publicly available databases and web pages using biological ontologies. To perfect the resource design and usability, we welcome and encourage community feedback.
[Big data, generalities and integration in radiotherapy].
Le Fèvre, C; Poty, L; Noël, G
2018-02-01
Advances in data collection computing systems (data collection, databases, storage) and in diagnostic and therapeutic possibilities are responsible for an increase and a diversification of the available data. In the field of health, big data offers the capacity to accelerate discoveries and to optimize the management of patients by combining a large volume of data with the creation of therapeutic models. In radiotherapy, the development of big data is attractive because the data are very numerous and heterogeneous (demographics, radiomics, genomics, radiogenomics, etc.). The expectation is to predict the effectiveness and tolerance of radiation therapy. With these new concepts, still at a preliminary stage, it is possible to create a personalized medicine that is ever more secure and reliable. Copyright © 2017. Published by Elsevier SAS.
An interactive parallel programming environment applied in atmospheric science
NASA Technical Reports Server (NTRS)
vonLaszewski, G.
1996-01-01
This article introduces an interactive parallel programming environment (IPPE) that simplifies the generation and execution of parallel programs. One of the tasks of the environment is to generate message-passing parallel programs for homogeneous and heterogeneous computing platforms. The parallel programs are represented by using visual objects. This is accomplished with the help of a graphical programming editor that is implemented in Java and enables portability to a wide variety of computer platforms. In contrast to other graphical programming systems, reusable parts of the programs can be stored in a program library to support rapid prototyping. In addition, runtime performance data on different computing platforms is collected in a database. A selection process determines dynamically the software and the hardware platform to be used to solve the problem in minimal wall-clock time. The environment is currently being tested on a Grand Challenge problem, the NASA four-dimensional data assimilation system.
Use of fuzzy sets in modeling of GIS objects
NASA Astrophysics Data System (ADS)
Mironova, Yu N.
2018-05-01
The paper discusses modeling and methods of data visualization in geographic information systems. Information processing in geoinformatics is based on the use of models, so geoinformation modeling is a key step in the chain of geodata processing. Solving problems with geographic information systems often requires submitting approximate or insufficiently reliable information about map features to the GIS database. Heterogeneous data of different origin and accuracy carry some degree of uncertainty. In addition, not all information is precise: already during the initial measurements, poorly defined terms and attributes (e.g., "soil, well-drained") are used. Methods are therefore needed for working with uncertain requirements, classes, and boundaries. The author proposes using fuzzy sets for spatial information. In terms of its characteristic function, a fuzzy set is a natural generalization of an ordinary set, obtained by rejecting the binary nature of this function and allowing it to take any value in the interval [0, 1].
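As a small illustration of replacing the binary characteristic function with a graded membership, the sketch below defines a trapezoidal membership function for a hypothetical GIS attribute such as "well-drained soil" as a function of a measured drainage rate. The breakpoints are invented.

```python
def trapezoidal(x, a, b, c, d):
    """Trapezoidal fuzzy membership: 0 outside [a, d], 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Hypothetical membership of "well-drained soil" given a drainage rate (cm/h).
for rate in [0.5, 2.0, 5.0, 9.0]:
    print(rate, trapezoidal(rate, a=1.0, b=3.0, c=8.0, d=10.0))
```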
Deep Whole-Genome Sequencing to Detect Mixed Infection of Mycobacterium tuberculosis
Gan, Mingyu; Liu, Qingyun; Yang, Chongguang; Gao, Qian; Luo, Tao
2016-01-01
Mixed infection by multiple Mycobacterium tuberculosis (MTB) strains is associated with poor treatment outcome of tuberculosis (TB). Traditional genotyping methods have been used to detect mixed infections of MTB; however, their sensitivity and resolution are limited. Deep whole-genome sequencing (WGS) has proven highly sensitive and discriminative for studying population heterogeneity of MTB. Here, we developed a phylogenetic-based method to detect MTB mixed infections using WGS data. We collected published WGS data of 782 global MTB strains from public databases. We called homogeneous and heterogeneous single nucleotide variations (SNVs) of individual strains by mapping short reads to the ancestral MTB reference genome. We constructed a phylogenomic database based on 68,639 homogeneous SNVs of 652 MTB strains. Mixed infections were determined if multiple evolutionary paths were identified by mapping the SNVs of individual samples to the phylogenomic database. By simulation, our method could specifically detect mixed infections when the sequencing depth of minor strains was as low as 1× coverage, and when the genomic distance of two mixed strains was as small as 16 SNVs. By applying our method to all 782 samples, we detected 47 mixed infections, 45 of which were caused by locally endemic strains. The results indicate that our method is highly sensitive and discriminative for identifying mixed infections from deep WGS data of MTB isolates. PMID:27391214
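A toy sketch of the core idea, checking whether a sample's SNVs map onto more than one root-to-leaf path of a reference phylogeny, could look like the following. The tree, SNV identifiers, and overlap threshold are invented and stand in for the study's 68,639-SNV phylogenomic database.

```python
# Toy phylogeny: each leaf strain is described by the set of SNVs accumulated
# along its root-to-leaf path (invented identifiers).
PATH_SNVS = {
    "lineage2_strainA": {"snv1", "snv2", "snv3"},
    "lineage2_strainB": {"snv1", "snv2", "snv4"},
    "lineage4_strainC": {"snv5", "snv6"},
}

def matching_paths(sample_snvs, min_overlap=2):
    """Return the leaves whose path shares at least `min_overlap` SNVs with the sample."""
    return [leaf for leaf, snvs in PATH_SNVS.items()
            if len(snvs & sample_snvs) >= min_overlap]

# A sample carrying SNVs from distinct lineages suggests a mixed infection.
sample = {"snv1", "snv2", "snv5", "snv6"}
paths = matching_paths(sample)
print(paths, "-> mixed infection" if len(paths) > 1 else "-> single strain")
```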
Rainaldi, Guglielmo; Volpicella, Mariateresa; Licciulli, Flavio; Liuni, Sabino; Gallerani, Raffaele; Ceci, Luigi R
2003-01-01
The updated version of PLMItRNA reports information and multialignments on 609 genes and 34 tRNA molecules active in the mitochondria of Viridiplantae (27 Embryophyta and 10 Chlorophyta) and photosynthetic algae (one Cryptophyta, four Rhodophyta and two Stramenopiles). Colour-coded tables reporting the different genetic origins of the identified genes provide hypertextual links to single entries. Promoter sequences identified for tRNA genes in the mitochondrial genomes of Angiospermae are also reported. The PLMItRNA database is accessible at http://bighost.area.ba.cnr.it/PLMItRNA/.
Spread of hospital-acquired infections: A comparison of healthcare networks
Astagneau, Pascal; Crépey, Pascal
2017-01-01
Hospital-acquired infections (HAIs), including emerging multi-drug resistant organisms, threaten healthcare systems worldwide. Efficient containment measures of HAIs must mobilize the entire healthcare network. Thus, to best understand how to reduce the potential scale of HAI epidemic spread, we explore patient transfer patterns in the French healthcare system. Using an exhaustive database of all hospital discharge summaries in France in 2014, we construct and analyze three patient networks based on the following: transfers of patients with HAI (HAI-specific network); patients with suspected HAI (suspected-HAI network); and all patients (general network). All three networks have heterogeneous patient flow and demonstrate small-world and scale-free characteristics. Patient populations that comprise these networks are also heterogeneous in their movement patterns. Ranking of hospitals by centrality measures and comparing community clustering using community detection algorithms shows that despite the differences in patient population, the HAI-specific and suspected-HAI networks rely on the same underlying structure as that of the general network. As a result, the general network may be more reliable in studying potential spread of HAIs. Finally, we identify transfer patterns at both the French regional and departmental (county) levels that are important in the identification of key hospital centers, patient flow trajectories, and regional clusters that may serve as a basis for novel wide-scale infection control strategies. PMID:28837555
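A minimal sketch of the kind of network analysis described (centrality ranking and community detection on a patient-transfer graph) can be written with NetworkX as below. The transfer records are invented and much smaller than the national discharge database used in the study.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy weighted transfer records: (origin hospital, destination hospital, patient count).
transfers = [("H1", "H2", 120), ("H2", "H3", 45), ("H1", "H3", 30),
             ("H3", "H4", 80), ("H4", "H5", 10), ("H5", "H3", 25)]

G = nx.DiGraph()
G.add_weighted_edges_from(transfers)

# Rank hospitals by betweenness centrality (candidate key transfer hubs).
centrality = nx.betweenness_centrality(G)
for hospital, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(hospital, round(score, 3))

# Community detection on the undirected projection of the network.
communities = greedy_modularity_communities(G.to_undirected(), weight="weight")
print([sorted(c) for c in communities])
```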
The Status of Literacy of Sustainable Agriculture in Iran: A Systematic Review
ERIC Educational Resources Information Center
Vaninee, Hassan Sadough; Veisi, Hadi; Gorbani, Shiva; Falsafi, Peyman; Liaghati, Houman
2016-01-01
This study analyzes heterogeneous research with a focus on the knowledge, attitude, and behavior of farmers and the components of sustainable agriculture literacy through an interdisciplinary, systematic literature review for the time frame from 1996 to 2013. The major research databases were searched and 170 papers were identified. Paper…
ERIC Educational Resources Information Center
Lee, Myeong Soo; Choi, Tae-Young; Shin, Byung-Cheul; Ernst, Edzard
2012-01-01
This study aimed to assess the effectiveness of acupuncture as a treatment for autism spectrum disorders (ASD). We searched the literature using 15 databases. Eleven randomized clinical trials (RCTs) met our inclusion criteria. Most had significant methodological weaknesses. The studies' statistical and clinical heterogeneity prevented us from…
Heterogeneous Embedded Real-Time Systems Environment
2003-12-01
AFRL-IF-RS-TR-2003-290, Final Technical Report, December 2003: Heterogeneous Embedded Real-Time Systems Environment. Authors: Cosmo Castellano and James Graham. Contract number: F30602-97-C-0259.
Resources monitoring and automatic management system for multi-VO distributed computing system
NASA Astrophysics Data System (ADS)
Chen, J.; Pelevanyuk, I.; Sun, Y.; Zhemchugov, A.; Yan, T.; Zhao, X. H.; Zhang, X. M.
2017-10-01
Multi-VO support based on DIRAC has been set up to provide workload and data management for several high energy physics experiments at IHEP. To monitor and manage, in a uniform way, the heterogeneous resources that belong to different Virtual Organizations, a resource monitoring and automatic management system based on the Resource Status System (RSS) of DIRAC is presented in this paper. The system is composed of three parts: information collection, status decision and automatic control, and information display. The information collection includes active and passive ways of gathering status from different sources and stores the results in databases. The status decision and automatic control component evaluates resource status and takes control actions on resources automatically through pre-defined policies and actions. The monitoring information is displayed on a web portal, from which both real-time and historical information can be obtained. All of the implementation is based on the DIRAC framework; the information and control for different VOs, including sites, policies, and the web portal, can be well defined and distinguished within the DIRAC user and group management infrastructure.
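The status decision and automatic control component is described only conceptually; a stripped-down sketch of a policy loop in this style (not DIRAC's actual RSS interfaces) could look like the following Python fragment, with invented resource metrics, policies, and actions.

```python
# Toy resource statuses as collected by the information-gathering component.
resource_status = {
    "site_A_CE": {"failed_jobs_ratio": 0.05, "status": "Active"},
    "site_B_CE": {"failed_jobs_ratio": 0.60, "status": "Active"},
    "site_C_SE": {"free_space_TB": 0.2, "status": "Active"},
}

# Pre-defined policies: each returns a proposed status or None (no opinion).
def high_failure_policy(name, info):
    if info.get("failed_jobs_ratio", 0.0) > 0.5:
        return "Banned"

def low_space_policy(name, info):
    if info.get("free_space_TB", float("inf")) < 0.5:
        return "Degraded"

POLICIES = [high_failure_policy, low_space_policy]

# Pre-defined actions keyed by the decided status.
def ban_resource(name):
    print(f"action: banning {name}")

def degrade_resource(name):
    print(f"action: marking {name} as degraded")

ACTIONS = {"Banned": ban_resource, "Degraded": degrade_resource}

# Decision loop: evaluate every policy for every resource and apply actions.
for name, info in resource_status.items():
    for policy in POLICIES:
        decision = policy(name, info)
        if decision and decision != info["status"]:
            ACTIONS[decision](name)
            info["status"] = decision
```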
Kauppinen, Tomi; Keßler, Carsten; Fritz, Fleur
2014-01-01
Background Healthcare organizations around the world are challenged by pressures to reduce cost, improve coordination and outcome, and provide more with less. This requires effective planning and evidence-based practice by generating important information from available data. Thus, flexible and user-friendly ways to represent, query, and visualize health data becomes increasingly important. International organizations such as the World Health Organization (WHO) regularly publish vital data on priority health topics that can be utilized for public health policy and health service development. However, the data in most portals is displayed in either Excel or PDF formats, which makes information discovery and reuse difficult. Linked Open Data (LOD)—a new Semantic Web set of best practice of standards to publish and link heterogeneous data—can be applied to the representation and management of public level health data to alleviate such challenges. However, the technologies behind building LOD systems and their effectiveness for health data are yet to be assessed. Objective The objective of this study is to evaluate whether Linked Data technologies are potential options for health information representation, visualization, and retrieval systems development and to identify the available tools and methodologies to build Linked Data-based health information systems. Methods We used the Resource Description Framework (RDF) for data representation, Fuseki triple store for data storage, and Sgvizler for information visualization. Additionally, we integrated SPARQL query interface for interacting with the data. We primarily use the WHO health observatory dataset to test the system. All the data were represented using RDF and interlinked with other related datasets on the Web of Data using Silk—a link discovery framework for Web of Data. A preliminary usability assessment was conducted following the System Usability Scale (SUS) method. Results We developed an LOD-based health information representation, querying, and visualization system by using Linked Data tools. We imported more than 20,000 HIV-related data elements on mortality, prevalence, incidence, and related variables, which are freely available from the WHO global health observatory database. Additionally, we automatically linked 5312 data elements from DBpedia, Bio2RDF, and LinkedCT using the Silk framework. The system users can retrieve and visualize health information according to their interests. For users who are not familiar with SPARQL queries, we integrated a Linked Data search engine interface to search and browse the data. We used the system to represent and store the data, facilitating flexible queries and different kinds of visualizations. The preliminary user evaluation score by public health data managers and users was 82 on the SUS usability measurement scale. The need to write queries in the interface was the main reported difficulty of LOD-based systems to the end user. Conclusions The system introduced in this article shows that current LOD technologies are a promising alternative to represent heterogeneous health data in a flexible and reusable manner so that they can serve intelligent queries, and ultimately support decision-making. However, the development of advanced text-based search engines is necessary to increase its usability especially for nontechnical users. 
Further research with large datasets is recommended in the future to unfold the potential of Linked Data and Semantic Web for future health information systems development. PMID:25601195
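As an illustration of querying such a Fuseki-hosted RDF store from client code, the following Python sketch uses the SPARQLWrapper library. The endpoint URL and the vocabulary (prefix, class, and property names) are hypothetical and do not reproduce the study's actual schema.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical Fuseki SPARQL endpoint for the health dataset.
sparql = SPARQLWrapper("http://localhost:3030/health/sparql")
sparql.setReturnFormat(JSON)

# Illustrative query: HIV mortality observations per country (invented vocabulary).
sparql.setQuery("""
PREFIX ex: <http://example.org/health#>
SELECT ?country ?year ?deaths WHERE {
  ?obs ex:indicator "HIV mortality" ;
       ex:country   ?country ;
       ex:year      ?year ;
       ex:deaths    ?deaths .
}
ORDER BY ?country ?year
LIMIT 20
""")

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["country"]["value"], row["year"]["value"], row["deaths"]["value"])
```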
Tilahun, Binyam; Kauppinen, Tomi; Keßler, Carsten; Fritz, Fleur
2014-10-25
Healthcare organizations around the world are challenged by pressures to reduce cost, improve coordination and outcome, and provide more with less. This requires effective planning and evidence-based practice by generating important information from available data. Thus, flexible and user-friendly ways to represent, query, and visualize health data becomes increasingly important. International organizations such as the World Health Organization (WHO) regularly publish vital data on priority health topics that can be utilized for public health policy and health service development. However, the data in most portals is displayed in either Excel or PDF formats, which makes information discovery and reuse difficult. Linked Open Data (LOD)-a new Semantic Web set of best practice of standards to publish and link heterogeneous data-can be applied to the representation and management of public level health data to alleviate such challenges. However, the technologies behind building LOD systems and their effectiveness for health data are yet to be assessed. The objective of this study is to evaluate whether Linked Data technologies are potential options for health information representation, visualization, and retrieval systems development and to identify the available tools and methodologies to build Linked Data-based health information systems. We used the Resource Description Framework (RDF) for data representation, Fuseki triple store for data storage, and Sgvizler for information visualization. Additionally, we integrated SPARQL query interface for interacting with the data. We primarily use the WHO health observatory dataset to test the system. All the data were represented using RDF and interlinked with other related datasets on the Web of Data using Silk-a link discovery framework for Web of Data. A preliminary usability assessment was conducted following the System Usability Scale (SUS) method. We developed an LOD-based health information representation, querying, and visualization system by using Linked Data tools. We imported more than 20,000 HIV-related data elements on mortality, prevalence, incidence, and related variables, which are freely available from the WHO global health observatory database. Additionally, we automatically linked 5312 data elements from DBpedia, Bio2RDF, and LinkedCT using the Silk framework. The system users can retrieve and visualize health information according to their interests. For users who are not familiar with SPARQL queries, we integrated a Linked Data search engine interface to search and browse the data. We used the system to represent and store the data, facilitating flexible queries and different kinds of visualizations. The preliminary user evaluation score by public health data managers and users was 82 on the SUS usability measurement scale. The need to write queries in the interface was the main reported difficulty of LOD-based systems to the end user. The system introduced in this article shows that current LOD technologies are a promising alternative to represent heterogeneous health data in a flexible and reusable manner so that they can serve intelligent queries, and ultimately support decision-making. However, the development of advanced text-based search engines is necessary to increase its usability especially for nontechnical users. Further research with large datasets is recommended in the future to unfold the potential of Linked Data and Semantic Web for future health information systems development.
Prevalence of arterial hypertension among Brazilian adolescents: systematic review and meta-analysis
2013-01-01
Background: Cardiovascular diseases are the leading cause of death in the world and are responsible for a high number of disability-adjusted life years. Elevated blood pressure is an independent, linear and continuous risk factor for cardiovascular disease and has also been reported in the young population. Brazil is a country of continental dimensions, and is very heterogeneous with respect to socioeconomic and cultural aspects. Brazilian studies on the subject of hypertension in adolescence are not nationally representative, and this provides a rationale for the conduction of a meta-analysis to assess the magnitude of the problem in the country. Methods: Hypertension studies in adolescents published from 1990 to September 2010 were searched in six electronic databases. Forest plots of the prevalence of hypertension were built for the overall population and by gender. Heterogeneity was assessed using the I² statistic. Meta-regression models were adjusted in order to identify possible sources of heterogeneity. Results: Of 3,631 articles initially identified, 17 were considered eligible for systematic review. The pooled prevalence of hypertension, estimated through random effects models, was 8.12% (95% CI 6.24-10.52) for the total population. Overall, prevalence was higher in males, 8.75% (95% CI 5.81-12.96), than in females, 6.31% (95% CI 4.41-8.96). Several variables were investigated in the heterogeneity analysis: region of the study, sample size, age and method of blood pressure measurement. The only variables that partially and inconsistently explained the observed heterogeneity (I² = 95.3%) were the region of the country where the study was conducted and the sample size. Conclusions: There was a large variation in hypertension prevalence and in the methods used for its evaluation throughout studies with Brazilian adolescents, indicating the need for standardized procedures and validated methods for hypertension measurement. Despite the large observed heterogeneity, and the small number of studies in some regions of Brazil, the pooled prevalence found in both males and females shows that systemic arterial hypertension should be monitored in the population aged 10–20 years and that specific measures are required to prevent and control the disease, as well as its risk factors. Studies that compare regional heterogeneities may contribute to the knowledge of factors associated with increased blood pressure among adolescents. PMID:24025095
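A compact sketch of the random-effects pooling and I² heterogeneity statistic used in meta-analyses of this kind (DerSimonian-Laird estimator) is shown below. The study-level effect estimates and variances are invented and are not the values from the 17 studies in this review.

```python
import numpy as np

# Invented study-level effect estimates and variances (e.g. transformed prevalences).
y = np.array([-2.3, -2.6, -2.1, -2.9, -2.4])
v = np.array([0.010, 0.020, 0.015, 0.030, 0.012])

# Fixed-effect weights and Cochran's Q.
w = 1.0 / v
y_fixed = np.sum(w * y) / np.sum(w)
Q = np.sum(w * (y - y_fixed) ** 2)
df = len(y) - 1

# DerSimonian-Laird between-study variance and I^2 heterogeneity statistic.
tau2 = max(0.0, (Q - df) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
I2 = max(0.0, (Q - df) / Q) * 100

# Random-effects pooled estimate and its standard error.
w_re = 1.0 / (v + tau2)
y_random = np.sum(w_re * y) / np.sum(w_re)
se_random = np.sqrt(1.0 / np.sum(w_re))

print(f"pooled estimate {y_random:.3f} "
      f"(95% CI {y_random - 1.96 * se_random:.3f} to {y_random + 1.96 * se_random:.3f}), "
      f"I^2 = {I2:.1f}%")
```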
Chronology of chondrule and CAI formation: Mg-Al isotopic evidence
NASA Technical Reports Server (NTRS)
Macpherson, G. J.; Davis, A. M.
1994-01-01
Details of the chondrule and Ca-Al-rich inclusion (CAI) formation during the earliest history of the solar system are imperfectly known. Because CAI's are more 'refractory' than ferromagnesian chondrules and have the lowest recorded initial Sr-87/Sr-86 ratios of any solar system materials, the expectation is that CAI's formed earlier than chondrules. But it is not known, for example, if CAI formation had stopped by the time chondrule formation began. Conventional (absolute) age-dating techniques cannot adequately resolve small age differences (less than 10^6 years) between objects of such antiquity. One approach has been to look at systematic differences in the daughter products of short-lived radionuclides such as Al-26 and I-129. Unfortunately, neither system appears to be 'well-behaved.' One possible reason for this circumstance is that later secondary events have partially reset the isotopic systems, but a viable alternative continues to be large-scale (nebular) heterogeneity in initial isotopic abundances, which would of course render the systems nearly useless as chronometers. In the past two years the nature of this problem has been redefined somewhat. Examination of the Al-Mg isotopic database for all CAI's suggests that the vast majority of inclusions originally had the same initial Al-26/Al-27 abundance ratio, and that the ill-behaved isotopic systematics now observed are the results of later partial reequilibration due to thermal processing. Isotopic heterogeneities did exist in the nebula, as demonstrated by the existence of so-called FUN inclusions in CV3 chondrites and isotopically anomalous hibonite grains in CM2 chondrites, which had little or no live Al-26 at the time of their formation. But, among the population of CV3 inclusions at least, FUN inclusions appear to have been a relatively minor nebular component.
OLYMPUS DISS - A Readily Implemented Geographic Data and Information Sharing System
NASA Astrophysics Data System (ADS)
Necsoiu, D. M.; Winfrey, B.; Murphy, K.; McKague, H. L.
2002-12-01
Electronic information technology has become a crucial component of business, government, and scientific organizations. In this technology era, many enterprises are moving away from the perception that information repositories are only a tool for decision-making. Instead, many organizations are learning that information systems, which are capable of organizing and following the interrelations between information and both the short-term and strategic organizational goals, are assets themselves, with inherent value. Olympus Data and Information Sharing System (DISS) is a system developed at the Center for Nuclear Waste Regulatory Analyses (CNWRA) to solve several difficult tasks associated with the management of geographical, geological and geophysical data. Three of the tasks were to (1) gather the large amount of heterogeneous information that has accumulated over the operational lifespan of CNWRA, (2) store the data in a central, knowledge-based, searchable database and (3) create quick, easy, convenient, and reliable access to that information. Faced with these difficult tasks CNWRA identified the requirements for designing such a system. Key design criteria were: (a) ability to ingest different data formats (i.e., raster, vector, and tabular data); (b) minimal expense using open-source and commercial off-the-shelf software; (c) seamless management of geospatial data, freeing up time for researchers to focus on analyses or algorithm development, rather than on time consuming format conversions; (d) controlled access; and (e) scalable architecture to meet new and continuing demands. Olympus DISS is a solution that can be easily adapted to small and mid-size enterprises dealing with heterogeneous geographic data. It uses established data standards, provides a flexible mechanism to build applications upon and output geographic data in multiple and clear ways. This abstract is an independent product of the CNWRA and does not necessarily reflect the views or regulatory position of the Nuclear Regulatory Commission.
Tempest: Accelerated MS/MS database search software for heterogeneous computing platforms
Adamo, Mark E.; Gerber, Scott A.
2017-01-01
MS/MS database search algorithms derive a set of candidate peptide sequences from in-silico digest of a protein sequence database, and compute theoretical fragmentation patterns to match these candidates against observed MS/MS spectra. The original Tempest publication described these operations mapped to a CPU-GPU model, in which the CPU generates peptide candidates that are asynchronously sent to a discrete GPU to be scored against experimental spectra in parallel (Milloy et al., 2012). The current version of Tempest expands this model, incorporating OpenCL to offer seamless parallelization across multicore CPUs, GPUs, integrated graphics chips, and general-purpose coprocessors. Three protocols describe how to configure and run a Tempest search, including discussion of how to leverage Tempest's unique feature set to produce optimal results. PMID:27603022
Czech multicenter research database of severe COPD
Novotna, Barbora; Koblizek, Vladimir; Zatloukal, Jaromir; Plutinsky, Marek; Hejduk, Karel; Zbozinkova, Zuzana; Jarkovsky, Jiri; Sobotik, Ondrej; Dvorak, Tomas; Safranek, Petr
2014-01-01
Purpose: Chronic obstructive pulmonary disease (COPD) has been recognized as a heterogeneous disorder affecting multiple organ systems. The Global Initiative for Chronic Obstructive Lung Disease (GOLD) places emphasis on symptom and exacerbation management. The aim of this study is to examine the course of COPD and its impact on morbidity and all-cause mortality of patients, with respect to individual phenotypes and GOLD categories. This study will also evaluate real-life COPD patient care in the Czech Republic. Patients and methods: The Czech Multicentre Research Database of COPD is projected to last for 5 years, with the aim of enrolling 1,000 patients. This is a multicenter, observational, and prospective study of patients with severe COPD (post-bronchodilator forced expiratory volume in 1 second ≤60%). Every consecutive patient who fulfils the inclusion criteria is asked to participate in the study. Patient recruitment is done on the basis of signed informed consent. The study was approved by the Multicentre Ethical Committee in Brno, Czech Republic. Results: The objective of this paper was to outline the methodology of this study. Conclusion: The establishment of the database is a useful step in improving care for COPD subjects. Additionally, it will serve as a source of data elucidating the natural course of COPD, its comorbidities, and its overall impact on patients, and it will provide information on the diverse course of the COPD syndrome in the Czech Republic. PMID:25419124
A web-based, relational database for studying glaciers in the Italian Alps
NASA Astrophysics Data System (ADS)
Nigrelli, G.; Chiarle, M.; Nuzzi, A.; Perotti, L.; Torta, G.; Giardino, M.
2013-02-01
Glaciers are among the best terrestrial indicators of climate change and thus glacier inventories have attracted a growing, worldwide interest in recent years. In Italy, the first official glacier inventory was completed in 1925 and 774 glacial bodies were identified. As the amount of data continues to increase, and new techniques become available, there is a growing demand for computer tools that can efficiently manage the collected data. The Research Institute for Geo-hydrological Protection of the National Research Council, in cooperation with the Departments of Computer Science and Earth Sciences of the University of Turin, created a database that provides a modern tool for storing, processing and sharing glaciological data. The database was developed according to the need of storing heterogeneous information, which can be retrieved through a set of web search queries. The database's architecture is server-side, and was designed by means of an open source software. The website interface, simple and intuitive, was intended to meet the needs of a distributed public: through this interface, any type of glaciological data can be managed, specific queries can be performed, and the results can be exported in a standard format. The use of a relational database to store and organize a large variety of information about Italian glaciers collected over the last hundred years constitutes a significant step forward in ensuring the safety and accessibility of such data. Moreover, the same benefits also apply to the enhanced operability for handling information in the future, including new and emerging types of data formats, such as geographic and multimedia files. Future developments include the integration of cartographic data, such as base maps, satellite images and vector data. The relational database described in this paper will be the heart of a new geographic system that will merge data, data attributes and maps, leading to a complete description of Italian glacial environments.
A Services-Oriented Architecture for Water Observations Data
NASA Astrophysics Data System (ADS)
Maidment, D. R.; Zaslavsky, I.; Valentine, D.; Tarboton, D. G.; Whitenack, T.; Whiteaker, T.; Hooper, R.; Kirschtel, D.
2009-04-01
Water observations data are time series of measurements made at point locations of water level, flow, and quality and corresponding data for climatic observations at point locations such as gaged precipitation and weather variables. A services-oriented architecture has been built for such information for the United States that has three components: hydrologic information servers, hydrologic information clients, and a centralized metadata cataloging system. These are connected using web services for observations data and metadata defined by an XML-based language called WaterML. A Hydrologic Information Server can be built by storing observations data in a relational database schema in the CUAHSI Observations Data Model, in which case web services access to the data and metadata is automatically provided by query functions for WaterML that are wrapped around the relational database within a web server. A Hydrologic Information Server can also be constructed by custom-programming an interface to an existing water agency web site so that it responds to the same queries by producing data in WaterML as do the CUAHSI Observations Data Model based servers. A Hydrologic Information Client is one which can interpret and ingest WaterML metadata and data. We have two client applications for Excel and ArcGIS and have shown how WaterML web services can be ingested into programming environments such as Matlab and Visual Basic. HIS Central, maintained at the San Diego Supercomputer Center, is a repository of observational metadata for WaterML web services which presently indexes 342 million data values measured at 1.75 million locations. This is the largest catalog of water observational data for the United States presently in existence. As more observation networks join what we term the "CUAHSI Water Data Federation", and the system accommodates a growing number of sites, measured parameters, applications, and users, rapid and reliable access to large heterogeneous hydrologic data repositories becomes critical. The CUAHSI HIS solution to the scalability and heterogeneity challenges has several components. Structural differences across the data repositories are addressed by building a standard services foundation for the exchange of hydrologic data, as derived from a common information model for observational data measured at stationary points and its implementation as a relational schema (ODM) and an XML schema (WaterML). Semantic heterogeneity is managed by mapping water quantity, water quality, and other parameters collected by government agencies and academic projects to a common ontology. The WaterML-compliant web services are indexed in a community services registry called HIS Central (hiscentral.cuahsi.org). Once a web service is registered in HIS Central, its metadata (site and variable characteristics, period of record for each variable at each site, etc.) is harvested and appended to the central catalog. The catalog is further updated as the service publisher associates the variables in the published service with ontology concepts. After this, the newly published service becomes available for spatial and semantics-based queries from online and desktop client applications developed by the project. Hydrologic system server software is now deployed at more than a dozen locations in the United States and Australia.
To provide rapid access to data summaries, in particular for several nation-wide data repositories including EPA STORET, USGS NWIS, and USDA SNOTEL, we convert the observation data catalogs and databases with harvested data values into special representations that support high-performance analysis and visualization. The construction of OLAP (Online Analytical Processing) cubes, often called data cubes, is an approach to organizing and querying large multi-dimensional data collections. We have applied the OLAP techniques, as implemented in Microsoft SQL Server 2005/2008, to the analysis of the catalogs from several agencies. OLAP analysis results reflect geography and history of observation data availability from USGS NWIS, EPA STORET, and USDA SNOTEL repositories, and spatial and temporal dynamics of the available measurements for several key nutrient-related parameters. Our experience developing the CUAHSI HIS cyberinfrastructure demonstrated that efficient integration of hydrologic observations from multiple government and academic sources requires a range of technical approaches focused on managing different components of data heterogeneity and system scalability. While this submission addresses technical aspects of developing a national-scale information system for hydrologic observations, the challenges of explicating shared semantics of hydrologic observations and building a community of HIS users and developers remain critical in constructing a nation-wide federation of water data services.
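The OLAP-cube analysis of the harvested catalogs is described only conceptually; as a loose, small-scale analogue, a multidimensional roll-up of a toy catalog can be built with a pandas pivot table. The records below are invented and do not come from the NWIS, STORET, or SNOTEL catalogs.

```python
import pandas as pd

# Toy harvested catalog records: (agency, variable, year, number of data values).
catalog = pd.DataFrame([
    {"agency": "USGS NWIS",   "variable": "streamflow", "year": 2007, "values": 1_200_000},
    {"agency": "USGS NWIS",   "variable": "streamflow", "year": 2008, "values": 1_350_000},
    {"agency": "EPA STORET",  "variable": "nitrate",    "year": 2007, "values":    80_000},
    {"agency": "EPA STORET",  "variable": "nitrate",    "year": 2008, "values":    95_000},
    {"agency": "USDA SNOTEL", "variable": "snow depth", "year": 2008, "values":   410_000},
])

# Roll the records up along the agency x year dimensions, summing the value counts.
cube = pd.pivot_table(catalog, values="values", index="agency",
                      columns="year", aggfunc="sum", fill_value=0)
print(cube)
```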
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lopez Torres, E.; Fiorina, E.; Pennazio, F.
Purpose: M5L, a fully automated computer-aided detection (CAD) system for the detection and segmentation of lung nodules in thoracic computed tomography (CT), is presented and validated on several image datasets. Methods: M5L is the combination of two independent subsystems, based on the Channeler Ant Model as a segmentation tool [lung channeler ant model (lungCAM)] and on the voxel-based neural approach. The lungCAM was upgraded with a scan equalization module and a new procedure to recover the nodules connected to other lung structures; its classification module, which makes use of a feed-forward neural network, is based on a small number of features (13), so as to minimize the risk of poor generalization, which could otherwise arise from the large difference between the sizes of the training and testing datasets (94 and 1019 CTs, respectively). The lungCAM (standalone) and M5L (combined) performance was extensively tested on 1043 CT scans from three independent datasets, including a detailed analysis of the full Lung Image Database Consortium/Image Database Resource Initiative database, which is not yet found in the literature. Results: The lungCAM and M5L performance is consistent across the databases, with a sensitivity of about 70% and 80%, respectively, at eight false positive findings per scan, despite the variable annotation criteria and acquisition and reconstruction conditions. A reduced sensitivity is found for subtle nodules and ground glass opacity (GGO) structures. A comparison with other CAD systems is also presented. Conclusions: The M5L performance on a large and heterogeneous dataset is stable and satisfactory, although the development of a dedicated module for GGO detection could further improve it, as could an iterative optimization of the training procedure. The main aim of the present study was accomplished: M5L results do not deteriorate when increasing the dataset size, making it a candidate for supporting radiologists in large-scale screenings and clinical programs.
Laplacian normalization and random walk on heterogeneous networks for disease-gene prioritization.
Zhao, Zhi-Qin; Han, Guo-Sheng; Yu, Zu-Guo; Li, Jinyan
2015-08-01
Random walk on heterogeneous networks is a recently emerging approach to effective disease gene prioritization. Laplacian normalization is a technique for normalizing the weights of edges in a network. We use this technique to normalize the gene matrix and the phenotype matrix before the construction of the heterogeneous network, and also use this idea to define the transition matrices of the heterogeneous network. Our method has remarkably better performance than the existing methods for recovering known gene-phenotype relationships. The Shannon information entropy of the distribution of the transition probabilities in our networks is found to be smaller than in the networks constructed by the existing methods, implying that a higher number of top-ranked genes can be verified as disease genes. In fact, the most probable gene-phenotype relationships ranked within the top 3 or top 5 in our gene lists can be confirmed by the OMIM database in many cases. Our algorithms have shown remarkably superior performance over the state-of-the-art algorithms for recovering gene-phenotype relationships. All Matlab codes are available upon email request. Copyright © 2015 Elsevier Ltd. All rights reserved.
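A compact sketch of the two ingredients named in the abstract, symmetric Laplacian normalization of a weight matrix and random walk with restart on the resulting network, is given below. The small example matrix, seed vector, and restart probability are illustrative, not the paper's gene-phenotype data.

```python
import numpy as np

def laplacian_normalize(W):
    """Symmetric Laplacian normalization: D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    return W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def random_walk_with_restart(W_norm, p0, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W_norm @ p + r * p0 until convergence."""
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W_norm @ p + restart * p0
        if np.linalg.norm(p_next - p, 1) < tol:
            return p_next
        p = p_next
    return p

# Toy 5-node network weights and a seed vector concentrated on one known disease gene.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 0, 1],
              [0, 1, 0, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)
p0 = np.array([1.0, 0, 0, 0, 0])

scores = random_walk_with_restart(laplacian_normalize(W), p0)
print(np.argsort(-scores))   # candidate ranking by steady-state probability
```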
Dietary choline and betaine intakes vary in an adult multiethnic population.
Yonemori, Kim M; Lim, Unhee; Koga, Karin R; Wilkens, Lynne R; Au, Donna; Boushey, Carol J; Le Marchand, Loïc; Kolonel, Laurence N; Murphy, Suzanne P
2013-06-01
Choline and betaine are important nutrients for human health, but reference food composition databases for these nutrients became available only recently. We tested the feasibility of using these databases to estimate dietary choline and betaine intakes among ethnically diverse adults who participated in the Multiethnic Cohort (MEC) Study. Of the food items (n = 965) used to quantify intakes for the MEC FFQ, 189 items were exactly matched with items in the USDA Database for the Choline Content of Common Foods for total choline, choline-containing compounds, and betaine, and 547 items were matched to the USDA National Nutrient Database for Standard Reference for total choline (n = 547) and 148 for betaine. When a match was not found, choline and betaine values were imputed based on the same food with a different form (124 food items for choline, 300 for choline compounds, 236 for betaine), a similar food (n = 98, 284, and 227, respectively) or the closest item in the same food category (n = 6, 191, and 157, respectively), or the values were assumed to be zero (n = 1, 1, and 8, respectively). The resulting mean intake estimates for choline and betaine among 188,147 MEC participants (aged 45-75) varied by sex (372 and 154 mg/d in men, 304 and 128 mg/d in women, respectively; P-heterogeneity < 0.0001) and by race/ethnicity among Caucasians, African Americans, Japanese Americans, Latinos, and Native Hawaiians (P-heterogeneity < 0.0001), largely due to the variation in energy intake. Our findings demonstrate the feasibility of assessing choline and betaine intake and characterize the variation in intake that exists in a multiethnic population.
Meta-analysis on the effectiveness of team-based learning on medical education in China.
Chen, Minjian; Ni, Chunhui; Hu, Yanhui; Wang, Meilin; Liu, Lu; Ji, Xiaoming; Chu, Haiyan; Wu, Wei; Lu, Chuncheng; Wang, Shouyu; Wang, Shoulin; Zhao, Liping; Li, Zhong; Zhu, Huijuan; Wang, Jianming; Xia, Yankai; Wang, Xinru
2018-04-10
Team-based learning (TBL) has been adopted as a new medical pedagogical approach in China. However, there are no studies or reviews summarizing the effectiveness of TBL on medical education. This study aims to obtain an overall estimation of the effectiveness of TBL on outcomes of theoretical teaching in medical education in China. We retrieved studies published from inception through December 2015. The Chinese National Knowledge Infrastructure, Chinese Biomedical Literature Database, Chinese Wanfang Database, Chinese Scientific Journal Database, PubMed, EMBASE and the Cochrane Database were searched. The quality of included studies was assessed by the Newcastle-Ottawa scale. The standardized mean difference (SMD) was applied for the estimation of pooled effects. The heterogeneity assumption was assessed with the I² statistic and further explored by meta-regression analysis. A total of 13 articles including 1545 participants were eventually included in the meta-analysis. The quality scores of these studies ranged from 6 to 10. Overall, TBL significantly increased students' theoretical examination scores when compared with lecture-based learning (LBL) (SMD = 2.46, 95% CI: 1.53-3.40). Additionally, TBL significantly increased students' learning attitude (SMD = 3.23, 95% CI: 2.27-4.20) and learning skill (SMD = 2.70, 95% CI: 1.33-4.07). The meta-regression results showed that randomization, education classification and gender diversity were the factors that caused heterogeneity. TBL in the theoretical teaching of medical education appears to be more effective than LBL in improving the knowledge, attitude and skill of students in China, providing evidence for the implementation of TBL in medical education in China. Medical schools should implement TBL with consideration of practical teaching circumstances, such as students' education level.
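For readers unfamiliar with how pooled SMDs and the I² heterogeneity statistic are obtained, the following is a minimal DerSimonian-Laird random-effects sketch. The effect sizes and variances are made up for illustration and are not the values from the 13 included studies.

```python
import numpy as np

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling of study effect sizes (e.g. SMDs)."""
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                                   # fixed-effect weights
    y_fixed = np.sum(w * y) / np.sum(w)
    Q = np.sum(w * (y - y_fixed) ** 2)            # Cochran's Q
    df = len(y) - 1
    I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - df) / c)                 # between-study variance estimate
    w_star = 1.0 / (v + tau2)
    pooled = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, I2

# illustrative (made-up) SMDs and variances for a handful of TBL-vs-LBL studies
print(random_effects_pool([2.1, 3.0, 1.8, 2.6], [0.10, 0.15, 0.08, 0.20]))
```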
PDXliver: a database of liver cancer patient derived xenograft mouse models.
He, Sheng; Hu, Bo; Li, Chao; Lin, Ping; Tang, Wei-Guo; Sun, Yun-Fan; Feng, Fang-You-Min; Guo, Wei; Li, Jia; Xu, Yang; Yao, Qian-Lan; Zhang, Xin; Qiu, Shuang-Jian; Zhou, Jian; Fan, Jia; Li, Yi-Xue; Li, Hong; Yang, Xin-Rong
2018-05-09
Liver cancer is the second leading cause of cancer-related deaths and is characterized by heterogeneity and drug resistance. Patient-derived xenograft (PDX) models have been widely used in cancer research because they reproduce the characteristics of original tumors. However, current studies of liver cancer PDX mice are scattered and the number of available PDX models is too small to represent the heterogeneity of liver cancer patients. To improve this situation and to complement available PDX model-related resources, we constructed a comprehensive database, PDXliver, to integrate and analyze liver cancer PDX models. Currently, PDXliver contains 116 PDX models from Chinese liver cancer patients; 51 of them were established by the in-house PDX platform and the others were curated from the published literature. These models are annotated with complete information, including clinical characteristics of patients, genome-wide expression profiles, germline variations, somatic mutations and copy number alterations. Analysis of expression subtypes and mutated genes shows that PDXliver represents the diversity of human patients. Another feature of PDXliver is the storage of drug response data for PDX mice, which makes it possible to explore the association between molecular profiles and drug sensitivity. All data can be accessed via the Browse and Search pages. Additionally, two tools are provided to interactively visualize the omics data of selected PDXs or to compare two groups of PDXs. As far as we know, PDXliver is the first public database of liver cancer PDX models. We hope that this comprehensive resource will accelerate the utility of PDX models and facilitate liver cancer research. The PDXliver database is freely available online at: http://www.picb.ac.cn/PDXliver/.
Methodologies and systems for heterogeneous concurrent computing
NASA Technical Reports Server (NTRS)
Sunderam, V. S.
1994-01-01
Heterogeneous concurrent computing is gaining increasing acceptance as an alternative or complementary paradigm to multiprocessor-based parallel processing as well as to conventional supercomputing. While algorithmic and programming aspects of heterogeneous concurrent computing are similar to their parallel processing counterparts, system issues, partitioning and scheduling, and performance aspects are significantly different. In this paper, we discuss critical design and implementation issues in heterogeneous concurrent computing, and describe techniques for enhancing its effectiveness. In particular, we highlight the system level infrastructures that are required, aspects of parallel algorithm development that most affect performance, system capabilities and limitations, and tools and methodologies for effective computing in heterogeneous networked environments. We also present recent developments and experiences in the context of the PVM system and comment on ongoing and future work.
NASA Technical Reports Server (NTRS)
Zendejas, Silvino; Bui, Tung; Bui, Bach; Malhotra, Shantanu; Chen, Fannie; Kim, Rachel; Allen, Christopher; Luong, Ivy; Chang, George; Sadaqathulla, Syed
2009-01-01
The Work Coordination Engine (WCE) is a Java application integrated into the Service Management Database (SMDB), which coordinates the dispatching and monitoring of a work order system. WCE de-queues work orders from SMDB and orchestrates the dispatching of work to a registered set of software worker applications distributed over a set of local, or remote, heterogeneous computing systems. WCE monitors the execution of work orders once dispatched, and accepts the results of the work order by storing to the SMDB persistent store. The software leverages the use of a relational database, Java Messaging System (JMS), and Web Services using Simple Object Access Protocol (SOAP) technologies to implement an efficient work-order dispatching mechanism capable of coordinating the work of multiple computer servers on various platforms working concurrently on different, or similar, types of data or algorithmic processing. Existing (legacy) applications can be wrapped with a proxy object so that no changes to the application are needed to make them available for integration into the work order system as "workers." WCE automatically reschedules work orders that fail to be executed by one server to a different server if available. From initiation to completion, the system manages the execution state of work orders and workers via a well-defined set of events, states, and actions. It allows for configurable work-order execution timeouts by work-order type. This innovation eliminates a current processing bottleneck by providing a highly scalable, distributed work-order system used to quickly generate products needed by the Deep Space Network (DSN) to support space flight operations. WCE is driven by asynchronous messages delivered via JMS indicating the availability of new work or workers. It runs completely unattended in support of the lights-out operations concept in the DSN.
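The dispatch-monitor-reschedule cycle described above can be summarized in a small, purely illustrative sketch. It is not the WCE code (which relies on JMS, SOAP and the SMDB persistent store); the worker names, timeout values and the simple failover rule are invented for demonstration.

```python
import queue, time

class Worker:
    def __init__(self, name, handler):
        self.name, self.handler = name, handler

def dispatch(work_orders, workers, timeouts, max_attempts=2):
    """De-queue work orders and hand each to a registered worker; reschedule on failure."""
    pending = queue.Queue()
    for wo in work_orders:
        pending.put((wo, 0))
    results = {}
    while not pending.empty():
        wo, attempt = pending.get()
        worker = workers[attempt % len(workers)]        # naive failover: try the next server
        deadline = time.time() + timeouts.get(wo["type"], 30)
        try:
            result = worker.handler(wo)
            if time.time() > deadline:
                raise TimeoutError(wo["id"])
            results[wo["id"]] = (worker.name, result)   # in the real system, persisted to SMDB
        except Exception:
            if attempt + 1 < max_attempts:
                pending.put((wo, attempt + 1))          # reschedule on a different server
            else:
                results[wo["id"]] = (worker.name, "FAILED")
    return results

workers = [Worker("serverA", lambda wo: wo["payload"].upper()),
           Worker("serverB", lambda wo: wo["payload"].upper())]
orders = [{"id": 1, "type": "product", "payload": "dsn pass 17"}]
print(dispatch(orders, workers, {"product": 10}))
```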
Cheng, Feixiong; Liu, Chuang; Shen, Bairong; Zhao, Zhongming
2016-08-26
Cancer is increasingly recognized as a cellular system phenomenon that is attributed to the accumulation of genetic or epigenetic alterations leading to the perturbation of the molecular network architecture. Elucidation of network properties that can characterize tumor initiation and progression, or pinpoint the molecular targets related to drug sensitivity or resistance, is therefore of critical importance for providing systems-level insights into tumorigenesis and clinical outcome in molecularly targeted cancer therapy. In this study, we developed a network-based framework to quantitatively examine cellular network heterogeneity and modularity in cancer. Specifically, we constructed gene co-expressed protein interaction networks derived from large-scale RNA-Seq data across 8 cancer types generated in The Cancer Genome Atlas (TCGA) project. We performed gene network entropy and balanced versus unbalanced motif analysis to investigate cellular network heterogeneity and modularity in tumor versus normal tissues, different stages of progression, and drug resistant versus sensitive cancer cell lines. We found that tumorigenesis could be characterized by a significant increase of gene network entropy in all of the 8 cancer types. The ratio of balanced motifs in normal tissues is higher than that of tumors, while the ratio of unbalanced motifs in tumors is higher than that of normal tissues in all of the 8 cancer types. Furthermore, we showed that network entropy could be used to characterize tumor progression and anticancer drug responses. For example, we found that kinase inhibitor-resistant cancer cell lines had higher entropy compared to that of sensitive cell lines using an integrative analysis of microarray gene expression and drug pharmacological data collected from the Genomics of Drug Sensitivity in Cancer database. In addition, we provided potential network-level evidence that smoking might increase cancer cellular network heterogeneity and further contribute to tyrosine kinase inhibitor (e.g., gefitinib) resistance. In summary, we demonstrated that network properties such as network entropy and unbalanced motifs are associated with tumor initiation, progression, and anticancer drug responses, suggesting new potential network-based prognostic and predictive measures in cancer.
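One common way to quantify "gene network entropy" is the mean Shannon entropy of each node's normalized edge weights; the sketch below uses that definition as an illustration of the kind of measure discussed, without claiming it is the authors' exact formulation. The two toy networks are invented: a modular, "normal-like" one and a diffusely connected, "tumor-like" one.

```python
import numpy as np

def network_entropy(W):
    """Mean Shannon entropy of per-node transition probabilities in a weighted network."""
    W = np.asarray(W, dtype=float)
    entropies = []
    for row in W:
        total = row.sum()
        if total == 0:
            continue
        p = row / total
        p = p[p > 0]
        entropies.append(-(p * np.log2(p)).sum())
    return float(np.mean(entropies))

# toy co-expression-weighted interaction networks: modular ("normal") vs diffuse ("tumor")
normal = np.array([[0, .9, .9, 0], [.9, 0, .9, 0], [.9, .9, 0, .1], [0, 0, .1, 0]])
tumor  = np.array([[0, .5, .5, .5], [.5, 0, .5, .5], [.5, .5, 0, .5], [.5, .5, .5, 0]])
print(network_entropy(normal), network_entropy(tumor))  # the diffuse network scores higher
```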
DICOM-compliant PACS with CD-based image archival
NASA Astrophysics Data System (ADS)
Cox, Robert D.; Henri, Christopher J.; Rubin, Richard K.; Bret, Patrice M.
1998-07-01
This paper describes the design and implementation of a low-cost PACS conforming to the DICOM 3.0 standard. The goal was to provide an efficient image archival and management solution on a heterogeneous hospital network as a basis for filmless radiology. The system follows a distributed, client/server model and was implemented at a fraction of the cost of a commercial PACS. It provides reliable archiving on recordable CD and allows access to digital images throughout the hospital and on the Internet. Dedicated servers have been designed for short-term storage, CD-based archival, data retrieval and remote data access or teleradiology. The short-term storage devices provide DICOM storage and query/retrieve services to scanners and workstations and approximately twelve weeks of 'on-line' image data. The CD-based archival and data retrieval processes are fully automated with the exception of CD loading and unloading. The system employs lossless compression on both short- and long-term storage devices. All servers communicate via the DICOM protocol in conjunction with both local and 'master' SQL patient databases. Records are transferred from the local to the master database independently, ensuring that storage devices will still function if the master database server cannot be reached. The system features rules-based work-flow management and WWW servers to provide multi-platform remote data access. The WWW server system is distributed on the storage, retrieval and teleradiology servers, allowing locally stored image data to be viewed directly in a WWW browser without the need for data transfer to a central WWW server. An independent system monitors disk usage, processes, network and CPU load on each server and reports errors to the image management team via email. The PACS was implemented using a combination of off-the-shelf hardware, freely available software and applications developed in-house. The system has enabled filmless operation in CT, MR and ultrasound within the radiology department and throughout the hospital. The use of WWW technology has enabled the development of an intuitive web-based teleradiology and image management solution that provides complete access to image data.
Menditto, Enrica; Bolufer De Gea, Angela; Cahir, Caitriona; Marengoni, Alessandra; Riegler, Salvatore; Fico, Giuseppe; Costa, Elisio; Monaco, Alessandro; Pecorelli, Sergio; Pani, Luca; Prados-Torres, Alexandra
2016-01-01
Computerized health care databases have been widely described as an excellent opportunity for research. The availability of "big data" has brought about a wave of innovation in projects when conducting health services research. Most of the available secondary data sources are restricted to the geographical scope of a given country and present heterogeneous structure and content. Under the umbrella of the European Innovation Partnership on Active and Healthy Ageing, collaborative work conducted by the partners of the group on "adherence to prescription and medical plans" identified the use of observational and large-population databases to monitor medication-taking behavior in the elderly. This article describes the methodology used to gather the information from available databases among the Adherence Action Group partners with the aim of improving data sharing on a European level. A total of six databases belonging to three different European countries (Spain, Republic of Ireland, and Italy) were included in the analysis. Preliminary results suggest that there are some similarities. However, these results should be applied in different contexts and European countries, supporting the idea that large European studies should be designed in order to get the most of already available databases.
Individual heterogeneity generating explosive system network dynamics.
Manrique, Pedro D; Johnson, Neil F
2018-03-01
Individual heterogeneity is a key characteristic of many real-world systems, from organisms to humans. However, its role in determining the system's collective dynamics is not well understood. Here we study how individual heterogeneity impacts the system network dynamics by comparing linking mechanisms that favor similar or dissimilar individuals. We find that this heterogeneity-based evolution drives an unconventional form of explosive network behavior, and it dictates how a polarized population moves toward consensus. Our model shows good agreement with data from both biological and social science domains. We conclude that individual heterogeneity likely plays a key role in the collective development of real-world networks and communities, and it cannot be ignored.
Spaulding, William; Deogun, Jitender
2011-09-01
Personalization of treatment is a current strategic goal for improving health care. Integrated treatment approaches such as psychiatric rehabilitation benefit from personalization because they involve matching diverse arrays of treatment options to individually unique profiles of need. The need for personalization is evident in the heterogeneity of people with severe mental illness and in the findings of experimental psychopathology. One pathway to personalization lies in analysis of the judgments and decision making of human experts and other participants as they respond to complex circumstances in pursuit of treatment and rehabilitation goals. Such analysis is aided by computer simulation of human decision making, which in turn informs development of computerized clinical decision support systems. This inspires a research program involving concurrent development of databases, domain ontology, and problem-solving algorithms, toward the goal of personalizing psychiatric rehabilitation through human collaboration with intelligent cyber systems. The immediate hurdle is to demonstrate that clinical decisions beyond diagnosis really do affect outcome. This can be done by supporting the hypothesis that a human treatment team with access to a reasonably comprehensive clinical database that tracks patient status and treatment response over time achieves better outcome than a treatment team without such access, in a controlled experimental trial. Provided the hypothesis can be supported, the near future will see prototype systems that can construct an integrated assessment, formulation, and rehabilitation plan from clinical assessment data and contextual information. This will lead to advanced systems that collaborate with human decision makers to personalize psychiatric rehabilitation and optimize outcome.
NASA Astrophysics Data System (ADS)
Ghiorso, M. S.
2013-12-01
Internally consistent thermodynamic databases are critical resources that facilitate the calculation of heterogeneous phase equilibria and thereby support geochemical, petrological, and geodynamical modeling. These 'databases' are actually derived data/model systems that depend on a diverse suite of physical property measurements, calorimetric data, and experimental phase equilibrium brackets. In addition, such databases are calibrated with the adoption of various models for extrapolation of heat capacities and volumetric equations of state to elevated temperature and pressure conditions. Finally, these databases require specification of thermochemical models for the mixing properties of solid, liquid, and fluid solutions, which are often rooted in physical theory and, in turn, depend on additional experimental observations. The process of 'calibrating' a thermochemical database involves considerable effort and an extensive computational infrastructure. Because of these complexities, the community tends to rely on a small number of thermochemical databases, generated by a few researchers; these databases often have limited longevity and are universally difficult to maintain. ThermoFit is a software framework and user interface whose aim is to provide a modeling environment that facilitates creation, maintenance and distribution of thermodynamic data/model collections. Underlying ThermoFit are data archives of fundamental physical property, calorimetric, crystallographic, and phase equilibrium constraints that provide the essential experimental information from which thermodynamic databases are traditionally calibrated. ThermoFit standardizes schemas for accessing these data archives and provides web services for data mining these collections. Beyond simple data management and interoperability, ThermoFit provides a collection of visualization and software modeling tools that streamline the model/database generation process. Most notably, ThermoFit facilitates the rapid visualization of predicted model outcomes and permits the user to modify these outcomes using tactile- or mouse-based GUI interaction, with real-time updates that reflect users' choices, preferences, and priorities involving derived model results. This ability permits some resolution of the problem of correlated model parameters in the common situation where thermodynamic models must be calibrated from inadequate data resources. The ability also allows modeling constraints to be imposed using natural data and observations (i.e., petrologic or geochemical intuition). Once formulated, ThermoFit facilitates deployment of data/model collections by automated creation of web services. Users consume these services via web-, Excel-, or desktop clients. ThermoFit is currently under active development and not yet generally available; a limited-capability prototype system has been coded for Macintosh computers and utilized to construct thermochemical models for H2O-CO2 mixed-fluid saturation in silicate liquids. The longer-term goal is to release ThermoFit as a web portal application client with server-based cloud computations supporting the modeling environment.
Integration of Multidisciplinary Sensory Data:
Miller, Perry L.; Nadkarni, Prakash; Singer, Michael; Marenco, Luis; Hines, Michael; Shepherd, Gordon
2001-01-01
The paper provides an overview of neuroinformatics research at Yale University being performed as part of the national Human Brain Project. This research is exploring the integration of multidisciplinary sensory data, using the olfactory system as a model domain. The neuroinformatics activities fall into three main areas: 1) building databases and related tools that support experimental olfactory research at Yale and can also serve as resources for the field as a whole, 2) using computer models (molecular models and neuronal models) to help understand data being collected experimentally and to help guide further laboratory experiments, 3) performing basic neuroinformatics research to develop new informatics technologies, including a flexible data model (EAV/CR, entity-attribute-value with classes and relationships) designed to facilitate the integration of diverse heterogeneous data within a single unifying framework. PMID:11141511
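The EAV/CR idea, storing values as entity-attribute-value rows while retaining classes and relationships, can be illustrated with a toy in-memory store. This is only a sketch of the data-model concept; the entity names are hypothetical and the real EAV/CR implementation is a relational schema rather than a Python object.

```python
from collections import defaultdict

class EAVStore:
    """Tiny entity-attribute-value store with per-entity classes and typed relationships."""
    def __init__(self):
        self.classes = {}                      # entity_id -> class name
        self.values = defaultdict(dict)        # entity_id -> {attribute: value}
        self.relations = []                    # (subject, predicate, object) triples

    def add_entity(self, entity_id, cls):
        self.classes[entity_id] = cls

    def set(self, entity_id, attribute, value):
        self.values[entity_id][attribute] = value

    def relate(self, subj, predicate, obj):
        self.relations.append((subj, predicate, obj))

    def of_class(self, cls):
        return [e for e, c in self.classes.items() if c == cls]

store = EAVStore()
store.add_entity("OR1A1", "OlfactoryReceptor")          # hypothetical entities and attributes
store.set("OR1A1", "species", "human")
store.add_entity("odorant:limonene", "Odorant")
store.relate("OR1A1", "responds_to", "odorant:limonene")
print(store.of_class("OlfactoryReceptor"), store.values["OR1A1"])
```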
IntegromeDB: an integrated system and biological search engine
2012-01-01
Background With the growth of biological data in volume and heterogeneity, web search engines have become key tools for researchers. However, general-purpose search engines are not specialized for the search of biological data. Description Here, we present an approach to developing a biological web search engine based on Semantic Web technologies and demonstrate its implementation for retrieving gene- and protein-centered knowledge. The engine is available at http://www.integromedb.org. Conclusions The IntegromeDB search engine allows scanning of data on gene regulation, gene expression, protein-protein interactions, pathways, metagenomics, mutations, diseases, and other gene- and protein-related data that are automatically retrieved from publicly available databases and web pages using biological ontologies. To perfect the resource design and usability, we welcome and encourage community feedback. PMID:22260095
Design and implementation of a health data interoperability mediator.
Kuo, Mu-Hsing; Kushniruk, Andre William; Borycki, Elizabeth Marie
2010-01-01
The objective of this study is to design and implement a common-gateway oriented mediator to solve the health data interoperability problems that exist among heterogeneous health information systems. The proposed mediator has three main components: (1) a Synonym Dictionary (SD) that stores a set of global metadata and terminologies to serve as the mapping intermediary, (2) a Semantic Mapping Engine (SME) that can be used to map metadata and instance semantics, and (3) a DB-to-XML module that translates source health data stored in a database into XML format and back. A routine admission notification data exchange scenario is used to test the efficiency and feasibility of the proposed mediator. The study results show that the proposed mediator can make health information exchange more efficient.
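A minimal sketch of the mediator pattern described, a synonym dictionary mapping local metadata onto global terms plus a DB-to-XML translation step, is shown below using only the Python standard library. Table and field names are hypothetical, and the actual SD/SME/DB-to-XML components are of course richer than this.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Illustrative synonym dictionary: local field names -> shared (global) metadata terms
SYNONYMS = {"pt_name": "PatientName", "adm_dt": "AdmissionDate", "ward_cd": "WardCode"}

def rows_to_xml(conn, table):
    """Translate rows of a local table into XML tagged with the global terminology."""
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    root = ET.Element("AdmissionNotifications")
    for row in cur:
        rec = ET.SubElement(root, "Record")
        for col, val in zip(columns, row):
            ET.SubElement(rec, SYNONYMS.get(col, col)).text = str(val)
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admissions (pt_name TEXT, adm_dt TEXT, ward_cd TEXT)")
conn.execute("INSERT INTO admissions VALUES ('Doe, J.', '2010-03-01', 'W12')")
print(rows_to_xml(conn, "admissions"))
```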
Liu, Fenghua; Tang, Yong; Sun, Junwei; Yuan, Zhanna; Li, Shasha; Sheng, Jun; Ren, He; Hao, Jihui
2012-01-01
To investigate the efficacy and safety of regional intra-arterial chemotherapy (RIAC) versus systemic chemotherapy for stage III/IV pancreatic cancer. Randomized controlled trials of patients with advanced pancreatic cancer treated by regional intra-arterial or systemic chemotherapy were identified using PubMed, ISI, EMBASE, Cochrane Library, Google, Chinese Scientific Journals Database (VIP), and China National Knowledge Infrastructure (CNKI) electronic databases, for all publications dated between 1960 and December 31, 2010. Data were independently extracted by two reviewers. Odds ratios and relative risks were pooled using either fixed- or random-effects models, depending on the I² statistic and Q test assessments of heterogeneity. Statistical analysis was performed using RevMan 5.0. Six randomized controlled trials comprising 298 patients, among 492 articles identified, met the standards for inclusion in the meta-analysis. Eight patients achieved complete remission (CR) with regional intra-arterial chemotherapy (RIAC), whereas no patients achieved CR with systemic chemotherapy. Compared with systemic chemotherapy, patients receiving RIAC had superior partial remissions (RR = 1.99, 95% CI: 1.50, 2.65; 58.06% with RIAC and 29.37% with systemic treatment), clinical benefits (RR = 2.34, 95% CI: 1.84, 2.97; 78.06% with RIAC and 29.37% with systemic treatment), total complication rates (RR = 0.72, 95% CI: 0.60, 0.87; 49.03% with RIAC and 71.33% with systemic treatment), and hematological side effects (RR = 0.76, 95% CI: 0.63, 0.91; 60.87% with RIAC and 85.71% with systemic treatment). The median survival time with RIAC (5-21 months) was longer than with systemic chemotherapy (2.7-14 months). Similarly, one-year survival rates with RIAC (28.6%-41.2%) were higher than with systemic chemotherapy (0%-12.9%). Regional intra-arterial chemotherapy is more effective and has fewer complications than systemic chemotherapy for treating advanced pancreatic cancer.
Preterm newborn readiness for oral feeding: systematic review and meta-analysis.
Lima, Ana Henriques; Côrtes, Marcela Guimarães; Bouzada, Maria Cândida Ferrarez; Friche, Amélia Augusta de Lima
2015-01-01
To identify and systematize the main studies on the transition from enteral to oral feeding in preterm infants. Articles describing the transition from enteral to oral feeding in preterm infants were located in the MEDLINE, LILACS, and SciELO databases. Original studies with an available abstract, published in the last 10 years, were included. Analysis of the methodology and main results of the studies, and meta-analysis of the effects of sensory-motor-oral stimulation on the time to transition to full oral feeding and on the duration of hospitalization, were conducted. Twenty-nine national and international publications were considered. Most studies were clinical trials (44.8%) and did not use rating scales to start the transition process (82.7%). In the meta-analysis, a positive effect of stimulation of the sensory-motor-oral system was observed with respect to the transition time to an oral diet (p=0.0000), but not in relation to the length of hospital stay (p=0.09). However, heterogeneity between studies was found both in the analysis of the transition time to full oral feeding (I²=93.98) and in the length of hospital stay (I²=82.30). The transition to oral feeding is an important moment, and various physical and clinical characteristics of preterm infants have been used to describe this process. Despite the impossibility of generalizing the results due to the heterogeneity of the studies, we note the importance of strategies for stimulation of the sensory-motor-oral system to decrease the period of transition to full oral feeding.
The Brainomics/Localizer database.
Papadopoulos Orfanos, Dimitri; Michel, Vincent; Schwartz, Yannick; Pinel, Philippe; Moreno, Antonio; Le Bihan, Denis; Frouin, Vincent
2017-01-01
The Brainomics/Localizer database exposes part of the data collected by the in-house Localizer project, which planned to acquire four types of data from volunteer research subjects: anatomical MRI scans, functional MRI data, behavioral and demographic data, and DNA sampling. Over the years, this local project has been collecting such data from hundreds of subjects. We had selected 94 of these subjects for their complete datasets, including all four types of data, as the basis for a prior publication; the Brainomics/Localizer database publishes the data associated with these 94 subjects. Since regulatory rules prevent us from making genetic data available for download, the database serves only anatomical MRI scans, functional MRI data, behavioral and demographic data. To publish this set of heterogeneous data, we use dedicated software based on the open-source CubicWeb semantic web framework. Through genericity in the data model and flexibility in the display of data (web pages, CSV, JSON, XML), CubicWeb helps us expose these complex datasets in original and efficient ways. Copyright © 2015 Elsevier Inc. All rights reserved.
NASA Access Mechanism: Lessons learned document
NASA Technical Reports Server (NTRS)
Burdick, Lisa; Dunbar, Rick; Duncan, Denise; Generous, Curtis; Hunter, Judy; Lycas, John; Taber-Dudas, Ardeth
1994-01-01
The six-month beta test of the NASA Access Mechanism (NAM) prototype was completed on June 30, 1993. This report documents the lessons learned from the use of this Graphical User Interface to NASA databases such as the NASA STI Database, outside databases, Internet resources, and peers in the NASA R&D community. Design decisions, such as the use of XWindows software, a client-server distributed architecture, and use of the NASA Science Internet, are explained. Users' reactions to the interface and suggestions for design changes are reported, as are the changes made by the software developers based on new technology for information discovery and retrieval. The lessons learned section also reports reactions from the public, both at demonstrations and in response to articles in the trade press and journals. Recommendations are included for future versions, such as a World Wide Web (WWW) and Mosaic based interface to heterogeneous databases, and NAM-Lite, a version which allows customization to include utilities provided locally at NASA Centers.
The Molecular Signatures Database (MSigDB) hallmark gene set collection.
Liberzon, Arthur; Birger, Chet; Thorvaldsdóttir, Helga; Ghandi, Mahmoud; Mesirov, Jill P; Tamayo, Pablo
2015-12-23
The Molecular Signatures Database (MSigDB) is one of the most widely used and comprehensive databases of gene sets for performing gene set enrichment analysis. Since its creation, MSigDB has grown beyond its roots in metabolic disease and cancer to include >10,000 gene sets. These better represent a wider range of biological processes and diseases, but the utility of the database is reduced by increased redundancy across, and heterogeneity within, gene sets. To address this challenge, here we use a combination of automated approaches and expert curation to develop a collection of "hallmark" gene sets as part of MSigDB. Each hallmark in this collection consists of a "refined" gene set, derived from multiple "founder" sets, that conveys a specific biological state or process and displays coherent expression. The hallmarks effectively summarize most of the relevant information of the original founder sets and, by reducing both variation and redundancy, provide more refined and concise inputs for gene set enrichment analysis.
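Gene set collections such as the hallmarks are typically used in over-representation or enrichment tests. As an illustration of the simplest such test (not part of MSigDB itself), the sketch below computes a one-sided hypergeometric p-value for the overlap between a query gene list and a gene set; both gene lists and the universe size are hypothetical.

```python
from scipy.stats import hypergeom

def enrichment_p(query_genes, gene_set, universe_size):
    """One-sided hypergeometric (over-representation) p-value for a gene set."""
    query, gset = set(query_genes), set(gene_set)
    k = len(query & gset)          # overlap between query list and gene set
    M = universe_size              # genes in the universe
    n = len(gset)                  # genes in the gene set
    N = len(query)                 # genes in the query list
    return hypergeom.sf(k - 1, M, n, N)

# hypothetical query list tested against a hypothetical hallmark-style set
query = ["TP53", "MYC", "CCND1", "CDK4", "RB1"]
hallmark = ["TP53", "RB1", "CDK4", "CDKN2A", "MDM2", "E2F1"]
print(enrichment_p(query, hallmark, universe_size=20000))
```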
MMA-EoS: A Computational Framework for Mineralogical Thermodynamics
NASA Astrophysics Data System (ADS)
Chust, T. C.; Steinle-Neumann, G.; Dolejš, D.; Schuberth, B. S. A.; Bunge, H.-P.
2017-12-01
We present a newly developed software framework, MMA-EoS, that evaluates phase equilibria and thermodynamic properties of multicomponent systems by Gibbs energy minimization, with application to mantle petrology. The code is versatile in terms of the equation-of-state and mixing properties and allows for the computation of properties of single phases, solution phases, and multiphase aggregates. Currently, the open program distribution contains equation-of-state formulations widely used, that is, Caloric-Murnaghan, Caloric-Modified-Tait, and Birch-Murnaghan-Mie-Grüneisen-Debye models, with published databases included. Through its modular design and easily scripted database, MMA-EoS can readily be extended with new formulations of equations-of-state and changes or extensions to thermodynamic data sets. We demonstrate the application of the program by reproducing and comparing physical properties of mantle phases and assemblages with previously published work and experimental data, successively increasing complexity, up to computing phase equilibria of six-component compositions. Chemically complex systems allow us to trace the budget of minor chemical components in order to explore whether they lead to the formation of new phases or extend stability fields of existing ones. Self-consistently computed thermophysical properties for a homogeneous mantle and a mechanical mixture of slab lithologies show no discernible differences that require a heterogeneous mantle structure as has been suggested previously. Such examples illustrate how thermodynamics of mantle mineralogy can advance the study of Earth's interior.
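Of the equation-of-state families mentioned, the third-order Birch-Murnaghan form is compact enough to show directly. The sketch below evaluates the standard isothermal Birch-Murnaghan pressure; the parameter values are merely of MgO-like magnitude and are not taken from the MMA-EoS databases.

```python
def birch_murnaghan_pressure(V, V0, K0, K0_prime):
    """Third-order Birch-Murnaghan isothermal equation of state; P(V) in the units of K0."""
    x = (V0 / V) ** (2.0 / 3.0)
    return 1.5 * K0 * (x ** 3.5 - x ** 2.5) * (1.0 + 0.75 * (K0_prime - 4.0) * (x - 1.0))

# illustrative parameters of roughly MgO-like magnitude (not database values)
V0, K0, K0p = 11.25, 160.0, 4.0     # cm^3/mol, GPa, dimensionless
for V in (11.25, 10.5, 9.8):
    print(V, round(birch_murnaghan_pressure(V, V0, K0, K0p), 2), "GPa")
```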
Grieve, Stuart M; Korgaonkar, Mayuresh S; Clark, C Richard; Williams, Leanne M
2011-04-01
Magnetic resonance imaging (MRI) studies of structural brain development have suggested that the limbic system is relatively preserved in comparison to other brain regions with healthy aging. The goal of this study was to systematically investigate age-related changes of the limbic system using measures of cortical thickness, volumetric and diffusion characteristics. We also investigated whether the "relative preservation" concept is consistent across the individual sub-regions of the limbic system. T1-weighted structural MRI and diffusion tensor imaging data from 476 healthy participants from the Brain Resource International Database were used for this study. Age-related changes in grey matter (GM)/white matter (WM) volume, cortical thickness, and diffusion characteristics for the pericortical WM and for the fiber tracts associated with the limbic regions were quantified. Regional variability in the aging patterns across the limbic system was present. Four important patterns of age-related change were highlighted for the limbic sub-regions: 1. early maturation of GM with late loss in the hippocampus and amygdala; 2. an extreme pattern of GM preservation in the entorhinal cortex; 3. a flat pattern of reduced GM loss in the anterior cingulate and the parahippocampus; and 4. accelerated GM loss in the isthmus and posterior cingulate. The GM volumetric data and cortical thickness measures proved to be internally consistent, while the diffusion measures provided complementary data that seem consistent with the GM trends identified. This heterogeneity can be hypothesized to be associated with age-related changes of cognitive function specialized for each region and direct connections to the other brain regions sub-serving these functions. Copyright © 2011 Elsevier Inc. All rights reserved.
PropBase Query Layer: a single portal to UK subsurface physical property databases
NASA Astrophysics Data System (ADS)
Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham
2013-04-01
Until recently, the delivery of geological information for industry and the public was achieved by geological mapping. Now that computers are pervasively available, 3D geological models can deliver realistic representations of the geometric location of geological units, represented as shells or volumes. The next phase of this process is to populate these with physical property data that describe subsurface heterogeneity and its associated uncertainty. Achieving this requires the capture and serving of physical, hydrological and other property information from diverse sources to populate these models. The British Geological Survey (BGS) holds large volumes of subsurface property data, derived both from its own research data collection and from other, often commercially derived data sources. These can be voxelated to incorporate the data into the models and demonstrate property variation within the subsurface geometry. All property data held by BGS has for many years been stored in relational databases to ensure its long-term continuity. However, these have, by necessity, complex structures; each database contains positional reference data and model information, and also metadata such as sample identification information and attributes that define the source and processing. Whilst this is critical to assessing these analyses, it also hugely complicates understanding of the variability of the property under assessment and requires multiple queries to study related datasets, making it difficult to extract physical properties from these databases. Therefore the PropBase Query Layer has been created to allow simplified aggregation and extraction of all related data and to present complex data in simple, mostly denormalized tables which combine information from multiple databases into a single system. The structure from each relational database is denormalized into a generalised structure, so that each dataset can be viewed together in a common format using a simple interface. Data are re-engineered to facilitate easy loading. The query layer structure comprises tables, procedures, functions, triggers, views and materialised views. The structure contains a main table, PRB_DATA, which contains all of the data with the following attribution:
• a unique identifier
• the data source
• the unique identifier from the parent database, for traceability
• the 3D location
• the property type
• the property value
• the units
• necessary qualifiers
• precision information and an audit trail
Data sources, property types and units are constrained by dictionaries, a key component of the structure which defines which properties and inheritance hierarchies are to be coded and also guides how these are extracted from the structure. Data types served by the Query Layer include site-investigation-derived geotechnical data, hydrogeology datasets, regional geochemistry, geophysical logs, as well as lithological and borehole metadata. The size and complexity of the datasets, with multiple parent structures, require a technically robust approach to keep the layer synchronised. This is achieved through Oracle procedures written in PL/SQL containing the logic required to carry out the data manipulation (inserts, updates, deletes) needed to keep the layer synchronised with the underlying databases, either as regularly scheduled jobs (weekly, monthly, etc.) or invoked on demand.
The PropBase Query Layer's implementation has enabled rapid data discovery, visualisation and interpretation of geological data with greater ease, simplifying the parametrisation of 3D model volumes and facilitating the study of intra-unit heterogeneity.
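The attribution list above maps naturally onto a single denormalized record whose coded fields are constrained by dictionaries. The following toy sketch illustrates that layout and the dictionary check; the property types, sources and identifier scheme are invented, and the real layer is implemented in Oracle PL/SQL rather than Python.

```python
# Dictionaries constraining the denormalized layer (illustrative entries only)
PROPERTY_TYPES = {"POROSITY": "%", "PERMEABILITY": "mD", "PH": "pH units"}
DATA_SOURCES = {"GEOTECH_DB", "HYDRO_DB", "GEOCHEM_DB"}

def make_prb_row(row_id, source, parent_id, x, y, z, prop, value, qualifier=None):
    """Build one denormalized property row, enforcing the dictionary constraints."""
    if source not in DATA_SOURCES:
        raise ValueError(f"unknown data source: {source}")
    if prop not in PROPERTY_TYPES:
        raise ValueError(f"unknown property type: {prop}")
    return {
        "id": row_id, "source": source, "parent_id": parent_id,   # traceability to parent DB
        "x": x, "y": y, "z": z,                                   # 3D location
        "property": prop, "value": value, "units": PROPERTY_TYPES[prop],
        "qualifier": qualifier,
    }

row = make_prb_row(1, "HYDRO_DB", "BH/123/7", 345200.0, 567800.0, -42.5, "POROSITY", 18.3)
print(row)
```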
Pandey, Vaibhav; Saini, Poonam
2018-06-01
The MapReduce (MR) computing paradigm and its open source implementation Hadoop have become a de facto standard for processing big data in a distributed environment. Initially, the Hadoop system was homogeneous in three significant aspects, namely, user, workload, and cluster (hardware). However, with a growing variety of MR jobs and the inclusion of differently configured nodes in existing clusters, heterogeneity has become an essential part of Hadoop systems. These heterogeneity factors adversely affect the performance of a Hadoop scheduler and limit the overall throughput of the system. To overcome this problem, various heterogeneous Hadoop schedulers have been proposed in the literature. Existing survey works in this area mostly cover homogeneous schedulers and classify them on the basis of the quality-of-service parameters they optimize. Hence, there is a need to study heterogeneous Hadoop schedulers on the basis of the heterogeneity factors they consider. In this survey article, we first discuss the different heterogeneity factors that typically exist in a Hadoop system and then explore the challenges that arise when designing schedulers in the presence of such heterogeneity. Afterward, we present a comparative study of the heterogeneous scheduling algorithms available in the literature and classify them by the aforementioned heterogeneity factors. Lastly, we investigate the different methods and environments used for the evaluation of the discussed Hadoop schedulers.
Distributed policy based access to networked heterogeneous ISR data sources
NASA Astrophysics Data System (ADS)
Bent, G.; Vyvyan, D.; Wood, David; Zerfos, Petros; Calo, Seraphin
2010-04-01
Within a coalition environment, ad hoc Communities of Interest (CoI's) come together, perhaps for only a short time, with different sensors, sensor platforms, data fusion elements, and networks to conduct a task (or set of tasks) with different coalition members taking different roles. In such a coalition, each organization will have its own inherent restrictions on how it will interact with the others. These are usually stated as a set of policies, including security and privacy policies. The capability that we want to enable for a coalition operation is to provide access to information from any coalition partner in conformance with the policies of all. One of the challenges in supporting such ad hoc coalition operations is that of providing efficient access to distributed sources of data, where the applications requiring the data do not have knowledge of the location of the data within the network. To address this challenge, the International Technology Alliance (ITA) program has been developing the concept of a Dynamic Distributed Federated Database (DDFD), also known as a Gaian Database. This type of database provides a means for accessing data across a network of distributed heterogeneous data sources where access to the information is controlled by a mixture of local and global policies. We describe how a network of disparate ISR elements can be expressed as a DDFD and how this approach enables sensor and other information sources to be discovered autonomously or semi-autonomously and/or combined and fused under formally defined local and global policies.
LIVIVO - the Vertical Search Engine for Life Sciences.
Müller, Bernd; Poley, Christoph; Pössel, Jana; Hagelstein, Alexandra; Gübitz, Thomas
2017-01-01
The explosive growth of literature and data in the life sciences challenges researchers to keep track of current advancements in their disciplines. Novel approaches in the life science like the One Health paradigm require integrated methodologies in order to link and connect heterogeneous information from databases and literature resources. Current publications in the life sciences are increasingly characterized by the employment of trans-disciplinary methodologies comprising molecular and cell biology, genetics, genomic, epigenomic, transcriptional and proteomic high throughput technologies with data from humans, plants, and animals. The literature search engine LIVIVO empowers retrieval functionality by incorporating various literature resources from medicine, health, environment, agriculture and nutrition. LIVIVO is developed in-house by ZB MED - Information Centre for Life Sciences. It provides a user-friendly and usability-tested search interface with a corpus of 55 Million citations derived from 50 databases. Standardized application programming interfaces are available for data export and high throughput retrieval. The search functions allow for semantic retrieval with filtering options based on life science entities. The service oriented architecture of LIVIVO uses four different implementation layers to deliver search services. A Knowledge Environment is developed by ZB MED to deal with the heterogeneity of data as an integrative approach to model, store, and link semantic concepts within literature resources and databases. Future work will focus on the exploitation of life science ontologies and on the employment of NLP technologies in order to improve query expansion, filters in faceted search, and concept based relevancy rankings in LIVIVO.
Nano-catalysts: Bridging the gap between homogeneous and heterogeneous catalysis
Functionalized nanoparticles have emerged as sustainable alternatives to conventional materials, as robust, high-surface-area heterogeneous catalyst supports. We envisioned a catalyst system that can bridge the homogeneous and heterogeneous systems. Postsynthetic surface modifica...
MACSIMS : multiple alignment of complete sequences information management system
Thompson, Julie D; Muller, Arnaud; Waterhouse, Andrew; Procter, Jim; Barton, Geoffrey J; Plewniak, Frédéric; Poch, Olivier
2006-01-01
Background In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIM results, while a graphical display using the JalView program allows manual analysis. Conclusion MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at . PMID:16792820
Heterogeneous Systems for Information-Variable Environments (HIVE)
2017-05-01
ARL-TR-8027, US Army Research Laboratory, May 2017. Approved for public release; distribution is unlimited.
Turner, Rebecca M; Jackson, Dan; Wei, Yinghui; Thompson, Simon G; Higgins, Julian P T
2015-01-01
Numerous meta-analyses in healthcare research combine results from only a small number of studies, for which the variance representing between-study heterogeneity is estimated imprecisely. A Bayesian approach to estimation allows external evidence on the expected magnitude of heterogeneity to be incorporated. The aim of this paper is to provide tools that improve the accessibility of Bayesian meta-analysis. We present two methods for implementing Bayesian meta-analysis, using numerical integration and importance sampling techniques. Based on 14 886 binary outcome meta-analyses in the Cochrane Database of Systematic Reviews, we derive a novel set of predictive distributions for the degree of heterogeneity expected in 80 settings depending on the outcomes assessed and comparisons made. These can be used as prior distributions for heterogeneity in future meta-analyses. The two methods are implemented in R, for which code is provided. Both methods produce equivalent results to standard but more complex Markov chain Monte Carlo approaches. The priors are derived as log-normal distributions for the between-study variance, applicable to meta-analyses of binary outcomes on the log odds-ratio scale. The methods are applied to two example meta-analyses, incorporating the relevant predictive distributions as prior distributions for between-study heterogeneity. We have provided resources to facilitate Bayesian meta-analysis, in a form accessible to applied researchers, which allow relevant prior information on the degree of heterogeneity to be incorporated. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:25475839
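The paper's two computational routes (numerical integration and importance sampling) are provided by the authors in R; as a language-neutral illustration of the first route only, the sketch below integrates a normal-normal random-effects model over a (mu, tau²) grid with a log-normal prior on the between-study variance. The prior parameters and the example log odds-ratios are invented and are not the predictive distributions derived in the paper.

```python
import numpy as np
from scipy.stats import norm, lognorm

def bayes_meta(y, se, prior_mu_sd=10.0, lnorm_mean=-2.0, lnorm_sd=1.5, grid=200):
    """Random-effects meta-analysis on the log-OR scale with a log-normal prior on tau^2,
    posterior obtained by brute-force numerical integration over a (mu, tau^2) grid."""
    y, se = np.asarray(y, float), np.asarray(se, float)
    mu_grid = np.linspace(y.min() - 2, y.max() + 2, grid)
    tau2_grid = np.linspace(1e-4, 2.0, grid)
    M, T = np.meshgrid(mu_grid, tau2_grid, indexing="ij")
    log_post = norm.logpdf(M, 0.0, prior_mu_sd)                          # vague prior on mu
    log_post += lognorm.logpdf(T, s=lnorm_sd, scale=np.exp(lnorm_mean))  # informative prior on tau^2
    for yi, si in zip(y, se):
        log_post += norm.logpdf(yi, M, np.sqrt(si ** 2 + T))             # study likelihoods
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return (M * post).sum(), (T * post).sum()     # posterior means of mu and tau^2

# illustrative log odds-ratios and standard errors from a small meta-analysis
mu_hat, tau2_hat = bayes_meta([-0.4, -0.1, -0.7], [0.25, 0.30, 0.40])
print(round(mu_hat, 3), round(tau2_hat, 3))
```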
Characterizing the genetic structure of a forensic DNA database using a latent variable approach.
Kruijver, Maarten
2016-07-01
Several problems in forensic genetics require a representative model of a forensic DNA database. Obtaining an accurate representation of the offender database can be difficult, since databases typically contain groups of persons with unregistered ethnic origins in unknown proportions. We propose to estimate the allele frequencies of the subpopulations comprising the offender database and their proportions from the database itself using a latent variable approach. We present a model for which parameters can be estimated using the expectation maximization (EM) algorithm. This approach does not rely on relatively small and possibly unrepresentative population surveys, but is driven by the actual genetic composition of the database only. We fit the model to a snapshot of the Dutch offender database (2014), which contains close to 180,000 profiles, and find that three subpopulations suffice to describe a large fraction of the heterogeneity in the database. We demonstrate the utility and reliability of the approach with three applications. First, we use the model to predict the number of false leads obtained in database searches. We assess how well the model predicts the number of false leads obtained in mock searches in the Dutch offender database, both for the case of familial searching for first degree relatives of a donor and searching for contributors to three-person mixtures. Second, we study the degree of partial matching between all pairs of profiles in the Dutch database and compare this to what is predicted using the latent variable approach. Third, we use the model to provide evidence to support that the Dutch practice of estimating match probabilities using the Balding-Nichols formula with a native Dutch reference database and θ=0.03 is conservative. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
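The Balding-Nichols (theta-corrected) match probabilities referred to at the end of the abstract have a closed form, shown below for a single locus. The allele frequencies in the example are invented; only the formula itself is standard.

```python
def balding_nichols_match_prob(p, theta=0.03, q=None):
    """Conditional genotype match probability under the Balding-Nichols correction.
    Pass one allele frequency p for a homozygote, or p and q for a heterozygote."""
    denom = (1 + theta) * (1 + 2 * theta)
    if q is None:   # homozygote AA
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    # heterozygote AB
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denom

# illustrative allele frequencies for a single STR locus
print(balding_nichols_match_prob(0.12, theta=0.03))          # homozygote
print(balding_nichols_match_prob(0.12, theta=0.03, q=0.08))  # heterozygote
```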
Chemical Transformation System: Cloud Based ...
Integrated Environmental Modeling (IEM) systems that account for the fate/transport of organics frequently require physicochemical properties as well as transformation products. A myriad of chemical property databases exist but these can be difficult to access and often do not contain the proprietary chemicals that environmental regulators must consider. We are building the Chemical Transformation System (CTS) to facilitate model parameterization and analysis. CTS integrates a number of physicochemical property calculators into the system including EPI Suite, SPARC, TEST and ChemAxon. The calculators are heterogeneous in their scientific methodologies, technology implementations and deployment stacks. CTS also includes a chemical transformation processing engine that has been loaded with reaction libraries for human biotransformation, abiotic reduction and abiotic hydrolysis. CTS implements a common interface for the disparate calculators accepting molecular identifiers (SMILES, IUPAC, CAS#, user-drawn molecule) before submission for processing. To make the system as accessible as possible and provide a consistent programmatic interface, we wrapped the calculators in a standardized RESTful Application Programming Interface (API) which makes it capable of servicing a much broader spectrum of clients without constraints to interoperability such as operating system or programming language. CTS is hosted in a shared cloud environment, the Quantitative Environmental
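The "common interface over heterogeneous calculators" idea can be sketched as an adapter layer that fans one molecular identifier out to several property calculators. The adapters below are stubs with hard-coded values, not calls to EPI Suite, SPARC, TEST or ChemAxon, and the class and property names are hypothetical.

```python
# Hypothetical adapter layer: each calculator exposes the same interface, mirroring the
# idea of a common API over heterogeneous property calculators (all names illustrative).
class CalculatorAdapter:
    name = "base"
    def properties(self, smiles):
        raise NotImplementedError

class EpiLikeAdapter(CalculatorAdapter):
    name = "calculatorA"
    def properties(self, smiles):
        return {"logKow": 1.23, "water_solubility_mg_L": 450.0}   # stubbed values

class SparcLikeAdapter(CalculatorAdapter):
    name = "calculatorB"
    def properties(self, smiles):
        return {"logKow": 1.31, "pKa": 4.8}                       # stubbed values

def run_all(smiles, adapters):
    """Fan a single molecular identifier out to every registered calculator."""
    return {a.name: a.properties(smiles) for a in adapters}

print(run_all("CC(=O)Oc1ccccc1C(=O)O", [EpiLikeAdapter(), SparcLikeAdapter()]))
```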
Authomatization of Digital Collection Access Using Mobile and Wireless Data Terminals
NASA Astrophysics Data System (ADS)
Leontiev, I. V.
Information technologies have become vital for information processing, database access, data analysis and decision support. Many current scientific projects focus on the database integration of heterogeneous systems. Rapid, on-line access to large integrated systems of digital collections is equally important. Users typically move between different locations, whether at work or at home, and in most cases need efficient remote access to information stored in integrated data collections. Desktop computers cannot fulfill these needs, so mobile and wireless devices become helpful. Handhelds and data terminals are necessary in medical assistance (they store detailed information about each patient and help nurses), and immediate access to data collections is used in highway patrol services (databanks of cars, owners and driver licences). Mobile access also allows warehouse operations to be validated, and cycle-counting of library and museum items is sped up by online barcode scanning and central database access. For these reasons, mobile devices - cell phones, PDAs, handheld computers with wireless access, and WindowsCE and PalmOS terminals - have become popular. Mobile devices generally have relatively slow processors and limited display capabilities, but they are effective for storing and displaying textual data, recognize handwriting with a stylus, and support a GUI. Users can perform operations on a handheld terminal and exchange data with the main system for updates, either through immediate radio access or offline during a synchronization process. In this report, we present an approach to mobile access to data collections that raises the efficiency of data processing in a book library: it helps to track available books and books in stock, validate service charges, eliminate staff mistakes, and generate requests for book delivery. Our system uses Symbol RF mobile devices (with radio-channel access) and Symbol Palm Terminal data terminals for batch processing and synchronization with remote library databases. We discuss the use of PalmOS-compatible devices and WindowsCE terminals. The software system is based on a modular, scalable three-tier architecture, and additional functionality can be easily customized. Scalability is also provided by Internet/Intranet technologies and radio-access points. The base module of the system supports generic warehouse operations: cycle-counting with handheld barcode scanners, efficient item delivery and issue, item movement, reserving, and report generation on finished and in-process operations. Movements are optimized using the worker's current location, and operations are sorted in priority order and transmitted to mobile and wireless worker terminals. Mobile terminals improve task-processing control, eliminate staff mistakes, display up-to-date information about the main processes, provide data for online reports, and significantly raise the efficiency of data exchange.
Ferdynus, C; Huiart, L
2016-09-01
Administrative health databases such as the French National Health Insurance Database - SNIIRAM - are a major tool to answer numerous public health research questions. However, the use of such data requires complex and time-consuming data management. Our objective was to develop and make available a tool to optimize cohort constitution within administrative health databases. We developed a process to extract, transform and load (ETL) data from various heterogeneous sources in a standardized data warehouse. This data warehouse is architected as a star schema corresponding to an i2b2 star schema model. We then evaluated the performance of this ETL using data from a pharmacoepidemiology research project conducted in the SNIIRAM database. The ETL we developed comprises a set of functionalities for creating SAS scripts. Data can be integrated into a standardized data warehouse. As part of the performance assessment of this ETL, we achieved integration of a dataset from the SNIIRAM comprising more than 900 million lines in less than three hours using a desktop computer. This enables patient selection from the standardized data warehouse within seconds of the request. The ETL described in this paper provides a tool which is effective and compatible with all administrative health databases, without requiring complex database servers. This tool should simplify cohort constitution in health databases; the standardization of warehouse data facilitates collaborative work between research teams. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
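As a rough illustration of the star-schema idea behind such a warehouse (a sketch only, assuming a single i2b2-style fact table; it is not the authors' SAS-based ETL), cohort selection reduces to an indexed query on one narrow table:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE observation_fact (
                       patient_num INTEGER, concept_cd TEXT, start_date TEXT)""")
    con.execute("CREATE INDEX idx_concept ON observation_fact (concept_cd, patient_num)")
    con.executemany("INSERT INTO observation_fact VALUES (?, ?, ?)",
                    [(1, "ATC:N02BE01", "2015-03-02"),   # illustrative dispensing record
                     (2, "ICD10:E11", "2015-06-17")])    # illustrative diagnosis record

    # Cohort constitution: every patient with at least one record of a given drug code.
    cohort = [row[0] for row in con.execute(
        "SELECT DISTINCT patient_num FROM observation_fact WHERE concept_cd = ?",
        ("ATC:N02BE01",))]
    print(cohort)   # -> [1]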
Šarić, Željko; Xu, Xuecai; Duan, Li; Babić, Darko
2018-06-20
This study intended to investigate the interactions between accident rate and traffic signs on state roads located in Croatia, and to accommodate the heterogeneity attributed to unobserved factors. The data from 130 state roads between 2012 and 2016 were collected from the Traffic Accident Database System maintained by the Republic of Croatia Ministry of the Interior. To address the heterogeneity, a panel quantile regression model was proposed, in which the quantile regression model offers a more complete view and a highly comprehensive analysis of the relationship between accident rate and traffic signs, while the panel data model accommodates the heterogeneity attributed to unobserved factors. Results revealed that (1) low visibility of material damage (MD) and death or injury (DI) increased the accident rate; (2) the number of mandatory signs and the number of warning signs were more likely to reduce the accident rate; (3) average speed limit and the number of invalid traffic signs per km exhibited a high accident rate. To our knowledge, this is the first attempt to analyze the interactions between accident consequences and traffic signs by employing a panel quantile regression model. By involving visibility, the present study demonstrates that low visibility causes a relatively higher risk of MD and DI; it is noteworthy that average speed limit corresponds positively with accident rate; the number of mandatory signs and the number of warning signs are more likely to reduce the accident rate; and the number of invalid traffic signs per km is significant for accident rate, thus regular maintenance should be kept up for a safer roadway environment.
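For readers unfamiliar with the method, the sketch below runs a plain (non-panel) quantile regression with statsmodels on synthetic data; it only illustrates how coefficients can differ across quantiles and is not the panel quantile model estimated in the study, whose variables are also simplified here.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "accident_rate": rng.gamma(2.0, 1.0, 500),     # synthetic response
        "speed_limit": rng.choice([60, 80, 90], 500),
        "warning_signs": rng.poisson(3, 500),
        "invalid_signs": rng.poisson(1, 500),
    })

    model = smf.quantreg("accident_rate ~ speed_limit + warning_signs + invalid_signs", df)
    for q in (0.25, 0.50, 0.75):                       # fit several conditional quantiles
        print(q, model.fit(q=q).params.round(3).to_dict())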
NASA Astrophysics Data System (ADS)
de Boer, Maaike H. T.; Bouma, Henri; Kruithof, Maarten C.; ter Haar, Frank B.; Fischer, Noëlle M.; Hagendoorn, Laurens K.; Joosten, Bart; Raaijmakers, Stephan
2017-10-01
The information available on-line and off-line, from open as well as from private sources, is growing at an exponential rate and places an increasing demand on the limited resources of Law Enforcement Agencies (LEAs). The absence of appropriate tools and techniques to collect, process, and analyze the volumes of complex and heterogeneous data has created a severe information overload. If a solution is not found, the impact on law enforcement will be dramatic, e.g. because important evidence is missed or the investigation time is too long. Furthermore, there is an uneven level of capabilities to deal with the large volumes of complex and heterogeneous data that come from multiple open and private sources at national level across the EU, which hinders cooperation and information sharing. Consequently, there is a pertinent need to develop tools, systems and processes which expedite online investigations. In this paper, we describe a suite of analysis tools to identify and localize generic concepts, instances of objects and logos in images, which constitutes a significant portion of everyday law enforcement data. We describe how incremental learning based on only a few examples and large-scale indexing are addressed in both concept detection and instance search. Our search technology allows querying of the database by visual examples and by keywords. Our tools are packaged in a Docker container to guarantee easy deployment on a system and our tools exploit possibilities provided by open source toolboxes, contributing to the technical autonomy of LEAs.
Single- or multiple-visit endodontics: which technique results in fewest postoperative problems?
Balto, Khaled
2009-01-01
The Cochrane Central Register of Controlled Trials, Medline, Embase, six thesis databases (Networked Digital Library of Theses and Dissertations, Proquest Digital Dissertations, OAIster, Index to Theses, Australian Digital Thesis Program and Dissertation.com) and one conference report database (BIOSIS Previews) were searched. There were no language restrictions. Studies were included if subjects had a noncontributory medical history; underwent nonsurgical root canal treatment during the study; there was comparison between single- and multiple-visit root canal treatment; and if outcome was measured in terms of pain degree or prevalence of flare-up. Data were extracted using a standard data extraction sheet. Because of variations in recorded outcomes and methodological and clinical heterogeneity, a meta-analysis was not carried out, although a qualitative synthesis was presented. Sixteen studies fitted the inclusion criteria in the review, with sample size varying from 60-1012 cases. The prevalence of postoperative pain ranged from 3-58%. The heterogeneity of the included studies was far too great to yield meaningful results from a meta-analysis. Compelling evidence is lacking to indicate any significantly different prevalence of postoperative pain or flare-up following either single- or multiple-visit root canal treatment.
Mannava, Priya; Abdullah, Asnawi; James, Chris; Dodd, Rebecca; Annear, Peter Leslie
2015-03-01
Addressing the growing burden of noncommunicable diseases (NCDs) in countries of the Asia-Pacific region requires well-functioning health systems. In low- and middle-income countries (LMICs), however, health systems are generally characterized by inadequate financial and human resources, unsuitable service delivery models, and weak information systems. The aims of this review were to identify (a) health systems interventions being implemented to deliver NCD programs and services and their outcomes and (b) the health systems bottlenecks impeding access to or delivery of these programs and services in LMICs of the Asia-Pacific region. A search of 4 databases for literature published between 1990 and 2010 retrieved 36 relevant studies. For each study, information on basic characteristics, type of health systems bottleneck/intervention, and outcome was extracted, and methodological quality appraised. Health systems interventions and bottlenecks were classified as per the World Health Organization health systems building blocks framework. The review identified interventions and bottlenecks in the building blocks of service delivery, health workforce, financing, health information systems, and medical products, vaccines, and technologies. Studies, however, were heterogeneous in methodologies used, and the overall quality was generally low. There are several gaps in the evidence base around NCDs in the Asia-Pacific region that require further investigation. © 2013 APJPH.
Martinez-Murcia, Francisco Jesús; Lai, Meng-Chuan; Górriz, Juan Manuel; Ramírez, Javier; Young, Adam M H; Deoni, Sean C L; Ecker, Christine; Lombardo, Michael V; Baron-Cohen, Simon; Murphy, Declan G M; Bullmore, Edward T; Suckling, John
2017-03-01
Neuroimaging studies have reported structural and physiological differences that could help understand the causes and development of Autism Spectrum Disorder (ASD). Many of them rely on multisite designs, with the recruitment of larger samples increasing statistical power. However, recent large-scale studies have put some findings into question, considering the results to be strongly dependent on the database used, and demonstrating the substantial heterogeneity within this clinically defined category. One major source of variance may be the acquisition of the data in multiple centres. In this work we analysed the differences found in the multisite, multi-modal neuroimaging database from the UK Medical Research Council Autism Imaging Multicentre Study (MRC AIMS) in terms of both diagnosis and acquisition sites. Since the dissimilarities between sites were higher than between diagnostic groups, we developed a technique called Significance Weighted Principal Component Analysis (SWPCA) to reduce the undesired intensity variance due to acquisition site and to increase the statistical power in detecting group differences. After eliminating site-related variance, statistically significant group differences were found, including Broca's area and the temporo-parietal junction. However, discriminative power was not sufficient to classify diagnostic groups, yielding accuracies close to random. Our work supports recent claims that ASD is a highly heterogeneous condition that is difficult to globally characterize by neuroimaging, and therefore different (and more homogeneous) subgroups should be defined to obtain a deeper understanding of ASD. Hum Brain Mapp 38:1208-1223, 2017. © 2016 Wiley Periodicals, Inc.
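A rough sketch of the general idea of removing site-driven variance with PCA is given below (simplified and hypothetical; it drops components associated with site by an ANOVA test rather than applying the significance weighting of the actual SWPCA method):

    import numpy as np
    from sklearn.decomposition import PCA
    from scipy.stats import f_oneway

    def remove_site_components(X, site_labels, alpha=0.001):
        site_labels = np.asarray(site_labels)
        pca = PCA()
        scores = pca.fit_transform(X)                     # samples x components
        for k in range(scores.shape[1]):
            groups = [scores[site_labels == s, k] for s in np.unique(site_labels)]
            if f_oneway(*groups).pvalue < alpha:          # component dominated by site effect
                scores[:, k] = 0.0                        # discard that variance
        return pca.inverse_transform(scores)              # back to the original feature space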
Tabrizi, Reza; Moosazadeh, Mahmood; Akbari, Maryam; Dabbaghmanesh, Mohammad Hossein; Mohamadkhani, Minoo; Asemi, Zatollah; Heydari, Seyed Taghi; Akbari, Mojtaba; Lankarani, Kamran B
2018-01-01
Background The prevention and correction of vitamin D deficiency requires a precise depiction of the current situation and identification of risk factors in each region. The present study attempted to determine these entities using a systematic review and meta-analysis in Iran. Methods Articles published online in Persian and English between 2000 and November 1, 2016, were reviewed. This was carried out using national databases such as SID, IranMedex, Magiran, and IranDoc and international databases such as PubMed, Google Scholar, and Scopus. The heterogeneity index among the studies was determined using Cochran's Q and the I² statistic. Based on the heterogeneity results, the random-effect model was applied to estimate the prevalence of vitamin D deficiency. In addition, meta-regression analysis was used to determine heterogeneity-suspected factors, and the Egger test was applied to identify publication bias. Results The meta-analysis of 48 studies identified 18,531 individuals with vitamin D deficiency. According to the random-effect model, the prevalence of vitamin D deficiency among males, females, and pregnant women was estimated to be 45.64% (95% CI: 29.63 to 61.65), 61.90% (95% CI: 48.85 to 74.96), and 60.45% (95% CI: 23.73 to 97.16), respectively. The results of the meta-regression analysis indicated that the prevalence of vitamin D deficiency was significantly different in various geographical regions (β=4.4; P=0.023). Conclusion The results obtained showed a significant prevalence of vitamin D deficiency among the Iranian population, a condition to be addressed by appropriate planning. PMID:29749981
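For reference, the heterogeneity statistics mentioned above follow directly from a fixed-effect pooling: Cochran's Q is the weighted sum of squared deviations from the pooled estimate, and I² = max(0, (Q − df)/Q) × 100%. A minimal sketch, with arbitrary illustrative inputs:

    import numpy as np

    def cochran_q_and_i2(effects, variances):
        effects = np.asarray(effects, dtype=float)
        w = 1.0 / np.asarray(variances, dtype=float)       # inverse-variance weights
        pooled = np.sum(w * effects) / np.sum(w)
        q = np.sum(w * (effects - pooled) ** 2)            # Cochran's Q
        df = len(effects) - 1
        i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
        return q, i2

    print(cochran_q_and_i2([0.46, 0.62, 0.60], [0.010, 0.020, 0.050]))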
Stochastic simulation in systems biology
Székely, Tamás; Burrage, Kevin
2014-01-01
Natural systems are, almost by definition, heterogeneous: this can be either a boon or an obstacle to be overcome, depending on the situation. Traditionally, when constructing mathematical models of these systems, heterogeneity has typically been ignored, despite its critical role. However, in recent years, stochastic computational methods have become commonplace in science. They are able to appropriately account for heterogeneity; indeed, they are based around the premise that systems inherently contain at least one source of heterogeneity (namely, intrinsic heterogeneity). In this mini-review, we give a brief introduction to theoretical modelling and simulation in systems biology and discuss the three different sources of heterogeneity in natural systems. Our main topic is an overview of stochastic simulation methods in systems biology. There are many different types of stochastic methods. We focus on one group that has become especially popular in systems biology, biochemistry, chemistry and physics. These discrete-state stochastic methods do not follow individuals over time; rather they track only total populations. They also assume that the volume of interest is spatially homogeneous. We give an overview of these methods, with a discussion of the advantages and disadvantages of each, and suggest when each is more appropriate to use. We also include references to software implementations of them, so that beginners can quickly start using stochastic methods for practical problems of interest. PMID:25505503
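As a concrete example of the discrete-state methods surveyed above, the following minimal sketch implements the classic Gillespie stochastic simulation algorithm for a single degradation reaction X -> 0 (the rate constant and initial copy number are arbitrary illustrative values):

    import random

    def gillespie_decay(x0=100, k=0.1, t_end=50.0):
        t, x, trajectory = 0.0, x0, [(0.0, x0)]
        while t < t_end and x > 0:
            a = k * x                          # propensity of the single decay reaction
            t += random.expovariate(a)         # exponentially distributed waiting time
            x -= 1                             # fire the reaction: one molecule removed
            trajectory.append((t, x))
        return trajectory

    print(gillespie_decay()[:5])               # first few (time, copy number) points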
Towards Semantic e-Science for Traditional Chinese Medicine
Chen, Huajun; Mao, Yuxin; Zheng, Xiaoqing; Cui, Meng; Feng, Yi; Deng, Shuiguang; Yin, Aining; Zhou, Chunying; Tang, Jinming; Jiang, Xiaohong; Wu, Zhaohui
2007-01-01
Background Recent advances in Web and information technologies with the increasing decentralization of organizational structures have resulted in massive amounts of information resources and domain-specific services in Traditional Chinese Medicine. The massive volume and diversity of information and services available have made it difficult to achieve seamless and interoperable e-Science for knowledge-intensive disciplines like TCM. Therefore, information integration and service coordination are two major challenges in e-Science for TCM. We still lack sophisticated approaches to integrate scientific data and services for TCM e-Science. Results We present a comprehensive approach to build dynamic and extendable e-Science applications for knowledge-intensive disciplines like TCM based on semantic and knowledge-based techniques. The semantic e-Science infrastructure for TCM supports large-scale database integration and service coordination in a virtual organization. We use domain ontologies to integrate TCM database resources and services in a semantic cyberspace and deliver a semantically superior experience to users, including browsing, searching, querying and knowledge discovery. We have developed a collection of semantic-based toolkits to facilitate TCM scientists and researchers in information sharing and collaborative research. Conclusion Semantic and knowledge-based techniques are suitable for knowledge-intensive disciplines like TCM. It is possible to build an on-demand e-Science system for TCM based on existing semantic and knowledge-based techniques. The approach presented in this paper integrates heterogeneous distributed TCM databases and services, and provides scientists with a semantically superior experience to support collaborative research in the TCM discipline. PMID:17493289
Graph Partitioning for Parallel Applications in Heterogeneous Grid Environments
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Kumar, Shailendra; Das, Sajal K.; Biegel, Bryan (Technical Monitor)
2002-01-01
The problem of partitioning irregular graphs and meshes for parallel computations on homogeneous systems has been extensively studied. However, these partitioning schemes fail when the target system architecture exhibits heterogeneity in resource characteristics. With the emergence of technologies such as the Grid, it is imperative to study the partitioning problem taking into consideration the differing capabilities of such distributed heterogeneous systems. In our model, the heterogeneous system consists of processors with varying processing power and an underlying non-uniform communication network. We present in this paper a novel multilevel partitioning scheme for irregular graphs and meshes, that takes into account issues pertinent to Grid computing environments. Our partitioning algorithm, called MiniMax, generates and maps partitions onto a heterogeneous system with the objective of minimizing the maximum execution time of the parallel distributed application. For experimental performance study, we have considered both a realistic mesh problem from NASA as well as synthetic workloads. Simulation results demonstrate that MiniMax generates high quality partitions for various classes of applications targeted for parallel execution in a distributed heterogeneous environment.
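The objective optimized by such a mapping can be pictured with a much simpler greedy heuristic than MiniMax itself (a sketch below, with invented partition weights and processor speeds): each partition is placed where it increases the projected maximum execution time the least.

    def greedy_map(partition_weights, processor_speeds):
        finish = [0.0] * len(processor_speeds)          # projected finish time per processor
        assignment = []
        for w in sorted(partition_weights, reverse=True):
            p = min(range(len(processor_speeds)),
                    key=lambda i: finish[i] + w / processor_speeds[i])
            finish[p] += w / processor_speeds[p]
            assignment.append(p)
        return assignment, max(finish)                  # mapping and resulting makespan

    print(greedy_map([8, 5, 5, 3, 2], [2.0, 1.0, 1.0]))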
Tracer experiments in periodical heterogeneous model porous medium
NASA Astrophysics Data System (ADS)
Majdalani, Samer; Delenne, Carole; Guinot, Vincent
2017-06-01
It is established that solute transport in homogeneous porous media follows a classical 'S'-shaped breakthrough curve that can easily be modelled by a convection-dispersion equation. In this study, we designed a Model Heterogeneous Porous Medium (MHPM) with a high degree of heterogeneity, in which the breakthrough curve does not follow the classical 'S' shape. The contrast in porosity is obtained by placing a cylindrical cavity (100% porosity) inside a 40% porosity medium composed of 1 mm glass beads. Step tracer experiments are performed by injecting salt water into the study column, initially containing deionised water, until the outlet concentration stabilises at the input value. Several replicates of the experiment were conducted for n = 1 to 6 MHPM placed in series. The resulting total of 116 experiments provides a high-quality database allowing the assessment of experimental uncertainty. The experimental results show that the breakthrough curve is very different from the 'S' shape for small values of n, but the classical shape is progressively recovered as n increases.
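For reference, the classical one-dimensional convection-dispersion equation mentioned above takes the form ∂C/∂t = D ∂²C/∂x² − v ∂C/∂x, where C is the solute concentration, D the hydrodynamic dispersion coefficient and v the mean pore velocity; the non-sigmoid breakthrough curves observed for small n are departures from this behaviour.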
Image-based diagnostic aid for interstitial lung disease with secondary data integration
NASA Astrophysics Data System (ADS)
Depeursinge, Adrien; Müller, Henning; Hidki, Asmâa; Poletti, Pierre-Alexandre; Platon, Alexandra; Geissbuhler, Antoine
2007-03-01
Interstitial lung diseases (ILDs) are a relatively heterogeneous group of around 150 illnesses with often very unspecific symptoms. The most complete imaging method for the characterisation of ILDs is high-resolution computed tomography (HRCT) of the chest, but a correct interpretation of these images is difficult even for specialists, as many diseases are rare and thus little experience exists. Moreover, interpreting HRCT images requires knowledge of the context defined by clinical data of the studied case. A computerised diagnostic aid tool based on HRCT images with associated medical data to retrieve similar cases of ILDs from a dedicated database can bring quick and precious information, for example for emergency radiologists. The experience from a pilot project highlighted the need for a detailed database containing high-quality annotations in addition to clinical data. The state of the art is studied to identify requirements for image-based diagnostic aid for interstitial lung disease with secondary data integration. The data acquisition steps are detailed. The selection of the most relevant clinical parameters is done in collaboration with lung specialists from current literature, along with knowledge bases of computer-based diagnostic decision support systems. In order to perform high-quality annotations of the interstitial lung tissue in the HRCT images, annotation software with its own file format is implemented for DICOM images. A multimedia database is implemented to store ILD cases with clinical data and annotated image series. Cases from the University & University Hospitals of Geneva (HUG) are retrospectively and prospectively collected to populate the database. Currently, 59 cases with certified diagnosis and their clinical parameters are stored in the database, as well as 254 image series of which 26 have their regions of interest annotated. The available data was used to test primary visual features for the classification of lung tissue patterns. These features show good discriminative properties for the separation of five classes of visual observations.
Dark, Paul; Wilson, Claire; Blackwood, Bronagh; McAuley, Danny F; Perkins, Gavin D; McMullan, Ronan; Gates, Simon; Warhurst, Geoffrey
2012-01-01
Background There is growing interest in the potential utility of molecular diagnostics in improving the detection of life-threatening infection (sepsis). LightCycler® SeptiFast is a multipathogen probe-based real-time PCR system targeting DNA sequences of bacteria and fungi present in blood samples within a few hours. We report here the protocol of the first systematic review of published clinical diagnostic accuracy studies of this technology when compared with blood culture in the setting of suspected sepsis. Methods/design Data sources: the Cochrane Database of Systematic Reviews, the Database of Abstracts of Reviews of Effects (DARE), the Health Technology Assessment Database (HTA), the NHS Economic Evaluation Database (NHSEED), The Cochrane Library, MEDLINE, EMBASE, ISI Web of Science, BIOSIS Previews, MEDION and the Aggressive Research Intelligence Facility Database (ARIF). Eligible studies: diagnostic accuracy studies that compare the real-time PCR technology with standard culture results performed on a patient's blood sample during the management of sepsis. Data extraction: three reviewers, working independently, will determine the level of evidence, methodological quality and a standard data set relating to demographics and diagnostic accuracy metrics for each study. Statistical analysis/data synthesis: heterogeneity of studies will be investigated using a coupled forest plot of sensitivity and specificity and a scatter plot in Receiver Operator Characteristic (ROC) space. A bivariate model method will be used to estimate summary sensitivity and specificity. The authors will investigate reporting biases using funnel plots based on effective sample size and regression tests of asymmetry. Subgroup analyses are planned for adults, children and infection setting (hospital vs community) if sufficient data are uncovered. Dissemination Recommendations will be made to the Department of Health (as part of an open-access HTA report) as to whether the real-time PCR technology has sufficient clinical diagnostic accuracy potential to move forward to efficacy testing during the provision of routine clinical care. Registration PROSPERO-NIHR Prospective Register of Systematic Reviews (CRD42011001289).
Federated querying architecture with clinical & translational health IT application.
Livne, Oren E; Schultz, N Dustin; Narus, Scott P
2011-10-01
We present a software architecture that federates data from multiple heterogeneous health informatics data sources owned by multiple organizations. The architecture builds upon state-of-the-art open-source Java and XML frameworks in innovative ways. It consists of (a) a federated query engine, which manages federated queries and result set aggregation via a patient identification service; and (b) data source facades, which translate the physical data models into a common model on-the-fly and handle large result set streaming. System modules are connected via reusable Apache Camel integration routes and deployed to an OSGi enterprise service bus. We present an application of our architecture that allows users to construct queries via the i2b2 web front-end and federates patient data from the University of Utah Enterprise Data Warehouse and the Utah Population Database. Our system can be easily adopted, extended and integrated with existing SOA Healthcare and HL7 frameworks such as i2b2 and caGrid.
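The facade idea can be sketched schematically as follows (a Python illustration with invented source names, not the authors' Java/Camel implementation): each facade translates its physical schema into a common record shape, and the federation layer merges result sets on a shared patient identifier.

    class SourceFacade:
        """Subclasses translate a physical schema into common-model records on the fly."""
        def query(self, criteria):
            raise NotImplementedError

    class WarehouseFacade(SourceFacade):            # hypothetical clinical warehouse source
        def query(self, criteria):
            yield {"patient_id": "P1", "dx": "E11", "source": "EDW"}

    class PopulationFacade(SourceFacade):           # hypothetical population database source
        def query(self, criteria):
            yield {"patient_id": "P1", "birth_year": 1950, "source": "UPDB"}

    def federated_query(criteria, facades):
        merged = {}
        for facade in facades:
            for record in facade.query(criteria):
                merged.setdefault(record["patient_id"], {}).update(record)
        return merged

    print(federated_query({"dx": "E11"}, [WarehouseFacade(), PopulationFacade()]))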
PLAN2L: a web tool for integrated text mining and literature-based bioentity relation extraction.
Krallinger, Martin; Rodriguez-Penagos, Carlos; Tendulkar, Ashish; Valencia, Alfonso
2009-07-01
There is an increasing interest in using literature mining techniques to complement information extracted from annotation databases or generated by bioinformatics applications. Here we present PLAN2L, a web-based online search system that integrates text mining and information extraction techniques to systematically access information useful for analyzing genetic, cellular and molecular aspects of the plant model organism Arabidopsis thaliana. Our system facilitates a more efficient retrieval of information relevant to heterogeneous biological topics, from implications in biological relationships at the level of protein interactions and gene regulation, to sub-cellular locations of gene products and associations to cellular and developmental processes, i.e. cell cycle, flowering, root, leaf and seed development. Beyond single entities, predefined pairs of entities can also be provided as queries, for which literature-derived relations together with textual evidence are returned. PLAN2L does not require registration and is freely accessible at http://zope.bioinfo.cnio.es/plan2l.
Mongeau, R; Casu, M A; Pani, L; Pillolla, G; Lianas, L; Giachetti, A
2008-05-01
The vast amount of heterogeneous data generated in various fields of neurosciences such as neuropsychopharmacology can hardly be classified using traditional databases. We present here the concept of a virtual archive, spatially referenced over a simplified 3D brain map and accessible over the Internet. A simple prototype (available at http://aquatics.crs4.it/neuropsydat3d) has been realized using current Web-based virtual reality standards and technologies. It illustrates how primary literature or summary information can easily be retrieved through hyperlinks mapped onto a 3D schema while navigating through neuroanatomy. Furthermore, 3D navigation and visualization techniques are used to enhance the representation of brain's neurotransmitters, pathways and the involvement of specific brain areas in any particular physiological or behavioral functions. The system proposed shows how the use of a schematic spatial organization of data, widely exploited in other fields (e.g. Geographical Information Systems) can be extremely useful to develop efficient tools for research and teaching in neurosciences.
Dietary fibre: challenges in production and use of food composition data.
Westenbrink, Susanne; Brunt, Kommer; van der Kamp, Jan-Willem
2013-10-01
Dietary fibre is a heterogeneous group of components for which several definitions and analytical methods were developed over the past decades, causing confusion among users and producers of dietary fibre data in food composition databases. An overview is given of current definitions and analytical methods. Some of the issues related to maintaining dietary fibre values in food composition databases are discussed. Newly developed AOAC methods (2009.01 or modifications) yield higher dietary fibre values, due to the inclusion of low molecular weight dietary fibre and resistant starch. For food composition databases procedures need to be developed to combine 'classic' and 'new' dietary fibre values since re-analysing all foods on short notice is impossible due to financial restrictions. Standardised value documentation procedures are important to evaluate dietary fibre values from several sources before exchanging and using the data, e.g. for dietary intake research. Copyright © 2012 Elsevier Ltd. All rights reserved.
RAIN: RNA–protein Association and Interaction Networks
Junge, Alexander; Refsgaard, Jan C.; Garde, Christian; Pan, Xiaoyong; Santos, Alberto; Alkan, Ferhat; Anthon, Christian; von Mering, Christian; Workman, Christopher T.; Jensen, Lars Juhl; Gorodkin, Jan
2017-01-01
Protein association networks can be inferred from a range of resources including experimental data, literature mining and computational predictions. These types of evidence are emerging for non-coding RNAs (ncRNAs) as well. However, integration of ncRNAs into protein association networks is challenging due to data heterogeneity. Here, we present a database of ncRNA–RNA and ncRNA–protein interactions and its integration with the STRING database of protein–protein interactions. These ncRNA associations cover four organisms and have been established from curated examples, experimental data, interaction predictions and automatic literature mining. RAIN uses an integrative scoring scheme to assign a confidence score to each interaction. We demonstrate that RAIN outperforms the underlying microRNA-target predictions in inferring ncRNA interactions. RAIN can be operated through an easily accessible web interface and all interaction data can be downloaded. Database URL: http://rth.dk/resources/rain PMID:28077569
Tempest: Accelerated MS/MS Database Search Software for Heterogeneous Computing Platforms.
Adamo, Mark E; Gerber, Scott A
2016-09-07
MS/MS database search algorithms derive a set of candidate peptide sequences from in silico digest of a protein sequence database, and compute theoretical fragmentation patterns to match these candidates against observed MS/MS spectra. The original Tempest publication described these operations mapped to a CPU-GPU model, in which the CPU (central processing unit) generates peptide candidates that are asynchronously sent to a discrete GPU (graphics processing unit) to be scored against experimental spectra in parallel. The current version of Tempest expands this model, incorporating OpenCL to offer seamless parallelization across multicore CPUs, GPUs, integrated graphics chips, and general-purpose coprocessors. Three protocols describe how to configure and run a Tempest search, including discussion of how to leverage Tempest's unique feature set to produce optimal results. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.
Indian genetic disease database
Pradhan, Sanchari; Sengupta, Mainak; Dutta, Anirban; Bhattacharyya, Kausik; Bag, Sumit K.; Dutta, Chitra; Ray, Kunal
2011-01-01
Indians, representing about one-sixth of the world population, consist of several thousands of endogamous groups with strong potential for excess of recessive diseases. However, no database is available on the Indian population with comprehensive information on the diseases common in the country. To address this issue, we present Indian Genetic Disease Database (IGDD) release 1.0 (http://www.igdd.iicb.res.in)—an integrated and curated repository of a growing body of mutation data on common genetic diseases afflicting the Indian populations. Currently the database covers 52 diseases with information on 5760 individuals carrying the mutant alleles of causal genes. Information on locus heterogeneity, type of mutation, clinical and biochemical data, geographical location and common mutations is furnished based on published literature. The database is currently designed to work best with Internet Explorer 8 (optimal resolution 1440 × 900) and it can be searched based on disease of interest, causal gene, type of mutation and geographical location of the patients or carriers. Provisions have been made for deposition of new data and logistics for regular updating of the database. The IGDD web portal, planned to be made freely available, contains user-friendly interfaces and is expected to be highly useful to the geneticists, clinicians, biologists and patient support groups of various genetic diseases. PMID:21037256
Menditto, Enrica; Bolufer De Gea, Angela; Cahir, Caitriona; Marengoni, Alessandra; Riegler, Salvatore; Fico, Giuseppe; Costa, Elisio; Monaco, Alessandro; Pecorelli, Sergio; Pani, Luca; Prados-Torres, Alexandra
2016-01-01
Computerized health care databases have been widely described as an excellent opportunity for research. The availability of “big data” has brought about a wave of innovation in projects when conducting health services research. Most of the available secondary data sources are restricted to the geographical scope of a given country and present heterogeneous structure and content. Under the umbrella of the European Innovation Partnership on Active and Healthy Ageing, collaborative work conducted by the partners of the group on “adherence to prescription and medical plans” identified the use of observational and large-population databases to monitor medication-taking behavior in the elderly. This article describes the methodology used to gather the information from available databases among the Adherence Action Group partners with the aim of improving data sharing on a European level. A total of six databases belonging to three different European countries (Spain, Republic of Ireland, and Italy) were included in the analysis. Preliminary results suggest that there are some similarities. However, these results should be applied in different contexts and European countries, supporting the idea that large European studies should be designed in order to get the most of already available databases. PMID:27358570
A distributed scheduling algorithm for heterogeneous real-time systems
NASA Technical Reports Server (NTRS)
Zeineldine, Osman; El-Toweissy, Mohamed; Mukkamala, Ravi
1991-01-01
Much of the previous work on load balancing and scheduling in distributed environments was concerned with homogeneous systems and homogeneous loads. Several of the results indicated that random policies are as effective as other more complex load allocation policies. The effects of heterogeneity on scheduling algorithms for hard real-time systems are examined. A distributed scheduler designed specifically to handle heterogeneities in both nodes and node traffic is proposed. The performance of the algorithm is measured in terms of the percentage of jobs discarded. While a random task allocation is very sensitive to heterogeneities, the algorithm is shown to be robust to such non-uniformities in system components and load.
Measuring the effects of heterogeneity on distributed systems
NASA Technical Reports Server (NTRS)
El-Toweissy, Mohamed; Zeineldine, Osman; Mukkamala, Ravi
1991-01-01
Distributed computer systems in daily use are becoming more and more heterogeneous. Currently, many design and analysis studies of such systems assume homogeneity. This assumption of homogeneity has been driven mainly by the resulting simplicity in modeling and analysis. A simulation study is presented which investigated the effects of heterogeneity on scheduling algorithms for hard real-time distributed systems. In contrast to previous results, which indicate that random scheduling may be as good as a more complex scheduler, this algorithm is shown to be consistently better than a random scheduler. This conclusion is more pronounced at high workloads as well as at high levels of heterogeneity.
A data colocation grid framework for big data medical image processing: backend design
NASA Astrophysics Data System (ADS)
Bao, Shunxing; Huo, Yuankai; Parvathaneni, Prasanna; Plassard, Andrew J.; Bermudez, Camilo; Yao, Yuang; Lyu, Ilwoo; Gokhale, Aniruddha; Landman, Bennett A.
2018-03-01
When processing large medical imaging studies, adopting high-performance grid computing resources rapidly becomes important. We recently presented a "medical image processing-as-a-service" grid framework that offers promise in utilizing the Apache Hadoop ecosystem and HBase for data colocation by moving computation close to medical image storage. However, the framework has not yet proven to be easy to use in a heterogeneous hardware environment. Furthermore, the system has not yet been validated when considering the variety of multi-level analyses in medical imaging. Our target design criteria are (1) improving the framework's performance in a heterogeneous cluster, (2) performing population-based summary statistics on large datasets, and (3) introducing a table design scheme for rapid NoSQL query. In this paper, we present a heuristic backend interface application program interface (API) design for Hadoop and HBase for Medical Image Processing (HadoopBase-MIP). The API includes: Upload, Retrieve, Remove, Load balancer (for heterogeneous cluster) and MapReduce templates. A dataset summary statistic model is discussed and implemented by the MapReduce paradigm. We introduce an HBase table scheme for fast data query to better utilize the MapReduce model. Briefly, 5153 T1 images were retrieved from a university secure, shared web database and used to empirically assess an in-house grid with 224 heterogeneous CPU cores. Results from three empirical experiments are presented and discussed: (1) a load-balancer wall-time improvement of 1.5-fold compared with a framework with a built-in data allocation strategy, (2) a summary statistic model that is empirically verified on the grid framework and compared with the same cluster deployed with a standard Sun Grid Engine (SGE), reducing wall-clock time 8-fold and resource time 14-fold, and (3) the proposed HBase table scheme, which improves MapReduce computation with a 7-fold reduction in wall time compared with a naïve scheme when datasets are relatively small. The source code and interfaces have been made publicly available.
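The population-level summary statistic mentioned above maps naturally onto the MapReduce pattern; a toy sketch (not the HadoopBase-MIP API) is shown below, where each mapper emits per-image partial sums and a reducer combines them into a dataset mean and variance without holding all images in memory.

    import numpy as np

    def map_image(image):
        x = np.asarray(image, dtype=float)
        return x.size, x.sum(), (x ** 2).sum()        # (count, sum, sum of squares)

    def reduce_stats(partials):
        n = sum(p[0] for p in partials)
        s = sum(p[1] for p in partials)
        ss = sum(p[2] for p in partials)
        mean = s / n
        variance = ss / n - mean ** 2
        return mean, variance

    partials = [map_image(np.random.rand(4, 4)) for _ in range(3)]   # stand-ins for images
    print(reduce_stats(partials))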
A Data Colocation Grid Framework for Big Data Medical Image Processing: Backend Design.
Bao, Shunxing; Huo, Yuankai; Parvathaneni, Prasanna; Plassard, Andrew J; Bermudez, Camilo; Yao, Yuang; Lyu, Ilwoo; Gokhale, Aniruddha; Landman, Bennett A
2018-03-01
When processing large medical imaging studies, adopting high-performance grid computing resources rapidly becomes important. We recently presented a "medical image processing-as-a-service" grid framework that offers promise in utilizing the Apache Hadoop ecosystem and HBase for data colocation by moving computation close to medical image storage. However, the framework has not yet proven to be easy to use in a heterogeneous hardware environment. Furthermore, the system has not yet been validated when considering the variety of multi-level analyses in medical imaging. Our target design criteria are (1) improving the framework's performance in a heterogeneous cluster, (2) performing population-based summary statistics on large datasets, and (3) introducing a table design scheme for rapid NoSQL query. In this paper, we present a heuristic backend interface application program interface (API) design for Hadoop & HBase for Medical Image Processing (HadoopBase-MIP). The API includes: Upload, Retrieve, Remove, Load balancer (for heterogeneous cluster) and MapReduce templates. A dataset summary statistic model is discussed and implemented by the MapReduce paradigm. We introduce an HBase table scheme for fast data query to better utilize the MapReduce model. Briefly, 5153 T1 images were retrieved from a university secure, shared web database and used to empirically assess an in-house grid with 224 heterogeneous CPU cores. Results from three empirical experiments are presented and discussed: (1) a load-balancer wall-time improvement of 1.5-fold compared with a framework with a built-in data allocation strategy, (2) a summary statistic model that is empirically verified on the grid framework and compared with the same cluster deployed with a standard Sun Grid Engine (SGE), reducing wall-clock time 8-fold and resource time 14-fold, and (3) the proposed HBase table scheme, which improves MapReduce computation with a 7-fold reduction in wall time compared with a naïve scheme when datasets are relatively small. The source code and interfaces have been made publicly available.
A Data Colocation Grid Framework for Big Data Medical Image Processing: Backend Design
Huo, Yuankai; Parvathaneni, Prasanna; Plassard, Andrew J.; Bermudez, Camilo; Yao, Yuang; Lyu, Ilwoo; Gokhale, Aniruddha; Landman, Bennett A.
2018-01-01
When processing large medical imaging studies, adopting high-performance grid computing resources rapidly becomes important. We recently presented a "medical image processing-as-a-service" grid framework that offers promise in utilizing the Apache Hadoop ecosystem and HBase for data colocation by moving computation close to medical image storage. However, the framework has not yet proven to be easy to use in a heterogeneous hardware environment. Furthermore, the system has not yet been validated when considering the variety of multi-level analyses in medical imaging. Our target design criteria are (1) improving the framework's performance in a heterogeneous cluster, (2) performing population-based summary statistics on large datasets, and (3) introducing a table design scheme for rapid NoSQL query. In this paper, we present a heuristic backend interface application program interface (API) design for Hadoop & HBase for Medical Image Processing (HadoopBase-MIP). The API includes: Upload, Retrieve, Remove, Load balancer (for heterogeneous cluster) and MapReduce templates. A dataset summary statistic model is discussed and implemented by the MapReduce paradigm. We introduce an HBase table scheme for fast data query to better utilize the MapReduce model. Briefly, 5153 T1 images were retrieved from a university secure, shared web database and used to empirically assess an in-house grid with 224 heterogeneous CPU cores. Results from three empirical experiments are presented and discussed: (1) a load-balancer wall-time improvement of 1.5-fold compared with a framework with a built-in data allocation strategy, (2) a summary statistic model that is empirically verified on the grid framework and compared with the same cluster deployed with a standard Sun Grid Engine (SGE), reducing wall-clock time 8-fold and resource time 14-fold, and (3) the proposed HBase table scheme, which improves MapReduce computation with a 7-fold reduction in wall time compared with a naïve scheme when datasets are relatively small. The source code and interfaces have been made publicly available. PMID:29887668
Davison, John; Öpik, Maarja; Zobel, Martin; Vasar, Martti; Metsis, Madis; Moora, Mari
2012-01-01
Despite the important ecosystem role played by arbuscular mycorrhizal fungi (AMF), little is known about spatial and temporal variation in soil AMF communities. We used pyrosequencing to characterise AMF communities in soil samples (n = 44) from a natural forest ecosystem. Fungal taxa were identified by BLAST matching of reads against the MaarjAM database of AMF SSU rRNA gene diversity. Sub-sampling within our dataset and experimental shortening of a set of long reads indicated that our approaches to taxonomic identification and diversity analysis were robust to variations in pyrosequencing read length and numbers of reads per sample. Different forest plots (each 10×10 m and separated from one another by 30 m) contained significantly different soil AMF communities, and the pairwise similarity of communities decreased with distance up to 50 m. However, there were no significant changes in community composition between different time points in the growing season (May-September). Spatial structure in soil AMF communities may be related to the heterogeneous vegetation of the natural forest study system, while the temporal stability of communities suggests that AMF in soil represent a fairly constant local species pool from which mycorrhizae form and disband during the season. PMID:22879900
Zhang, Yanqi; Zhou, Liang; Liu, Xiaoyu; Liu, Ling; Wu, Yazhou; Zhao, Zengwei; Yi, Dali; Yi, Dong
2015-01-01
Although problem-based learning (PBL) emerged in 1969 and was soon widely applied internationally, the rapid development in China only occurred in the last 10 years. This study aims to compare the effect of PBL and lecture-based learning (LBL) on student course examination results for introductory Chinese undergraduate medical courses. Randomized and nonrandomized controlled trial studies on PBL use in Chinese undergraduate medical education were retrieved through PubMed, the Excerpta Medica Database (EMBASE), Chinese National Knowledge Infrastructure (CNKI) and VIP China Science and Technology Journal Database (VIP-CSTJ) with publication dates from 1 January 1966 to 31 August 2014. The pass rate, excellence rate and examination scores of course examination were collected. Methodological quality was evaluated based on the modified Jadad scale. The I-square statistic and Chi-square test of heterogeneity were used to assess the statistical heterogeneity. Overall RRs or SMDs with their 95% CIs were calculated in meta-analysis. Meta-regression and subgroup meta-analyses were also performed based on comparators and other confounding factors. Funnel plots and Egger's tests were performed to assess degrees of publication bias. The meta-analysis included 31 studies and 4,699 subjects. Fourteen studies were of high quality with modified Jadad scores of 4 to 6, and 17 studies were of low quality with scores of 1 to 3. Relative to the LBL model, the PBL model yielded higher course examination pass rates [RR = 1.09, 95%CI (1.03, 1.17)], excellence rates [RR = 1.66, 95%CI (1.33, 2.06)] and examination scores [SMD = 0.82, 95%CI (0.63, 1.01)]. The meta-regression results show that course type was the significant confounding factor that caused heterogeneity in the examination-score meta-analysis (t = 0.410, P<0.001). The examination score SMD in the "laboratory course" subgroup [SMD = 2.01, 95% CI: (1.50, 2.52)] was higher than that in the "theory course" subgroup [SMD = 0.72, 95% CI: (0.56, 0.89)]. PBL teaching model application in introductory undergraduate medical courses can increase course examination excellence rates and scores in the Chinese medical education system. It is more effective when applied to laboratory courses than to theory-based courses.
Zhang, Yanqi; Zhou, Liang; Liu, Xiaoyu; Liu, Ling; Wu, Yazhou; Zhao, Zengwei; Yi, Dali; Yi, Dong
2015-01-01
Background Although problem-based learning (PBL) emerged in 1969 and was soon widely applied internationally, the rapid development in China only occurred in the last 10 years. This study aims to compare the effect of PBL and lecture-based learning (LBL) on student course examination results for introductory Chinese undergraduate medical courses. Methods Randomized and nonrandomized controlled trial studies on PBL use in Chinese undergraduate medical education were retrieved through PubMed, the Excerpta Medica Database (EMBASE), Chinese National Knowledge Infrastructure (CNKI) and VIP China Science and Technology Journal Database (VIP-CSTJ) with publication dates from 1 January 1966 to 31 August 2014. The pass rate, excellence rate and examination scores of course examination were collected. Methodological quality was evaluated based on the modified Jadad scale. The I-square statistic and Chi-square test of heterogeneity were used to assess the statistical heterogeneity. Overall RRs or SMDs with their 95% CIs were calculated in meta-analysis. Meta-regression and subgroup meta-analyses were also performed based on comparators and other confounding factors. Funnel plots and Egger’s tests were performed to assess degrees of publication bias. Results The meta-analysis included 31 studies and 4,699 subjects. Fourteen studies were of high quality with modified Jadad scores of 4 to 6, and 17 studies were of low quality with scores of 1 to 3. Relative to the LBL model, the PBL model yielded higher course examination pass rates [RR = 1.09, 95%CI (1.03, 1.17)], excellence rates [RR = 1.66, 95%CI (1.33, 2.06)] and examination scores [SMD = 0.82, 95%CI (0.63, 1.01)]. The meta-regression results show that course type was the significant confounding factor that caused heterogeneity in the examination-score meta-analysis (t = 0.410, P<0.001). The examination score SMD in the “laboratory course” subgroup [SMD = 2.01, 95% CI: (1.50, 2.52)] was higher than that in the “theory course” subgroup [SMD = 0.72, 95% CI: (0.56, 0.89)]. Conclusions PBL teaching model application in introductory undergraduate medical courses can increase course examination excellence rates and scores in the Chinese medical education system. It is more effective when applied to laboratory courses than to theory-based courses. PMID:25822653
Aging and wave-component latency delays in oVEMP and cVEMP: a systematic review with meta-analysis.
Macambira, Ysa Karen Dos Santos; Carnaúba, Aline Tenório Lins; Fernandes, Luciana Castelo Branco Camurça; Bueno, Nassib Bezerra; Menezes, Pedro de Lemos
The natural aging process may result in morphological changes in the vestibular system and in the afferent neural pathway, including loss of hair cells, decreased numbers of vestibular nerve cells, and loss of neurons in the vestibular nucleus. Thus, with advancing age, there should be a decrease in amplitudes and an increase in latencies of the vestibular evoked myogenic potentials, especially the prolongation of p13 latency. Moreover, many investigations have found no significant differences in latencies with advancing age. To determine if there are significant differences in the latencies of cervical and ocular evoked myogenic potentials between elderly and adult patients. This is a systematic review with meta-analysis of observational studies, comparing the differences of these parameters between elderly and young adults, without language or date restrictions, in the following databases: Pubmed, ScienceDirect, SCOPUS, Web of Science, SciELO and LILACS, in addition to the gray literature databases: OpenGrey.eu and DissOnline, as well as Research Gate. The n1 oVEMP latencies had a mean delay in the elderly of 2.32 ms with 95% CI of 0.55-4.10 ms. The overall effect test showed p=0.01, disclosing that such difference was significant. The heterogeneity found was I² = 96% (p<0.001). Evaluation of p1 latency was not possible due to the low number of articles selected for this condition. cVEMP analysis was performed in 13 articles. For the p13 component, the mean latency delay in the elderly was 1.34 ms with 95% CI of 0.56-2.11 ms. The overall effect test showed a p<0.001, with heterogeneity value I² = 92% (p<0.001). For the n23 component, the mean latency delay for the elderly was 2.82 ms with 95% CI of 0.33-5.30 ms. The overall effect test showed p=0.03. The heterogeneity found was I² = 99% (p<0.001). The latency of oVEMP n1 wave component and latencies of cVEMP p13 and n23 wave components are longer in the elderly aged >60 years than in young adults. Copyright © 2017 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.
Translation from the collaborative OSM database to cartography
NASA Astrophysics Data System (ADS)
Hayat, Flora
2018-05-01
The OpenStreetMap (OSM) database includes original items very useful for geographical analysis and for creating thematic maps. Contributors record in the open database various themes regarding amenities, leisure, transport, buildings and boundaries. The Michelin mapping department develops map prototypes to test the feasibility of mapping based on OSM. A research project is in development to translate the OSM database structure into a database structure fitted to the Michelin graphic guidelines; it aims at defining the right structure for Michelin's uses. The research project relies on the analysis of semantic and geometric heterogeneities in OSM data. To that end, Michelin implements methods to transform the input geographical database into a cartographic image dedicated to specific uses (routing and tourist maps). The paper focuses on the mapping tools available to produce a personalised spatial database. Based on the processed data, paper and Web maps can be displayed. Two prototypes are described in this article: a vector tile web map and a mapping method to produce paper maps at a regional scale. The vector tile mapping method offers easy navigation within the map and within graphic and thematic guidelines. Paper maps can be partly drawn automatically. The drawing automation and data management are part of the map creation process, as is the final hand-drawing phase. Both prototypes have been set up using the OSM technical ecosystem.
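The translation step from OSM tags to a cartographic schema can be pictured with the following simplified sketch; the mapping table is invented for illustration and is not Michelin's actual symbology.

    OSM_TO_STYLE = {
        ("highway", "motorway"): {"layer": "road_major", "width": 3.0},
        ("highway", "tertiary"): {"layer": "road_minor", "width": 1.0},
        ("tourism", "museum"): {"layer": "poi_tourism", "icon": "museum"},
    }

    def translate_feature(osm_tags):
        """Return target cartographic attributes for one OSM feature, or None if not drawn."""
        for (key, value), style in OSM_TO_STYLE.items():
            if osm_tags.get(key) == value:
                return style
        return None

    print(translate_feature({"highway": "motorway", "ref": "A7"}))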
Bleda, Marta; Tarraga, Joaquin; de Maria, Alejandro; Salavert, Francisco; Garcia-Alonso, Luz; Celma, Matilde; Martin, Ainoha; Dopazo, Joaquin; Medina, Ignacio
2012-07-01
During the past years, the advances in high-throughput technologies have produced an unprecedented growth in the number and size of repositories and databases storing relevant biological data. Today, there is more biological information than ever but, unfortunately, the current status of many of these repositories is far from optimal. Some of the most common problems are that the information is spread out in many small databases, that there are often different standards among repositories, and that some databases are no longer supported or contain overly specific and unconnected information. In addition, data size is increasingly becoming an obstacle when accessing or storing biological data. All these issues make it very difficult to extract and integrate information from different sources, to analyze experiments, or to access and query this information in a programmatic way. CellBase provides a solution to the growing need for integration by easing access to biological data. CellBase implements a set of RESTful web services that query a centralized database containing the most relevant biological data sources. The database is hosted in our servers and is regularly updated. CellBase documentation can be found at http://docs.bioinfo.cipf.es/projects/cellbase.
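Programmatic access to such a RESTful resource typically looks like the hypothetical sketch below; the endpoint path and parameters are invented for illustration only, and the CellBase documentation linked above should be consulted for the real routes.

    import requests

    BASE = "https://example.org/cellbase/webservices/rest"   # placeholder host, not a real URL

    def get_gene_info(gene_symbol):
        response = requests.get(f"{BASE}/genes/{gene_symbol}/info", timeout=30)
        response.raise_for_status()
        return response.json()

    # print(get_gene_info("BRCA2"))   # would return the integrated record for that gene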
Comparison of human cell signaling pathway databases—evolution, drawbacks and challenges
Chowdhury, Saikat; Sarkar, Ram Rup
2015-01-01
Elucidating the complexities of cell signaling pathways is of immense importance to gain understanding of various biological phenomena, such as the dynamics of gene/protein expression regulation, cell fate determination, embryogenesis and disease progression. The successful completion of the human genome project has also helped experimental and theoretical biologists to analyze various important pathways. To advance this study, during the past two decades, systematic collections of pathway data from experimental studies have been compiled and distributed freely by several databases, which also integrate various computational tools for further analysis. Despite significant advancements, there exist several drawbacks and challenges, such as pathway data heterogeneity, annotation, regular update and automated image reconstructions, which motivated us to perform a thorough review of 24 popular and actively functioning cell signaling databases. Based on two major characteristics, pathway information and technical details, freely accessible data from commercial and academic databases are examined to understand their evolution and enrichment. This review not only helps to identify some novel and useful features that are not yet included in any of the databases, but also highlights their current limitations and proposes reasonable solutions for future database development, which could be useful to the whole scientific community. PMID:25632107
Rhythm and mood: relationships between the circadian clock and mood-related behavior.
Schnell, Anna; Albrecht, Urs; Sandrelli, Federica
2014-06-01
Mood disorders are multifactorial and heterogeneous diseases caused by the interplay of several genetic and environmental factors. In humans, mood disorders are often accompanied by abnormalities in the organization of the circadian system, which normally synchronizes activities and functions of cells and tissues. Studies on animal models suggest that the basic circadian clock mechanism, which runs in essentially all cells, is implicated in the modulation of biological phenomena regulating affective behaviors. In particular, recent findings highlight the importance of circadian clock mechanisms in neurological pathways involved in mood, such as monoaminergic neurotransmission, hypothalamus-pituitary-adrenal axis regulation, suprachiasmatic nucleus and olfactory bulb activities, and neurogenesis. Defects at the level of both the circadian clock mechanism and the circadian system may contribute to the etiology of mood disorders. Modification of the circadian system using chronotherapy appears to be an effective treatment for mood disorders. Additionally, understanding the role of circadian clock mechanisms, which affect the regulation of different mood pathways, will open up the possibility of targeted pharmacological treatments. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Dong, Y; Wu, G
2017-09-01
We carried out a meta-analysis to explore the association between poultry and egg consumption and non-Hodgkin lymphoma (NHL) risk based on published observational studies. A database search was performed in MEDLINE and EMBASE from their inception to March 2015. We derived meta-analytic estimates using random-effects models and assessed between-study heterogeneity using Cochran's Q and I² statistics. We identified a total of nine case-control and three prospective cohort studies, including 11,271 subjects with NHL. The summary relative risks for high vs. low analyses were 1.04 (95% confidence intervals [CIs]: 0.86-1.27; p-heterogeneity < .001, I² = 84.0%) for poultry consumption and 1.15 (95% CIs: 0.87-1.51; p-heterogeneity < .001, I² = 85.3%) for egg consumption. Meta-regression analysis showed that study location, study quality, type of food frequency questionnaire and adjustment for total energy intake contributed to the high heterogeneity among the studies on poultry consumption, whereas no significant factors were responsible for the high heterogeneity among the studies on egg consumption. Limited data suggested a null association between consumption of poultry and eggs and NHL subtypes. Findings from our meta-analysis indicate that consumption of poultry and eggs may not be related to NHL risk. © 2016 John Wiley & Sons Ltd.
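As an illustration of the pooling approach described above, the following sketch implements DerSimonian-Laird random-effects pooling together with Cochran's Q and I²; the input relative risks and confidence intervals are made-up values, not the study data.

```python
# Minimal DerSimonian-Laird random-effects pooling with Cochran's Q and I^2.
# The relative risks and confidence intervals below are illustrative only.
import math

def random_effects(rr, ci_low, ci_high):
    # per-study log effect and variance recovered from the 95% CI
    y = [math.log(r) for r in rr]
    var = [((math.log(hi) - math.log(lo)) / (2 * 1.96)) ** 2
           for lo, hi in zip(ci_low, ci_high)]
    w = [1.0 / v for v in var]                                   # fixed-effect weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))    # Cochran's Q
    df = len(rr) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0          # I^2 in percent
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                                # between-study variance
    w_star = [1.0 / (v + tau2) for v in var]                     # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se_re = math.sqrt(1.0 / sum(w_star))
    return (math.exp(y_re),
            math.exp(y_re - 1.96 * se_re),
            math.exp(y_re + 1.96 * se_re),
            q, i2)

# illustrative inputs for three hypothetical studies
print(random_effects([1.10, 0.95, 1.30], [0.90, 0.70, 1.05], [1.35, 1.29, 1.61]))
```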
Schendel, Diana E; Bresnahan, Michaeline; Carter, Kim W; Francis, Richard W; Gissler, Mika; Grønborg, Therese K; Gross, Raz; Gunnes, Nina; Hornig, Mady; Hultman, Christina M; Langridge, Amanda; Lauritsen, Marlene B; Leonard, Helen; Parner, Erik T; Reichenberg, Abraham; Sandin, Sven; Sourander, Andre; Stoltenberg, Camilla; Suominen, Auli; Surén, Pål; Susser, Ezra
2013-11-01
The International Collaboration for Autism Registry Epidemiology (iCARE) is the first multinational research consortium (Australia, Denmark, Finland, Israel, Norway, Sweden, USA) to promote research on the geographical and temporal heterogeneity, phenotype, family and life course patterns, and etiology of autism. iCARE devised solutions to challenges in multinational collaboration concerning data access security, confidentiality and management. Data are obtained by integrating existing national or state-wide, population-based, individual-level data systems and undergo rigorous harmonization and quality control processes. Analyses are performed using database federation via a computational infrastructure with a secure, web-based interface. iCARE provides a unique, unprecedented resource in autism research that will significantly enhance the ability to detect environmental and genetic contributions to the causes and life course of autism.
Pituitary gene mutations and the growth hormone pathway.
Moseley, C T; Phillips, J A
2000-01-01
Hereditary forms of pituitary insufficiency not associated with anatomic defects of the central nervous system, hypothalamus, or pituitary are a heterogeneous group of disorders that result from interruptions at different points in the hypothalamic-pituitary-somatomedin-peripheral tissue axis. These different types of pituitary dwarfism can be classified by the level of the defect; the mode of inheritance; whether the phenotype is isolated growth hormone deficiency (IGHD) or combined pituitary hormone deficiency (CPHD); whether the hormone is absent, deficient, or abnormal; and, in patients with GH resistance, whether insulin-like growth factor 1 (IGF1) is deficient due to GH receptor or IGF1 defects. Information on each disorder is summarized. More detailed information can be obtained through the electronic database Online Mendelian Inheritance in Man, which is available at http://www3.ncbi.nlm.nih.gov/Omim/.
Finite-time consensus for controlled dynamical systems in network
NASA Astrophysics Data System (ADS)
Zoghlami, Naim; Mlayeh, Rhouma; Beji, Lotfi; Abichou, Azgal
2018-04-01
The key challenges in networked dynamical systems are component heterogeneity, nonlinearity, and the high dimension of the state vector. In this paper, the emphasis is placed on two classes of networked systems that cover most controlled driftless systems as well as systems with drift. For each model structure, defining homogeneous or heterogeneous multi-system behaviour, protocols integrating sufficient conditions are derived that lead to finite-time consensus. For the networking topology, we make use of fixed directed and undirected graphs. To prove our approaches, finite-time stability theory and Lyapunov methods are employed. As illustrative examples, homogeneous multi-unicycle kinematics and homogeneous/heterogeneous multi-second-order dynamics in networks are studied.
Federal Register 2010, 2011, 2012, 2013, 2014
2013-02-01
... DEPARTMENT OF JUSTICE Antitrust Division Notice Pursuant to the National Cooperative Research and Production Act of 1993--Heterogeneous System Architecture Foundation Notice is hereby given that, on December..., 15 U.S.C. 4301 et seq. (``the Act''), Heterogeneous System Architecture Foundation (``HSA Foundation...
Federal Register 2010, 2011, 2012, 2013, 2014
2013-12-30
... DEPARTMENT OF JUSTICE Antitrust Division Notice Pursuant to the National Cooperative Research and Production Act of 1993--Heterogeneous System Architecture Foundation Notice is hereby given that, on November..., 15 U.S.C. Sec. 4301 et seq. (``the Act''), Heterogeneous System Architecture Foundation (``HSA...
Federal Register 2010, 2011, 2012, 2013, 2014
2013-10-28
... DEPARTMENT OF JUSTICE Antitrust Division Notice Pursuant to the National Cooperative Research and Production Act of 1993--Heterogeneous System Architecture Foundation Notice is hereby given that, on..., 15 U.S.C. 4301 et seq. (``the Act''), Heterogeneous System Architecture Foundation (``HSA Foundation...
Federal Register 2010, 2011, 2012, 2013, 2014
2012-11-06
... DEPARTMENT OF JUSTICE Antitrust Division Notice Pursuant to the National Cooperative Research and Production Act of 1993--Heterogeneous System Architecture Foundation Notice is hereby given that, on October..., 15 U.S.C. 301 et seq. (``the Act''), Heterogeneous System Architecture Foundation (``HSA Foundation...
2014-10-01
offer a practical solution to calculating the grain-scale heterogeneity present in the deformation field. Consequently, crystal plasticity models... process/performance simulation codes (e.g., crystal plasticity finite element method). Subject terms: ICME; microstructure informatics; higher... (iii) protocols for direct and efficient linking of materials models/databases into process/performance simulation codes (e.g., crystal plasticity
Le, Duc-Hau; Verbeke, Lieven; Son, Le Hoang; Chu, Dinh-Toi; Pham, Van-Huy
2017-11-14
MicroRNAs (miRNAs) have been shown to play an important role in pathological initiation, progression and maintenance. Because identification in the laboratory of disease-related miRNAs is not straightforward, numerous network-based methods have been developed to predict novel miRNAs in silico. Homogeneous networks (in which every node is a miRNA) based on the targets shared between miRNAs have been widely used to predict their role in disease phenotypes. Although such homogeneous networks can predict potential disease-associated miRNAs, they do not consider the roles of the target genes of the miRNAs. Here, we introduce a novel method based on a heterogeneous network that not only considers miRNAs but also the corresponding target genes in the network model. Instead of constructing homogeneous miRNA networks, we built heterogeneous miRNA networks consisting of both miRNAs and their target genes, using databases of known miRNA-target gene interactions. In addition, as recent studies demonstrated reciprocal regulatory relations between miRNAs and their target genes, we considered these heterogeneous miRNA networks to be undirected, assuming mutual miRNA-target interactions. Next, we introduced a novel method (RWRMTN) operating on these mutual heterogeneous miRNA networks to rank candidate disease-related miRNAs using a random walk with restart (RWR) based algorithm. Using both known disease-associated miRNAs and their target genes as seed nodes, the method can identify additional miRNAs involved in the disease phenotype. Experiments indicated that RWRMTN outperformed two existing state-of-the-art methods: RWRMDA, a network-based method that also uses a RWR on homogeneous (rather than heterogeneous) miRNA networks, and RLSMDA, a machine learning-based method. Interestingly, we could relate this performance gain to the emergence of "disease modules" in the heterogeneous miRNA networks used as input for the algorithm. Moreover, we could demonstrate that RWRMTN is stable, performing well when using both experimentally validated and predicted miRNA-target gene interaction data for network construction. Finally, using RWRMTN, we identified 76 novel miRNAs associated with 23 disease phenotypes which were present in a recent database of known disease-miRNA associations. Summarizing, using random walks on mutual miRNA-target networks improves the prediction of novel disease-associated miRNAs because of the existence of "disease modules" in these networks.
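The core ranking step described above can be sketched as a plain random walk with restart (RWR) on an undirected heterogeneous adjacency matrix. This is a generic illustration, not the authors' RWRMTN code; the restart probability and the toy network below are assumptions.

```python
# Sketch of a random walk with restart on an undirected heterogeneous
# miRNA-target network, in the spirit of RWRMTN; the restart probability and
# the toy adjacency matrix are illustrative assumptions.
import numpy as np

def rwr(adjacency, seeds, restart=0.7, tol=1e-8, max_iter=1000):
    """Return steady-state visiting probabilities for all nodes."""
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    w = adjacency / col_sums                # column-normalized transition matrix
    p0 = np.zeros(adjacency.shape[0])
    p0[seeds] = 1.0 / len(seeds)            # seeds: known disease miRNAs and target genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * w @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p

# toy 5-node network: nodes 0-1 are miRNAs, nodes 2-4 are target genes
a = np.array([[0, 0, 1, 1, 0],
              [0, 0, 0, 1, 1],
              [1, 0, 0, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 1, 0, 0, 0]], dtype=float)
scores = rwr(a, seeds=[0, 2])
print(np.argsort(-scores))                  # candidate ranking, seeds first
```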
Prevalence of hypertension among adolescents: systematic review and meta-analysis.
Gonçalves, Vivian Siqueira Santos; Galvão, Taís Freire; de Andrade, Keitty Regina Cordeiro; Dutra, Eliane Said; Bertolin, Maria Natacha Toral; de Carvalho, Kenia Mara Baiocchi; Pereira, Mauricio Gomes
2016-01-01
To estimate the prevalence of hypertension among adolescent Brazilian students. A systematic review of school-based cross-sectional studies was conducted. The articles were searched in the databases MEDLINE, Embase, Scopus, LILACS, SciELO, Web of Science, the CAPES thesis database and Trip Database. In addition, we examined the lists of references of relevant studies to identify potentially eligible articles. No restrictions regarding publication date, language, or publication status were applied. The studies were selected by two independent evaluators, who also extracted the data and assessed the methodological quality according to eight criteria related to sampling, measuring blood pressure, and presenting results. The meta-analysis was calculated using a random effects model, and analyses were performed to investigate heterogeneity. We retrieved 1,577 articles from the search and included 22 in the review. The included articles corresponded to 14,115 adolescents, 51.2% (n = 7,230) of whom were female. We observed a variety of techniques, equipment, and reference values in use. The prevalence of hypertension was 8.0% (95%CI 5.0-11.0; I² = 97.6%), 9.3% (95%CI 5.6-13.6; I² = 96.4%) in males and 6.5% (95%CI 4.2-9.1; I² = 94.2%) in females. The meta-regression failed to identify the causes of the heterogeneity among studies. Despite the differences found in the methodologies of the included studies, the results of this systematic review indicate that hypertension is prevalent in the Brazilian adolescent school population. For future investigations, we suggest the standardization of techniques, equipment, and reference values, aiming to improve the methodological quality of the studies.
Sami, Musa Basseer; Faruqui, Rafey
2015-12-01
Traumatic brain injury and stroke are among the leading causes of neurological disability worldwide. Although dopaminergic agents have long been associated with improvement of neuropsychiatric outcomes, much of the evidence to date comes from case reports, case series or open-label trials. We undertook a systematic review of double-blinded randomised controlled trials (RCTs) to determine the effect of dopaminergic agents on the pre-defined outcomes of (a) apathy; (b) psychomotor retardation; (c) behavioural management and (d) cognitive function. The databases searched were Medline, EMBASE, and PsychInfo for human studies. The Cochrane Clinical Trials Database and the TRIP Medical Database were also searched. All identified studies were further hand-searched. We identified six studies providing data on 227 participants, 150 of whom received dopaminergic therapy. Trials were compromised by cross-over designs, inadequate wash-out periods, small numbers and heterogeneous outcome measures. However, one good quality RCT demonstrates the efficacy of amantadine in behavioural management. One further RCT shows methylphenidate-levodopa is efficacious for mood post-stroke. One study shows rotigotine to improve hemi-inattention caused by prefrontal damage. Our systematic review demonstrates an evolving evidence base suggesting some benefits for agitation and aggression, mood and attentional deficits. However, there are key limitations in the studies undertaken to date, involving small numbers of participants, heterogeneous outcome measures, and variable study designs. There is a need for ongoing large prospective double-blind RCTs of these medications using standardised criteria and outcomes to fully understand their effectiveness in this patient group.
He, Jie; Yang, Xiaofang; Men, Bin; Wang, Dongsheng
2016-01-01
The heterogeneous Fenton reaction can generate highly reactive hydroxyl radicals (OH) from reactions between recyclable solid catalysts and H2O2 at acidic or even circumneutral pH. Hence, it can effectively oxidize refractory organics in water or soils and has become a promising environmentally friendly treatment technology. Due to the complex reaction system, the mechanism behind heterogeneous Fenton reactions remains unresolved but fascinating, and is crucial for understanding Fenton chemistry and the development and application of efficient heterogeneous Fenton technologies. Iron-based materials usually possess high catalytic activity, low cost, negligible toxicity and easy recovery, and are a superior type of heterogeneous Fenton catalysts. Therefore, this article reviews the fundamental but important interfacial mechanisms of heterogeneous Fenton reactions catalyzed by iron-based materials. OH, hydroperoxyl radicals/superoxide anions (HO2/O2(-)) and high-valent iron are the three main types of reactive oxygen species (ROS), with different oxidation reactivity and selectivity. Based on the mechanisms of ROS generation, the interfacial mechanisms of heterogeneous Fenton systems can be classified as the homogeneous Fenton mechanism induced by surface-leached iron, the heterogeneous catalysis mechanism, and the heterogeneous reaction-induced homogeneous mechanism. Different heterogeneous Fenton systems catalyzed by characteristic iron-based materials are comprehensively reviewed. Finally, related future research directions are also suggested. Copyright © 2015. Published by Elsevier B.V.
Robust mechanobiological behavior emerges in heterogeneous myosin systems.
Egan, Paul F; Moore, Jeffrey R; Ehrlicher, Allen J; Weitz, David A; Schunn, Christian; Cagan, Jonathan; LeDuc, Philip
2017-09-26
Biological complexity presents challenges for understanding natural phenomena and engineering new technologies, particularly in systems with molecular heterogeneity. Such complexity is present in myosin motor protein systems, and computational modeling is essential for determining how collective myosin interactions produce emergent system behavior. We develop a computational approach for altering myosin isoform parameters and their collective organization, and support predictions with in vitro experiments of motility assays with α-actinins as molecular force sensors. The computational approach models variations in single myosin molecular structure, system organization, and force stimuli to predict system behavior for filament velocity, energy consumption, and robustness. Robustness is the range of forces over which a filament is expected to have continuous velocity and depends on the energy used by the myosin system. Myosin systems are shown to have highly nonlinear behavior across force conditions that may be exploited at a systems level by combining slow and fast myosin isoforms heterogeneously. Results suggest some heterogeneous systems have lower energy use near stall conditions and greater energy consumption when unloaded, therefore promoting robustness. These heterogeneous system capabilities are unique in comparison with homogeneous systems and potentially advantageous for high performance bionanotechnologies. Findings open doors at the intersections of mechanics and biology, particularly for understanding and treating myosin-related diseases and developing approaches for motor molecule-based technologies.
Robust mechanobiological behavior emerges in heterogeneous myosin systems
NASA Astrophysics Data System (ADS)
Egan, Paul F.; Moore, Jeffrey R.; Ehrlicher, Allen J.; Weitz, David A.; Schunn, Christian; Cagan, Jonathan; LeDuc, Philip
2017-09-01
Biological complexity presents challenges for understanding natural phenomena and engineering new technologies, particularly in systems with molecular heterogeneity. Such complexity is present in myosin motor protein systems, and computational modeling is essential for determining how collective myosin interactions produce emergent system behavior. We develop a computational approach for altering myosin isoform parameters and their collective organization, and support predictions with in vitro experiments of motility assays with α-actinins as molecular force sensors. The computational approach models variations in single myosin molecular structure, system organization, and force stimuli to predict system behavior for filament velocity, energy consumption, and robustness. Robustness is the range of forces over which a filament is expected to have continuous velocity and depends on the energy used by the myosin system. Myosin systems are shown to have highly nonlinear behavior across force conditions that may be exploited at a systems level by combining slow and fast myosin isoforms heterogeneously. Results suggest some heterogeneous systems have lower energy use near stall conditions and greater energy consumption when unloaded, therefore promoting robustness. These heterogeneous system capabilities are unique in comparison with homogeneous systems and potentially advantageous for high performance bionanotechnologies. Findings open doors at the intersections of mechanics and biology, particularly for understanding and treating myosin-related diseases and developing approaches for motor molecule-based technologies.
Meertens, Robert; Casanova, Francesco; Knapp, Karen M; Thorn, Clare; Strain, William David
2018-05-04
A range of technologies using near infrared (NIR) light have shown promise at providing real time measurements of hemodynamic markers in bone tissue in vivo, an exciting prospect given existing difficulties in measuring hemodynamics in bone tissue. This systematic review aimed to evaluate the evidence for this potential use of NIR systems, establishing their potential as a research tool in this field. Major electronic databases including MEDLINE and EMBASE were searched using pre-planned search strategies with broad scope for any in vivo use of NIR technologies in human bone tissue. Following identification of studies by title and abstract screening, full text inclusion was determined by double blind assessment using predefined criteria. Full text studies for inclusion were data extracted using a predesigned proforma and quality assessed. Narrative synthesis was appropriate given the wide heterogeneity of included studies. Eighty-eight full text studies fulfilled the inclusion criteria, 57 addressing laser Doppler flowmetry (56 intra-operatively), 21 near infrared spectroscopy, and 10 photoplethysmography. The heterogeneity of the methodologies included differing hemodynamic markers, measurement protocols, anatomical locations, and research applications, making meaningful direct comparisons impossible. Further, studies were often limited by small sample sizes with potential selection biases, detection biases, and wide variability in results between participants. Despite promising potential in the use of NIR light to interrogate bone circulation, the application of NIR systems in bone requires rigorous assessment of the reproducibility of potential hemodynamic markers and further validation of these markers against alternative physiologically relevant reference standards. © 2018 Orthopaedic Research Society. Published by Wiley Periodicals, Inc. J Orthop Res 9999:1-9, 2018. © 2018 Orthopaedic Research Society. Published by Wiley Periodicals, Inc.
Accelerating DNA analysis applications on GPU clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tumeo, Antonino; Villa, Oreste
DNA analysis is an emerging application of high performance bioinformatics. Modern sequencing machines are able to provide, in a few hours, large input streams of data which need to be matched against exponentially growing databases of known fragments. The ability to recognize these patterns efficiently and quickly may allow extending the scale and the reach of the investigations performed by biology scientists. Aho-Corasick is an exact, multiple pattern matching algorithm often at the base of this application. High performance systems are a promising platform to accelerate this algorithm, which is computationally intensive but also inherently parallel. Nowadays, high performance systems also include heterogeneous processing elements, such as Graphic Processing Units (GPUs), to further accelerate parallel algorithms. Unfortunately, the Aho-Corasick algorithm exhibits large performance variability, depending on the size of the input streams, the number of patterns to search, and the number of matches, and poses significant challenges for current high performance software and hardware implementations. An adequate mapping of the algorithm onto the target architecture, coping with the limits of the underlying hardware, is required to reach the desired high throughputs. Load balancing also plays a crucial role when considering the limited bandwidth among the nodes of these systems. In this paper we present an efficient implementation of the Aho-Corasick algorithm for high performance clusters accelerated with GPUs. We discuss how we partitioned and adapted the algorithm to fit the Tesla C1060 GPU and then present an MPI based implementation for a heterogeneous high performance cluster. We compare this implementation to MPI and MPI with pthreads based implementations for a homogeneous cluster of x86 processors, discussing the stability vs. the performance and the scaling of the solutions, taking into consideration aspects such as the bandwidth among the different nodes.
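For reference, a minimal single-threaded Aho-Corasick matcher is sketched below; it only illustrates the automaton construction and matching step and is not the GPU/MPI implementation described in the abstract.

```python
# Minimal single-threaded Aho-Corasick reference for multiple pattern matching.
from collections import deque

def build_automaton(patterns):
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(p)
    queue = deque(goto[0].values())
    while queue:                                  # BFS to compute failure links
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0) if goto[f].get(ch, 0) != t else 0
            out[t] |= out[fail[t]]
    return goto, fail, out

def search(text, patterns):
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for p in out[state]:
            hits.append((i - len(p) + 1, p))      # (start position, pattern)
    return hits

print(search("ACGTACGGT", ["ACG", "CGG", "GTA"]))
```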
Santos, Carla Santana; Kowaltowski, Alicia J; Bertotti, Mauro
2017-09-12
We developed a highly sensitive oxygen consumption scanning microscopy system using platinized platinum disc microelectrodes. The system is capable of reliably detecting single-cell respiration, responding to classical regulators of mitochondrial oxygen consumption activity as expected. Comparisons with commercial multi-cell oxygen detection systems show that the system has comparable (if not smaller) errors, with the advantage of being able to monitor inter- and intra-cell heterogeneity in oxygen consumption characteristics. Our results uncover heterogeneous oxygen consumption characteristics between cells and within the same cell's microenvironments. Single Cell Oxygen Mapping (SCOM) is thus capable of reliably studying mitochondrial oxygen consumption characteristics and heterogeneity at the single-cell level.
Enhancement and Validation of an Arab Surname Database
Schwartz, Kendra; Beebani, Ganj; Sedki, Mai; Tahhan, Mamon; Ruterbusch, Julie J.
2015-01-01
Objectives Arab Americans constitute a large, heterogeneous, and quickly growing subpopulation in the United States. Health statistics for this group are difficult to find because US governmental offices do not recognize Arab as separate from white. The development and validation of an Arab- and Chaldean-American name database will enhance research efforts in this population subgroup. Methods A previously validated name database was supplemented with newly identified names gathered primarily from vital statistic records and then evaluated using a multistep process. This process included 1) review by 4 Arabic- and Chaldean-speaking reviewers, 2) ethnicity assessment by social media searches, and 3) self-report of ancestry obtained from a telephone survey. Results Our Arab- and Chaldean-American name algorithm has a positive predictive value of 91% and a negative predictive value of 100%. Conclusions This enhanced name database and algorithm can be used to identify Arab Americans in health statistics data, such as cancer and hospital registries, where they are often coded as white, to determine the extent of health disparities in this population. PMID:24625771
Coding of time-dependent stimuli in homogeneous and heterogeneous neural populations.
Beiran, Manuel; Kruscha, Alexandra; Benda, Jan; Lindner, Benjamin
2018-04-01
We compare the information transmission of a time-dependent signal by two types of uncoupled neuron populations that differ in their sources of variability: (i) a homogeneous population whose units receive independent noise and (ii) a deterministic heterogeneous population, where each unit exhibits a different baseline firing rate ('disorder'). Our criterion for making both sources of variability quantitatively comparable is that the interspike-interval distributions are identical for both systems. Numerical simulations using leaky integrate-and-fire neurons reveal that a non-zero amount of noise or disorder maximizes the encoding efficiency of the homogeneous and heterogeneous systems, respectively, as a particular case of suprathreshold stochastic resonance. Our findings thus illustrate that heterogeneity can render similarly profitable effects for neuronal populations as dynamic noise. The optimal noise/disorder depends on the system size and the properties of the stimulus, such as its intensity or cutoff frequency. We find that weak stimuli are better encoded by a noiseless heterogeneous population, whereas for strong stimuli a homogeneous population outperforms an equivalent heterogeneous system up to a moderate noise level. Furthermore, we derive analytical expressions of the coherence function for the cases of very strong noise and of vanishing intrinsic noise or heterogeneity, which predict the existence of an optimal noise intensity. Our results show that, depending on the type of signal, noise as well as heterogeneity can enhance the encoding performance of neuronal populations.
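A toy simulation in the spirit of the comparison described above is sketched below: a noisy homogeneous leaky integrate-and-fire population versus a noise-free population with heterogeneous bias currents, both encoding the same time-dependent signal. All parameter values and the simple correlation-based readout are illustrative assumptions, not those of the study.

```python
# Toy comparison of a noisy homogeneous LIF population and a noise-free
# heterogeneous population (spread of bias currents) encoding the same signal.
import numpy as np

rng = np.random.default_rng(0)
dt, t_max, n, tau = 1e-4, 2.0, 200, 0.01      # step (s), duration, size, membrane tau
time = np.arange(0, t_max, dt)
signal = 0.2 * np.sin(2 * np.pi * 5 * time)   # common time-dependent stimulus

def population_rate(bias, noise_std):
    """Instantaneous population firing rate of n leaky integrate-and-fire neurons."""
    v = np.zeros(n)
    rate = np.zeros(len(time))
    for k, s in enumerate(signal):
        v += (dt / tau) * (-v + bias + s) \
             + noise_std * np.sqrt(dt / tau) * rng.standard_normal(n)
        spiked = v >= 1.0                     # threshold 1, reset 0
        v[spiked] = 0.0
        rate[k] = spiked.mean() / dt
    return rate

homogeneous = population_rate(bias=np.full(n, 1.1), noise_std=0.3)
heterogeneous = population_rate(bias=rng.uniform(0.9, 1.3, n), noise_std=0.0)

for name, r in [("homogeneous + noise", homogeneous),
                ("heterogeneous, noise-free", heterogeneous)]:
    print(name, "correlation with signal:",
          round(float(np.corrcoef(r, signal)[0, 1]), 3))
```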
Heterogeneous population dynamics and scaling laws near epidemic outbreaks.
Widder, Andreas; Kuehn, Christian
2016-10-01
In this paper, we focus on the influence of heterogeneity and stochasticity of the population on the dynamical structure of a basic susceptible-infected-susceptible (SIS) model. First we prove that, upon a suitable mathematical reformulation of the basic reproduction number, the homogeneous system and the heterogeneous system exhibit completely analogous global behaviour. Then we consider noise terms to incorporate fluctuation effects and the random import of the disease into the population, and analyse the influence of heterogeneity on warning signs for critical transitions (or tipping points). This theory shows that one may be able to anticipate whether a bifurcation point is close before it happens. We use numerical simulations of a stochastic fast-slow heterogeneous population SIS model and show that various aspects of heterogeneity have a crucial influence on the scaling laws that are used as early-warning signs for the homogeneous system. Thus, although the basic structural qualitative dynamical properties are the same for both systems, the quantitative features for epidemic prediction are expected to change, and care has to be taken to interpret potential warning signs for disease outbreaks correctly.
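A toy stochastic SIS simulation with heterogeneous individual susceptibility, in the spirit of the model discussed above, is sketched below; the parameter values, the lognormal heterogeneity model, the random disease import and the variance-based early-warning proxy are all illustrative assumptions.

```python
# Toy stochastic SIS simulation with heterogeneous individual susceptibility.
import numpy as np

rng = np.random.default_rng(1)
n, steps, dt = 1000, 3000, 0.01
beta, gamma, import_rate = 1.2, 1.0, 1e-3        # infection, recovery, random import
susceptibility = rng.lognormal(mean=0.0, sigma=0.5, size=n)   # individual heterogeneity
susceptibility /= susceptibility.mean()

infected = np.zeros(n, dtype=bool)
infected[rng.choice(n, 5, replace=False)] = True
prevalence = []

for _ in range(steps):
    force = beta * infected.mean()               # mean-field force of infection
    p_inf = 1.0 - np.exp(-(force * susceptibility + import_rate) * dt)
    p_rec = 1.0 - np.exp(-gamma * dt)
    new_inf = (~infected) & (rng.random(n) < p_inf)
    recovered = infected & (rng.random(n) < p_rec)
    infected = (infected | new_inf) & ~recovered
    prevalence.append(infected.mean())

print("final prevalence:", prevalence[-1])
print("variance of late prevalence (a simple early-warning proxy):",
      np.var(prevalence[-1000:]))
```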
NASA Astrophysics Data System (ADS)
Yang, Hongyong; Han, Fujun; Zhao, Mei; Zhang, Shuning; Yue, Jun
2017-08-01
Because many networked systems can only be characterized with fractional-order dynamics in complex environments, fractional-order calculus has recently been studied in depth. When different agents of networked systems exhibit diverse individual features, heterogeneous fractional-order dynamics are used to describe the complex systems. Based on the distinguishing properties of agents, heterogeneous fractional-order multi-agent systems (FOMAS) are presented. Assuming multiple leader agents in the FOMAS, distributed containment control of FOMAS is studied in directed weighted topologies. By applying the Laplace transformation and frequency-domain theory of the fractional-order operator, an upper bound on delays is obtained that ensures containment consensus of delayed heterogeneous FOMAS. The consensus results for delayed FOMAS in this paper can be extended to systems with integer-order models. Finally, numerical examples are used to verify our results.
Morrison, Norman; Hancock, David; Hirschman, Lynette; Dawyndt, Peter; Verslyppe, Bert; Kyrpides, Nikos; Kottmann, Renzo; Yilmaz, Pelin; Glöckner, Frank Oliver; Grethe, Jeff; Booth, Tim; Sterk, Peter; Nenadic, Goran; Field, Dawn
2011-04-29
In the future, we hope to see an open and thriving data market in which users can find and select data from a wide range of data providers. In such an open access market, data are products that must be packaged accordingly. Increasingly, eCommerce sellers present heterogeneous product lines to buyers using faceted browsing. Using this approach we have developed the Ontogrator platform, which allows for rapid retrieval of data in a way that would be familiar to any online shopper. Using Knowledge Organization Systems (KOS), especially ontologies, Ontogrator uses text mining to mark up data and faceted browsing to help users navigate, query and retrieve data. Ontogrator offers the potential to impact scientific research in two major ways: 1) by significantly improving the retrieval of relevant information; and 2) by significantly reducing the time required to compose standard database queries and assemble information for further research. Here we present a pilot implementation developed in collaboration with the Genomic Standards Consortium (GSC) that includes content from the StrainInfo, GOLD, CAMERA, Silva and Pubmed databases. This implementation demonstrates the power of ontogration and highlights that the usefulness of this approach is fully dependent on both the quality of data and the KOS (ontologies) used. Ideally, the use and further expansion of this collaborative system will help to surface issues associated with the underlying quality of annotation and could lead to a systematic means for accessing integrated data resources.
Morrison, Norman; Hancock, David; Hirschman, Lynette; Dawyndt, Peter; Verslyppe, Bert; Kyrpides, Nikos; Kottmann, Renzo; Yilmaz, Pelin; Glöckner, Frank Oliver; Grethe, Jeff; Booth, Tim; Sterk, Peter; Nenadic, Goran; Field, Dawn
2011-01-01
In the future, we hope to see an open and thriving data market in which users can find and select data from a wide range of data providers. In such an open access market, data are products that must be packaged accordingly. Increasingly, eCommerce sellers present heterogeneous product lines to buyers using faceted browsing. Using this approach we have developed the Ontogrator platform, which allows for rapid retrieval of data in a way that would be familiar to any online shopper. Using Knowledge Organization Systems (KOS), especially ontologies, Ontogrator uses text mining to mark up data and faceted browsing to help users navigate, query and retrieve data. Ontogrator offers the potential to impact scientific research in two major ways: 1) by significantly improving the retrieval of relevant information; and 2) by significantly reducing the time required to compose standard database queries and assemble information for further research. Here we present a pilot implementation developed in collaboration with the Genomic Standards Consortium (GSC) that includes content from the StrainInfo, GOLD, CAMERA, Silva and Pubmed databases. This implementation demonstrates the power of ontogration and highlights that the usefulness of this approach is fully dependent on both the quality of data and the KOS (ontologies) used. Ideally, the use and further expansion of this collaborative system will help to surface issues associated with the underlying quality of annotation and could lead to a systematic means for accessing integrated data resources. PMID:21677865
Rossa, Carlos; Lehmann, Thomas; Sloboda, Ronald; Usmani, Nawaid; Tavakoli, Mahdi
2017-08-01
Global modelling has traditionally been the approach taken to estimate needle deflection in soft tissue. In this paper, we propose a new method based on local data-driven modelling of needle deflection. External measurements of needle-tissue interactions are collected from several insertions in ex vivo tissue to form a cloud of data. Inputs to the system are the needle insertion depth, axial rotations, and the forces and torques measured at the needle base by a force sensor. When a new insertion is performed, the just-in-time learning method estimates the model outputs given the current inputs to the needle-tissue system and the historical database. The query is compared to every observation in the database, and each observation is given a weight according to a similarity criterion. Only the subset of historical data that is most relevant to the query is selected, and a local linear model is fitted to the selected points to estimate the query output. The model outputs the 3D deflection of the needle tip and the needle insertion force. The proposed approach is validated in ex vivo multilayered biological tissue in different needle insertion scenarios. Experimental results in five different case studies indicate an accuracy in predicting needle deflection of 0.81 and 1.24 mm in the horizontal and vertical planes, respectively, and an accuracy of 0.5 N in predicting the needle insertion force over 216 needle insertions.
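The just-in-time learning step described above can be illustrated with a generic locally weighted linear regression sketch: for each query, the most similar historical observations are selected and a local linear model is fitted. The weighting scheme, neighbourhood size and the toy needle data are assumptions, not the authors' implementation.

```python
# Generic just-in-time (lazy, locally weighted) linear modelling sketch.
import numpy as np

def jit_predict(x_hist, y_hist, x_query, k=50):
    """Predict one output for a query point from a cloud of historical (x, y) data."""
    d = np.linalg.norm(x_hist - x_query, axis=1)          # similarity criterion
    idx = np.argsort(d)[:k]                               # k most relevant observations
    w = np.exp(-(d[idx] / (d[idx].max() + 1e-9)) ** 2)    # distance-based weights
    X = np.hstack([np.ones((k, 1)), x_hist[idx]])         # local linear design matrix
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_hist[idx])
    return float(np.concatenate(([1.0], x_query)) @ beta)

# toy historical database: inputs (insertion depth, axial force), output (tip deflection)
rng = np.random.default_rng(2)
x_hist = rng.uniform([0.0, 0.0], [120.0, 5.0], size=(500, 2))
y_hist = 0.01 * x_hist[:, 0] * (1 + 0.1 * x_hist[:, 1]) + rng.normal(0, 0.05, 500)
print(jit_predict(x_hist, y_hist, np.array([80.0, 2.5])))
```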
Statistically validated network of portfolio overlaps and systemic risk.
Gualdi, Stanislao; Cimini, Giulio; Primicerio, Kevin; Di Clemente, Riccardo; Challet, Damien
2016-12-21
Common asset holding by financial institutions (portfolio overlap) is nowadays regarded as an important channel for financial contagion with the potential to trigger fire sales and severe losses at the systemic level. We propose a method to assess the statistical significance of the overlap between heterogeneously diversified portfolios, which we use to build a validated network of financial institutions where links indicate potential contagion channels. The method is implemented on a historical database of institutional holdings ranging from 1999 to the end of 2013, but can be applied to any bipartite network. We find that the proportion of validated links (i.e. of significant overlaps) increased steadily before the 2007-2008 financial crisis and reached a maximum when the crisis occurred. We argue that the nature of this measure implies that systemic risk from fire sales liquidation was maximal at that time. After a sharp drop in 2008, systemic risk resumed its growth in 2009, with a notable acceleration in 2013. We finally show that market trends tend to be amplified in the portfolios identified by the algorithm, such that it is possible to have an informative signal about institutions that are about to suffer (enjoy) the most significant losses (gains).
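One common way to assess the statistical significance of the overlap between two portfolios in a bipartite holdings network is a hypergeometric test with a multiple-testing correction; the sketch below assumes that null model, which may differ in detail from the validation procedure of the paper, and all numbers are illustrative.

```python
# Sketch of validating a link between two portfolios by testing their overlap
# against a hypergeometric null; the null model choice and numbers are assumptions.
from scipy.stats import hypergeom

def overlap_p_value(n_assets, size_a, size_b, overlap):
    """Probability of observing >= overlap common assets by chance."""
    return hypergeom.sf(overlap - 1, n_assets, size_a, size_b)

# toy example: 5000 tradable assets, two portfolios holding 120 and 200 of them
p = overlap_p_value(n_assets=5000, size_a=120, size_b=200, overlap=15)
threshold = 0.01 / (350 * 349 / 2)   # Bonferroni correction for ~350 institutions
print(p, "validated" if p < threshold else "not validated")
```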
NASA Astrophysics Data System (ADS)
Satyakumar, M.; Anil, R.; Sreeja, G. S.
2017-12-01
Traffic in Kerala has been growing at a rate of 10-11% every year, resulting in severe congestion, especially in urban areas. Because of space limitations, it is not always possible to construct new roads. Road users rely on travel time information for journey planning and route choice decisions, while road system managers increasingly view travel time as an important network performance indicator. More recently, Advanced Traveler Information Systems (ATIS) are being developed to provide real-time information to roadway users. For ATIS, various methodologies have been developed for dynamic travel time prediction. For this work, the Kalman filter algorithm was selected for dynamic travel time prediction for different modes. Travel time data collected using a handheld GPS device were used for prediction. A congestion index (CI) was calculated, and the range of CI values was determined according to the percentage speed drop. After prediction using the Kalman filter, the predicted values along with the GPS data were integrated into GIS, and the offline route navigation guide was prepared using the Network Analysis tools of ArcGIS. Using this database, a program for route navigation based on travel time was developed. This system will help travelers with pre-trip information.
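A scalar Kalman filter for link travel time is sketched below as a minimal illustration of the prediction step mentioned above; the random-walk state model and the noise variances are illustrative assumptions.

```python
# Scalar Kalman filter sketch for link travel time; process and measurement
# noise variances are illustrative assumptions.
import numpy as np

def kalman_travel_time(measurements, q=4.0, r=25.0, x0=300.0, p0=100.0):
    """Filter noisy GPS-derived travel times (seconds) with a random-walk model."""
    x, p, estimates = x0, p0, []
    for z in measurements:
        p = p + q                  # predict: travel time modelled as a random walk
        k = p / (p + r)            # Kalman gain
        x = x + k * (z - x)        # update with the new measurement
        p = (1 - k) * p
        estimates.append(x)
    return estimates

rng = np.random.default_rng(3)
observed = 300 + np.cumsum(rng.normal(0, 2, 60)) + rng.normal(0, 5, 60)
print(kalman_travel_time(observed)[-5:])
```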
Statistically validated network of portfolio overlaps and systemic risk
Gualdi, Stanislao; Cimini, Giulio; Primicerio, Kevin; Di Clemente, Riccardo; Challet, Damien
2016-01-01
Common asset holding by financial institutions (portfolio overlap) is nowadays regarded as an important channel for financial contagion with the potential to trigger fire sales and severe losses at the systemic level. We propose a method to assess the statistical significance of the overlap between heterogeneously diversified portfolios, which we use to build a validated network of financial institutions where links indicate potential contagion channels. The method is implemented on a historical database of institutional holdings ranging from 1999 to the end of 2013, but can be applied to any bipartite network. We find that the proportion of validated links (i.e. of significant overlaps) increased steadily before the 2007–2008 financial crisis and reached a maximum when the crisis occurred. We argue that the nature of this measure implies that systemic risk from fire sales liquidation was maximal at that time. After a sharp drop in 2008, systemic risk resumed its growth in 2009, with a notable acceleration in 2013. We finally show that market trends tend to be amplified in the portfolios identified by the algorithm, such that it is possible to have an informative signal about institutions that are about to suffer (enjoy) the most significant losses (gains). PMID:28000764
Dyson, Kirstie E; Bulling, Mark T; Solan, Martin; Hernandez-Milian, Gema; Raffaelli, David G; White, Piran C.L; Paterson, David M
2007-01-01
Despite the complexity of natural systems, heterogeneity caused by the fragmentation of habitats has seldom been considered when investigating ecosystem processes. Empirical approaches that have included the influence of heterogeneity tend to be biased towards terrestrial habitats; yet marine systems offer opportunities by virtue of their relative ease of manipulation, rapid response times and the well-understood effects of macrofauna on sediment processes. Here, the influence of heterogeneity on microphytobenthic production in synthetic estuarine assemblages is examined. Heterogeneity was created by enriching patches of sediment with detrital algae (Enteromorpha intestinalis) to provide a source of allochthonous organic matter. A gradient of species density for four numerically dominant intertidal macrofauna (Hediste diversicolor, Hydrobia ulvae, Corophium volutator, Macoma balthica) was constructed, and microphytobenthic biomass at the sediment surface was measured. Statistical analysis using generalized least squares regression indicated that heterogeneity within our system was a significant driving factor that interacted with macrofaunal density and species identity. Microphytobenthic biomass was highest in enriched patches, suggesting that nutrients were obtained locally from the sediment–water interface and not from the water column. Our findings demonstrate that organic enrichment can cause the development of heterogeneity which influences infaunal bioturbation and consequent nutrient generation, a driver of microphytobenthic production. PMID:17698480
Dyson, Kirstie E; Bulling, Mark T; Solan, Martin; Hernandez-Milian, Gema; Raffaelli, David G; White, Piran C L; Paterson, David M
2007-10-22
Despite the complexity of natural systems, heterogeneity caused by the fragmentation of habitats has seldom been considered when investigating ecosystem processes. Empirical approaches that have included the influence of heterogeneity tend to be biased towards terrestrial habitats; yet marine systems offer opportunities by virtue of their relative ease of manipulation, rapid response times and the well-understood effects of macrofauna on sediment processes. Here, the influence of heterogeneity on microphytobenthic production in synthetic estuarine assemblages is examined. Heterogeneity was created by enriching patches of sediment with detrital algae (Enteromorpha intestinalis) to provide a source of allochthonous organic matter. A gradient of species density for four numerically dominant intertidal macrofauna (Hediste diversicolor, Hydrobia ulvae, Corophium volutator, Macoma balthica) was constructed, and microphytobenthic biomass at the sediment surface was measured. Statistical analysis using generalized least squares regression indicated that heterogeneity within our system was a significant driving factor that interacted with macrofaunal density and species identity. Microphytobenthic biomass was highest in enriched patches, suggesting that nutrients were obtained locally from the sediment-water interface and not from the water column. Our findings demonstrate that organic enrichment can cause the development of heterogeneity which influences infaunal bioturbation and consequent nutrient generation, a driver of microphytobenthic production.
Spatial structure of soil properties at different scales of Mt. Kilimanjaro, Tanzania
NASA Astrophysics Data System (ADS)
Kühnel, Anna; Huwe, Bernd
2013-04-01
Soils of tropical mountain ecosystems provide important ecosystem services such as water and carbon storage, water filtration and erosion control. As these ecosystems are threatened by global warming and the conversion of natural to human-modified landscapes, it is important to understand the implications of these changes. Within the DFG Research Unit "Kilimanjaro ecosystems under global change: Linking biodiversity, biotic interactions and biogeochemical ecosystem processes", we study the spatial heterogeneity of soils and the available water capacity for different land use systems. In the savannah zone of Mt. Kilimanjaro, maize fields are compared to natural savannah ecosystems. In the lower montane forest zone, coffee plantations, traditional home gardens, grasslands and natural forests are studied. We characterize the soils with respect to soil hydrology, with emphasis on the spatial variability of soil texture and bulk density at different scales. Furthermore, soil organic carbon and nitrogen, cation exchange capacity and pH are measured. Vis/NIR spectroscopy is used to detect small-scale physical and chemical heterogeneity within soil profiles, as well as to obtain information on soil properties at a larger scale. We aim to build a spectral database for these soil properties for the Kilimanjaro region in order to obtain rapid information for geostatistical analysis. Partial least squares regression with leave-one-out cross-validation is used for model calibration. Results for silt and clay content, as well as carbon and nitrogen content, are promising, with adjusted R² ranging from 0.70 for silt to 0.86 for nitrogen. Furthermore, models for other nutrients, cation exchange capacity and available water capacity will be calibrated. We compare heterogeneity within and across the different ecosystems and argue that spatial structure characteristics and complexity patterns in soil parameters can be quantitatively related to biodiversity and functional diversity parameters.
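A minimal sketch of calibrating a PLS regression model with leave-one-out cross-validation, as described above, is shown below using scikit-learn; the synthetic spectra and the choice of five latent components are illustrative assumptions.

```python
# Sketch of PLS regression calibration with leave-one-out cross-validation;
# the spectra below are synthetic stand-ins for Vis/NIR measurements.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(4)
n_samples, n_wavelengths = 60, 200
spectra = rng.normal(size=(n_samples, n_wavelengths))           # stand-in spectra
nitrogen = 0.8 * spectra[:, 50] + 0.3 * spectra[:, 120] + rng.normal(0, 0.2, n_samples)

pls = PLSRegression(n_components=5)
predicted = cross_val_predict(pls, spectra, nitrogen, cv=LeaveOneOut()).ravel()
print("leave-one-out cross-validated R2:", round(r2_score(nitrogen, predicted), 2))
```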
Xu, Xin; Xie, Yanqi; Lin, Yiwei; Xu, Xianglai; Zhu, Yi; Mao, Yeqing; Hu, Zhenghui; Wu, Jian; Chen, Hong; Zheng, Xiangyi; Qin, Jie; Xie, Liping
2012-12-01
Plasminogen activator inhibitor-1 (PAI-1), belonging to the urokinase plasminogen activation (uPA) system, is involved in cancer development and progression. The PAI-1 promoter 4G/5G polymorphism was shown to contribute to genetic susceptibility to cancer, although the results were inconsistent. To assess this relationship more precisely, a meta-analysis was performed. The electronic databases PubMed, Scopus, Web of Science and Chinese National Knowledge Infrastructure (CNKI) were searched; data were extracted and analyzed independently by two reviewers. Ultimately, 21 eligible case-control studies with a total of 8,415 cancer cases and 9,208 controls were included. The overall odds ratio (OR) with its 95% confidence interval (CI) showed a statistically significant association between the PAI-1 promoter 4G/5G polymorphism and cancer risk (4G/4G vs. 5G/5G: OR=1.25, 95% CI=1.07-1.47, P(heterogeneity)=0.001; 4G/4G vs. 4G/5G+5G/5G: OR=1.10, 95% CI=1.03-1.17, P(heterogeneity)=0.194; 4G/4G+4G/5G vs. 5G/5G: OR=1.17, 95% CI=1.01-1.35, P(heterogeneity)=0.041). In further subgroup analyses, an increased risk of cancer was observed in the subgroup of Caucasians with regard to endometrial cancer. Our meta-analysis suggests that the PAI-1 4G/5G polymorphism most likely contributes to susceptibility to cancer, particularly in Caucasians. Furthermore, the 4G allele may be associated with an increased risk of endometrial cancer.
2011-01-01
Introduction Infection is a major cause of morbidity and mortality in patients with rheumatoid arthritis (RA). The objective of this study was to perform a systematic review and meta-analysis of the effect of glucocorticoid (GC) therapy on the risk of infection in patients with RA. Methods A systematic review was conducted by using MEDLINE, EMBASE, CINAHL, and the Cochrane Central Register of Controlled Trials database to January 2010 to identify studies among populations of patients with RA that reported a comparison of infection incidence between patients treated with GC therapy and patients not exposed to GC therapy. Results In total, 21 randomised controlled trials (RCTs) and 42 observational studies were included. In the RCTs, GC therapy was not associated with a risk of infection (relative risk (RR), 0.97 (95% CI, 0.69, 1.36)). Small numbers of events in the RCTs meant that a clinically important increased or decreased risk could not be ruled out. The observational studies generated a RR of 1.67 (1.49, 1.87), although significant heterogeneity was present. The increased risk (and heterogeneity) persisted when analyses were stratified by varying definitions of exposure, outcome, and adjustment for confounders. A positive dose-response effect was seen. Conclusions Whereas observational studies suggested an increased risk of infection with GC therapy, RCTs suggested no increased risk. Inconsistent reporting of safety outcomes in the RCTs, as well as marked heterogeneity, probable residual confounding, and publication bias in the observational studies, limits the opportunity for a definitive conclusion. Clinicians should remain vigilant for infection in patients with RA treated with GC therapy. PMID:21884589
Bissell, E.G.; Aichele, Stephen S.
2004-01-01
About 400,000 residents of Oakland County, Mich., rely on ground water for their primary drinking-water supply. More than 90 percent of these residents draw ground water from the shallow glacial drift aquifer. Understanding the vertical hydraulic conductivity of the shallow glacial drift aquifer is important both in identifying areas of ground-water recharge and in evaluating susceptibility to contamination. The geologic environment throughout much of the county, however, is poorly understood and heterogeneous, making conventional aquifer mapping techniques difficult. Geostatistical procedures are therefore used to describe the effective vertical hydraulic conductivity of the top 50 ft of the glacial deposits and to predict the probability of finding a potentially protective confining layer at a given location. The results presented synthesize the available well-log data; however, only about 40 percent of the explainable variation in the dataset is accounted for, making the results more qualitative than quantitative. Most of the variation in the effective vertical hydraulic conductivity cannot be explained with the well-log data currently available (as of 2004). Although the geologic environment is heterogeneous, the quality-assurance process indicated that more than half of the wells in the county’s Wellkey database (statewide database for monitoring drinking-water wells) had inconsistent identifications of lithology.
Cascade heterogeneous face sketch-photo synthesis via dual-scale Markov Network
NASA Astrophysics Data System (ADS)
Yao, Saisai; Chen, Zhenxue; Jia, Yunyi; Liu, Chengyun
2018-03-01
Heterogeneous face sketch-photo synthesis is an important and challenging task in computer vision, which has been widely applied in law enforcement and digital entertainment. Motivated by the different synthesis results obtained at different scales, this paper proposes a cascade sketch-photo synthesis method based on a dual-scale Markov Network. First, a larger-scale Markov Network is used to synthesize the initial sketches, and the local vertical and horizontal neighbour search (LVHNS) method is used to search for the neighbour patches of test patches in the training set. Then, the initial sketches and test photos are jointly fed into a smaller-scale Markov Network. Finally, the fine sketches are obtained after the cascade synthesis process. Extensive experimental results on various databases demonstrate the superiority of the proposed method compared with several state-of-the-art methods.
Névéol, Aurélie; Wilbur, W John; Lu, Zhiyong
2012-01-01
High-throughput experiments and bioinformatics techniques are creating an exploding volume of data that are becoming overwhelming to keep track of for biologists and researchers who need to access, analyze and process existing data. Much of the available data are being deposited in specialized databases, such as the Gene Expression Omnibus (GEO) for microarrays or the Protein Data Bank (PDB) for protein structures and coordinates. Data sets are also being described by their authors in publications archived in literature databases such as MEDLINE and PubMed Central. Currently, the curation of links between biological databases and the literature mainly relies on manual labour, which makes it a time-consuming and daunting task. Herein, we analysed the current state of link curation between GEO, PDB and MEDLINE. We found that the link curation is heterogeneous depending on the sources and databases involved, and that overlap between sources is low, <50% for PDB and GEO. Furthermore, we showed that text-mining tools can automatically provide valuable evidence to help curators broaden the scope of articles and database entries that they review. As a result, we made recommendations to improve the coverage of curated links, as well as the consistency of information available from different databases while maintaining high-quality curation. Database URLs: http://www.ncbi.nlm.nih.gov/PubMed, http://www.ncbi.nlm.nih.gov/geo/, http://www.rcsb.org/pdb/
Névéol, Aurélie; Wilbur, W. John; Lu, Zhiyong
2012-01-01
High-throughput experiments and bioinformatics techniques are creating an exploding volume of data that are becoming overwhelming to keep track of for biologists and researchers who need to access, analyze and process existing data. Much of the available data are being deposited in specialized databases, such as the Gene Expression Omnibus (GEO) for microarrays or the Protein Data Bank (PDB) for protein structures and coordinates. Data sets are also being described by their authors in publications archived in literature databases such as MEDLINE and PubMed Central. Currently, the curation of links between biological databases and the literature mainly relies on manual labour, which makes it a time-consuming and daunting task. Herein, we analysed the current state of link curation between GEO, PDB and MEDLINE. We found that the link curation is heterogeneous depending on the sources and databases involved, and that overlap between sources is low, <50% for PDB and GEO. Furthermore, we showed that text-mining tools can automatically provide valuable evidence to help curators broaden the scope of articles and database entries that they review. As a result, we made recommendations to improve the coverage of curated links, as well as the consistency of information available from different databases while maintaining high-quality curation. Database URLs: http://www.ncbi.nlm.nih.gov/PubMed, http://www.ncbi.nlm.nih.gov/geo/, http://www.rcsb.org/pdb/ PMID:22685160
United States Army Medical Materiel Development Activity: 1997 Annual Report.
1997-01-01
business planning and execution information management system (Project Management Division Database (PMDD) and Product Management Database System (PMDS)...MANAGEMENT • Project Management Division Database (PMDD), Product Management Database System (PMDS), and Special Users Database System: The existing...System (FMS), were investigated. New Product Managers and Project Managers were added into PMDS and PMDD. A separate division, Support, was
NASA Astrophysics Data System (ADS)
Choo, Seongho; Li, Vitaly; Choi, Dong Hee; Jung, Gi Deck; Park, Hong Seong; Ryuh, Youngsun
2005-12-01
In the personal robot system currently under development, the internal architecture consists of modules with separate functions connected through heterogeneous network systems. This module-based architecture supports specialization and division of labor in both design and implementation, and as a result it can reduce development time and cost for modules. Furthermore, because every module is connected to the others through network systems, integration is easy and synergy effects arise when modules cooperate to provide advanced combined functions. In this architecture, one of the most important technologies is the network middleware, which handles communication among modules connected through heterogeneous network systems. The network middleware acts like the nervous system of the personal robot: it relays, transmits, and translates information appropriately between modules, much as the nervous system does between organs. The network middleware supports various hardware platforms and heterogeneous network systems (Ethernet, Wireless LAN, USB, IEEE 1394, CAN, CDMA-SMS, RS-232C). This paper discusses the mechanisms of our network middleware for intercommunication and routing among modules, along with methods for real-time data communication and fault-tolerant network service. To these ends, we have designed and implemented a layered network middleware scheme, distributed routing management, and network monitoring/notification technology on heterogeneous networks. The main theme is how routing information is built in our network middleware; additional features are built on top of this routing information table. We are now designing and implementing a new version of the network middleware (which we call 'OO M/W') that supports object-oriented operation, and we are updating the program sources for an object-oriented architecture. It is lighter and faster and can support more operating systems and heterogeneous network systems, whereas general-purpose middlewares such as CORBA and UPnP typically support only one network protocol or operating system.
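As an illustration of the routing-information idea discussed above, the following sketch shows a simplified routing table that maps module identifiers to (network type, address) pairs and prefers shorter routes; all module names, network types and the hop-count metric are hypothetical and not taken from the described middleware.

```python
# Simplified sketch of a middleware routing table mapping module identifiers
# to (network type, address) pairs; names and metrics are hypothetical.
class RoutingTable:
    def __init__(self):
        self._routes = {}           # module id -> (network, address, hops)

    def update(self, module_id, network, address, hops=1):
        """Keep the shortest-known route advertised for each module."""
        current = self._routes.get(module_id)
        if current is None or hops < current[2]:
            self._routes[module_id] = (network, address, hops)

    def resolve(self, module_id):
        """Return the (network, address, hops) tuple used to reach a module."""
        if module_id not in self._routes:
            raise LookupError(f"no route to module {module_id}")
        return self._routes[module_id]

table = RoutingTable()
table.update("vision", "ethernet", "192.168.0.12:5000")
table.update("speech", "wireless_lan", "192.168.1.7:5000", hops=2)
table.update("speech", "ieee1394", "node-3", hops=1)   # shorter route wins
print(table.resolve("speech"))
```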
Opportunities for the Mashup of Heterogeneous Data Server via Semantic Web Technology
NASA Astrophysics Data System (ADS)
Ritschel, Bernd; Seelus, Christoph; Neher, Günther; Iyemori, Toshihiko; Koyama, Yukinobu; Yatagai, Akiyo; Murayama, Yasuhiro; King, Todd; Hughes, John; Fung, Shing; Galkin, Ivan; Hapgood, Michael; Belehaki, Anna
2015-04-01
European Union ESPAS, Japanese IUGONET and GFZ ISDC data servers have been developed for the ingestion, archiving and distribution of geoscience and space science domain data. The main parts of the data managed by these servers are related to near-Earth space and geomagnetic field data. A smart mashup of the data servers would allow seamless browsing of and access to data and related context information. However, achieving a high level of interoperability is a challenge because the data servers are based on different data models and software frameworks. This paper focuses on the latest experiments and results for the mashup of the data servers using the Semantic Web approach. Besides the mashup of domain and terminological ontologies, the options for connecting data managed in relational databases using D2R Server and SPARQL technology will be addressed in particular. A successful realization of the data server mashup will have a positive impact not only on the data users of the specific scientific domain but also on related projects, such as the development of a new interoperable version of NASA's Planetary Data System (PDS) or ICSU's World Data System alliance. ESPAS data server: https://www.espas-fp7.eu/portal/ IUGONET data server: http://search.iugonet.org/iugonet/ GFZ ISDC data server (semantic Web based prototype): http://rz-vm30.gfz-potsdam.de/drupal-7.9/ NASA PDS: http://pds.nasa.gov ICSU-WDS: https://www.icsu-wds.org
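A minimal sketch of querying two SPARQL endpoints (for example, a D2R server exposing a relational database alongside a domain data server) and merging the results is shown below; the endpoint URLs and the vocabulary used in the query are hypothetical placeholders, not the actual ESPAS, IUGONET or ISDC services.

```python
# Sketch of federating results from two SPARQL endpoints; URLs and vocabulary
# are hypothetical placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

def query_endpoint(endpoint_url, sparql_query):
    client = SPARQLWrapper(endpoint_url)
    client.setQuery(sparql_query)
    client.setReturnFormat(JSON)
    return client.query().convert()["results"]["bindings"]

QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset dcterms:title ?title .
} LIMIT 10
"""

for endpoint in ["http://example.org/d2r/sparql",      # assumed D2R endpoint
                 "http://example.org/isdc/sparql"]:     # assumed data-server endpoint
    for row in query_endpoint(endpoint, QUERY):
        print(row["dataset"]["value"], row["title"]["value"])
```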
Six methodological steps to build medical data warehouses for research.
Szirbik, N B; Pelletier, C; Chaussalet, T
2006-09-01
We propose a simple methodology for heterogeneous data collection and central repository-style database design in healthcare. Our method can be used with or without other software development frameworks, and we argue that its application can save a considerable amount of implementation effort. We also believe that the method can be used in other fields of research, especially those with a strong interdisciplinary nature. The idea emerged during a healthcare research project that involved, among other tasks, grouping information from heterogeneous and distributed information sources. We developed the methodology from lessons learned while building a data repository containing information about the flows of elderly patients in the UK's long-term care (LTC) system. We explain thoroughly the aspects that influenced the construction of the methodology. The methodology is defined by six steps, which can be aligned with various iterative development frameworks; here we describe its alignment with the RUP (rational unified process) framework. The methodology emphasizes current trends such as early identification of critical requirements, data modelling, close and timely interaction with users and stakeholders, ontology building, quality management, and exception handling. Of special interest is the ontological engineering aspect, which had the greatest impact after the project: it helped stakeholders carry out more effective collaborative negotiations that produced better solutions for the overall system investigated. Insight into the problems faced by others helps lead negotiators to win-win situations, and we consider that this should be the social result of any project that collects data for better decision making and, ultimately, enhanced global outcomes.
BiologicalNetworks 2.0 - an integrative view of genome biology data
2010-01-01
Background A significant problem in the study of the mechanisms of an organism's development is the elucidation of the interrelated factors that affect different levels of the organism, such as genes, biological molecules, cells, and cell systems. The numerous sources of heterogeneous data that exist for these subsystems are still not integrated sufficiently to give researchers a straightforward way to analyze them together in the same frame of study. Systematic application of data integration methods is also hampered by factors such as the orthogonal nature of the integrated data and naming problems. Results Here we report on a new version of BiologicalNetworks, a research environment for the integral visualization and analysis of heterogeneous biological data. BiologicalNetworks can be queried for properties of thousands of different types of biological entities (genes/proteins, promoters, COGs, pathways, binding sites, and other) and their relations (interactions, co-expression, co-citations, and other). The system includes the build-pathways infrastructure for molecular interactions/relations and module discovery in high-throughput experiments. Also implemented in BiologicalNetworks are the Integrated Genome Viewer and Comparative Genomics Browser applications, which allow for the search and analysis of gene regulatory regions and their conservation in multiple species in conjunction with molecular pathways/networks, experimental data and functional annotations. Conclusions The new release of BiologicalNetworks together with its back-end database introduces extensive functionality for a more efficient integrated multi-level analysis of microarray, sequence, regulatory, and other data. BiologicalNetworks is freely available at http://www.biologicalnetworks.org. PMID:21190573
The ESIS query environment pilot project
NASA Technical Reports Server (NTRS)
Fuchs, Jens J.; Ciarlo, Alessandro; Benso, Stefano
1993-01-01
The European Space Information System (ESIS) was originally conceived to provide the European space science community with simple and efficient access to space data archives, facilities with which to examine and analyze the retrieved data, and general information services. To achieve this, ESIS will provide scientists with a discipline-specific environment for querying, in a uniform and transparent manner, data stored in geographically dispersed archives. Furthermore, it will provide discipline-specific tools for displaying and analyzing the retrieved data. The central concept of ESIS is to achieve a more efficient and wider usage of space science data while maintaining the physical archives at the institutions that created them, which have the best background for ensuring and maintaining the scientific validity and interest of the data. In addition to coping with the physical distribution of data, ESIS must also manage the heterogeneity of the individual archives' data models, formats, and database management systems. Thus the ESIS system shall appear to the user as a single database, while it does in fact consist of a collection of dispersed and locally managed databases and data archives. The work reported in this paper is one of the results of the ESIS Pilot Project, which is to be completed in 1993. More specifically, it presents the pilot ESIS Query Environment (ESIS QE) system, which forms the data retrieval and data dissemination axis of the ESIS system; the others are formed by the ESIS Correlation Environment (ESIS CE) and the ESIS Information Services. The ESIS QE Pilot Project is carried out for the European Space Agency's Research and Information Centre, ESRIN, by a consortium consisting of Computer Resources International, Denmark; CISET S.p.a, Italy; the University of Strasbourg, France; and the Rutherford Appleton Laboratories in the U.K. Furthermore, numerous scientists both within ESA and in the European space science community have been involved in defining the core concepts of the ESIS system.
Kinoshita, Moritoshi; Higashihara, Eiji; Kawano, Haruna; Higashiyama, Ryo; Koga, Daisuke; Fukui, Takafumi; Gondo, Nobuhisa; Oka, Takehiko; Kawahara, Kozo; Rigo, Krisztina; Hague, Tim; Katsuragi, Kiyonori; Sudo, Kimiyoshi; Takeshi, Masahiko; Horie, Shigeo; Nutahara, Kikuo
2016-01-01
Genetic testing of PKD1 and PKD2 is expected to play an increasingly important role in determining allelic influences in autosomal dominant polycystic kidney disease (ADPKD) in the near future. However, to date, genetic testing is not commonly employed because it is expensive, complicated because of genetic heterogeneity, and does not easily identify pathogenic variants. In this study, we developed a genetic testing system based on next-generation sequencing (NGS), long-range polymerase chain reaction, and a new software package. The new software package integrated seven databases and provided access to five cloud-based computing systems. The database integrated 241 polymorphic nonpathogenic variants detected in 140 healthy Japanese volunteers aged >35 years, who were confirmed by ultrasonography as having no cysts in either kidney. Using this system, we identified 60 novel and 30 known pathogenic mutations in 101 Japanese patients with ADPKD, with an overall detection rate of 89.1% (90/101) [95% confidence interval (CI), 83.0%-95.2%]. The sensitivity of the system increased to 93.1% (94/101) (95% CI, 88.1%-98.0%) when combined with multiplex ligation-dependent probe amplification analysis, making it sufficient for use in a clinical setting. In 82 (87.2%) of the patients, pathogenic mutations were detected in PKD1 (95% CI, 79.0%-92.5%), whereas in 12 (12.8%) patients pathogenic mutations were detected in PKD2 (95% CI, 7.5%-21.0%); this is consistent with previously reported findings. In addition, we were able to reconfirm our pathogenic mutation identification results using Sanger sequencing. In conclusion, we developed a high-sensitivity NGS-based system and successfully employed it to identify pathogenic mutations in PKD1 and PKD2 in Japanese patients with ADPKD.
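The confidence intervals quoted above are consistent with a simple normal approximation for a binomial proportion; the short check below (illustrative only, not the authors' stated method) reproduces the 83.0%-95.2% interval for the 89.1% (90/101) detection rate.

```python
# Quick normal-approximation check of the quoted 95% CI for 90/101 detections.
import math

k, n = 90, 101
p = k / n
se = math.sqrt(p * (1 - p) / n)        # standard error of the proportion
lo, hi = p - 1.96 * se, p + 1.96 * se
print(f"{p:.1%} (95% CI {lo:.1%}-{hi:.1%})")   # ~89.1% (83.0%-95.2%)
```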
Database of amino acid-nucleotide contacts in DNA-homeodomain protein
NASA Astrophysics Data System (ADS)
Grokhlina, T. I.; Zrelov, P. V.; Ivanov, V. V.; Polozov, R. V.; Chirgadze, Yu. N.; Sivozhelezov, V. S.
2013-09-01
The analysis of amino acid-nucleotide contacts at the interfaces of protein-DNA complexes, intended to find consistencies in protein-DNA recognition, is a complex problem that requires an analysis of the physicochemical characteristics of these contacts, the positions of the participating amino acids and nucleotides in the protein and DNA chains, respectively, and the conservation of these contacts. These heterogeneous data therefore need to be systematized. For this purpose we have developed a database of amino acid-nucleotide contacts, ANTPC (Amino acid Nucleotide Type Position Conservation), using the homeodomain protein family as an archetypal example. We show that it can be used to compare and classify the interfaces of protein-DNA complexes.
Performance assessment of EMR systems based on post-relational database.
Yu, Hai-Yan; Li, Jing-Song; Zhang, Xiao-Guang; Tian, Yu; Suzuki, Muneou; Araki, Kenji
2012-08-01
Post-relational databases provide high performance and are currently widely used in American hospitals. As few hospital information systems (HIS) in either China or Japan are based on post-relational databases, here we introduce a new-generation electronic medical records (EMR) system called Hygeia, which was developed with the post-relational database Caché and the latest platform Ensemble. Utilizing the benefits of a post-relational database, Hygeia is equipped with an "integration" feature that allows all system users to access data, with a fast response time, anywhere and at any time. Performance tests of databases in EMR systems were implemented in both China and Japan. First, a comparison test was conducted between a post-relational database, Caché, and a relational database, Oracle, embedded in the EMR systems of a medium-sized first-class hospital in China. Second, a user terminal test was done on the EMR system Izanami, which is based on the same database, Caché, and operates efficiently at the Miyazaki University Hospital in Japan. The results showed that the post-relational database Caché works faster than the relational database Oracle and performed excellently in the real-time EMR system.
Energy-aware Thread and Data Management in Heterogeneous Multi-core, Multi-memory Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Su, Chun-Yi
By 2004, microprocessor design focused on multicore scaling—increasing the number of cores per die in each generation—as the primary strategy for improving performance. These multicore processors typically equip multiple memory subsystems to improve data throughput. In addition, these systems employ heterogeneous processors such as GPUs and heterogeneous memories like non-volatile memory to improve performance, capacity, and energy efficiency. With the increasing volume of hardware resources and system complexity caused by heterogeneity, future systems will require intelligent ways to manage hardware resources. Early research to improve performance and energy efficiency on heterogeneous, multi-core, multi-memory systems focused on tuning a single primitive or at best a few primitives in the systems. The key limitation of past efforts is their lack of a holistic approach to resource management that balances the tradeoff between performance and energy consumption. In addition, the shift from simple, homogeneous systems to these heterogeneous, multicore, multi-memory systems requires in-depth understanding of efficient resource management for scalable execution, including new models that capture the interchange between performance and energy, smarter resource management strategies, and novel low-level performance/energy tuning primitives and runtime systems. Tuning an application to control available resources efficiently has become a daunting challenge; managing resources in automation is still a dark art since the tradeoffs among programming, energy, and performance remain insufficiently understood. In this dissertation, I have developed theories, models, and resource management techniques to enable energy-efficient execution of parallel applications through thread and data management in these heterogeneous multi-core, multi-memory systems. I study the effect of dynamic concurrent throttling on the performance and energy of multi-core, non-uniform memory access (NUMA) systems. I use critical path analysis to quantify memory contention in the NUMA memory system and determine thread mappings. In addition, I implement a runtime system that combines concurrent throttling and a novel thread mapping algorithm to manage thread resources and improve energy efficient execution in multi-core, NUMA systems.
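As a rough illustration of the thread-mapping primitive underlying the resource-management strategies described above, the sketch below pins worker processes to chosen CPU sets on Linux. The core IDs, the mapping policy, and the workload are arbitrary examples (and assume at least four logical CPUs); this is not the dissertation's runtime system.

```python
# Illustrative sketch: pinning workers to specific cores is one low-level
# primitive behind thread-mapping strategies on NUMA systems (Linux-only API).
import os
import multiprocessing as mp

def worker(core_ids, n):
    os.sched_setaffinity(0, core_ids)      # restrict this process to the given cores
    return sum(i * i for i in range(n))    # stand-in for a memory-bound kernel

if __name__ == "__main__":
    # A toy "thread mapping": spread two workers across two example core sets.
    mappings = [{0, 1}, {2, 3}]
    with mp.Pool(len(mappings)) as pool:
        results = [pool.apply_async(worker, (cores, 10**6)) for cores in mappings]
        print([r.get() for r in results])
```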
Li, Jin-Quan; Guo, Wen; Sun, Ze-Gan; Huang, Qing-Song; Lee, En Yeong; Wang, Ying; Yao, Xiao-Dong
2017-08-01
Cupping therapy is widely used in East Asia, the Middle East, and Central and Northern Europe to manage the symptoms of knee osteoarthritis (KOA). The purpose of this systematic review was to evaluate the available evidence from randomized controlled trials (RCTs) of cupping therapy for treating patients with KOA. The following databases were searched from their inception until January 2017: PubMed, Embase, the Cochrane Central Register of Controlled Trials and four Chinese databases [WanFang Med Database, Chinese BioMedical Database, Chinese WeiPu Database, and China National Knowledge Infrastructure (CNKI)]. Only RCTs related to the effects of cupping therapy on KOA were included in this systematic review. A quantitative synthesis of the RCTs was conducted using RevMan 5.3 software. Study selection, data extraction, and validation were performed independently by two reviewers. Cochrane criteria for risk-of-bias were used to assess the methodological quality of the trials. Seven RCTs met the inclusion criteria, and most were of low methodological quality. Study participants in the dry cupping therapy plus Western medicine therapy group showed significantly greater improvements in the pain [MD = -1.01, 95%CI (-1.61, -0.41), p < 0.01], stiffness [MD = -0.81, 95%CI (-1.14, -0.48), p < 0.01] and physical function [MD = -5.53, 95%CI (-8.58, -2.47), p < 0.01] domains of the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) compared to participants in the Western medicine therapy group, with low heterogeneity (Chi² = 0.00, p = 1.00, I² = 0% for pain; Chi² = 0.45, p = 0.50, I² = 0% for stiffness; Chi² = 1.09, p = 0.30, I² = 9% for physical function). However, it failed to do so on a Visual Analog Scale (VAS) [MD = -0.32, 95%CI (-0.70, 0.05), p = 0.09]. In addition, when compared with Western medicine therapy alone, meta-analysis of four RCTs suggested favorable, statistically significant effects of wet cupping therapy plus Western medicine on response rate [MD = 1.06, 95%CI (1.01, 1.12), p = 0.03; heterogeneity: Chi² = 1.13, p = 0.77, I² = 0%] and the Lequesne Algofunctional Index (LAI) [MD = -2.74, 95%CI (-3.41, -2.07), p < 0.01; heterogeneity: Chi² = 2.03, p = 0.57, I² = 0%]. Only weak evidence supports the hypothesis that cupping therapy can effectively improve treatment efficacy and physical function in patients with KOA. Copyright © 2017. Published by Elsevier Ltd.
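For readers unfamiliar with the pooled statistics reported above, the following sketch shows a generic inverse-variance (fixed-effect) pooling of mean differences together with Cochran's Q and I². The input numbers are invented and do not come from the included trials, and RevMan's actual computations (including its random-effects model) are more involved.

```python
# Generic inverse-variance pooling of mean differences, with Q and I²,
# using made-up study data purely for illustration.
import math

studies = [(-1.0, 0.45), (-1.1, 0.62)]   # (mean difference, standard error) per study
weights = [1.0 / se**2 for _, se in studies]
pooled = sum(w * md for (md, _), w in zip(studies, weights)) / sum(weights)
se_pooled = math.sqrt(1.0 / sum(weights))
ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)

q = sum(w * (md - pooled) ** 2 for (md, _), w in zip(studies, weights))
df = len(studies) - 1
i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

print(f"Pooled MD = {pooled:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f}), I² = {i_squared:.0f}%")
```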
Ren, Qing; Yu, Xinyu; Liao, Fujiu; Chen, Xiaofan; Yan, Dongmei; Nie, Heyun; Fang, Jinju; Yang, Meng; Zhou, Xu
2018-05-01
In East Asia, Gua Sha therapy is widely used in patients with perimenopausal syndrome. The goal of this systematic review was to evaluate the available evidence from randomized controlled trials (RCTs) of Gua Sha therapy for the treatment of patients with perimenopausal syndrome. Databases searched from inception until June 2017 included: PubMed, Embase, the Cochrane Central Register of Controlled Trials and four Chinese databases [WanFang Med Database, Chinese BioMedical Database, Chinese WeiPu Database, and the China National Knowledge Infrastructure (CNKI)]. Only the RCTs related to the effects of Gua Sha therapy on perimenopausal syndrome were included in this systematic review. A quantitative analysis of RCTs was employed using RevMan 5.3 software. Study selection, data extraction, and validation were performed by two independent reviewers. Cochrane criteria for risk-of-bias were used to evaluate the methodological quality of the trials. A total of 6 RCTs met the inclusion criteria, and most were of low methodological quality. When compared with Western medicine therapy alone, meta-analysis of 5 RCTs indicated favorable statistically significant effects of Gua Sha therapy plus Western medicine on the Kupperman Menopausal Index (KMI) Score [mean difference (MD) = -4.57, 95% confidence interval (CI) (-5.37, -3.77), p < 0.01; heterogeneity: Chi² = 29.57, p < 0.01, I² = 86%]. Moreover, study participants who received Gua Sha therapy plus Western medicine therapy showed significantly greater improvements in serum levels of follicle-stimulating hormone (FSH) [MD = -5.00, 95% CI (-9.60, -0.40), p = 0.03], luteinizing hormone (LH) [MD = -4.00, 95% CI (-7.67, -0.33), p = 0.03], and E2 [MD = -6.60, 95% CI (-12.32, -0.88), p = 0.02] compared to participants in the Western medicine therapy group, with low heterogeneity (Chi² = 0.12, p = 0.94, I² = 0% for FSH; Chi² = 0.19, p = 0.91, I² = 0% for LH; Chi² = 0.93, p = 0.63, I² = 0% for E2). In addition, the pooled results displayed favorable significant effects of Gua Sha therapy plus Western medicine therapy on the MENQOL scale when compared with Western medicine therapy alone [MD = -5.13, 95% CI (-7.45, -2.81), p < 0.01] with low heterogeneity (Chi² = 0.66, p = 0.42, I² = 0%). Preliminary evidence supported the hypothesis that Gua Sha therapy effectively improved the treatment efficacy in patients with perimenopausal syndrome. Additional studies will be required to elucidate the optimal frequency and dosage of Gua Sha. Copyright © 2018 Elsevier Ltd. All rights reserved.
Alving, Berit Elisabeth; Christensen, Janne Buck; Thrysøe, Lars
2018-03-01
The purpose of this literature review is to provide an overview of the information retrieval behaviour of clinical nurses, in terms of the use of databases and other information resources and their frequency of use. Systematic searches carried out in five databases and handsearching were used to identify studies from 2010 to 2016, with a populations, exposures and outcomes (PEO) search strategy, focusing on the question: In which databases or other information resources do hospital nurses search for evidence-based information, and how often? Of 5272 titles retrieved based on the search strategy, only nine studies fulfilled the criteria for inclusion. The studies are from the United States, Canada, Taiwan and Nigeria. The results show that hospital nurses' primary choices of source for evidence-based information are Google and peers, while bibliographic databases such as PubMed are secondary choices. Data on frequency are only included in four of the studies, and the data are heterogeneous. The reasons for choosing Google and peers are primarily lack of time, lack of information, lack of retrieval skills, or lack of training in database searching. Only a few studies have been published on clinical nurses' retrieval behaviours, and more studies are needed from Europe and Australia. © 2018 Health Libraries Group.
A Visual Interface for Querying Heterogeneous Phylogenetic Databases.
Jamil, Hasan M
2017-01-01
Despite the recent growth in the number of phylogenetic databases, access to this wealth of resources remains largely driven by tool- or form-based interfaces. It is our thesis that the flexibility afforded by declarative query languages offers the opportunity to access these repositories in a better way and to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way and is used as the storage model for our PhyloBase database, for which PhyQL serves as the query language. We have implemented a visual interface that lets end users pose PhyQL queries using visual icons and drag-and-drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in the PhyQL buffer allow secondary querying over the computed results, making it a truly powerful querying architecture.
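The sketch below is not PhyQL; it only illustrates how a phylogeny stored in a canonical parent/child edge relation can answer a recursive, Datalog-style "ancestor-of" query of the kind the visual interface translates into. Taxon names and the edge set are invented.

```python
# Toy canonical edge representation of a phylogeny and a recursive ancestor query.
edges = {("root", "cladeA"), ("root", "cladeB"),
         ("cladeA", "speciesX"), ("cladeA", "speciesY"), ("cladeB", "speciesZ")}

def ancestors(taxon):
    """Return all ancestors of `taxon` by walking the edge relation upward."""
    found = set()
    frontier = {taxon}
    while frontier:
        parents = {p for (p, c) in edges if c in frontier and p not in found}
        found |= parents
        frontier = parents
    return found

print(ancestors("speciesX"))   # {'cladeA', 'root'}
```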
Measuring allostatic load in the workforce: a systematic review
MAUSS, Daniel; LI, Jian; SCHMIDT, Burkhard; ANGERER, Peter; JARCZOK, Marc N.
2014-01-01
The Allostatic Load Index (ALI) has been used to establish associations between stress and health-related outcomes. This review summarizes the measurement and methodological challenges of allostatic load in occupational settings. Databases of Medline, PubPsych, and Cochrane were searched to systematically explore studies measuring ALI in working adults following the PRISMA statement. Study characteristics, biomarkers and methods were tabulated. Methodological quality was evaluated using a standardized checklist. Sixteen articles (2003–2013) met the inclusion criteria, with a total of 39 (range 6–17) different variables used to calculate ALI. Substantial heterogeneity was observed in the number and type of biomarkers used, the analytic techniques applied and study quality. Particularly, primary mediators were not regularly included in ALI calculation. Consensus on methods to measure ALI in working populations is limited. Research should include longitudinal studies using multi-systemic variables to measure employees at risk for biological wear and tear. PMID:25224337
Processing of the WLCG monitoring data using NoSQL
NASA Astrophysics Data System (ADS)
Andreeva, J.; Beche, A.; Belov, S.; Dzhunov, I.; Kadochnikov, I.; Karavakis, E.; Saiz, P.; Schovancova, J.; Tuckett, D.
2014-06-01
The Worldwide LHC Computing Grid (WLCG) today includes more than 150 computing centres where more than 2 million jobs are being executed daily and petabytes of data are transferred between sites. Monitoring the computing activities of the LHC experiments, over such a huge heterogeneous infrastructure, is extremely demanding in terms of computation, performance and reliability. Furthermore, the generated monitoring flow is constantly increasing, which represents another challenge for the monitoring systems. While existing solutions are traditionally based on Oracle for data storage and processing, recent developments evaluate NoSQL for processing large-scale monitoring datasets. NoSQL databases are getting increasingly popular for processing datasets at the terabyte and petabyte scale using commodity hardware. In this contribution, the integration of NoSQL data processing in the Experiment Dashboard framework is described along with first experiences of using this technology for monitoring the LHC computing activities.
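As a generic illustration of processing job-monitoring records in a NoSQL store, the sketch below groups documents by site and status using MongoDB's aggregation pipeline via pymongo. MongoDB is used here purely as a stand-in, and the collection and field names are invented; the contribution above does not commit to this particular engine or schema.

```python
# Generic NoSQL aggregation sketch (assumes a local MongoDB test instance).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
jobs = client["wlcg_demo"]["jobs"]                 # invented database/collection names
jobs.insert_many([
    {"site": "CERN-PROD", "status": "done", "wall_s": 3600},
    {"site": "CERN-PROD", "status": "failed", "wall_s": 120},
    {"site": "FNAL", "status": "done", "wall_s": 5400},
])
pipeline = [{"$group": {"_id": {"site": "$site", "status": "$status"},
                        "jobs": {"$sum": 1}, "wall_s": {"$sum": "$wall_s"}}}]
for row in jobs.aggregate(pipeline):
    print(row)
```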
Predicting adverse hemodynamic events in critically ill patients.
Yoon, Joo H; Pinsky, Michael R
2018-06-01
The art of predicting future hemodynamic instability in the critically ill has rapidly become a science with the advent of advanced analytical processes based on computer-driven machine learning techniques. How these methods have progressed beyond severity scoring systems to interface with decision support is summarized here. Data mining of large multidimensional clinical time-series databases using a variety of machine learning tools has led to our ability to identify alert artifacts and filter them from bedside alarms, display real-time risk stratification at the bedside to aid clinical decision-making, and predict the subsequent development of cardiorespiratory insufficiency hours before these events occur. This fast-evolving field is primarily limited by the linkage of high-quality granular data to physiologic rationale across heterogeneous clinical care domains. Using advanced analytic tools to glean knowledge from clinical data streams is rapidly becoming a reality whose potential clinical impact is great.
Post-conviction DNA testing: the UK's first ‘exoneration’ case?
Johnson, Paul; Williams, Robin
2005-01-01
The routine incorporation of forensic DNA profiling into the criminal justice systems of the United Kingdom has been widely promoted as a device for improving the quality of investigative and prosecutorial processes. From its first uses in the 1980s, in cases of serious crime, to the now daily collection, analysis and comparison of genetic samples in the National DNA Database, DNA profiling has become a standard instrument of policing and a powerful evidential resource for prosecutors. However, the use of post-conviction DNA testing has, until recently, been uncommon in the United Kingdom. This paper explores the first case, in England, of the contribution of DNA profiling to a successful appeal against conviction by an imprisoned offender. Analysis of the details of this case is used to emphasise the ways in which novel forms of scientific evidence remain subject to traditional and heterogeneous tests of relevance and credibility. PMID:15112595
Development of expert systems for analyzing electronic documents
NASA Astrophysics Data System (ADS)
Abeer Yassin, Al-Azzawi; Shidlovskiy, S.; Jamal, A. A.
2018-05-01
The paper analyses a database management system (DBMS). Expert systems, databases, and database technology have become essential components of everyday life in modern society. As databases are widely used in every organization with a computer system, data resource control and data management are very important [1]. A DBMS, consisting of programs that enable users to create and maintain a database, is the most significant tool developed to serve multiple users in a database environment. This paper focuses on the development of a database management system for the General Directorate for Education of Diyala in Iraq (GDED) using CLIPS, Java NetBeans, and Alfresco, together with system components previously developed at Tomsk State University at the Faculty of Innovative Technology.
Occupational styrene exposure and acquired dyschromatopsia: A systematic review and meta-analysis.
Choi, Ariel R; Braun, Joseph M; Papandonatos, George D; Greenberg, Paul B
2017-11-01
Styrene is a chemical used in the manufacture of plastic-based products worldwide. We systematically reviewed eligible studies of occupational styrene-induced dyschromatopsia, qualitatively synthesizing their findings and estimating the exposure effect through meta-analysis. PubMed, EMBASE, and Web of Science databases were queried for eligible studies. Using a random effects model, we compared measures of dyschromatopsia between exposed and non-exposed workers to calculate the standardized mean difference (Hedges' g). We also assessed between-study heterogeneity and publication bias. Styrene-exposed subjects demonstrated poorer color vision than did the non-exposed (Hedges' g = 0.56; 95%CI: 0.37, 0.76; P < 0.0001). A non-significant Cochran's Q test result (Q = 23.2; P = 0.171) and an I² of 32.2% (0.0%, 69.9%) indicated low-to-moderate between-study heterogeneity. Funnel plot and trim-and-fill analyses suggested publication bias. This review confirms the hypothesis of occupational styrene-induced dyschromatopsia, suggesting a modest effect size with mild heterogeneity between studies. © 2017 Wiley Periodicals, Inc.
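The effect size reported above is Hedges' g, a standardized mean difference with a small-sample correction. The sketch below computes it for one hypothetical exposed versus non-exposed comparison; all numbers are invented for illustration.

```python
# Hedges' g for a single hypothetical study (invented means, SDs, and group sizes).
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))  # pooled SD
    d = (m1 - m2) / sp                       # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)          # small-sample correction factor
    return j * d

print(round(hedges_g(1.35, 0.30, 40, 1.20, 0.28, 45), 2))  # exposed vs. non-exposed
```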
Adaptive Control of Synchronization in Delay-Coupled Heterogeneous Networks of FitzHugh-Nagumo Nodes
NASA Astrophysics Data System (ADS)
Plotnikov, S. A.; Lehnert, J.; Fradkov, A. L.; Schöll, E.
We study synchronization in delay-coupled neural networks of heterogeneous nodes. It is well known that heterogeneities in the nodes hinder synchronization when becoming too large. We show that an adaptive tuning of the overall coupling strength can be used to counteract the effect of the heterogeneity. Our adaptive controller is demonstrated on ring networks of FitzHugh-Nagumo systems which are paradigmatic for excitable dynamics but can also — depending on the system parameters — exhibit self-sustained periodic firing. We show that the adaptively tuned time-delayed coupling enables synchronization even if parameter heterogeneities are so large that excitable nodes coexist with oscillatory ones.
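A toy numerical sketch of the idea follows, with simplifications that differ from the paper's setup: there is no coupling delay, the adaptation law is an ad hoc rule that grows the coupling while the nodes remain desynchronized, and all parameter values are invented.

```python
# Toy ring of heterogeneous FitzHugh-Nagumo nodes with a crudely adapted
# coupling strength K (illustrative only; not the paper's delayed, controlled setup).
import numpy as np

rng = np.random.default_rng(0)
n, dt, steps = 20, 0.01, 50000
eps, a = 0.08, 0.7
b = 0.8 + 0.1 * rng.standard_normal(n)       # node heterogeneity in one parameter
u, v, K = rng.standard_normal(n), rng.standard_normal(n), 0.0

for _ in range(steps):
    ring = np.roll(u, 1) + np.roll(u, -1) - 2 * u          # nearest-neighbour ring coupling
    du = u - u**3 / 3 - v + 0.5 + K * ring
    dv = eps * (u + a - b * v)
    u, v = u + dt * du, v + dt * dv
    err = np.mean((u - u.mean())**2)                        # synchronization error
    K = max(0.0, K + dt * 0.5 * err)                        # grow K while nodes disagree

print(f"final coupling K = {K:.3f}, residual error = {err:.4f}")
```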
Environment/Health/Safety (EHS): Databases
Hazard Documents Database; Biosafety Authorization System; CATS (Corrective Action Tracking System) (for findings 12/2005 to present); Chemical Management System; Electrical Safety; Ergonomics Database (for new ...); ... Learned / Best Practices; REMS - Radiation Exposure Monitoring System; SJHA Database - Subcontractor Job ...
Statistically Validated Networks in Bipartite Complex Systems
Tumminello, Michele; Miccichè, Salvatore; Lillo, Fabrizio; Piilo, Jyrki; Mantegna, Rosario N.
2011-01-01
Many complex systems present an intrinsic bipartite structure where elements of one set link to elements of the second set. In these complex systems, such as the system of actors and movies, elements of one set are qualitatively different than elements of the other set. The properties of these complex systems are typically investigated by constructing and analyzing a projected network on one of the two sets (for example the actor network or the movie network). Complex systems are often very heterogeneous in the number of relationships that the elements of one set establish with the elements of the other set, and this heterogeneity makes it very difficult to discriminate links of the projected network that are just reflecting system's heterogeneity from links relevant to unveil the properties of the system. Here we introduce an unsupervised method to statistically validate each link of a projected network against a null hypothesis that takes into account system heterogeneity. We apply the method to a biological, an economic and a social complex system. The method we propose is able to detect network structures which are very informative about the organization and specialization of the investigated systems, and identifies those relationships between elements of the projected network that cannot be explained simply by system heterogeneity. We also show that our method applies to bipartite systems in which different relationships might have different qualitative nature, generating statistically validated networks in which such difference is preserved. PMID:21483858
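A minimal sketch of the link-validation step described above: the co-occurrence count of two elements of the same set is tested against a hypergeometric null model that accounts for their degrees, and the resulting p-value would then be compared with a multiple-test-corrected threshold. All numbers below are invented.

```python
# Hypergeometric test of one projected-network link against the degree-aware null model.
from scipy.stats import hypergeom

N = 500            # total number of elements in the other set (e.g., movies); example value
d_i, d_j = 40, 60  # degrees of elements i and j (e.g., movies per actor); example values
x = 12             # observed number of co-occurrences

# P(co-occurrences >= x) when the two degree sequences are matched at random
p_value = hypergeom.sf(x - 1, N, d_i, d_j)
print(f"p = {p_value:.3g}")   # compare against a multiple-test-corrected threshold
```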
NASA Astrophysics Data System (ADS)
Boden, T. A.; Krassovski, M.; Yang, B.
2013-06-01
The Carbon Dioxide Information Analysis Center (CDIAC) at Oak Ridge National Laboratory (ORNL), USA has provided scientific data management support for the US Department of Energy and international climate change science since 1982. Among the many data archived and available from CDIAC are collections from long-term measurement projects. One current example is the AmeriFlux measurement network. AmeriFlux provides continuous measurements from forests, grasslands, wetlands, and croplands in North, Central, and South America and offers important insight about carbon cycling in terrestrial ecosystems. To successfully manage AmeriFlux data and support climate change research, CDIAC has designed flexible data systems using proven technologies and standards blended with new, evolving technologies and standards. The AmeriFlux data system, comprised primarily of a relational database, a PHP-based data interface and a FTP server, offers a broad suite of AmeriFlux data. The data interface allows users to query the AmeriFlux collection in a variety of ways and then subset, visualize and download the data. From the perspective of data stewardship, on the other hand, this system is designed for CDIAC to easily control database content, automate data movement, track data provenance, manage metadata content, and handle frequent additions and corrections. CDIAC and researchers in the flux community developed data submission guidelines to enhance the AmeriFlux data collection, enable automated data processing, and promote standardization across regional networks. Both continuous flux and meteorological data and irregular biological data collected at AmeriFlux sites are carefully scrutinized by CDIAC using established quality-control algorithms before the data are ingested into the AmeriFlux data system. Other tasks at CDIAC include reformatting and standardizing the diverse and heterogeneous datasets received from individual sites into a uniform and consistent network database, generating high-level derived products to meet the current demands from a broad user group, and developing new products in anticipation of future needs. In this paper, we share our approaches to meet the challenges of standardizing, archiving and delivering quality, well-documented AmeriFlux data worldwide to benefit others with similar challenges of handling diverse climate change data, to further heighten awareness and use of an outstanding ecological data resource, and to highlight expanded software engineering applications being used for climate change measurement data.
NASA Astrophysics Data System (ADS)
Boden, T. A.; Krassovski, M.; Yang, B.
2013-02-01
The Carbon Dioxide Information Analysis Center (CDIAC) at Oak Ridge National Laboratory (ORNL), USA has provided scientific data management support for the US Department of Energy and international climate change science since 1982. Among the many data archived and available from CDIAC are collections from long-term measurement projects. One current example is the AmeriFlux measurement network. AmeriFlux provides continuous measurements from forests, grasslands, wetlands, and croplands in North, Central, and South America and offers important insight about carbon cycling in terrestrial ecosystems. To successfully manage AmeriFlux data and support climate change research, CDIAC has designed flexible data systems using proven technologies and standards blended with new, evolving technologies and standards. The AmeriFlux data system, comprised primarily of a relational database, a PHP based data-interface and a FTP server, offers a broad suite of AmeriFlux data. The data interface allows users to query the AmeriFlux collection in a variety of ways and then subset, visualize and download the data. From the perspective of data stewardship, on the other hand, this system is designed for CDIAC to easily control database content, automate data movement, track data provenance, manage metadata content, and handle frequent additions and corrections. CDIAC and researchers in the flux community developed data submission guidelines to enhance the AmeriFlux data collection, enable automated data processing, and promote standardization across regional networks. Both continuous flux and meteorological data and irregular biological data collected at AmeriFlux sites are carefully scrutinized by CDIAC using established quality-control algorithms before the data are ingested into the AmeriFlux data system. Other tasks at CDIAC include reformatting and standardizing the diverse and heterogeneous datasets received from individual sites into a uniform and consistent network database, generating high-level derived products to meet the current demands from a broad user group, and developing new products in anticipation of future needs. In this paper, we share our approaches to meet the challenges of standardizing, archiving and delivering quality, well-documented AmeriFlux data worldwide to benefit others with similar challenges of handling diverse climate change data, to further heighten awareness and use of an outstanding ecological data resource, and to highlight expanded software engineering applications being used for climate change measurement data.
Wang, Tingting; Liu, Yuan; Li, Zhanzhan; Liu, Kaihua; Xu, Yang; Shi, Wenpei; Chen, Lizhang
2017-01-01
Background Intimate partner violence (IPV) is the most common form of violence against women worldwide. IPV during pregnancy is an important risk factor for adverse health outcomes for women and their offspring. However, the prevalence of IPV during pregnancy is not well understood in China. The objective of this study was to estimate the pooled prevalence of IPV during pregnancy in China using a systematic review and meta-analysis. Methods Systematic literature searches were conducted in PubMed, Web of Science, CNKI, Wanfang, Weipu and CBM databases to identify relevant articles published from the inception of each database to January 31, 2016 that reported data on the prevalence of IPV during pregnancy in China. The Risk of Bias Tool for prevalence studies was used to assess the risk of bias in individual studies. Owing to significant between-study heterogeneity, a random-effects model was used to calculate the pooled prevalence and corresponding 95% confidence interval, and then univariate meta-regression analyses were performed to investigate the sources of heterogeneity. Subgroup analysis was conducted to explore the risk factors associated with IPV during pregnancy. Results Thirteen studies with a total of 30,665 individuals were included in this study. The overall pooled prevalence of IPV during pregnancy was 7.7% (95% CI: 5.6–10.1%) with significant heterogeneity (I2 = 97.8%, p < 0.001). The results of the univariate meta-regression analyses showed that only the variable “sample source” explained part of the heterogeneity in this study (p < 0.05). The characteristics “number of children” and “unplanned pregnancy” were determined as risk factors for experiencing violence during pregnancy. Conclusions The prevalence of IPV during pregnancy in China is considerable and one of the highest reported in Asia, which suggests that issues of violence against women during pregnancy should be included in efforts to improve the health of pregnant women and their offspring. In addition, a nationwide epidemiological study is needed to confirm the prevalence estimates and identify more risk factors for IPV during pregnancy. PMID:28968397
Wang, Tingting; Liu, Yuan; Li, Zhanzhan; Liu, Kaihua; Xu, Yang; Shi, Wenpei; Chen, Lizhang
2017-01-01
Intimate partner violence (IPV) is the most common form of violence against women worldwide. IPV during pregnancy is an important risk factor for adverse health outcomes for women and their offspring. However, the prevalence of IPV during pregnancy is not well understood in China. The objective of this study was to estimate the pooled prevalence of IPV during pregnancy in China using a systematic review and meta-analysis. Systematic literature searches were conducted in PubMed, Web of Science, CNKI, Wanfang, Weipu and CBM databases to identify relevant articles published from the inception of each database to January 31, 2016 that reported data on the prevalence of IPV during pregnancy in China. The Risk of Bias Tool for prevalence studies was used to assess the risk of bias in individual studies. Owing to significant between-study heterogeneity, a random-effects model was used to calculate the pooled prevalence and corresponding 95% confidence interval, and then univariate meta-regression analyses were performed to investigate the sources of heterogeneity. Subgroup analysis was conducted to explore the risk factors associated with IPV during pregnancy. Thirteen studies with a total of 30,665 individuals were included in this study. The overall pooled prevalence of IPV during pregnancy was 7.7% (95% CI: 5.6-10.1%) with significant heterogeneity (I2 = 97.8%, p < 0.001). The results of the univariate meta-regression analyses showed that only the variable "sample source" explained part of the heterogeneity in this study (p < 0.05). The characteristics "number of children" and "unplanned pregnancy" were determined as risk factors for experiencing violence during pregnancy. The prevalence of IPV during pregnancy in China is considerable and one of the highest reported in Asia, which suggests that issues of violence against women during pregnancy should be included in efforts to improve the health of pregnant women and their offspring. In addition, a nationwide epidemiological study is needed to confirm the prevalence estimates and identify more risk factors for IPV during pregnancy.
de Oliveira Azevedo, Christianne Terra; do Brasil, Pedro Emmanuel A A; Guida, Letícia; Lopes Moreira, Maria Elizabeth
2016-01-01
Congenital infection caused by Toxoplasma gondii can cause serious damage that can be diagnosed in utero or at birth, although most infants are asymptomatic at birth. Prenatal diagnosis of congenital toxoplasmosis considerably improves the prognosis and outcome for infected infants. For this reason, an assay for the quick, sensitive, and safe diagnosis of fetal toxoplasmosis is desirable. To systematically review the performance of polymerase chain reaction (PCR) analysis of the amniotic fluid of pregnant women with recent serological toxoplasmosis diagnoses for the diagnosis of fetal toxoplasmosis. A systematic literature review was conducted via a search of electronic databases; the literature included primary studies of the diagnostic accuracy of PCR analysis of amniotic fluid from pregnant women who seroconverted during pregnancy. The PCR test was compared to a gold standard for diagnosis. A total of 1,269 summaries were obtained from the electronic database and reviewed, and 20 studies, comprising 4,171 samples, met the established inclusion criteria and were included in the review. The following results were obtained: studies about PCR assays for fetal toxoplasmosis are generally susceptible to bias; reports of the tests' use lack critical information; the protocols varied among studies; the heterogeneity among studies was concentrated in the tests' sensitivity; there was evidence that the sensitivity of the tests increases with time, as represented by the trimester; and there was more heterogeneity among studies in which there was more time between maternal diagnosis and fetal testing. The sensitivity of the method, if performed up to five weeks after maternal diagnosis, was 87% and specificity was 99%. The global sensitivity heterogeneity of the PCR test in this review was 66.5% (I²). The tests show low evidence of heterogeneity with a sensitivity of 87% and specificity of 99% when performed up to five weeks after maternal diagnosis. The test has a known performance and could be recommended for use up to five weeks after maternal diagnosis, when there is suspicion of fetal toxoplasmosis.
2016-01-01
Introduction Congenital infection caused by Toxoplasma gondii can cause serious damage that can be diagnosed in utero or at birth, although most infants are asymptomatic at birth. Prenatal diagnosis of congenital toxoplasmosis considerably improves the prognosis and outcome for infected infants. For this reason, an assay for the quick, sensitive, and safe diagnosis of fetal toxoplasmosis is desirable. Goal To systematically review the performance of polymerase chain reaction (PCR) analysis of the amniotic fluid of pregnant women with recent serological toxoplasmosis diagnoses for the diagnosis of fetal toxoplasmosis. Method A systematic literature review was conducted via a search of electronic databases; the literature included primary studies of the diagnostic accuracy of PCR analysis of amniotic fluid from pregnant women who seroconverted during pregnancy. The PCR test was compared to a gold standard for diagnosis. Results A total of 1,269 summaries were obtained from the electronic database and reviewed, and 20 studies, comprising 4,171 samples, met the established inclusion criteria and were included in the review. The following results were obtained: studies about PCR assays for fetal toxoplasmosis are generally susceptible to bias; reports of the tests' use lack critical information; the protocols varied among studies; the heterogeneity among studies was concentrated in the tests' sensitivity; there was evidence that the sensitivity of the tests increases with time, as represented by the trimester; and there was more heterogeneity among studies in which there was more time between maternal diagnosis and fetal testing. The sensitivity of the method, if performed up to five weeks after maternal diagnosis, was 87% and specificity was 99%. Conclusion The global sensitivity heterogeneity of the PCR test in this review was 66.5% (I²). The tests show low evidence of heterogeneity with a sensitivity of 87% and specificity of 99% when performed up to five weeks after maternal diagnosis. The test has a known performance and could be recommended for use up to five weeks after maternal diagnosis, when there is suspicion of fetal toxoplasmosis. PMID:27055272
Prevalence of hypertension among adolescents: systematic review and meta-analysis
Gonçalves, Vivian Siqueira Santos; Galvão, Taís Freire; de Andrade, Keitty Regina Cordeiro; Dutra, Eliane Said; Bertolin, Maria Natacha Toral; de Carvalho, Kenia Mara Baiocchi; Pereira, Mauricio Gomes
2016-01-01
ABSTRACT OBJECTIVE To estimate the prevalence of hypertension among adolescent Brazilian students. METHODS A systematic review of school-based cross-sectional studies was conducted. The articles were searched in the databases MEDLINE, Embase, Scopus, LILACS, SciELO, Web of Science, CAPES thesis database and Trip Database. In addition, we examined the lists of references of relevant studies to identify potentially eligible articles. No restrictions regarding publication date, language, or status applied. The studies were selected by two independent evaluators, who also extracted the data and assessed the methodological quality following eight criteria related to sampling, measuring blood pressure, and presenting results. The meta-analysis was calculated using a random effects model and analyses were performed to investigate heterogeneity. RESULTS We retrieved 1,577 articles from the search and included 22 in the review. The included articles corresponded to 14,115 adolescents, 51.2% (n = 7,230) female. We observed a variety of techniques, equipment, and references used. The prevalence of hypertension was 8.0% (95%CI 5.0–11.0; I2 = 97.6%), 9.3% (95%CI 5.6–13.6; I2 = 96.4%) in males and 6.5% (95%CI 4.2–9.1; I2 = 94.2%) in females. The meta-regression failed to identify the causes of the heterogeneity among studies. CONCLUSIONS Despite the differences found in the methodologies of the included studies, the results of this systematic review indicate that hypertension is prevalent in the Brazilian adolescent school population. For future investigations, we suggest the standardization of techniques, equipment, and references, aiming at improving the methodological quality of the studies. PMID:27253903
Fan, Kai-Xi; Xu, Zhong-Fa; Wang, Mei-Rong; Li, Dao-Tang; Yang, Xiang-Shan; Guo, Jing
2015-03-14
To compare the clinical outcomes between jejunal interposition reconstruction and Roux-en-Y anastomosis after total gastrectomy in patients with gastric cancer. A systematic literature search was conducted by two independent researchers on PubMed, EMBASE, the Cochrane Library, Google Scholar, and other English literature databases, as well as the Chinese Academic Journal, Chinese Biomedical Literature Database, and other Chinese literature databases using "Gastrectomy", "Roux-en-Y", and "Interposition" as keywords. Data extraction and verification were performed on the literature included in this study. RevMan 5.2 software was used for data processing. A fixed-effects model was applied in the absence of heterogeneity between studies. A random effects model was applied in the presence of heterogeneity between studies. Ten studies with a total of 762 gastric cancer patients who underwent total gastrectomy were included in this study. Among them, 357 received jejunal interposition reconstruction after total gastrectomy, and 405 received Roux-en-Y anastomosis. Compared with Roux-en-Y anastomosis, jejunal interposition reconstruction significantly decreased the incidence of dumping syndrome (OR = 0.18, 95%CI: 0.10-0.31; P < 0.001), increased the prognostic nutritional index [weighted mean difference (WMD) = 6.02, 95%CI: 1.82-10.22; P < 0.001], and improved the degree of postoperative weight loss [WMD = -2.47, 95%CI: -3.19 to -1.75; P < 0.001]. However, there was no statistically significant difference in operative time, hospital stay, or incidence of reflux esophagitis. Compared with Roux-en-Y anastomosis, patients who underwent jejunal interposition reconstruction after total gastrectomy had a lower risk of postoperative long-term complications and improved quality of life.
Fathima, Mariam; Peiris, David; Naik-Panvelkar, Pradnya; Saini, Bandana; Armour, Carol Lyn
2014-12-02
The use of computerized clinical decision support systems may improve the diagnosis and ongoing management of chronic diseases, which require recurrent visits to multiple health professionals, disease and medication monitoring, and modification of patient behavior. The aim of this review was to systematically review randomized controlled trials evaluating the effectiveness of computerized clinical decision support systems (CCDSS) in the care of people with asthma and COPD. Randomized controlled trials published between 2003 and 2013 were identified by searching multiple electronic databases: Medline, EMBASE, CINAHL, IPA, Informit, PsycINFO, Compendex, and the Cochrane Clinical Controlled Trials Register. To be included, RCTs had to evaluate the role of CCDSSs for asthma and/or COPD in primary care. Nineteen studies representing 16 RCTs met our inclusion criteria. The majority of the trials were conducted in patients with asthma. Study quality was generally high. Meta-analysis was not conducted because of methodological and clinical heterogeneity. The use of CCDSS improved asthma and COPD care in 14 of the 19 studies reviewed (74%). Nine of the nineteen studies showed statistically significant (p < 0.05) improvement in the primary outcomes measured. The majority of the studies evaluated health care process measures as their primary outcomes (10/19). Evidence supports the effectiveness of CCDSS in the care of people with asthma. However, there is very little information about its use in COPD care. Although there is considerable improvement in health care process measures and clinical outcomes through the use of CCDSSs, their effects on user workload and efficiency, safety, costs of care, and provider and patient satisfaction remain understudied.
The iMeteo is a web-based weather visualization tool
NASA Astrophysics Data System (ADS)
Tuni San-Martín, Max; San-Martín, Daniel; Cofiño, Antonio S.
2010-05-01
iMeteo is a web-based weather visualization tool. Designed with an extensible J2EE architecture, it is capable of displaying information from heterogeneous data sources such as gridded data from numerical models (in NetCDF format) or databases of local predictions. All this information is presented in a user-friendly way: the user can choose the specific tool with which to display the data (maps, graphs, information tables) and customize it for desired locations. *Modular Display System* Visualization of the data is achieved through a set of mini tools called widgets. A user can add them at will and arrange them around the screen easily with a drag-and-drop movement. They can be of various types, and each can be configured separately, forming a powerful and highly configurable system. The "Map" is the most complex widget, since it can show several variables simultaneously (either gridded or point-based) through a layered display. Other useful widgets are the "Histogram", which generates a graph with the frequency characteristics of a variable, and the "Timeline", which shows the time evolution of a variable at a given location in an interactive way. *Customization and security* Following the trends in web development, the user can easily customize the way data are displayed. Because the client side is programmed with technologies such as AJAX, interaction with the application is similar to that of desktop applications, with rapid response times. If a user is registered, he can also save his settings in the database, allowing access to his particular setup from any system with Internet access. Particular emphasis is placed on application security: the administrator can define a set of user profiles, which may have associated restrictions on access to certain data sources, geographic areas, or time intervals.
Integration of a neuroimaging processing pipeline into a pan-canadian computing grid
NASA Astrophysics Data System (ADS)
Lavoie-Courchesne, S.; Rioux, P.; Chouinard-Decorte, F.; Sherif, T.; Rousseau, M.-E.; Das, S.; Adalat, R.; Doyon, J.; Craddock, C.; Margulies, D.; Chu, C.; Lyttelton, O.; Evans, A. C.; Bellec, P.
2012-02-01
The ethos of the neuroimaging field is quickly moving towards the open sharing of resources, including both imaging databases and processing tools. As a neuroimaging database represents a large volume of datasets and as neuroimaging processing pipelines are composed of heterogeneous, computationally intensive tools, such open sharing raises specific computational challenges. This motivates the design of novel dedicated computing infrastructures. This paper describes an interface between PSOM, a code-oriented pipeline development framework, and CBRAIN, a web-oriented platform for grid computing. This interface was used to integrate a PSOM-compliant pipeline for preprocessing of structural and functional magnetic resonance imaging into CBRAIN. We further tested the capacity of our infrastructure to handle a real large-scale project. A neuroimaging database including close to 1000 subjects was preprocessed using our interface and publicly released to help the participants of the ADHD-200 international competition. This successful experiment demonstrated that our integrated grid-computing platform is a powerful solution for high-throughput pipeline analysis in the field of neuroimaging.
An Emerging Role for Polystores in Precision Medicine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Begoli, Edmon; Christian, J. Blair; Gadepally, Vijay
Medical data is organically heterogeneous, and it usually varies significantly in both size and composition. Yet, this data is also key to the recent and promising field of precision medicine, which focuses on identifying and tailoring appropriate medical treatments for the needs of individual patients, based on their specific conditions, medical history, lifestyle, genetics, and other individual factors. As we, and the database community at large, recognize that no "one size fits all" solution can work with such data, we present in this paper our observations based on our experiences and applications in the field of precision medicine. Finally, we make the case for the use of a polystore architecture and how it applies to precision medicine; we discuss the reference architecture, describe some of its critical components (such as the array database), and discuss the specific types of analysis that directly benefit from this database architecture and the ways in which it serves the data.
Distributed database kriging for adaptive sampling (D²KAS)
Roehm, Dominic; Pavel, Robert S.; Barros, Kipton; ...
2015-03-18
We present an adaptive sampling method supplemented by a distributed database and a prediction method for multiscale simulations using the Heterogeneous Multiscale Method. A finite-volume scheme integrates the macro-scale conservation laws for elastodynamics, which are closed by momentum and energy fluxes evaluated at the micro-scale. In the original approach, molecular dynamics (MD) simulations are launched for every macro-scale volume element. Our adaptive sampling scheme replaces a large fraction of costly micro-scale MD simulations with fast table lookup and prediction. The cloud database Redis provides the plain table lookup, and with locality-aware hashing we gather input data for our prediction scheme. For the latter we use kriging, which estimates an unknown value and its uncertainty (error) at a specific location in parameter space by using weighted averages of the neighboring points. We find that our adaptive scheme significantly improves simulation performance by a factor of 2.5 to 25, while retaining high accuracy for various choices of the algorithm parameters.
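A highly simplified sketch of the lookup-then-predict pattern described above follows, with an in-memory dictionary standing in for the Redis table and inverse-distance weighting standing in for kriging (which additionally provides an uncertainty estimate). The keys and values are invented.

```python
# Lookup-then-predict pattern: exact table hit first, otherwise estimate from neighbors.
import math

table = {(0.10, 0.20): 1.35, (0.12, 0.22): 1.41, (0.30, 0.05): 0.87}  # (params) -> flux

def estimate(query, neighbors=3):
    if query in table:                       # exact "database hit"
        return table[query]
    dists = sorted((math.dist(query, k), v) for k, v in table.items())[:neighbors]
    w = [1.0 / (d + 1e-12) for d, _ in dists]
    return sum(wi * v for wi, (_, v) in zip(w, dists)) / sum(w)

print(estimate((0.11, 0.21)))   # interpolated value near the first two table entries
```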
Hassani-Pak, Keywan; Rawlings, Christopher
2017-06-13
Genetics and "omics" studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.
Scientific Use Cases for the Virtual Atomic and Molecular Data Center
NASA Astrophysics Data System (ADS)
Dubernet, M. L.; Aboudarham, J.; Ba, Y. A.; Boiziot, M.; Bottinelli, S.; Caux, E.; Endres, C.; Glorian, J. M.; Henry, F.; Lamy, L.; Le Sidaner, P.; Møller, T.; Moreau, N.; Rénié, C.; Roueff, E.; Schilke, P.; Vastel, C.; Zwoelf, C. M.
2014-12-01
The VAMDC Consortium is a worldwide consortium that federates interoperable Atomic and Molecular databases through an e-science infrastructure. The contained data are of the highest scientific quality and are crucial for many applications: astrophysics, atmospheric physics, fusion, plasma and lighting technologies, health, etc. In this paper we present astrophysical scientific use cases in relation to the use of the VAMDC e-infrastructure. These cover very different applications, such as: (i) modeling the spectra of interstellar objects using the myXCLASS software tool implemented in the Common Astronomy Software Applications package (CASA) or using the CASSIS software tool, in its stand-alone version or implemented in the Herschel Interactive Processing Environment (HIPE); (ii) the use of Virtual Observatory tools accessing VAMDC databases; (iii) the access of VAMDC from the Paris solar BASS2000 portal; (iv) the combination of tools and databases from the APIS service (Auroral Planetary Imaging and Spectroscopy); (v) the combination of heterogeneous data for application to the interstellar medium from the SPECTCOL tool.
The data operation centre tool. Architecture and population strategies
NASA Astrophysics Data System (ADS)
Dal Pra, Stefano; Crescente, Alberto
2012-12-01
Keeping track of the layout of the computing resources in a large datacenter is a complex task. DOCET is a database-based web tool designed and implemented at INFN. It aims at providing a uniform interface to manage and retrieve the needed information about one or more datacenters, such as the available hardware, software and their status. A suitable application is of little use, however, until most of the information about the centre has been entered into DOCET's database. Manually inserting all the information from scratch is an unfeasible task. After describing DOCET's high-level architecture, its main features and current development track, we present and discuss the work done to populate the DOCET database for the INFN-T1 site by retrieving information from a heterogeneous variety of authoritative sources, such as DNS, DHCP, Quattor host profiles, etc. We then describe the work being done to integrate DOCET with some common management operations, such as adding a newly installed host to DHCP and DNS, or creating a suitable Quattor profile template for it.
A Relational Database System for Student Use.
ERIC Educational Resources Information Center
Fertuck, Len
1982-01-01
Describes an APL implementation of a relational database system suitable for use in a teaching environment in which database development and database administration are studied, and discusses the functions of the user and the database administrator. An appendix illustrating system operation and an eight-item reference list are attached. (Author/JL)
James Webb Space Telescope XML Database: From the Beginning to Today
NASA Technical Reports Server (NTRS)
Gal-Edd, Jonathan; Fatig, Curtis C.
2005-01-01
The James Webb Space Telescope (JWST) Project has been defining, developing, and exercising the use of a common eXtensible Markup Language (XML) for the command and telemetry (C&T) database structure. JWST is the first large NASA space mission to use XML for databases. The JWST project started developing the concepts for the C&T database in 2002. The database will need to last at least 20 years, since it will be used beginning with flight software development, continuing through Observatory integration and test (I&T) and through operations. Also, a database tool kit has been provided to the 18 flight software development laboratories located in the United States, Europe, and Canada, allowing the local users to create their own databases. Recently the JWST Project has been working with the Jet Propulsion Laboratory (JPL) and Object Management Group (OMG) XML Telemetry and Command Exchange (XTCE) personnel to provide all the information needed by JWST and JPL for exchanging database information using an XML standard structure. The lack of standardization requires custom ingest scripts for each ground system segment, increasing the cost of the total system. Providing a non-proprietary standard for the telemetry and command database definition format will allow dissimilar systems to communicate without the need for expensive mission-specific database tools and testing of the systems after the database translation. The various ground system components that would benefit from a standardized database are the telemetry and command systems, archives, simulators, and trending tools. JWST has successfully exchanged the XML database with the Eclipse, EPOCH, and ASIST ground systems, the Portable Spacecraft Simulator (PSS), a front-end system, and the Integrated Trending and Plotting System (ITPS). This paper will discuss how JWST decided to use XML, the barriers to a new concept, experiences utilizing the XML structure, exchanging databases with other users, and issues that have been experienced in creating databases for the C&T system.
ERIC Educational Resources Information Center
Dalrymple, Prudence W.; Roderer, Nancy K.
1994-01-01
Highlights the changes that have occurred from 1987-93 in database access systems. Topics addressed include types of databases, including CD-ROMs; end-user interfaces; database selection; database access management, including library instruction and use of primary literature; economic issues; database users; the search process; and improving…
An Introduction to Database Structure and Database Machines.
ERIC Educational Resources Information Center
Detweiler, Karen
1984-01-01
Enumerates principal management objectives of database management systems (data independence, quality, security, multiuser access, central control) and criteria for comparison (response time, size, flexibility, other features). Conventional database management systems, relational databases, and database machines used for backend processing are…
Federal Register 2010, 2011, 2012, 2013, 2014
2013-05-16
... Excluded Parties Listing System (EPLS) databases into the System for Award Management (SAM) database. DATES... combined the functional capabilities of the CCR, ORCA, and EPLS procurement systems into the SAM database... identification number and the type of organization from the System for Award Management database. 0 3. Revise the...
Heterogeneous recurrence monitoring and control of nonlinear stochastic processes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Hui, E-mail: huiyang@usf.edu; Chen, Yun
Recurrence is one of the most common phenomena in natural and engineering systems. Process monitoring of dynamic transitions in nonlinear and nonstationary systems is more concerned with aperiodic recurrences and recurrence variations. However, little has been done to investigate the heterogeneous recurrence variations and link with the objectives of process monitoring and anomaly detection. Notably, nonlinear recurrence methodologies are based on homogeneous recurrences, which treat all recurrence states in the same way as black dots, and non-recurrence is white in recurrence plots. Heterogeneous recurrences are more concerned about the variations of recurrence states in terms of state properties (e.g., values and relative locations) and the evolving dynamics (e.g., sequential state transitions). This paper presents a novel approach of heterogeneous recurrence analysis that utilizes a new fractal representation to delineate heterogeneous recurrence states in multiple scales, including the recurrences of both single states and multi-state sequences. Further, we developed a new set of heterogeneous recurrence quantifiers that are extracted from fractal representation in the transformed space. To that end, we integrated multivariate statistical control charts with heterogeneous recurrence analysis to simultaneously monitor two or more related quantifiers. Experimental results on nonlinear stochastic processes show that the proposed approach not only captures heterogeneous recurrence patterns in the fractal representation but also effectively monitors the changes in the dynamics of a complex system.
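To make the homogeneous-versus-heterogeneous distinction above concrete, the sketch below finds the recurrent pairs of a time series and, instead of recording them as uniform black dots, labels each one by the state-space region it revisits. The threshold and region coding are illustrative only; the paper's fractal representation and quantifiers are not reproduced.

```python
# Homogeneous vs. heterogeneous recurrence, in miniature. A homogeneous
# recurrence plot only records whether two states are close; here recurrent
# pairs are additionally labelled by the state-space region they revisit,
# the kind of distinction heterogeneous recurrence analysis exploits.
# The closeness threshold and region binning are illustrative choices.
import math

def recurrence_states(series, eps=0.15, n_bins=4):
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0
    region = [min(int((x - lo) / width), n_bins - 1) for x in series]
    pairs = []
    for i, xi in enumerate(series):
        for j, xj in enumerate(series):
            if i < j and abs(xi - xj) <= eps:
                # a homogeneous RP would store a plain "1"; keep the region label
                pairs.append((i, j, region[i]))
    return pairs

if __name__ == "__main__":
    series = [math.sin(0.4 * t) + 0.05 * ((-1) ** t) for t in range(60)]
    counts = {}
    for _, _, r in recurrence_states(series):
        counts[r] = counts.get(r, 0) + 1
    print("recurrences per state-space region:", counts)
```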
NASA Astrophysics Data System (ADS)
Beauducel, François; Bosson, Alexis; Randriamora, Frédéric; Anténor-Habazac, Christian; Lemarchand, Arnaud; Saurel, Jean-Marie; Nercessian, Alexandre; Bouin, Marie-Paule; de Chabalier, Jean-Bernard; Clouard, Valérie
2010-05-01
Seismological and Volcanological observatories have common needs and often common practical problems for multidisciplinary data monitoring applications. In fact, access to integrated data in real time and estimation of measurement uncertainties are key to an efficient interpretation, but instrument variety and heterogeneity of data sampling and acquisition systems lead to difficulties that may hinder crisis management. At the Guadeloupe observatory, we have developed in recent years an operational system that attempts to answer these questions in the context of a pluri-instrumental observatory. Based on a single computer server, open source scripts (Matlab, Perl, Bash, Nagios) and a Web interface, the system offers: an extended database for the management of networks, stations and sensors (maps, station files with log history, technical characteristics, meta-data, photos and associated documents); web-form interfaces for manual data input/editing and export (like geochemical analyses, some of the deformation measurements, ...); routine data processing with dedicated automatic scripts for each technique, production of validated data outputs, static graphs on preset moving time intervals, and possible e-mail alarms; and automatic status checks of computers, acquisition processes, stations and individual sensors, based on simple criteria (file updates and signal quality) and displayed as synthetic pages for technical control. In the special case of seismology, WebObs includes a digital stripchart multichannel continuous seismogram associated with the EarthWorm acquisition chain (see companion paper Part 1), an event classification database, location scripts, automatic shakemaps and a regional catalog with associated hypocenter maps accessed through a user request form. This system provides real-time Internet access for integrated monitoring, becomes a strong support for exchanges between scientists and technicians, and is widely open to interdisciplinary real-time modeling. It has been set up at the Martinique observatory and installation is planned this year at the Montserrat Volcanological Observatory. It is also in production at the geomagnetic observatory of Addis Ababa in Ethiopia.
Discriminating cellular heterogeneity using microwell-based RNA cytometry
Dimov, Ivan K.; Lu, Rong; Lee, Eric P.; Seita, Jun; Sahoo, Debashis; Park, Seung-min; Weissman, Irving L.; Lee, Luke P.
2014-01-01
Discriminating cellular heterogeneity is important for understanding cellular physiology. However, it is limited by the technical difficulties of single-cell measurements. Here, we develop a two-stage system to determine cellular heterogeneity. In the first stage, we perform multiplex single-cell RNA-cytometry in a microwell array containing over 60,000 reaction chambers. In the second stage, we use the RNA-cytometry data to determine cellular heterogeneity by providing a heterogeneity likelihood score. Moreover, we use Monte-Carlo simulation and RNA-cytometry data to calculate the minimum number of cells required for detecting heterogeneity. We applied this system to characterize the RNA distributions of aging related genes in a highly purified mouse hematopoietic stem cell population. We identified genes that reveal novel heterogeneity of these cells. We also show that changes in expression of genes such as Birc6 during aging can be attributed to the shift of relative portions of cells in the high-expressing subgroup versus low-expressing subgroup. PMID:24667995
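The Monte-Carlo step mentioned above, estimating how many cells must be assayed before a high-expressing subpopulation is reliably detected, can be sketched as follows; the two-component mixture, the detection rule and the power target are hypothetical stand-ins, not the paper's parameters.

```python
# Monte-Carlo estimate of the minimum number of cells needed to detect a
# high-expressing subpopulation. The mixture parameters and the detection
# rule (>= 3 cells above a threshold) are illustrative assumptions only.
import random

def detects_heterogeneity(n_cells, high_fraction=0.1, threshold=5.0,
                          low_mean=2.0, high_mean=8.0, sd=1.0, min_hits=3):
    hits = 0
    for _ in range(n_cells):
        mean = high_mean if random.random() < high_fraction else low_mean
        if random.gauss(mean, sd) > threshold:
            hits += 1
    return hits >= min_hits

def minimum_cells(power=0.95, trials=2000):
    # increase the sample size until the detection rule succeeds in >= 95% of trials
    for n in range(10, 2001, 10):
        successes = sum(detects_heterogeneity(n) for _ in range(trials))
        if successes / trials >= power:
            return n
    return None

if __name__ == "__main__":
    random.seed(1)
    print("cells needed for ~95% detection power:", minimum_cells())
```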
Contagion on complex networks with persuasion
NASA Astrophysics Data System (ADS)
Huang, Wei-Min; Zhang, Li-Jie; Xu, Xin-Jian; Fu, Xinchu
2016-03-01
The threshold model has been widely adopted as a classic model for studying contagion processes on social networks. We consider asymmetric individual interactions in social networks and introduce a persuasion mechanism into the threshold model. Specifically, we study a combination of adoption and persuasion in cascading processes on complex networks. It is found that with the introduction of the persuasion mechanism, the system may become more vulnerable to global cascades, and the effects of persuasion tend to be more significant in heterogeneous networks than those in homogeneous networks: a comparison between heterogeneous and homogeneous networks shows that under weak persuasion, heterogeneous networks tend to be more robust against random shocks than homogeneous networks; whereas under strong persuasion, homogeneous networks are more stable. Finally, we study the effects of adoption and persuasion threshold heterogeneity on systemic stability. Though both heterogeneities give rise to global cascades, the adoption heterogeneity has an overwhelmingly stronger impact than the persuasion heterogeneity when the network connectivity is sufficiently dense.
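A compact simulation of the adoption-plus-persuasion dynamics summarised above, run on a small random graph, is sketched below; the specific update rule (adopt when the active neighbour fraction exceeds a threshold, or when enough active neighbours individually persuade) is a plausible reading for illustration, not the authors' exact formulation.

```python
# Threshold cascade with a simplified persuasion mechanism on a random graph.
# An inactive node adopts when the active fraction of its neighbours exceeds
# its adoption threshold, or (persuasion) when the number of active
# neighbours reaches a persuasion threshold. Parameters are illustrative.
import random

def random_graph(n, p):
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if random.random() < p:
                nbrs[i].add(j); nbrs[j].add(i)
    return nbrs

def cascade(nbrs, adopt_thr=0.3, persuade_thr=2, seeds=3):
    active = set(random.sample(list(nbrs), seeds))
    changed = True
    while changed:
        changed = False
        for node in nbrs:
            if node in active or not nbrs[node]:
                continue
            active_nbrs = sum(1 for v in nbrs[node] if v in active)
            by_adoption = active_nbrs / len(nbrs[node]) >= adopt_thr
            by_persuasion = active_nbrs >= persuade_thr   # simplified persuasion
            if by_adoption or by_persuasion:
                active.add(node)
                changed = True
    return len(active) / len(nbrs)

if __name__ == "__main__":
    random.seed(0)
    print("final adopted fraction:", cascade(random_graph(200, 0.03)))
```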
Hendrickx, Diana M; Boyles, Rebecca R; Kleinjans, Jos C S; Dearry, Allen
2014-12-01
A joint US-EU workshop on enhancing data sharing and exchange in toxicogenomics was held at the National Institute of Environmental Health Sciences. Currently, efficient reuse of data is hampered by problems related to public data availability, data quality, database interoperability (the ability to exchange information), standardization and sustainability. At the workshop, experts from universities and research institutes presented databases, studies, organizations and tools that attempt to deal with these problems. Furthermore, a case study showing that combining toxicogenomics data from multiple resources leads to more accurate predictions in risk assessment was presented. All participants agreed that there is a need for a web portal describing the diverse, heterogeneous data resources relevant for toxicogenomics research. Furthermore, there was agreement that linking more data resources would improve toxicogenomics data analysis. To outline a roadmap for enhancing interoperability between data resources, the participants recommend collecting user stories from the toxicogenomics research community on the barriers in data sharing and exchange that currently hamper answering certain research questions. These user stories may guide the prioritization of steps to be taken to enhance the integration of toxicogenomics databases.
ETHNOS: A versatile electronic tool for the development and curation of national genetic databases
2010-01-01
National and ethnic mutation databases (NEMDBs) are emerging online repositories, recording extensive information about the described genetic heterogeneity of an ethnic group or population. These resources facilitate the provision of genetic services and provide a comprehensive list of genomic variations among different populations. As such, they enhance awareness of the various genetic disorders. Here, we describe the features of the ETHNOS software, a simple but versatile tool based on a flat-file database that is specifically designed for the development and curation of NEMDBs. ETHNOS is a freely available software which runs more than half of the NEMDBs currently available. Given the emerging need for NEMDB in genetic testing services and the fact that ETHNOS is the only off-the-shelf software available for NEMDB development and curation, its adoption in subsequent NEMDB development would contribute towards data content uniformity, unlike the diverse contents and quality of the available gene (locus)-specific databases. Finally, we allude to the potential applications of NEMDBs, not only as worldwide central allele frequency repositories, but also, and most importantly, as data warehouses of individual-level genomic data, hence allowing for a comprehensive ethnicity-specific documentation of genomic variation. PMID:20650823
ETHNOS : A versatile electronic tool for the development and curation of national genetic databases.
van Baal, Sjozef; Zlotogora, Joël; Lagoumintzis, George; Gkantouna, Vassiliki; Tzimas, Ioannis; Poulas, Konstantinos; Tsakalidis, Athanassios; Romeo, Giovanni; Patrinos, George P
2010-06-01
National and ethnic mutation databases (NEMDBs) are emerging online repositories, recording extensive information about the described genetic heterogeneity of an ethnic group or population. These resources facilitate the provision of genetic services and provide a comprehensive list of genomic variations among different populations. As such, they enhance awareness of the various genetic disorders. Here, we describe the features of the ETHNOS software, a simple but versatile tool based on a flat-file database that is specifically designed for the development and curation of NEMDBs. ETHNOS is a freely available software which runs more than half of the NEMDBs currently available. Given the emerging need for NEMDB in genetic testing services and the fact that ETHNOS is the only off-the-shelf software available for NEMDB development and curation, its adoption in subsequent NEMDB development would contribute towards data content uniformity, unlike the diverse contents and quality of the available gene (locus)-specific databases. Finally, we allude to the potential applications of NEMDBs, not only as worldwide central allele frequency repositories, but also, and most importantly, as data warehouses of individual-level genomic data, hence allowing for a comprehensive ethnicity-specific documentation of genomic variation.
Miyoshi, Newton Shydeo Brandão; Pinheiro, Daniel Guariz; Silva, Wilson Araújo; Felipe, Joaquim Cezar
2013-06-06
The use of the knowledge produced by the sciences to promote human health is the main goal of translational medicine. To make it feasible, we need computational methods to handle the large amount of information that arises from bench to bedside and to deal with its heterogeneity. A computational challenge that must be faced is to promote the integration of clinical, socio-demographic and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular, ontology-oriented database model that gained popularity due to its robustness and flexibility as a generic platform to store biological data; however, it lacks support for representing clinical and socio-demographic information. We have implemented an extension of Chado - the Clinical Module - to allow the representation of this kind of information. Our approach consists of a framework for data integration through the use of a common reference ontology. The design of this framework has four levels: the data level, to store the data; the semantic level, to integrate and standardize the data by the use of ontologies; the application level, to manage clinical databases, ontologies and the data integration process; and the web interface level, to allow interaction between the user and the system. The Clinical Module was built based on the Entity-Attribute-Value (EAV) model. We also proposed a methodology to migrate data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented and the framework was loaded using data from a factual clinical research database. Clinical and demographic data as well as biomaterial data were obtained from patients with tumors of the head and neck. We implemented the IPTrans tool, which is a complete environment for data migration, comprising: the construction of a model to describe the legacy clinical data, based on an ontology; the Extraction, Transformation and Load (ETL) process to extract the data from the source clinical database and load it into the Clinical Module of Chado; and the development of a web tool and a Bridge Layer to adapt the web tool to Chado, as well as other applications. Open-source computational solutions currently available for translational science do not have a model to represent biomolecular information and are not integrated with existing bioinformatics tools. On the other hand, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information coming from different "omics" technologies with patients' clinical and socio-demographic data. This framework should present some features: flexibility, compression and robustness. The experiments carried out on a use case demonstrated that the proposed system meets the requirements of flexibility and robustness, leading to the desired integration. The Clinical Module can be accessed at http://dcm.ffclrp.usp.br/caib/pg=iptrans.
E&P data lifecycle: a case study in Petrobras Company
NASA Astrophysics Data System (ADS)
Mastella, Laura; Campinho, Vania; Alonso, João
2013-04-01
Petrobras, the biggest Brazilian Petroleum Company, has been studying and working on Brazilian sedimentary basins for nearly 60 years. The corporate database currently registers over 25000 wells and all their associated products (geophysical logs, cores, sidewall samples) and analyses. There are thousands of samples, descriptions, pictures, measures, and other scientific data resulting from petroleum exploration and production. This data constitutes a huge scientific database which is applied to support Petrobras' economic strategy. Geological models built during the exploration phase continue to be refined during both the development and production phases: data should be continually manipulated, correlated and integrated. As E&P assets reach maturity, a new cycle starts: data is re-analyzed and new hypotheses are made in order to increase hydrocarbon productivity. Initial geological models then evolve from accumulated knowledge throughout all the E&P phases. Therefore, quality control must be performed in the first phases of data acquisition, i.e., during the exploration phase, to avoid reworking and loss of information. The last decade witnessed a great evolution in petroleum industry technology. As a consequence, the complexity and particulars of the information generated have increased accordingly. Current technology has also facilitated access to networks and databases, making it possible to store large amounts of information. This scenario makes available a large mass of information from different sources, which uses heterogeneous vocabulary as well as different scales and measurement units. In this context, knowledge might be diluted and the total amount of information cannot be applied in the E&P process. In order to provide adequate data governance, data input is controlled by rules, standards and policies, implemented by corporate software systems. Petrobras' integrated E&P database is a centralized repository to which all E&P systems can have access. The quality of the data that goes into the database can be increased by means of information management practices: • data validation, • language internationalization, • dictionaries, patterns, metadata. Moreover, stored data must be kept consistent, and any changes in the data should be registered while maintaining, if possible, the original data, associating the modification with its author, timestamp and reason. These practices lead to the creation of a database that serves and benefits the company's knowledge. Information retrieval and visualization are among the main issues concerning the petroleum industry. In order to make significant information available for end-users, it is fundamental to have an efficient data integration strategy. The integration of E&P data, such as geological, geophysical, geographical and operational data, is the end goal of the exploratory activities. Petrobras corporate systems are evolving towards this goal, so as to make available various data from diverse sources and to create a dashboard that can be easily accessed at any time by geoscientists and reservoir engineers. The main goal is to maintain the scientific integrity of information, from generators to consumers, throughout the whole E&P data life cycle.
Kessel, Kerstin A; Combs, Stephanie E
2016-01-01
Recently, information availability has become more elaborate and widespread, and treatment decisions are based on a multitude of factors, including imaging, molecular or pathological markers, surgical results, and the patient's preference. In this context, the term "Big Data" has also evolved in health care. The "hype" is heavily discussed in the literature. In interdisciplinary medical specialties, such as radiation oncology, not only must heterogeneous and voluminous amounts of data be evaluated, but these data are also spread in different styles across various information systems. Exactly this problem is also referred to in many ongoing discussions about Big Data - the "three V's": volume, velocity, and variety. We reviewed 895 articles extracted from the NCBI databases about current developments in electronic clinical data management systems and their further analysis or postprocessing procedures. Few articles show first ideas and ways to immediately make use of collected data, particularly imaging data. Many developments can be noticed in the field of clinical trial or analysis documentation, mobile devices for documentation, and genomics research. Using Big Data to advance medical research is definitely on the rise. Health care is perhaps the most comprehensive, important, and economically viable field of application.
Yan, Xianghe; Peng, Yun; Meng, Jianghong; Ruzante, Juliana; Fratamico, Pina M; Huang, Lihan; Juneja, Vijay; Needleman, David S
2011-01-01
Effective use of information and resources related to food safety has been hindered by inconsistency among semantically heterogeneous data resources, a lack of knowledge on the profiling of food-borne pathogens, and knowledge gaps among research communities, government risk assessors/managers, and end-users of the information. This paper discusses technical aspects of the establishment of a comprehensive food safety information system consisting of the following steps: (a) computational collection and compilation of publicly available information, including published pathogen genomic, proteomic, and metabolomic data; (b) development of ontology libraries on food-borne pathogens and design of automatic algorithms with formal inference and fuzzy and probabilistic reasoning to address the consistency and accuracy of distributed information resources (e.g., PulseNet, FoodNet, OutbreakNet, PubMed, NCBI, EMBL, and other online genetic databases and information); (c) integration of collected pathogen profiling data, Foodrisk.org ( http://www.foodrisk.org ), PMP, Combase, and other relevant information into a user-friendly, searchable, "homogeneous" information system available to scientists in academia, the food industry, and government agencies; and (d) development of a computational model in the semantic web for greater adaptability and robustness.
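One small ingredient of step (b) above, reconciling semantically heterogeneous names across sources, can be sketched with a fuzzy string match against a tiny ontology of preferred pathogen terms; the terms, synonyms and similarity threshold below are illustrative assumptions, not part of the described system.

```python
# Reconcile heterogeneous pathogen names against a small ontology of
# preferred terms using fuzzy string matching (difflib). Terms, synonyms
# and the similarity threshold are illustrative only.
from difflib import SequenceMatcher

ONTOLOGY = {
    "Escherichia coli O157:H7": ["E. coli O157:H7", "E.coli O157"],
    "Listeria monocytogenes":   ["L. monocytogenes"],
    "Salmonella enterica":      ["S. enterica", "Salmonella spp."],
}

def normalise(name, threshold=0.6):
    best_term, best_score = None, 0.0
    for preferred, synonyms in ONTOLOGY.items():
        for candidate in [preferred] + synonyms:
            score = SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
            if score > best_score:
                best_term, best_score = preferred, score
    return (best_term, best_score) if best_score >= threshold else (None, best_score)

if __name__ == "__main__":
    for raw in ["e coli o157", "listeria monocytogenes", "Campylobacter jejuni"]:
        print(raw, "->", normalise(raw))
```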
Performance Modeling in CUDA Streams - A Means for High-Throughput Data Processing.
Li, Hao; Yu, Di; Kumar, Anand; Tu, Yi-Cheng
2014-10-01
A push-based database management system (DBMS) is a new type of data processing software that streams large volumes of data to concurrent query operators. The high data rate of such systems requires large computing power provided by the query engine. In our previous work, we built a push-based DBMS named G-SDMS to harness the unrivaled computational capabilities of modern GPUs. A major design goal of G-SDMS is to support concurrent processing of heterogeneous query processing operations and enable resource allocation among such operations. Understanding the performance of operations as a result of resource consumption is thus a premise in the design of G-SDMS. With NVIDIA's CUDA framework as the system implementation platform, we present our recent work on performance modeling of CUDA kernels running concurrently under a runtime mechanism named CUDA stream. Specifically, we explore the connection between performance and resource occupancy of compute-bound kernels and develop a model that can predict the performance of such kernels. Furthermore, we provide an in-depth anatomy of the CUDA stream mechanism and summarize the main kernel scheduling disciplines in it. Our models and derived scheduling disciplines are verified by extensive experiments using synthetic and real-world CUDA kernels.
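As a hedged stand-in for the occupancy-based performance modeling mentioned above, the sketch below fits kernel runtime against the reciprocal of occupancy by least squares and uses the fit for prediction; the functional form and the data points are assumptions for illustration, not the model developed for G-SDMS.

```python
# Toy occupancy-based performance model for a compute-bound kernel:
# fit runtime ~ a / occupancy + b by least squares and predict runtimes at
# other occupancies. Functional form and data are illustrative assumptions.
import numpy as np

def fit_occupancy_model(occupancy, runtime_ms):
    X = np.column_stack([1.0 / np.asarray(occupancy), np.ones(len(occupancy))])
    coef, *_ = np.linalg.lstsq(X, np.asarray(runtime_ms), rcond=None)
    a, b = coef
    return lambda occ: a / occ + b

if __name__ == "__main__":
    # measured (occupancy, runtime) pairs for one kernel -- made-up numbers
    occ = [0.25, 0.5, 0.75, 1.0]
    rt = [41.0, 22.0, 15.5, 12.0]
    model = fit_occupancy_model(occ, rt)
    print("predicted runtime at 60% occupancy:", round(model(0.6), 1), "ms")
```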
Interstitial lung disease in systemic autoimmune rheumatic diseases: a comprehensive review.
Atzeni, Fabiola; Gerardi, Maria Chiara; Barilaro, Giuseppe; Masala, Ignazio Francesco; Benucci, Maurizio; Sarzi-Puttini, Piercarlo
2018-01-01
Interstitial lung diseases (ILDs) are among the most serious complications associated with systemic rheumatic diseases, and lead to significant morbidity and mortality; they may also be the first manifestation of connective tissue diseases (CTDs). The aim of this narrative review is to summarise the data concerning the pathogenesis of CTD-ILD and its distinguishing features in different rheumatic diseases. Areas covered: The pathogenesis, clinical aspects and treatment of ILD associated with systemic rheumatic diseases and CTDs were reviewed by searching the PubMed, Medline, and Cochrane Library databases for papers published between 1995 and February 2017 using combinations of words or terms. Articles not written in English were excluded. Expert commentary: The management of CTD-ILD is challenging because of the lack of robust data regarding the treatments used, the heterogeneity of the diseases themselves, and the scarcity of well-defined outcome measures. Treatment decisions are often made clinically on the basis of functional and radiographic progression and exacerbating factors such as age and the burden of comorbidities. Given the complexities of diagnosis and the paucity of treatment trials, the management of CTD patients with ILD requires multidisciplinary collaboration between rheumatologists and pulmonologists in CTD-ILD clinics.
An Effective Cache Algorithm for Heterogeneous Storage Systems
Li, Yong; Feng, Dan
2013-01-01
Modern storage environments are commonly composed of heterogeneous storage devices. However, traditional cache algorithms exhibit performance degradation in heterogeneous storage systems because they were not designed to work with such diverse performance characteristics. In this paper, we present a new cache algorithm called HCM for heterogeneous storage systems. The HCM algorithm partitions the cache among the disks and adopts an effective scheme to balance the work across the disks. Furthermore, it applies benefit-cost analysis to choose the best allocation of cache blocks to improve performance. Conducting simulations with a variety of traces and a wide range of cache sizes, our experiments show that HCM significantly outperforms the existing state-of-the-art storage-aware cache algorithms. PMID:24453890
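A toy version of the benefit-cost allocation idea described above: each additional cache block is given to the disk where an extra hit saves the most time, based on that disk's request rate, miss penalty and a diminishing-returns hit-rate curve. The hit-rate model and the numbers are placeholders, not HCM's actual scheme.

```python
# Greedy benefit-cost cache partitioning across heterogeneous disks.
# Each additional cache block goes to the disk with the largest marginal
# benefit (extra hits x requests x miss penalty). The concave hit-rate
# curve below is a placeholder, not HCM's model.

def hit_rate(blocks, working_set):
    # simple diminishing-returns curve: more cache, fewer misses
    return blocks / (blocks + working_set)

def marginal_benefit(disk, blocks):
    gain = hit_rate(blocks + 1, disk["working_set"]) - hit_rate(blocks, disk["working_set"])
    return gain * disk["requests"] * disk["miss_penalty_ms"]

def partition_cache(disks, total_blocks):
    alloc = {d["name"]: 0 for d in disks}
    for _ in range(total_blocks):
        best = max(disks, key=lambda d: marginal_benefit(d, alloc[d["name"]]))
        alloc[best["name"]] += 1
    return alloc

if __name__ == "__main__":
    disks = [
        {"name": "ssd", "working_set": 500,  "requests": 10000, "miss_penalty_ms": 0.2},
        {"name": "hdd", "working_set": 800,  "requests": 4000,  "miss_penalty_ms": 8.0},
        {"name": "nfs", "working_set": 1200, "requests": 1000,  "miss_penalty_ms": 25.0},
    ]
    print(partition_cache(disks, 1000))
```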
Lee, Howard; Chapiro, Julius; Schernthaner, Rüdiger; Duran, Rafael; Wang, Zhijun; Gorodetski, Boris; Geschwind, Jean-François; Lin, MingDe
2015-04-01
The objective of this study was to demonstrate that an intra-arterial liver therapy clinical research database system is a more workflow efficient and robust tool for clinical research than a spreadsheet storage system. The database system could be used to generate clinical research study populations easily with custom search and retrieval criteria. A questionnaire was designed and distributed to 21 board-certified radiologists to assess current data storage problems and clinician reception to a database management system. Based on the questionnaire findings, a customized database and user interface system were created to perform automatic calculations of clinical scores including staging systems such as the Child-Pugh and Barcelona Clinic Liver Cancer, and facilitates data input and output. Questionnaire participants were favorable to a database system. The interface retrieved study-relevant data accurately and effectively. The database effectively produced easy-to-read study-specific patient populations with custom-defined inclusion/exclusion criteria. The database management system is workflow efficient and robust in retrieving, storing, and analyzing data. Copyright © 2015 AUR. Published by Elsevier Inc. All rights reserved.
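The automatic clinical-score calculation mentioned in this abstract can be illustrated with a small Child-Pugh scorer. The cut-offs below follow the commonly published scheme, but the function and field names are invented for this sketch and it is not the system's actual implementation; any real use would need clinical validation.

```python
# Automatic Child-Pugh scoring of the kind the database system performs.
# Cut-offs follow the commonly published scheme; field names are illustrative
# and the sketch is not intended for clinical use without verification.

def _band(value, low, high):
    """1 point below `low`, 2 points between, 3 points above `high`."""
    if value < low:
        return 1
    return 2 if value <= high else 3

def child_pugh(bilirubin_mg_dl, albumin_g_dl, inr, ascites, encephalopathy):
    points = 0
    points += _band(bilirubin_mg_dl, 2.0, 3.0)
    # albumin scores in the opposite direction (higher is better)
    points += 1 if albumin_g_dl > 3.5 else (2 if albumin_g_dl >= 2.8 else 3)
    points += _band(inr, 1.7, 2.3)
    points += {"none": 1, "mild": 2, "moderate": 3}[ascites]
    points += {"none": 1, "grade1-2": 2, "grade3-4": 3}[encephalopathy]
    grade = "A" if points <= 6 else ("B" if points <= 9 else "C")
    return points, grade

if __name__ == "__main__":
    print(child_pugh(2.4, 3.1, 1.5, "mild", "none"))   # -> (8, 'B')
```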
Implementation of a data management software system for SSME test history data
NASA Technical Reports Server (NTRS)
Abernethy, Kenneth
1986-01-01
The implementation of a software system for managing Space Shuttle Main Engine (SSME) test/flight historical data is presented. The software system uses the database management system RIM7 for primary data storage and routine data management, but includes several FORTRAN programs, described here, which provide customized access to the RIM7 database. The consolidation, modification, and transfer of data from the database THIST, to the RIM7 database THISRM is discussed. The RIM7 utility modules for generating some standard reports from THISRM and performing some routine updating and maintenance are briefly described. The FORTRAN accessing programs described include programs for initial loading of large data sets into the database, capturing data from files for database inclusion, and producing specialized statistical reports which cannot be provided by the RIM7 report generator utility. An expert system tutorial, constructed using the expert system shell product INSIGHT2, is described. Finally, a potential expert system, which would analyze data in the database, is outlined. This system could use INSIGHT2 as well and would take advantage of RIM7's compatibility with the microcomputer database system RBase 5000.
Tosato, Valentina; Sims, Jason; West, Nicole; Colombin, Martina; Bruschi, Carlo V
2017-05-01
Adaptation by natural selection might improve the fitness of an organism and its probability to survive in unfavorable environmental conditions. Decoding the genetic basis of adaptive evolution is one of the great challenges to deal with. To this purpose, Saccharomyces cerevisiae has been largely investigated because of its short division time, excellent aneuploidy tolerance and the availability of the complete sequence of its genome with a thorough genome database. In the past, we developed a system, named bridge-induced translocation, to trigger specific, non-reciprocal translocations, exploiting the endogenous recombination system of budding yeast. This technique allows users to generate a heterogeneous population of cells with different aneuploidies and increased phenotypic variation. In this work, we demonstrate that ad hoc chromosomal translocations might induce adaptation, fostering selection of thermo-tolerant yeast strains with improved phenotypic fitness. This "yeast eugenomics" correlates with a shift to enhanced expression of genes involved in stress response, heat shock as well as carbohydrate metabolism. We propose that the bridge-induced translocation is a suitable approach to generate adapted, physiologically boosted strains for biotechnological applications.
A Semantic Sensor Web for Environmental Decision Support Applications
Gray, Alasdair J. G.; Sadler, Jason; Kit, Oles; Kyzirakos, Kostis; Karpathiotakis, Manos; Calbimonte, Jean-Paul; Page, Kevin; García-Castro, Raúl; Frazer, Alex; Galpin, Ixent; Fernandes, Alvaro A. A.; Paton, Norman W.; Corcho, Oscar; Koubarakis, Manolis; De Roure, David; Martinez, Kirk; Gómez-Pérez, Asunción
2011-01-01
Sensing devices are increasingly being deployed to monitor the physical world around us. One class of application for which sensor data is pertinent is environmental decision support systems, e.g., flood emergency response. For these applications, the sensor readings need to be put in context by integrating them with other sources of data about the surrounding environment. Traditional systems for predicting and detecting floods rely on methods that need significant human resources. In this paper we describe a semantic sensor web architecture for integrating multiple heterogeneous datasets, including live and historic sensor data, databases, and map layers. The architecture provides mechanisms for discovering datasets, defining integrated views over them, continuously receiving data in real-time, and visualising on screen and interacting with the data. Our approach makes extensive use of web service standards for querying and accessing data, and semantic technologies to discover and integrate datasets. We demonstrate the use of our semantic sensor web architecture in the context of a flood response planning web application that uses data from sensor networks monitoring the sea-state around the coast of England. PMID:22164110
Development and Operation of a Database Machine for Online Access and Update of a Large Database.
ERIC Educational Resources Information Center
Rush, James E.
1980-01-01
Reviews the development of a fault tolerant database processor system which replaced OCLC's conventional file system. A general introduction to database management systems and the operating environment is followed by a description of the hardware selection, software processes, and system characteristics. (SW)
75 FR 18255 - Passenger Facility Charge Database System for Air Carrier Reporting
Federal Register 2010, 2011, 2012, 2013, 2014
2010-04-09
... Facility Charge Database System for Air Carrier Reporting AGENCY: Federal Aviation Administration (FAA... the Passenger Facility Charge (PFC) database system to report PFC quarterly report information. In... developed a national PFC database system in order to more easily track the PFC program on a nationwide basis...
An Improved Database System for Program Assessment
ERIC Educational Resources Information Center
Haga, Wayne; Morris, Gerard; Morrell, Joseph S.
2011-01-01
This research paper presents a database management system for tracking course assessment data and reporting related outcomes for program assessment. It improves on a database system previously presented by the authors and in use for two years. The database system presented is specific to assessment for ABET (Accreditation Board for Engineering and…
Integrating heterogeneous drug sensitivity data from cancer pharmacogenomic studies.
Pozdeyev, Nikita; Yoo, Minjae; Mackie, Ryan; Schweppe, Rebecca E; Tan, Aik Choon; Haugen, Bryan R
2016-08-09
The consistency of in vitro drug sensitivity data is of key importance for cancer pharmacogenomics. Previous attempts to correlate drug sensitivities from the large pharmacogenomics databases, such as the Cancer Cell Line Encyclopedia (CCLE) and the Genomics of Drug Sensitivity in Cancer (GDSC), have produced discordant results. We developed a new drug sensitivity metric, the area under the dose response curve adjusted for the range of tested drug concentrations, which allows integration of heterogeneous drug sensitivity data from the CCLE, the GDSC, and the Cancer Therapeutics Response Portal (CTRP). We show that there is moderate to good agreement of drug sensitivity data for many targeted therapies, particularly kinase inhibitors. The results of this largest cancer cell line drug sensitivity data analysis to date are accessible through the online portal, which serves as a platform for high power pharmacogenomics analysis.
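A minimal sketch of the range-adjusted area-under-the-dose-response-curve idea described above: the viability curve is integrated by trapezoids over log-concentration and then normalised by the tested log-range, so screens that cover different concentration ranges become comparable. The exact normalisation and the viability scale are assumptions for illustration, not the published metric.

```python
# Area under the dose-response curve, normalised by the tested concentration
# range, so that sensitivities from screens with different tested ranges
# (e.g. CCLE, GDSC, CTRP) can be placed on a comparable scale. The exact
# normalisation here is an illustrative assumption.
import math

def range_adjusted_auc(concentrations_um, viability):
    """concentrations in uM (ascending), viability in [0, 1]."""
    x = [math.log10(c) for c in concentrations_um]
    auc = 0.0
    for i in range(len(x) - 1):
        auc += 0.5 * (viability[i] + viability[i + 1]) * (x[i + 1] - x[i])
    return auc / (x[-1] - x[0])     # divide by the tested log-range

if __name__ == "__main__":
    doses = [0.01, 0.1, 1.0, 10.0]
    resistant = [1.0, 0.95, 0.9, 0.85]
    sensitive = [0.9, 0.5, 0.2, 0.1]
    print(range_adjusted_auc(doses, resistant), range_adjusted_auc(doses, sensitive))
```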
76 FR 11465 - Privacy Act of 1974; System of Records
Federal Register 2010, 2011, 2012, 2013, 2014
2011-03-02
... separate systems of records: ``FHFA-OIG Audit Files Database,'' ``FHFA-OIG Investigative & Evaluative Files Database,'' ``FHFA-OIG Investigative & Evaluative MIS Database,'' and ``FHFA-OIG Hotline Database.'' These... Audit Files Database. FHFA-OIG-2: FHFA-OIG Investigative & Evaluative Files Database. FHFA-OIG-3: FHFA...
Malmir, Hanieh; Shab-Bidar, Sakineh; Djafarian, Kurosh
2018-04-01
We aimed to systematically review available data on the association between vitamin C intake and bone mineral density (BMD), as well as risk of fractures and osteoporosis, and to summarise this information through a meta-analysis. Previous studies on vitamin C intake in relation to BMD and risk of fracture and osteoporosis were selected through searching PubMed, Scopus, ISI Web of Science and Google Scholar databases before February 2017, using MeSH and text words. To pool data, either a fixed-effects model or a random-effects model was used, and for assessing heterogeneity, Cochran's Q and I² tests were used. Subgroup analysis was applied to define possible sources of heterogeneity. Greater dietary vitamin C intake was positively associated with BMD at the femoral neck (pooled r 0·18; 95 % CI 0·06, 0·30) and lumbar spine (pooled r 0·14; 95 % CI 0·06, 0·22); however, significant between-study heterogeneity was found at the femoral neck (I² = 87·6 %, P for heterogeneity < 0·001). In addition, we found a non-significant association between dietary vitamin C intake and the risk of hip fracture (overall relative risk = 0·74; 95 % CI 0·51, 1·08). Significant between-study heterogeneity was found (I² = 79·1 %, P for heterogeneity < 0·001), and subgroup analysis indicated that study design, sex and age were the main sources of heterogeneity. Greater dietary vitamin C intake was associated with a 33 % lower risk of osteoporosis (overall relative risk = 0·67; 95 % CI 0·47, 0·94). Greater dietary vitamin C intake was associated with a lower risk of hip fracture and osteoporosis, as well as higher BMD at the femoral neck and lumbar spine.
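The pooling and heterogeneity statistics referred to above (inverse-variance pooling, Cochran's Q, I²) can be sketched as follows; the effect sizes are made-up illustration data, and the random-effects variant (e.g. DerSimonian-Laird), which adds a between-study variance term, is omitted for brevity.

```python
# Fixed-effect inverse-variance pooling with Cochran's Q and I^2,
# the heterogeneity statistics referred to in the abstract above.
# Effect sizes and standard errors below are made-up illustration data.

def fixed_effect_pool(effects, std_errors):
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))  # Cochran's Q
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled_se, q, i_squared

if __name__ == "__main__":
    effects = [0.18, 0.10, 0.25, 0.05]          # e.g. per-study correlations
    ses = [0.05, 0.08, 0.06, 0.07]
    pooled, se, q, i2 = fixed_effect_pool(effects, ses)
    print(f"pooled r = {pooled:.3f} (SE {se:.3f}), Q = {q:.2f}, I^2 = {i2:.1f}%")
```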
The effect of soil heterogeneity on ATES performance
NASA Astrophysics Data System (ADS)
Sommer, W.; Rijnaarts, H.; Grotenhuis, T.; van Gaans, P.
2012-04-01
Due to an increasing demand for sustainable energy, application of Aquifer Thermal Energy Storage (ATES) is growing rapidly. Large-scale application of ATES is limited by the space that is available in the subsurface. Especially in urban areas, suboptimal performance is expected due to thermal interference between individual wells of a single system, or interference with other ATES systems or groundwater abstractions. To avoid thermal interference there are guidelines on well spacing. However, these guidelines, and also design calculations, are based on the assumption of a homogeneous subsurface, while studies report a standard deviation in logpermeability of 1 to 2 for unconsolidated aquifers (Gelhar, 1993). Such heterogeneity may create preferential pathways, reducing ATES performance due to increased advective heat loss or interference between ATES wells. The role of hydraulic heterogeneity of the subsurface related to ATES performance has received little attention in literature. Previous research shows that even small amounts of heterogeneity can result in considerable uncertainty in the distribution of thermal energy in the subsurface and an increased radius of influence (Ferguson, 2007). This is supported by subsurface temperature measurements around ATES wells, which suggest heterogeneity gives rise to preferential pathways and short-circuiting between ATES wells (Bridger and Allen, 2010). Using 3-dimensional stochastic heat transport modeling, we quantified the influence of heterogeneity on the performance of a doublet well energy storage system. The following key parameters are varied to study their influence on thermal recovery and thermal balance: 1) regional flow velocity, 2) distance between wells and 3) characteristics of the heterogeneity. Results show that heterogeneity at the scale of a doublet ATES system introduces an uncertainty up to 18% in expected thermal recovery. The uncertainty increases with decreasing distance between ATES wells. The uncertainty in the thermal balance ratio related to heterogeneity is limited (smaller than 3%). If thermal interference should be avoided, wells in heterogeneous aquifers should be placed further apart than in homogeneous aquifers, leading to larger volume claim in the subsurface. By relating the number of ATES systems in an area to their expected performance, these results can be used to optimize regional application of ATES. Bridger, D. W. and D. M. Allen (2010). "Heat transport simulations in a heterogeneous aquifer used for aquifer thermal energy storage (ATES)." Canadian Geotechnical Journal 47(1): 96-115. Ferguson, G. (2007). "Heterogeneity and thermal modeling of ground water." Ground Water 45(4): 485-490. Gelhar, L. W. (1993). Stochastic Subsurface Hydrology, Prentice Hall.
Optimizing structure of complex technical system by heterogeneous vector criterion in interval form
NASA Astrophysics Data System (ADS)
Lysenko, A. V.; Kochegarov, I. I.; Yurkov, N. K.; Grishko, A. K.
2018-05-01
The article examines the methods of development and multi-criteria choice of the preferred structural variant of a complex technical system at the early stages of its life cycle, in the absence of sufficient knowledge of the parameters and variables for optimizing this structure. The suggested method takes into consideration the various fuzzy input data connected with the heterogeneous quality criteria of the designed system and the parameters set by their variation range. The suggested approach is based on the combined use of methods of interval analysis, fuzzy set theory, and decision-making theory. As a result, a method for normalizing heterogeneous quality criteria has been developed on the basis of establishing preference relations in interval form. The method of building preference relations in interval form on the basis of the vector of heterogeneous quality criteria suggests the use of membership functions instead of coefficients that weight the criteria values. The former show the degree of proximity of the realization of the designed system to the efficient or Pareto-optimal variants. The study analyzes an example of choosing the optimal variant for a complex system using heterogeneous quality criteria.
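A hedged sketch of the normalisation step described above: each criterion, given as an interval, is mapped to a [0, 1] membership value expressing closeness to an ideal variant, and variants are then ranked by their worst criterion. The linear membership form and the min-aggregation are assumptions for illustration, not the authors' exact formulation.

```python
# Normalising heterogeneous quality criteria given in interval form.
# Each criterion interval [lo, hi] is mapped to a degree of membership in
# "close to the ideal value"; variants are then ranked by their worst
# (minimum) membership. The linear membership form is an assumption.

def membership(interval, ideal, worst):
    """Map the interval midpoint to [0, 1], with 1 at the ideal value."""
    mid = 0.5 * (interval[0] + interval[1])
    span = abs(ideal - worst) or 1.0
    return max(0.0, 1.0 - abs(ideal - mid) / span)

def rank_variants(variants, ideals, worsts):
    scored = []
    for name, intervals in variants.items():
        degrees = [membership(iv, i, w) for iv, i, w in zip(intervals, ideals, worsts)]
        scored.append((min(degrees), name))      # pessimistic aggregation
    return sorted(scored, reverse=True)

if __name__ == "__main__":
    # criteria: mass [kg], cost [k$], failure rate [%]; intervals reflect uncertainty
    variants = {
        "A": [(10, 14), (95, 110), (0.8, 1.2)],
        "B": [(8, 12), (120, 130), (0.5, 0.7)],
    }
    print(rank_variants(variants, ideals=[8, 90, 0.5], worsts=[20, 150, 2.0]))
```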
An Overview of MSHN: The Management System for Heterogeneous Networks
1999-04-01
Debra A. Hensgen; Taylor Kidd; David St. John; Matthew C. Schnaidt; Howard...
Improved Information Retrieval Performance on SQL Database Using Data Adapter
NASA Astrophysics Data System (ADS)
Husni, M.; Djanali, S.; Ciptaningtyas, H. T.; Wicaksana, I. G. N. A.
2018-02-01
The NoSQL databases, short for Not Only SQL, are increasingly being used as the number of big data applications increases. Most systems still use relational databases (RDBs), but as the amount of data increases each year, systems increasingly handle big data with NoSQL databases to analyze and access data more quickly. NoSQL emerged as a result of the exponential growth of the internet and the development of web applications. The query syntax in a NoSQL database differs from that of an SQL database, therefore requiring code changes in the application. A data adapter allows applications to keep their SQL query syntax unchanged. Data adapters provide methods that can synchronize SQL databases with NoSQL databases. In addition, the data adapter provides an interface which applications can access to run SQL queries. Hence, this research applied a data adapter system to synchronize data between a MySQL database and Apache HBase using a direct access query approach, where the system allows the application to accept queries while the synchronization process is in progress. The tests performed using the data adapter show that it can synchronize between the SQL database, MySQL, and the NoSQL database, Apache HBase. The system's memory usage is in the range of 40% to 60%, and its processor usage ranges from 10% to 90%. In addition, the results also show that the performance of the NoSQL database is better than that of the SQL database.
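A minimal sketch of the data-adapter idea described above: the application keeps issuing SQL-style calls while the adapter writes to both backends and serves reads from the key-value store. Both backends here are in-memory stand-ins; no real MySQL or HBase client API is used, and all names are invented for the example.

```python
# Data-adapter sketch: the application talks SQL-style, the adapter mirrors
# writes to both a relational store and a key-value (HBase-like) store, and
# serves key lookups from the key-value side. Both stores are in-memory toys.

class RelationalToy:
    def __init__(self):
        self.tables = {}
    def insert(self, table, row):
        self.tables.setdefault(table, []).append(row)
    def select(self, table, key, value):
        return [r for r in self.tables.get(table, []) if r.get(key) == value]

class KeyValueToy:
    def __init__(self):
        self.rows = {}                        # (table, row_key) -> column dict
    def put(self, table, row_key, columns):
        self.rows[(table, row_key)] = columns
    def get(self, table, row_key):
        return self.rows.get((table, row_key))

class DataAdapter:
    """Accepts SQL-like calls while keeping both backends in sync."""
    def __init__(self, sql_store, nosql_store):
        self.sql, self.nosql = sql_store, nosql_store
    def insert(self, table, row_key, row):
        self.sql.insert(table, dict(row, _key=row_key))
        self.nosql.put(table, row_key, row)    # synchronised write
    def query_by_key(self, table, row_key):
        return self.nosql.get(table, row_key)  # fast path: key-value lookup

if __name__ == "__main__":
    adapter = DataAdapter(RelationalToy(), KeyValueToy())
    adapter.insert("users", "u1", {"name": "Ana", "city": "Surabaya"})
    print(adapter.query_by_key("users", "u1"))
```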
A framework for analysis of large database of old art paintings
NASA Astrophysics Data System (ADS)
Da Rugna, Jérome; Chareyron, Gaël; Pillay, Ruven; Joly, Morwena
2011-03-01
For many years, many museums and countries have been organizing the high-definition digitization of their collections, and as a consequence they generate massive data for each object. In this paper, we focus only on art painting collections. Nevertheless, we faced a very large database with heterogeneous data. Indeed, the image collection includes very old and recent scans of negative photos, digital photos, multi- and hyperspectral acquisitions, X-ray acquisitions, and also front, back and lateral photos. Moreover, we have noted that art paintings suffer from many kinds of degradation: cracks, softening, artifacts, human damage and corruption over time. Considering that, it appears necessary to develop specific approaches and methods dedicated to digital art painting analysis. Consequently, this paper presents a complete framework for evaluating, comparing and benchmarking image processing algorithms.
Monitoring of IaaS and scientific applications on the Cloud using the Elasticsearch ecosystem
NASA Astrophysics Data System (ADS)
Bagnasco, S.; Berzano, D.; Guarise, A.; Lusso, S.; Masera, M.; Vallero, S.
2015-05-01
The private Cloud at the Torino INFN computing centre offers IaaS services to different scientific computing applications. The infrastructure is managed with the OpenNebula cloud controller. The main stakeholders of the facility are a grid Tier-2 site for the ALICE collaboration at LHC, an interactive analysis facility for the same experiment and a grid Tier-2 site for the BES-III collaboration, plus an increasing number of other small tenants. Besides keeping track of the usage, the automation of dynamic allocation of resources to tenants requires detailed monitoring and accounting of the resource usage. As a first investigation towards this, we set up a monitoring system to inspect the site activities both in terms of IaaS and of applications running on the hosted virtual instances. For this purpose we used the Elasticsearch, Logstash and Kibana stack. In the current implementation, the heterogeneous accounting information is fed to different MySQL databases and sent to Elasticsearch via a custom Logstash plugin. For the IaaS metering, we developed sensors for the OpenNebula API. The IaaS-level information gathered through the API is sent to the MySQL database through a RESTful web service developed ad hoc, which is also used for other accounting purposes. Concerning the application level, we used the Root plugin TProofMonSenderSQL to collect accounting data from the interactive analysis facility. The BES-III virtual instances used to be monitored with Zabbix; as a proof of concept, we also retrieve the information contained in the Zabbix database. Each of these three cases is indexed separately in Elasticsearch. We are now starting to consider dismissing the intermediate level provided by the SQL database and evaluating a NoSQL option as a single central database for all the monitoring information. We set up a set of Kibana dashboards with pre-defined queries in order to monitor the relevant information in each case. In this way we have achieved a uniform monitoring interface for both the IaaS and the scientific applications, mostly leveraging off-the-shelf tools.
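The accounting flow described above (heterogeneous sources feeding relational tables, then Elasticsearch) can be illustrated by building an Elasticsearch bulk-API payload from a couple of fake accounting rows. The index name and document layout are hypothetical; shipping the payload would amount to an HTTP POST of this NDJSON body to the cluster's /_bulk endpoint, whereas the site described above does this through a custom Logstash plugin instead.

```python
# Build an Elasticsearch bulk payload from heterogeneous accounting records
# (fake IaaS rows standing in for the MySQL tables). Index name and field
# layout are illustrative; no connection to a real cluster is made here.
import json
from datetime import datetime, timezone

def to_bulk_ndjson(records, index):
    lines = []
    for rec in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        doc = dict(rec, indexed_at=datetime.now(timezone.utc).isoformat())
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    iaas_rows = [
        {"tenant": "alice-tier2", "vm_id": "one-4242", "cpu_hours": 12.5},
        {"tenant": "bes3",        "vm_id": "one-4243", "cpu_hours": 3.0},
    ]
    print(to_bulk_ndjson(iaas_rows, index="iaas-accounting"))
```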
The contribution of nurses to incident disclosure: a narrative review.
Harrison, Reema; Birks, Yvonne; Hall, Jill; Bosanquet, Kate; Harden, Melissa; Iedema, Rick
2014-02-01
To explore (a) how nurses feel about disclosing patient safety incidents to patients, (b) the current contribution that nurses make to the process of disclosing patient safety incidents to patients and (c) the barriers that nurses report as inhibiting their involvement in disclosure. A systematic search process was used to identify and select all relevant material. Heterogeneity in study design of the included articles prohibited a meta-analysis and findings were therefore synthesised in a narrative review. A range of text words, synonyms and subject headings were developed in conjunction with the York Centre for Reviews and Dissemination and used to undertake a systematic search of electronic databases (MEDLINE; EMBASE; CENTRAL; PsycINFO; Health Management and Information Consortium; CINAHL; ASSIA; Science Citation Index; Social Science Citation Index; Cochrane Database of Systematic Reviews; Database of Abstracts of Reviews of Effects; Health Technology Assessment Database; Health Systems Evidence; PASCAL; LILACS). Retrieval of studies was restricted to those published after 1980. Further data sources were: websites, grey literature, research in progress databases, hand-searching of relevant journals and author contact. The title and abstract of each citation was independently screened by two reviewers and disagreements resolved by consensus or consultation with a third person. Full text articles retrieved were further screened against the inclusion and exclusion criteria then checked by a second reviewer (YB). Relevant data were extracted and findings were synthesised in a narrative empirical synthesis. The systematic search and selection process identified 15 publications which included 11 unique studies that emerged from a range of locations. Findings suggest that nurses currently support both physicians and patients through incident disclosure, but may be ill-prepared to disclose incidents independently. Barriers to nurse involvement included a lack of opportunities for education and training, and the multiple and sometimes conflicting roles within nursing. Numerous potential benefits were identified that may result from nurses having a greater contribution to the disclosure process, but the provision of support and training is essential to overcome the reported barriers faced by nurses internationally. Copyright © 2013 Elsevier Ltd. All rights reserved.
Wilson, Claire; Blackwood, Bronagh; McAuley, Danny F; Perkins, Gavin D; McMullan, Ronan; Gates, Simon; Warhurst, Geoffrey
2012-01-01
Background There is growing interest in the potential utility of molecular diagnostics in improving the detection of life-threatening infection (sepsis). LightCycler® SeptiFast is a multipathogen probe-based real-time PCR system targeting DNA sequences of bacteria and fungi present in blood samples within a few hours. We report here the protocol of the first systematic review of published clinical diagnostic accuracy studies of this technology when compared with blood culture in the setting of suspected sepsis. Methods/design Data sources: the Cochrane Database of Systematic Reviews, the Database of Abstracts of Reviews of Effects (DARE), the Health Technology Assessment Database (HTA), the NHS Economic Evaluation Database (NHSEED), The Cochrane Library, MEDLINE, EMBASE, ISI Web of Science, BIOSIS Previews, MEDION and the Aggressive Research Intelligence Facility Database (ARIF). Study selection: diagnostic accuracy studies that compare the real-time PCR technology with standard culture results performed on a patient's blood sample during the management of sepsis. Data extraction: three reviewers, working independently, will determine the level of evidence, methodological quality and a standard data set relating to demographics and diagnostic accuracy metrics for each study. Statistical analysis/data synthesis: heterogeneity of studies will be investigated using a coupled forest plot of sensitivity and specificity and a scatter plot in Receiver Operator Characteristic (ROC) space. Bivariate model method will be used to estimate summary sensitivity and specificity. The authors will investigate reporting biases using funnel plots based on effective sample size and regression tests of asymmetry. Subgroup analyses are planned for adults, children and infection setting (hospital vs community) if sufficient data are uncovered. Dissemination Recommendations will be made to the Department of Health (as part of an open-access HTA report) as to whether the real-time PCR technology has sufficient clinical diagnostic accuracy potential to move forward to efficacy testing during the provision of routine clinical care. Registration PROSPERO—NIHR Prospective Register of Systematic Reviews (CRD42011001289). PMID:22240646
S. T. A. Pickett; M. L. Cadenasso; E. J. Rosi-Marshall; Ken Belt; P. M. Groffman; Morgan Grove; E. G. Irwin; S. S. Kaushal; S. L. LaDeau; C. H. Nilon; C. M. Swan; P. S. Warren
2016-01-01
Urban areas are understood to be extraordinarily spatially heterogeneous. Spatial heterogeneity, and its causes, consequences, and changes, are central to ecological science. The social sciences and urban design and planning professions also include spatial heterogeneity as a key concern. However, urban ecology, as a pursuit that integrates across these disciplines,...
Gensous, Noémie; Marti, Aurélie; Barnetche, Thomas; Blanco, Patrick; Lazaro, Estibaliz; Seneschal, Julien; Truchetet, Marie-Elise; Duffau, Pierre; Richez, Christophe
2017-10-24
The aim of this study was to identify the most reliable biomarkers in the literature that could be used as flare predictors in systemic lupus erythematosus (SLE). A systematic review of the literature was performed using two databases (MEDLINE and EMBASE) through April 2015 and congress abstracts from the American College of Rheumatology and the European League Against Rheumatism were reviewed from 2010 to 2014. Two independent reviewers screened titles and abstracts and analysed selected papers in detail, using a specific questionnaire. Reports addressing the relationships between one or more defined biological test(s) and the occurrence of disease exacerbation were included in the systematic review. From all of the databases, 4668 records were retrieved, of which 69 studies or congress abstracts were selected for the systematic review. The performance of seven types of biomarkers performed routinely in clinical practice and nine types of novel biological markers was evaluated. Despite some encouraging results for anti-double-stranded DNA antibodies, anti-C1q antibodies, B-lymphocyte stimulator and tumour necrosis factor-like weak inducer of apoptosis, none of the biomarkers stood out from the others as a potential gold standard for flare prediction. The results were heterogeneous, and a lack of standardized data prevented us from identifying a powerful biomarker. No powerful conclusions could be drawn from this systematic review due to a lack of standardized data. Efforts should be undertaken to optimize future research on potential SLE biomarkers to develop validated candidates. Thus, we propose a standardized pattern for future studies.
Economic impact of electronic prescribing in the hospital setting: A systematic review.
Ahmed, Zamzam; Barber, Nick; Jani, Yogini; Garfield, Sara; Franklin, Bryony Dean
2016-04-01
To examine evidence on the economic impact of electronic prescribing (EP) systems in the hospital setting. We conducted a systematic search of MEDLINE, EMBASE, PsycINFO, International Pharmaceutical Abstracts, the NHS Economic Evaluation Database, the European Network of Health Economic Evaluation Database and Web of Science from inception to October 2013. Full and partial economic evaluations of EP or computerized provider order entry were included. We excluded studies assessing prescribing packages for specific drugs, and monetary outcomes that were not related to medicines. A checklist was used to evaluate risk of bias and evidence quality. The search yielded 1160 articles of which three met the inclusion criteria. Two were full economic evaluations and one a partial economic evaluation. A meta-analysis was not appropriate as studies were heterogeneous in design, economic evaluation method, interventions and outcome measures. Two studies investigated the financial impact of reducing preventable adverse drug events. The third measured savings related to various aspects of the system including those related to medication. Two studies reported positive financial effects. However the overall quality of the economic evidence was low and key details often not reported. There seems to be some evidence of financial benefits of EP in the hospital setting. However, it is not clear if evidence is transferable to other settings. Research is scarce and limited in quality, and reported methods are not always transparent. Further robust, high quality research is required to establish if hospital EP is cost effective and thus inform policy makers' decisions. Copyright © 2016. Published by Elsevier Ireland Ltd.
Efficiency of Cordless Versus Cord Techniques of Gingival Retraction: A Systematic Review.
Huang, Cui; Somar, Mirinal; Li, Kang; Mohadeb, Jhassu Varsha Naveena
2017-04-01
The primary aim was to assess the efficacy of cordless versus cord techniques in achieving hemostasis control and gingival displacement and their influence on gingival/periodontal health. In addition, subjective factors reported by the patient (pain, sensitivity, unpleasant taste, discomfort) and the operator's experience with both techniques were analyzed. An electronic database search was conducted using five main databases, covering publication years 1998 to December 2014, to identify any in vivo studies comparing cord and cordless gingival retraction techniques. Seven potential studies were analyzed. Of the four articles that reported achievement of hemostasis control, three compared patients treated with an epi-gingival finish line and concluded that paste techniques were more efficient in controlling bleeding. Five studies reported on the amount of sulcus dilatation, with contrasting evidence. Only one study reported increased gingival displacement when paste systems were used. Two studies did not observe any significant difference, although two showed greater gingival displacement associated with cords, particularly in cases where the finish line was placed at a subgingival level. Of the four studies that assessed the influence of both techniques on gingival/periodontal health, three noted less traumatic injury to soft tissues when gingival paste was used. A paste system, in general, was documented to be more comfortable for patients and user-friendly for the operator. The heterogeneity of measurement variables across studies precluded a meta-analytic approach. Although both techniques (cord/cordless) are reliable in achieving gingival retraction, some situations were identified wherein each of the techniques proved to be more efficient. © 2015 by the American College of Prosthodontists.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ernest A. Mancini
The University of Alabama, in cooperation with Texas A&M University, McGill University, Longleaf Energy Group, Strago Petroleum Corporation, and Paramount Petroleum Company, is undertaking an integrated, interdisciplinary geoscientific and engineering research project. The project is designed to characterize and model reservoir architecture, pore systems and rock-fluid interactions at the pore to field scale in Upper Jurassic Smackover reef and carbonate shoal reservoirs associated with varying degrees of relief on pre-Mesozoic basement paleohighs in the northeastern Gulf of Mexico. The project effort includes the prediction of fluid flow in carbonate reservoirs through reservoir simulation modeling, which utilizes geologic reservoir characterization and modeling, and the prediction of carbonate reservoir architecture, heterogeneity and quality through seismic imaging. The primary objective of the project is to increase the profitability, producibility and efficiency of recovery of oil from existing and undiscovered Upper Jurassic fields characterized by reef and carbonate shoals associated with pre-Mesozoic basement paleohighs. The principal research effort for Year 1 of the project has been reservoir description and characterization. This effort has included four tasks: (1) geoscientific reservoir characterization, (2) the study of rock-fluid interactions, (3) petrophysical and engineering characterization and (4) data integration. This work was scheduled for completion in Year 1. Overall, the project work is on schedule. Geoscientific reservoir characterization is essentially completed. The architecture, porosity types and heterogeneity of the reef and shoal reservoirs at Appleton and Vocation Fields have been characterized using geological and geophysical data. The study of rock-fluid interactions has been initiated. Observations regarding the diagenetic processes influencing pore system development and heterogeneity in these reef and shoal reservoirs have been made. Petrophysical and engineering property characterization is progressing. Data on reservoir production rate and pressure history at Appleton and Vocation Fields have been tabulated, and porosity data from core analysis have been correlated with porosity as observed from well log response. Data integration is on schedule, in that the geological, geophysical, petrophysical and engineering data collected to date for Appleton and Vocation Fields have been compiled into a fieldwide digital database for reservoir characterization, modeling and simulation for the reef and carbonate shoal reservoirs of each of these fields.
NASA Astrophysics Data System (ADS)
Pedretti, Daniele; Masetti, Marco; Beretta, Giovanni Pietro
2017-10-01
The expected long-term efficiency of vertical cutoff walls coupled to pump-and-treat technologies to contain solute plumes in highly heterogeneous aquifers was analyzed. A well-characterized case study in Italy, with a hydrogeological database of 471 results from hydraulic tests performed on the aquifer and the surrounding 2-km-long cement-bentonite (CB) walls, was used to build a conceptual model and assess a representative remediation site adopting the coupled technologies. In the studied area, the aquifer hydraulic conductivity Ka [m/d] is log-normally distributed with mean E(Ya) = 0.32 and variance σ²(Ya) = 6.36, where Ya = ln Ka, and its spatial correlation is well described by an exponential isotropic variogram with an integral scale less than 1/12 of the domain size. The hardened CB wall's hydraulic conductivity, Kw [m/d], displayed strong scaling effects and a lognormal distribution with mean E(Yw) = −3.43 and variance σ²(Yw) = 0.53, where Yw = log10 Kw. No spatial correlation of Kw was detected. Using this information, conservative transport was simulated across a CB wall in spatially correlated 1-D random Ya fields within a numerical Monte Carlo framework. Multiple scenarios representing different Kw values were tested. A continuous solute source with known concentration and deterministic drain discharge rates were assumed. The efficiency of the confining system was measured by the probability of the concentration exceeding a threshold (C∗) at a control section 10 years after the initial solute release. It was found that the stronger the aquifer heterogeneity, the higher the expected efficiency of the confinement system and the lower the likelihood of aquifer pollution. This behavior can be explained because, for the analyzed aquifer conditions, a lower Ka generates a more pronounced drawdown of the water table in the proximity of the drain and consequently a higher advective flux towards the confined area, which counteracts diffusive fluxes across the walls. Thus, a higher σ²(Ya) results in a larger proportion of low Ka values in the proximity of the drain, and a higher probability of not exceeding C∗.
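The stochastic input to the Monte Carlo framework described above, spatially correlated 1-D lognormal conductivity fields with an exponential variogram, can be generated as in the sketch below. The grid size, cell spacing and integral scale are invented for illustration (only the mean and variance of Ya echo the abstract), and the flow-and-transport simulation across the wall is not reproduced here.

```python
# Sketch: generate 1-D spatially correlated lognormal hydraulic-conductivity
# fields with an exponential covariance, the stochastic input of a Monte Carlo
# study.  Grid size, spacing and integral scale are illustrative values only;
# flow and transport across the cutoff wall are not modelled in this sketch.
import numpy as np

def lognormal_field(n_cells, dx, integral_scale, mean_lnK, var_lnK, rng):
    """Draw one correlated field of K [m/d] via Cholesky factorization."""
    x = np.arange(n_cells) * dx
    lags = np.abs(x[:, None] - x[None, :])
    cov = var_lnK * np.exp(-lags / integral_scale)      # exponential variogram
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(n_cells))
    lnK = mean_lnK + chol @ rng.standard_normal(n_cells)
    return np.exp(lnK)

rng = np.random.default_rng(0)
# E(Ya) = 0.32 and var(Ya) = 6.36 are the aquifer statistics quoted above.
realizations = [lognormal_field(240, 1.0, 15.0, 0.32, 6.36, rng)
                for _ in range(100)]
print("ensemble mean of ln K:", np.mean([np.log(f).mean() for f in realizations]))
```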
DOE Office of Scientific and Technical Information (OSTI.GOV)
Venkata, Manjunath Gorentla; Aderholdt, William F
Pre-exascale systems are expected to have a significant amount of hierarchical and heterogeneous on-node memory, and this trend in system architecture for extreme-scale systems is expected to continue into the exascale era. Along with hierarchical-heterogeneous memory, such a system typically has a high-performing network and a compute accelerator. This system architecture is effective not only for running traditional High Performance Computing (HPC) applications (Big-Compute), but also for running data-intensive HPC applications and Big-Data applications. As a consequence, there is a growing desire to have a single system serve the needs of both Big-Compute and Big-Data applications. Though the system architecture supports the convergence of Big-Compute and Big-Data, the programming models and software layer have yet to evolve to support either hierarchical-heterogeneous memory systems or this convergence. This work presents a programming abstraction to address the problem. The programming abstraction is implemented as a software library and runs on pre-exascale and exascale systems supporting current and emerging system architectures. Using distributed data structures as a central concept, it provides (1) a simple, usable, and portable abstraction for hierarchical-heterogeneous memory and (2) a unified programming abstraction for Big-Compute and Big-Data applications.
NASA Astrophysics Data System (ADS)
Cervato, C.; Fils, D.; Bohling, G.; Diver, P.; Greer, D.; Reed, J.; Tang, X.
2006-12-01
The federation of databases is not a new endeavor. Great strides have been made, for example, in the health and astrophysics communities. Reviews of those successes indicate that they have been able to leverage key cross-community core concepts. In its simplest implementation, a federation of databases with identical base schemas that can be extended to address individual efforts is relatively easy to accomplish. Efforts of groups like the Open Geospatial Consortium have shown methods to geospatially relate data between different sources. We present here a summary of CHRONOS's (http://www.chronos.org) experience with highly heterogeneous data. Our experience with the federation of very diverse databases shows that the wide variety of encoding options for items like locality, time scale, taxon ID, and other key parameters makes it difficult to effectively join data across them. However, the response to this is not to develop one large, monolithic database, which would suffer growth pains due to social, national, and operational issues, but rather to systematically develop the architecture that enables cross-resource (database, repository, tool, interface) interaction. CHRONOS has accomplished the major hurdle of federating small IT database efforts with service-oriented and XML-based approaches. The application of easy-to-use procedures that allow groups of all sizes to implement and experiment with searches across various databases and to use externally created tools is vital. We are sharing with the geoinformatics community the difficulties with application frameworks, user authentication, standards compliance, and data storage encountered in setting up web sites and portals for various science initiatives (e.g., ANDRILL, EARTHTIME). The ability to incorporate CHRONOS data, services, and tools into the existing framework of a group is crucial to the development of a model that supports and extends the vitality of the small- to medium-sized research effort that is essential for a vibrant scientific community. This presentation will directly address issues of portal development related to JSR-168 and other portal APIs as well as issues related to both federated and local directory-based authentication. The application of service-oriented architecture in connection with ReST-based approaches is vital to facilitate service use by both experienced and less experienced information technology groups. Application of these services with XML-based schemas allows for the connection to third-party tools such as GIS-based tools and software designed to perform a specific scientific analysis. The connection of all these capabilities into a combined framework based on the standard XHTML Document Object Model and CSS 2.0 standards used in traditional web development will be demonstrated. CHRONOS also utilizes newer client techniques such as AJAX and cross-domain scripting along with traditional server-side database, application, and web servers. The combination of the various components of this architecture creates an environment based on open and free standards that allows for the discovery, retrieval, and integration of tools and data.
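The core difficulty described above, joining records whose taxon and locality encodings differ across member databases, can be illustrated with a deliberately simplified sketch. The record layouts and the cross-walk table below are invented, and CHRONOS's actual services are XML/ReST-based and far richer; the point is only that heterogeneous records must first be normalized to a shared key before they can be joined.

```python
# Toy illustration of federating heterogeneous records: two member databases
# encode the same taxon differently, so records are normalized to a shared
# canonical ID before joining.  All record layouts and values are invented.
from collections import defaultdict

db_a = [{"taxon": "Globigerina bulloides", "age_ma": 1.2, "site": "ODP-806"}]
db_b = [{"taxon_code": "G. bulloides", "depth_m": 14.3, "loc": "806"}]

# Hypothetical cross-walk mapping each local encoding to one canonical ID.
canonical = {"Globigerina bulloides": "taxon:001",
             "G. bulloides": "taxon:001"}

merged = defaultdict(list)
for rec in db_a:
    merged[canonical[rec["taxon"]]].append(("db_a", rec))
for rec in db_b:
    merged[canonical[rec["taxon_code"]]].append(("db_b", rec))

for key, recs in merged.items():
    print(key, "->", recs)    # records from both sources now share one key
```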
78 FR 2363 - Notification of Deletion of a System of Records; Automated Trust Funds Database
Federal Register 2010, 2011, 2012, 2013, 2014
2013-01-11
... Database AGENCY: Animal and Plant Health Inspection Service, USDA. ACTION: Notice of deletion of a system... establishing the Automated Trust Funds (ATF) database system of records. The Federal Information Security... Integrity Act of 1982, Public Law 97-255, provided authority for the system. The ATF database has been...
Dynamical Heterogeneity in Granular Fluids and Structural Glasses
NASA Astrophysics Data System (ADS)
Avila, Karina E.
Our current understanding of the dynamics of supercooled liquids and other similar slowly evolving (glassy) systems is rather limited. One aspect that is particularly poorly understood is the origin and behavior of the strong nontrivial fluctuations that appear in the relaxation process toward equilibrium. Glassy systems and granular systems both present regions of particles moving cooperatively and at different rates from other regions. This phenomenon is known as spatially heterogeneous dynamics. A detailed explanation of this phenomenon may lead to a better understanding of the slow relaxation process, and perhaps it could even help to explain the presence of the glass transition. This dissertation concentrates on studying dynamical heterogeneity by analyzing simulation data for models of granular materials and structural glasses. For dissipative granular fluids, the growth of dynamical heterogeneities is studied for different densities and different degrees of inelasticity in the particle collisions. The correlated regions are found to grow rapidly as the system approaches dynamical arrest. Their geometry is conserved even when probing at different cutoff lengths in the correlation function or when the energy dissipation in the system is increased. For structural glasses, I test a theoretical framework that models dynamical heterogeneity as originating from Goldstone modes, which emerge from a broken continuous time-reparametrization symmetry. This analysis is based on quantifying the size and the spatial correlations of fluctuations in the time variable and of other kinds of fluctuations. The results obtained here agree with the predictions of the hypothesis. In particular, the fluctuations associated with time-reparametrization invariance become stronger for low temperatures, long timescales, and large coarse-graining lengths. Overall, this research indicates that dynamical heterogeneity in granular systems can be described similarly to that in other glassy systems, and it provides evidence in favor of a particular theory for the origin of dynamical heterogeneity.
[The future of clinical laboratory database management system].
Kambe, M; Imidy, D; Matsubara, A; Sugimoto, Y
1999-09-01
To assess the present status of the clinical laboratory database management system, the difference between the Clinical Laboratory Information System and Clinical Laboratory System was explained in this study. Although three kinds of database management systems (DBMS) were shown including the relational model, tree model and network model, the relational model was found to be the best DBMS for the clinical laboratory database based on our experience and developments of some clinical laboratory expert systems. As a future clinical laboratory database management system, the IC card system connected to an automatic chemical analyzer was proposed for personal health data management and a microscope/video system was proposed for dynamic data management of leukocytes or bacteria.
García-Hermoso, Antonio; Saavedra, Jose M; Escalante, Yolanda; Sánchez-López, Mairena; Martínez-Vizcaíno, Vicente
2014-10-01
The purpose of this meta-analysis was to examine the evidence for the effectiveness of aerobic exercise interventions in reducing insulin resistance markers in obese children and/or adolescents. A secondary outcome was change in percentage of body fat. A computerized search was made of seven databases: CINAHL, Cochrane Central Register of Controlled Trials, EMBASE, ERIC, MEDLINE, PsycINFO, and Science Citation Index. The analysis was restricted to randomized controlled trials that examined the effect of aerobic exercise on insulin resistance markers in obese youth. Two independent reviewers screened studies and extracted data. Effect sizes (ES) and 95% confidence intervals (CIs) were calculated, and the heterogeneity of the studies was estimated using Cochran's Q-statistic. Nine studies fulfilled the inclusion criteria and were selected for meta-analysis (n=367). Aerobic exercise interventions resulted in decreases in fasting glucose (ES=-0.39; low heterogeneity), fasting insulin (ES=-0.40; low heterogeneity), and percentage of body fat (ES=-0.35; low heterogeneity). These improvements were most pronounced in adolescents (for fasting insulin only) and in programs lasting more than 12 weeks, with three sessions per week and over 60 min of aerobic exercise per session. This meta-analysis provides insights into the effectiveness of aerobic exercise interventions on insulin resistance markers in the obese youth population. © 2014 European Society of Endocrinology.
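As a generic illustration of the quantities named above (inverse-variance pooled effect sizes with confidence intervals and Cochran's Q), the sketch below pools a handful of invented study results; it is not the authors' computation and does not reproduce the reported values.

```python
# Generic fixed-effect (inverse-variance) pooling with Cochran's Q statistic.
# The per-study effect sizes and standard errors below are invented and do
# not reproduce the meta-analysis summarized above.
import numpy as np

es = np.array([-0.45, -0.30, -0.52, -0.28])   # per-study effect sizes
se = np.array([0.20, 0.15, 0.25, 0.18])       # per-study standard errors

w = 1.0 / se**2                               # inverse-variance weights
pooled = np.sum(w * es) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
ci = pooled + np.array([-1.96, 1.96]) * pooled_se
q = np.sum(w * (es - pooled)**2)              # Cochran's Q (heterogeneity)

print(f"pooled ES = {pooled:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
print(f"Cochran's Q = {q:.2f} on {len(es) - 1} degrees of freedom")
```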
Yin, Zhujia; Liu, Lijuan; Wang, Haidong
2018-01-01
Using data from the Chinese industrial enterprise database covering 2000 to 2007 and the LP method, this paper measures firm-level total factor productivity and investigates the effect of different mixed-ownership forms on enterprise efficiency and the effect of heterogeneous ownership balance on the efficiency of mixed-ownership enterprises. State-owned and mixed-ownership enterprises are identified by their paid-up capital. The results show that, on the whole, the more diversified the shareholders of a mixed-ownership enterprise are, the higher its efficiency, and that the mixed forms of shareholders affect enterprise efficiency differently across industry types. Heterogeneous ownership balance and enterprise efficiency show a nonlinear U-shaped relationship: both higher and lower degrees of ownership balance promote enterprise efficiency, but when the ownership balance degree lies in the range [0.2, 0.5], an increase in ownership balance reduces enterprise efficiency. Therefore, when introducing non-state-owned capital, state-owned enterprises should take full account of their own characteristics, rationally control the shareholding ratio of non-state-owned capital, and exploit the positive role of a mixed ownership structure in corporate governance through an appropriate ownership balance. PMID:29614126
High statistical heterogeneity is more frequent in meta-analysis of continuous than binary outcomes.
Alba, Ana C; Alexander, Paul E; Chang, Joanne; MacIsaac, John; DeFry, Samantha; Guyatt, Gordon H
2016-02-01
We compared the distribution of heterogeneity in meta-analyses of binary and continuous outcomes. We searched citations in MEDLINE and Cochrane databases for meta-analyses of randomized trials published in 2012 that reported a measure of heterogeneity of either binary or continuous outcomes. Two reviewers independently performed eligibility screening and data abstraction. We evaluated the distribution of I² in meta-analyses of binary and continuous outcomes and explored hypotheses explaining the difference in distributions. After full-text screening, we selected 671 meta-analyses evaluating 557 binary and 352 continuous outcomes. Heterogeneity as assessed by I² proved higher in continuous than in binary outcomes: the proportion of continuous and binary outcomes reporting an I² of 0% was 34% vs. 52%, respectively, and reporting an I² of 60-100% was 39% vs. 14%. In continuous but not binary outcomes, I² increased with a larger number of studies included in a meta-analysis. Increased precision and sample size do not explain the larger I² found in meta-analyses of continuous outcomes with a larger number of studies. Meta-analyses evaluating continuous outcomes showed substantially higher I² than meta-analyses of binary outcomes. Results suggest that differing standards for interpreting I² in continuous vs. binary outcomes may be appropriate. Copyright © 2016 Elsevier Inc. All rights reserved.
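For reference, the I² statistic discussed throughout this abstract is the standard Higgins-Thompson measure, computed from Cochran's Q and the number of pooled studies k (this definition is general background, not something stated in the abstract itself):

```latex
I^{2} \;=\; \max\!\left(0,\; \frac{Q - (k - 1)}{Q}\right) \times 100\%
```

Values near 0% indicate that the observed dispersion is compatible with sampling error alone, while values towards 100% indicate that most of the dispersion reflects between-study heterogeneity.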
Fraley, Stephanie I; Hardick, Justin; Masek, Billie J; Jo Masek, Billie; Athamanolap, Pornpat; Rothman, Richard E; Gaydos, Charlotte A; Carroll, Karen C; Wakefield, Teresa; Wang, Tza-Huei; Yang, Samuel
2013-10-01
Comprehensive profiling of nucleic acids in genetically heterogeneous samples is important for clinical and basic research applications. Universal digital high-resolution melt (U-dHRM) is a new approach to broad-based PCR diagnostics and profiling technologies that can overcome issues of poor sensitivity due to contaminating nucleic acids and poor specificity due to primer or probe hybridization inaccuracies for single nucleotide variations. The U-dHRM approach uses broad-based primers or ligated adapter sequences to universally amplify all nucleic acid molecules in a heterogeneous sample, which have been partitioned, as in digital PCR. Extensive assay optimization enables direct sequence identification by algorithm-based matching of melt curve shape and Tm to a database of known sequence-specific melt curves. We show that single-molecule detection and single nucleotide sensitivity is possible. The feasibility and utility of U-dHRM is demonstrated through detection of bacteria associated with polymicrobial blood infection and microRNAs (miRNAs) associated with host response to infection. U-dHRM using broad-based 16S rRNA gene primers demonstrates universal single cell detection of bacterial pathogens, even in the presence of larger amounts of contaminating bacteria; U-dHRM using universally adapted Lethal-7 miRNAs in a heterogeneous mixture showcases the single copy sensitivity and single nucleotide specificity of this approach.
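The "algorithm-based matching of melt curve shape and Tm to a database of known sequence-specific melt curves" mentioned above can be pictured with a toy nearest-reference classifier. The temperature grid, the synthetic reference curves and the Tm tolerance below are all invented, and the sketch is far simpler than the published U-dHRM algorithm; it only illustrates the idea of gating on Tm and then comparing curve shape.

```python
# Toy sketch of matching a measured melt curve against a reference database
# by melting temperature (Tm) and curve shape.  Reference curves, the
# temperature grid and the Tm tolerance are invented for illustration.
import numpy as np

temps = np.linspace(75.0, 95.0, 201)                  # temperature grid, deg C

def synthetic_peak(tm, width=1.0):
    """Gaussian-shaped -dF/dT melt peak centred at tm (illustration only)."""
    return np.exp(-0.5 * ((temps - tm) / width) ** 2)

reference_db = {"organism_A": synthetic_peak(82.4),
                "organism_B": synthetic_peak(84.1),
                "organism_C": synthetic_peak(87.9)}

def classify(curve, tm, tm_tol=0.5):
    best, best_dist = None, np.inf
    for name, ref in reference_db.items():
        if abs(temps[np.argmax(ref)] - tm) > tm_tol:   # gate on Tm first
            continue
        dist = np.linalg.norm(curve / curve.max() - ref / ref.max())
        if dist < best_dist:
            best, best_dist = name, dist
    return best

rng = np.random.default_rng(1)
measured = synthetic_peak(84.0) + 0.02 * rng.standard_normal(temps.size)
print(classify(measured, tm=temps[np.argmax(measured)]))   # expected: organism_B
```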
Multistep continuous-flow synthesis of (R)- and (S)-rolipram using heterogeneous catalysts
NASA Astrophysics Data System (ADS)
Tsubogo, Tetsu; Oyamada, Hidekazu; Kobayashi, Shū
2015-04-01
Chemical manufacturing is conducted using either batch systems or continuous-flow systems. Flow systems have several advantages over batch systems, particularly in terms of productivity, heat and mixing efficiency, safety, and reproducibility. However, for over half a century, pharmaceutical manufacturing has used batch systems because the synthesis of complex molecules such as drugs has been difficult to achieve with continuous-flow systems. Here we describe the continuous-flow synthesis of drugs using only columns packed with heterogeneous catalysts. Commercially available starting materials were successively passed through four columns containing achiral and chiral heterogeneous catalysts to produce (R)-rolipram, an anti-inflammatory drug and one of the family of γ-aminobutyric acid (GABA) derivatives. In addition, simply by replacing a column packed with a chiral heterogeneous catalyst with another column packed with the opposing enantiomer, we obtained antipole (S)-rolipram. Similarly, we also synthesized (R)-phenibut, another drug belonging to the GABA family. These flow systems are simple and stable with no leaching of metal catalysts. Our results demonstrate that multistep (eight steps in this case) chemical transformations for drug synthesis can proceed smoothly under flow conditions using only heterogeneous catalysts, without the isolation of any intermediates and without the separation of any catalysts, co-products, by-products, and excess reagents. We anticipate that such syntheses will be useful in pharmaceutical manufacturing.
Genetic and Environmental Pathways in Type 1 Diabetes Complications
2010-09-01
active duty members of the military, their families and retired military personnel will potentially allow focused preventative treatment of at-risk... association and assess potential heterogeneity of association signals, such as by ancestry. Query eQTL databases for relevant associations. Goals: 1a1...1, so that at most an average of 7 SNPs remain as potential risk SNPs at each of the 30 loci. In Stage 2 we will genotype another 2000 cases and
2013-12-01
Radio Global System for Mobile Communications Transmitter Development for Heterogeneous Network Vulnerability Testing, by Carson C. McAbee.
Deng, Shengming; Wu, Zhifang; Wu, Yiwei; Zhang, Wei; Li, Jihui; Dai, Na
2017-01-01
The objective of this meta-analysis is to explore the correlation between the apparent diffusion coefficient (ADC) on diffusion-weighted MR and the standard uptake value (SUV) of 18F-FDG on PET/CT in patients with cancer. Databases such as PubMed (MEDLINE included), EMBASE, and the Cochrane Database of Systematic Reviews were searched for relevant original articles in English that explored the correlation between SUV and ADC. After applying Fisher's r-to-z transformation, correlation coefficient (r) values were extracted from each study and 95% confidence intervals (CIs) were calculated. Sensitivity and subgroup analyses based on tumor type were performed to investigate the potential heterogeneity. Forty-nine studies were eligible for the meta-analysis, comprising 1927 patients. The pooled r for all studies was −0.35 (95% CI: −0.42 to −0.28) and exhibited notable heterogeneity (I² = 78.4%; P < 0.01). In the cancer-type subgroup analysis, combined ADC/SUV correlation coefficients ranged from −0.12 (lymphoma, n = 5) to −0.59 (pancreatic cancer, n = 2). We concluded that, on average, there is a negative correlation between ADC and SUV in patients with cancer. Higher correlations were found for brain tumors, cervix carcinoma, and pancreatic cancer. However, a larger, prospective study is warranted to validate these findings in different cancer types. PMID:29097924
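The Fisher r-to-z pooling step described above can be made concrete with the short sketch below; the per-study correlations and sample sizes are invented and do not reproduce the pooled value or confidence interval reported in the abstract.

```python
# Pool correlation coefficients via Fisher's r-to-z transformation with
# inverse-variance weights.  Study correlations and sample sizes are invented.
import numpy as np

r = np.array([-0.30, -0.45, -0.25, -0.50])    # per-study ADC/SUV correlations
n = np.array([40, 25, 60, 30])                # per-study sample sizes

z = np.arctanh(r)                             # Fisher r-to-z transform
w = n - 3.0                                   # weight = 1 / var(z) = n - 3
z_pool = np.sum(w * z) / np.sum(w)
ci_z = z_pool + np.array([-1.96, 1.96]) / np.sqrt(np.sum(w))

r_pool, ci_r = np.tanh(z_pool), np.tanh(ci_z) # back-transform to r
print(f"pooled r = {r_pool:.2f}, 95% CI [{ci_r[0]:.2f}, {ci_r[1]:.2f}]")
```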
Deng, Shengming; Wu, Zhifang; Wu, Yiwei; Zhang, Wei; Li, Jihui; Dai, Na; Zhang, Bin; Yan, Jianhua
2017-01-01
The objective of this meta-analysis is to explore the correlation between the apparent diffusion coefficient (ADC) on diffusion-weighted MR and the standard uptake value (SUV) of 18F-FDG on PET/CT in patients with cancer. Databases such as PubMed (MEDLINE included), EMBASE, and the Cochrane Database of Systematic Reviews were searched for relevant original articles in English that explored the correlation between SUV and ADC. After applying Fisher's r-to-z transformation, correlation coefficient (r) values were extracted from each study and 95% confidence intervals (CIs) were calculated. Sensitivity and subgroup analyses based on tumor type were performed to investigate the potential heterogeneity. Forty-nine studies were eligible for the meta-analysis, comprising 1927 patients. The pooled r for all studies was −0.35 (95% CI: −0.42 to −0.28) and exhibited notable heterogeneity (I² = 78.4%; P < 0.01). In the cancer-type subgroup analysis, combined ADC/SUV correlation coefficients ranged from −0.12 (lymphoma, n = 5) to −0.59 (pancreatic cancer, n = 2). We concluded that, on average, there is a negative correlation between ADC and SUV in patients with cancer. Higher correlations were found for brain tumors, cervix carcinoma, and pancreatic cancer. However, a larger, prospective study is warranted to validate these findings in different cancer types.
NASA Astrophysics Data System (ADS)
Rouholahnejad, E.; Kirchner, J. W.
2016-12-01
Evapotranspiration (ET) is a key process in land-climate interactions and affects the dynamics of the atmosphere at local and regional scales. In estimating ET, most earth system models average over considerable sub-grid heterogeneity in land surface properties, precipitation (P), and potential evapotranspiration (PET). This spatial averaging could potentially bias ET estimates, due to the nonlinearities in the underlying relationships. In addition, most earth system models ignore lateral redistribution of water within and between grid cells, which could potentially alter both local and regional ET. Here we present a first attempt to quantify the effects of spatial heterogeneity and lateral redistribution on grid-cell-averaged ET as seen from the atmosphere over heterogeneous landscapes. Using a Budyko framework to express ET as a function of P and PET, we quantify how sub-grid heterogeneity affects average ET at the scale of typical earth system model grid cells. We show that averaging over sub-grid heterogeneity in P and PET, as typical earth system models do, leads to overestimates of average ET. We use a similar approach to quantify how lateral redistribution of water could affect average ET, as seen from the atmosphere. We show that where the aridity index P/PET increases with altitude, gravitationally driven lateral redistribution will increase average ET, implying that models that neglect lateral moisture redistribution will underestimate average ET. In contrast, where the aridity index P/PET decreases with altitude, gravitationally driven lateral redistribution will decrease average ET. This approach yields a simple conceptual framework and mathematical expressions for determining whether, and how much, spatial heterogeneity and lateral redistribution can affect regional ET fluxes as seen from the atmosphere. This analysis provides the basis for quantifying heterogeneity and redistribution effects on ET at regional and continental scales, which will be the focus of future work.
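The averaging bias described above follows from the concavity of Budyko-type ET curves (Jensen's inequality). The sketch below uses the classical Budyko curve as an assumed functional form, since the abstract does not specify which curve the authors adopt, together with invented sub-grid P and PET values; it shows numerically that ET evaluated at the grid-mean forcing exceeds the mean of the sub-grid ET values.

```python
# Sub-grid averaging bias in Budyko-type ET estimates.  The classical Budyko
# curve is an assumed functional form and the sub-grid P/PET values are
# invented; the point is only the sign of the bias (Jensen's inequality).
import numpy as np

def budyko_et(p, pet):
    """ET = sqrt(P * PET * tanh(P/PET) * (1 - exp(-PET/P))), classical Budyko curve."""
    return np.sqrt(p * pet * np.tanh(p / pet) * (1.0 - np.exp(-pet / p)))

rng = np.random.default_rng(0)
p = rng.uniform(300.0, 1500.0, size=1000)     # sub-grid precipitation, mm/yr
pet = rng.uniform(500.0, 1800.0, size=1000)   # sub-grid potential ET, mm/yr

et_of_means = budyko_et(p.mean(), pet.mean()) # one coarse cell, averaged forcing
mean_of_et = budyko_et(p, pet).mean()         # average of per-cell ET

print(f"ET(mean P, mean PET) = {et_of_means:.0f} mm/yr")
print(f"mean of sub-grid ET  = {mean_of_et:.0f} mm/yr")
# For a concave Budyko curve the second value is the smaller one, i.e.
# averaging over heterogeneity overestimates ET, consistent with the abstract.
```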
Spatial heterogeneities and variability of karst hydro-systems: insights from geophysics
NASA Astrophysics Data System (ADS)
Champollion, C.; Fores, B.; Lesparre, N.; Frederic, N.
2017-12-01
Heterogeneous systems such as karst or fractured hydro-systems are challenging for both scientists and groundwater resource management. Karst heterogeneities prevent the comparison, and even more so the combination, of data representative of different scales: borehole water levels, for example, can generally not be used directly to interpret spring flow dynamics. The spatial heterogeneity also has an impact on the temporal variability of groundwater transfer and storage. Karst hydro-systems show characteristically nonlinear relations between precipitation amount and discharge at the outlets, with threshold effects and a large variability of groundwater transit times. In this presentation, geophysical field experiments conducted in a karst hydro-system in the south of France are used to investigate groundwater transfer and storage variability at a scale of a few hundred meters. We focus on the added value of both geophysical time-lapse gravity experiments and 2D ERT imaging of the subsurface heterogeneities. Both gravity and ERT results can only be interpreted with large ambiguity or strong a priori assumptions: the relation between resistivity and water content is not unique, and almost no information about the processes can be inferred from the groundwater stock variations alone. The present study demonstrates how the ERT and gravity field experiments can be interpreted together in a coherent scheme with less ambiguity. First the geological and hydro-meteorological context is presented. Then the ERT field experiment, including the processing and the results, is detailed in the section on geophysical imaging of the heterogeneities. The gravity double difference (S2D) time-lapse experiment is described in the section on geophysical monitoring of the temporal variability. The following discussion demonstrates the impact of both experiments on the interpretation in terms of processes and heterogeneities.
NASA Astrophysics Data System (ADS)
Choe, Chol-Ung; Kim, Ryong-Son; Ri, Ji-Song
2017-09-01
We consider a ring of phase oscillators with nonlocal coupling strength and heterogeneous phase lags. We analyze the effects of heterogeneity in the phase lags on the existence and stability of a variety of steady states. A nonlocal coupling with heterogeneous phase lags that allows the system to be solved analytically is suggested and the stability of solutions along the Ott-Antonsen invariant manifold is explored. We present a complete bifurcation diagram for stationary patterns including the uniform drift and modulated drift states as well as chimera state, which reveals that the stable modulated drift state and a continuum of metastable drift states could occur due to the heterogeneity of the phase lags. We verify our theoretical results using the direct numerical simulations of the model system.
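A generic template for the model class studied above, a ring of nonlocally coupled phase oscillators with a heterogeneous (position-dependent) phase lag, is the continuum Kuramoto-Sakaguchi equation written below. The abstract does not give the authors' specific coupling kernel G or lag profile α, so the expression should be read only as the general form such studies start from:

```latex
\frac{\partial \theta(x,t)}{\partial t} \;=\; \omega
  \;-\; \int G(x - x')\,\sin\!\bigl(\theta(x,t) - \theta(x',t) + \alpha(x,x')\bigr)\,\mathrm{d}x'
```

Here G is the nonlocal coupling kernel on the ring and α(x, x') is the heterogeneous phase lag.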