WATCHMAN: A Data Warehouse Intelligent Cache Manager
NASA Technical Reports Server (NTRS)
Scheuermann, Peter; Shim, Junho; Vingralek, Radek
1996-01-01
Data warehouses store large volumes of data which are used frequently by decision support applications. Such applications involve complex queries. Query performance in such an environment is critical because decision support applications often require interactive query response time. Because data warehouses are updated infrequently, it becomes possible to improve query performance by caching sets retrieved by queries in addition to query execution plans. In this paper we report on the design of an intelligent cache manager for sets retrieved by queries called WATCHMAN, which is particularly well suited for data warehousing environment. Our cache manager employs two novel, complementary algorithms for cache replacement and for cache admission. WATCHMAN aims at minimizing query response time and its cache replacement policy swaps out entire retrieved sets of queries instead of individual pages. The cache replacement and admission algorithms make use of a profit metric, which considers for each retrieved set its average rate of reference, its size, and execution cost of the associated query. We report on a performance evaluation based on the TPC-D and Set Query benchmarks. These experiments show that WATCHMAN achieves a substantial performance improvement in a decision support environment when compared to a traditional LRU replacement algorithm.
Relational Algebra in Spatial Decision Support Systems Ontologies.
Diomidous, Marianna; Chardalias, Kostis; Koutonias, Panagiotis; Magnita, Adrianna; Andrianopoulos, Charalampos; Zimeras, Stelios; Mechili, Enkeleint Aggelos
2017-01-01
Decision Support Systems (DSS) is a powerful tool, for facilitates researchers to choose the correct decision based on their final results. Especially in medical cases where doctors could use these systems, to overcome the problem with the clinical misunderstanding. Based on these systems, queries must be constructed based on the particular questions that doctors must answer. In this work, combination between questions and queries would be presented via relational algebra.
GELLO: an object-oriented query and expression language for clinical decision support.
Sordo, Margarita; Ogunyemi, Omolola; Boxwala, Aziz A; Greenes, Robert A
2003-01-01
GELLO is a purpose-specific, object-oriented (OO) query and expression language. GELLO is the result of a concerted effort of the Decision Systems Group (DSG) working with the HL7 Clinical Decision Support Technical Committee (CDSTC) to provide the HL7 community with a common format for data encoding and manipulation. GELLO will soon be submitted for ballot to the HL7 CDSTC for consideration as a standard.
In-context query reformulation for failing SPARQL queries
NASA Astrophysics Data System (ADS)
Viswanathan, Amar; Michaelis, James R.; Cassidy, Taylor; de Mel, Geeth; Hendler, James
2017-05-01
Knowledge bases for decision support systems are growing increasingly complex, through continued advances in data ingest and management approaches. However, humans do not possess the cognitive capabilities to retain a bird's-eyeview of such knowledge bases, and may end up issuing unsatisfiable queries to such systems. This work focuses on the implementation of a query reformulation approach for graph-based knowledge bases, specifically designed to support the Resource Description Framework (RDF). The reformulation approach presented is instance-and schema-aware. Thus, in contrast to relaxation techniques found in the state-of-the-art, the presented approach produces in-context query reformulation.
An Experimental Investigation of Complexity in Database Query Formulation Tasks
ERIC Educational Resources Information Center
Casterella, Gretchen Irwin; Vijayasarathy, Leo
2013-01-01
Information Technology professionals and other knowledge workers rely on their ability to extract data from organizational databases to respond to business questions and support decision making. Structured query language (SQL) is the standard programming language for querying data in relational databases, and SQL skills are in high demand and are…
Flexible Decision Support in Device-Saturated Environments
2003-10-01
also output tuples to a remote MySQL or Postgres database. 3.3 GUI The GUI allows the user to pose queries using SQL and to display query...DatabaseConnection.java – handles connections to an external database (such as MySQL or Postgres ). • Debug.java – contains the code for printing out Debug messages...also provided. It is possible to output the results of queries to a MySQL or Postgres database for archival and the GUI can query those results
Secure Skyline Queries on Cloud Platform.
Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian
2017-04-01
Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.
Xiao, Fuyuan; Aritsugi, Masayoshi; Wang, Qing; Zhang, Rong
2016-09-01
For efficient and sophisticated analysis of complex event patterns that appear in streams of big data from health care information systems and support for decision-making, a triaxial hierarchical model is proposed in this paper. Our triaxial hierarchical model is developed by focusing on hierarchies among nested event pattern queries with an event concept hierarchy, thereby allowing us to identify the relationships among the expressions and sub-expressions of the queries extensively. We devise a cost-based heuristic by means of the triaxial hierarchical model to find an optimised query execution plan in terms of the costs of both the operators and the communications between them. According to the triaxial hierarchical model, we can also calculate how to reuse the results of the common sub-expressions in multiple queries. By integrating the optimised query execution plan with the reuse schemes, a multi-query optimisation strategy is developed to accomplish efficient processing of multiple nested event pattern queries. We present empirical studies in which the performance of multi-query optimisation strategy was examined under various stream input rates and workloads. Specifically, the workloads of pattern queries can be used for supporting monitoring patients' conditions. On the other hand, experiments with varying input rates of streams can correspond to changes of the numbers of patients that a system should manage, whereas burst input rates can correspond to changes of rushes of patients to be taken care of. The experimental results have shown that, in Workload 1, our proposal can improve about 4 and 2 times throughput comparing with the relative works, respectively; in Workload 2, our proposal can improve about 3 and 2 times throughput comparing with the relative works, respectively; in Workload 3, our proposal can improve about 6 times throughput comparing with the relative work. The experimental results demonstrated that our proposal was able to process complex queries efficiently which can support health information systems and further decision-making. Copyright © 2016 Elsevier B.V. All rights reserved.
2008-03-01
Fortunately, built into Excel is the capability to use ActiveX Data Objects (ADO), a software feature which uses VBA to interface with external...part of Excel’s ActiveX Direct Objects (ADO) functionality, Excel can execute SQL queries in Access with VBA. An SQL query statement can be written
Morrison, James J; Hostetter, Jason; Wang, Kenneth; Siegel, Eliot L
2015-02-01
Real-time mining of large research trial datasets enables development of case-based clinical decision support tools. Several applicable research datasets exist including the National Lung Screening Trial (NLST), a dataset unparalleled in size and scope for studying population-based lung cancer screening. Using these data, a clinical decision support tool was developed which matches patient demographics and lung nodule characteristics to a cohort of similar patients. The NLST dataset was converted into Structured Query Language (SQL) tables hosted on a web server, and a web-based JavaScript application was developed which performs real-time queries. JavaScript is used for both the server-side and client-side language, allowing for rapid development of a robust client interface and server-side data layer. Real-time data mining of user-specified patient cohorts achieved a rapid return of cohort cancer statistics and lung nodule distribution information. This system demonstrates the potential of individualized real-time data mining using large high-quality clinical trial datasets to drive evidence-based clinical decision-making.
Representing and querying now-relative relational medical data.
Anselma, Luca; Piovesan, Luca; Stantic, Bela; Terenziani, Paolo
2018-03-01
Temporal information plays a crucial role in medicine. Patients' clinical records are intrinsically temporal. Thus, in Medical Informatics there is an increasing need to store, support and query temporal data (particularly in relational databases), in order, for instance, to supplement decision-support systems. In this paper, we show that current approaches to relational data have remarkable limitations in the treatment of "now-relative" data (i.e., data holding true at the current time). This can severely compromise their applicability in general, and specifically in the medical context, where "now-relative" data are essential to assess the current status of the patients. We propose a theoretically grounded and application-independent relational approach to cope with now-relative data (which can be paired, e.g., with different decision support systems) overcoming such limitations. We propose a new temporal relational representation, which is the first relational model coping with the temporal indeterminacy intrinsic in now-relative data. We also propose new temporal algebraic operators to query them, supporting the distinction between possible and necessary time, and Allen's temporal relations between data. We exemplify the impact of our approach, and study the theoretical and computational properties of the new representation and algebra. Copyright © 2018 Elsevier B.V. All rights reserved.
Secure Skyline Queries on Cloud Platform
Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian
2017-01-01
Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions. PMID:28883710
A Participants' DSS for a Management Game with a DSS Generator.
ERIC Educational Resources Information Center
Yeo, Gee Kin; Nah, Fui Hoon
1992-01-01
Describes the design of a decision support system (DSS) for a management game called MAGNUS (Management Game for National University of Singapore). Built-in models for performance analysis and decision making are explained; database query and model building are described; and future work is discussed. (11 references) (LRW)
EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hasan, S. M. Shamimul; Fox, Edward A.; Bisset, Keith
Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models as well as the data produced by these models presents unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy to use decision support and analytical environments. Asmore » a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK—a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks—aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated. A query bank is developed—the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using SPARQL Protocol and RDF Query Language (SPARQL) over the EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline: from model construction to simulation output. As a result, we show that the performance of benchmark queries varies significantly with respect to the choice of hardware underlying the database and resource description framework (RDF) engine.« less
EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases
Hasan, S. M. Shamimul; Fox, Edward A.; Bisset, Keith; ...
2017-11-06
Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models as well as the data produced by these models presents unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy to use decision support and analytical environments. Asmore » a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK—a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks—aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated. A query bank is developed—the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using SPARQL Protocol and RDF Query Language (SPARQL) over the EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline: from model construction to simulation output. As a result, we show that the performance of benchmark queries varies significantly with respect to the choice of hardware underlying the database and resource description framework (RDF) engine.« less
A data analysis expert system for large established distributed databases
NASA Technical Reports Server (NTRS)
Gnacek, Anne-Marie; An, Y. Kim; Ryan, J. Patrick
1987-01-01
A design for a natural language database interface system, called the Deductively Augmented NASA Management Decision support System (DANMDS), is presented. The DANMDS system components have been chosen on the basis of the following considerations: maximal employment of the existing NASA IBM-PC computers and supporting software; local structuring and storing of external data via the entity-relationship model; a natural easy-to-use error-free database query language; user ability to alter query language vocabulary and data analysis heuristic; and significant artificial intelligence data analysis heuristic techniques that allow the system to become progressively and automatically more useful.
A weight based genetic algorithm for selecting views
NASA Astrophysics Data System (ADS)
Talebian, Seyed H.; Kareem, Sameem A.
2013-03-01
Data warehouse is a technology designed for supporting decision making. Data warehouse is made by extracting large amount of data from different operational systems; transforming it to a consistent form and loading it to the central repository. The type of queries in data warehouse environment differs from those in operational systems. In contrast to operational systems, the analytical queries that are issued in data warehouses involve summarization of large volume of data and therefore in normal circumstance take a long time to be answered. On the other hand, the result of these queries must be answered in a short time to enable managers to make decisions as short time as possible. As a result, an essential need in this environment is in improving the performances of queries. One of the most popular methods to do this task is utilizing pre-computed result of queries. In this method, whenever a new query is submitted by the user instead of calculating the query on the fly through a large underlying database, the pre-computed result or views are used to answer the queries. Although, the ideal option would be pre-computing and saving all possible views, but, in practice due to disk space constraint and overhead due to view updates it is not considered as a feasible choice. Therefore, we need to select a subset of possible views to save on disk. The problem of selecting the right subset of views is considered as an important challenge in data warehousing. In this paper we suggest a Weighted Based Genetic Algorithm (WBGA) for solving the view selection problem with two objectives.
2015-11-20
noticed erythematous rash on the girl’s trunk. Physical examination reveals strawberry red tongue, red and cracked lips, and swollen red hands. The whites...as erythematous rash, and test results, such as revealed swollen red hands and strawberry red tongue, as well as indicates a possible di- agnosis...such as strawberry tongue (also known as Kawasaki disease). Although such queries are fairly long, only a fraction of concepts corresponding to an in
A similarity-based data warehousing environment for medical images.
Teixeira, Jefferson William; Annibal, Luana Peixoto; Felipe, Joaquim Cezar; Ciferri, Ricardo Rodrigues; Ciferri, Cristina Dutra de Aguiar
2015-11-01
A core issue of the decision-making process in the medical field is to support the execution of analytical (OLAP) similarity queries over images in data warehousing environments. In this paper, we focus on this issue. We propose imageDWE, a non-conventional data warehousing environment that enables the storage of intrinsic features taken from medical images in a data warehouse and supports OLAP similarity queries over them. To comply with this goal, we introduce the concept of perceptual layer, which is an abstraction used to represent an image dataset according to a given feature descriptor in order to enable similarity search. Based on this concept, we propose the imageDW, an extended data warehouse with dimension tables specifically designed to support one or more perceptual layers. We also detail how to build an imageDW and how to load image data into it. Furthermore, we show how to process OLAP similarity queries composed of a conventional predicate and a similarity search predicate that encompasses the specification of one or more perceptual layers. Moreover, we introduce an index technique to improve the OLAP query processing over images. We carried out performance tests over a data warehouse environment that consolidated medical images from exams of several modalities. The results demonstrated the feasibility and efficiency of our proposed imageDWE to manage images and to process OLAP similarity queries. The results also demonstrated that the use of the proposed index technique guaranteed a great improvement in query processing. Copyright © 2015 Elsevier Ltd. All rights reserved.
Downing, N Lance; Adler-Milstein, Julia; Palma, Jonathan P; Lane, Steven; Eisenberg, Matthew; Sharp, Christopher; Longhurst, Christopher A
2017-01-01
Provider organizations increasingly have the ability to exchange patient health information electronically. Organizational health information exchange (HIE) policy decisions can impact the extent to which external information is readily available to providers, but this relationship has not been well studied. Our objective was to examine the relationship between electronic exchange of patient health information across organizations and organizational HIE policy decisions. We focused on 2 key decisions: whether to automatically search for information from other organizations and whether to require HIE-specific patient consent. We conducted a retrospective time series analysis of the effect of automatic querying and the patient consent requirement on the monthly volume of clinical summaries exchanged. We could not assess degree of use or usefulness of summaries, organizational decision-making processes, or generalizability to other vendors. Between 2013 and 2015, clinical summary exchange volume increased by 1349% across 11 organizations. Nine of the 11 systems were set up to enable auto-querying, and auto-querying was associated with a significant increase in the monthly rate of exchange (P = .006 for change in trend). Seven of the 11 organizations did not require patient consent specifically for HIE, and these organizations experienced a greater increase in volume of exchange over time compared to organizations that required consent. Automatic querying and limited consent requirements are organizational HIE policy decisions that impact the volume of exchange, and ultimately the information available to providers to support optimal care. Future efforts to ensure effective HIE may need to explicitly address these factors. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Designing Tools for Supporting User Decision-Making in e-Commerce
NASA Astrophysics Data System (ADS)
Sutcliffe, Alistair; Al-Qaed, Faisal
The paper describes a set of tools designed to support a variety of user decision-making strategies. The tools are complemented by an online advisor so they can be adapted to different domains and users can be guided to adopt appropriate tools for different choices in e-commerce, e.g. purchasing high-value products, exploring product fit to users’ needs, or selecting products which satisfy requirements. The tools range from simple recommenders to decision support by interactive querying and comparison matrices. They were evaluated in a scenario-based experiment which varied the users’ task and motivation, with and without an advisor agent. The results show the tools and advisor were effective in supporting users and agreed with the predictions of ADM (adaptive decision making) theory, on which the design of the tools was based.
Supporting diagnosis and treatment in medical care based on Big Data processing.
Lupşe, Oana-Sorina; Crişan-Vida, Mihaela; Stoicu-Tivadar, Lăcrămioara; Bernard, Elena
2014-01-01
With information and data in all domains growing every day, it is difficult to manage and extract useful knowledge for specific situations. This paper presents an integrated system architecture to support the activity in the Ob-Gin departments with further developments in using new technology to manage Big Data processing - using Google BigQuery - in the medical domain. The data collected and processed with Google BigQuery results from different sources: two Obstetrics & Gynaecology Departments, the TreatSuggest application - an application for suggesting treatments, and a home foetal surveillance system. Data is uploaded in Google BigQuery from Bega Hospital Timişoara, Romania. The analysed data is useful for the medical staff, researchers and statisticians from public health domain. The current work describes the technological architecture and its processing possibilities that in the future will be proved based on quality criteria to lead to a better decision process in diagnosis and public health.
Multidimensional indexing structure for use with linear optimization queries
NASA Technical Reports Server (NTRS)
Bergman, Lawrence David (Inventor); Castelli, Vittorio (Inventor); Chang, Yuan-Chi (Inventor); Li, Chung-Sheng (Inventor); Smith, John Richard (Inventor)
2002-01-01
Linear optimization queries, which usually arise in various decision support and resource planning applications, are queries that retrieve top N data records (where N is an integer greater than zero) which satisfy a specific optimization criterion. The optimization criterion is to either maximize or minimize a linear equation. The coefficients of the linear equation are given at query time. Methods and apparatus are disclosed for constructing, maintaining and utilizing a multidimensional indexing structure of database records to improve the execution speed of linear optimization queries. Database records with numerical attributes are organized into a number of layers and each layer represents a geometric structure called convex hull. Such linear optimization queries are processed by searching from the outer-most layer of this multi-layer indexing structure inwards. At least one record per layer will satisfy the query criterion and the number of layers needed to be searched depends on the spatial distribution of records, the query-issued linear coefficients, and N, the number of records to be returned. When N is small compared to the total size of the database, answering the query typically requires searching only a small fraction of all relevant records, resulting in a tremendous speedup as compared to linearly scanning the entire dataset.
Stream traffic data archival, querying, and analysis with TransDec.
DOT National Transportation Integrated Search
2011-01-01
The goal of research was to extend the traffic data analysis of the TransDec (short for : Transportation Decision-Making) system, which was developed under METRANS 09-26 : research grant. The TransDec system is a real-data driven system to support de...
Hynes, Denise M.; Perrin, Ruth A.; Rappaport, Steven; Stevens, Joanne M.; Demakis, John G.
2004-01-01
Information systems are increasingly important for measuring and improving health care quality. A number of integrated health care delivery systems use advanced information systems and integrated decision support to carry out quality assurance activities, but none as large as the Veterans Health Administration (VHA). The VHA's Quality Enhancement Research Initiative (QUERI) is a large-scale, multidisciplinary quality improvement initiative designed to ensure excellence in all areas where VHA provides health care services, including inpatient, outpatient, and long-term care settings. In this paper, we describe the role of information systems in the VHA QUERI process, highlight the major information systems critical to this quality improvement process, and discuss issues associated with the use of these systems. PMID:15187063
Query Modification through External Sources to Support Clinical Decisions
2014-11-01
takes no medications. Physical examination is normal. The EKG shows nonspecific changes. Summary 58-year-old woman with hypertension and obesity presents...algorithm for suffix stripping. Program, 14:130–137, 1980. Reprinted in Readings in Information Retrieval, pages 313–316, 1997. M. S. Simpson, E
Tu, Samson W; Hrabak, Karen M; Campbell, James R; Glasgow, Julie; Nyman, Mark A; McClure, Robert; McClay, James; Abarbanel, Robert; Mansfield, James G; Martins, Susana M; Goldstein, Mary K; Musen, Mark A
2006-01-01
Developing computer-interpretable clinical practice guidelines (CPGs) to provide decision support for guideline-based care is an extremely labor-intensive task. In the EON/ATHENA and SAGE projects, we formulated substantial portions of CPGs as computable statements that express declarative relationships between patient conditions and possible interventions. We developed query and expression languages that allow a decision-support system (DSS) to evaluate these statements in specific patient situations. A DSS can use these guideline statements in multiple ways, including: (1) as inputs for determining preferred alternatives in decision-making, and (2) as a way to provide targeted commentaries in the clinical information system. The use of these declarative statements significantly reduces the modeling expertise and effort required to create and maintain computer-interpretable knowledge bases for decision-support purpose. We discuss possible implications for sharing of such knowledge bases.
Archetype-based data warehouse environment to enable the reuse of electronic health record data.
Marco-Ruiz, Luis; Moner, David; Maldonado, José A; Kolstrup, Nils; Bellika, Johan G
2015-09-01
The reuse of data captured during health care delivery is essential to satisfy the demands of clinical research and clinical decision support systems. A main barrier for the reuse is the existence of legacy formats of data and the high granularity of it when stored in an electronic health record (EHR) system. Thus, we need mechanisms to standardize, aggregate, and query data concealed in the EHRs, to allow their reuse whenever they are needed. To create a data warehouse infrastructure using archetype-based technologies, standards and query languages to enable the interoperability needed for data reuse. The work presented makes use of best of breed archetype-based data transformation and storage technologies to create a workflow for the modeling, extraction, transformation and load of EHR proprietary data into standardized data repositories. We converted legacy data and performed patient-centered aggregations via archetype-based transformations. Later, specific purpose aggregations were performed at a query level for particular use cases. Laboratory test results of a population of 230,000 patients belonging to Troms and Finnmark counties in Norway requested between January 2013 and November 2014 have been standardized. Test records normalization has been performed by defining transformation and aggregation functions between the laboratory records and an archetype. These mappings were used to automatically generate open EHR compliant data. These data were loaded into an archetype-based data warehouse. Once loaded, we defined indicators linked to the data in the warehouse to monitor test activity of Salmonella and Pertussis using the archetype query language. Archetype-based standards and technologies can be used to create a data warehouse environment that enables data from EHR systems to be reused in clinical research and decision support systems. With this approach, existing EHR data becomes available in a standardized and interoperable format, thus opening a world of possibilities toward semantic or concept-based reuse, query and communication of clinical data. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Identifying QT prolongation from ECG impressions using a general-purpose Natural Language Processor
Denny, Joshua C.; Miller, Randolph A.; Waitman, Lemuel Russell; Arrieta, Mark; Peterson, Joshua F.
2009-01-01
Objective Typically detected via electrocardiograms (ECGs), QT interval prolongation is a known risk factor for sudden cardiac death. Since medications can promote or exacerbate the condition, detection of QT interval prolongation is important for clinical decision support. We investigated the accuracy of natural language processing (NLP) for identifying QT prolongation from cardiologist-generated, free-text ECG impressions compared to corrected QT (QTc) thresholds reported by ECG machines. Methods After integrating negation detection to a locally-developed natural language processor, the KnowledgeMap concept identifier, we evaluated NLP-based detection of QT prolongation compared to the calculated QTc on a set of 44,318 ECGs obtained from hospitalized patients. We also created a string query using regular expressions to identify QT prolongation. We calculated sensitivity and specificity of the methods using manual physician review of the cardiologist-generated reports as the gold standard. To investigate causes of “false positive” calculated QTc, we manually reviewed randomly selected ECGs with a long calculated QTc but no mention of QT prolongation. Separately, we validated the performance of the negation detection algorithm on 5,000 manually-categorized ECG phrases for any medical concept (not limited to QT prolongation) prior to developing the NLP query for QT prolongation. Results The NLP query for QT prolongation correctly identified 2,364 of 2,373 ECGs with QT prolongation with a sensitivity of 0.996 and a positive predictive value of 1.000. There were no false positives. The regular expression query had a sensitivity of 0.999 and positive predictive value of 0.982. In contrast, the positive predictive value of common QTc thresholds derived from ECG machines was 0.07–0.25 with corresponding sensitivities of 0.994–0.046. The negation detection algorithm had a recall of 0.973 and precision of 0.982 for 10,490 concepts found within ECG impressions. Conclusions NLP and regular expression queries of cardiologists’ ECG interpretations can more effectively identify QT prolongation than the automated QTc intervals reported by ECG machines. Future clinical decision support could employ NLP queries to detect QTc prolongation and other reported ECG abnormalities. PMID:18938105
2014-11-01
for 6 months. Median performance for this topic was relatively low, despite being an easy diagnosis of hypothyroidism for a medical expert. However... hypothyroidism was ranked 3rd in the retrieval results. Without boosting, the highest-ranked article on hypothyroidism was ranked 16th. In contrast, this
ERIC Educational Resources Information Center
Mirel, Barbara
2001-01-01
Conducts a scenario-based usability test with 10 data analysts using visual querying (visually analyzing data with interactive graphics). Details a range of difficulties found in visual selection that, at times, gave rise to inaccurate selections, invalid conclusions, and misguided decisions. Argues that support for visual selection must be built…
A Note on Interfacing Object Warehouses and Mass Storage Systems for Data Mining Applications
NASA Technical Reports Server (NTRS)
Grossman, Robert L.; Northcutt, Dave
1996-01-01
Data mining is the automatic discovery of patterns, associations, and anomalies in data sets. Data mining requires numerically and statistically intensive queries. Our assumption is that data mining requires a specialized data management infrastructure to support the aforementioned intensive queries, but because of the sizes of data involved, this infrastructure is layered over a hierarchical storage system. In this paper, we discuss the architecture of a system which is layered for modularity, but exploits specialized lightweight services to maintain efficiency. Rather than use a full functioned database for example, we use light weight object services specialized for data mining. We propose using information repositories between layers so that components on either side of the layer can access information in the repositories to assist in making decisions about data layout, the caching and migration of data, the scheduling of queries, and related matters.
Tiede, Dirk; Baraldi, Andrea; Sudmanns, Martin; Belgiu, Mariana; Lang, Stefan
2017-01-01
ABSTRACT Spatiotemporal analytics of multi-source Earth observation (EO) big data is a pre-condition for semantic content-based image retrieval (SCBIR). As a proof of concept, an innovative EO semantic querying (EO-SQ) subsystem was designed and prototypically implemented in series with an EO image understanding (EO-IU) subsystem. The EO-IU subsystem is automatically generating ESA Level 2 products (scene classification map, up to basic land cover units) from optical satellite data. The EO-SQ subsystem comprises a graphical user interface (GUI) and an array database embedded in a client server model. In the array database, all EO images are stored as a space-time data cube together with their Level 2 products generated by the EO-IU subsystem. The GUI allows users to (a) develop a conceptual world model based on a graphically supported query pipeline as a combination of spatial and temporal operators and/or standard algorithms and (b) create, save and share within the client-server architecture complex semantic queries/decision rules, suitable for SCBIR and/or spatiotemporal EO image analytics, consistent with the conceptual world model. PMID:29098143
Hynes, Denise M; Weddle, Timothy; Smith, Nina; Whittier, Erika; Atkins, David; Francis, Joseph
2010-01-01
As the Department of Veterans Affairs (VA) Health Services Research and Development Service's Quality Enhancement Research Initiative (QUERI) has progressed, health information technology (HIT) has occupied a crucial role in implementation research projects. We evaluated the role of HIT in VA QUERI implementation research, including HIT use and development, the contributions implementation research has made to HIT development, and HIT-related barriers and facilitators to implementation research. Key informants from nine disease-specific QUERI Centers. Documentation analysis of 86 implementation project abstracts followed up by semi-structured interviews with key informants from each of the nine QUERI centers. We used qualitative and descriptive analyses. We found: (1) HIT provided data and information to facilitate implementation research, (2) implementation research helped to further HIT development in a variety of uses including the development of clinical decision support systems (23 of 86 implementation research projects), and (3) common HIT barriers to implementation research existed but could be overcome by collaborations with clinical and administrative leadership. Our review of the implementation research progress in the VA revealed interdependency on an HIT infrastructure and research-based development. Collaboration with multiple stakeholders is a key factor in successful use and development of HIT in implementation research efforts and in advancing evidence-based practice.
An Intelligent Polar Cyberinfrastrucuture to Support Spatiotemporal Decision Making
NASA Astrophysics Data System (ADS)
Song, M.; Li, W.; Zhou, X.
2014-12-01
In the era of big data, polar sciences have already faced an urgent demand of utilizing intelligent approaches to support precise and effective spatiotemporal decision-making. Service-oriented cyberinfrastructure has advantages of seamlessly integrating distributed computing resources, and aggregating a variety of geospatial data derived from Earth observation network. This paper focuses on building a smart service-oriented cyberinfrastructure to support intelligent question answering related to polar datasets. The innovation of this polar cyberinfrastructure includes: (1) a problem-solving environment that parses geospatial question in natural language, builds geoprocessing rules, composites atomic processing services and executes the entire workflow; (2) a self-adaptive spatiotemporal filter that is capable of refining query constraints through semantic analysis; (3) a dynamic visualization strategy to support results animation and statistics in multiple spatial reference systems; and (4) a user-friendly online portal to support collaborative decision-making. By means of this polar cyberinfrastructure, we intend to facilitate integration of distributed and heterogeneous Arctic datasets and comprehensive analysis of multiple environmental elements (e.g. snow, ice, permafrost) to provide a better understanding of the environmental variation in circumpolar regions.
Representation and Integration of Scientific Information
NASA Technical Reports Server (NTRS)
1998-01-01
The objective of this Joint Research Interchange with NASA-Ames was to investigate how the Tsimmis technology could be used to represent and integrate scientific information. The main goal of the Tsimmis project is to allow a decision maker to find information of interest from such sources, fuse it, and process it (e.g., summarize it, visualize it, discover trends). Another important goal is the easy incorporation of new sources, as well the ability to deal with sources whose structure or services evolve. During the Interchange we had research meetings approximately every month or two. The funds provided by NASA supported work that lead to the following two papers: Fusion Queries over Internet Databases; Efficient Query Subscription Processing in a Multicast Environment.
An RDF/OWL knowledge base for query answering and decision support in clinical pharmacogenetics.
Samwald, Matthias; Freimuth, Robert; Luciano, Joanne S; Lin, Simon; Powers, Robert L; Marshall, M Scott; Adlassnig, Klaus-Peter; Dumontier, Michel; Boyce, Richard D
2013-01-01
Genetic testing for personalizing pharmacotherapy is bound to become an important part of clinical routine. To address associated issues with data management and quality, we are creating a semantic knowledge base for clinical pharmacogenetics. The knowledge base is made up of three components: an expressive ontology formalized in the Web Ontology Language (OWL 2 DL), a Resource Description Framework (RDF) model for capturing detailed results of manual annotation of pharmacogenomic information in drug product labels, and an RDF conversion of relevant biomedical datasets. Our work goes beyond the state of the art in that it makes both automated reasoning as well as query answering as simple as possible, and the reasoning capabilities go beyond the capabilities of previously described ontologies.
NASA Astrophysics Data System (ADS)
Kase, Sue E.; Vanni, Michelle; Knight, Joanne A.; Su, Yu; Yan, Xifeng
2016-05-01
Within operational environments decisions must be made quickly based on the information available. Identifying an appropriate knowledge base and accurately formulating a search query are critical tasks for decision-making effectiveness in dynamic situations. The spreading of graph data management tools to access large graph databases is a rapidly emerging research area of potential benefit to the intelligence community. A graph representation provides a natural way of modeling data in a wide variety of domains. Graph structures use nodes, edges, and properties to represent and store data. This research investigates the advantages of information search by graph query initiated by the analyst and interactively refined within the contextual dimensions of the answer space toward a solution. The paper introduces SLQ, a user-friendly graph querying system enabling the visual formulation of schemaless and structureless graph queries. SLQ is demonstrated with an intelligence analyst information search scenario focused on identifying individuals responsible for manufacturing a mosquito-hosted deadly virus. The scenario highlights the interactive construction of graph queries without prior training in complex query languages or graph databases, intuitive navigation through the problem space, and visualization of results in graphical format.
Quality of online information to support patient decision-making in breast cancer surgery.
Bruce, Jordan G; Tucholka, Jennifer L; Steffens, Nicole M; Neuman, Heather B
2015-11-01
Breast cancer patients commonly use the internet as an information resource. Our objective was to evaluate the quality of online information available to support patients facing a decision for breast surgery. Breast cancer surgery-related queries were performed (Google and Bing), and reviewed for content pertinent to breast cancer surgery. The DISCERN instrument was used to evaluate websites' structural components that influence publication reliability and ability of information to support treatment decision-making. Scores of 4/5 were considered "good." 45 unique websites were identified. Websites satisfied a median 5/9 content questions. Commonly omitted topics included: having a choice between breast conservation and mastectomy (67%) and potential for 2nd surgery to obtain negative margins after breast conservation (60%). Websites had a median DISCERN score of 2.9 (range 2.0-4.5). Websites achieved higher scores on structural criteria (median 3.6 [2.1-4.7]), with 24% rated as "good." Scores on supporting decision-making questions were lower (2.6 [1.3-4.4]), with only 7% scoring "good." Although numerous breast cancer-related websites exist, most do a poor job providing women with essential information necessary to actively participate in decision-making for breast cancer surgery. Providing easily- accessible, high-quality online information has the potential to significantly improve patients' experiences with decision-making. © 2015 Wiley Periodicals, Inc.
Visual exploration of big spatio-temporal urban data: a study of New York City taxi trips.
Ferreira, Nivan; Poco, Jorge; Vo, Huy T; Freire, Juliana; Silva, Cláudio T
2013-12-01
As increasing volumes of urban data are captured and become available, new opportunities arise for data-driven analysis that can lead to improvements in the lives of citizens through evidence-based decision making and policies. In this paper, we focus on a particularly important urban data set: taxi trips. Taxis are valuable sensors and information associated with taxi trips can provide unprecedented insight into many different aspects of city life, from economic activity and human behavior to mobility patterns. But analyzing these data presents many challenges. The data are complex, containing geographical and temporal components in addition to multiple variables associated with each trip. Consequently, it is hard to specify exploratory queries and to perform comparative analyses (e.g., compare different regions over time). This problem is compounded due to the size of the data-there are on average 500,000 taxi trips each day in NYC. We propose a new model that allows users to visually query taxi trips. Besides standard analytics queries, the model supports origin-destination queries that enable the study of mobility across the city. We show that this model is able to express a wide range of spatio-temporal queries, and it is also flexible in that not only can queries be composed but also different aggregations and visual representations can be applied, allowing users to explore and compare results. We have built a scalable system that implements this model which supports interactive response times; makes use of an adaptive level-of-detail rendering strategy to generate clutter-free visualization for large results; and shows hidden details to the users in a summary through the use of overlay heat maps. We present a series of case studies motivated by traffic engineers and economists that show how our model and system enable domain experts to perform tasks that were previously unattainable for them.
Mohammadhassanzadeh, Hossein; Van Woensel, William; Abidi, Samina Raza; Abidi, Syed Sibte Raza
2017-01-01
Capturing complete medical knowledge is challenging-often due to incomplete patient Electronic Health Records (EHR), but also because of valuable, tacit medical knowledge hidden away in physicians' experiences. To extend the coverage of incomplete medical knowledge-based systems beyond their deductive closure, and thus enhance their decision-support capabilities, we argue that innovative, multi-strategy reasoning approaches should be applied. In particular, plausible reasoning mechanisms apply patterns from human thought processes, such as generalization, similarity and interpolation, based on attributional, hierarchical, and relational knowledge. Plausible reasoning mechanisms include inductive reasoning , which generalizes the commonalities among the data to induce new rules, and analogical reasoning , which is guided by data similarities to infer new facts. By further leveraging rich, biomedical Semantic Web ontologies to represent medical knowledge, both known and tentative, we increase the accuracy and expressivity of plausible reasoning, and cope with issues such as data heterogeneity, inconsistency and interoperability. In this paper, we present a Semantic Web-based, multi-strategy reasoning approach, which integrates deductive and plausible reasoning and exploits Semantic Web technology to solve complex clinical decision support queries. We evaluated our system using a real-world medical dataset of patients with hepatitis, from which we randomly removed different percentages of data (5%, 10%, 15%, and 20%) to reflect scenarios with increasing amounts of incomplete medical knowledge. To increase the reliability of the results, we generated 5 independent datasets for each percentage of missing values, which resulted in 20 experimental datasets (in addition to the original dataset). The results show that plausibly inferred knowledge extends the coverage of the knowledge base by, on average, 2%, 7%, 12%, and 16% for datasets with, respectively, 5%, 10%, 15%, and 20% of missing values. This expansion in the KB coverage allowed solving complex disease diagnostic queries that were previously unresolvable, without losing the correctness of the answers. However, compared to deductive reasoning, data-intensive plausible reasoning mechanisms yield a significant performance overhead. We observed that plausible reasoning approaches, by generating tentative inferences and leveraging domain knowledge of experts, allow us to extend the coverage of medical knowledge bases, resulting in improved clinical decision support. Second, by leveraging OWL ontological knowledge, we are able to increase the expressivity and accuracy of plausible reasoning methods. Third, our approach is applicable to clinical decision support systems for a range of chronic diseases.
Salehi, Ali; Jimenez-Berni, Jose; Deery, David M; Palmer, Doug; Holland, Edward; Rozas-Larraondo, Pablo; Chapman, Scott C; Georgakopoulos, Dimitrios; Furbank, Robert T
2015-01-01
To our knowledge, there is no software or database solution that supports large volumes of biological time series sensor data efficiently and enables data visualization and analysis in real time. Existing solutions for managing data typically use unstructured file systems or relational databases. These systems are not designed to provide instantaneous response to user queries. Furthermore, they do not support rapid data analysis and visualization to enable interactive experiments. In large scale experiments, this behaviour slows research discovery, discourages the widespread sharing and reuse of data that could otherwise inform critical decisions in a timely manner and encourage effective collaboration between groups. In this paper we present SensorDB, a web based virtual laboratory that can manage large volumes of biological time series sensor data while supporting rapid data queries and real-time user interaction. SensorDB is sensor agnostic and uses web-based, state-of-the-art cloud and storage technologies to efficiently gather, analyse and visualize data. Collaboration and data sharing between different agencies and groups is thereby facilitated. SensorDB is available online at http://sensordb.csiro.au.
Adaptive Peircean decision aid project summary assessments.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Senglaub, Michael E.
2007-01-01
This efforts objective was to identify and hybridize a suite of technologies enabling the development of predictive decision aids for use principally in combat environments but also in any complex information terrain. The technologies required included formal concept analysis for knowledge representation and information operations, Peircean reasoning to support hypothesis generation, Mill's's canons to begin defining information operators that support the first two technologies and co-evolutionary game theory to provide the environment/domain to assess predictions from the reasoning engines. The intended application domain is the IED problem because of its inherent evolutionary nature. While a fully functioning integrated algorithm wasmore » not achieved the hybridization and demonstration of the technologies was accomplished and demonstration of utility provided for a number of ancillary queries.« less
EMSE at TREC 2015 Clinical Decision Support Track
2015-11-20
pseudo relevant documents, semantic ressources of UMLS , and a hybrid approach called SMERA that combines LSI and UMLS based approaches. Only three of...approach to query expansion uses ontologies ( UMLS ) and a lo- cal approach based on pseudo relevant feedback documents using LSI. A brief description of...pseudo relevance feedback documents, and a semantic method based on UMLS concepts. The LSI-based method was used only to expand summary terms that can’t
UTD at TREC 2014: Query Expansion for Clinical Decision Support
2014-11-01
Description: A 62-year-old man sees a neurologist for progressive memory loss and jerking movements of the lower ex- tremities. Neurologic examination confirms...infiltration. Summary: 62-year-old man with progressive memory loss and in- voluntary leg movements. Brain MRI reveals cortical atrophy, and cortical...latent topics produced by the Latent Dirichlet Allocation (LDA) on the TREC-CDS corpus of scientific articles. The position of words “ loss ” and “ memory
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.
Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel
2013-08-01
Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive.
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce
Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel
2013-01-01
Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS – a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive. PMID:24187650
Improve Performance of Data Warehouse by Query Cache
NASA Astrophysics Data System (ADS)
Gour, Vishal; Sarangdevot, S. S.; Sharma, Anand; Choudhary, Vinod
2010-11-01
The primary goal of data warehouse is to free the information locked up in the operational database so that decision makers and business analyst can make queries, analysis and planning regardless of the data changes in operational database. As the number of queries is large, therefore, in certain cases there is reasonable probability that same query submitted by the one or multiple users at different times. Each time when query is executed, all the data of warehouse is analyzed to generate the result of that query. In this paper we will study how using query cache improves performance of Data Warehouse and try to find the common problems faced. These kinds of problems are faced by Data Warehouse administrators which are minimizes response time and improves the efficiency of query in data warehouse overall, particularly when data warehouse is updated at regular interval.
Patterson, Sara E; Liu, Rangjiao; Statz, Cara M; Durkin, Daniel; Lakshminarayana, Anuradha; Mockus, Susan M
2016-01-16
Precision medicine in oncology relies on rapid associations between patient-specific variations and targeted therapeutic efficacy. Due to the advancement of genomic analysis, a vast literature characterizing cancer-associated molecular aberrations and relative therapeutic relevance has been published. However, data are not uniformly reported or readily available, and accessing relevant information in a clinically acceptable time-frame is a daunting proposition, hampering connections between patients and appropriate therapeutic options. One important therapeutic avenue for oncology patients is through clinical trials. Accordingly, a global view into the availability of targeted clinical trials would provide insight into strengths and weaknesses and potentially enable research focus. However, data regarding the landscape of clinical trials in oncology is not readily available, and as a result, a comprehensive understanding of clinical trial availability is difficult. To support clinical decision-making, we have developed a data loader and mapper that connects sequence information from oncology patients to data stored in an in-house database, the JAX Clinical Knowledgebase (JAX-CKB), which can be queried readily to access comprehensive data for clinical reporting via customized reporting queries. JAX-CKB functions as a repository to house expertly curated clinically relevant data surrounding our 358-gene panel, the JAX Cancer Treatment Profile (JAX CTP), and supports annotation of functional significance of molecular variants. Through queries of data housed in JAX-CKB, we have analyzed the landscape of clinical trials relevant to our 358-gene targeted sequencing panel to evaluate strengths and weaknesses in current molecular targeting in oncology. Through this analysis, we have identified patient indications, molecular aberrations, and targeted therapy classes that have strong or weak representation in clinical trials. Here, we describe the development and disseminate system methods for associating patient genomic sequence data with clinically relevant information, facilitating interpretation and providing a mechanism for informing therapeutic decision-making. Additionally, through customized queries, we have the capability to rapidly analyze the landscape of targeted therapies in clinical trials, enabling a unique view into current therapeutic availability in oncology.
Kopanitsa, Georgy
2017-05-18
The efficiency and acceptance of clinical decision support systems (CDSS) can increase if they reuse medical data captured during health care delivery. High heterogeneity of the existing legacy data formats has become the main barrier for the reuse of data. Thus, we need to apply data modeling mechanisms that provide standardization, transformation, accumulation and querying medical data to allow its reuse. In this paper, we focus on the interoperability issues of the hospital information systems (HIS) and CDSS data integration. Our study is based on the approach proposed by Marcos et al. where archetypes are used as a standardized mechanism for the interaction of a CDSS with an electronic health record (EHR). We build an integration tool to enable CDSSs collect data from various institutions without a need for modifications in the implementation. The approach implies development of a conceptual level as a set of archetypes representing concepts required by a CDSS. Treatment case data from Regional Clinical Hospital in Tomsk, Russia was extracted, transformed and loaded to the archetype database of a clinical decision support system. Test records' normalization has been performed by defining transformation and aggregation rules between the EHR data and the archetypes. These mapping rules were used to automatically generate openEHR compliant data. After the transformation, archetype data instances were loaded into the CDSS archetype based data storage. The performance times showed acceptable performance for the extraction stage with a mean of 17.428 s per year (3436 case records). The transformation times were also acceptable with 136.954 s per year (0.039 s per one instance). The accuracy evaluation showed the correctness and applicability of the method for the wide range of HISes. These operations were performed without interrupting the HIS workflow to prevent the HISes from disturbing the service provision to the users. The project results have proven that archetype based technologies are mature enough to be applied in routine operations that require extraction, transformation, loading and querying medical data from heterogeneous EHR systems. Inference models in clinical research and CDSS can benefit from this by defining queries to a valid data set with known structure and constraints. The standard based nature of the archetype approach allows an easy integration of CDSSs with existing EHR systems.
Zhang, Yi-Fan; Tian, Yu; Zhou, Tian-Shu; Araki, Kenji; Li, Jing-Song
2016-01-01
The broad adoption of clinical decision support systems within clinical practice has been hampered mainly by the difficulty in expressing domain knowledge and patient data in a unified formalism. This paper presents a semantic-based approach to the unified representation of healthcare domain knowledge and patient data for practical clinical decision making applications. A four-phase knowledge engineering cycle is implemented to develop a semantic healthcare knowledge base based on an HL7 reference information model, including an ontology to model domain knowledge and patient data and an expression repository to encode clinical decision making rules and queries. A semantic clinical decision support system is designed to provide patient-specific healthcare recommendations based on the knowledge base and patient data. The proposed solution is evaluated in the case study of type 2 diabetes mellitus inpatient management. The knowledge base is successfully instantiated with relevant domain knowledge and testing patient data. Ontology-level evaluation confirms model validity. Application-level evaluation of diagnostic accuracy reaches a sensitivity of 97.5%, a specificity of 100%, and a precision of 98%; an acceptance rate of 97.3% is given by domain experts for the recommended care plan orders. The proposed solution has been successfully validated in the case study as providing clinical decision support at a high accuracy and acceptance rate. The evaluation results demonstrate the technical feasibility and application prospect of our approach. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Evolution of Query Optimization Methods
NASA Astrophysics Data System (ADS)
Hameurlain, Abdelkader; Morvan, Franck
Query optimization is the most critical phase in query processing. In this paper, we try to describe synthetically the evolution of query optimization methods from uniprocessor relational database systems to data Grid systems through parallel, distributed and data integration systems. We point out a set of parameters to characterize and compare query optimization methods, mainly: (i) size of the search space, (ii) type of method (static or dynamic), (iii) modification types of execution plans (re-optimization or re-scheduling), (iv) level of modification (intra-operator and/or inter-operator), (v) type of event (estimation errors, delay, user preferences), and (vi) nature of decision-making (centralized or decentralized control).
Information system building of the urban electromagnetic environment
NASA Astrophysics Data System (ADS)
Wang, Jiechen; Rui, Yikang; Shen, Dingtao; Yu, Qing
2007-06-01
The pollution of urban electromagnetic radiation has become more serious, however, there is still lack of a perfect and interactive User System to manage, analyze and issue the information. In this study, taking the electromagnetic environment of Nanjing as an example, an information system based on WebGIS with the techniques of ArcIMS and JSP has been developed, in order to provide the services and technique supports for information query of public and decision making of relevant departments.
2014-11-01
created to serve as idealized representations of actual medical records, and include information such as medical history , current symptoms, diagnosis...NLM Medical Text Indexer (MTI).3 MeSH, or Medical Subject Headings, are terminology used by the NLM to index articles, catalog books, and searching...MeSH- indexed databases such as PubMed. However, since many medical conditions may be expressed in varying terminology , a single representation of a
LIMSI @ 2014 Clinical Decision Support Track
2014-11-01
MeSH and BoW runs) was based on the automatic generation of disease hypotheses for which we used data from OrphaNet [4] and the Disease Symptom Knowledge...with the MeSH terms of the top 5 disease hypotheses generated for the case reports. Compared to the other participants we achieved low scores...clinical question types. Query expansion (for both MeSH and BoW runs) was based on the automatic generation of disease hypotheses for which we used data
Using a Relational Database to Index Infectious Disease Information
Brown, Jay A.
2010-01-01
Mapping medical knowledge into a relational database became possible with the availability of personal computers and user-friendly database software in the early 1990s. To create a database of medical knowledge, the domain expert works like a mapmaker to first outline the domain and then add the details, starting with the most prominent features. The resulting “intelligent database” can support the decisions of healthcare professionals. The intelligent database described in this article contains profiles of 275 infectious diseases. Users can query the database for all diseases matching one or more specific criteria (symptom, endemic region of the world, or epidemiological factor). Epidemiological factors include sources (patients, water, soil, or animals), routes of entry, and insect vectors. Medical and public health professionals could use such a database as a decision-support software tool. PMID:20623018
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coram, Jamie L.; Morrow, James D.; Perkins, David Nikolaus
2015-09-01
This document describes the PANTHER R&D Application, a proof-of-concept user interface application developed under the PANTHER Grand Challenge LDRD. The purpose of the application is to explore interaction models for graph analytics, drive algorithmic improvements from an end-user point of view, and support demonstration of PANTHER technologies to potential customers. The R&D Application implements a graph-centric interaction model that exposes analysts to the algorithms contained within the GeoGraphy graph analytics library. Users define geospatial-temporal semantic graph queries by constructing search templates based on nodes, edges, and the constraints among them. Users then analyze the results of the queries using bothmore » geo-spatial and temporal visualizations. Development of this application has made user experience an explicit driver for project and algorithmic level decisions that will affect how analysts one day make use of PANTHER technologies.« less
NASA Astrophysics Data System (ADS)
Yang, Z.; Han, W.; di, L.
2010-12-01
The National Agricultural Statistics Service (NASS) of the USDA produces the Cropland Data Layer (CDL) product, which is a raster-formatted, geo-referenced, U.S. crop specific land cover classification. These digital data layers are widely used for a variety of applications by universities, research institutions, government agencies, and private industry in climate change studies, environmental ecosystem studies, bioenergy production & transportation planning, environmental health research and agricultural production decision making. The CDL is also used internally by NASS for crop acreage and yield estimation. Like most geospatial data products, the CDL product is only available by CD/DVD delivery or online bulk file downloading via the National Research Conservation Research (NRCS) Geospatial Data Gateway (external users) or in a printed paper map format. There is no online geospatial information access and dissemination, no crop visualization & browsing, no geospatial query capability, nor online analytics. To facilitate the application of this data layer and to help disseminating the data, a web-service based CDL interactive map visualization, dissemination, querying system is proposed. It uses Web service based service oriented architecture, adopts open standard geospatial information science technology and OGC specifications and standards, and re-uses functions/algorithms from GeoBrain Technology (George Mason University developed). This system provides capabilities of on-line geospatial crop information access, query and on-line analytics via interactive maps. It disseminates all data to the decision makers and users via real time retrieval, processing and publishing over the web through standards-based geospatial web services. A CDL region of interest can also be exported directly to Google Earth for mashup or downloaded for use with other desktop application. This web service based system greatly improves equal-accessibility, interoperability, usability, and data visualization, facilitates crop geospatial information usage, and enables US cropland online exploring capability without any client-side software installation. It also greatly reduces the need for paper map and analysis report printing and media usages, and thus enhances low-carbon Agro-geoinformation dissemination for decision support.
A study on spatial decision support systems for HIV/AIDS prevention based on COM GIS technology
NASA Astrophysics Data System (ADS)
Yang, Kun; Luo, Huasong; Peng, Shungyun; Xu, Quanli
2007-06-01
Based on the deeply analysis of the current status and the existing problems of GIS technology applications in Epidemiology, this paper has proposed the method and process for establishing the spatial decision support systems of AIDS epidemic prevention by integrating the COM GIS, Spatial Database, GPS, Remote Sensing, and Communication technologies, as well as ASP and ActiveX software development technologies. One of the most important issues for constructing the spatial decision support systems of AIDS epidemic prevention is how to integrate the AIDS spreading models with GIS. The capabilities of GIS applications in the AIDS epidemic prevention have been described here in this paper firstly. Then some mature epidemic spreading models have also been discussed for extracting the computation parameters. Furthermore, a technical schema has been proposed for integrating the AIDS spreading models with GIS and relevant geospatial technologies, in which the GIS and model running platforms share a common spatial database and the computing results can be spatially visualized on Desktop or Web GIS clients. Finally, a complete solution for establishing the decision support systems of AIDS epidemic prevention has been offered in this paper based on the model integrating methods and ESRI COM GIS software packages. The general decision support systems are composed of data acquisition sub-systems, network communication sub-systems, model integrating sub-systems, AIDS epidemic information spatial database sub-systems, AIDS epidemic information querying and statistical analysis sub-systems, AIDS epidemic dynamic surveillance sub-systems, AIDS epidemic information spatial analysis and decision support sub-systems, as well as AIDS epidemic information publishing sub-systems based on Web GIS.
Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data.
Aji, Ablimit; Wang, Fusheng; Saltz, Joel H
2012-11-06
Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amount of data with fast response, which is faced with two major challenges: the "big data" challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce.
Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data
Aji, Ablimit; Wang, Fusheng; Saltz, Joel H.
2013-01-01
Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amount of data with fast response, which is faced with two major challenges: the “big data” challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce. PMID:24501719
Legisi, Lorena; DeSa, Elise; Qureshi, M Nasar
2016-12-01
Prostate cancer is the most common cancer diagnosed in men in developed countries. Using molecular testing may help to improve outcomes in this clinically challenging group. Since 2011, the Prostate Core Mitomic Test (PCMT), which quantifies a 3.4-kb mitochondrial DNA deletion strongly associated with prostate cancer, has been used by more than 50 urology practices accessing pathology services through our laboratory in New Jersey. However, the use of a molecular test can only be beneficial if it affects patient management and improves outcomes. To determine whether repeated biopsy decision-making was affected in a quantifiable manner through the adjunct use of molecular testing with the PCMT. In this observational study we conducted 2 independent, structured query language database queries of our patient records at our laboratory, QDx Pathology Services, in Cranford, NJ. Query 1 included all men who had a negative prostate biopsy and a negative PCMT between February 1, 2011, and June 30, 2013. Men with a previous diagnosis of cancer were excluded. Query 2 included all men who had a negative prostate biopsy and a repeated biopsy between February 1, 2011, and September 30, 2013. The data exported for each query included the unique specimen number for an index biopsy, the interval between biopsies where present, the unique specimen number for a follow-up biopsy where present, histopathology for all biopsies, the biopsy procedure dates, the patient's date of birth, and the PCMT result when utilized. The patient rebiopsy rates and intervals were compared between the patients who were using PCMT and those who were not to assess whether the adjunct use of the PCMT impacted the rebiopsy decision-making process. Query 1 identified 644 men who had a negative biopsy and a negative PCMT result within the study period. Query 2 identified 823 men with a repeat biopsy after the initial negative index biopsy within the study period. Of these men, 132 had PCMT to inform their care. This patient population of 1467 men originated from US-based clinical urology practices. Evaluation of the impact on physician behavior demonstrated a general trend toward the earlier detection of prostate cancer on repeat biopsy by an average of 2.5 months and a coincident increase in cancer detection rates for urologists using the deletion assay in their rebiopsy decision-making process. Importantly, this trend was only observed when men with atypical small acinar proliferation (ASAP) on index biopsy were not considered. In the 644 men with a negative PCMT result, only 35 (5.4%) were subjected to a follow-up biopsy, with 5 (14.3%) of the 35 men identified as having cancer. Finally, the cohort of 132 men who had PCMT and repeat biopsy was compared with the published data supporting PCMT's ability to predict rebiopsy outcome. The key metrics of sensitivity and negative predictive value were comparable and within the 95% confidence intervals of the reported work. Molecular tests, such as the PCMT, are useful in addressing the sampling error of prostate needle biopsy and providing additional evidence to inform the clinical uncertainty regarding initial negative prostate biopsy when ASAP is not present. Longitudinal monitoring of clinical impact indicators provides the necessary inputs to better allocation of healthcare resources in the short- and long-term.
Fast Query-Optimized Kernel-Machine Classification
NASA Technical Reports Server (NTRS)
Mazzoni, Dominic; DeCoste, Dennis
2004-01-01
A recently developed algorithm performs kernel-machine classification via incremental approximate nearest support vectors. The algorithm implements support-vector machines (SVMs) at speeds 10 to 100 times those attainable by use of conventional SVM algorithms. The algorithm offers potential benefits for classification of images, recognition of speech, recognition of handwriting, and diverse other applications in which there are requirements to discern patterns in large sets of data. SVMs constitute a subset of kernel machines (KMs), which have become popular as models for machine learning and, more specifically, for automated classification of input data on the basis of labeled training data. While similar in many ways to k-nearest-neighbors (k-NN) models and artificial neural networks (ANNs), SVMs tend to be more accurate. Using representations that scale only linearly in the numbers of training examples, while exploring nonlinear (kernelized) feature spaces that are exponentially larger than the original input dimensionality, KMs elegantly and practically overcome the classic curse of dimensionality. However, the price that one must pay for the power of KMs is that query-time complexity scales linearly with the number of training examples, making KMs often orders of magnitude more computationally expensive than are ANNs, decision trees, and other popular machine learning alternatives. The present algorithm treats an SVM classifier as a special form of a k-NN. The algorithm is based partly on an empirical observation that one can often achieve the same classification as that of an exact KM by using only small fraction of the nearest support vectors (SVs) of a query. The exact KM output is a weighted sum over the kernel values between the query and the SVs. In this algorithm, the KM output is approximated with a k-NN classifier, the output of which is a weighted sum only over the kernel values involving k selected SVs. Before query time, there are gathered statistics about how misleading the output of the k-NN model can be, relative to the outputs of the exact KM for a representative set of examples, for each possible k from 1 to the total number of SVs. From these statistics, there are derived upper and lower thresholds for each step k. These thresholds identify output levels for which the particular variant of the k-NN model already leans so strongly positively or negatively that a reversal in sign is unlikely, given the weaker SV neighbors still remaining. At query time, the partial output of each query is incrementally updated, stopping as soon as it exceeds the predetermined statistical thresholds of the current step. For an easy query, stopping can occur as early as step k = 1. For more difficult queries, stopping might not occur until nearly all SVs are touched. A key empirical observation is that this approach can tolerate very approximate nearest-neighbor orderings. In experiments, SVs and queries were projected to a subspace comprising the top few principal- component dimensions and neighbor orderings were computed in that subspace. This approach ensured that the overhead of the nearest-neighbor computations was insignificant, relative to that of the exact KM computation.
Gender Differences in Bladder Cancer Treatment Decision Making.
Pozzar, Rachel A; Berry, Donna L
2017-03-01
To explore gender differences in bladder cancer treatment decision making. . Secondary qualitative analysis of interview transcripts. . One multidisciplinary genitourinary oncology clinic (Dana-Farber Cancer Institute) and two urology clinics (Brigham and Women's Hospital and Beth Israel Deaconess Medical Center) in Boston, MA. . As part of the original study, 45 men and 15 women with bladder cancer participated in individual interviews. Participants were primarily Caucasian, and most had at least some college education. . Word frequency reports were used to identify thematic differences between the men's and women's statements. Line-by-line coding of constructs prevalent among women was then performed on all participants in NVivo 9. Coding results were compared between genders using matrix coding queries. . The role of family in the decision-making process was found to be a dominant theme for women but not for men. Women primarily described family members as facilitators of bladder cancer treatment-related decisions, but men were more likely to describe family in a nonsupportive role. . The results suggest that influences on the decision-making process are different for men and women with bladder cancer. Family may play a particularly important role for women faced with bladder cancer treatment-related decisions. . Clinical nurses who care for individuals with bladder cancer should routinely assess patients' support systems and desired level of family participation in decision making. For some people with bladder cancer, family may serve as a stressor. Nurses should support the decision-making processes of all patients and be familiar with resources that can provide support to patients who do not receive it from family.
NASA Astrophysics Data System (ADS)
Li, C.; Zhu, X.; Guo, W.; Liu, Y.; Huang, H.
2015-05-01
A method suitable for indoor complex semantic query considering the computation of indoor spatial relations is provided According to the characteristics of indoor space. This paper designs ontology model describing the space related information of humans, events and Indoor space objects (e.g. Storey and Room) as well as their relations to meet the indoor semantic query. The ontology concepts are used in IndoorSPARQL query language which extends SPARQL syntax for representing and querying indoor space. And four types specific primitives for indoor query, "Adjacent", "Opposite", "Vertical" and "Contain", are defined as query functions in IndoorSPARQL used to support quantitative spatial computations. Also a method is proposed to analysis the query language. Finally this paper adopts this method to realize indoor semantic query on the study area through constructing the ontology model for the study building. The experimental results show that the method proposed in this paper can effectively support complex indoor space semantic query.
Semantic technologies in a decision support system
NASA Astrophysics Data System (ADS)
Wasielewska, K.; Ganzha, M.; Paprzycki, M.; Bǎdicǎ, C.; Ivanovic, M.; Lirkov, I.
2015-10-01
The aim of our work is to design a decision support system based on ontological representation of domain(s) and semantic technologies. Specifically, we consider the case when Grid / Cloud user describes his/her requirements regarding a "resource" as a class expression from an ontology, while the instances of (the same) ontology represent available resources. The goal is to help the user to find the best option with respect to his/her requirements, while remembering that user's knowledge may be "limited." In this context, we discuss multiple approaches based on semantic data processing, which involve different "forms" of user interaction with the system. Specifically, we consider: (a) ontological matchmaking based on SPARQL queries and class expression, (b) graph-based semantic closeness of instances representing user requirements (constructed from the class expression) and available resources, and (c) multicriterial analysis based on the AHP method, which utilizes expert domain knowledge (also ontologically represented).
NASA Astrophysics Data System (ADS)
Liu, Y.; Zhang, W.; Yan, C.
2012-07-01
Presently, planning and assessment in maintenance, renewal and decision-making for watershed hydrology, water resource management and water quality assessment are evolving toward complex, spatially explicit regional environmental assessments. These problems have to be addressed with object-oriented spatio-temporal data models that can restore, manage, query and visualize various historic and updated basic information concerning with watershed hydrology, water resource management and water quality as well as compute and evaluate the watershed environmental conditions so as to provide online forecasting to police-makers and relevant authorities for supporting decision-making. The extensive data requirements and the difficult task of building input parameter files, however, has long been an obstacle to use of such complex models timely and effectively by resource managers. Success depends on an integrated approach that brings together scientific, education and training advances made across many individual disciplines and modified to fit the needs of the individuals and groups who must write, implement, evaluate, and adjust their watershed management plans. The centre for Hydro-science Research, Nanjing University, in cooperation with the relevant watershed management authorities, has developed a WebGIS management platform to facilitate this complex process. Improve the management of watersheds over the Huaihe basin through the development, promotion and use of a web-based, user-friendly, geospatial watershed management data and decision support system (WMDDSS) involved many difficulties for the development of this complicated System. In terms of the spatial and temporal characteristics of historic and currently available information on meteorological, hydrological, geographical, environmental and other relevant disciplines, we designed an object-oriented spatiotemporal data model that combines spatial, attribute and temporal information to implement the management system. Using this system, we can update, query and analyze environmental information as well as manage historical data, and a visualization tool was provided to help the user interpret results so as to provide scientific support for decision-making. The utility of the system has been demonstrated its values by being used in watershed management and environmental assessments.
The Hub Population Health System: distributed ad hoc queries and alerts
Anane, Sheila; Taverna, John; Amirfar, Sam; Stubbs-Dame, Remle; Singer, Jesse
2011-01-01
The Hub Population Health System enables the creation and distribution of queries for aggregate count information, clinical decision support alerts at the point-of-care for patients who meet specified conditions, and secure messages sent directly to provider electronic health record (EHR) inboxes. Using a metronidazole medication recall, the New York City Department of Health was able to determine the number of affected patients and message providers, and distribute an alert to participating practices. As of September 2011, the system is live in 400 practices and within a year will have over 532 practices with 2500 providers, representing over 2.5 million New Yorkers. The Hub can help public health experts to evaluate population health and quality improvement activities throughout the ambulatory care network. Multiple EHR vendors are building these features in partnership with the department's regional extension center in anticipation of new meaningful use requirements. PMID:22071531
2006-06-01
SPARQL SPARQL Protocol and RDF Query Language SQL Structured Query Language SUMO Suggested Upper Merged Ontology SW... Query optimization algorithms are implemented in the Pellet reasoner in order to ensure querying a knowledge base is efficient . These algorithms...memory as a treelike structure in order for the data to be queried . XML Query (XQuery) is the standard language used when querying XML
An integrative model for in-silico clinical-genomics discovery science.
Lussier, Yves A; Sarkar, Indra Nell; Cantor, Michael
2002-01-01
Human Genome discovery research has set the pace for Post-Genomic Discovery Research. While post-genomic fields focused at the molecular level are intensively pursued, little effort is being deployed in the later stages of molecular medicine discovery research, such as clinical-genomics. The objective of this study is to demonstrate the relevance and significance of integrating mainstream clinical informatics decision support systems to current bioinformatics genomic discovery science. This paper is a feasibility study of an original model enabling novel "in-silico" clinical-genomic discovery science and that demonstrates its feasibility. This model is designed to mediate queries among clinical and genomic knowledge bases with relevant bioinformatic analytic tools (e.g. gene clustering). Briefly, trait-disease-gene relationships were successfully illustrated using QMR, OMIM, SNOMED-RT, GeneCluster and TreeView. The analyses were visualized as two-dimensional dendrograms of clinical observations clustered around genes. To our knowledge, this is the first study using knowledge bases of clinical decision support systems for genomic discovery. Although this study is a proof of principle, it provides a framework for the development of clinical decision-support-system driven, high-throughput clinical-genomic technologies which could potentially unveil significant high-level functions of genes.
QRFXFreeze: Queryable Compressor for RFX.
Senthilkumar, Radha; Nandagopal, Gomathi; Ronald, Daphne
2015-01-01
The verbose nature of XML has been mulled over again and again and many compression techniques for XML data have been excogitated over the years. Some of the techniques incorporate support for querying the XML database in its compressed format while others have to be decompressed before they can be queried. XML compression in which querying is directly supported instantaneously with no compromise over time is forced to compromise over space. In this paper, we propose the compressor, QRFXFreeze, which not only reduces the space of storage but also supports efficient querying. The compressor does this without decompressing the compressed XML file. The compressor supports all kinds of XML documents along with insert, update, and delete operations. The forte of QRFXFreeze is that the textual data are semantically compressed and are indexed to reduce the querying time. Experimental results show that the proposed compressor performs much better than other well-known compressors.
Personalization and Patient Involvement in Decision Support Systems: Current Trends
Sacchi, L.; Lanzola, G.; Viani, N.
2015-01-01
Summary Objectives This survey aims at highlighting the latest trends (2012-2014) on the development, use, and evaluation of Information and Communication Technologies (ICT) based decision support systems (DSSs) in medicine, with a particular focus on patient-centered and personalized care. Methods We considered papers published on scientific journals, by querying PubMed and Web of Science™. Included studies focused on the implementation or evaluation of ICT-based tools used in clinical practice. A separate search was performed on computerized physician order entry systems (CPOEs), since they are increasingly embedding patient-tailored decision support. Results We found 73 papers on DSSs (53 on specific ICT tools) and 72 papers on CPOEs. Although decision support through the delivery of recommendations is frequent (28/53 papers), our review highlighted also DSSs only based on efficient information presentation (25/53). Patient participation in making decisions is still limited (9/53), and mostly focused on risk communication. The most represented medical area is cancer (12%). Policy makers are beginning to be included among stakeholders (6/73), but integration with hospital information systems is still low. Concerning knowledge representation/management issues, we identified a trend towards building inference engines on top of standard data models. Most of the tools (57%) underwent a formal assessment study, even if half of them aimed at evaluating usability and not effectiveness. Conclusions Overall, we have noticed interesting evolutions of medical DSSs to improve communication with the patient, consider the economic and organizational impact, and use standard models for knowledge representation. However, systems focusing on patient-centered care still do not seem to be available at large. PMID:26293857
NASA Astrophysics Data System (ADS)
Shahzad, Muhammad A.
1999-02-01
With the emergence of data warehousing, Decision support systems have evolved to its best. At the core of these warehousing systems lies a good database management system. Database server, used for data warehousing, is responsible for providing robust data management, scalability, high performance query processing and integration with other servers. Oracle being the initiator in warehousing servers, provides a wide range of features for facilitating data warehousing. This paper is designed to review the features of data warehousing - conceptualizing the concept of data warehousing and, lastly, features of Oracle servers for implementing a data warehouse.
The development of variable MLM editor and TSQL translator based on Arden Syntax in Taiwan.
Liang, Yan Ching; Chang, Polun
2003-01-01
The Arden Syntax standard has been utilized in the medical informatics community in several countries during the past decade. It is never used in nursing in Taiwan. We try to develop a system that acquire medical expert knowledge in Chinese and translates data and logic slot into TSQL Language. The system implements TSQL translator interpreting database queries referred to in the knowledge modules. The decision-support systems in medicine are data driven system where TSQL triggers as inference engine can be used to facilitate linking to a database.
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.
Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng
2013-11-01
The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS - a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.
Querying and Ranking XML Documents.
ERIC Educational Resources Information Center
Schlieder, Torsten; Meuss, Holger
2002-01-01
Discussion of XML, information retrieval, precision, and recall focuses on a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Topics include a query model based on tree matching; structured queries and term-based ranking; and term frequency and…
González-Ferrer, A; Peleg, M; Marcos, M; Maldonado, J A
2016-07-01
Delivering patient-specific decision-support based on computer-interpretable guidelines (CIGs) requires mapping CIG clinical statements (data items, clinical recommendations) into patients' data. This is most effectively done via intermediate data schemas, which enable querying the data according to the semantics of a shared standard intermediate schema. This study aims to evaluate the use of HL7 virtual medical record (vMR) and openEHR archetypes as intermediate schemas for capturing clinical statements from CIGs that are mappable to electronic health records (EHRs) containing patient data and patient-specific recommendations. Using qualitative research methods, we analyzed the encoding of ten representative clinical statements taken from two CIGs used in real decision-support systems into two health information models (openEHR archetypes and HL7 vMR instances) by four experienced informaticians. Discussion among the modelers about each case study example greatly increased our understanding of the capabilities of these standards, which we share in this educational paper. Differing in content and structure, the openEHR archetypes were found to contain a greater level of representational detail and structure while the vMR representations took fewer steps to complete. The use of openEHR in the encoding of CIG clinical statements could potentially facilitate applications other than decision-support, including intelligent data analysis and integration of additional properties of data items from existing EHRs. On the other hand, due to their smaller size and fewer details, the use of vMR potentially supports quicker mapping of EHR data into clinical statements.
Leroy, Gondy; Xu, Jennifer; Chung, Wingyan; Eggers, Shauna; Chen, Hsinchun
2007-01-01
Retrieving sufficient relevant information online is difficult for many people because they use too few keywords to search and search engines do not provide many support tools. To further complicate the search, users often ignore support tools when available. Our goal is to evaluate in a realistic setting when users use support tools and how they perceive these tools. We compared three medical search engines with support tools that require more or less effort from users to form a query and evaluate results. We carried out an end user study with 23 users who were asked to find information, i.e., subtopics and supporting abstracts, for a given theme. We used a balanced within-subjects design and report on the effectiveness, efficiency and usability of the support tools from the end user perspective. We found significant differences in efficiency but did not find significant differences in effectiveness between the three search engines. Dynamic user support tools requiring less effort led to higher efficiency. Fewer searches were needed and more documents were found per search when both query reformulation and result review tools dynamically adjust to the user query. The query reformulation tool that provided a long list of keywords, dynamically adjusted to the user query, was used most often and led to more subtopics. As hypothesized, the dynamic result review tools were used more often and led to more subtopics than static ones. These results were corroborated by the usability questionnaires, which showed that support tools that dynamically optimize output were preferred.
Supporting ontology-based keyword search over medical databases.
Kementsietsidis, Anastasios; Lim, Lipyeow; Wang, Min
2008-11-06
The proliferation of medical terms poses a number of challenges in the sharing of medical information among different stakeholders. Ontologies are commonly used to establish relationships between different terms, yet their role in querying has not been investigated in detail. In this paper, we study the problem of supporting ontology-based keyword search queries on a database of electronic medical records. We present several approaches to support this type of queries, study the advantages and limitations of each approach, and summarize the lessons learned as best practices.
Processing SPARQL queries with regular expressions in RDF databases
2011-01-01
Background As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users’ requests for extracting information from the RDF data as well as the lack of users’ knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. Results In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns. PMID:21489225
Processing SPARQL queries with regular expressions in RDF databases.
Lee, Jinsoo; Pham, Minh-Duc; Lee, Jihwan; Han, Wook-Shin; Cho, Hune; Yu, Hwanjo; Lee, Jeong-Hoon
2011-03-29
As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users' requests for extracting information from the RDF data as well as the lack of users' knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.
Occam's razor: supporting visual query expression for content-based image queries
NASA Astrophysics Data System (ADS)
Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.
2005-01-01
This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).
Occam"s razor: supporting visual query expression for content-based image queries
NASA Astrophysics Data System (ADS)
Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.
2004-12-01
This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).
Improving healthcare services using web based platform for management of medical case studies.
Ogescu, Cristina; Plaisanu, Claudiu; Udrescu, Florian; Dumitru, Silviu
2008-01-01
The paper presents a web based platform for management of medical cases, support for healthcare specialists in taking the best clinical decision. Research has been oriented mostly on multimedia data management, classification algorithms for querying, retrieving and processing different medical data types (text and images). The medical case studies can be accessed by healthcare specialists and by students as anonymous case studies providing trust and confidentiality in Internet virtual environment. The MIDAS platform develops an intelligent framework to manage sets of medical data (text, static or dynamic images), in order to optimize the diagnosis and the decision process, which will reduce the medical errors and will increase the quality of medical act. MIDAS is an integrated project working on medical information retrieval from heterogeneous, distributed medical multimedia database.
Greenes, R A
1991-11-01
Education and decision-support resources useful to radiologists are proliferating for the personal computer/workstation user or are potentially accessible via high-speed networks. These resources are typically made available through a set of application programs that tend to be developed in isolation and operate independently. Nonetheless, there is a growing need for an integrated environment for access to these resources in the context of professional work, during clinical problem-solving and decision-making activities, and for use in conjunction with other information resources. New application development environments are required to provide these capabilities. One such architecture for applications, which we have implemented in a prototype environment called DeSyGNER, is based on separately delineating the component information resources required for an application, termed entities, and the user interface and organizational paradigms, or composition methods, by which the entities are used to provide particular kinds of capability. Examples include composition methods to support query, book browsing, hyperlinking, tutorials, simulations, or question/answer testing. Future steps must address true integration of such applications with existing clinical information systems. We believe that the most viable approach for evolving this capability is based on the use of new software engineering methodologies, open systems, client-server communication, and delineation of standard message protocols.
RDF-GL: A SPARQL-Based Graphical Query Language for RDF
NASA Astrophysics Data System (ADS)
Hogenboom, Frederik; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay
This chapter presents RDF-GL, a graphical query language (GQL) for RDF. The GQL is based on the textual query language SPARQL and mainly focuses on SPARQL SELECT queries. The advantage of a GQL over textual query languages is that complexity is hidden through the use of graphical symbols. RDF-GL is supported by a Java-based editor, SPARQLinG, which is presented as well. The editor does not only allow for RDF-GL query creation, but also converts RDF-GL queries to SPARQL queries and is able to subsequently execute these. Experiments show that using the GQL in combination with the editor makes RDF querying more accessible for end users.
Amatchmethod Based on Latent Semantic Analysis for Earthquakehazard Emergency Plan
NASA Astrophysics Data System (ADS)
Sun, D.; Zhao, S.; Zhang, Z.; Shi, X.
2017-09-01
The structure of the emergency plan on earthquake is complex, and it's difficult for decision maker to make a decision in a short time. To solve the problem, this paper presents a match method based on Latent Semantic Analysis (LSA). After the word segmentation preprocessing of emergency plan, we carry out keywords extraction according to the part-of-speech and the frequency of words. Then through LSA, we map the documents and query information to the semantic space, and calculate the correlation of documents and queries by the relation between vectors. The experiments results indicate that the LSA can improve the accuracy of emergency plan retrieval efficiently.
Merging OLTP and OLAP - Back to the Future
NASA Astrophysics Data System (ADS)
Lehner, Wolfgang
When the terms "Data Warehousing" and "Online Analytical Processing" were coined in the 1990s by Kimball, Codd, and others, there was an obvious need for separating data and workload for operational transactional-style processing and decision-making implying complex analytical queries over large and historic data sets. Large data warehouse infrastructures have been set up to cope with the special requirements of analytical query answering for multiple reasons: For example, analytical thinking heavily relies on predefined navigation paths to guide the user through the data set and to provide different views on different aggregation levels.Multi-dimensional queries exploiting hierarchically structured dimensions lead to complex star queries at a relational backend, which could hardly be handled by classical relational systems.
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce
Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng
2016-01-01
The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS – a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing. PMID:27617325
Intervening to Reduce Suicide Risk in Veterans with Substance Use Disorders
2015-01-01
Other Support for Dr. Valenstein: 1. Effort ended on NIH/ NIA P01 AG031098 (PI – Cutler). 2. Effort ended on DoD W81XWH-11-2-0059 (PI – Hunt). 3...Effort started on VA QUERI RRP 12-511 (PI – Zivin). 19. Effort started on VA QUERI RRP 12- 505 (PI – Pfeiffer). OTHER SUPPORT VALENSTEIN...Pfeiffer, P.) 11/01/13 – 02/28/15 1.2 calendar VA MH-QUERI RRP 12- 505 $90,942 Technology-assisted peer support for recently hospitalized
StarView: The object oriented design of the ST DADS user interface
NASA Technical Reports Server (NTRS)
Williams, J. D.; Pollizzi, J. A.
1992-01-01
StarView is the user interface being developed for the Hubble Space Telescope Data Archive and Distribution Service (ST DADS). ST DADS is the data archive for HST observations and a relational database catalog describing the archived data. Users will use StarView to query the catalog and select appropriate datasets for study. StarView sends requests for archived datasets to ST DADS which processes the requests and returns the database to the user. StarView is designed to be a powerful and extensible user interface. Unique features include an internal relational database to navigate query results, a form definition language that will work with both CRT and X interfaces, a data definition language that will allow StarView to work with any relational database, and the ability to generate adhoc queries without requiring the user to understand the structure of the ST DADS catalog. Ultimately, StarView will allow the user to refine queries in the local database for improved performance and merge in data from external sources for correlation with other query results. The user will be able to create a query from single or multiple forms, merging the selected attributes into a single query. Arbitrary selection of attributes for querying is supported. The user will be able to select how query results are viewed. A standard form or table-row format may be used. Navigation capabilities are provided to aid the user in viewing query results. Object oriented analysis and design techniques were used in the design of StarView to support the mechanisms and concepts required to implement these features. One such mechanism is the Model-View-Controller (MVC) paradigm. The MVC allows the user to have multiple views of the underlying database, while providing a consistent mechanism for interaction regardless of the view. This approach supports both CRT and X interfaces while providing a common mode of user interaction. Another powerful abstraction is the concept of a Query Model. This concept allows a single query to be built form a single or multiple forms before it is submitted to ST DADS. Supporting this concept is the adhoc query generator which allows the user to select and qualify an indeterminate number attributes from the database. The user does not need any knowledge of how the joins across various tables are to be resolved. The adhoc generator calculates the joins automatically and generates the correct SQL query.
ERIC Educational Resources Information Center
Belkin, N. J.; Cool, C.; Kelly, D.; Lin, S. -J.; Park, S. Y.; Perez-Carballo, J.; Sikora, C.
2001-01-01
Reports on the progressive investigation of techniques for supporting interactive query reformulation in the TREC (Text Retrieval Conference) Interactive Track. Highlights include methods of term suggestion; interface design to support different system functionalities; an overview of each year's TREC investigation; and relevance to the development…
A shared computer-based problem-oriented patient record for the primary care team.
Linnarsson, R; Nordgren, K
1995-01-01
1. INTRODUCTION. A computer-based patient record (CPR) system, Swedestar, has been developed for use in primary health care. The principal aim of the system is to support continuous quality improvement through improved information handling, improved decision-making, and improved procedures for quality assurance. The Swedestar system has evolved during a ten-year period beginning in 1984. 2. SYSTEM DESIGN. The design philosophy is based on the following key factors: a shared, problem-oriented patient record; structured data entry based on an extensive controlled vocabulary; advanced search and query functions, where the query language has the most important role; integrated decision support for drug prescribing and care protocols and guidelines; integrated procedures for quality assurance. 3. A SHARED PROBLEM-ORIENTED PATIENT RECORD. The core of the CPR system is the problem-oriented patient record. All problems of one patient, recorded by different members of the care team, are displayed on the problem list. Starting from this list, a problem follow-up can be made, one problem at a time or for several problems simultaneously. Thus, it is possible to get an integrated view, across provider categories, of those problems of one patient that belong together. This shared problem-oriented patient record provides an important basis for the primary care team work. 4. INTEGRATED DECISION SUPPORT. The decision support of the system includes a drug prescribing module and a care protocol module. The drug prescribing module is integrated with the patient records and includes an on-line check of the patient's medication list for potential interactions and data-driven reminders concerning major drug problems. Care protocols have been developed for the most common chronic diseases, such as asthma, diabetes, and hypertension. The patient records can be automatically checked according to the care protocols. 5. PRACTICAL EXPERIENCE. The Swedestar system has been implemented in a primary care area with 30,000 inhabitants. It is being used by all the primary care team members: 15 general practitioners, 25 district nurses, and 10 physiotherapists. Several years of practical experience of the CPR system shows that it has a positive impact on quality of care on four levels: 1) improved clinical follow-up of individual patients; 2) facilitated follow-up of aggregated data such as practice activity analysis, annual reports, and clinical indicators; 3) automated medical audit; and 4) concurrent audit. Within that primary care area, quality of care has improved substantially in several aspects due to the use of the CPR system [1].
Sincere but naive: methodological queries concerning the British Columbia polygamy reference trial.
Ashley, Sean Matthew
2014-11-01
Academics frequently serve as expert witnesses in legal cases, yet their role as transmitters of social scientific knowledge remains under-examined. The present study analyzes the deployment of social science within British Columbia's polygamy reference trial where research is used to support the assertion that polygamy is inherently harmful to society. Within the trial record and the written decision, the protection of monogamy as an institution is performed in part through the marginalization of qualitative methodology and the concurrent privileging of quantitative studies that purportedly demonstrate widespread social harms associated with the practice of polygyny.
The Development of Variable MLM Editor and TSQL Translator Based on Arden Syntax in Taiwan
Liang, Yan-Ching; Chang, Polun
2003-01-01
The Arden Syntax standard has been utilized in the medical informatics community in several countries during the past decade. It is never used in nursing in Taiwan. We try to develop a system that acquire medical expert knowledge in Chinese and translates data and logic slot into TSQL Language. The system implements TSQL translator interpreting database queries referred to in the knowledge modules. The decision-support systems in medicine are data driven system where TSQL triggers as inference engine can be used to facilitate linking to a database. PMID:14728414
A feature dictionary supporting a multi-domain medical knowledge base.
Naeymi-Rad, F
1989-01-01
Because different terminology is used by physicians of different specialties in different locations to refer to the same feature (signs, symptoms, test results), it is essential that our knowledge development tools provide a means to access a common pool of terms. This paper discusses the design of an online medical dictionary that provides a solution to this problem for developers of multi-domain knowledge bases for MEDAS (Medical Emergency Decision Assistance System). Our Feature Dictionary supports phrase equivalents for features, feature interactions, feature classifications, and translations to the binary features generated by the expert during knowledge creation. It is also used in the conversion of a domain knowledge to the database used by the MEDAS inference diagnostic sessions. The Feature Dictionary also provides capabilities for complex queries across multiple domains using the supported relations. The Feature Dictionary supports three methods for feature representation: (1) for binary features, (2) for continuous valued features, and (3) for derived features.
Mamouras, Konstantinos; Raghothaman, Mukund; Alur, Rajeev; Ives, Zachary G; Khanna, Sanjeev
2017-06-01
Real-time decision making in emerging IoT applications typically relies on computing quantitative summaries of large data streams in an efficient and incremental manner. To simplify the task of programming the desired logic, we propose StreamQRE, which provides natural and high-level constructs for processing streaming data. Our language has a novel integration of linguistic constructs from two distinct programming paradigms: streaming extensions of relational query languages and quantitative extensions of regular expressions. The former allows the programmer to employ relational constructs to partition the input data by keys and to integrate data streams from different sources, while the latter can be used to exploit the logical hierarchy in the input stream for modular specifications. We first present the core language with a small set of combinators, formal semantics, and a decidable type system. We then show how to express a number of common patterns with illustrative examples. Our compilation algorithm translates the high-level query into a streaming algorithm with precise complexity bounds on per-item processing time and total memory footprint. We also show how to integrate approximation algorithms into our framework. We report on an implementation in Java, and evaluate it with respect to existing high-performance engines for processing streaming data. Our experimental evaluation shows that (1) StreamQRE allows more natural and succinct specification of queries compared to existing frameworks, (2) the throughput of our implementation is higher than comparable systems (for example, two-to-four times greater than RxJava), and (3) the approximation algorithms supported by our implementation can lead to substantial memory savings.
Mamouras, Konstantinos; Raghothaman, Mukund; Alur, Rajeev; Ives, Zachary G.; Khanna, Sanjeev
2017-01-01
Real-time decision making in emerging IoT applications typically relies on computing quantitative summaries of large data streams in an efficient and incremental manner. To simplify the task of programming the desired logic, we propose StreamQRE, which provides natural and high-level constructs for processing streaming data. Our language has a novel integration of linguistic constructs from two distinct programming paradigms: streaming extensions of relational query languages and quantitative extensions of regular expressions. The former allows the programmer to employ relational constructs to partition the input data by keys and to integrate data streams from different sources, while the latter can be used to exploit the logical hierarchy in the input stream for modular specifications. We first present the core language with a small set of combinators, formal semantics, and a decidable type system. We then show how to express a number of common patterns with illustrative examples. Our compilation algorithm translates the high-level query into a streaming algorithm with precise complexity bounds on per-item processing time and total memory footprint. We also show how to integrate approximation algorithms into our framework. We report on an implementation in Java, and evaluate it with respect to existing high-performance engines for processing streaming data. Our experimental evaluation shows that (1) StreamQRE allows more natural and succinct specification of queries compared to existing frameworks, (2) the throughput of our implementation is higher than comparable systems (for example, two-to-four times greater than RxJava), and (3) the approximation algorithms supported by our implementation can lead to substantial memory savings. PMID:29151821
DOT National Transportation Integrated Search
2011-01-01
The goal this research is to develop an end-to-end data-driven system, dubbed TransDec : (short for Transportation Decision-Making), to enable decision-making queries in : transportation systems with dynamic, real-time and historical data. With Trans...
Woo, Hyekyung; Cho, Youngtae; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan
2016-07-04
As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data.
Woo, Hyekyung; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan
2016-01-01
Background As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. Objective In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Methods Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. Results In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). Conclusions These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data. PMID:27377323
HodDB: Design and Analysis of a Query Processor for Brick.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fierro, Gabriel; Culler, David
Brick is a recently proposed metadata schema and ontology for describing building components and the relationships between them. It represents buildings as directed labeled graphs using the RDF data model. Using the SPARQL query language, building-agnostic applications query a Brick graph to discover the set of resources and relationships they require to operate. Latency-sensitive applications, such as user interfaces, demand response and modelpredictive control, require fast queries — conventionally less than 100ms. We benchmark a set of popular open-source and commercial SPARQL databases against three real Brick models using seven application queries and find that none of them meet thismore » performance target. This lack of performance can be attributed to design decisions that optimize for queries over large graphs consisting of billions of triples, but give poor spatial locality and join performance on the small dense graphs typical of Brick. We present the design and evaluation of HodDB, a RDF/SPARQL database for Brick built over a node-based index structure. HodDB performs Brick queries 3-700x faster than leading SPARQL databases and consistently meets the 100ms threshold, enabling the portability of important latency-sensitive building applications.« less
Financial capacity in older adults: a growing concern for clinicians.
Gardiner, Paul A; Byrne, Gerard J; Mitchell, Leander K; Pachana, Nancy A
2015-02-02
Older people with cognitive impairment and/or dementia may be particularly vulnerable to diminished financial decision-making capacity. Financial capacity refers to the ability to satisfactorily manage one's financial affairs in a manner consistent with personal self-interest and values. Impairment of financial capacity makes the older individual vulnerable to financial exploitation, may negatively affect their family's financial situation and places strain on relationships within the family. Clinicians are often on the front line of responding to queries regarding decision-making capacity, and clinical evaluation options are often not well understood. Assessment of financial capacity should include formal objective assessment in addition to a clinical interview and gathering contextual data. Development of a flexible, empirically supported and clinically relevant assessment approach that spans all dimensions of financial capacity yet is simple enough to be used by non-specialist clinicians is needed.
Stapleton, Jo Anne; Sonenshein, Roy
2004-01-01
Beginning in 1995 the U.S. Geological Survey (USGS) funded scientific research to support the restoration of the Greater Everglades area and to supply decision makers and resource mangers with sound data on which to base their actions. However, none of the research and resulting data is useful if it can?t be discovered, can?t be assessed for utility in an application, can?t be accessed, or is in an undetermined format. The decision was made early in the USGS Place-Based Studies (PBS) program to create a ?one-stop? entry for information and data about USGS research results. To facilitate the discovery process some mechanism was needed to allow standardized queries about data. The FGDC metadata standard has been used to document the South Florida PBS data from the beginning.
NASA Technical Reports Server (NTRS)
Aspinall, David; Denney, Ewen; Lueth, Christoph
2012-01-01
We motivate and introduce a query language PrQL designed for inspecting machine representations of proofs. PrQL natively supports hiproofs which express proof structure using hierarchical nested labelled trees. The core language presented in this paper is locally structured (first-order), with queries built using recursion and patterns over proof structure and rule names. We define the syntax and semantics of locally structured queries, demonstrate their power, and sketch some implementation experiments.
Bennema, S C; Molento, M B; Scholte, R G; Carvalho, O S; Pritsch, I
2017-11-01
Fascioliasis is a condition caused by the trematode Fasciola hepatica. In this paper, the spatial distribution of F. hepatica in bovines in Brazil was modelled using a decision tree approach and a logistic regression, combined with a geographic information system (GIS) query. In the decision tree and the logistic model, isothermality had the strongest influence on disease prevalence. Also, the 50-year average precipitation in the warmest quarter of the year was included as a risk factor, having a negative influence on the parasite prevalence. The risk maps developed using both techniques, showed a predicted higher prevalence mainly in the South of Brazil. The prediction performance seemed to be high, but both techniques failed to reach a high accuracy in predicting the medium and high prevalence classes to the entire country. The GIS query map, based on the range of isothermality, minimum temperature of coldest month, precipitation of warmest quarter of the year, altitude and the average dailyland surface temperature, showed a possibility of presence of F. hepatica in a very large area. The risk maps produced using these methods can be used to focus activities of animal and public health programmes, even on non-evaluated F. hepatica areas.
Local classifier weighting by quadratic programming.
Cevikalp, Hakan; Polikar, Robi
2008-10-01
It has been widely accepted that the classification accuracy can be improved by combining outputs of multiple classifiers. However, how to combine multiple classifiers with various (potentially conflicting) decisions is still an open problem. A rich collection of classifier combination procedures -- many of which are heuristic in nature -- have been developed for this goal. In this brief, we describe a dynamic approach to combine classifiers that have expertise in different regions of the input space. To this end, we use local classifier accuracy estimates to weight classifier outputs. Specifically, we estimate local recognition accuracies of classifiers near a query sample by utilizing its nearest neighbors, and then use these estimates to find the best weights of classifiers to label the query. The problem is formulated as a convex quadratic optimization problem, which returns optimal nonnegative classifier weights with respect to the chosen objective function, and the weights ensure that locally most accurate classifiers are weighted more heavily for labeling the query sample. Experimental results on several data sets indicate that the proposed weighting scheme outperforms other popular classifier combination schemes, particularly on problems with complex decision boundaries. Hence, the results indicate that local classification-accuracy-based combination techniques are well suited for decision making when the classifiers are trained by focusing on different regions of the input space.
Gupta, Supriya; Klein, Kandace; Singh, Anand H; Thrall, James H
2017-05-01
Awareness of imaging utilization increased after implementation of Radiology Order Entry with decision support systems (ROE-DS). Our hypothesis is few exams with low Clinical Appropriateness Score (CAS) on ROE-DS are performed. Clinical indications of exams with CAS less than 3 (9-point scale) were re-reviewed and reports analyzed. Structured Query Language-based query retrieved exams with CAS less than 3 in ROE-DS from January 2007 to December 2011. Reasons provided by physicians for ordering these exams and reports of exams performed were analyzed. For each indication, number of exams ordered and performed was calculated. Statistical significance was assessed using Student's t test and χ 2 analysis (P < .05). From 445,984 exams, 12,615 exams (2.8%) had CAS less than 3, and 7,956 exams (63%) were performed. Reasons for ordering of 12,615 low CAS exams were as follows: Requests by physician specialists without further explanation (4,516 = 35.8%), notation of special clinical circumstances (2,877 = 22.8%), requests by nonphysician staff without further explanation (1,383 = 10.9%), absence of suspected finding on previous modality (1,099 = 8.7%), patient preference (737 = 5.8%), and requests based on radiologists' recommendations (706 = 5.6%). Difference between male and female (male < female) preferences for low CAS exams was statistically significant (P < .01). Imaging outcome was highest for extremity MRI cases (66.7%; P < .01). Less than 3% of exams ordered had low CAS and about two-thirds of these were performed. Most common indication for ordering these exams was physician specialist request based on opinion of medical necessity without specification. Extremity MRI constituted the highest positive findings for low CAS exams performed. Published by Elsevier Inc.
Improved data retrieval from TreeBASE via taxonomic and linguistic data enrichment
Anwar, Nadia; Hunt, Ela
2009-01-01
Background TreeBASE, the only data repository for phylogenetic studies, is not being used effectively since it does not meet the taxonomic data retrieval requirements of the systematics community. We show, through an examination of the queries performed on TreeBASE, that data retrieval using taxon names is unsatisfactory. Results We report on a new wrapper supporting taxon queries on TreeBASE by utilising a Taxonomy and Classification Database (TCl-Db) we created. TCl-Db holds merged and consolidated taxonomic names from multiple data sources and can be used to translate hierarchical, vernacular and synonym queries into specific query terms in TreeBASE. The query expansion supported by TCl-Db shows very significant information retrieval quality improvement. The wrapper can be accessed at the URL The methodology we developed is scalable and can be applied to new data, as those become available in the future. Conclusion Significantly improved data retrieval quality is shown for all queries, and additional flexibility is achieved via user-driven taxonomy selection. PMID:19426482
SPARQL Assist language-neutral query composer
2012-01-01
Background SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. Results We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. Conclusions To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources. PMID:22373327
SPARQL assist language-neutral query composer.
McCarthy, Luke; Vandervalk, Ben; Wilkinson, Mark
2012-01-25
SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources.
Information Network Model Query Processing
NASA Astrophysics Data System (ADS)
Song, Xiaopu
Information Networking Model (INM) [31] is a novel database model for real world objects and relationships management. It naturally and directly supports various kinds of static and dynamic relationships between objects. In INM, objects are networked through various natural and complex relationships. INM Query Language (INM-QL) [30] is designed to explore such information network, retrieve information about schema, instance, their attributes, relationships, and context-dependent information, and process query results in the user specified form. INM database management system has been implemented using Berkeley DB, and it supports INM-QL. This thesis is mainly focused on the implementation of the subsystem that is able to effectively and efficiently process INM-QL. The subsystem provides a lexical and syntactical analyzer of INM-QL, and it is able to choose appropriate evaluation strategies and index mechanism to process queries in INM-QL without the user's intervention. It also uses intermediate result structure to hold intermediate query result and other helping structures to reduce complexity of query processing.
Searching Choices: Quantifying Decision-Making Processes Using Search Engine Data.
Moat, Helen Susannah; Olivola, Christopher Y; Chater, Nick; Preis, Tobias
2016-07-01
When making a decision, humans consider two types of information: information they have acquired through their prior experience of the world, and further information they gather to support the decision in question. Here, we present evidence that data from search engines such as Google can help us model both sources of information. We show that statistics from search engines on the frequency of content on the Internet can help us estimate the statistical structure of prior experience; and, specifically, we outline how such statistics can inform psychological theories concerning the valuation of human lives, or choices involving delayed outcomes. Turning to information gathering, we show that search query data might help measure human information gathering, and it may predict subsequent decisions. Such data enable us to compare information gathered across nations, where analyses suggest, for example, a greater focus on the future in countries with a higher per capita GDP. We conclude that search engine data constitute a valuable new resource for cognitive scientists, offering a fascinating new tool for understanding the human decision-making process. Copyright © 2016 The Authors. Topics in Cognitive Science published by Wiley Periodicals, Inc. on behalf of Cognitive Science Society.
Rogers, Patrick; Erdal, Selnur; Santangelo, Jennifer; Liu, Jianhua; Schuster, Dara; Kamal, Jyoti
2008-11-06
The Ohio State University Medical Center (OSUMC) Information Warehouse (IW) is a comprehensive data warehousing facility incorporating operational, clinical, and biological data sets from multiple enterprise system. It is common for users of the IW to request complex ad-hoc queries that often require significant intervention by data analyst. In response to this challenge, we have designed a workflow that leverages synthesized data elements to support such queries in an more timely, efficient manner.
Distributed Sensing and Processing Adaptive Collaboration Environment (D-SPACE)
2014-07-01
to the query graph, or subgraph permutations with the same mismatch cost (often the case for homogeneous and/or symmetrical data/query). To avoid...decisions are generated in a bottom-up manner using the metric of entropy at the cluster level (Figure 9c). Using the definition of belief messages...for a cluster and a set of data nodes in this cluster , we compute the entropy for forward and backward messages as (,) = −∑ (
A new XML-based query language, CSRML, has been developed for representing chemical substructures, molecules, reaction rules, and reactions. CSRML queries are capable of integrating additional forms of information beyond the simple substructure (e.g., SMARTS) or reaction transfor...
WORDGRAPH: Keyword-in-Context Visualization for NETSPEAK's Wildcard Search.
Riehmann, Patrick; Gruendl, Henning; Potthast, Martin; Trenkmann, Martin; Stein, Benno; Froehlich, Benno
2012-09-01
The WORDGRAPH helps writers in visually choosing phrases while writing a text. It checks for the commonness of phrases and allows for the retrieval of alternatives by means of wildcard queries. To support such queries, we implement a scalable retrieval engine, which returns high-quality results within milliseconds using a probabilistic retrieval strategy. The results are displayed as WORDGRAPH visualization or as a textual list. The graphical interface provides an effective means for interactive exploration of search results using filter techniques, query expansion, and navigation. Our observations indicate that, of three investigated retrieval tasks, the textual interface is sufficient for the phrase verification task, wherein both interfaces support context-sensitive word choice, and the WORDGRAPH best supports the exploration of a phrase's context or the underlying corpus. Our user study confirms these observations and shows that WORDGRAPH is generally the preferred interface over the textual result list for queries containing multiple wildcards.
Content-aware network storage system supporting metadata retrieval
NASA Astrophysics Data System (ADS)
Liu, Ke; Qin, Leihua; Zhou, Jingli; Nie, Xuejun
2008-12-01
Nowadays, content-based network storage has become the hot research spot of academy and corporation[1]. In order to solve the problem of hit rate decline causing by migration and achieve the content-based query, we exploit a new content-aware storage system which supports metadata retrieval to improve the query performance. Firstly, we extend the SCSI command descriptor block to enable system understand those self-defined query requests. Secondly, the extracted metadata is encoded by extensible markup language to improve the universality. Thirdly, according to the demand of information lifecycle management (ILM), we store those data in different storage level and use corresponding query strategy to retrieval them. Fourthly, as the file content identifier plays an important role in locating data and calculating block correlation, we use it to fetch files and sort query results through friendly user interface. Finally, the experiments indicate that the retrieval strategy and sort algorithm have enhanced the retrieval efficiency and precision.
Added Value of Selected Images Embedded Into Radiology Reports to Referring Clinicians
Iyer, Veena R.; Hahn, Peter F.; Blaszkowsky, Lawrence S.; Thayer, Sarah P.; Halpern, Elkan F.; Harisinghani, Mukesh G.
2011-01-01
Purpose The aim of this study was to evaluate the added utility of embedding images for findings described in radiology text reports to referring clinicians. Methods Thirty-five cases referred for abdominal CT scans in 2007 and 2008 were included. Referring physicians were asked to view text-only reports, followed by the same reports with pertinent images embedded. For each pair of reports, a questionnaire was administered. A 5-point, Likert-type scale was used to assess if the clinical query was satisfactorily answered by the text-only report. A “yes-or-no” question was used to assess whether the report with images answered the clinical query better; a positive answer to this question generated “yes-or-no” queries to examine whether the report with images helped in making a more confident decision on management, whether it reduced time spent in forming the plan, and whether it altered management. The questionnaire asked whether a radiologist would be contacted with queries on reading the text-only report and the report with images. Results In 32 of 35 cases, the text-only reports satisfactorily answered the clinical queries. In these 32 cases, the reports with attached images helped in making more confident management decisions and reduced time in planning management. Attached images altered management in 2 cases. Radiologists would have been consulted for clarifications in 21 and 10 cases on reading the text-only reports and the reports with embedded images, respectively. Conclusions Providing relevant images with reports saves time, increases physicians' confidence in deciding treatment plans, and can alter management. PMID:20193926
Federated ontology-based queries over cancer data
2012-01-01
Background Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. In particular, this is valid in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast amounts of data, however, coupled with the use of different terms - or semantic heterogeneity - in each discipline makes the retrieval and integration of information difficult. Results Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality is given at the structural metadata level, without capitalising on the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures. A graphical user interface has been developed, supporting ontology-based queries over caGrid data sources. An extensive evaluation of the query reformulation technique is included. Conclusions To support personalised medicine in oncology, it is crucial to retrieve and integrate molecular, pathology, radiology and clinical data in an efficient manner. The semantic heterogeneity of the data makes this a challenging task. Ontologies provide a formal framework to support querying and integration. This paper provides an ontology-based solution for querying distributed databases over service-oriented, model-driven infrastructures. PMID:22373043
NASA Astrophysics Data System (ADS)
Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.
2015-07-01
Existing spatiotemporal database supports spatiotemporal aggregation query over massive moving objects datasets. Due to the large amounts of data and single-thread processing method, the query speed cannot meet the application requirements. On the other hand, the query efficiency is more sensitive to spatial variation then temporal variation. In this paper, we proposed a spatiotemporal aggregation query method using multi-thread parallel technique based on regional divison and implemented it on the server. Concretely, we divided the spatiotemporal domain into several spatiotemporal cubes, computed spatiotemporal aggregation on all cubes using the technique of multi-thread parallel processing, and then integrated the query results. By testing and analyzing on the real datasets, this method has improved the query speed significantly.
Multi-criteria decision analysis in environmental sciences: ten years of applications and trends.
Huang, Ivy B; Keisler, Jeffrey; Linkov, Igor
2011-09-01
Decision-making in environmental projects requires consideration of trade-offs between socio-political, environmental, and economic impacts and is often complicated by various stakeholder views. Multi-criteria decision analysis (MCDA) emerged as a formal methodology to face available technical information and stakeholder values to support decisions in many fields and can be especially valuable in environmental decision making. This study reviews environmental applications of MCDA. Over 300 papers published between 2000 and 2009 reporting MCDA applications in the environmental field were identified through a series of queries in the Web of Science database. The papers were classified by their environmental application area, decision or intervention type. In addition, the papers were also classified by the MCDA methods used in the analysis (analytic hierarchy process, multi-attribute utility theory, and outranking). The results suggest that there is a significant growth in environmental applications of MCDA over the last decade across all environmental application areas. Multiple MCDA tools have been successfully used for environmental applications. Even though the use of the specific methods and tools varies in different application areas and geographic regions, our review of a few papers where several methods were used in parallel with the same problem indicates that recommended course of action does not vary significantly with the method applied. Published by Elsevier B.V.
Making the connection: the VA-Regenstrief project.
Martin, D K
1992-01-01
The Regenstrief Automated Medical Record System is a well-established clinical information system with powerful facilities for querying and decision support. My colleagues and I introduced this system into the Indianapolis Veterans Affairs (VA) Medical Center by interfacing it to the institution's automated data-processing system, the Decentralized Hospital Computer Program (DHCP), using a recently standardized method for clinical data interchange. This article discusses some of the challenges encountered in that process, including the translation of vocabulary terms and maintenance of the software interface. Efforts such as these demonstrate the importance of standardization in medical informatics and the need for data standards at all levels of information exchange.
A Semantic Sensor Web for Environmental Decision Support Applications
Gray, Alasdair J. G.; Sadler, Jason; Kit, Oles; Kyzirakos, Kostis; Karpathiotakis, Manos; Calbimonte, Jean-Paul; Page, Kevin; García-Castro, Raúl; Frazer, Alex; Galpin, Ixent; Fernandes, Alvaro A. A.; Paton, Norman W.; Corcho, Oscar; Koubarakis, Manolis; De Roure, David; Martinez, Kirk; Gómez-Pérez, Asunción
2011-01-01
Sensing devices are increasingly being deployed to monitor the physical world around us. One class of application for which sensor data is pertinent is environmental decision support systems, e.g., flood emergency response. For these applications, the sensor readings need to be put in context by integrating them with other sources of data about the surrounding environment. Traditional systems for predicting and detecting floods rely on methods that need significant human resources. In this paper we describe a semantic sensor web architecture for integrating multiple heterogeneous datasets, including live and historic sensor data, databases, and map layers. The architecture provides mechanisms for discovering datasets, defining integrated views over them, continuously receiving data in real-time, and visualising on screen and interacting with the data. Our approach makes extensive use of web service standards for querying and accessing data, and semantic technologies to discover and integrate datasets. We demonstrate the use of our semantic sensor web architecture in the context of a flood response planning web application that uses data from sensor networks monitoring the sea-state around the coast of England. PMID:22164110
ERIC Educational Resources Information Center
Lynch, Clifford A.
1991-01-01
Describes several aspects of the problem of supporting information retrieval system query requirements in the relational database management system (RDBMS) environment and proposes an extension to query processing called nonmaterialized relations. User interactions with information retrieval systems are discussed, and nonmaterialized relations are…
Using Induction to Refine Information Retrieval Strategies
NASA Technical Reports Server (NTRS)
Baudin, Catherine; Pell, Barney; Kedar, Smadar
1994-01-01
Conceptual information retrieval systems use structured document indices, domain knowledge and a set of heuristic retrieval strategies to match user queries with a set of indices describing the document's content. Such retrieval strategies increase the set of relevant documents retrieved (increase recall), but at the expense of returning additional irrelevant documents (decrease precision). Usually in conceptual information retrieval systems this tradeoff is managed by hand and with difficulty. This paper discusses ways of managing this tradeoff by the application of standard induction algorithms to refine the retrieval strategies in an engineering design domain. We gathered examples of query/retrieval pairs during the system's operation using feedback from a user on the retrieved information. We then fed these examples to the induction algorithm and generated decision trees that refine the existing set of retrieval strategies. We found that (1) induction improved the precision on a set of queries generated by another user, without a significant loss in recall, and (2) in an interactive mode, the decision trees pointed out flaws in the retrieval and indexing knowledge and suggested ways to refine the retrieval strategies.
Development of Data Processing Software for NBI Spectroscopic Analysis System
NASA Astrophysics Data System (ADS)
Zhang, Xiaodan; Hu, Chundong; Sheng, Peng; Zhao, Yuanzhe; Wu, Deyun; Cui, Qinglong
2015-04-01
A set of data processing software is presented in this paper for processing NBI spectroscopic data. For better and more scientific managment and querying these data, they are managed uniformly by the NBI data server. The data processing software offers the functions of uploading beam spectral original and analytic data to the data server manually and automatically, querying and downloading all the NBI data, as well as dealing with local LZO data. The set software is composed of a server program and a client program. The server software is programmed in C/C++ under a CentOS development environment. The client software is developed under a VC 6.0 platform, which offers convenient operational human interfaces. The network communications between the server and the client are based on TCP. With the help of this set software, the NBI spectroscopic analysis system realizes the unattended automatic operation, and the clear interface also makes it much more convenient to offer beam intensity distribution data and beam power data to operators for operation decision-making. supported by National Natural Science Foundation of China (No. 11075183), the Chinese Academy of Sciences Knowledge Innovation
[Design and implementation of Chinese materia medica resources survey results display system].
Wang, Hui; Zhang, Xiao-Bo; Ge, Xiao-Guang; Jin, Yan; Wang, Ling; Zhao, Yan-Ping; Jing, Zhi-Xian; Guo, Lan-Ping; Huang, Lu-Qi
2017-11-01
From the beginning of the fourth national census of traditional Chinese medicine resources in 2011, a large amount of data have been collected and compiled, including wild medicinal plant resource data, cultivation of medicinal plant information, traditional knowledge, and specimen information. The traditional paper-based recording method is inconvenient for query and application. The B/S architecture, JavaWeb framework and SOA are used to design and develop the fourth national census results display platform. Through the data integration and sorting, the users are to provide with integrated data services and data query display solutions. The platform realizes the fine data classification, and has the simple data retrieval and the university statistical analysis function. The platform uses Echarts components, Geo Server, Open Layers and other technologies to provide a variety of data display forms such as charts, maps and other visualization forms, intuitive reflects the number, distribution and type of Chinese material medica resources. It meets the data mapping requirements of different levels of users, and provides support for management decision-making. Copyright© by the Chinese Pharmaceutical Association.
Goetz, Matthew B; Bowman, Candice; Hoang, Tuyen; Anaya, Henry; Osborn, Teresa; Gifford, Allen L; Asch, Steven M
2008-03-19
We describe how we used the framework of the U.S. Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) to develop a program to improve rates of diagnostic testing for the Human Immunodeficiency Virus (HIV). This venture was prompted by the observation by the CDC that 25% of HIV-infected patients do not know their diagnosis - a point of substantial importance to the VA, which is the largest provider of HIV care in the United States. Following the QUERI steps (or process), we evaluated: 1) whether undiagnosed HIV infection is a high-risk, high-volume clinical issue within the VA, 2) whether there are evidence-based recommendations for HIV testing, 3) whether there are gaps in the performance of VA HIV testing, and 4) the barriers and facilitators to improving current practice in the VA.Based on our findings, we developed and initiated a QUERI step 4/phase 1 pilot project using the precepts of the Chronic Care Model. Our improvement strategy relies upon electronic clinical reminders to provide decision support; audit/feedback as a clinical information system, and appropriate changes in delivery system design. These activities are complemented by academic detailing and social marketing interventions to achieve provider activation. Our preliminary formative evaluation indicates the need to ensure leadership and team buy-in, address facility-specific barriers, refine the reminder, and address factors that contribute to inter-clinic variances in HIV testing rates. Preliminary unadjusted data from the first seven months of our program show 3-5 fold increases in the proportion of at-risk patients who are offered HIV testing at the VA sites (stations) where the pilot project has been undertaken; no change was seen at control stations. This project demonstrates the early success of the application of the QUERI process to the development of a program to improve HIV testing rates. Preliminary unadjusted results show that the coordinated use of audit/feedback, provider activation, and organizational change can increase HIV testing rates for at-risk patients. We are refining our program prior to extending our work to a small-scale, multi-site evaluation (QUERI step 4/phase 2). We also plan to evaluate the durability/sustainability of the intervention effect, the costs of HIV testing, and the number of newly identified HIV-infected patients. Ultimately, we will evaluate this program in other geographically dispersed stations (QUERI step 4/phases 3 and 4).
Goetz, Matthew B; Bowman, Candice; Hoang, Tuyen; Anaya, Henry; Osborn, Teresa; Gifford, Allen L; Asch, Steven M
2008-01-01
Background We describe how we used the framework of the U.S. Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) to develop a program to improve rates of diagnostic testing for the Human Immunodeficiency Virus (HIV). This venture was prompted by the observation by the CDC that 25% of HIV-infected patients do not know their diagnosis – a point of substantial importance to the VA, which is the largest provider of HIV care in the United States. Methods Following the QUERI steps (or process), we evaluated: 1) whether undiagnosed HIV infection is a high-risk, high-volume clinical issue within the VA, 2) whether there are evidence-based recommendations for HIV testing, 3) whether there are gaps in the performance of VA HIV testing, and 4) the barriers and facilitators to improving current practice in the VA. Based on our findings, we developed and initiated a QUERI step 4/phase 1 pilot project using the precepts of the Chronic Care Model. Our improvement strategy relies upon electronic clinical reminders to provide decision support; audit/feedback as a clinical information system, and appropriate changes in delivery system design. These activities are complemented by academic detailing and social marketing interventions to achieve provider activation. Results Our preliminary formative evaluation indicates the need to ensure leadership and team buy-in, address facility-specific barriers, refine the reminder, and address factors that contribute to inter-clinic variances in HIV testing rates. Preliminary unadjusted data from the first seven months of our program show 3–5 fold increases in the proportion of at-risk patients who are offered HIV testing at the VA sites (stations) where the pilot project has been undertaken; no change was seen at control stations. Discussion This project demonstrates the early success of the application of the QUERI process to the development of a program to improve HIV testing rates. Preliminary unadjusted results show that the coordinated use of audit/feedback, provider activation, and organizational change can increase HIV testing rates for at-risk patients. We are refining our program prior to extending our work to a small-scale, multi-site evaluation (QUERI step 4/phase 2). We also plan to evaluate the durability/sustainability of the intervention effect, the costs of HIV testing, and the number of newly identified HIV-infected patients. Ultimately, we will evaluate this program in other geographically dispersed stations (QUERI step 4/phases 3 and 4). PMID:18353185
How Do Children Reformulate Their Search Queries?
ERIC Educational Resources Information Center
Rutter, Sophie; Ford, Nigel; Clough, Paul
2015-01-01
Introduction: This paper investigates techniques used by children in year 4 (age eight to nine) of a UK primary school to reformulate their queries, and how they use information retrieval systems to support query reformulation. Method: An in-depth study analysing the interactions of twelve children carrying out search tasks in a primary school…
Group Centric Information Sharing Using Hierarchical Models
2011-01-01
enable people to create data using RDF, build vocabularies using web ontology language (OWL), write rules and query data stores using SPARQL [8...a strict joined and the document was added with a strict add. In order to represent the fact that an action is allowed (or not), we have created a...greatly improve the system’s readiness to handle any number of access decision queries . a. The pair is tested against the gSIS Join and Add semantics
Blind Seer: A Scalable Private DBMS
2014-05-01
searchable index terms per DB row, in time comparable to (insecure) MySQL (many practical queries can be privately executed with work 1.2-3 times slower...than MySQL , although some queries are costlier). We support a rich query set, including searching on arbitrary boolean formulas on keywords and ranges...index terms per DB row, in time comparable to (insecure) MySQL (many practical queries can be privately executed with work 1.2-3 times slower than MySQL
Roberts, Laura Weiss; Kim, Jane Paik
2015-12-01
Schizophrenia is a serious mental disorder that may affect the decisional capacity, and as a consequence, preferred alternative decision-makers may be engaged to help with clinical care and research-related choices. Ideally, alternative decision-makers will seek to make decisions that fit with the views and preferences of the ill individual. Few data exist, however, comparing the views of alternative decision-makers to those of individuals with schizophrenia. We conducted a written survey with individuals with schizophrenia living in a community setting, and a parallel survey with the person whom the ill individual identified as being a preferred alternative decision-maker. Complete data were obtained on 20 pairs (n = 40, total). Domains queried included (a) burden, happiness, and safety of the ill individual and of his or her family in treatment and research decisions and (b) importance of ethical principles in every day life. Two-sided paired t-tests and graphical summaries were used to compare responses. Individuals with schizophrenia and their linked preferred alternative decision-makers were attuned on four of six aspects of treatment decision-making and on all six aspects of research decision-making that we queried. The preferred alternative decision-makers overall demonstrated attunement to the views of the ill individuals in this small study. Ill individuals and their preferred alternative decision-makers were aligned in their views of ethically-salient aspects of every day life. These novel findings suggest that alternative decision-makers identified by ill individuals may be able to guide choices based on an accurate understanding of the ill individuals' views and values. Copyright © 2015 Elsevier Ltd. All rights reserved.
Yang, Liu; Jin, Rong; Mummert, Lily; Sukthankar, Rahul; Goode, Adam; Zheng, Bin; Hoi, Steven C H; Satyanarayanan, Mahadev
2010-01-01
Similarity measurement is a critical component in content-based image retrieval systems, and learning a good distance metric can significantly improve retrieval performance. However, despite extensive study, there are several major shortcomings with the existing approaches for distance metric learning that can significantly affect their application to medical image retrieval. In particular, "similarity" can mean very different things in image retrieval: resemblance in visual appearance (e.g., two images that look like one another) or similarity in semantic annotation (e.g., two images of tumors that look quite different yet are both malignant). Current approaches for distance metric learning typically address only one goal without consideration of the other. This is problematic for medical image retrieval where the goal is to assist doctors in decision making. In these applications, given a query image, the goal is to retrieve similar images from a reference library whose semantic annotations could provide the medical professional with greater insight into the possible interpretations of the query image. If the system were to retrieve images that did not look like the query, then users would be less likely to trust the system; on the other hand, retrieving images that appear superficially similar to the query but are semantically unrelated is undesirable because that could lead users toward an incorrect diagnosis. Hence, learning a distance metric that preserves both visual resemblance and semantic similarity is important. We emphasize that, although our study is focused on medical image retrieval, the problem addressed in this work is critical to many image retrieval systems. We present a boosting framework for distance metric learning that aims to preserve both visual and semantic similarities. The boosting framework first learns a binary representation using side information, in the form of labeled pairs, and then computes the distance as a weighted Hamming distance using the learned binary representation. A boosting algorithm is presented to efficiently learn the distance function. We evaluate the proposed algorithm on a mammographic image reference library with an Interactive Search-Assisted Decision Support (ISADS) system and on the medical image data set from ImageCLEF. Our results show that the boosting framework compares favorably to state-of-the-art approaches for distance metric learning in retrieval accuracy, with much lower computational cost. Additional evaluation with the COREL collection shows that our algorithm works well for regular image data sets.
Designing a Relational Database to Facilitate Military Family Housing Utilization Decisions
2001-06-01
TOOLS 73 A. INTRODUCTION Z"*73 B. MICROSOFT ACCESS TOOLS 74 C. GEOGRAPHIC INFORMATION SYSTEM 77 D. SUMMARY 90 VIII. CONCLUSIONS AND...FAMILYMEMBER Objects 36 Figure 5-1. The Design View for the "Find duplicates" Query 46 Figure 5-2. MS Access Design View of the Housing Inventory...55 Figure 5-6. Access Query to Weed Out Null Appliance Rows 58 Figure 6-1. HATS Main Menu (Switchboard Form) 62 Figure 6-2. Using Drop-down Menus on
Legal Decision-Making by People with Aphasia: Critical Incidents for Speech Pathologists
ERIC Educational Resources Information Center
Ferguson, Alison; Duffield, Gemma; Worrall, Linda
2010-01-01
Background: The assessment and management of a person with aphasia for whom decision-making capacity is queried represents a highly complex clinical issue. In addition, there are few published guidelines and even fewer published accounts of empirical research to assist. Aims: The research presented in this paper aimed to identify the main issues…
Active Learning Using Hint Information.
Li, Chun-Liang; Ferng, Chun-Sung; Lin, Hsuan-Tien
2015-08-01
The abundance of real-world data and limited labeling budget calls for active learning, an important learning paradigm for reducing human labeling efforts. Many recently developed active learning algorithms consider both uncertainty and representativeness when making querying decisions. However, exploiting representativeness with uncertainty concurrently usually requires tackling sophisticated and challenging learning tasks, such as clustering. In this letter, we propose a new active learning framework, called hinted sampling, which takes both uncertainty and representativeness into account in a simpler way. We design a novel active learning algorithm within the hinted sampling framework with an extended support vector machine. Experimental results validate that the novel active learning algorithm can result in a better and more stable performance than that achieved by state-of-the-art algorithms. We also show that the hinted sampling framework allows improving another active learning algorithm designed from the transductive support vector machine.
PAQ: Persistent Adaptive Query Middleware for Dynamic Environments
NASA Astrophysics Data System (ADS)
Rajamani, Vasanth; Julien, Christine; Payton, Jamie; Roman, Gruia-Catalin
Pervasive computing applications often entail continuous monitoring tasks, issuing persistent queries that return continuously updated views of the operational environment. We present PAQ, a middleware that supports applications' needs by approximating a persistent query as a sequence of one-time queries. PAQ introduces an integration strategy abstraction that allows composition of one-time query responses into streams representing sophisticated spatio-temporal phenomena of interest. A distinguishing feature of our middleware is the realization that the suitability of a persistent query's result is a function of the application's tolerance for accuracy weighed against the associated overhead costs. In PAQ, programmers can specify an inquiry strategy that dictates how information is gathered. Since network dynamics impact the suitability of a particular inquiry strategy, PAQ associates an introspection strategy with a persistent query, that evaluates the quality of the query's results. The result of introspection can trigger application-defined adaptation strategies that alter the nature of the query. PAQ's simple API makes developing adaptive querying systems easily realizable. We present the key abstractions, describe their implementations, and demonstrate the middleware's usefulness through application examples and evaluation.
RCQ-GA: RDF Chain Query Optimization Using Genetic Algorithms
NASA Astrophysics Data System (ADS)
Hogenboom, Alexander; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay
The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools. Fast query engines are needed for efficient querying of large amounts of data, usually represented using RDF. We focus on optimizing a special class of SPARQL queries, the so-called RDF chain queries. For this purpose, we devise a genetic algorithm called RCQ-GA that determines the order in which joins need to be performed for an efficient evaluation of RDF chain queries. The approach is benchmarked against a two-phase optimization algorithm, previously proposed in literature. The more complex a query is, the more RCQ-GA outperforms the benchmark in solution quality, execution time needed, and consistency of solution quality. When the algorithms are constrained by a time limit, the overall performance of RCQ-GA compared to the benchmark further improves.
Wu, Dehua
2016-01-01
The spatial position and distribution of human body meridian are expressed limitedly in the decision support system (DSS) of acupuncture and moxibustion at present, which leads to the failure to give the effective quantitative analysis on the spatial range and the difficulty for the decision-maker to provide a realistic spatial decision environment. Focusing on the limit spatial expression in DSS of acupuncture and moxibustion, it was proposed that on the basis of the geographic information system, in association of DSS technology, the design idea was developed on the human body meridian spatial DSS. With the 4-layer service-oriented architecture adopted, the data center integrated development platform was taken as the system development environment. The hierarchical organization was done for the spatial data of human body meridian via the directory tree. The structured query language (SQL) server was used to achieve the unified management of spatial data and attribute data. The technologies of architecture, configuration and plug-in development model were integrated to achieve the data inquiry, buffer analysis and program evaluation of the human body meridian spatial DSS. The research results show that the human body meridian spatial DSS could reflect realistically the spatial characteristics of the spatial position and distribution of human body meridian and met the constantly changeable demand of users. It has the powerful spatial analysis function and assists with the scientific decision in clinical treatment and teaching of acupuncture and moxibustion. It is the new attempt to the informatization research of human body meridian.
An advanced web query interface for biological databases
Latendresse, Mario; Karp, Peter D.
2010-01-01
Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715
Measuring Up: Implementing a Dental Quality Measure in the Electronic Health Record Context
Bhardwaj, Aarti; Ramoni, Rachel; Kalenderian, Elsbeth; Neumann, Ana; Hebballi, Nutan B; White, Joel M; McClellan, Lyle; Walji, Muhammad F
2015-01-01
Background Quality improvement requires quality measures that are validly implementable. In this work, we assessed the feasibility and performance of an automated electronic Meaningful Use dental clinical quality measure (percentage of children who received fluoride varnish). Methods We defined how to implement the automated measure queries in a dental electronic health record (EHR). Within records identified through automated query, we manually reviewed a subsample to assess the performance of the query. Results The automated query found 71.0% of patients to have had fluoride varnish compared to 77.6% found using the manual chart review. The automated quality measure performance was 90.5% sensitivity, 90.8% specificity, 96.9% positive predictive value, and 75.2% negative predictive value. Conclusions Our findings support the feasibility of automated dental quality measure queries in the context of sufficient structured data. Information noted only in the free text rather than in structured data would require natural language processing approaches to effectively query. Practical Implications To participate in self-directed quality improvement, dental clinicians must embrace the accountability era. Commitment to quality will require enhanced documentation in order to support near-term automated calculation of quality measures. PMID:26562736
An adaptable architecture for patient cohort identification from diverse data sources.
Bache, Richard; Miles, Simon; Taweel, Adel
2013-12-01
We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. The architecture has the key feature that queries defined according to the query model are both pre and post-processed and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a data-based management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity.
HC StratoMineR: A Web-Based Tool for the Rapid Analysis of High-Content Datasets.
Omta, Wienand A; van Heesbeen, Roy G; Pagliero, Romina J; van der Velden, Lieke M; Lelieveld, Daphne; Nellen, Mehdi; Kramer, Maik; Yeong, Marley; Saeidi, Amir M; Medema, Rene H; Spruit, Marco; Brinkkemper, Sjaak; Klumperman, Judith; Egan, David A
2016-10-01
High-content screening (HCS) can generate large multidimensional datasets and when aligned with the appropriate data mining tools, it can yield valuable insights into the mechanism of action of bioactive molecules. However, easy-to-use data mining tools are not widely available, with the result that these datasets are frequently underutilized. Here, we present HC StratoMineR, a web-based tool for high-content data analysis. It is a decision-supportive platform that guides even non-expert users through a high-content data analysis workflow. HC StratoMineR is built by using My Structured Query Language for storage and querying, PHP: Hypertext Preprocessor as the main programming language, and jQuery for additional user interface functionality. R is used for statistical calculations, logic and data visualizations. Furthermore, C++ and graphical processor unit power is diffusely embedded in R by using the rcpp and rpud libraries for operations that are computationally highly intensive. We show that we can use HC StratoMineR for the analysis of multivariate data from a high-content siRNA knock-down screen and a small-molecule screen. It can be used to rapidly filter out undesirable data; to select relevant data; and to perform quality control, data reduction, data exploration, morphological hit picking, and data clustering. Our results demonstrate that HC StratoMineR can be used to functionally categorize HCS hits and, thus, provide valuable information for hit prioritization.
Allones, J L; Martinez, D; Taboada, M
2014-10-01
Clinical terminologies are considered a key technology for capturing clinical data in a precise and standardized manner, which is critical to accurately exchange information among different applications, medical records and decision support systems. An important step to promote the real use of clinical terminologies, such as SNOMED-CT, is to facilitate the process of finding mappings between local terms of medical records and concepts of terminologies. In this paper, we propose a mapping tool to discover text-to-concept mappings in SNOMED-CT. Name-based techniques were combined with a query expansion system to generate alternative search terms, and with a strategy to analyze and take advantage of the semantic relationships of the SNOMED-CT concepts. The developed tool was evaluated and compared to the search services provided by two SNOMED-CT browsers. Our tool automatically mapped clinical terms from a Spanish glossary of procedures in pathology with 88.0% precision and 51.4% recall, providing a substantial improvement of recall (28% and 60%) over other publicly accessible mapping services. The improvements reached by the mapping tool are encouraging. Our results demonstrate the feasibility of accurately mapping clinical glossaries to SNOMED-CT concepts, by means a combination of structural, query expansion and named-based techniques. We have shown that SNOMED-CT is a great source of knowledge to infer synonyms for the medical domain. Results show that an automated query expansion system overcomes the challenge of vocabulary mismatch partially.
Supporting Coral Reef Ecosystem Management Decisions Appropriate to Climate Change
NASA Astrophysics Data System (ADS)
Hendee, J. C.; Fletcher, P.; Shein, K. A.
2013-05-01
There has been a perception that the myriad of environmental information products derived from satellite and other instrumental sources means ipso facto that there is a direct use for them by environmental managers. Trouble is, as information providers, for the most part we don't really know what decisions managers face daily, nor is it a trivial matter to ascertain the effect of management decisions on the environment, at least in a time frame that facilitates timely maintenance and enhancement of decision support software. To bridge this gap in understanding, we conducted a Needs Assessment (using methodology from the NOAA/Coastal Services Center's Product Design and Evaluation training program) from December, 2011 through May, 2012, in which we queried 15 resource managers in southeast Florida to identify the types of climate data and information products they needed to understand the effects of climate change in their region of purview, and how best these products should be delivered and subsequently enhanced or corrected. Our intent has been to develop a suite of software and information products customized specifically for environmental managers. This report summarizes our success to date, including a report on the development of software for gathering and presenting specific types of climate data, and a narrative about how some U.S. government sponsored efforts, such as Giovanni and TerraVis, as well as non-governmental sponsored efforts such as Marxan, Zonation, SimCLIM, and other off-the-shelf software might be customized for use in specific regions.
A Visual Interface for Querying Heterogeneous Phylogenetic Databases.
Jamil, Hasan M
2017-01-01
Despite the recent growth in the number of phylogenetic databases, access to these wealth of resources remain largely tool or form-based interface driven. It is our thesis that the flexibility afforded by declarative query languages may offer the opportunity to access these repositories in a better way, and to use such a language to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way, and is used as the storage model for our PhyloBase database for which PhyQL serves as the query language. We have implemented a visual interface for the end users to pose PhyQL queries using visual icons, and drag and drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in PhyQL buffer allows secondary querying on the computed results making it a truly powerful querying architecture.
Provenance Storage, Querying, and Visualization in PBase
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kianmajd, Parisa; Ludascher, Bertram; Missier, Paolo
2015-01-01
We present PBase, a repository for scientific workflows and their corresponding provenance information that facilitates the sharing of experiments among the scientific community. PBase is interoperable since it uses ProvONE, a standard provenance model for scientific workflows. Workflows and traces are stored in RDF, and with the support of SPARQL and the tree cover encoding, the repository provides a scalable infrastructure for querying the provenance data. Furthermore, through its user interface, it is possible to: visualize workflows and execution traces; visualize reachability relations within these traces; issue SPARQL queries; and visualize query results.
Mutual influence in shared decision making: a collaborative study of patients and physicians.
Lown, Beth A; Clark, William D; Hanson, Janice L
2009-06-01
To explore how patients and physicians describe attitudes and behaviours that facilitate shared decision making. Background Studies have described physician behaviours in shared decision making, explored decision aids for informing patients and queried whether patients and physicians want to share decisions. Little attention has been paid to patients' behaviors that facilitate shared decision making or to the influence of patients and physicians on each other during this process. Qualitative analysis of data from four research work groups, each composed of patients with chronic conditions and primary care physicians. Eighty-five patients and physicians identified six categories of paired physician/patient themes, including act in a relational way; explore/express patient's feelings and preferences; discuss information and options; seek information, support and advice; share control and negotiate a decision; and patients act on their own behalf and physicians act on behalf of the patient. Similar attitudes and behaviours were described for both patients and physicians. Participants described a dynamic process in which patients and physicians influence each other throughout shared decision making. This study is unique in that clinicians and patients collaboratively defined and described attitudes and behaviours that facilitate shared decision making and expand previous descriptions, particularly of patient attitudes and behaviours that facilitate shared decision making. Study participants described relational, contextual and affective behaviours and attitudes for both patients and physicians, and explicitly discussed sharing control and negotiation. The complementary, interactive behaviours described in the themes for both patients and physicians illustrate mutual influence of patients and physicians on each other.
Comparing the performance of two CBIRS indexing schemes
NASA Astrophysics Data System (ADS)
Mueller, Wolfgang; Robbert, Guenter; Henrich, Andreas
2003-01-01
Content based image retrieval (CBIR) as it is known today has to deal with a number of challenges. Quickly summarized, the main challenges are firstly, to bridge the semantic gap between high-level concepts and low-level features using feedback, secondly to provide performance under adverse conditions. High-dimensional spaces, as well as a demanding machine learning task make the right way of indexing an important issue. When indexing multimedia data, most groups opt for extraction of high-dimensional feature vectors from the data, followed by dimensionality reduction like PCA (Principal Components Analysis) or LSI (Latent Semantic Indexing). The resulting vectors are indexed using spatial indexing structures such as kd-trees or R-trees, for example. Other projects, such as MARS and Viper propose the adaptation of text indexing techniques, notably the inverted file. Here, the Viper system is the most direct adaptation of text retrieval techniques to quantized vectors. However, while the Viper query engine provides decent performance together with impressive user-feedback behavior, as well as the possibility for easy integration of long-term learning algorithms, and support for potentially infinite feature vectors, there has been no comparison of vector-based methods and inverted-file-based methods under similar conditions. In this publication, we compare a CBIR query engine that uses inverted files (Bothrops, a rewrite of the Viper query engine based on a relational database), and a CBIR query engine based on LSD (Local Split Decision) trees for spatial indexing using the same feature sets. The Benchathlon initiative works on providing a set of images and ground truth for simulating image queries by example and corresponding user feedback. When performing the Benchathlon benchmark on a CBIR system (the System Under Test, SUT), a benchmarking harness connects over internet to the SUT, performing a number of queries using an agreed-upon protocol, the multimedia retrieval markup language (MRML). Using this benchmark one can measure the quality of retrieval, as well as the overall (speed) performance of the benchmarked system. Our Benchmarks will draw on the Benchathlon"s work for documenting the retrieval performance of both inverted file-based and LSD tree based techniques. However in addition to these results, we will present statistics, that can be obtained only inside the system under test. These statistics will include the number of complex mathematical operations, as well as the amount of data that has to be read from disk during operation of a query.
Cairns, Andrew W; Bond, Raymond R; Finlay, Dewar D; Guldenring, Daniel; Badilini, Fabio; Libretti, Guido; Peace, Aaron J; Leslie, Stephen J
The 12-lead Electrocardiogram (ECG) has been used to detect cardiac abnormalities in the same format for more than 70years. However, due to the complex nature of 12-lead ECG interpretation, there is a significant cognitive workload required from the interpreter. This complexity in ECG interpretation often leads to errors in diagnosis and subsequent treatment. We have previously reported on the development of an ECG interpretation support system designed to augment the human interpretation process. This computerised decision support system has been named 'Interactive Progressive based Interpretation' (IPI). In this study, a decision support algorithm was built into the IPI system to suggest potential diagnoses based on the interpreter's annotations of the 12-lead ECG. We hypothesise semi-automatic interpretation using a digital assistant can be an optimal man-machine model for ECG interpretation. To improve interpretation accuracy and reduce missed co-abnormalities. The Differential Diagnoses Algorithm (DDA) was developed using web technologies where diagnostic ECG criteria are defined in an open storage format, Javascript Object Notation (JSON), which is queried using a rule-based reasoning algorithm to suggest diagnoses. To test our hypothesis, a counterbalanced trial was designed where subjects interpreted ECGs using the conventional approach and using the IPI+DDA approach. A total of 375 interpretations were collected. The IPI+DDA approach was shown to improve diagnostic accuracy by 8.7% (although not statistically significant, p-value=0.1852), the IPI+DDA suggested the correct interpretation more often than the human interpreter in 7/10 cases (varying statistical significance). Human interpretation accuracy increased to 70% when seven suggestions were generated. Although results were not found to be statistically significant, we found; 1) our decision support tool increased the number of correct interpretations, 2) the DDA algorithm suggested the correct interpretation more often than humans, and 3) as many as 7 computerised diagnostic suggestions augmented human decision making in ECG interpretation. Statistical significance may be achieved by expanding sample size. Copyright © 2017 Elsevier Inc. All rights reserved.
Towards Hybrid Online On-Demand Querying of Realtime Data with Stateful Complex Event Processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Qunzhi; Simmhan, Yogesh; Prasanna, Viktor K.
Emerging Big Data applications in areas like e-commerce and energy industry require both online and on-demand queries to be performed over vast and fast data arriving as streams. These present novel challenges to Big Data management systems. Complex Event Processing (CEP) is recognized as a high performance online query scheme which in particular deals with the velocity aspect of the 3-V’s of Big Data. However, traditional CEP systems do not consider data variety and lack the capability to embed ad hoc queries over the volume of data streams. In this paper, we propose H2O, a stateful complex event processing framework,more » to support hybrid online and on-demand queries over realtime data. We propose a semantically enriched event and query model to address data variety. A formal query algebra is developed to precisely capture the stateful and containment semantics of online and on-demand queries. We describe techniques to achieve the interactive query processing over realtime data featured by efficient online querying, dynamic stream data persistence and on-demand access. The system architecture is presented and the current implementation status reported.« less
Designing software for operational decision support through coloured Petri nets
NASA Astrophysics Data System (ADS)
Maggi, F. M.; Westergaard, M.
2017-05-01
Operational support provides, during the execution of a business process, replies to questions such as 'how do I end the execution of the process in the cheapest way?' and 'is my execution compliant with some expected behaviour?' These questions may be asked several times during a single execution and, to answer them, dedicated software components (the so-called operational support providers) need to be invoked. Therefore, an infrastructure is needed to handle multiple providers, maintain data between queries about the same execution and discard information when it is no longer needed. In this paper, we use coloured Petri nets (CPNs) to model and analyse software implementing such an infrastructure. This analysis is needed to clarify the requirements before implementation and to guarantee that the resulting software is correct. To this aim, we present techniques to represent and analyse state spaces with 250 million states on a normal PC. We show how the specified requirements have been implemented as a plug-in of the process mining tool ProM and how the operational support in ProM can be used in combination with an existing operational support provider.
An adaptable architecture for patient cohort identification from diverse data sources
Bache, Richard; Miles, Simon; Taweel, Adel
2013-01-01
Objective We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. Method The architecture has the key feature that queries defined according to the query model are both pre and post-processed and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. Results We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Discussion Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a data-based management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. Conclusions The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity. PMID:24064442
An alternative database approach for management of SNOMED CT and improved patient data queries.
Campbell, W Scott; Pedersen, Jay; McClay, James C; Rao, Praveen; Bastola, Dhundy; Campbell, James R
2015-10-01
SNOMED CT is the international lingua franca of terminologies for human health. Based in Description Logics (DL), the terminology enables data queries that incorporate inferences between data elements, as well as, those relationships that are explicitly stated. However, the ontologic and polyhierarchical nature of the SNOMED CT concept model make it difficult to implement in its entirety within electronic health record systems that largely employ object oriented or relational database architectures. The result is a reduction of data richness, limitations of query capability and increased systems overhead. The hypothesis of this research was that a graph database (graph DB) architecture using SNOMED CT as the basis for the data model and subsequently modeling patient data upon the semantic core of SNOMED CT could exploit the full value of the terminology to enrich and support advanced data querying capability of patient data sets. The hypothesis was tested by instantiating a graph DB with the fully classified SNOMED CT concept model. The graph DB instance was tested for integrity by calculating the transitive closure table for the SNOMED CT hierarchy and comparing the results with transitive closure tables created using current, validated methods. The graph DB was then populated with 461,171 anonymized patient record fragments and over 2.1 million associated SNOMED CT clinical findings. Queries, including concept negation and disjunction, were then run against the graph database and an enterprise Oracle relational database (RDBMS) of the same patient data sets. The graph DB was then populated with laboratory data encoded using LOINC, as well as, medication data encoded with RxNorm and complex queries performed using LOINC, RxNorm and SNOMED CT to identify uniquely described patient populations. A graph database instance was successfully created for two international releases of SNOMED CT and two US SNOMED CT editions. Transitive closure tables and descriptive statistics generated using the graph database were identical to those using validated methods. Patient queries produced identical patient count results to the Oracle RDBMS with comparable times. Database queries involving defining attributes of SNOMED CT concepts were possible with the graph DB. The same queries could not be directly performed with the Oracle RDBMS representation of the patient data and required the creation and use of external terminology services. Further, queries of undefined depth were successful in identifying unknown relationships between patient cohorts. The results of this study supported the hypothesis that a patient database built upon and around the semantic model of SNOMED CT was possible. The model supported queries that leveraged all aspects of the SNOMED CT logical model to produce clinically relevant query results. Logical disjunction and negation queries were possible using the data model, as well as, queries that extended beyond the structural IS_A hierarchy of SNOMED CT to include queries that employed defining attribute-values of SNOMED CT concepts as search parameters. As medical terminologies, such as SNOMED CT, continue to expand, they will become more complex and model consistency will be more difficult to assure. Simultaneously, consumers of data will increasingly demand improvements to query functionality to accommodate additional granularity of clinical concepts without sacrificing speed. This new line of research provides an alternative approach to instantiating and querying patient data represented using advanced computable clinical terminologies. Copyright © 2015 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowers, M; Robertson, S; Moore, J
Purpose: Advancement in Radiation Oncology (RO) practice develops through evidence based medicine and clinical trial. Knowledge usable for treatment planning, decision support and research is contained in our clinical data, stored in an Oncospace database. This data store and the tools for populating and analyzing it are compatible with standard RO practice and are shared with collaborating institutions. The question is - what protocol for system development and data sharing within an Oncospace Consortium? We focus our example on the technology and data meaning necessary to share across the Consortium. Methods: Oncospace consists of a database schema, planning and outcomemore » data import and web based analysis tools.1) Database: The Consortium implements a federated data store; each member collects and maintains its own data within an Oncospace schema. For privacy, PHI is contained within a single table, accessible to the database owner.2) Import: Spatial dose data from treatment plans (Pinnacle or DICOM) is imported via Oncolink. Treatment outcomes are imported from an OIS (MOSAIQ).3) Analysis: JHU has built a number of webpages to answer analysis questions. Oncospace data can also be analyzed via MATLAB or SAS queries.These materials are available to Consortium members, who contribute enhancements and improvements. Results: 1) The Oncospace Consortium now consists of RO centers at JHU, UVA, UW and the University of Toronto. These members have successfully installed and populated Oncospace databases with over 1000 patients collectively.2) Members contributing code and getting updates via SVN repository. Errors are reported and tracked via Redmine. Teleconferences include strategizing design and code reviews.3) Successfully remotely queried federated databases to combine multiple institutions’ DVH data for dose-toxicity analysis (see below – data combined from JHU and UW Oncospace). Conclusion: RO data sharing can and has been effected according to the Oncospace Consortium model: http://oncospace.radonc.jhmi.edu/ . John Wong - SRA from Elekta; Todd McNutt - SRA from Elekta; Michael Bowers - funded by Elekta.« less
Profile-IQ: Web-based data query system for local health department infrastructure and activities.
Shah, Gulzar H; Leep, Carolyn J; Alexander, Dayna
2014-01-01
To demonstrate the use of National Association of County & City Health Officials' Profile-IQ, a Web-based data query system, and how policy makers, researchers, the general public, and public health professionals can use the system to generate descriptive statistics on local health departments. This article is a descriptive account of an important health informatics tool based on information from the project charter for Profile-IQ and the authors' experience and knowledge in design and use of this query system. Profile-IQ is a Web-based data query system that is based on open-source software: MySQL 5.5, Google Web Toolkit 2.2.0, Apache Commons Math library, Google Chart API, and Tomcat 6.0 Web server deployed on an Amazon EC2 server. It supports dynamic queries of National Profile of Local Health Departments data on local health department finances, workforce, and activities. Profile-IQ's customizable queries provide a variety of statistics not available in published reports and support the growing information needs of users who do not wish to work directly with data files for lack of staff skills or time, or to avoid a data use agreement. Profile-IQ also meets the growing demand of public health practitioners and policy makers for data to support quality improvement, community health assessment, and other processes associated with voluntary public health accreditation. It represents a step forward in the recent health informatics movement of data liberation and use of open source information technology solutions to promote public health.
Managing data from multiple disciplines, scales, and sites to support synthesis and modeling
Olson, R. J.; Briggs, J. M.; Porter, J.H.; Mah, Grant R.; Stafford, S.G.
1999-01-01
The synthesis and modeling of ecological processes at multiple spatial and temporal scales involves bringing together and sharing data from numerous sources. This article describes a data and information system model that facilitates assembling, managing, and sharing diverse data from multiple disciplines, scales, and sites to support integrated ecological studies. Cross-site scientific-domain working groups coordinate the development of data associated with their particular scientific working group, including decisions about data requirements, data to be compiled, data formats, derived data products, and schedules across the sites. The Web-based data and information system consists of nodes for each working group plus a central node that provides data access, project information, data query, and other functionality. The approach incorporates scientists and computer experts in the working groups and provides incentives for individuals to submit documented data to the data and information system.
LC Data QUEST: A Technical Architecture for Community Federated Clinical Data Sharing.
Stephens, Kari A; Lin, Ching-Ping; Baldwin, Laura-Mae; Echo-Hawk, Abigail; Keppel, Gina A; Buchwald, Dedra; Whitener, Ron J; Korngiebel, Diane M; Berg, Alfred O; Black, Robert A; Tarczy-Hornoch, Peter
2012-01-01
The University of Washington Institute of Translational Health Sciences is engaged in a project, LC Data QUEST, building data sharing capacity in primary care practices serving rural and tribal populations in the Washington, Wyoming, Alaska, Montana, Idaho region to build research infrastructure. We report on the iterative process of developing the technical architecture for semantically aligning electronic health data in primary care settings across our pilot sites and tools that will facilitate linkages between the research and practice communities. Our architecture emphasizes sustainable technical solutions for addressing data extraction, alignment, quality, and metadata management. The architecture provides immediate benefits to participating partners via a clinical decision support tool and data querying functionality to support local quality improvement efforts. The FInDiT tool catalogues type, quantity, and quality of the data that are available across the LC Data QUEST data sharing architecture. These tools facilitate the bi-directional process of translational research.
Johnson, Christie
2016-01-01
This poster presentation presents a content modeling strategy using the SNOMED CT Observable Model to represent large amounts of detailed clinical data in a consistent and computable manner that can support multiple use cases. Lightweight Expression of Granular Objects (LEGOs) represent question/answer pairs on clinical data collection forms, where a question is modeled by a (usually) post-coordinated SNOMED CT expression. LEGOs transform electronic patient data into a normalized consumable, which means that the expressions can be treated as extensions of the SNOMED CT hierarchies for the purpose of performing subsumption queries and other analytics. Utilizing the LEGO approach for modeling clinical data obtained from a nursing admission assessment provides a foundation for data exchange across disparate information systems and software applications. Clinical data exchange of computable LEGO patient information enables the development of more refined data analytics, data storage and clinical decision support.
LC Data QUEST: A Technical Architecture for Community Federated Clinical Data Sharing
Stephens, Kari A.; Lin, Ching-Ping; Baldwin, Laura-Mae; Echo-Hawk, Abigail; Keppel, Gina A.; Buchwald, Dedra; Whitener, Ron J.; Korngiebel, Diane M.; Berg, Alfred O.; Black, Robert A.; Tarczy-Hornoch, Peter
2012-01-01
The University of Washington Institute of Translational Health Sciences is engaged in a project, LC Data QUEST, building data sharing capacity in primary care practices serving rural and tribal populations in the Washington, Wyoming, Alaska, Montana, Idaho region to build research infrastructure. We report on the iterative process of developing the technical architecture for semantically aligning electronic health data in primary care settings across our pilot sites and tools that will facilitate linkages between the research and practice communities. Our architecture emphasizes sustainable technical solutions for addressing data extraction, alignment, quality, and metadata management. The architecture provides immediate benefits to participating partners via a clinical decision support tool and data querying functionality to support local quality improvement efforts. The FInDiT tool catalogues type, quantity, and quality of the data that are available across the LC Data QUEST data sharing architecture. These tools facilitate the bi-directional process of translational research. PMID:22779052
A Customizable Dashboarding System for Watershed Model Interpretation
NASA Astrophysics Data System (ADS)
Easton, Z. M.; Collick, A.; Wagena, M. B.; Sommerlot, A.; Fuka, D.
2017-12-01
Stakeholders, including policymakers, agricultural water managers, and small farm managers, can benefit from the outputs of commonly run watershed models. However, the information that each stakeholder needs is be different. While policy makers are often interested in the broader effects that small farm management may have on a watershed during extreme events or over long periods, farmers are often interested in field specific effects at daily or seasonal period. To provide stakeholders with the ability to analyze and interpret data from large scale watershed models, we have developed a framework that can support custom exploration of the large datasets produced. For the volume of data produced by these models, SQL-based data queries are not efficient; thus, we employ a "Not Only SQL" (NO-SQL) query language, which allows data to scale in both quantity and query volumes. We demonstrate a stakeholder customizable Dashboarding system that allows stakeholders to create custom `dashboards' to summarize model output specific to their needs. Dashboarding is a dynamic and purpose-based visual interface needed to display one-to-many database linkages so that the information can be presented for a single time period or dynamically monitored over time and allows a user to quickly define focus areas of interest for their analysis. We utilize a single watershed model that is run four times daily with a combined set of climate projections, which are then indexed, and added to an ElasticSearch datastore. ElasticSearch is a NO-SQL search engine built on top of Apache Lucene, a free and open-source information retrieval software library. Aligned with the ElasticSearch project is the open source visualization and analysis system, Kibana, which we utilize for custom stakeholder dashboarding. The dashboards create a visualization of the stakeholder selected analysis and can be extended to recommend robust strategies to support decision-making.
Managing and Querying Image Annotation and Markup in XML.
Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel
2010-01-01
Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid.
Managing and Querying Image Annotation and Markup in XML
Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel
2010-01-01
Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid. PMID:21218167
Genomes as geography: using GIS technology to build interactive genome feature maps
Dolan, Mary E; Holden, Constance C; Beard, M Kate; Bult, Carol J
2006-01-01
Background Many commonly used genome browsers display sequence annotations and related attributes as horizontal data tracks that can be toggled on and off according to user preferences. Most genome browsers use only simple keyword searches and limit the display of detailed annotations to one chromosomal region of the genome at a time. We have employed concepts, methodologies, and tools that were developed for the display of geographic data to develop a Genome Spatial Information System (GenoSIS) for displaying genomes spatially, and interacting with genome annotations and related attribute data. In contrast to the paradigm of horizontally stacked data tracks used by most genome browsers, GenoSIS uses the concept of registered spatial layers composed of spatial objects for integrated display of diverse data. In addition to basic keyword searches, GenoSIS supports complex queries, including spatial queries, and dynamically generates genome maps. Our adaptation of the geographic information system (GIS) model in a genome context supports spatial representation of genome features at multiple scales with a versatile and expressive query capability beyond that supported by existing genome browsers. Results We implemented an interactive genome sequence feature map for the mouse genome in GenoSIS, an application that uses ArcGIS, a commercially available GIS software system. The genome features and their attributes are represented as spatial objects and data layers that can be toggled on and off according to user preferences or displayed selectively in response to user queries. GenoSIS supports the generation of custom genome maps in response to complex queries about genome features based on both their attributes and locations. Our example application of GenoSIS to the mouse genome demonstrates the powerful visualization and query capability of mature GIS technology applied in a novel domain. Conclusion Mapping tools developed specifically for geographic data can be exploited to display, explore and interact with genome data. The approach we describe here is organism independent and is equally useful for linear and circular chromosomes. One of the unique capabilities of GenoSIS compared to existing genome browsers is the capacity to generate genome feature maps dynamically in response to complex attribute and spatial queries. PMID:16984652
Fragger: a protein fragment picker for structural queries.
Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J
2017-01-01
Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of a new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
a Novel Approach of Indexing and Retrieving Spatial Polygons for Efficient Spatial Region Queries
NASA Astrophysics Data System (ADS)
Zhao, J. H.; Wang, X. Z.; Wang, F. Y.; Shen, Z. H.; Zhou, Y. C.; Wang, Y. L.
2017-10-01
Spatial region queries are more and more widely used in web-based applications. Mechanisms to provide efficient query processing over geospatial data are essential. However, due to the massive geospatial data volume, heavy geometric computation, and high access concurrency, it is difficult to get response in real time. Spatial indexes are usually used in this situation. In this paper, based on k-d tree, we introduce a distributed KD-Tree (DKD-Tree) suitbable for polygon data, and a two-step query algorithm. The spatial index construction is recursive and iterative, and the query is an in memory process. Both the index and query methods can be processed in parallel, and are implemented based on HDFS, Spark and Redis. Experiments on a large volume of Remote Sensing images metadata have been carried out, and the advantages of our method are investigated by comparing with spatial region queries executed on PostgreSQL and PostGIS. Results show that our approach not only greatly improves the efficiency of spatial region query, but also has good scalability, Moreover, the two-step spatial range query algorithm can also save cluster resources to support a large number of concurrent queries. Therefore, this method is very useful when building large geographic information systems.
Image Retrieval by Color Semantics with Incomplete Knowledge.
ERIC Educational Resources Information Center
Corridoni, Jacopo M.; Del Bimbo, Alberto; Vicario, Enrico
1998-01-01
Presents a system which supports image retrieval by high-level chromatic contents, the sensations that color accordances generate on the observer. Surveys Itten's theory of color semantics and discusses image description and query specification. Presents examples of visual querying. (AEF)
Experiments on Interfaces To Support Query Expansion.
ERIC Educational Resources Information Center
Beaulieu, M.
1997-01-01
Focuses on the user and human-computer interaction aspects of the research based on the Okapi text retrieval system. Three experiments implementing different approaches to query expansion are described, including the use of graphical user interfaces with different windowing techniques. (Author/LRW)
Visually defining and querying consistent multi-granular clinical temporal abstractions.
Combi, Carlo; Oliboni, Barbara
2012-02-01
The main goal of this work is to propose a framework for the visual specification and query of consistent multi-granular clinical temporal abstractions. We focus on the issue of querying patient clinical information by visually defining and composing temporal abstractions, i.e., high level patterns derived from several time-stamped raw data. In particular, we focus on the visual specification of consistent temporal abstractions with different granularities and on the visual composition of different temporal abstractions for querying clinical databases. Temporal abstractions on clinical data provide a concise and high-level description of temporal raw data, and a suitable way to support decision making. Granularities define partitions on the time line and allow one to represent time and, thus, temporal clinical information at different levels of detail, according to the requirements coming from the represented clinical domain. The visual representation of temporal information has been considered since several years in clinical domains. Proposed visualization techniques must be easy and quick to understand, and could benefit from visual metaphors that do not lead to ambiguous interpretations. Recently, physical metaphors such as strips, springs, weights, and wires have been proposed and evaluated on clinical users for the specification of temporal clinical abstractions. Visual approaches to boolean queries have been considered in the last years and confirmed that the visual support to the specification of complex boolean queries is both an important and difficult research topic. We propose and describe a visual language for the definition of temporal abstractions based on a set of intuitive metaphors (striped wall, plastered wall, brick wall), allowing the clinician to use different granularities. A new algorithm, underlying the visual language, allows the physician to specify only consistent abstractions, i.e., abstractions not containing contradictory conditions on the component abstractions. Moreover, we propose a visual query language where different temporal abstractions can be composed to build complex queries: temporal abstractions are visually connected through the usual logical connectives AND, OR, and NOT. The proposed visual language allows one to simply define temporal abstractions by using intuitive metaphors, and to specify temporal intervals related to abstractions by using different temporal granularities. The physician can interact with the designed and implemented tool by point-and-click selections, and can visually compose queries involving several temporal abstractions. The evaluation of the proposed granularity-related metaphors consisted in two parts: (i) solving 30 interpretation exercises by choosing the correct interpretation of a given screenshot representing a possible scenario, and (ii) solving a complex exercise, by visually specifying through the interface a scenario described only in natural language. The exercises were done by 13 subjects. The percentage of correct answers to the interpretation exercises were slightly different with respect to the considered metaphors (54.4--striped wall, 73.3--plastered wall, 61--brick wall, and 61--no wall), but post hoc statistical analysis on means confirmed that differences were not statistically significant. The result of the user's satisfaction questionnaire related to the evaluation of the proposed granularity-related metaphors ratified that there are no preferences for one of them. The evaluation of the proposed logical notation consisted in two parts: (i) solving five interpretation exercises provided by a screenshot representing a possible scenario and by three different possible interpretations, of which only one was correct, and (ii) solving five exercises, by visually defining through the interface a scenario described only in natural language. Exercises had an increasing difficulty. The evaluation involved a total of 31 subjects. Results related to this evaluation phase confirmed us about the soundness of the proposed solution even in comparison with a well known proposal based on a tabular query form (the only significant difference is that our proposal requires more time for the training phase: 21 min versus 14 min). In this work we have considered the issue of visually composing and querying temporal clinical patient data. In this context we have proposed a visual framework for the specification of consistent temporal abstractions with different granularities and for the visual composition of different temporal abstractions to build (possibly) complex queries on clinical databases. A new algorithm has been proposed to check the consistency of the specified granular abstraction. From the evaluation of the proposed metaphors and interfaces and from the comparison of the visual query language with a well known visual method for boolean queries, the soundness of the overall system has been confirmed; moreover, pros and cons and possible improvements emerged from the comparison of different visual metaphors and solutions. Copyright © 2011 Elsevier B.V. All rights reserved.
Managing biomedical image metadata for search and retrieval of similar images.
Korenblum, Daniel; Rubin, Daniel; Napel, Sandy; Rodriguez, Cesar; Beaulieu, Chris
2011-08-01
Radiology images are generally disconnected from the metadata describing their contents, such as imaging observations ("semantic" metadata), which are usually described in text reports that are not directly linked to the images. We developed a system, the Biomedical Image Metadata Manager (BIMM) to (1) address the problem of managing biomedical image metadata and (2) facilitate the retrieval of similar images using semantic feature metadata. Our approach allows radiologists, researchers, and students to take advantage of the vast and growing repositories of medical image data by explicitly linking images to their associated metadata in a relational database that is globally accessible through a Web application. BIMM receives input in the form of standard-based metadata files using Web service and parses and stores the metadata in a relational database allowing efficient data query and maintenance capabilities. Upon querying BIMM for images, 2D regions of interest (ROIs) stored as metadata are automatically rendered onto preview images included in search results. The system's "match observations" function retrieves images with similar ROIs based on specific semantic features describing imaging observation characteristics (IOCs). We demonstrate that the system, using IOCs alone, can accurately retrieve images with diagnoses matching the query images, and we evaluate its performance on a set of annotated liver lesion images. BIMM has several potential applications, e.g., computer-aided detection and diagnosis, content-based image retrieval, automating medical analysis protocols, and gathering population statistics like disease prevalences. The system provides a framework for decision support systems, potentially improving their diagnostic accuracy and selection of appropriate therapies.
Music Identification System Using MPEG-7 Audio Signature Descriptors
You, Shingchern D.; Chen, Wei-Hwa; Chen, Woei-Kae
2013-01-01
This paper describes a multiresolution system based on MPEG-7 audio signature descriptors for music identification. Such an identification system may be used to detect illegally copied music circulated over the Internet. In the proposed system, low-resolution descriptors are used to search likely candidates, and then full-resolution descriptors are used to identify the unknown (query) audio. With this arrangement, the proposed system achieves both high speed and high accuracy. To deal with the problem that a piece of query audio may not be inside the system's database, we suggest two different methods to find the decision threshold. Simulation results show that the proposed method II can achieve an accuracy of 99.4% for query inputs both inside and outside the database. Overall, it is highly possible to use the proposed system for copyright control. PMID:23533359
O'Sullivan, D; Wilk, S; Michalowski, W; Slowinski, R; Thomas, R; Kadzinski, M; Farion, K
2014-01-01
Online medical knowledge repositories such as MEDLINE and The Cochrane Library are increasingly used by physicians to retrieve articles to aid with clinical decision making. The prevailing approach for organizing retrieved articles is in the form of a rank-ordered list, with the assumption that the higher an article is presented on a list, the more relevant it is. Despite this common list-based organization, it is seldom studied how physicians perceive the association between the relevance of articles and the order in which articles are presented. In this paper we describe a case study that captured physician preferences for 3-element lists of medical articles in order to learn how to organize medical knowledge for decision-making. Comprehensive relevance evaluations were developed to represent 3-element lists of hypothetical articles that may be retrieved from an online medical knowledge source such as MEDLINE or The Cochrane Library. Comprehensive relevance evaluations asses not only an article's relevance for a query, but also whether it has been placed on the correct list position. In other words an article may be relevant and correctly placed on a result list (e.g. the most relevant article appears first in the result list), an article may be relevant for a query but placed on an incorrect list position (e.g. the most relevant article appears second in a result list), or an article may be irrelevant for a query yet still appear in the result list. The relevance evaluations were presented to six senior physicians who were asked to express their preferences for an article's relevance and its position on a list by pairwise comparisons representing different combinations of 3-element lists. The elicited preferences were assessed using a novel GRIP (Generalized Regression with Intensities of Preference) method and represented as an additive value function. Value functions were derived for individual physicians as well as the group of physicians. The results show that physicians assign significant value to the 1st position on a list and they expect that the most relevant article is presented first. Whilst physicians still prefer obtaining a correctly placed article on position 2, they are also quite satisfied with misplaced relevant article. Low consideration of the 3rd position was uniformly confirmed. Our findings confirm the importance of placing the most relevant article on the 1st position on a list and the importance paid to position on a list significantly diminishes after the 2nd position. The derived value functions may be used by developers of clinical decision support applications to decide how best to organize medical knowledge for decision making and to create personalized evaluation measures that can augment typical measures used to evaluate information retrieval systems.
Chang, Lin-Chau; Mahmood, Riaz; Qureshi, Samina; Breder, Christopher D
2017-01-01
Standardised MedDRA Queries (SMQs) have been developed since the early 2000's and used by academia, industry, public health, and government sectors for detecting safety signals in adverse event safety databases. The purpose of the present study is to characterize how SMQs are used and the impact in safety analyses for New Drug Application (NDA) and Biologics License Application (BLA) submissions to the United States Food and Drug Administration (USFDA). We used the PharmaPendium database to capture SMQ use in Summary Basis of Approvals (SBoAs) of drugs and biologics approved by the USFDA. Characteristics of the drugs and the SMQ use were employed to evaluate the role of SMQ safety analyses in regulatory decisions and the veracity of signals they revealed. A comprehensive search of the SBoAs yielded 184 regulatory submissions approved from 2006 to 2015. Search strategies more frequently utilized restrictive searches with "narrow terms" to enhance specificity over strategies using "broad terms" to increase sensitivity, while some involved modification of search terms. A majority (59%) of 1290 searches used descriptive statistics, however inferential statistics were utilized in 35% of them. Commentary from reviewers and supervisory staff suggested that a small, yet notable percentage (18%) of 1290 searches supported regulatory decisions. The searches with regulatory impact were found in 73 submissions (40% of the submissions investigated). Most searches (75% of 227 searches) with regulatory implications described how the searches were confirmed, indicating prudence in the decision-making process. SMQs have an increasing role in the presentation and review of safety analysis for NDAs/BLAs and their regulatory reviews. This study suggests that SMQs are best used for screening process, with descriptive statistics, description of SMQ modifications, and systematic verification of cases which is crucial for drawing regulatory conclusions.
Chang, Lin-Chau; Mahmood, Riaz; Qureshi, Samina
2017-01-01
Purpose Standardised MedDRA Queries (SMQs) have been developed since the early 2000’s and used by academia, industry, public health, and government sectors for detecting safety signals in adverse event safety databases. The purpose of the present study is to characterize how SMQs are used and the impact in safety analyses for New Drug Application (NDA) and Biologics License Application (BLA) submissions to the United States Food and Drug Administration (USFDA). Methods We used the PharmaPendium database to capture SMQ use in Summary Basis of Approvals (SBoAs) of drugs and biologics approved by the USFDA. Characteristics of the drugs and the SMQ use were employed to evaluate the role of SMQ safety analyses in regulatory decisions and the veracity of signals they revealed. Results A comprehensive search of the SBoAs yielded 184 regulatory submissions approved from 2006 to 2015. Search strategies more frequently utilized restrictive searches with “narrow terms” to enhance specificity over strategies using “broad terms” to increase sensitivity, while some involved modification of search terms. A majority (59%) of 1290 searches used descriptive statistics, however inferential statistics were utilized in 35% of them. Commentary from reviewers and supervisory staff suggested that a small, yet notable percentage (18%) of 1290 searches supported regulatory decisions. The searches with regulatory impact were found in 73 submissions (40% of the submissions investigated). Most searches (75% of 227 searches) with regulatory implications described how the searches were confirmed, indicating prudence in the decision-making process. Conclusions SMQs have an increasing role in the presentation and review of safety analysis for NDAs/BLAs and their regulatory reviews. This study suggests that SMQs are best used for screening process, with descriptive statistics, description of SMQ modifications, and systematic verification of cases which is crucial for drawing regulatory conclusions. PMID:28570569
A natural language interface plug-in for cooperative query answering in biological databases.
Jamil, Hasan M
2012-06-11
One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural language like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schema with an aim to providing cooperative responses to queries tailored to users' interpretations. Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from underlying database schema. The plug-in introduced in this paper is generic and facilitates connecting user selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.
A Semantic Basis for Proof Queries and Transformations
NASA Technical Reports Server (NTRS)
Aspinall, David; Denney, Ewen W.; Luth, Christoph
2013-01-01
We extend the query language PrQL, designed for inspecting machine representations of proofs, to also allow transformation of proofs. PrQL natively supports hiproofs which express proof structure using hierarchically nested labelled trees, which we claim is a natural way of taming the complexity of huge proofs. Query-driven transformations enable manipulation of this structure, in particular, to transform proofs produced by interactive theorem provers into forms that assist their understanding, or that could be consumed by other tools. In this paper we motivate and define basic transformation operations, using an abstract denotational semantics of hiproofs and queries. This extends our previous semantics for queries based on syntactic tree representations.We define update operations that add and remove sub-proofs, and manipulate the hierarchy to group and ungroup nodes. We show that
Supporting temporal queries on clinical relational databases: the S-WATCH-QL language.
Combi, C.; Missora, L.; Pinciroli, F.
1996-01-01
Due to the ubiquitous and special nature of time, specially in clinical datábases there's the need of particular temporal data and operators. In this paper we describe S-WATCH-QL (Structured Watch Query Language), a temporal extension of SQL, the widespread query language based on the relational model. S-WATCH-QL extends the well-known SQL by the addition of: a) temporal data types that allow the storage of information with different levels of granularity; b) historical relations that can store together both instantaneous valid times and intervals; c) some temporal clauses, functions and predicates allowing to define complex temporal queries. PMID:8947722
Accelerating Research Impact in a Learning Health Care System
Elwy, A. Rani; Sales, Anne E.; Atkins, David
2017-01-01
Background: Since 1998, the Veterans Health Administration (VHA) Quality Enhancement Research Initiative (QUERI) has supported more rapid implementation of research into clinical practice. Objectives: With the passage of the Veterans Access, Choice and Accountability Act of 2014 (Choice Act), QUERI further evolved to support VHA’s transformation into a Learning Health Care System by aligning science with clinical priority goals based on a strategic planning process and alignment of funding priorities with updated VHA priority goals in response to the Choice Act. Design: QUERI updated its strategic goals in response to independent assessments mandated by the Choice Act that recommended VHA reduce variation in care by providing a clear path to implement best practices. Specifically, QUERI updated its application process to ensure its centers (Programs) focus on cross-cutting VHA priorities and specify roadmaps for implementation of research-informed practices across different settings. QUERI also increased funding for scientific evaluations of the Choice Act and other policies in response to Commission on Care recommendations. Results: QUERI’s national network of Programs deploys effective practices using implementation strategies across different settings. QUERI Choice Act evaluations informed the law’s further implementation, setting the stage for additional rigorous national evaluations of other VHA programs and policies including community provider networks. Conclusions: Grounded in implementation science and evidence-based policy, QUERI serves as an example of how to operationalize core components of a Learning Health Care System, notably through rigorous evaluation and scientific testing of implementation strategies to ultimately reduce variation in quality and improve overall population health. PMID:27997456
SeqWare Query Engine: storing and searching sequence data in the cloud.
O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F
2010-12-21
Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets.
SeqWare Query Engine: storing and searching sequence data in the cloud
2010-01-01
Background Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. Results In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets. PMID:21210981
Time series patterns and language support in DBMS
NASA Astrophysics Data System (ADS)
Telnarova, Zdenka
2017-07-01
This contribution is focused on pattern type Time Series as a rich in semantics representation of data. Some example of implementation of this pattern type in traditional Data Base Management Systems is briefly presented. There are many approaches how to manipulate with patterns and query patterns. Crucial issue can be seen in systematic approach to pattern management and specific pattern query language which takes into consideration semantics of patterns. Query language SQL-TS for manipulating with patterns is shown on Time Series data.
Anytime query-tuned kernel machine classifiers via Cholesky factorization
NASA Technical Reports Server (NTRS)
DeCoste, D.
2002-01-01
We recently demonstrated 2 to 64-fold query-time speedups of Support Vector Machine and Kernel Fisher classifiers via a new computational geometry method for anytime output bounds (DeCoste,2002). This new paper refines our approach in two key ways. First, we introduce a simple linear algebra formulation based on Cholesky factorization, yielding simpler equations and lower computational overhead. Second, this new formulation suggests new methods for achieving additional speedups, including tuning on query samples. We demonstrate effectiveness on benchmark datasets.
Extending the Query Language of a Data Warehouse for Patient Recruitment.
Dietrich, Georg; Ertl, Maximilian; Fette, Georg; Kaspar, Mathias; Krebs, Jonathan; Mackenrodt, Daniel; Störk, Stefan; Puppe, Frank
2017-01-01
Patient recruitment for clinical trials is a laborious task, as many texts have to be screened. Usually, this work is done manually and takes a lot of time. We have developed a system that automates the screening process. Besides standard keyword queries, the query language supports extraction of numbers, time-spans and negations. In a feasibility study for patient recruitment from a stroke unit with 40 patients, we achieved encouraging extraction rates above 95% for numbers and negations and ca. 86% for time spans.
Menon, K Venugopal; Kumar, Dinesh; Thomas, Tessamma
2014-02-01
Study Design Preliminary evaluation of new tool. Objective To ascertain whether the newly developed content-based image retrieval (CBIR) software can be used successfully to retrieve images of similar cases of adolescent idiopathic scoliosis (AIS) from a database to help plan treatment without adhering to a classification scheme. Methods Sixty-two operated cases of AIS were entered into the newly developed CBIR database. Five new cases of different curve patterns were used as query images. The images were fed into the CBIR database that retrieved similar images from the existing cases. These were analyzed by a senior surgeon for conformity to the query image. Results Within the limits of variability set for the query system, all the resultant images conformed to the query image. One case had no similar match in the series. The other four retrieved several images that were matching with the query. No matching case was left out in the series. The postoperative images were then analyzed to check for surgical strategies. Broad guidelines for treatment could be derived from the results. More precise query settings, inclusion of bending films, and a larger database will enhance accurate retrieval and better decision making. Conclusion The CBIR system is an effective tool for accurate documentation and retrieval of scoliosis images. Broad guidelines for surgical strategies can be made from the postoperative images of the existing cases without adhering to any classification scheme.
Query Health: standards-based, cross-platform population health surveillance
Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N
2014-01-01
Objective Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Materials and methods Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. Results We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. Discussions This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Conclusions Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites. PMID:24699371
Query Health: standards-based, cross-platform population health surveillance.
Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N
2014-01-01
Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
The StarView intelligent query mechanism
NASA Technical Reports Server (NTRS)
Semmel, R. D.; Silberberg, D. P.
1993-01-01
The StarView interface is being developed to facilitate the retrieval of scientific and engineering data produced by the Hubble Space Telescope. While predefined screens in the interface can be used to specify many common requests, ad hoc requests require a dynamic query formulation capability. Unfortunately, logical level knowledge is too sparse to support this capability. In particular, essential formulation knowledge is lost when the domain of interest is mapped to a set of database relation schemas. Thus, a system known as QUICK has been developed that uses conceptual design knowledge to facilitate query formulation. By heuristically determining strongly associated objects at the conceptual level, QUICK is able to formulate semantically reasonable queries in response to high-level requests that specify only attributes of interest. Moreover, by exploiting constraint knowledge in the conceptual design, QUICK assures that queries are formulated quickly and will execute efficiently.
Developing A Web-based User Interface for Semantic Information Retrieval
NASA Technical Reports Server (NTRS)
Berrios, Daniel C.; Keller, Richard M.
2003-01-01
While there are now a number of languages and frameworks that enable computer-based systems to search stored data semantically, the optimal design for effective user interfaces for such systems is still uncle ar. Such interfaces should mask unnecessary query detail from users, yet still allow them to build queries of arbitrary complexity without significant restrictions. We developed a user interface supporting s emantic query generation for Semanticorganizer, a tool used by scient ists and engineers at NASA to construct networks of knowledge and dat a. Through this interface users can select node types, node attribute s and node links to build ad-hoc semantic queries for searching the S emanticOrganizer network.
The implementation of POSTGRES
NASA Technical Reports Server (NTRS)
Stonebraker, Michael; Rowe, Lawrence A.; Hirohama, Michael
1990-01-01
The design and implementation decisions made for the three-dimensional data manager POSTGRES are discussed. Attention is restricted to the DBMS backend functions. The POSTGRES data model and query language, the rules system, the storage system, the POSTGRES implementation, and the current status and performance are discussed.
WATERSHED INFORMATION - SURF YOUR WATERSHED
Surf Your Watershed is both a database of urls to world wide web pages associated with the watershed approach of environmental management and also data sets of relevant environmental information that can be queried. It is designed for citizens and decision makers across the count...
KnowledgeLink: Impact of Context-Sensitive Information Retrieval on Clinicians' Information Needs
Maviglia, Saverio M.; Yoon, Catherine S.; Bates, David W.; Kuperman, Gilad
2006-01-01
Objective: Infobuttons are message-based content search and retrieval functions embedded within other applications that dynamically return information relevant to the clinical task at hand. The objective of this study was to determine whether infobuttons effectively answer providers' questions about medications or affect patient care decisions. Design: The authors implemented and evaluated a medication infobutton application called KnowledgeLink. Health care providers at 18 outpatient clinics were randomized to one of two versions of KnowledgeLink, one that linked to information from Micromedex (Thomson Micromedex, Greenwood Village, Co) and the other to material from SkolarMD (Wolters Kluwer Health, Palo Alto, CA). Measurements: Data were collected about the frequency of use and demographics of users, patients, and drugs that were queried. Users were periodically surveyed with short questionnaires and then with a more extensive survey at the end of one year. Results: During the first year, KnowledgeLink was used 7,972 times by 359 users to look up information about 1,723 medications for 4,961 patients. Clinicians used KnowledgeLink twice a month on average, and during an average of 1.2% of patient encounters. KnowledgeLink was used by a wide variety of medical staff, not just physicians and nurse practitioners. The frequency of usage and the questions asked varied with user role (primary care physician, specialist physician, nurse practitioner). Although the median KnowledgeLink session was brief (21 seconds), KnowledgeLink answered users' queries 84% of the time, and altered patient care decisions 15% of the time. Users rated KnowledgeLink favorably on multiple scales, recommended extending KnowledgeLink to other content domains, and suggested enhancing the interface to allow refinement of the query and selection of the target resource. Conclusion: An infobutton can satisfy information needs about medications. Although used infrequently and for brief sessions, KnowledgeLink was positively received, answered most users' questions, and had a significant impact on medical decision making. The next steps would be to broaden the domains that KnowledgeLink covers to more specifically tailor results to the user type, to provide options when queries are not immediately answered, and to implement KnowledgeLink within other electronic clinical applications. PMID:16221942
Layton, Rebekah L.; Brandt, Patrick D.; Freeman, Ashalla M.; Harrell, Jessica R.; Hall, Joshua D.; Sinche, Melanie
2016-01-01
A national sample of PhD-trained scientists completed training, accepted subsequent employment in academic and nonacademic positions, and were queried about their previous graduate training and current employment. Respondents indicated factors contributing to their employment decision (e.g., working conditions, salary, job security). The data indicate the relative importance of deciding factors influencing career choice, controlling for gender, initial interest in faculty careers, and number of postgraduate publications. Among both well-represented (WR; n = 3444) and underrepresented minority (URM; n = 225) respondents, faculty career choice was positively associated with desire for autonomy and partner opportunity and negatively associated with desire for leadership opportunity. Differences between groups in reasons endorsed included: variety, prestige, salary, family influence, and faculty advisor influence. Furthermore, endorsement of faculty advisor or other mentor influence and family or peer influence were surprisingly rare across groups, suggesting that formal and informal support networks could provide a missed opportunity to provide support for trainees who want to stay in faculty career paths. Reasons requiring alteration of misperceptions (e.g., limited leadership opportunity for faculty) must be distinguished from reasons requiring removal of actual barriers. Further investigation into factors that affect PhDs’ career decisions can help elucidate why URM candidates are disproportionately exiting the academy. PMID:27587854
Measuring up: Implementing a dental quality measure in the electronic health record context.
Bhardwaj, Aarti; Ramoni, Rachel; Kalenderian, Elsbeth; Neumann, Ana; Hebballi, Nutan B; White, Joel M; McClellan, Lyle; Walji, Muhammad F
2016-01-01
Quality improvement requires using quality measures that can be implemented in a valid manner. Using guidelines set forth by the Meaningful Use portion of the Health Information Technology for Economic and Clinical Health Act, the authors assessed the feasibility and performance of an automated electronic Meaningful Use dental clinical quality measure to determine the percentage of children who received fluoride varnish. The authors defined how to implement the automated measure queries in a dental electronic health record. Within records identified through automated query, the authors manually reviewed a subsample to assess the performance of the query. The automated query results revealed that 71.0% of patients had fluoride varnish compared with the manual chart review results that indicated 77.6% of patients had fluoride varnish. The automated quality measure performance results indicated 90.5% sensitivity, 90.8% specificity, 96.9% positive predictive value, and 75.2% negative predictive value. The authors' findings support the feasibility of using automated dental quality measure queries in the context of sufficient structured data. Information noted only in free text rather than in structured data would require using natural language processing approaches to effectively query electronic health records. To participate in self-directed quality improvement, dental clinicians must embrace the accountability era. Commitment to quality will require enhanced documentation to support near-term automated calculation of quality measures. Copyright © 2016 American Dental Association. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Berriman, G. B.; Ciardi, D. R.; Good, J. C.; Laity, A. C.; Zhang, A.
2006-07-01
At ADASS XIV, we described how the W. M. Keck Observatory Archive (KOA) re-uses and extends the component based architecture of the NASA/IPAC Infrared Science Archive (IRSA) to ingest and serve level 0 observations made with HIRES, the High Resolution Echelle Spectrometer. Since August 18, the KOA has ingested 325 GB of data from 135 nights of observations. The architecture exploits a service layer between the mass storage layer and the user interface. This service layer consists of standalone utilities called through a simple executive that perform generic query and retrieval functions, such as query generation, database table sub-setting, and return page generation etc. It has been extended to implement proprietary access to data through deployment of query management middleware developed for the National Virtual Observatory. The MSC archives have recently extended this design to query and retrieve complex data sets describing the properties of potential target stars for the Terrestrial Planet Finder (TPF) missions. The archives can now support knowledge based retrieval, as well as data retrieval. This paper describes how extensions to the IRSA architecture, which is applicable across all wavelengths and astronomical datatypes, supports the design and development of the MSC NP archives at modest cost.
2013-01-01
Background Clinical Intelligence, as a research and engineering discipline, is dedicated to the development of tools for data analysis for the purposes of clinical research, surveillance, and effective health care management. Self-service ad hoc querying of clinical data is one desirable type of functionality. Since most of the data are currently stored in relational or similar form, ad hoc querying is problematic as it requires specialised technical skills and the knowledge of particular data schemas. Results A possible solution is semantic querying where the user formulates queries in terms of domain ontologies that are much easier to navigate and comprehend than data schemas. In this article, we are exploring the possibility of using SADI Semantic Web services for semantic querying of clinical data. We have developed a prototype of a semantic querying infrastructure for the surveillance of, and research on, hospital-acquired infections. Conclusions Our results suggest that SADI can support ad-hoc, self-service, semantic queries of relational data in a Clinical Intelligence context. The use of SADI compares favourably with approaches based on declarative semantic mappings from data schemas to ontologies, such as query rewriting and RDFizing by materialisation, because it can easily cope with situations when (i) some computation is required to turn relational data into RDF or OWL, e.g., to implement temporal reasoning, or (ii) integration with external data sources is necessary. PMID:23497556
STBase: one million species trees for comparative biology.
McMahon, Michelle M; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J
2015-01-01
Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user's query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees.
NASA Astrophysics Data System (ADS)
Quang Truong, Xuan; Luan Truong, Xuan; Nguyen, Tuan Anh; Nguyen, Dinh Tuan; Cong Nguyen, Chi
2017-12-01
The objective of this study is to design and implement a WebGIS Decision Support System (WDSS) for reducing uncertainty and supporting to improve the quality of exploration decisions in the Sin-Quyen copper mine, northern Vietnam. The main distinctive feature of the Sin-Quyen deposit is an unusual composition of ores. Computer and software applied to the exploration problem have had a significant impact on the exploration process over the past 25 years, but up until now, no online system has been undertaken. The system was completely built on open source technology and the Open Geospatial Consortium Web Services (OWS). The input data includes remote sensing (RS), Geographical Information System (GIS) and data from drillhole explorations, the drillhole exploration data sets were designed as a geodatabase and stored in PostgreSQL. The WDSS must be able to processed exploration data and support users to access 2-dimensional (2D) or 3-dimensional (3D) cross-sections and map of boreholles exploration data and drill holes. The interface was designed in order to interact with based maps (e.g., Digital Elevation Model, Google Map, OpenStreetMap) and thematic maps (e.g., land use and land cover, administrative map, drillholes exploration map), and to provide GIS functions (such as creating a new map, updating an existing map, querying and statistical charts). In addition, the system provides geological cross-sections of ore bodies based on Inverse Distance Weighting (IDW), nearest neighbour interpolation and Kriging methods (e.g., Simple Kriging, Ordinary Kriging, Indicator Kriging and CoKriging). The results based on data available indicate that the best estimation method (of 23 borehole exploration data sets) for estimating geological cross-sections of ore bodies in Sin-Quyen copper mine is Ordinary Kriging. The WDSS could provide useful information to improve drilling efficiency in mineral exploration and for management decision making.
Spatial information semantic query based on SPARQL
NASA Astrophysics Data System (ADS)
Xiao, Zhifeng; Huang, Lei; Zhai, Xiaofang
2009-10-01
How can the efficiency of spatial information inquiries be enhanced in today's fast-growing information age? We are rich in geospatial data but poor in up-to-date geospatial information and knowledge that are ready to be accessed by public users. This paper adopts an approach for querying spatial semantic by building an Web Ontology language(OWL) format ontology and introducing SPARQL Protocol and RDF Query Language(SPARQL) to search spatial semantic relations. It is important to establish spatial semantics that support for effective spatial reasoning for performing semantic query. Compared to earlier keyword-based and information retrieval techniques that rely on syntax, we use semantic approaches in our spatial queries system. Semantic approaches need to be developed by ontology, so we use OWL to describe spatial information extracted by the large-scale map of Wuhan. Spatial information expressed by ontology with formal semantics is available to machines for processing and to people for understanding. The approach is illustrated by introducing a case study for using SPARQL to query geo-spatial ontology instances of Wuhan. The paper shows that making use of SPARQL to search OWL ontology instances can ensure the result's accuracy and applicability. The result also indicates constructing a geo-spatial semantic query system has positive efforts on forming spatial query and retrieval.
Zhou, ZhangBing; Zhao, Deng; Shu, Lei; Tsang, Kim-Fung
2015-01-01
Wireless sensor networks, serving as an important interface between physical environments and computational systems, have been used extensively for supporting domain applications, where multiple-attribute sensory data are queried from the network continuously and periodically. Usually, certain sensory data may not vary significantly within a certain time duration for certain applications. In this setting, sensory data gathered at a certain time slot can be used for answering concurrent queries and may be reused for answering the forthcoming queries when the variation of these data is within a certain threshold. To address this challenge, a popularity-based cooperative caching mechanism is proposed in this article, where the popularity of sensory data is calculated according to the queries issued in recent time slots. This popularity reflects the possibility that sensory data are interested in the forthcoming queries. Generally, sensory data with the highest popularity are cached at the sink node, while sensory data that may not be interested in the forthcoming queries are cached in the head nodes of divided grid cells. Leveraging these cooperatively cached sensory data, queries are answered through composing these two-tier cached data. Experimental evaluation shows that this approach can reduce the network communication cost significantly and increase the network capability. PMID:26131665
Roh, Sungjong; Niederdeppe, Jeff
2016-09-01
This study uses data from systematic Web image search results and two randomized survey experiments to analyze how frames commonly used in public debates about health issues, operationalized here as alternative word choices, influence public support for health policy reforms. In Study 1, analyses of Bing (N = 1,719), Google (N = 1,872), and Yahoo Images (N = 1,657) search results suggest that the images returned from the search query "sugar-sweetened beverage" are more likely to evoke health-related concepts than images returned from a search query about "soda." In contrast, "soda" search queries were more likely to incorporate brand-related concepts than "sugar-sweetened beverage" search queries. In Study 2, participants (N = 206) in a controlled Web experiment rated their support for policies to reduce consumption of these drinks. As expected, strong liberals had more support for policies designed to reduce the consumption of these drinks when the policies referenced "soda" compared to "sugar-sweetened beverage." To the contrary, items describing these drinks as "soda" produced lower policy support than items describing them as "sugar-sweetened beverage" among strong conservatives. In Study 3, participants (N = 1,000) in a national telephone survey experiment rated their support for a similar set of policies. Results conceptually replicated the previous Web-based experiment, such that strong liberals reported greater support for a penny-per-ounce taxation when labeled "soda" versus "sugar-sweetened beverages." In both Studies 2 and 3, more respondents referred to brand-related concepts in response to questions about "sugar-sweetened beverages" compared to "soda." We conclude with a discussion of theoretical and methodological implications for studying framing effects of labels.
Hruby, Gregory W; Matsoukas, Konstantina; Cimino, James J; Weng, Chunhua
2016-04-01
Electronic health records (EHR) are a vital data resource for research uses, including cohort identification, phenotyping, pharmacovigilance, and public health surveillance. To realize the promise of EHR data for accelerating clinical research, it is imperative to enable efficient and autonomous EHR data interrogation by end users such as biomedical researchers. This paper surveys state-of-art approaches and key methodological considerations to this purpose. We adapted a previously published conceptual framework for interactive information retrieval, which defines three entities: user, channel, and source, by elaborating on channels for query formulation in the context of facilitating end users to interrogate EHR data. We show the current progress in biomedical informatics mainly lies in support for query execution and information modeling, primarily due to emphases on infrastructure development for data integration and data access via self-service query tools, but has neglected user support needed during iteratively query formulation processes, which can be costly and error-prone. In contrast, the information science literature has offered elaborate theories and methods for user modeling and query formulation support. The two bodies of literature are complementary, implying opportunities for cross-disciplinary idea exchange. On this basis, we outline the directions for future informatics research to improve our understanding of user needs and requirements for facilitating autonomous interrogation of EHR data by biomedical researchers. We suggest that cross-disciplinary translational research between biomedical informatics and information science can benefit our research in facilitating efficient data access in life sciences. Copyright © 2016 Elsevier Inc. All rights reserved.
Asking better questions: How presentation formats influence information search.
Wu, Charley M; Meder, Björn; Filimon, Flavia; Nelson, Jonathan D
2017-08-01
While the influence of presentation formats have been widely studied in Bayesian reasoning tasks, we present the first systematic investigation of how presentation formats influence information search decisions. Four experiments were conducted across different probabilistic environments, where subjects (N = 2,858) chose between 2 possible search queries, each with binary probabilistic outcomes, with the goal of maximizing classification accuracy. We studied 14 different numerical and visual formats for presenting information about the search environment, constructed across 6 design features that have been prominently related to improvements in Bayesian reasoning accuracy (natural frequencies, posteriors, complement, spatial extent, countability, and part-to-whole information). The posterior variants of the icon array and bar graph formats led to the highest proportion of correct responses, and were substantially better than the standard probability format. Results suggest that presenting information in terms of posterior probabilities and visualizing natural frequencies using spatial extent (a perceptual feature) were especially helpful in guiding search decisions, although environments with a mixture of probabilistic and certain outcomes were challenging across all formats. Subjects who made more accurate probability judgments did not perform better on the search task, suggesting that simple decision heuristics may be used to make search decisions without explicitly applying Bayesian inference to compute probabilities. We propose a new take-the-difference (TTD) heuristic that identifies the accuracy-maximizing query without explicit computation of posterior probabilities. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Wang, Amy Y; Lancaster, William J; Wyatt, Matthew C; Rasmussen, Luke V; Fort, Daniel G; Cimino, James J
2017-01-01
A major challenge in using electronic health record repositories for research is the difficulty matching subject eligibility criteria to query capabilities of the repositories. We propose categories for study criteria corresponding to the effort needed for querying those criteria: "easy" (supporting automated queries), mixed (initial automated querying with manual review), "hard" (fully manual record review), and "impossible" or "point of enrollment" (not typically in health repositories). We obtained a sample of 292 criteria from 20 studies from ClinicalTrials.gov. Six independent reviewers, three each from two academic research institutions, rated criteria according to our four types. We observed high interrater reliability both within and between institutions. The analysis demonstrated typical features of criteria that map with varying levels of difficulty to repositories. We propose using these features to improve enrollment workflow through more standardized study criteria, self-service repository queries, and analyst-mediated retrievals.
Wang, Amy Y.; Lancaster, William J.; Wyatt, Matthew C.; Rasmussen, Luke V.; Fort, Daniel G.; Cimino, James J.
2017-01-01
A major challenge in using electronic health record repositories for research is the difficulty matching subject eligibility criteria to query capabilities of the repositories. We propose categories for study criteria corresponding to the effort needed for querying those criteria: “easy” (supporting automated queries), mixed (initial automated querying with manual review), “hard” (fully manual record review), and “impossible” or “point of enrollment” (not typically in health repositories). We obtained a sample of 292 criteria from 20 studies from ClinicalTrials.gov. Six independent reviewers, three each from two academic research institutions, rated criteria according to our four types. We observed high interrater reliability both within and between institutions. The analysis demonstrated typical features of criteria that map with varying levels of difficulty to repositories. We propose using these features to improve enrollment workflow through more standardized study criteria, self-service repository queries, and analyst-mediated retrievals. PMID:29854246
Remembrance of inferences past: Amortization in human hypothesis generation.
Dasgupta, Ishita; Schulz, Eric; Goodman, Noah D; Gershman, Samuel J
2018-05-21
Bayesian models of cognition assume that people compute probability distributions over hypotheses. However, the required computations are frequently intractable or prohibitively expensive. Since people often encounter many closely related distributions, selective reuse of computations (amortized inference) is a computationally efficient use of the brain's limited resources. We present three experiments that provide evidence for amortization in human probabilistic reasoning. When sequentially answering two related queries about natural scenes, participants' responses to the second query systematically depend on the structure of the first query. This influence is sensitive to the content of the queries, only appearing when the queries are related. Using a cognitive load manipulation, we find evidence that people amortize summary statistics of previous inferences, rather than storing the entire distribution. These findings support the view that the brain trades off accuracy and computational cost, to make efficient use of its limited cognitive resources to approximate probabilistic inference. Copyright © 2018 Elsevier B.V. All rights reserved.
UCSC genome browser: deep support for molecular biomedical research.
Mangan, Mary E; Williams, Jennifer M; Lathe, Scott M; Karolchik, Donna; Lathe, Warren C
2008-01-01
The volume and complexity of genomic sequence data, and the additional experimental data required for annotation of the genomic context, pose a major challenge for display and access for biomedical researchers. Genome browsers organize this data and make it available in various ways to extract useful information to advance research projects. The UCSC Genome Browser is one of these resources. The official sequence data for a given species forms the framework to display many other types of data such as expression, variation, cross-species comparisons, and more. Visual representations of the data are available for exploration. Data can be queried with sequences. Complex database queries are also easily achieved with the Table Browser interface. Associated tools permit additional query types or access to additional data sources such as images of in situ localizations. Support for solving researcher's issues is provided with active discussion mailing lists and by providing updated training materials. The UCSC Genome Browser provides a source of deep support for a wide range of biomedical molecular research (http://genome.ucsc.edu).
An incremental database access method for autonomous interoperable databases
NASA Technical Reports Server (NTRS)
Roussopoulos, Nicholas; Sellis, Timos
1994-01-01
We investigated a number of design and performance issues of interoperable database management systems (DBMS's). The major results of our investigation were obtained in the areas of client-server database architectures for heterogeneous DBMS's, incremental computation models, buffer management techniques, and query optimization. We finished a prototype of an advanced client-server workstation-based DBMS which allows access to multiple heterogeneous commercial DBMS's. Experiments and simulations were then run to compare its performance with the standard client-server architectures. The focus of this research was on adaptive optimization methods of heterogeneous database systems. Adaptive buffer management accounts for the random and object-oriented access methods for which no known characterization of the access patterns exists. Adaptive query optimization means that value distributions and selectives, which play the most significant role in query plan evaluation, are continuously refined to reflect the actual values as opposed to static ones that are computed off-line. Query feedback is a concept that was first introduced to the literature by our group. We employed query feedback for both adaptive buffer management and for computing value distributions and selectivities. For adaptive buffer management, we use the page faults of prior executions to achieve more 'informed' management decisions. For the estimation of the distributions of the selectivities, we use curve-fitting techniques, such as least squares and splines, for regressing on these values.
NASA Technical Reports Server (NTRS)
Friedman, S. Z.; Walker, R. E.; Aitken, R. B.
1986-01-01
The Image Based Information System (IBIS) has been under development at the Jet Propulsion Laboratory (JPL) since 1975. It is a collection of more than 90 programs that enable processing of image, graphical, tabular data for spatial analysis. IBIS can be utilized to create comprehensive geographic data bases. From these data, an analyst can study various attributes describing characteristics of a given study area. Even complex combinations of disparate data types can be synthesized to obtain a new perspective on spatial phenomena. In 1984, new query software was developed enabling direct Boolean queries of IBIS data bases through the submission of easily understood expressions. An improved syntax methodology, a data dictionary, and display software simplified the analysts' tasks associated with building, executing, and subsequently displaying the results of a query. The primary purpose of this report is to describe the features and capabilities of the new query software. A secondary purpose of this report is to compare this new query software to the query software developed previously (Friedman, 1982). With respect to this topic, the relative merits and drawbacks of both approaches are covered.
Concept-based query language approach to enterprise information systems
NASA Astrophysics Data System (ADS)
Niemi, Timo; Junkkari, Marko; Järvelin, Kalervo
2014-01-01
In enterprise information systems (EISs) it is necessary to model, integrate and compute very diverse data. In advanced EISs the stored data often are based both on structured (e.g. relational) and semi-structured (e.g. XML) data models. In addition, the ad hoc information needs of end-users may require the manipulation of data-oriented (structural), behavioural and deductive aspects of data. Contemporary languages capable of treating this kind of diversity suit only persons with good programming skills. In this paper we present a concept-oriented query language approach to manipulate this diversity so that the programming skill requirements are considerably reduced. In our query language, the features which need technical knowledge are hidden in application-specific concepts and structures. Therefore, users need not be aware of the underlying technology. Application-specific concepts and structures are represented by the modelling primitives of the extended RDOOM (relational deductive object-oriented modelling) which contains primitives for all crucial real world relationships (is-a relationship, part-of relationship, association), XML documents and views. Our query language also supports intensional and extensional-intensional queries, in addition to conventional extensional queries. In its query formulation, the end-user combines available application-specific concepts and structures through shared variables.
An exponentiation method for XML element retrieval.
Wichaiwong, Tanakorn
2014-01-01
XML document is now widely used for modelling and storing structured documents. The structure is very rich and carries important information about contents and their relationships, for example, e-Commerce. XML data-centric collections require query terms allowing users to specify constraints on the document structure; mapping structure queries and assigning the weight are significant for the set of possibly relevant documents with respect to structural conditions. In this paper, we present an extension to the MEXIR search system that supports the combination of structural and content queries in the form of content-and-structure queries, which we call the Exponentiation function. It has been shown the structural information improve the effectiveness of the search system up to 52.60% over the baseline BM25 at MAP.
A dirty word or a dirty world?: Attribute framing, political affiliation, and query theory.
Hardisty, David J; Johnson, Eric J; Weber, Elke U
2010-01-01
We explored the effect of attribute framing on choice, labeling charges for environmental costs as either an earmarked tax or an offset. Eight hundred ninety-eight Americans chose between otherwise identical products or services, where one option included a surcharge for emitted carbon dioxide. The cost framing changed preferences for self-identified Republicans and Independents, but did not affect Democrats' preferences. We explain this interaction by means of query theory and show that attribute framing can change the order in which internal queries supporting one or another option are posed. The effect of attribute labeling on query order is shown to depend on the representations of either taxes or offsets held by people with different political affiliations.
A Multimodal Search Engine for Medical Imaging Studies.
Pinho, Eduardo; Godinho, Tiago; Valente, Frederico; Costa, Carlos
2017-02-01
The use of digital medical imaging systems in healthcare institutions has increased significantly, and the large amounts of data in these systems have led to the conception of powerful support tools: recent studies on content-based image retrieval (CBIR) and multimodal information retrieval in the field hold great potential in decision support, as well as for addressing multiple challenges in healthcare systems, such as computer-aided diagnosis (CAD). However, the subject is still under heavy research, and very few solutions have become part of Picture Archiving and Communication Systems (PACS) in hospitals and clinics. This paper proposes an extensible platform for multimodal medical image retrieval, integrated in an open-source PACS software with profile-based CBIR capabilities. In this article, we detail a technical approach to the problem by describing its main architecture and each sub-component, as well as the available web interfaces and the multimodal query techniques applied. Finally, we assess our implementation of the engine with computational performance benchmarks.
High dimensional biological data retrieval optimization with NoSQL technology.
Wang, Shicai; Pandis, Ioannis; Wu, Chao; He, Sijin; Johnson, David; Emam, Ibrahim; Guitton, Florian; Guo, Yike
2014-01-01
High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, when querying relational databases for hundreds of different patient gene expression records queries are slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase on query performance on MongoDB. The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data.
High dimensional biological data retrieval optimization with NoSQL technology
2014-01-01
Background High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, when querying relational databases for hundreds of different patient gene expression records queries are slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. Results In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase on query performance on MongoDB. Conclusions The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data. PMID:25435347
Combining conceptual graphs and argumentation for aiding in the teleexpertise.
Doumbouya, Mamadou Bilo; Kamsu-Foguem, Bernard; Kenfack, Hugues; Foguem, Clovis
2015-08-01
Current medical information systems are too complex to be meaningfully exploited. Hence there is a need to develop new strategies for maximising the exploitation of medical data to the benefit of medical professionals. It is against this backdrop that we want to propose a tangible contribution by providing a tool which combines conceptual graphs and Dung׳s argumentation system in order to assist medical professionals in their decision making process. The proposed tool allows medical professionals to easily manipulate and visualise queries and answers for making decisions during the practice of teleexpertise. The knowledge modelling is made using an open application programming interface (API) called CoGui, which offers the means for building structured knowledge bases with the dedicated functionalities of graph-based reasoning via retrieved data from different institutions (hospitals, national security centre, and nursing homes). The tool that we have described in this study supports a formal traceable structure of the reasoning with acceptable arguments to elucidate some ethical problems that occur very often in the telemedicine domain. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Loyall, Joseph P.; Carvalho, Marco; Martignoni, Andrew, III; Schmidt, Douglas; Sinclair, Asher; Gillen, Matthew; Edmondson, James; Bunch, Larry; Corman, David
2009-05-01
Net-centric information spaces have become a necessary concept to support information exchange for tactical warfighting missions using a publish-subscribe-query paradigm. To support dynamic, mission-critical and time-critical operations, information spaces require quality of service (QoS)-enabled dissemination (QED) of information. This paper describes the results of research we are conducting to provide QED information exchange in tactical environments. We have developed a prototype QoS-enabled publish-subscribe-query information broker that provides timely delivery of information needed by tactical warfighters in mobile scenarios with time-critical emergent targets. This broker enables tailoring and prioritizing of information based on mission needs and responds rapidly to priority shifts and unfolding situations. This paper describes the QED architecture, prototype implementation, testing infrastructure, and empirical evaluations we have conducted based on our prototype.
A Bayesian Approach to Interactive Retrieval
ERIC Educational Resources Information Center
Tague, Jean M.
1973-01-01
A probabilistic model for interactive retrieval is presented. Bayesian statistical decision theory principles are applied: use of prior and sample information about the relationship of document descriptions to query relevance; maximization of expected value of a utility function, to the problem of optimally restructuring search strategies in an…
Leceta, Amalia; Sologuren, Ander; Valiente, Román; Campo, Cristina; Labeaga, Luis
2017-01-01
Background Bilastine is a safe and effective commonly prescribed non-sedating H1-antihistamine approved for symptomatic treatment in patients with allergic disorders such as rhinoconjunctivitis and urticaria. It was evaluated in many patients throughout the clinical development required for its approval, but clinical trials generally exclude many patients who will benefit in everyday clinical practice (especially those with coexisting diseases and/or being treated with concomitant drugs). Following its introduction into clinical practice, the Medical Information Specialists at Faes Farma have received many practical queries regarding the optimal use of bilastine in different circumstances. Data sources and methods Queries received by the Medical Information Department and the responses provided to senders of these queries. Results The most frequent questions received by the Medical Information Department included the potential for drug-drug interactions with bilastine and commonly used agents such as anticoagulants (including the novel oral anticoagulants), antiretrovirals, antituberculosis regimens, corticosteroids, digoxin, oral contraceptives, and proton pump inhibitors. Most of these medicines are not usually allowed in clinical trials, and so advice needs to be based upon the pharmacological profiles of the drugs involved and expert opinion. The pharmacokinetic profile of bilastine appears favourable since it undergoes negligible metabolism and is almost exclusively eliminated via renal excretion, and it neither induces nor inhibits the activity of several isoenzymes from the CYP 450 system. Consequently, bilastine does not interact with cytochrome metabolic pathways. Other queries involved specific patient groups such as subjects with renal impairment, women who are breastfeeding or who are trying to become pregnant, and patients with other concomitant diseases. Interestingly, several questions related to topics that are well covered in the Summary of Product Characteristics (SmPC), which suggests that this resource is not being well used. Conclusions Overall, this analysis highlights gaps in our knowledge regarding the optimal use of bilastine. Expert opinion based upon an understanding of the science can help in the decision-making, but more research is needed to provide evidence-based answers in certain circumstances. PMID:28210286
Leceta, Amalia; Sologuren, Ander; Valiente, Román; Campo, Cristina; Labeaga, Luis
2017-01-01
Bilastine is a safe and effective commonly prescribed non-sedating H 1 -antihistamine approved for symptomatic treatment in patients with allergic disorders such as rhinoconjunctivitis and urticaria. It was evaluated in many patients throughout the clinical development required for its approval, but clinical trials generally exclude many patients who will benefit in everyday clinical practice (especially those with coexisting diseases and/or being treated with concomitant drugs). Following its introduction into clinical practice, the Medical Information Specialists at Faes Farma have received many practical queries regarding the optimal use of bilastine in different circumstances. Queries received by the Medical Information Department and the responses provided to senders of these queries. The most frequent questions received by the Medical Information Department included the potential for drug-drug interactions with bilastine and commonly used agents such as anticoagulants (including the novel oral anticoagulants), antiretrovirals, antituberculosis regimens, corticosteroids, digoxin, oral contraceptives, and proton pump inhibitors. Most of these medicines are not usually allowed in clinical trials, and so advice needs to be based upon the pharmacological profiles of the drugs involved and expert opinion. The pharmacokinetic profile of bilastine appears favourable since it undergoes negligible metabolism and is almost exclusively eliminated via renal excretion, and it neither induces nor inhibits the activity of several isoenzymes from the CYP 450 system. Consequently, bilastine does not interact with cytochrome metabolic pathways. Other queries involved specific patient groups such as subjects with renal impairment, women who are breastfeeding or who are trying to become pregnant, and patients with other concomitant diseases. Interestingly, several questions related to topics that are well covered in the Summary of Product Characteristics (SmPC), which suggests that this resource is not being well used. Overall, this analysis highlights gaps in our knowledge regarding the optimal use of bilastine. Expert opinion based upon an understanding of the science can help in the decision-making, but more research is needed to provide evidence-based answers in certain circumstances.
Decision conflict and regret among surrogate decision makers in the medical intensive care unit.
Miller, Jesse J; Morris, Peter; Files, D Clark; Gower, Emily; Young, Michael
2016-04-01
Family members of critically ill patients in the intensive care unit face significant morbidity. It may be the decision-making process that plays a significant role in the psychological morbidity associated with being a surrogate in the ICU. We hypothesize that family members facing end-of-life decisions will have more decisional conflict and decisional regret than those facing non-end-of-life decisions. We enrolled a sample of adult patients and their surrogates in a tertiary care, academic medical intensive care unit. We queried the surrogates regarding decisions they had made on behalf of the patient and assessed decision conflict. We then contacted the family member again to assess decision regret. Forty (95%) of 42 surrogates were able to identify at least 1 decision they had made on behalf of the patient. End-of-life decisions (defined as do not resuscitate [DNR]/do not intubate [DNI] or continuation of life support) accounted for 19 of 40 decisions (47.5%). Overall, the average Decision Conflict Scale (DCS) score was 21.9 of 100 (range 0-100, with 0 being little decisional conflict and 100 being great decisional conflict). The average DCS score for families facing end-of-life decisions was 25.5 compared with 18.7 for all other decisions. Those facing end-of-life decisions scored higher on the uncertainty subscale (subset of DCS questions that indicates level of certainty regarding decision) with a mean score of 43.4 compared with all other decisions with a mean score of 27.0. Overall, very few surrogates experienced decisional regret with an average DRS score of 13.4 of 100. Nearly all surrogates enrolled were faced with decision-making responsibilities on behalf of his or her critically ill family member. In our small pilot study, we found more decisional conflict in those surrogates facing end-of-life decisions, specifically on the subset of questions dealing with uncertainty. Surrogates report low levels of decisional regret. Copyright © 2015 Elsevier Inc. All rights reserved.
An index-based algorithm for fast on-line query processing of latent semantic analysis
Li, Pohan; Wang, Wei
2017-01-01
Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantic is similar to the query of keywords. Although LSA yield promising similar results, the existing LSA algorithms involve lots of unnecessary operations in similarity computation and candidate check during on-line query processing, which is expensive in terms of time cost and cannot efficiently response the query request especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA towards efficiently searching the similar documents to a given query. We rewrite the similarity equation of LSA combined with an intermediate value called partial similarity that is stored in a designed index called partial index. For reducing the searching space, we give an approximate form of similarity equation, and then develop an efficient algorithm for building partial index, which skips the partial similarities lower than a given threshold θ. Based on partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponds to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm. PMID:28520747
An index-based algorithm for fast on-line query processing of latent semantic analysis.
Zhang, Mingxi; Li, Pohan; Wang, Wei
2017-01-01
Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantic is similar to the query of keywords. Although LSA yield promising similar results, the existing LSA algorithms involve lots of unnecessary operations in similarity computation and candidate check during on-line query processing, which is expensive in terms of time cost and cannot efficiently response the query request especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA towards efficiently searching the similar documents to a given query. We rewrite the similarity equation of LSA combined with an intermediate value called partial similarity that is stored in a designed index called partial index. For reducing the searching space, we give an approximate form of similarity equation, and then develop an efficient algorithm for building partial index, which skips the partial similarities lower than a given threshold θ. Based on partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponds to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm.
SPANG: a SPARQL client supporting generation and reuse of queries for distributed RDF databases.
Chiba, Hirokazu; Uchiyama, Ikuo
2017-02-08
Toward improved interoperability of distributed biological databases, an increasing number of datasets have been published in the standardized Resource Description Framework (RDF). Although the powerful SPARQL Protocol and RDF Query Language (SPARQL) provides a basis for exploiting RDF databases, writing SPARQL code is burdensome for users including bioinformaticians. Thus, an easy-to-use interface is necessary. We developed SPANG, a SPARQL client that has unique features for querying RDF datasets. SPANG dynamically generates typical SPARQL queries according to specified arguments. It can also call SPARQL template libraries constructed in a local system or published on the Web. Further, it enables combinatorial execution of multiple queries, each with a distinct target database. These features facilitate easy and effective access to RDF datasets and integrative analysis of distributed data. SPANG helps users to exploit RDF datasets by generation and reuse of SPARQL queries through a simple interface. This client will enhance integrative exploitation of biological RDF datasets distributed across the Web. This software package is freely available at http://purl.org/net/spang .
Scotch, Matthew; Parmanto, Bambang; Monaco, Valerie
2008-06-09
Data analysis in community health assessment (CHA) involves the collection, integration, and analysis of large numerical and spatial data sets in order to identify health priorities. Geographic Information Systems (GIS) enable for management and analysis using spatial data, but have limitations in performing analysis of numerical data because of its traditional database architecture.On-Line Analytical Processing (OLAP) is a multidimensional datawarehouse designed to facilitate querying of large numerical data. Coupling the spatial capabilities of GIS with the numerical analysis of OLAP, might enhance CHA data analysis. OLAP-GIS systems have been developed by university researchers and corporations, yet their potential for CHA data analysis is not well understood. To evaluate the potential of an OLAP-GIS decision support system for CHA problem solving, we compared OLAP-GIS to the standard information technology (IT) currently used by many public health professionals. SOVAT, an OLAP-GIS decision support system developed at the University of Pittsburgh, was compared against current IT for data analysis for CHA. For this study, current IT was considered the combined use of SPSS and GIS ("SPSS-GIS"). Graduate students, researchers, and faculty in the health sciences at the University of Pittsburgh were recruited. Each round consisted of: an instructional video of the system being evaluated, two practice tasks, five assessment tasks, and one post-study questionnaire. Objective and subjective measurement included: task completion time, success in answering the tasks, and system satisfaction. Thirteen individuals participated. Inferential statistics were analyzed using linear mixed model analysis. SOVAT was statistically significant (alpha = .01) from SPSS-GIS for satisfaction and time (p < .002). Descriptive results indicated that participants had greater success in answering the tasks when using SOVAT as compared to SPSS-GIS. Using SOVAT, tasks were completed more efficiently, with a higher rate of success, and with greater satisfaction, than the combined use of SPSS and GIS. The results from this study indicate a potential for OLAP-GIS decision support systems as a valuable tool for CHA data analysis.
Scotch, Matthew; Parmanto, Bambang; Monaco, Valerie
2008-01-01
Background Data analysis in community health assessment (CHA) involves the collection, integration, and analysis of large numerical and spatial data sets in order to identify health priorities. Geographic Information Systems (GIS) enable for management and analysis using spatial data, but have limitations in performing analysis of numerical data because of its traditional database architecture. On-Line Analytical Processing (OLAP) is a multidimensional datawarehouse designed to facilitate querying of large numerical data. Coupling the spatial capabilities of GIS with the numerical analysis of OLAP, might enhance CHA data analysis. OLAP-GIS systems have been developed by university researchers and corporations, yet their potential for CHA data analysis is not well understood. To evaluate the potential of an OLAP-GIS decision support system for CHA problem solving, we compared OLAP-GIS to the standard information technology (IT) currently used by many public health professionals. Methods SOVAT, an OLAP-GIS decision support system developed at the University of Pittsburgh, was compared against current IT for data analysis for CHA. For this study, current IT was considered the combined use of SPSS and GIS ("SPSS-GIS"). Graduate students, researchers, and faculty in the health sciences at the University of Pittsburgh were recruited. Each round consisted of: an instructional video of the system being evaluated, two practice tasks, five assessment tasks, and one post-study questionnaire. Objective and subjective measurement included: task completion time, success in answering the tasks, and system satisfaction. Results Thirteen individuals participated. Inferential statistics were analyzed using linear mixed model analysis. SOVAT was statistically significant (α = .01) from SPSS-GIS for satisfaction and time (p < .002). Descriptive results indicated that participants had greater success in answering the tasks when using SOVAT as compared to SPSS-GIS. Conclusion Using SOVAT, tasks were completed more efficiently, with a higher rate of success, and with greater satisfaction, than the combined use of SPSS and GIS. The results from this study indicate a potential for OLAP-GIS decision support systems as a valuable tool for CHA data analysis. PMID:18541037
DOE Office of Scientific and Technical Information (OSTI.GOV)
Czejdo, Bogdan; Bhattacharya, Sambit; Ferragut, Erik M
2012-01-01
This paper describes the syntax and semantics of multi-level state diagrams to support probabilistic behavior of cooperating robots. The techniques are presented to analyze these diagrams by querying combined robots behaviors. It is shown how to use state abstraction and transition abstraction to create, verify and process large probabilistic state diagrams.
LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions.
Chen, Jinbo; Scholz, Uwe; Zhou, Ruonan; Lange, Matthias
2018-03-01
In order to access and filter content of life-science databases, full text search is a widely applied query interface. But its high flexibility and intuitiveness is paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspelling and support expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries by their query profile like research field, geographic location, co-authorship, affiliation etc. require user's registration and its public accessibility that contradict privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstruct possible linguistic contexts of a given keyword query. The context is referred from the text records that are stored in the databases that are going to be queried or extracted for a general purpose query suggestion from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the further computation of customized distributed word vectors. The latter are used to suggest alternative keyword queries. An evaluated of the query suggestion quality was done for plant science use cases. Locally present experts enable a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function which has been performed using ontology term similarities. LAILAPS-QSM mean information content similarity for 15 representative queries is 0.70, whereas 34% have a score above 0.80. In comparison, the information content similarity for human expert made query suggestions is 0.90. The software is either available as tool set to build and train dedicated query suggestion services or as already trained general purpose RESTful web service. The service uses open interfaces to be seamless embeddable into database frontends. The JAVA implementation uses highly optimized data structures and streamlined code to provide fast and scalable response for web service calls. The source code of LAILAPS-QSM is available under GNU General Public License version 2 in Bitbucket GIT repository: https://bitbucket.org/ipk_bit_team/bioescorte-suggestion.
Ji, Yanqing; Ying, Hao; Tran, John; Dews, Peter; Massanari, R Michael
2016-07-19
Finding highly relevant articles from biomedical databases is challenging not only because it is often difficult to accurately express a user's underlying intention through keywords but also because a keyword-based query normally returns a long list of hits with many citations being unwanted by the user. This paper proposes a novel biomedical literature search system, called BiomedSearch, which supports complex queries and relevance feedback. The system employed association mining techniques to build a k-profile representing a user's relevance feedback. More specifically, we developed a weighted interest measure and an association mining algorithm to find the strength of association between a query and each concept in the article(s) selected by the user as feedback. The top concepts were utilized to form a k-profile used for the next-round search. BiomedSearch relies on Unified Medical Language System (UMLS) knowledge sources to map text files to standard biomedical concepts. It was designed to support queries with any levels of complexity. A prototype of BiomedSearch software was made and it was preliminarily evaluated using the Genomics data from TREC (Text Retrieval Conference) 2006 Genomics Track. Initial experiment results indicated that BiomedSearch increased the mean average precision (MAP) for a set of queries. With UMLS and association mining techniques, BiomedSearch can effectively utilize users' relevance feedback to improve the performance of biomedical literature search.
Advanced Clinical Decision Support for Vaccine Adverse Event Detection and Reporting.
Baker, Meghan A; Kaelber, David C; Bar-Shain, David S; Moro, Pedro L; Zambarano, Bob; Mazza, Megan; Garcia, Crystal; Henry, Adam; Platt, Richard; Klompas, Michael
2015-09-15
Reporting of adverse events (AEs) following vaccination can help identify rare or unexpected complications of immunizations and aid in characterizing potential vaccine safety signals. We developed an open-source, generalizable clinical decision support system called Electronic Support for Public Health-Vaccine Adverse Event Reporting System (ESP-VAERS) to assist clinicians with AE detection and reporting. ESP-VAERS monitors patients' electronic health records for new diagnoses, changes in laboratory values, and new allergies following vaccinations. When suggestive events are found, ESP-VAERS sends the patient's clinician a secure electronic message with an invitation to affirm or refute the message, add comments, and submit an automated, prepopulated electronic report to VAERS. High-probability AEs are reported automatically if the clinician does not respond. We implemented ESP-VAERS in December 2012 throughout the MetroHealth System, an integrated healthcare system in Ohio. We queried the VAERS database to determine MetroHealth's baseline reporting rates from January 2009 to March 2012 and then assessed changes in reporting rates with ESP-VAERS. In the 8 months following implementation, 91 622 vaccinations were given. ESP-VAERS sent 1385 messages to responsible clinicians describing potential AEs. Clinicians opened 1304 (94.2%) messages, responded to 209 (15.1%), and confirmed 16 for transmission to VAERS. An additional 16 high-probability AEs were sent automatically. Reported events included seizure, pleural effusion, and lymphocytopenia. The odds of a VAERS report submission during the implementation period were 30.2 (95% confidence interval, 9.52-95.5) times greater than the odds during the comparable preimplementation period. An open-source, electronic health record-based clinical decision support system can increase AE detection and reporting rates in VAERS. © The Author 2015. Published by Oxford University Press on behalf of the Infectious Diseases Society of America. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
A high-performance spatial database based approach for pathology imaging algorithm evaluation
Wang, Fusheng; Kong, Jun; Gao, Jingjing; Cooper, Lee A.D.; Kurc, Tahsin; Zhou, Zhengwen; Adler, David; Vergara-Niedermayr, Cristobal; Katigbak, Bryan; Brat, Daniel J.; Saltz, Joel H.
2013-01-01
Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high performance computation capacity via a parallel data management infrastructure, parallel data loading and spatial indexing optimizations in this infrastructure. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared nothing parallel database architecture, which distributes data homogenously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput. Results: Our work proposes a high performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download. Conclusions: Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data. Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provide a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation. PMID:23599905
Database architectures for Space Telescope Science Institute
NASA Astrophysics Data System (ADS)
Lubow, Stephen
1993-08-01
At STScI nearly all large applications require database support. A general purpose architecture has been developed and is in use that relies upon an extended client-server paradigm. Processing is in general distributed across three processes, each of which generally resides on its own processor. Database queries are evaluated on one such process, called the DBMS server. The DBMS server software is provided by a database vendor. The application issues database queries and is called the application client. This client uses a set of generic DBMS application programming calls through our STDB/NET programming interface. Intermediate between the application client and the DBMS server is the STDB/NET server. This server accepts generic query requests from the application and converts them into the specific requirements of the DBMS server. In addition, it accepts query results from the DBMS server and passes them back to the application. Typically the STDB/NET server is local to the DBMS server, while the application client may be remote. The STDB/NET server provides additional capabilities such as database deadlock restart and performance monitoring. This architecture is currently in use for some major STScI applications, including the ground support system. We are currently investigating means of providing ad hoc query support to users through the above architecture. Such support is critical for providing flexible user interface capabilities. The Universal Relation advocated by Ullman, Kernighan, and others appears to be promising. In this approach, the user sees the entire database as a single table, thereby freeing the user from needing to understand the detailed schema. A software layer provides the translation between the user and detailed schema views of the database. However, many subtle issues arise in making this transformation. We are currently exploring this scheme for use in the Hubble Space Telescope user interface to the data archive system (DADS).
ERIC Educational Resources Information Center
Golden, Cynthia; Eisenberger, Dorit
1990-01-01
Carnegie Mellon University's decision to standardize its administrative system development efforts on relational database technology and structured query language is discussed and its impact is examined in one of its larger, more widely used applications, the university information system. Advantages, new responsibilities, and challenges of the…
Database Reports Over the Internet
NASA Technical Reports Server (NTRS)
Smith, Dean Lance
2002-01-01
Most of the summer was spent developing software that would permit existing test report forms to be printed over the web on a printer that is supported by Adobe Acrobat Reader. The data is stored in a DBMS (Data Base Management System). The client asks for the information from the database using an HTML (Hyper Text Markup Language) form in a web browser. JavaScript is used with the forms to assist the user and verify the integrity of the entered data. Queries to a database are made in SQL (Sequential Query Language), a widely supported standard for making queries to databases. Java servlets, programs written in the Java programming language running under the control of network server software, interrogate the database and complete a PDF form template kept in a file. The completed report is sent to the browser requesting the report. Some errors are sent to the browser in an HTML web page, others are reported to the server. Access to the databases was restricted since the data are being transported to new DBMS software that will run on new hardware. However, the SQL queries were made to Microsoft Access, a DBMS that is available on most PCs (Personal Computers). Access does support the SQL commands that were used, and a database was created with Access that contained typical data for the report forms. Some of the problems and features are discussed below.
Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS.
Yu, Hwanjo; Kim, Taehoon; Oh, Jinoh; Ko, Ilhwan; Kim, Sungchul; Han, Wook-Shin
2010-04-16
Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function. However, the process of learning and ranking is usually done offline without integrated with the keyword queries, and the users have to provide a large amount of training documents to get a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and a multi-level relevance feedback in real time on PubMed. RefMed supports a multi-level relevance feedback by using the RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates the RankSVM into RDBMS to support both keyword queries and the multi-level relevance feedback in real time; the tight coupling of the RankSVM and DBMS substantially improves the processing time. An efficient parameter selection method for the RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves a high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. RefMed is the first multi-level relevance feedback system for PubMed, which achieves a high accuracy with less feedback. It effectively learns an accurate relevance function from the user's feedback and efficiently processes the function to return relevant articles in real time.
Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS
2010-01-01
Background Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function. However, the process of learning and ranking is usually done offline without integrated with the keyword queries, and the users have to provide a large amount of training documents to get a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and a multi-level relevance feedback in real time on PubMed. Results RefMed supports a multi-level relevance feedback by using the RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates the RankSVM into RDBMS to support both keyword queries and the multi-level relevance feedback in real time; the tight coupling of the RankSVM and DBMS substantially improves the processing time. An efficient parameter selection method for the RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves a high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. Conclusions RefMed is the first multi-level relevance feedback system for PubMed, which achieves a high accuracy with less feedback. It effectively learns an accurate relevance function from the user’s feedback and efficiently processes the function to return relevant articles in real time. PMID:20406504
An Exponentiation Method for XML Element Retrieval
2014-01-01
XML document is now widely used for modelling and storing structured documents. The structure is very rich and carries important information about contents and their relationships, for example, e-Commerce. XML data-centric collections require query terms allowing users to specify constraints on the document structure; mapping structure queries and assigning the weight are significant for the set of possibly relevant documents with respect to structural conditions. In this paper, we present an extension to the MEXIR search system that supports the combination of structural and content queries in the form of content-and-structure queries, which we call the Exponentiation function. It has been shown the structural information improve the effectiveness of the search system up to 52.60% over the baseline BM25 at MAP. PMID:24696643
Querying Event Sequences by Exact Match or Similarity Search: Design and Empirical Evaluation
Wongsuphasawat, Krist; Plaisant, Catherine; Taieb-Maimon, Meirav; Shneiderman, Ben
2012-01-01
Specifying event sequence queries is challenging even for skilled computer professionals familiar with SQL. Most graphical user interfaces for database search use an exact match approach, which is often effective, but near misses may also be of interest. We describe a new similarity search interface, in which users specify a query by simply placing events on a blank timeline and retrieve a similarity-ranked list of results. Behind this user interface is a new similarity measure for event sequences which the users can customize by four decision criteria, enabling them to adjust the impact of missing, extra, or swapped events or the impact of time shifts. We describe a use case with Electronic Health Records based on our ongoing collaboration with hospital physicians. A controlled experiment with 18 participants compared exact match and similarity search interfaces. We report on the advantages and disadvantages of each interface and suggest a hybrid interface combining the best of both. PMID:22379286
Logic-Based Retrieval: Technology for Content-Oriented and Analytical Querying of Patent Data
NASA Astrophysics Data System (ADS)
Klampanos, Iraklis Angelos; Wu, Hengzhi; Roelleke, Thomas; Azzam, Hany
Patent searching is a complex retrieval task. An initial document search is only the starting point of a chain of searches and decisions that need to be made by patent searchers. Keyword-based retrieval is adequate for document searching, but it is not suitable for modelling comprehensive retrieval strategies. DB-like and logical approaches are the state-of-the-art techniques to model strategies, reasoning and decision making. In this paper we present the application of logical retrieval to patent searching. The two grand challenges are expressiveness and scalability, where high degree of expressiveness usually means a loss in scalability. In this paper we report how to maintain scalability while offering the expressiveness of logical retrieval required for solving patent search tasks. We present logical retrieval background, and how to model data-source selection and results' fusion. Moreover, we demonstrate the modelling of a retrieval strategy, a technique by which patent professionals are able to express, store and exchange their strategies and rationales when searching patents or when making decisions. An overview of the architecture and technical details complement the paper, while the evaluation reports preliminary results on how query processing times can be guaranteed, and how quality is affected by trading off responsiveness.
Machine Translation-Supported Cross-Language Information Retrieval for a Consumer Health Resource
Rosemblat, Graciela; Gemoets, Darren; Browne, Allen C.; Tse, Tony
2003-01-01
The U.S. National Institutes of Health, through its National Library of Medicine, developed ClinicalTrials.gov to provide the public with easy access to information on clinical trials on a wide range of conditions or diseases. Only English language information retrieval is currently supported. Given the growing number of Spanish speakers in the U.S. and their increasing use of the Web, we anticipate a significant increase in Spanish-speaking users. This study compares the effectiveness of two common cross-language information retrieval methods using machine translation, query translation versus document translation, using a subset of genuine user queries from ClinicalTrials.gov. Preliminary results conducted with the ClinicalTrials.gov search engine show that in our environment, query translation is statistically significantly better than document translation. We discuss possible reasons for this result and we conclude with suggestions for future work. PMID:14728236
Reactome graph database: Efficient access to complex pathway data
Korninger, Florian; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D’Eustachio, Peter
2018-01-01
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types. PMID:29377902
Labeling RDF Graphs for Linear Time and Space Querying
NASA Astrophysics Data System (ADS)
Furche, Tim; Weinzierl, Antonius; Bry, François
Indices and data structures for web querying have mostly considered tree shaped data, reflecting the view of XML documents as tree-shaped. However, for RDF (and when querying ID/IDREF constraints in XML) data is indisputably graph-shaped. In this chapter, we first study existing indexing and labeling schemes for RDF and other graph datawith focus on support for efficient adjacency and reachability queries. For XML, labeling schemes are an important part of the widespread adoption of XML, in particular for mapping XML to existing (relational) database technology. However, the existing indexing and labeling schemes for RDF (and graph data in general) sacrifice one of the most attractive properties of XML labeling schemes, the constant time (and per-node space) test for adjacency (child) and reachability (descendant). In the second part, we introduce the first labeling scheme for RDF data that retains this property and thus achieves linear time and space processing of acyclic RDF queries on a significantly larger class of graphs than previous approaches (which are mostly limited to tree-shaped data). Finally, we show how this labeling scheme can be applied to (acyclic) SPARQL queries to obtain an evaluation algorithm with time and space complexity linear in the number of resources in the queried RDF graph.
An approach for heterogeneous and loosely coupled geospatial data distributed computing
NASA Astrophysics Data System (ADS)
Chen, Bin; Huang, Fengru; Fang, Yu; Huang, Zhou; Lin, Hui
2010-07-01
Most GIS (Geographic Information System) applications tend to have heterogeneous and autonomous geospatial information resources, and the availability of these local resources is unpredictable and dynamic under a distributed computing environment. In order to make use of these local resources together to solve larger geospatial information processing problems that are related to an overall situation, in this paper, with the support of peer-to-peer computing technologies, we propose a geospatial data distributed computing mechanism that involves loosely coupled geospatial resource directories and a term named as Equivalent Distributed Program of global geospatial queries to solve geospatial distributed computing problems under heterogeneous GIS environments. First, a geospatial query process schema for distributed computing as well as a method for equivalent transformation from a global geospatial query to distributed local queries at SQL (Structured Query Language) level to solve the coordinating problem among heterogeneous resources are presented. Second, peer-to-peer technologies are used to maintain a loosely coupled network environment that consists of autonomous geospatial information resources, thus to achieve decentralized and consistent synchronization among global geospatial resource directories, and to carry out distributed transaction management of local queries. Finally, based on the developed prototype system, example applications of simple and complex geospatial data distributed queries are presented to illustrate the procedure of global geospatial information processing.
Reactome graph database: Efficient access to complex pathway data.
Fabregat, Antonio; Korninger, Florian; Viteri, Guilherme; Sidiropoulos, Konstantinos; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning
2018-01-01
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
Andrews, R D; Beauchamp, C
1989-12-01
The Department of Veterans Affairs (VA) Decentralized Hospital Computer Program (DHCP) contains data modules derived from separate ancillary services (e.g., Lab, Pharmacy and Radiology). It is currently difficult to integrate information between the modules. A prototype is being developed aimed at integrating ancillary data by storing clinical data oriented to the patient so that there is easy interaction of data from multiple services. A set of program utilities provides for user-defined functions of decision support, queries, and reports. Information can be used to monitor quality of care by providing feedback in the form of reports, and reminders. Initial testing has indicated the prototype's design and implementation are feasible (in terms of space requirements, speed, and ease of use) in outpatient and inpatient settings. The design, development, and clinical use of this prototype are described.
Selecting materialized views using random algorithm
NASA Astrophysics Data System (ADS)
Zhou, Lijuan; Hao, Zhongxiao; Liu, Chi
2007-04-01
The data warehouse is a repository of information collected from multiple possibly heterogeneous autonomous distributed databases. The information stored at the data warehouse is in form of views referred to as materialized views. The selection of the materialized views is one of the most important decisions in designing a data warehouse. Materialized views are stored in the data warehouse for the purpose of efficiently implementing on-line analytical processing queries. The first issue for the user to consider is query response time. So in this paper, we develop algorithms to select a set of views to materialize in data warehouse in order to minimize the total view maintenance cost under the constraint of a given query response time. We call it query_cost view_ selection problem. First, cost graph and cost model of query_cost view_ selection problem are presented. Second, the methods for selecting materialized views by using random algorithms are presented. The genetic algorithm is applied to the materialized views selection problem. But with the development of genetic process, the legal solution produced become more and more difficult, so a lot of solutions are eliminated and producing time of the solutions is lengthened in genetic algorithm. Therefore, improved algorithm has been presented in this paper, which is the combination of simulated annealing algorithm and genetic algorithm for the purpose of solving the query cost view selection problem. Finally, in order to test the function and efficiency of our algorithms experiment simulation is adopted. The experiments show that the given methods can provide near-optimal solutions in limited time and works better in practical cases. Randomized algorithms will become invaluable tools for data warehouse evolution.
A journey to Semantic Web query federation in the life sciences.
Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian
2009-10-01
As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario. We have identified both the strengths and weaknesses of these technologies. While Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URI's), the proliferation of semantically-equivalent URI's hinders large scale data integration. Our work helps direct research and tool development, which will be of benefit to this community.
A journey to Semantic Web query federation in the life sciences
Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian
2009-01-01
Background As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. Methods and results We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. Conclusion We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario. We have identified both the strengths and weaknesses of these technologies. While Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URI's), the proliferation of semantically-equivalent URI's hinders large scale data integration. Our work helps direct research and tool development, which will be of benefit to this community. PMID:19796394
NASA Astrophysics Data System (ADS)
Wright, D. J.; Lassoued, Y.; Dwyer, N.; Haddad, T.; Bermudez, L. E.; Dunne, D.
2009-12-01
Coastal mapping plays an important role in informing marine spatial planning, resource management, maritime safety, hazard assessment and even national sovereignty. As such, there is now a plethora of data/metadata catalogs, pre-made maps, tabular and text information on resource availability and exploitation, and decision-making tools. A recent trend has been to encapsulate these in a special class of web-enabled geographic information systems called a coastal web atlas (CWA). While multiple benefits are derived from tailor-made atlases, there is great value added from the integration of disparate CWAs. CWAs linked to one another can query more successfully to optimize planning and decision-making. If a dataset is missing in one atlas, it may be immediately located in another. Similar datasets in two atlases may be combined to enhance study in either region. *But how best to achieve semantic interoperability to mitigate vague data queries, concepts or natural language semantics when retrieving and integrating data and information?* We report on the development of a new prototype seeking to interoperate between two initial CWAs: the Marine Irish Digital Atlas (MIDA) and the Oregon Coastal Atlas (OCA). These two mature atlases are used as a testbed for more regional connections, with the intent for the OCA to use lessons learned to develop a regional network of CWAs along the west coast, and for MIDA to do the same in building and strengthening atlas networks with the UK, Belgium, and other parts of Europe. Our prototype uses semantic interoperability via services harmonization and ontology mediation, allowing local atlases to use their own data structures, and vocabularies (ontologies). We use standard technologies such as OGC Web Map Services (WMS) for delivering maps, and OGC Catalogue Service for the Web (CSW) for delivering and querying ISO-19139 metadata. The metadata records of a given CWA use a given ontology of terms called local ontology. Human or machine users formulate their requests using a common ontology of metadata terms, called global ontology. A CSW mediator rewrites the user’s request into CSW requests over local CSWs using their own (local) ontologies, collects the results and sends them back to the user. To extend the system, we have recently added global maritime boundaries and are also considering nearshore ocean observing system data. Ongoing work includes adding WFS, error management, and exception handling, enabling Smart Searches, and writing full documentation. This prototype is a central research project of the new International Coastal Atlas Network (ICAN), a group of 30+ organizations from 14 nations (and growing) dedicated to seeking interoperability approaches to CWAs in support of coastal zone management and the translation of coastal science to coastal decision-making.
Web-Based Geographic Information System Tool for Accessing Hanford Site Environmental Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Triplett, Mark B.; Seiple, Timothy E.; Watson, David J.
Data volume, complexity, and access issues pose severe challenges for analysts, regulators and stakeholders attempting to efficiently use legacy data to support decision making at the U.S. Department of Energy’s (DOE) Hanford Site. DOE has partnered with the Pacific Northwest National Laboratory (PNNL) on the PHOENIX (PNNL-Hanford Online Environmental Information System) project, which seeks to address data access, transparency, and integration challenges at Hanford to provide effective decision support. PHOENIX is a family of spatially-enabled web applications providing quick access to decades of valuable scientific data and insight through intuitive query, visualization, and analysis tools. PHOENIX realizes broad, public accessibilitymore » by relying only on ubiquitous web-browsers, eliminating the need for specialized software. It accommodates a wide range of users with intuitive user interfaces that require little or no training to quickly obtain and visualize data. Currently, PHOENIX is actively hosting three applications focused on groundwater monitoring, groundwater clean-up performance reporting, and in-tank monitoring. PHOENIX-based applications are being used to streamline investigative and analytical processes at Hanford, saving time and money. But more importantly, by integrating previously isolated datasets and developing relevant visualization and analysis tools, PHOENIX applications are enabling DOE to discover new correlations hidden in legacy data, allowing them to more effectively address complex issues at Hanford.« less
The Arden Syntax standard for clinical decision support: experiences and directions.
Samwald, Matthias; Fehre, Karsten; de Bruin, Jeroen; Adlassnig, Klaus-Peter
2012-08-01
Arden Syntax is a widely recognized standard for representing clinical and scientific knowledge in an executable format. It has a history that reaches back until 1989 and is currently maintained by the Health Level 7 (HL7) organization. We created a production-ready development environment, compiler, rule engine and application server for Arden Syntax. Over the course of several years, we have applied this Arden - Syntax - based CDS system in a wide variety of clinical problem domains, such as hepatitis serology interpretation, monitoring of nosocomial infections or the prediction of metastatic events in melanoma patients. We found the Arden Syntax standard to be very suitable for the practical implementation of CDS systems. Among the advantages of Arden Syntax are its status as an actively developed HL7 standard, the readability of the syntax, and various syntactic features such as flexible list handling. A major challenge we encountered was the technical integration of our CDS systems in existing, heterogeneous health information systems. To address this issue, we are currently working on incorporating the HL7 standard GELLO, which provides a standardized interface and query language for accessing data in health information systems. We hope that these planned extensions of the Arden Syntax might eventually help in realizing the vision of a global, interoperable and shared library of clinical decision support knowledge. Copyright © 2012 Elsevier Inc. All rights reserved.
Robust Requirements Tracing via Internet Search Technology: Improving an IV and V Technique. Phase 2
NASA Technical Reports Server (NTRS)
Hayes, Jane; Dekhtyar, Alex
2004-01-01
There are three major objectives to this phase of the work. (1) Improvement of Information Retrieval (IR) methods for Independent Verification and Validation (IV&V) requirements tracing. Information Retrieval methods are typically developed for very large (order of millions - tens of millions and more documents) document collections and therefore, most successfully used methods somewhat sacrifice precision and recall in order to achieve efficiency. At the same time typical IR systems treat all user queries as independent of each other and assume that relevance of documents to queries is subjective for each user. The IV&V requirements tracing problem has a much smaller data set to operate on, even for large software development projects; the set of queries is predetermined by the high-level specification document and individual requirements considered as query input to IR methods are not necessarily independent from each other. Namely, knowledge about the links for one requirement may be helpful in determining the links of another requirement. Finally, while the final decision on the exact form of the traceability matrix still belongs to the IV&V analyst, his/her decisions are much less arbitrary than those of an Internet search engine user. All this suggests that the information available to us in the framework of the IV&V tracing problem can be successfully leveraged to enhance standard IR techniques, which in turn would lead to increased recall and precision. We developed several new methods during Phase II; (2) IV&V requirements tracing IR toolkit. Based on the methods developed in Phase I and their improvements developed in Phase II, we built a toolkit of IR methods for IV&V requirements tracing. The toolkit has been integrated, at the data level, with SAIC's SuperTracePlus (STP) tool; (3) Toolkit testing. We tested the methods included in the IV&V requirements tracing IR toolkit on a number of projects.
Semantic-based surveillance video retrieval.
Hu, Weiming; Xie, Dan; Fu, Zhouyu; Zeng, Wenrong; Maybank, Steve
2007-04-01
Visual surveillance produces large amounts of video data. Effective indexing and retrieval from surveillance video databases are very important. Although there are many ways to represent the content of video clips in current video retrieval algorithms, there still exists a semantic gap between users and retrieval systems. Visual surveillance systems supply a platform for investigating semantic-based video retrieval. In this paper, a semantic-based video retrieval framework for visual surveillance is proposed. A cluster-based tracking algorithm is developed to acquire motion trajectories. The trajectories are then clustered hierarchically using the spatial and temporal information, to learn activity models. A hierarchical structure of semantic indexing and retrieval of object activities, where each individual activity automatically inherits all the semantic descriptions of the activity model to which it belongs, is proposed for accessing video clips and individual objects at the semantic level. The proposed retrieval framework supports various queries including queries by keywords, multiple object queries, and queries by sketch. For multiple object queries, succession and simultaneity restrictions, together with depth and breadth first orders, are considered. For sketch-based queries, a method for matching trajectories drawn by users to spatial trajectories is proposed. The effectiveness and efficiency of our framework are tested in a crowded traffic scene.
Shuttle-Data-Tape XML Translator
NASA Technical Reports Server (NTRS)
Barry, Matthew R.; Osborne, Richard N.
2005-01-01
JSDTImport is a computer program for translating native Shuttle Data Tape (SDT) files from American Standard Code for Information Interchange (ASCII) format into databases in other formats. JSDTImport solves the problem of organizing the SDT content, affording flexibility to enable users to choose how to store the information in a database to better support client and server applications. JSDTImport can be dynamically configured by use of a simple Extensible Markup Language (XML) file. JSDTImport uses this XML file to define how each record and field will be parsed, its layout and definition, and how the resulting database will be structured. JSDTImport also includes a client application programming interface (API) layer that provides abstraction for the data-querying process. The API enables a user to specify the search criteria to apply in gathering all the data relevant to a query. The API can be used to organize the SDT content and translate into a native XML database. The XML format is structured into efficient sections, enabling excellent query performance by use of the XPath query language. Optionally, the content can be translated into a Structured Query Language (SQL) database for fast, reliable SQL queries on standard database server computers.
Guided Iterative Substructure Search (GI-SSS) - A New Trick for an Old Dog.
Weskamp, Nils
2016-07-01
Substructure search (SSS) is a fundamental technique supported by various chemical information systems. Many users apply it in an iterative manner: they modify their queries to shape the composition of the retrieved hit sets according to their needs. We propose and evaluate two heuristic extensions of SSS aimed at simplifying these iterative query modifications by collecting additional information during query processing and visualizing this information in an intuitive way. This gives the user a convenient feedback on how certain changes to the query would affect the retrieved hit set and reduces the number of trial-and-error cycles needed to generate an optimal search result. The proposed heuristics are simple, yet surprisingly effective and can be easily added to existing SSS implementations. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
An efficient approach for video information retrieval
NASA Astrophysics Data System (ADS)
Dong, Daoguo; Xue, Xiangyang
2005-01-01
Today, more and more video information can be accessed through internet, satellite, etc.. Retrieving specific video information from large-scale video database has become an important and challenging research topic in the area of multimedia information retrieval. In this paper, we introduce a new and efficient index structure OVA-File, which is a variant of VA-File. In OVA-File, the approximations close to each other in data space are stored in close positions of the approximation file. The benefit is that only a part of approximations close to the query vector need to be visited to get the query result. Both shot query algorithm and video clip algorithm are proposed to support video information retrieval efficiently. The experimental results showed that the queries based on OVA-File were much faster than that based on VA-File with small loss of result quality.
ERIC Educational Resources Information Center
Stofer, Kathryn A.; Schiebel, Tracee M.
2017-01-01
Researchers and pollsters still debate the acceptance of genetic engineering technology among U.S. adults, and continue to assess their knowledge as part of this research. While decision-making may not rely entirely on knowledge, querying opinions and perceptions rely on public understanding of genetic engineering terms. Experience with…
Course Recommendation Based on Query Classification Approach
ERIC Educational Resources Information Center
Gulzar, Zameer; Leema, A. Anny
2018-01-01
This article describes how with a non-formal education, a scholar has to choose courses among various domains to meet the research aims. In spite of this, the availability of large number of courses, makes the process of selecting the appropriate course a tedious, time-consuming, and risky decision, and the course selection will directly affect…
ERIC Educational Resources Information Center
Layton, Rebekah L.; Brandt, Patrick D.; Freeman, Ashalla M.; Harrell, Jessica R.; Hall, Joshua D.; Sinche, Melanie
2016-01-01
A national sample of PhD-trained scientists completed training, accepted subsequent employment in academic and nonacademic positions, and were queried about their previous graduate training and current employment. Respondents indicated factors contributing to their employment decision (e.g., working conditions, salary, job security). The data…
Querying archetype-based EHRs by search ontology-based XPath engineering.
Kropf, Stefan; Uciteli, Alexandr; Schierle, Katrin; Krücken, Peter; Denecke, Kerstin; Herre, Heinrich
2018-05-11
Legacy data and new structured data can be stored in a standardized format as XML-based EHRs on XML databases. Querying documents on these databases is crucial for answering research questions. Instead of using free text searches, that lead to false positive results, the precision can be increased by constraining the search to certain parts of documents. A search ontology-based specification of queries on XML documents defines search concepts and relates them to parts in the XML document structure. Such query specification method is practically introduced and evaluated by applying concrete research questions formulated in natural language on a data collection for information retrieval purposes. The search is performed by search ontology-based XPath engineering that reuses ontologies and XML-related W3C standards. The key result is that the specification of research questions can be supported by the usage of search ontology-based XPath engineering. A deeper recognition of entities and a semantic understanding of the content is necessary for a further improvement of precision and recall. Key limitation is that the application of the introduced process requires skills in ontology and software development. In future, the time consuming ontology development could be overcome by implementing a new clinical role: the clinical ontologist. The introduced Search Ontology XML extension connects Search Terms to certain parts in XML documents and enables an ontology-based definition of queries. Search ontology-based XPath engineering can support research question answering by the specification of complex XPath expressions without deep syntax knowledge about XPaths.
Hierarchical video surveillance architecture: a chassis for video big data analytics and exploration
NASA Astrophysics Data System (ADS)
Ajiboye, Sola O.; Birch, Philip; Chatwin, Christopher; Young, Rupert
2015-03-01
There is increasing reliance on video surveillance systems for systematic derivation, analysis and interpretation of the data needed for predicting, planning, evaluating and implementing public safety. This is evident from the massive number of surveillance cameras deployed across public locations. For example, in July 2013, the British Security Industry Association (BSIA) reported that over 4 million CCTV cameras had been installed in Britain alone. The BSIA also reveal that only 1.5% of these are state owned. In this paper, we propose a framework that allows access to data from privately owned cameras, with the aim of increasing the efficiency and accuracy of public safety planning, security activities, and decision support systems that are based on video integrated surveillance systems. The accuracy of results obtained from government-owned public safety infrastructure would improve greatly if privately owned surveillance systems `expose' relevant video-generated metadata events, such as triggered alerts and also permit query of a metadata repository. Subsequently, a police officer, for example, with an appropriate level of system permission can query unified video systems across a large geographical area such as a city or a country to predict the location of an interesting entity, such as a pedestrian or a vehicle. This becomes possible with our proposed novel hierarchical architecture, the Fused Video Surveillance Architecture (FVSA). At the high level, FVSA comprises of a hardware framework that is supported by a multi-layer abstraction software interface. It presents video surveillance systems as an adapted computational grid of intelligent services, which is integration-enabled to communicate with other compatible systems in the Internet of Things (IoT).
Cloud-Based Tools to Support High-Resolution Modeling (Invited)
NASA Astrophysics Data System (ADS)
Jones, N.; Nelson, J.; Swain, N.; Christensen, S.
2013-12-01
The majority of watershed models developed to support decision-making by water management agencies are simple, lumped-parameter models. Maturity in research codes and advances in the computational power from multi-core processors on desktop machines, commercial cloud-computing resources, and supercomputers with thousands of cores have created new opportunities for employing more accurate, high-resolution distributed models for routine use in decision support. The barriers for using such models on a more routine basis include massive amounts of spatial data that must be processed for each new scenario and lack of efficient visualization tools. In this presentation we will review a current NSF-funded project called CI-WATER that is intended to overcome many of these roadblocks associated with high-resolution modeling. We are developing a suite of tools that will make it possible to deploy customized web-based apps for running custom scenarios for high-resolution models with minimal effort. These tools are based on a software stack that includes 52 North, MapServer, PostGIS, HT Condor, CKAN, and Python. This open source stack provides a simple scripting environment for quickly configuring new custom applications for running high-resolution models as geoprocessing workflows. The HT Condor component facilitates simple access to local distributed computers or commercial cloud resources when necessary for stochastic simulations. The CKAN framework provides a powerful suite of tools for hosting such workflows in a web-based environment that includes visualization tools and storage of model simulations in a database to archival, querying, and sharing of model results. Prototype applications including land use change, snow melt, and burned area analysis will be presented. This material is based upon work supported by the National Science Foundation under Grant No. 1135482
Multi-hazard risk analysis using the FP7 RASOR Platform
NASA Astrophysics Data System (ADS)
Koudogbo, Fifamè N.; Duro, Javier; Rossi, Lauro; Rudari, Roberto; Eddy, Andrew
2014-10-01
Climate change challenges our understanding of risk by modifying hazards and their interactions. Sudden increases in population and rapid urbanization are changing exposure to risk around the globe, making impacts harder to predict. Despite the availability of operational mapping products, there is no single tool to integrate diverse data and products across hazards, update exposure data quickly and make scenario-based predictions to support both short and long-term risk-related decisions. RASOR (Rapid Analysis and Spatialization Of Risk) will develop a platform to perform multi-hazard risk analysis for the full cycle of disaster management, including targeted support to critical infrastructure monitoring and climate change impact assessment. A scenario-driven query system simulates future scenarios based on existing or assumed conditions and compares them with historical scenarios. RASOR will thus offer a single work environment that generates new risk information across hazards, across data types (satellite EO, in-situ), across user communities (global, local, climate, civil protection, insurance, etc.) and across the world. Five case study areas are considered within the project, located in Haiti, Indonesia, Netherlands, Italy and Greece. Initially available over those demonstration areas, RASOR will ultimately offer global services to support in-depth risk assessment and full-cycle risk management.
Any information, anywhere, anytime for the warfighter
NASA Astrophysics Data System (ADS)
Lazaroff, Mark B.; Sage, Philip A.
1997-06-01
The objective of the DARPA battlefield awareness data dissemination (BADD) program is to deliver battlefield awareness information to the warfighter -- anywhere, anytime. BADD is an advanced concept technology demonstration (ACTD) to support proof of concept technology demonstrations and experiments with a goal of introducing new technology to support the operational needs and acceptance of the warfighter. BADD's information management technology provides a 'smart' push of information to the users by providing information subscription services implemented via user- generated profiles. The system also provides services for warfighter pull or 'reach-back' of information via ad hoc query support. The high bandwidth delivery of informtion via the Global Broadcast System (GBS) satellites enables users to receive battlefield awareness information virtually anywhere. Very similar goals have been established for data warehousing technology -- that is, deliver the right information, to the right user, at the right time so that effective decisions can be made. In this paper, we examine the BADD Phase II architecture and underlying information management technoloyg in the context of data warehousing technology and a data warehouse reference architecture. In particular, we foucs on the BADD segment that PSR is building, the Interface to Information Sources (I2S).
Engineering the ATLAS TAG Browser
NASA Astrophysics Data System (ADS)
Zhang, Qizhi; ATLAS Collaboration
2011-12-01
ELSSI is a web-based event metadata (TAG) browser and event-level selection service for ATLAS. In this paper, we describe some of the challenges encountered in the process of developing ELSSI, and the software engineering strategies adopted to address those challenges. Approaches to management of access to data, browsing, data rendering, query building, query validation, execution, connection management, and communication with auxiliary services are discussed. We also describe strategies for dealing with data that may vary over time, such as run-dependent trigger decision decoding. Along with examples, we illustrate how programming techniques in multiple languages (PHP, JAVASCRIPT, XML, AJAX, and PL/SQL) have been blended to achieve the required results. Finally, we evaluate features of the ELSSI service in terms of functionality, scalability, and performance.
A distributed query execution engine of big attributed graphs.
Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif
2016-01-01
A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the graph vertices and edges need to be associated with descriptive attributes. Such type of graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations including reachability, pattern matching and shortest path where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation in addition to the performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine of G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes while the graph data are maintained in a relational store which is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries which are pushed to the underlying relational stores while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and the scalability of DG-SPARQL on querying massive attributed graph datasets in addition to its ability to outperform the performance of Apache Giraph, a popular distributed graph processing system, by orders of magnitudes.
Automated rule-base creation via CLIPS-Induce
NASA Technical Reports Server (NTRS)
Murphy, Patrick M.
1994-01-01
Many CLIPS rule-bases contain one or more rule groups that perform classification. In this paper we describe CLIPS-Induce, an automated system for the creation of a CLIPS classification rule-base from a set of test cases. CLIPS-Induce consists of two components, a decision tree induction component and a CLIPS production extraction component. ID3, a popular decision tree induction algorithm, is used to induce a decision tree from the test cases. CLIPS production extraction is accomplished through a top-down traversal of the decision tree. Nodes of the tree are used to construct query rules, and branches of the tree are used to construct classification rules. The learned CLIPS productions may easily be incorporated into a large CLIPS system that perform tasks such as accessing a database or displaying information.
Regular paths in SparQL: querying the NCI Thesaurus.
Detwiler, Landon T; Suciu, Dan; Brinkley, James F
2008-11-06
OWL, the Web Ontology Language, provides syntax and semantics for representing knowledge for the semantic web. Many of the constructs of OWL have a basis in the field of description logics. While the formal underpinnings of description logics have lead to a highly computable language, it has come at a cognitive cost. OWL ontologies are often unintuitive to readers lacking a strong logic background. In this work we describe GLEEN, a regular path expression library, which extends the RDF query language SparQL to support complex path expressions over OWL and other RDF-based ontologies. We illustrate the utility of GLEEN by showing how it can be used in a query-based approach to defining simpler, more intuitive views of OWL ontologies. In particular we show how relatively simple GLEEN-enhanced SparQL queries can create views of the OWL version of the NCI Thesaurus that match the views generated by the web-based NCI browser.
Sundvall, Erik; Wei-Kleiner, Fang; Freire, Sergio M; Lambrix, Patrick
2017-01-01
Archetype-based Electronic Health Record (EHR) systems using generic reference models from e.g. openEHR, ISO 13606 or CIMI should be easy to update and reconfigure with new types (or versions) of data models or entries, ideally with very limited programming or manual database tweaking. Exploratory research (e.g. epidemiology) leading to ad-hoc querying on a population-wide scale can be a challenge in such environments. This publication describes implementation and test of an archetype-aware Dewey encoding optimization that can be used to produce such systems in environments supporting relational operations, e.g. RDBMs and distributed map-reduce frameworks like Hadoop. Initial testing was done using a nine-node 2.2 GHz quad-core Hadoop cluster querying a dataset consisting of targeted extracts from 4+ million real patient EHRs, query results with sub-minute response time were obtained.
Optimization of Extended Relational Database Systems
1986-07-23
control functions are integrated into a single system in a homogeneoua way. As a first exam - ple, consider previous work in supporting various semantic...sizes are reduced and, wnk? quently, the number of materializations that will be needed is aba lower. For exam - pie, in the above query tuple...retrieve (EMP.name) where EMP hobbies instrument = ’ violin ’ When the various entries in the hobbies field are materialized, only those queries that
Entity Bases: Large-Scale Knowledgebases for Intelligence Data
2009-02-01
declaratively expressed as Datalog rules . The EntityBase supports two query scenarios: • Free-Form Querying: A human analyst or a client program can pose...integration, Prometheus follows the Inverse Rules algo- rithm (Duschka 1997) with additional optimizations (Thakkar et al. 2005). We use the mediator...Discovery and Data Mining (PAKDD), Sydney, Australia. Crammer , K., Dekel, O., Keshet, J., Shalev-Shwartz, S., and Singer, Y. (2006). Online passive
Secure Database Management Study.
1978-12-01
covers cases Involving indus- trial economics (e.g., Industrial spies) and commercial finances (e.g., fraud). Priv¢j--Protection of date about people...California, Berke - lay [STONM76aI. * The approach to protection taken in INGRE (STOM74| has attracted a lot of Interest* Queries, in a high level query...Material Command Support Activity (NMCSA), and another DoD agency, Cullinane Corporation developed a prototype version of the IDS database system on a
Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun
2017-01-01
Linked Data (LD) aims to achieve interconnected data by representing entities using Unified Resource Identifiers (URIs), and sharing information using Resource Description Frameworks (RDFs) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. PMID:27733503
78 FR 71660 - Zizhuang Li, M.D.; Decision and Order
Federal Register 2010, 2011, 2012, 2013, 2014
2013-11-29
... Show Cause Order proposed the denial of Applicant's application for a DEA Certificate of Registration... interest.'' Id. at 1 (citing 21 U.S.C. 823(f)). As basis for the denial, the Show Cause Order alleged that... Government queried the Postal Service's Track and Confirm Web page and determined that the mailing had not...
Stream-Dashboard: A Big Data Stream Clustering Framework with Applications to Social Media Streams
ERIC Educational Resources Information Center
Hawwash, Basheer
2013-01-01
Data mining is concerned with detecting patterns of data in raw datasets, which are then used to unearth knowledge that might not have been discovered using conventional querying or statistical methods. This discovered knowledge has been used to empower decision makers in countless applications spanning across many multi-disciplinary areas…
Targeting and Structuring Information Resource Use: A Path toward Informed Clinical Decisions
ERIC Educational Resources Information Center
Mangrulkar, Rajesh S.
2004-01-01
A core skill for all physicians to master is that of information manager. Despite a rapidly expanding set of electronic and print-based information resources, clinicians continue to answer their clinical queries predominantly through informal or formal consultation. Even as new tools are brought to market, the majority of them present information…
Semantic Metadata for Heterogeneous Spatial Planning Documents
NASA Astrophysics Data System (ADS)
Iwaniak, A.; Kaczmarek, I.; Łukowicz, J.; Strzelecki, M.; Coetzee, S.; Paluszyński, W.
2016-09-01
Spatial planning documents contain information about the principles and rights of land use in different zones of a local authority. They are the basis for administrative decision making in support of sustainable development. In Poland these documents are published on the Web according to a prescribed non-extendable XML schema, designed for optimum presentation to humans in HTML web pages. There is no document standard, and limited functionality exists for adding references to external resources. The text in these documents is discoverable and searchable by general-purpose web search engines, but the semantics of the content cannot be discovered or queried. The spatial information in these documents is geographically referenced but not machine-readable. Major manual efforts are required to integrate such heterogeneous spatial planning documents from various local authorities for analysis, scenario planning and decision support. This article presents results of an implementation using machine-readable semantic metadata to identify relationships among regulations in the text, spatial objects in the drawings and links to external resources. A spatial planning ontology was used to annotate different sections of spatial planning documents with semantic metadata in the Resource Description Framework in Attributes (RDFa). The semantic interpretation of the content, links between document elements and links to external resources were embedded in XHTML pages. An example and use case from the spatial planning domain in Poland is presented to evaluate its efficiency and applicability. The solution enables the automated integration of spatial planning documents from multiple local authorities to assist decision makers with understanding and interpreting spatial planning information. The approach is equally applicable to legal documents from other countries and domains, such as cultural heritage and environmental management.
Enabling complex queries to drug information sources through functional composition.
Peters, Lee; Mortensen, Jonathan; Nguyen, Thang; Bodenreider, Olivier
2013-01-01
Our objective was to enable an end-user to create complex queries to drug information sources through functional composition, by creating sequences of functions from application program interfaces (API) to drug terminologies. The development of a functional composition model seeks to link functions from two distinct APIs. An ontology was developed using Protégé to model the functions of the RxNorm and NDF-RT APIs by describing the semantics of their input and output. A set of rules were developed to define the interoperable conditions for functional composition. The operational definition of interoperability between function pairs is established by executing the rules on the ontology. We illustrate that the functional composition model supports common use cases, including checking interactions for RxNorm drugs and deploying allergy lists defined in reference to drug properties in NDF-RT. This model supports the RxMix application (http://mor.nlm.nih.gov/RxMix/), an application we developed for enabling complex queries to the RxNorm and NDF-RT APIs.
The development of an EDSS: Lessons learned and implications for DSS research
El-Gayar, O.; Deokar, A.; Michels, L.; Fosnight, G.
2011-01-01
The Solar and Wind Energy Resource Assessment (SWERA) project is focused on providing renewable energy (RE) planning resources to the public. Examples include wind, solar, and hydro assessments. SWERA DSS consists of three major components. First, SWERA 'Product Archive' provides for a discovery DSS upon which users can find and access renewable energy data and supporting models. Second, the 'Renewable Resource EXplorer' (RREX) component serves as a web-based, GIS analysis tool for viewing RE resource data available through the SWERA Product Archive. Third, the SWERA web service provides computational access to the data available in the SWERA spatial database through a location based query, and is also utilized in the RREX component. We provide a discussion of various design decisions used in the construction of this EDSS, followed by project experiences and implications for EDSS and broader DSS research. ?? 2011 IEEE.
Kumar, Kris; Lotun, Kapildeo
2018-05-07
Out of hospital cardiac arrest management of patients with non-ST myocardial infarction per current American Heart Association and European Resuscitation Council guidelines leave the decision in regard to early angiography up to the physician operators. Guidelines are clear on the positive impact of early intervention on survival and improvement on left ventricular function in patients presenting with cardiac arrest and ST elevation myocardial infarction on electrocardiogram. This review aims to analyze the data that current guidelines are based upon in regards to out of hospital cardiac arrest with electrocardiogram findings of non-ST elevation myocardial infarction as well as other clinical trials that support early angiography and reperfusion strategies as well as future studies that are in trial to study the role of the coronary catheterization laboratory in cardiac arrest. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Standards-Based Procedural Phenotyping: The Arden Syntax on i2b2.
Mate, Sebastian; Castellanos, Ixchel; Ganslandt, Thomas; Prokosch, Hans-Ulrich; Kraus, Stefan
2017-01-01
Phenotyping, or the identification of patient cohorts, is a recurring challenge in medical informatics. While there are open source tools such as i2b2 that address this problem by providing user-friendly querying interfaces, these platforms lack semantic expressiveness to model complex phenotyping algorithms. The Arden Syntax provides procedural programming language construct, designed specifically for medical decision support and knowledge transfer. In this work, we investigate how language constructs of the Arden Syntax can be used for generic phenotyping. We implemented a prototypical tool to integrate i2b2 with an open source Arden execution environment. To demonstrate the applicability of our approach, we used the tool together with an Arden-based phenotyping algorithm to derive statistics about ICU-acquired hypernatremia. Finally, we discuss how the combination of i2b2's user-friendly cohort pre-selection and Arden's procedural expressiveness could benefit phenotyping.
Benchmarking Using Basic DBMS Operations
NASA Astrophysics Data System (ADS)
Crolotte, Alain; Ghazal, Ahmad
The TPC-H benchmark proved to be successful in the decision support area. Many commercial database vendors and their related hardware vendors used these benchmarks to show the superiority and competitive edge of their products. However, over time, the TPC-H became less representative of industry trends as vendors keep tuning their database to this benchmark-specific workload. In this paper, we present XMarq, a simple benchmark framework that can be used to compare various software/hardware combinations. Our benchmark model is currently composed of 25 queries that measure the performance of basic operations such as scans, aggregations, joins and index access. This benchmark model is based on the TPC-H data model due to its maturity and well-understood data generation capability. We also propose metrics to evaluate single-system performance and compare two systems. Finally we illustrate the effectiveness of this model by showing experimental results comparing two systems under different conditions.
A RESTful image gateway for multiple medical image repositories.
Valente, Frederico; Viana-Ferreira, Carlos; Costa, Carlos; Oliveira, José Luis
2012-05-01
Mobile technologies are increasingly important components in telemedicine systems and are becoming powerful decision support tools. Universal access to data may already be achieved by resorting to the latest generation of tablet devices and smartphones. However, the protocols employed for communicating with image repositories are not suited to exchange data with mobile devices. In this paper, we present an extensible approach to solving the problem of querying and delivering data in a format that is suitable for the bandwidth and graphic capacities of mobile devices. We describe a three-tiered component-based gateway that acts as an intermediary between medical applications and a number of Picture Archiving and Communication Systems (PACS). The interface with the gateway is accomplished using Hypertext Transfer Protocol (HTTP) requests following a Representational State Transfer (REST) methodology, which relieves developers from dealing with complex medical imaging protocols and allows the processing of data on the server side.
Integration of Ancillary Data for Improved Clinical Use: A Prototype within the VA's DHCP
Andrews, Robert D.
1989-01-01
The Department of Veterans Affairs' (VA) Decentralized Hospital Computer Program (DHCP) is composed of several clinical modules that provide for the clinical information needs of their respective ancillary services. Using information from multiple ancillary packages is sometimes cumbersome. A prototype is being developed aimed at integrating ancillary data by storing clinical data oriented to the patient so that there is easy interaction of data from multiple services. A set of program utilities provide for user-defined functions of reporting, queries, entry, and decision support. Information can be used to monitor quality of care by providing feedback in the form of reports, reminders, and bulletins. Initial testing has indicated the prototype's design and implementation are feasible (in terms of space requirements, speed, and ease of use) in both outpatient and inpatient environments. The design and development of this prototype are described.
Pillai, Satish K.; Beekmann, Susan E.; Babcock, Hilary M.; Pavia, Andrew T.; Koonin, Lisa M.; Polgreen, Philip M.
2015-01-01
While influenza transmission is thought to occur primarily by droplet spread, the role of airborne spread remains uncertain. Understanding the beliefs and attitudes of infectious disease physicians regarding influenza transmission and respiratory and barrier protection preferences can provide insights into workplace decisions regarding respiratory protection planning. Physicians participating in the Infectious Diseases Society of America’s Emerging Infections Network were queried in November 2013 to determine beliefs and attitudes on influenza transmission. A subset of physicians involved in their facility’s respiratory protection decision making were queried about respirator and surgical mask choices under various pandemic scenarios; availability of, and challenges associated with, respirators in their facility; and protective strategies during disposable N95 shortages. The majority of 686 respondents (98%) believed influenza transmission occurs frequently or occasionally via droplets; 44% of respondents believed transmission occurs via small particles frequently (12%) or occasionally (32%). Among the subset of respondents involved in respiratory protection planning at their facility, over 90% preferred surgical masks during provision of non-aerosol-generating patient care for seasonal influenza. However, for the same type of care during an influenza pandemic, two-thirds of respondents opted for disposable N95 filtering facepiece respirators. In settings where filtering facepiece (disposable) N95 respirators were in short supply, preferred conservation strategies included extended use and reuse of disposable N95s. Use of reusable (elastomeric facepiece) respirator types was viewed less favorably. While respondents identified droplets as the primary mode of influenza transmission, during a high-severity pandemic scenario there was increased support for devices that reduced aerosol-based transmission. Use of potentially less familiar respirator types may partially relieve shortages of disposable N95s but also may require significant education efforts so that clinicians are aware of the characteristics of alternative personal protective equipment. PMID:26173092
Development of a web-based video management and application processing system
NASA Astrophysics Data System (ADS)
Chan, Shermann S.; Wu, Yi; Li, Qing; Zhuang, Yueting
2001-07-01
How to facilitate efficient video manipulation and access in a web-based environment is becoming a popular trend for video applications. In this paper, we present a web-oriented video management and application processing system, based on our previous work on multimedia database and content-based retrieval. In particular, we extend the VideoMAP architecture with specific web-oriented mechanisms, which include: (1) Concurrency control facilities for the editing of video data among different types of users, such as Video Administrator, Video Producer, Video Editor, and Video Query Client; different users are assigned various priority levels for different operations on the database. (2) Versatile video retrieval mechanism which employs a hybrid approach by integrating a query-based (database) mechanism with content- based retrieval (CBR) functions; its specific language (CAROL/ST with CBR) supports spatio-temporal semantics of video objects, and also offers an improved mechanism to describe visual content of videos by content-based analysis method. (3) Query profiling database which records the `histories' of various clients' query activities; such profiles can be used to provide the default query template when a similar query is encountered by the same kind of users. An experimental prototype system is being developed based on the existing VideoMAP prototype system, using Java and VC++ on the PC platform.
Design of a lattice-based faceted classification system
NASA Technical Reports Server (NTRS)
Eichmann, David A.; Atkins, John
1992-01-01
We describe a software reuse architecture supporting component retrieval by facet classes. The facets are organized into a lattice of facet sets and facet n-tuples. The query mechanism supports precise retrieval and flexible browsing.
Li, Ya-pin; Fang, Li-qun; Gao, Su-qing; Wang, Zhen; Gao, Hong-wei; Liu, Peng; Wang, Ze-Rui; Li, Yan-Li; Zhu, Xu-Guang; Li, Xin-Lou; Xu, Bo; Li, Yin-Jun; Yang, Hong; de Vlas, Sake J; Shi, Tao-Xing; Cao, Wu-Chun
2013-01-01
For years, emerging infectious diseases have appeared worldwide and threatened the health of people. The emergence and spread of an infectious-disease outbreak are usually unforeseen, and have the features of suddenness and uncertainty. Timely understanding of basic information in the field, and the collection and analysis of epidemiological information, is helpful in making rapid decisions and responding to an infectious-disease emergency. Therefore, it is necessary to have an unobstructed channel and convenient tool for the collection and analysis of epidemiologic information in the field. Baseline information for each county in mainland China was collected and a database was established by geo-coding information on a digital map of county boundaries throughout the country. Google Maps was used to display geographic information and to conduct calculations related to maps, and the 3G wireless network was used to transmit information collected in the field to the server. This study established a decision support system for the response to infectious-disease emergencies based on WebGIS and mobile services (DSSRIDE). The DSSRIDE provides functions including data collection, communication and analyses in real time, epidemiological detection, the provision of customized epidemiological questionnaires and guides for handling infectious disease emergencies, and the querying of professional knowledge in the field. These functions of the DSSRIDE could be helpful for epidemiological investigations in the field and the handling of infectious-disease emergencies. The DSSRIDE provides a geographic information platform based on the Google Maps application programming interface to display information of infectious disease emergencies, and transfers information between workers in the field and decision makers through wireless transmission based on personal computers, mobile phones and personal digital assistants. After a 2-year practice and application in infectious disease emergencies, the DSSRIDE is becoming a useful platform and is a useful tool for investigations in the field carried out by response sections and individuals. The system is suitable for use in developing countries and low-income districts.
A semantic proteomics dashboard (SemPoD) for data management in translational research.
Jayapandian, Catherine P; Zhao, Meng; Ewing, Rob M; Zhang, Guo-Qiang; Sahoo, Satya S
2012-01-01
One of the primary challenges in translational research data management is breaking down the barriers between the multiple data silos and the integration of 'omics data with clinical information to complete the cycle from the bench to the bedside. The role of contextual metadata, also called provenance information, is a key factor ineffective data integration, reproducibility of results, correct attribution of original source, and answering research queries involving "What", "Where", "When", "Which", "Who", "How", and "Why" (also known as the W7 model). But, at present there is limited or no effective approach to managing and leveraging provenance information for integrating data across studies or projects. Hence, there is an urgent need for a paradigm shift in creating a "provenance-aware" informatics platform to address this challenge. We introduce an ontology-driven, intuitive Semantic Proteomics Dashboard (SemPoD) that uses provenance together with domain information (semantic provenance) to enable researchers to query, compare, and correlate different types of data across multiple projects, and allow integration with legacy data to support their ongoing research. The SemPoD platform, currently in use at the Case Center for Proteomics and Bioinformatics (CPB), consists of three components: (a) Ontology-driven Visual Query Composer, (b) Result Explorer, and (c) Query Manager. Currently, SemPoD allows provenance-aware querying of 1153 mass-spectrometry experiments from 20 different projects. SemPod uses the systems molecular biology provenance ontology (SysPro) to support a dynamic query composition interface, which automatically updates the components of the query interface based on previous user selections and efficiently prunes the result set usinga "smart filtering" approach. The SysPro ontology re-uses terms from the PROV-ontology (PROV-O) being developed by the World Wide Web Consortium (W3C) provenance working group, the minimum information required for reporting a molecular interaction experiment (MIMIx), and the minimum information about a proteomics experiment (MIAPE) guidelines. The SemPoD was evaluated both in terms of user feedback and as scalability of the system. SemPoD is an intuitive and powerful provenance ontology-driven data access and query platform that uses the MIAPE and MIMIx metadata guideline to create an integrated view over large-scale systems molecular biology datasets. SemPoD leverages the SysPro ontology to create an intuitive dashboard for biologists to compose queries, explore the results, and use a query manager for storing queries for later use. SemPoD can be deployed over many existing database applications storing 'omics data, including, as illustrated here, the LabKey data-management system. The initial user feedback evaluating the usability and functionality of SemPoD has been very positive and it is being considered for wider deployment beyond the proteomics domain, and in other 'omics' centers.
STBase: One Million Species Trees for Comparative Biology
McMahon, Michelle M.; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J.
2015-01-01
Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user’s query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees. PMID:25679219
Query-oriented evidence extraction to support evidence-based medicine practice.
Sarker, Abeed; Mollá, Diego; Paris, Cecile
2016-02-01
Evidence-based medicine practice requires medical practitioners to rely on the best available evidence, in addition to their expertise, when making clinical decisions. The medical domain boasts a large amount of published medical research data, indexed in various medical databases such as MEDLINE. As the size of this data grows, practitioners increasingly face the problem of information overload, and past research has established the time-associated obstacles faced by evidence-based medicine practitioners. In this paper, we focus on the problem of automatic text summarisation to help practitioners quickly find query-focused information from relevant documents. We utilise an annotated corpus that is specialised for the task of evidence-based summarisation of text. In contrast to past summarisation approaches, which mostly rely on surface level features to identify salient pieces of texts that form the summaries, our approach focuses on the use of corpus-based statistics, and domain-specific lexical knowledge for the identification of summary contents. We also apply a target-sentence-specific summarisation technique that reduces the problem of underfitting that persists in generic summarisation models. In automatic evaluations run over a large number of annotated summaries, our extractive summarisation technique statistically outperforms various baseline and benchmark summarisation models with a percentile rank of 96.8%. A manual evaluation shows that our extractive summarisation approach is capable of selecting content with high recall and precision, and may thus be used to generate bottom-line answers to practitioners' queries. Our research shows that the incorporation of specialised data and domain-specific knowledge can significantly improve text summarisation performance in the medical domain. Due to the vast amounts of medical text available, and the high growth of this form of data, we suspect that such summarisation techniques will address the time-related obstacles associated with evidence-based medicine. Copyright © 2015 Elsevier Inc. All rights reserved.
Association between Search Behaviors and Disease Prevalence Rates at 18 U.S. Children's Hospitals.
Daniel, Dennis; Wolbrink, Traci; Logvinenko, Tanya; Harper, Marvin; Burns, Jeffrey
2017-10-01
Background Usage of online resources by clinicians in training and practice can provide insight into knowledge gaps and inform development of decision support tools. Although online information seeking is often driven by encountered patient problems, the relationship between disease prevalence and search rate has not been previously characterized. Objective This article aimed to (1) identify topics frequently searched by pediatric clinicians using UpToDate (http://www.uptodate.com) and (2) explore the association between disease prevalence rate and search rate using data from the Pediatric Health Information System. Methods We identified the most common search queries and resources most frequently accessed on UpToDate for a cohort of 18 children's hospitals during calendar year 2012. We selected 64 of the most frequently searched diseases and matched ICD-9 data from the PHIS database during the same time period. Using linear regression, we explored the relationship between clinician query rate and disease prevalence rate. Results The hospital cohort submitted 1,228,138 search queries across 592,454 sessions. The majority of search sessions focused on a single search topic. We identified no consistent overall association between disease prevalence and search rates. Diseases where search rate was substantially higher than prevalence rate were often infectious or immune/rheumatologic conditions, involved potentially complex diagnosis or management, and carried risk of significant morbidity or mortality. None of the examined diseases showed a decrease in search rate associated with increased disease prevalence rates. Conclusion This is one of the first medical learning needs assessments to use large-scale, multisite data to identify topics of interest to pediatric clinicians, and to examine the relationship between disease prevalence and search rate for a set of pediatric diseases. Overall, disease search rate did not appear to be associated with hospital disease prevalence rates based on ICD-9 codes. However, some diseases were consistently searched at a higher rate than their prevalence rate; many of these diseases shared common features.
muBLASTP: database-indexed protein sequence search on multicore CPUs.
Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-Chun
2016-11-04
The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they cannot deliver the same level of sensitivity as the query-indexed BLAST, i.e., NCBI BLAST, or they can only support nucleotide sequence search, e.g., MegaBLAST. Due to different challenges and characteristics between query indexing and database indexing, the existing techniques for query-indexed search cannot be used into database indexed search. muBLASTP, a novel database-indexed BLAST for protein sequence search, delivers identical hits returned to NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, the single-threaded muBLASTP achieves up to a 4.41-fold speedup for alignment stages, and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, the multithreaded muBLASTP achieves up to a 5.7-fold speedups for alignment stages, and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. With a newly designed index structure for protein database and associated optimizations in BLASTP algorithm, we re-factored BLASTP algorithm for modern multicore processors that achieves much higher throughput with acceptable memory footprint for the database index.
An ontology-driven tool for structured data acquisition using Web forms.
Gonçalves, Rafael S; Tu, Samson W; Nyulas, Csongor I; Tierney, Michael J; Musen, Mark A
2017-08-01
Structured data acquisition is a common task that is widely performed in biomedicine. However, current solutions for this task are far from providing a means to structure data in such a way that it can be automatically employed in decision making (e.g., in our example application domain of clinical functional assessment, for determining eligibility for disability benefits) based on conclusions derived from acquired data (e.g., assessment of impaired motor function). To use data in these settings, we need it structured in a way that can be exploited by automated reasoning systems, for instance, in the Web Ontology Language (OWL); the de facto ontology language for the Web. We tackle the problem of generating Web-based assessment forms from OWL ontologies, and aggregating input gathered through these forms as an ontology of "semantically-enriched" form data that can be queried using an RDF query language, such as SPARQL. We developed an ontology-based structured data acquisition system, which we present through its specific application to the clinical functional assessment domain. We found that data gathered through our system is highly amenable to automatic analysis using queries. We demonstrated how ontologies can be used to help structuring Web-based forms and to semantically enrich the data elements of the acquired structured data. The ontologies associated with the enriched data elements enable automated inferences and provide a rich vocabulary for performing queries.
Comprehensive Optimal Manpower and Personnel Analytic Simulation System (COMPASS)
2009-10-01
4 The EDB consists of 4 major components (some of which are re-usable): 1. Metadata Editor ( MDE ): Also considered a leaf node, the metadata...end-user queries via the QB. The EDB supports multiple instances of the MDE , although currently, only a single instance is recommended. 2 Query...the MSB is a central collection of web services, responsible for the authentication and authorization of users, maintenance of the EDB metadata
User centered and ontology based information retrieval system for life sciences.
Sy, Mohameth-François; Ranwez, Sylvie; Montmain, Jacky; Regnault, Armelle; Crampes, Michel; Ranwez, Vincent
2012-01-25
Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications and the Medical Subject Headings is the basis of biomedical publications indexation and information retrieval process proposed by PubMed. However current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources and no explanation for their adequacy to the query is provided. Users may thus be confused by the selection and have no idea on how to adapt their queries so that the results match their expectations. This paper describes an information retrieval system that relies on domain ontology to widen the set of relevant documents that is retrieved and that uses a graphical rendering of query results to favor user interactions. Semantic proximities between ontology concepts and aggregating models are used to assess documents adequacy with respect to a query. The selection of documents is displayed in a semantic map to provide graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive and iterative exploration of data corpus, by facilitating query concepts weighting and visual explanation. We illustrate the benefit of using this information retrieval system on two case studies one of which aiming at collecting human genes related to transcription factors involved in hemopoiesis pathway. The ontology based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user centred application in which the system enlightens relevant information to provide decision help.
User centered and ontology based information retrieval system for life sciences
2012-01-01
Background Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications and the Medical Subject Headings is the basis of biomedical publications indexation and information retrieval process proposed by PubMed. However current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources and no explanation for their adequacy to the query is provided. Users may thus be confused by the selection and have no idea on how to adapt their queries so that the results match their expectations. Results This paper describes an information retrieval system that relies on domain ontology to widen the set of relevant documents that is retrieved and that uses a graphical rendering of query results to favor user interactions. Semantic proximities between ontology concepts and aggregating models are used to assess documents adequacy with respect to a query. The selection of documents is displayed in a semantic map to provide graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive and iterative exploration of data corpus, by facilitating query concepts weighting and visual explanation. We illustrate the benefit of using this information retrieval system on two case studies one of which aiming at collecting human genes related to transcription factors involved in hemopoiesis pathway. Conclusions The ontology based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user centred application in which the system enlightens relevant information to provide decision help. PMID:22373375
NASA Astrophysics Data System (ADS)
Sugumaran, Ramanathan; Meyer, James C.; Davis, Jim
2004-10-01
Local governments often struggle to balance competing demands for residential, commercial and industrial development with imperatives to minimize environmental degradation. In order to effectively manage this development process on a sustainable basis, local planners and government agencies are increasingly seeking better tools and techniques. In this paper, we describe the development of a Web-Based Environmental Decision Support System (WEDSS), which helps to prioritize local watersheds in terms of environmental sensitivity using multiple criteria identified by planners and local government staff in the city of Columbia, and Boone County, Missouri. The development of the system involved three steps, the first was to establish the relevant environmental criteria and develop data layers for each criterion, then a spatial model was developed for analysis, and lastly a Web-based interface with analysis tools was developed using client-server technology. The WEDSS is an example of a way to run spatial models over the Web and represents a significant increase in capability over other WWW-based GIS applications that focus on database querying and map display. The WEDSS seeks to aid in the development of agreement regarding specific local areas deserving increased protection and the public policies to be pursued in minimizing the environmental impact of future development. The tool is also intended to assist ongoing public information and education efforts concerning watershed management and water quality issues for the City of Columbia, Missouri and adjacent developing areas within Boone County, Missouri.
NASA Astrophysics Data System (ADS)
Escarzaga, S. M.; Cody, R. P.; Kassin, A.; Barba, M.; Gaylord, A. G.; Manley, W. F.; Mazza Ramsay, F. D.; Vargas, S. A., Jr.; Tarin, G.; Laney, C. M.; Villarreal, S.; Aiken, Q.; Collins, J. A.; Green, E.; Nelson, L.; Tweedie, C. E.
2015-12-01
The Barrow area of northern Alaska is one of the most intensely researched locations in the Arctic and the Barrow Area Information Database (BAID, www.barrowmapped.org) tracks and facilitates a gamut of research, management, and educational activities in the area. BAID is a cyberinfrastructure (CI) that details much of the historic and extant research undertaken within in the Barrow region in a suite of interactive web-based mapping and information portals (geobrowsers). The BAID user community and target audience for BAID is diverse and includes research scientists, science logisticians, land managers, educators, students, and the general public. BAID contains information on more than 12,000 Barrow area research sites that extend back to the 1940's and more than 640 remote sensing images and geospatial datasets. In a web-based setting, users can zoom, pan, query, measure distance, save or print maps and query results, and filter or view information by space, time, and/or other tags. Additionally, data are described with metadata that meet Federal Geographic Data Committee standards. Recent advances include the addition of more than 2000 new research sites, the addition of a query builder user interface allowing rich and complex queries, and provision of differential global position system (dGPS) and high-resolution aerial imagery support to visiting scientists. Recent field surveys include over 80 miles of coastline to document rates of erosion and the collection of high-resolution sonar data for bathymetric mapping of Elson Lagoon and near shore region of the Chukchi Sea. A network of five climate stations has been deployed across the peninsula to serve as a wireless net for the research community and to deliver near real time climatic data to the user community. Local GIS personal have also been trained to better make use of scientific data for local decision making. Links to Barrow area datasets are housed at national data archives and substantial upgrades have been made to the BAID website and web mapping applications to include the public release of a new multi-temporal Imagery Viewer that allow users to interact with and compare imagery of the Barrow area from 1949 to present.
Examining the Impact of Culture and Human Elements on OLAP Tools Usefulness
ERIC Educational Resources Information Center
Sharoupim, Magdy S.
2010-01-01
The purpose of the present study was to examine the impact of culture and human-related elements on the On-line Analytical Processing (OLAP) usability in generating decision-making information. The use of OLAP technology has evolved rapidly and gained momentum, mainly due to the ability of OLAP tools to examine and query large amounts of data sets…
FAWKES Information Management for Space Situational Awareness
NASA Astrophysics Data System (ADS)
Spetka, S.; Ramseyer, G.; Tucker, S.
2010-09-01
Current space situational awareness assets can be fully utilized by managing their inputs and outputs in real time. Ideally, sensors are tasked to perform specific functions to maximize their effectiveness. Many sensors are capable of collecting more data than is needed for a particular purpose, leading to the potential to enhance a sensor’s utilization by allowing it to be re-tasked in real time when it is determined that sufficient data has been acquired to meet the first task’s requirements. In addition, understanding a situation involving fast-traveling objects in space may require inputs from more than one sensor, leading to a need for information sharing in real time. Observations that are not processed in real time may be archived to support forensic analysis for accidents and for long-term studies. Space Situational Awareness (SSA) requires an extremely robust distributed software platform to appropriately manage the collection and distribution for both real-time decision-making as well as for analysis. FAWKES is being developed as a Joint Space Operations Center (JSPOC) Mission System (JMS) compliant implementation of the AFRL Phoenix information management architecture. It implements a pub/sub/archive/query (PSAQ) approach to communications designed for high performance applications. FAWKES provides an easy to use, reliable interface for structuring parallel processing, and is particularly well suited to the requirements of SSA. In addition to supporting point-to-point communications, it offers an elegant and robust implementation of collective communications, to scatter, gather and reduce values. A query capability is also supported that enhances reliability. Archived messages can be queried to re-create a computation or to selectively retrieve previous publications. PSAQ processes express their role in a computation by subscribing to their inputs and by publishing their results. Sensors on the edge can subscribe to inputs by appropriately authorized users, allowing dynamic tasking capabilities. Previously, the publication of sensor data collected by mobile systems was demonstrated. Thumbnails of infrared imagery that were imaged in real time by an aircraft [1] were published over a grid. This airborne system subscribed to requests for and then published the requested detailed images. In another experiment a system employing video subscriptions [2] drove the analysis of live video streams, resulting in a published stream of processed video output. We are currently implementing an SSA system that uses FAWKES to deliver imagery from telescopes through a pipeline of processing steps that are performed on high performance computers. PSAQ facilitates the decomposition of a problem into components that can be distributed across processing assets from the smallest sensors in space to the largest high performance computing (HPC) centers, as well as the integration and distribution of the results, all in real time. FAWKES supports the real-time latency requirements demanded by all of these applications. It also enhances reliability by easily supporting redundant computation. This study shows how FAWKES/PSAQ is utilized in SSA applications, and presents performance results for latency and throughput that meet these needs.
TBIdoc: 3D content-based CT image retrieval system for traumatic brain injury
NASA Astrophysics Data System (ADS)
Li, Shimiao; Gong, Tianxia; Wang, Jie; Liu, Ruizhe; Tan, Chew Lim; Leong, Tze Yun; Pang, Boon Chuan; Lim, C. C. Tchoyoson; Lee, Cheng Kiang; Tian, Qi; Zhang, Zhuo
2010-03-01
Traumatic brain injury (TBI) is a major cause of death and disability. Computed Tomography (CT) scan is widely used in the diagnosis of TBI. Nowadays, large amount of TBI CT data is stacked in the hospital radiology department. Such data and the associated patient information contain valuable information for clinical diagnosis and outcome prediction. However, current hospital database system does not provide an efficient and intuitive tool for doctors to search out cases relevant to the current study case. In this paper, we present the TBIdoc system: a content-based image retrieval (CBIR) system which works on the TBI CT images. In this web-based system, user can query by uploading CT image slices from one study, retrieval result is a list of TBI cases ranked according to their 3D visual similarity to the query case. Specifically, cases of TBI CT images often present diffuse or focal lesions. In TBIdoc system, these pathological image features are represented as bin-based binary feature vectors. We use the Jaccard-Needham measure as the similarity measurement. Based on these, we propose a 3D similarity measure for computing the similarity score between two series of CT slices. nDCG is used to evaluate the system performance, which shows the system produces satisfactory retrieval results. The system is expected to improve the current hospital data management in TBI and to give better support for the clinical decision-making process. It may also contribute to the computer-aided education in TBI.
Forecasting influenza outbreak dynamics in Melbourne from Internet search query surveillance data.
Moss, Robert; Zarebski, Alexander; Dawson, Peter; McCaw, James M
2016-07-01
Accurate forecasting of seasonal influenza epidemics is of great concern to healthcare providers in temperate climates, as these epidemics vary substantially in their size, timing and duration from year to year, making it a challenge to deliver timely and proportionate responses. Previous studies have shown that Bayesian estimation techniques can accurately predict when an influenza epidemic will peak many weeks in advance, using existing surveillance data, but these methods must be tailored both to the target population and to the surveillance system. Our aim was to evaluate whether forecasts of similar accuracy could be obtained for metropolitan Melbourne (Australia). We used the bootstrap particle filter and a mechanistic infection model to generate epidemic forecasts for metropolitan Melbourne (Australia) from weekly Internet search query surveillance data reported by Google Flu Trends for 2006-14. Optimal observation models were selected from hundreds of candidates using a novel approach that treats forecasts akin to receiver operating characteristic (ROC) curves. We show that the timing of the epidemic peak can be accurately predicted 4-6 weeks in advance, but that the magnitude of the epidemic peak and the overall burden are much harder to predict. We then discuss how the infection and observation models and the filtering process may be refined to improve forecast robustness, thereby improving the utility of these methods for healthcare decision support. © 2016 The Authors. Influenza and Other Respiratory Viruses Published by John Wiley & Sons Ltd.
Rondon Garcia, Luis Miguel; Aguirre Arizala, Blanca Aranzazu; Garcia Garcia, Francisco Jose; Gallego, Carmen Castillo
2017-01-01
Bacground: This article tackles social support as a meta-variable that is reinforced by a set of social variables, which correlate and act as predictors of social welfare and life quality of the older person. The objective of the study is to know how social support, networks and social contacts can influence the health of the elderly person, especially if these are interrelated factors. The population studied are individuals from both sexes living in Toledo (Spanish people) and who were 65 years of age or over. Several scales were applied to assess the frequency of and the degree of satisfaction with perceived social support received from different sources in relation to social support. The co relational analysis showed significant positive associations between scores and measures of and social support, social relations, contact and social networks. We conclude that the support in general is very good, over 90% of people from the sample have someone who would help if needed. Social and health factors are interrelated with social support. Social contact can also be considered as a life quality estimate. He progressive loss of contact over the years is a social factor that affects the quality of life. The meta-analysis we find that social support and the emotional factor, along with social interactions, have powerful effects on preventing morbidity and mortality, which are important social indicators. We conclude that social support based on positive social interactions provides an optimal state of health in the older person. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Information engineering for molecular diagnostics.
Sorace, J. M.; Ritondo, M.; Canfield, K.
1994-01-01
Clinical laboratories are beginning to apply the recent advances in molecular biology to the testing of patient samples. The emerging field of Molecular Diagnostics will require a new Molecular Diagnostics Laboratory Information System which handles the data types, samples and test methods found in this field. The system must be very flexible in regards to supporting ad-hoc queries. The requirements which are shaping the developments in this field are reviewed and a data model developed. Several queries which demonstrate the data models ability to support the information needs of this area have been developed and run. These results demonstrate the ability of the purposed data model to meet the current and projected needs of this rapidly expanding field. PMID:7949937
EasyKSORD: A Platform of Keyword Search Over Relational Databases
NASA Astrophysics Data System (ADS)
Peng, Zhaohui; Li, Jing; Wang, Shan
Keyword Search Over Relational Databases (KSORD) enables casual users to use keyword queries (a set of keywords) to search relational databases just like searching the Web, without any knowledge of the database schema or any need of writing SQL queries. Based on our previous work, we design and implement a novel KSORD platform named EasyKSORD for users and system administrators to use and manage different KSORD systems in a novel and simple manner. EasyKSORD supports advanced queries, efficient data-graph-based search engines, multiform result presentations, and system logging and analysis. Through EasyKSORD, users can search relational databases easily and read search results conveniently, and system administrators can easily monitor and analyze the operations of KSORD and manage KSORD systems much better.
An intuitive graphical webserver for multiple-choice protein sequence search.
Banky, Daniel; Szalkai, Balazs; Grolmusz, Vince
2014-04-10
Every day tens of thousands of sequence searches and sequence alignment queries are submitted to webservers. The capitalized word "BLAST" becomes a verb, describing the act of performing sequence search and alignment. However, if one needs to search for sequences that contain, for example, two hydrophobic and three polar residues at five given positions, the query formation on the most frequently used webservers will be difficult. Some servers support the formation of queries with regular expressions, but most of the users are unfamiliar with their syntax. Here we present an intuitive, easily applicable webserver, the Protein Sequence Analysis server, that allows the formation of multiple choice queries by simply drawing the residues to their positions; if more than one residue are drawn to the same position, then they will be nicely stacked on the user interface, indicating the multiple choice at the given position. This computer-game-like interface is natural and intuitive, and the coloring of the residues makes possible to form queries requiring not just certain amino acids in the given positions, but also small nonpolar, negatively charged, hydrophobic, positively charged, or polar ones. The webserver is available at http://psa.pitgroup.org. Copyright © 2014 Elsevier B.V. All rights reserved.
Ad-Hoc Queries over Document Collections - A Case Study
NASA Astrophysics Data System (ADS)
Löser, Alexander; Lutter, Steffen; Düssel, Patrick; Markl, Volker
We discuss the novel problem of supporting analytical business intelligence queries over web-based textual content, e.g., BI-style reports based on 100.000's of documents from an ad-hoc web search result. Neither conventional search engines nor conventional Business Intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. "Google Squared" or our system GOOLAP.info, are examples of these kinds of systems. They execute information extraction methods over one or several document collections at query time and integrate extracted records into a common view or tabular structure. Frequent extraction and object resolution failures cause incomplete records which could not be joined into a record answering the query. Our focus is the identification of join-reordering heuristics maximizing the size of complete records answering a structured query. With respect to given costs for document extraction we propose two novel join-operations: The multi-way CJ-operator joins records from multiple relationships extracted from a single document. The two-way join-operator DJ ensures data density by removing incomplete records from results. In a preliminary case study we observe that our join-reordering heuristics positively impact result size, record density and lower execution costs.
Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun
2017-01-04
Linked Data (LD) aims to achieve interconnected data by representing entities using Unified Resource Identifiers (URIs), and sharing information using Resource Description Frameworks (RDFs) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Ontology-Driven Provenance Management in eScience: An Application in Parasite Research
NASA Astrophysics Data System (ADS)
Sahoo, Satya S.; Weatherly, D. Brent; Mutharaju, Raghava; Anantharam, Pramod; Sheth, Amit; Tarleton, Rick L.
Provenance, from the French word "provenir", describes the lineage or history of a data entity. Provenance is critical information in scientific applications to verify experiment process, validate data quality and associate trust values with scientific results. Current industrial scale eScience projects require an end-to-end provenance management infrastructure. This infrastructure needs to be underpinned by formal semantics to enable analysis of large scale provenance information by software applications. Further, effective analysis of provenance information requires well-defined query mechanisms to support complex queries over large datasets. This paper introduces an ontology-driven provenance management infrastructure for biology experiment data, as part of the Semantic Problem Solving Environment (SPSE) for Trypanosoma cruzi (T.cruzi). This provenance infrastructure, called T.cruzi Provenance Management System (PMS), is underpinned by (a) a domain-specific provenance ontology called Parasite Experiment ontology, (b) specialized query operators for provenance analysis, and (c) a provenance query engine. The query engine uses a novel optimization technique based on materialized views called materialized provenance views (MPV) to scale with increasing data size and query complexity. This comprehensive ontology-driven provenance infrastructure not only allows effective tracking and management of ongoing experiments in the Tarleton Research Group at the Center for Tropical and Emerging Global Diseases (CTEGD), but also enables researchers to retrieve the complete provenance information of scientific results for publication in literature.
Heterogeneous database integration in biomedicine.
Sujansky, W
2001-08-01
The rapid expansion of biomedical knowledge, reduction in computing costs, and spread of internet access have created an ocean of electronic data. The decentralized nature of our scientific community and healthcare system, however, has resulted in a patchwork of diverse, or heterogeneous, database implementations, making access to and aggregation of data across databases very difficult. The database heterogeneity problem applies equally to clinical data describing individual patients and biological data characterizing our genome. Specifically, databases are highly heterogeneous with respect to the data models they employ, the data schemas they specify, the query languages they support, and the terminologies they recognize. Heterogeneous database systems attempt to unify disparate databases by providing uniform conceptual schemas that resolve representational heterogeneities, and by providing querying capabilities that aggregate and integrate distributed data. Research in this area has applied a variety of database and knowledge-based techniques, including semantic data modeling, ontology definition, query translation, query optimization, and terminology mapping. Existing systems have addressed heterogeneous database integration in the realms of molecular biology, hospital information systems, and application portability.
An interactive system for computer-aided diagnosis of breast masses.
Wang, Xingwei; Li, Lihua; Liu, Wei; Xu, Weidong; Lederman, Dror; Zheng, Bin
2012-10-01
Although mammography is the only clinically accepted imaging modality for screening the general population to detect breast cancer, interpreting mammograms is difficult with lower sensitivity and specificity. To provide radiologists "a visual aid" in interpreting mammograms, we developed and tested an interactive system for computer-aided detection and diagnosis (CAD) of mass-like cancers. Using this system, an observer can view CAD-cued mass regions depicted on one image and then query any suspicious regions (either cued or not cued by CAD). CAD scheme automatically segments the suspicious region or accepts manually defined region and computes a set of image features. Using content-based image retrieval (CBIR) algorithm, CAD searches for a set of reference images depicting "abnormalities" similar to the queried region. Based on image retrieval results and a decision algorithm, a classification score is assigned to the queried region. In this study, a reference database with 1,800 malignant mass regions and 1,800 benign and CAD-generated false-positive regions was used. A modified CBIR algorithm with a new function of stretching the attributes in the multi-dimensional space and decision scheme was optimized using a genetic algorithm. Using a leave-one-out testing method to classify suspicious mass regions, we compared the classification performance using two CBIR algorithms with either equally weighted or optimally stretched attributes. Using the modified CBIR algorithm, the area under receiver operating characteristic curve was significantly increased from 0.865 ± 0.006 to 0.897 ± 0.005 (p < 0.001). This study demonstrated the feasibility of developing an interactive CAD system with a large reference database and achieving improved performance.
An infrastructure for ontology-based information systems in biomedicine: RICORDO case study.
Wimalaratne, Sarala M; Grenon, Pierre; Hoehndorf, Robert; Gkoutos, Georgios V; de Bono, Bernard
2012-02-01
The article presents an infrastructure for supporting the semantic interoperability of biomedical resources based on the management (storing and inference-based querying) of their ontology-based annotations. This infrastructure consists of: (i) a repository to store and query ontology-based annotations; (ii) a knowledge base server with an inference engine to support the storage of and reasoning over ontologies used in the annotation of resources; (iii) a set of applications and services allowing interaction with the integrated repository and knowledge base. The infrastructure is being prototyped and developed and evaluated by the RICORDO project in support of the knowledge management of biomedical resources, including physiology and pharmacology models and associated clinical data. The RICORDO toolkit and its source code are freely available from http://ricordo.eu/relevant-resources. sarala@ebi.ac.uk.
Agile Datacube Analytics (not just) for the Earth Sciences
NASA Astrophysics Data System (ADS)
Misev, Dimitar; Merticariu, Vlad; Baumann, Peter
2017-04-01
Metadata are considered small, smart, and queryable; data, on the other hand, are known as big, clumsy, hard to analyze. Consequently, gridded data - such as images, image timeseries, and climate datacubes - are managed separately from the metadata, and with different, restricted retrieval capabilities. One reason for this silo approach is that databases, while good at tables, XML hierarchies, RDF graphs, etc., traditionally do not support multi-dimensional arrays well. This gap is being closed by Array Databases which extend the SQL paradigm of "any query, anytime" to NoSQL arrays. They introduce semantically rich modelling combined with declarative, high-level query languages on n-D arrays. On Server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. This way, they offer new vistas in flexibility, scalability, performance, and data integration. In this respect, the forthcoming ISO SQL extension MDA ("Multi-dimensional Arrays") will be a game changer in Big Data Analytics. We introduce concepts and opportunities through the example of rasdaman ("raster data manager") which in fact has pioneered the field of Array Databases and forms the blueprint for ISO SQL/MDA and further Big Data standards, such as OGC WCPS for querying spatio-temporal Earth datacubes. With operational installations exceeding 140 TB queries have been split across more than one thousand cloud nodes, using CPUs as well as GPUs. Installations can easily be mashed up securely, enabling large-scale location-transparent query processing in federations. Federation queries have been demonstrated live at EGU 2016 spanning Europe and Australia in the context of the intercontinental EarthServer initiative, visualized through NASA WorldWind.
Agile Datacube Analytics (not just) for the Earth Sciences
NASA Astrophysics Data System (ADS)
Baumann, P.
2016-12-01
Metadata are considered small, smart, and queryable; data, on the other hand, are known as big, clumsy, hard to analyze. Consequently, gridded data - such as images, image timeseries, and climate datacubes - are managed separately from the metadata, and with different, restricted retrieval capabilities. One reason for this silo approach is that databases, while good at tables, XML hierarchies, RDF graphs, etc., traditionally do not support multi-dimensional arrays well.This gap is being closed by Array Databases which extend the SQL paradigm of "any query, anytime" to NoSQL arrays. They introduce semantically rich modelling combined with declarative, high-level query languages on n-D arrays. On Server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. This way, they offer new vistas in flexibility, scalability, performance, and data integration. In this respect, the forthcoming ISO SQL extension MDA ("Multi-dimensional Arrays") will be a game changer in Big Data Analytics.We introduce concepts and opportunities through the example of rasdaman ("raster data manager") which in fact has pioneered the field of Array Databases and forms the blueprint for ISO SQL/MDA and further Big Data standards, such as OGC WCPS for querying spatio-temporal Earth datacubes. With operational installations exceeding 140 TB queries have been split across more than one thousand cloud nodes, using CPUs as well as GPUs. Installations can easily be mashed up securely, enabling large-scale location-transparent query processing in federations. Federation queries have been demonstrated live at EGU 2016 spanning Europe and Australia in the context of the intercontinental EarthServer initiative, visualized through NASA WorldWind.
Supporting Dynamic Quantization for High-Dimensional Data Analytics.
Guzun, Gheorghi; Canahuate, Guadalupe
2017-05-01
Similarity searches are at the heart of exploratory data analysis tasks. Distance metrics are typically used to characterize the similarity between data objects represented as feature vectors. However, when the dimensionality of the data increases and the number of features is large, traditional distance metrics fail to distinguish between the closest and furthest data points. Localized distance functions have been proposed as an alternative to traditional distance metrics. These functions only consider dimensions close to query to compute the distance/similarity. Furthermore, in order to enable interactive explorations of high-dimensional data, indexing support for ad-hoc queries is needed. In this work we set up to investigate whether bit-sliced indices can be used for exploratory analytics such as similarity searches and data clustering for high-dimensional big-data. We also propose a novel dynamic quantization called Query dependent Equi-Depth (QED) quantization and show its effectiveness on characterizing high-dimensional similarity. When applying QED we observe improvements in kNN classification accuracy over traditional distance functions. Gheorghi Guzun and Guadalupe Canahuate. 2017. Supporting Dynamic Quantization for High-Dimensional Data Analytics. In Proceedings of Ex-ploreDB'17, Chicago, IL, USA, May 14-19, 2017, 6 pages. https://doi.org/http://dx.doi.org/10.1145/3077331.3077336.
Child pornography in peer-to-peer networks.
Steel, Chad M S
2009-08-01
The presence of child pornography in peer-to-peer networks is not disputed, but there has been little effort done to quantify and analyze the distribution and nature of that content to-date. By performing an analysis of queries and query hits on the largest peer-to-peer network, we are able to both quantify and describe the nature of querying by child pornographers as well as the content they are sharing. Child pornography related content was identified and analyzed in 235,513 user queries and 194,444 query hits. The research confirmed a large amount of peer-to-peer traffic is dedicated to child pornography, but supply and demand must be separated for a better understanding. The most prevalent query and the top two most prevalent filenames returned as query hits were child pornography related. However, it would be inaccurate to state child pornography dominates peer-to-peer as 1% of all queries were related to child pornography and 1.45% of all query hits (unique filenames) were related to child pornography, consistent with a smaller study (Hughes et al., 2008). In addition to the above, research indicates that the median age searched for was 13 years old, and the majority of queries were gender-neutral, but of those with gender-related terms, 79% were female-oriented. Distribution-wise, the vast majority of content-specific searches are for movies at 99%, though images are still the most prevalent in availability. There is no shortage of child pornography supply and demand on peer-to-peer networks and by analyzing how consumers seek and distributors advertise content we can better understand their motivations. Understanding the behavior of child pornographers and how they search for content when contrasted with those sharing content provides a basis for finding and combating that behavior. For law enforcement, knowing the specific terms used allows more timely and accurate forensics and better identification of those seeking and distributing child pornography. For Internet researchers, better filtering and monitoring is possible. For mental health professionals, understanding the preferences and behaviors of those searching supports more effective treatment.
Evaluation of relational and NoSQL database architectures to manage genomic annotations.
Schulz, Wade L; Nelson, Brent G; Felker, Donn K; Durant, Thomas J S; Torres, Richard
2016-12-01
While the adoption of next generation sequencing has rapidly expanded, the informatics infrastructure used to manage the data generated by this technology has not kept pace. Historically, relational databases have provided much of the framework for data storage and retrieval. Newer technologies based on NoSQL architectures may provide significant advantages in storage and query efficiency, thereby reducing the cost of data management. But their relative advantage when applied to biomedical data sets, such as genetic data, has not been characterized. To this end, we compared the storage, indexing, and query efficiency of a common relational database (MySQL), a document-oriented NoSQL database (MongoDB), and a relational database with NoSQL support (PostgreSQL). When used to store genomic annotations from the dbSNP database, we found the NoSQL architectures to outperform traditional, relational models for speed of data storage, indexing, and query retrieval in nearly every operation. These findings strongly support the use of novel database technologies to improve the efficiency of data management within the biological sciences. Copyright © 2016 Elsevier Inc. All rights reserved.
The Database Query Support Processor (QSP)
NASA Technical Reports Server (NTRS)
1993-01-01
The number and diversity of databases available to users continues to increase dramatically. Currently, the trend is towards decentralized, client server architectures that (on the surface) are less expensive to acquire, operate, and maintain than information architectures based on centralized, monolithic mainframes. The database query support processor (QSP) effort evaluates the performance of a network level, heterogeneous database access capability. Air Force Material Command's Rome Laboratory has developed an approach, based on ANSI standard X3.138 - 1988, 'The Information Resource Dictionary System (IRDS)' to seamless access to heterogeneous databases based on extensions to data dictionary technology. To successfully query a decentralized information system, users must know what data are available from which source, or have the knowledge and system privileges necessary to find out this information. Privacy and security considerations prohibit free and open access to every information system in every network. Even in completely open systems, time required to locate relevant data (in systems of any appreciable size) would be better spent analyzing the data, assuming the original question was not forgotten. Extensions to data dictionary technology have the potential to more fully automate the search and retrieval for relevant data in a decentralized environment. Substantial amounts of time and money could be saved by not having to teach users what data resides in which systems and how to access each of those systems. Information describing data and how to get it could be removed from the application and placed in a dedicated repository where it belongs. The result simplified applications that are less brittle and less expensive to build and maintain. Software technology providing the required functionality is off the shelf. The key difficulty is in defining the metadata required to support the process. The database query support processor effort will provide quantitative data on the amount of effort required to implement an extended data dictionary at the network level, add new systems, adapt to changing user needs, and provide sound estimates on operations and maintenance costs and savings.
A unified framework for managing provenance information in translational research
2011-01-01
Background A critical aspect of the NIH Translational Research roadmap, which seeks to accelerate the delivery of "bench-side" discoveries to patient's "bedside," is the management of the provenance metadata that keeps track of the origin and history of data resources as they traverse the path from the bench to the bedside and back. A comprehensive provenance framework is essential for researchers to verify the quality of data, reproduce scientific results published in peer-reviewed literature, validate scientific process, and associate trust value with data and results. Traditional approaches to provenance management have focused on only partial sections of the translational research life cycle and they do not incorporate "domain semantics", which is essential to support domain-specific querying and analysis by scientists. Results We identify a common set of challenges in managing provenance information across the pre-publication and post-publication phases of data in the translational research lifecycle. We define the semantic provenance framework (SPF), underpinned by the Provenir upper-level provenance ontology, to address these challenges in the four stages of provenance metadata: (a) Provenance collection - during data generation (b) Provenance representation - to support interoperability, reasoning, and incorporate domain semantics (c) Provenance storage and propagation - to allow efficient storage and seamless propagation of provenance as the data is transferred across applications (d) Provenance query - to support queries with increasing complexity over large data size and also support knowledge discovery applications We apply the SPF to two exemplar translational research projects, namely the Semantic Problem Solving Environment for Trypanosoma cruzi (T.cruzi SPSE) and the Biomedical Knowledge Repository (BKR) project, to demonstrate its effectiveness. Conclusions The SPF provides a unified framework to effectively manage provenance of translational research data during pre and post-publication phases. This framework is underpinned by an upper-level provenance ontology called Provenir that is extended to create domain-specific provenance ontologies to facilitate provenance interoperability, seamless propagation of provenance, automated querying, and analysis. PMID:22126369
NASA Astrophysics Data System (ADS)
McWhirter, J.; Boler, F. M.; Bock, Y.; Jamason, P.; Squibb, M. B.; Noll, C. E.; Blewitt, G.; Kreemer, C. W.
2010-12-01
Three geodesy Archive Centers, Scripps Orbit and Permanent Array Center (SOPAC), NASA's Crustal Dynamics Data Information System (CDDIS) and UNAVCO are engaged in a joint effort to define and develop a common Web Service Application Programming Interface (API) for accessing geodetic data holdings. This effort is funded by the NASA ROSES ACCESS Program to modernize the original GPS Seamless Archive Centers (GSAC) technology which was developed in the 1990s. A new web service interface, the GSAC-WS, is being developed to provide uniform and expanded mechanisms through which users can access our data repositories. In total, our respective archives hold tens of millions of files and contain a rich collection of site/station metadata. Though we serve similar user communities, we currently provide a range of different access methods, query services and metadata formats. This leads to a lack of consistency in the userís experience and a duplication of engineering efforts. The GSAC-WS API and its reference implementation in an underlying Java-based GSAC Service Layer (GSL) supports metadata and data queries into site/station oriented data archives. The general nature of this API makes it applicable to a broad range of data systems. The overall goals of this project include providing consistent and rich query interfaces for end users and client programs, the development of enabling technology to facilitate third party repositories in developing these web service capabilities and to enable the ability to perform data queries across a collection of federated GSAC-WS enabled repositories. A fundamental challenge faced in this project is to provide a common suite of query services across a heterogeneous collection of data yet enabling each repository to expose their specific metadata holdings. To address this challenge we are developing a "capabilities" based service where a repository can describe its specific query and metadata capabilities. Furthermore, the architecture of the GSL is based on a model-view paradigm that decouples the underlying data model semantics from particular representations of the data model. This will allow for the GSAC-WS enabled repositories to evolve their service offerings to incorporate new metadata definition formats (e.g., ISO-19115, FGDC, JSON, etc.) and new techniques for accessing their holdings. Building on the core GSAC-WS implementations the project is also developing a federated/distributed query service. This service will seamlessly integrate with the GSAC Service Layer and will support data and metadata queries across a collection of federated GSAC repositories.
Kauppinen, Tomi; Keßler, Carsten; Fritz, Fleur
2014-01-01
Background Healthcare organizations around the world are challenged by pressures to reduce cost, improve coordination and outcome, and provide more with less. This requires effective planning and evidence-based practice by generating important information from available data. Thus, flexible and user-friendly ways to represent, query, and visualize health data becomes increasingly important. International organizations such as the World Health Organization (WHO) regularly publish vital data on priority health topics that can be utilized for public health policy and health service development. However, the data in most portals is displayed in either Excel or PDF formats, which makes information discovery and reuse difficult. Linked Open Data (LOD)—a new Semantic Web set of best practice of standards to publish and link heterogeneous data—can be applied to the representation and management of public level health data to alleviate such challenges. However, the technologies behind building LOD systems and their effectiveness for health data are yet to be assessed. Objective The objective of this study is to evaluate whether Linked Data technologies are potential options for health information representation, visualization, and retrieval systems development and to identify the available tools and methodologies to build Linked Data-based health information systems. Methods We used the Resource Description Framework (RDF) for data representation, Fuseki triple store for data storage, and Sgvizler for information visualization. Additionally, we integrated SPARQL query interface for interacting with the data. We primarily use the WHO health observatory dataset to test the system. All the data were represented using RDF and interlinked with other related datasets on the Web of Data using Silk—a link discovery framework for Web of Data. A preliminary usability assessment was conducted following the System Usability Scale (SUS) method. Results We developed an LOD-based health information representation, querying, and visualization system by using Linked Data tools. We imported more than 20,000 HIV-related data elements on mortality, prevalence, incidence, and related variables, which are freely available from the WHO global health observatory database. Additionally, we automatically linked 5312 data elements from DBpedia, Bio2RDF, and LinkedCT using the Silk framework. The system users can retrieve and visualize health information according to their interests. For users who are not familiar with SPARQL queries, we integrated a Linked Data search engine interface to search and browse the data. We used the system to represent and store the data, facilitating flexible queries and different kinds of visualizations. The preliminary user evaluation score by public health data managers and users was 82 on the SUS usability measurement scale. The need to write queries in the interface was the main reported difficulty of LOD-based systems to the end user. Conclusions The system introduced in this article shows that current LOD technologies are a promising alternative to represent heterogeneous health data in a flexible and reusable manner so that they can serve intelligent queries, and ultimately support decision-making. However, the development of advanced text-based search engines is necessary to increase its usability especially for nontechnical users. Further research with large datasets is recommended in the future to unfold the potential of Linked Data and Semantic Web for future health information systems development. PMID:25601195
Tilahun, Binyam; Kauppinen, Tomi; Keßler, Carsten; Fritz, Fleur
2014-10-25
Healthcare organizations around the world are challenged by pressures to reduce cost, improve coordination and outcome, and provide more with less. This requires effective planning and evidence-based practice by generating important information from available data. Thus, flexible and user-friendly ways to represent, query, and visualize health data becomes increasingly important. International organizations such as the World Health Organization (WHO) regularly publish vital data on priority health topics that can be utilized for public health policy and health service development. However, the data in most portals is displayed in either Excel or PDF formats, which makes information discovery and reuse difficult. Linked Open Data (LOD)-a new Semantic Web set of best practice of standards to publish and link heterogeneous data-can be applied to the representation and management of public level health data to alleviate such challenges. However, the technologies behind building LOD systems and their effectiveness for health data are yet to be assessed. The objective of this study is to evaluate whether Linked Data technologies are potential options for health information representation, visualization, and retrieval systems development and to identify the available tools and methodologies to build Linked Data-based health information systems. We used the Resource Description Framework (RDF) for data representation, Fuseki triple store for data storage, and Sgvizler for information visualization. Additionally, we integrated SPARQL query interface for interacting with the data. We primarily use the WHO health observatory dataset to test the system. All the data were represented using RDF and interlinked with other related datasets on the Web of Data using Silk-a link discovery framework for Web of Data. A preliminary usability assessment was conducted following the System Usability Scale (SUS) method. We developed an LOD-based health information representation, querying, and visualization system by using Linked Data tools. We imported more than 20,000 HIV-related data elements on mortality, prevalence, incidence, and related variables, which are freely available from the WHO global health observatory database. Additionally, we automatically linked 5312 data elements from DBpedia, Bio2RDF, and LinkedCT using the Silk framework. The system users can retrieve and visualize health information according to their interests. For users who are not familiar with SPARQL queries, we integrated a Linked Data search engine interface to search and browse the data. We used the system to represent and store the data, facilitating flexible queries and different kinds of visualizations. The preliminary user evaluation score by public health data managers and users was 82 on the SUS usability measurement scale. The need to write queries in the interface was the main reported difficulty of LOD-based systems to the end user. The system introduced in this article shows that current LOD technologies are a promising alternative to represent heterogeneous health data in a flexible and reusable manner so that they can serve intelligent queries, and ultimately support decision-making. However, the development of advanced text-based search engines is necessary to increase its usability especially for nontechnical users. Further research with large datasets is recommended in the future to unfold the potential of Linked Data and Semantic Web for future health information systems development.
Koutkias, Vassilis; Kilintzis, Vassilis; Stalidis, George; Lazou, Katerina; Niès, Julie; Durand-Texte, Ludovic; McNair, Peter; Beuscart, Régis; Maglaveras, Nicos
2012-06-01
The primary aim of this work was the development of a uniform, contextualized and sustainable knowledge-based framework to support adverse drug event (ADE) prevention via Clinical Decision Support Systems (CDSSs). In this regard, the employed methodology involved first the systematic analysis and formalization of the knowledge sources elaborated in the scope of this work, through which an application-specific knowledge model has been defined. The entire framework architecture has been then specified and implemented by adopting Computer Interpretable Guidelines (CIGs) as the knowledge engineering formalism for its construction. The framework integrates diverse and dynamic knowledge sources in the form of rule-based ADE signals, all under a uniform Knowledge Base (KB) structure, according to the defined knowledge model. Equally important, it employs the means to contextualize the encapsulated knowledge, in order to provide appropriate support considering the specific local environment (hospital, medical department, language, etc.), as well as the mechanisms for knowledge querying, inference, sharing, and management. In this paper, we present thoroughly the establishment of the proposed knowledge framework by presenting the employed methodology and the results obtained as regards implementation, performance and validation aspects that highlight its applicability and virtue in medication safety. Copyright © 2012 Elsevier Inc. All rights reserved.
Male Spouses of Women Physicians: Communication, Compromise, and Carving Out Time.
Isaac, Carol; Petrashek, Kara; Steiner, Megan; Manwell, Linda Baier; Byars-Winston, Angela; Carnes, Molly
2013-01-01
As the numbers of female physicians continue to grow, fewer medical marriages are comprised of the traditional dyad of male physician and stay-at-home wife. The "two-career family" is an increasingly frequent state for both male and female physicians' families, and dual-doctor marriages are on the rise. This qualitative study explored the contemporary medical marriage from the perspective of male spouses of female physicians. In 2010, we conducted semi-structured, in-depth interviews with nine spouses of internal medicine resident and faculty physicians. Interviewers queried work-home balance, career choices, and support networks. We used an interpretive, inductive, iterative approach to thematically analyze interview transcripts and develop broad, consensus-derived themes. A conceptual framework based on three major themes emerged: "A time for us? Really?", "Supporting and protecting her, sometimes at my expense,'" and "Hers is a career, mine is a job." This framework described the inflexibility of physicians' time and its impact on spousal time, career development, and choices. Having a set time for synchronizing schedules, frequent verbal support, and shared decision-making were seen as important by the husbands of female, full-time physicians. This exploratory study examined the contemporary medical marriage from the male spouse's perspective and highlights specific strategies for success.
A Multimedia Database Management System Supporting Contents Search in Media Data
1989-03-01
predicates: car (x), manufacturer (x, Horch ), year-built (x, 1922) to designate that the car was manufactured by Horch in the year 1922. The use of the name...manufacturer (imagel, x, " Horch "). year-built (imagel, x, 1922). query(I) :- car (I, Z), manufacturer (I, Z, " Horch "). ?- query(Image). yes. Image = image... Horch ) : imageJ 1, imageJ2, etc. In certain cases a predicate may generate entries in more than one list. For example, if the predicate of action
2011-01-01
Shotgun lipidome profiling relies on direct mass spectrometric analysis of total lipid extracts from cells, tissues or organisms and is a powerful tool to elucidate the molecular composition of lipidomes. We present a novel informatics concept of the molecular fragmentation query language implemented within the LipidXplorer open source software kit that supports accurate quantification of individual species of any ionizable lipid class in shotgun spectra acquired on any mass spectrometry platform. PMID:21247462
Distributed Multi-interface Catalogue for Geospatial Data
NASA Astrophysics Data System (ADS)
Nativi, S.; Bigagli, L.; Mazzetti, P.; Mattia, U.; Boldrini, E.
2007-12-01
Several geosciences communities (e.g. atmospheric science, oceanography, hydrology) have developed tailored data and metadata models and service protocol specifications for enabling online data discovery, inventory, evaluation, access and download. These specifications are conceived either profiling geospatial information standards or extending the well-accepted geosciences data models and protocols in order to capture more semantics. These artifacts have generated a set of related catalog -and inventory services- characterizing different communities, initiatives and projects. In fact, these geospatial data catalogs are discovery and access systems that use metadata as the target for query on geospatial information. The indexed and searchable metadata provide a disciplined vocabulary against which intelligent geospatial search can be performed within or among communities. There exists a clear need to conceive and achieve solutions to implement interoperability among geosciences communities, in the context of the more general geospatial information interoperability framework. Such solutions should provide search and access capabilities across catalogs, inventory lists and their registered resources. Thus, the development of catalog clearinghouse solutions is a near-term challenge in support of fully functional and useful infrastructures for spatial data (e.g. INSPIRE, GMES, NSDI, GEOSS). This implies the implementation of components for query distribution and virtual resource aggregation. These solutions must implement distributed discovery functionalities in an heterogeneous environment, requiring metadata profiles harmonization as well as protocol adaptation and mediation. We present a catalog clearinghouse solution for the interoperability of several well-known cataloguing systems (e.g. OGC CSW, THREDDS catalog and data services). The solution implements consistent resource discovery and evaluation over a dynamic federation of several well-known cataloguing and inventory systems. Prominent features include: 1)Support to distributed queries over a hierarchical data model, supporting incremental queries (i.e. query over collections, to be subsequently refined) and opaque/translucent chaining; 2)Support to several client protocols, through a compound front-end interface module. This allows to accommodate a (growing) number of cataloguing standards, or profiles thereof, including the OGC CSW interface, ebRIM Application Profile (for Core ISO Metadata and other data models), and the ISO Application Profile. The presented catalog clearinghouse supports both the opaque and translucent pattern for service chaining. In fact, the clearinghouse catalog may be configured either to completely hide the underlying federated services or to provide clients with services information. In both cases, the clearinghouse solution presents a higher level interface (i.e. OGC CSW) which harmonizes multiple lower level services (e.g. OGC CSW, WMS and WCS, THREDDS, etc.), and handles all control and interaction with them. In the translucent case, client has the option to directly access the lower level services (e.g. to improve performances). In the GEOSS context, the solution has been experimented both as a stand-alone user application and as a service framework. The first scenario allows a user to download a multi-platform client software and query a federation of cataloguing systems, that he can customize at will. The second scenario support server-side deployment and can be flexibly adapted to several use-cases, such as intranet proxy, catalog broker, etc.
A data model and database for high-resolution pathology analytical image informatics.
Wang, Fusheng; Kong, Jun; Cooper, Lee; Pan, Tony; Kurc, Tahsin; Chen, Wenjin; Sharma, Ashish; Niedermayr, Cristobal; Oh, Tae W; Brat, Daniel; Farris, Alton B; Foran, David J; Saltz, Joel
2011-01-01
The systematic analysis of imaged pathology specimens often results in a vast amount of morphological information at both the cellular and sub-cellular scales. While microscopy scanners and computerized analysis are capable of capturing and analyzing data rapidly, microscopy image data remain underutilized in research and clinical settings. One major obstacle which tends to reduce wider adoption of these new technologies throughout the clinical and scientific communities is the challenge of managing, querying, and integrating the vast amounts of data resulting from the analysis of large digital pathology datasets. This paper presents a data model, which addresses these challenges, and demonstrates its implementation in a relational database system. This paper describes a data model, referred to as Pathology Analytic Imaging Standards (PAIS), and a database implementation, which are designed to support the data management and query requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines on whole-slide images and tissue microarrays (TMAs). (1) Development of a data model capable of efficiently representing and storing virtual slide related image, annotation, markup, and feature information. (2) Development of a database, based on the data model, capable of supporting queries for data retrieval based on analysis and image metadata, queries for comparison of results from different analyses, and spatial queries on segmented regions, features, and classified objects. The work described in this paper is motivated by the challenges associated with characterization of micro-scale features for comparative and correlative analyses involving whole-slides tissue images and TMAs. Technologies for digitizing tissues have advanced significantly in the past decade. Slide scanners are capable of producing high-magnification, high-resolution images from whole slides and TMAs within several minutes. Hence, it is becoming increasingly feasible for basic, clinical, and translational research studies to produce thousands of whole-slide images. Systematic analysis of these large datasets requires efficient data management support for representing and indexing results from hundreds of interrelated analyses generating very large volumes of quantifications such as shape and texture and of classifications of the quantified features. We have designed a data model and a database to address the data management requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines. The data model represents virtual slide related image, annotation, markup and feature information. The database supports a wide range of metadata and spatial queries on images, annotations, markups, and features. We currently have three databases running on a Dell PowerEdge T410 server with CentOS 5.5 Linux operating system. The database server is IBM DB2 Enterprise Edition 9.7.2. The set of databases consists of 1) a TMA database containing image analysis results from 4740 cases of breast cancer, with 641 MB storage size; 2) an algorithm validation database, which stores markups and annotations from two segmentation algorithms and two parameter sets on 18 selected slides, with 66 GB storage size; and 3) an in silico brain tumor study database comprising results from 307 TCGA slides, with 365 GB storage size. The latter two databases also contain human-generated annotations and markups for regions and nuclei. Modeling and managing pathology image analysis results in a database provide immediate benefits on the value and usability of data in a research study. The database provides powerful query capabilities, which are otherwise difficult or cumbersome to support by other approaches such as programming languages. Standardized, semantic annotated data representation and interfaces also make it possible to more efficiently share image data and analysis results.
One Shot Detection with Laplacian Object and Fast Matrix Cosine Similarity.
Biswas, Sujoy Kumar; Milanfar, Peyman
2016-03-01
One shot, generic object detection involves searching for a single query object in a larger target image. Relevant approaches have benefited from features that typically model the local similarity patterns. In this paper, we combine local similarity (encoded by local descriptors) with a global context (i.e., a graph structure) of pairwise affinities among the local descriptors, embedding the query descriptors into a low dimensional but discriminatory subspace. Unlike principal components that preserve global structure of feature space, we actually seek a linear approximation to the Laplacian eigenmap that permits us a locality preserving embedding of high dimensional region descriptors. Our second contribution is an accelerated but exact computation of matrix cosine similarity as the decision rule for detection, obviating the computationally expensive sliding window search. We leverage the power of Fourier transform combined with integral image to achieve superior runtime efficiency that allows us to test multiple hypotheses (for pose estimation) within a reasonably short time. Our approach to one shot detection is training-free, and experiments on the standard data sets confirm the efficacy of our model. Besides, low computation cost of the proposed (codebook-free) object detector facilitates rather straightforward query detection in large data sets including movie videos.
Knowledge Acquisition of Generic Queries for Information Retrieval
Seol, Yoon-Ho; Johnson, Stephen B.; Cimino, James J.
2002-01-01
Several studies have identified clinical questions posed by health care professionals to understand the nature of information needs during clinical practice. To support access to digital information sources, it is necessary to integrate the information needs with a computer system. We have developed a conceptual guidance approach in information retrieval, based on a knowledge base that contains the patterns of information needs. The knowledge base uses a formal representation of clinical questions based on the UMLS knowledge sources, called the Generic Query model. To improve the coverage of the knowledge base, we investigated a method for extracting plausible clinical questions from the medical literature. This poster presents the Generic Query model, shows how it is used to represent the patterns of clinical questions, and describes the framework used to extract knowledge from the medical literature.
What Is Spatio-Temporal Data Warehousing?
NASA Astrophysics Data System (ADS)
Vaisman, Alejandro; Zimányi, Esteban
In the last years, extending OLAP (On-Line Analytical Processing) systems with spatial and temporal features has attracted the attention of the GIS (Geographic Information Systems) and database communities. However, there is no a commonly agreed definition of what is a spatio-temporal data warehouse and what functionality such a data warehouse should support. Further, the solutions proposed in the literature vary considerably in the kind of data that can be represented as well as the kind of queries that can be expressed. In this paper we present a conceptual framework for defining spatio-temporal data warehouses using an extensible data type system. We also define a taxonomy of different classes of queries of increasing expressive power, and show how to express such queries using an extension of the tuple relational calculus with aggregated functions.
Linked data and provenance in biological data webs.
Zhao, Jun; Miles, Alistair; Klyne, Graham; Shotton, David
2009-03-01
The Web is now being used as a platform for publishing and linking life science data. The Web's linking architecture can be exploited to join heterogeneous data from multiple sources. However, as data are frequently being updated in a decentralized environment, provenance information becomes critical to providing reliable and trustworthy services to scientists. This article presents design patterns for representing and querying provenance information relating to mapping links between heterogeneous data from sources in the domain of functional genomics. We illustrate the use of named resource description framework (RDF) graphs at different levels of granularity to make provenance assertions about linked data, and demonstrate that these assertions are sufficient to support requirements including data currency, integrity, evidential support and historical queries.
Understanding USGS user needs and Earth observing data use for decision making
NASA Astrophysics Data System (ADS)
Wu, Z.
2016-12-01
US Geological Survey (USGS) initiated the Requirements, Capabilities and Analysis for Earth Observations (RCA-EO) project in the Land Remote Sensing (LRS) program, collaborating with the National Oceanic and Atmospheric Administration (NOAA) to jointly develop the supporting information infrastructure - The Earth Observation Requirements Evaluation Systems (EORES). RCA-EO enables us to collect information on current data products and projects across the USGS and evaluate the impacts of Earth observation data from all sources, including spaceborne, airborne, and ground-based platforms. EORES allows users to query, filter, and analyze usage and impacts of Earth observation data at different organizational level within the bureau. We engaged over 500 subject matter experts and evaluated more than 1000 different Earth observing data sources and products. RCA-EO provides a comprehensive way to evaluate impacts of Earth observing data on USGS mission areas and programs through the survey of 345 key USGS products and services. We paid special attention to user feedback about Earth observing data to inform decision making on improving user satisfaction. We believe the approach and philosophy of RCA-EO can be applied in much broader scope to derive comprehensive knowledge of Earth observing systems impacts and usage and inform data products development and remote sensing technology innovation.
Learning of Multimodal Representations With Random Walks on the Click Graph.
Wu, Fei; Lu, Xinyan; Song, Jun; Yan, Shuicheng; Zhang, Zhongfei Mark; Rui, Yong; Zhuang, Yueting
2016-02-01
In multimedia information retrieval, most classic approaches tend to represent different modalities of media in the same feature space. With the click data collected from the users' searching behavior, existing approaches take either one-to-one paired data (text-image pairs) or ranking examples (text-query-image and/or image-query-text ranking lists) as training examples, which do not make full use of the click data, particularly the implicit connections among the data objects. In this paper, we treat the click data as a large click graph, in which vertices are images/text queries and edges indicate the clicks between an image and a query. We consider learning a multimodal representation from the perspective of encoding the explicit/implicit relevance relationship between the vertices in the click graph. By minimizing both the truncated random walk loss as well as the distance between the learned representation of vertices and their corresponding deep neural network output, the proposed model which is named multimodal random walk neural network (MRW-NN) can be applied to not only learn robust representation of the existing multimodal data in the click graph, but also deal with the unseen queries and images to support cross-modal retrieval. We evaluate the latent representation learned by MRW-NN on a public large-scale click log data set Clickture and further show that MRW-NN achieves much better cross-modal retrieval performance on the unseen queries/images than the other state-of-the-art methods.
Ayers, John W; Althouse, Benjamin M; Ribisl, Kurt M; Emery, Sherry
2014-05-01
The Internet is revolutionizing tobacco control, but few have harnessed the Web for surveillance. We demonstrate for the first time an approach for analyzing aggregate Internet search queries that captures precise changes in population considerations about tobacco. We compared tobacco-related Google queries originating in the United States during the week of the State Children's Health Insurance Program (SCHIP) 2009 cigarette excise tax increase with a historic baseline. Specific queries were then ranked according to their relative increases while also considering approximations of changes in absolute search volume. Individual queries with the largest relative increases the week of the SCHIP tax were "cigarettes Indian reservations" 640% (95% CI, 472-918), "free cigarettes online" 557% (95% CI, 432-756), and "Indian reservations cigarettes" 542% (95% CI, 414-733), amounting to about 7,500 excess searches. By themes, the largest relative increases were tribal cigarettes 246% (95% CI, 228-265), "free" cigarettes 215% (95% CI, 191-242), and cigarette stores 176% (95% CI, 160-193), accounting for 21,000, 27,000, and 90,000 excess queries. All avoidance queries, including those aforementioned themes, relatively increased 150% (95% CI, 144-155) or 550,000 from their baseline. All cessation queries increased 46% (95% CI, 44-48), or 175,000, around SCHIP; including themes for "cold turkey" 19% (95% CI, 11-27) or 2,600, cessation products 47% (95% CI, 44-50) or 78,000, and dubious cessation approaches (e.g., hypnosis) 40% (95% CI, 33-47) or 2,300. The SCHIP tax motivated specific changes in population considerations. Our strategy can support evaluations that temporally link tobacco control measures with instantaneous population reactions, as well as serve as a springboard for traditional studies, for example, including survey questionnaire design.
A Layered Searchable Encryption Scheme with Functional Components Independent of Encryption Methods
Luo, Guangchun; Qin, Ke
2014-01-01
Searchable encryption technique enables the users to securely store and search their documents over the remote semitrusted server, which is especially suitable for protecting sensitive data in the cloud. However, various settings (based on symmetric or asymmetric encryption) and functionalities (ranked keyword query, range query, phrase query, etc.) are often realized by different methods with different searchable structures that are generally not compatible with each other, which limits the scope of application and hinders the functional extensions. We prove that asymmetric searchable structure could be converted to symmetric structure, and functions could be modeled separately apart from the core searchable structure. Based on this observation, we propose a layered searchable encryption (LSE) scheme, which provides compatibility, flexibility, and security for various settings and functionalities. In this scheme, the outputs of the core searchable component based on either symmetric or asymmetric setting are converted to some uniform mappings, which are then transmitted to loosely coupled functional components to further filter the results. In such a way, all functional components could directly support both symmetric and asymmetric settings. Based on LSE, we propose two representative and novel constructions for ranked keyword query (previously only available in symmetric scheme) and range query (previously only available in asymmetric scheme). PMID:24719565
BioFed: federated query processing over life sciences linked open data.
Hasnain, Ali; Mehmood, Qaiser; Sana E Zainab, Syeda; Saleem, Muhammad; Warren, Claude; Zehra, Durre; Decker, Stefan; Rebholz-Schuhmann, Dietrich
2017-03-15
Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but this still requires solutions to query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality have to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state of the art solution FedX and forms an important benchmark for the life science domain. The efficient cataloguing approach of the federated query processing system 'BioFed', the triple pattern wise source selection and the semantic source normalisation forms the core to our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Last but not least, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., Dugbank, Sider). BioFed is a solution for a single-point-of-access for a large number of SPARQL endpoints providing life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. BioFed fully supports SPARQL 1.1 and gives access to the endpoint's availability based on the EndpointData graph. Our evaluation of BioFed against FedX is based on 20 heterogeneous federated SPARQL queries and shows competitive execution performance in comparison to FedX, which can be attributed to the provision of provenance information for the source selection. Developing and testing federated query engines for life sciences data is still a challenging task. According to our findings, it is advantageous to optimise the source selection. The cataloguing of SPARQL endpoints, including type and property indexing, leads to efficient querying of data resources over the Web of Data. This could even be further improved through the use of ontologies, e.g., for abstract normalisation of query terms.
A systematic review of data mining and machine learning for air pollution epidemiology.
Bellinger, Colin; Mohomed Jabbar, Mohomed Shazan; Zaïane, Osmar; Osornio-Vargas, Alvaro
2017-11-28
Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology. We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We carried out our search process in PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed. Our search queries resulted in 400 research articles. Our fine-grained analysis employed our inclusion/exclusion criteria to reduce the results to 47 articles, which we separate into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution/quality or exposure; and 3) generating hypotheses. Early applications had a preference for artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the APRIORI algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. For potential new directions, we have identified that deep learning and geo-spacial pattern mining are two burgeoning areas of data mining that have good potential for future applications in air pollution epidemiology. We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. This work shows that data mining is increasingly being applied in air pollution epidemiology. The potential to support air pollution epidemiology continues to grow with advancements in data mining related to temporal and geo-spacial mining, and deep learning. This is further supported by new sensors and storage mediums that enable larger, better quality data. This suggests that many more fruitful applications can be expected in the future.
Guide to Using the Teacher Data Use Survey. REL 2017-166
ERIC Educational Resources Information Center
Wayman, Jeffrey C.; Wilkerson, Stephanie B.; Cho, Vincent; Mandinach, Ellen B.; Supovitz, Jonathan A.
2016-01-01
The Teacher Data Use Survey can be used to query teachers, administrators, and instructional support staff about how teachers use data to support instruction, their attitudes toward data, and the supports that help teachers use data. This guide provides step-by-step instructions to help district and school planners implement the survey. The…
GIS Tool for Real-time Decision Making and Analysis of Multidisciplinary Cryosphere Datasets.
NASA Astrophysics Data System (ADS)
Roberts, S. D.; Moore, J. A.
2004-12-01
In support of the Western Arctic Shelf-Basin Interaction Project(SBI) a web-based interactive mapping server was installed on the USCGC Healy's on-board science computer network during its 2004 spring(HLY-04-02) and summer cruises (HLY-04-03) in the Chukchi and Beaufort Seas. SBI is a National Science Foundation sponsored multi-year and multidisciplinary project studying the biological productivity in the region. The mapping server was developed by the UCAR Joint Office of Science Support(JOSS) using OpenSource GIS tools(University of Minnesota Mapserver and USGS MapSurfer). Additional OpenSource tools such as GMT and MB-Systems were also utilized. The key layers in this system are the current ship track, station locations, multibeam bottom bathymetry, IBCAO bathymetry, DMSP satellite imagery , NOAA AVHRR Sea Surface temperature and all past SBI Project ship tracks and station locations. The ship track and multibeam layers are updated in real-time and the satellite layers are updated daily only during clear weather. In addition to using current high resolution multibeam bathymetry data, a composite high resolution bathymetry layer was created using multibeam data from past cruises in the SBI region. The server provides click-and-drag zooms, pan, feature query, distance measure and lat/lon/depth querys on a polar projection map of the arctic ocean. The main use of the system on the ship was for cruise track and station position planning by the scientists utilizing all available historical data and high resolution bathymetry. It was also the main source of information to all the scientist on board as to the cruise progress and plans. The system permitted on-board scientists to integrate historical cruise information for comparative purposes. A mirror web site was set up on land and the current ship track/station information was copied once a day to this site via a satellite link so people interested SBI research could follow the cruise progress.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hendrickson, K; Phillips, M; Fishburn, M
Purpose: To implement a common database structure and user-friendly web-browser based data collection tools across several medical institutions to better support evidence-based clinical decision making and comparative effectiveness research through shared outcomes data. Methods: A consortium of four academic medical centers agreed to implement a federated database, known as Oncospace. Initial implementation has addressed issues of differences between institutions in workflow and types and breadth of structured information captured. This requires coordination of data collection from departmental oncology information systems (OIS), treatment planning systems, and hospital electronic medical records in order to include as much as possible the multi-disciplinary clinicalmore » data associated with a patients care. Results: The original database schema was well-designed and required only minor changes to meet institution-specific data requirements. Mobile browser interfaces for data entry and review for both the OIS and the Oncospace database were tailored for the workflow of individual institutions. Federation of database queries--the ultimate goal of the project--was tested using artificial patient data. The tests serve as proof-of-principle that the system as a whole--from data collection and entry to providing responses to research queries of the federated database--was viable. The resolution of inter-institutional use of patient data for research is still not completed. Conclusions: The migration from unstructured data mainly in the form of notes and documents to searchable, structured data is difficult. Making the transition requires cooperation of many groups within the department and can be greatly facilitated by using the structured data to improve clinical processes and workflow. The original database schema design is critical to providing enough flexibility for multi-institutional use to improve each institution s ability to study outcomes, determine best practices, and support research. The project has demonstrated the feasibility of deploying a federated database environment for research purposes to multiple institutions.« less
Quantum computation with classical light: Implementation of the Deutsch-Jozsa algorithm
NASA Astrophysics Data System (ADS)
Perez-Garcia, Benjamin; McLaren, Melanie; Goyal, Sandeep K.; Hernandez-Aranda, Raul I.; Forbes, Andrew; Konrad, Thomas
2016-05-01
We propose an optical implementation of the Deutsch-Jozsa Algorithm using classical light in a binary decision-tree scheme. Our approach uses a ring cavity and linear optical devices in order to efficiently query the oracle functional values. In addition, we take advantage of the intrinsic Fourier transforming properties of a lens to read out whether the function given by the oracle is balanced or constant.
Battle Staff Operations: Synchronization of Planning at Battalion and Brigade Level
1989-06-02
tactical employment of the tank and mechanized infantry battalion task force. In consonance with Airland Battle doctrine, the document emphasizes ...letter entitled *Emphasis on Rapid Estimates and Decisions on the Atomic Battlefield.’ Emphasizing the increased tempo of the post war mechanized army...If so, a copy of the message is automatically canted to the user(s) identified in the distribution field. Queries are messages retrived records from
Usability Evaluation of NLP-PIER: A Clinical Document Search Engine for Researchers.
Hultman, Gretchen; McEwan, Reed; Pakhomov, Serguei; Lindemann, Elizabeth; Skube, Steven; Melton, Genevieve B
2017-01-01
NLP-PIER (Natural Language Processing - Patient Information Extraction for Research) is a self-service platform with a search engine for clinical researchers to perform natural language processing (NLP) queries using clinical notes. We conducted user-centered testing of NLP-PIER's usability to inform future design decisions. Quantitative and qualitative data were analyzed. Our findings will be used to improve the usability of NLP-PIER.
Haim, Mario; Arendt, Florian; Scherr, Sebastian
2017-02-01
Despite evidence that suicide rates can increase after suicides are widely reported in the media, appropriate depictions of suicide in the media can help people to overcome suicidal crises and can thus elicit preventive effects. We argue on the level of individual media users that a similar ambivalence can be postulated for search results on online suicide-related search queries. Importantly, the filter bubble hypothesis (Pariser, 2011) states that search results are biased by algorithms based on a person's previous search behavior. In this study, we investigated whether suicide-related search queries, including either potentially suicide-preventive or -facilitative terms, influence subsequent search results. This might thus protect or harm suicidal Internet users. We utilized a 3 (search history: suicide-related harmful, suicide-related helpful, and suicide-unrelated) × 2 (reactive: clicking the top-most result link and no clicking) experimental design applying agent-based testing. While findings show no influences either of search histories or of reactivity on search results in a subsequent situation, the presentation of a helpline offer raises concerns about possible detrimental algorithmic decision-making: Algorithms "decided" whether or not to present a helpline, and this automated decision, then, followed the agent throughout the rest of the observation period. Implications for policy-making and search providers are discussed.
Building an Intelligent Water Information System - American River Prototype
NASA Astrophysics Data System (ADS)
Glaser, S. D.; Bales, R. C.; Conklin, M. H.
2013-12-01
With better management, California's existing water supplies could go further to meeting the needs of the state's urban and agricultural uses. For example, California's water reservoirs are currently controlled and regulated using forecasts based upon more than 75 years of historical data. In the face of global climate change, these forecasts are becoming increasingly inadequate to precisely manage water resources. We propose implementing Leveraging the newest frontiers of information technology, we are developing a basin-scale real-time intelligent water infrastructure system that enables more information-intensive decision support. The complete system is made up of four key components. First, a strategically deployed ground-observation system will complement satellite measurements and provide continuous and accurate estimates of snowpack, soil moisture, vegetation state and energy balance across watersheds. Using our recently developed but mature technologies, we deliver measurements of hydrologic variables over a multi- tiered network of wireless sensor arrays, with a granularity of time and space previously unheard of. Second, satellite and aircraft remote sensing provide the only practical means of spatially continuous basin-wide measurement and monitoring of snow properties, vegetation characteristics and other watershed conditions. The ground-based system is designed to blend with remote sensing data on Sierra Nevada snow properties, and provide value-added products of unprecedented spatial detail and accuracy that are useable on a watershed level. Third, together the satellite and ground-based data make possible the updating of forecast tools, and routine use of physically based hydrologic models. The decision-support framework will provide tools to extract and visualize information of interest from the measured and modeled data, to assess uncertainties, and to optimize operations. Fourth, the advanced cyber infrastructure blends and transforms the numbers recorded by sensors into information in the form that is useful for decision-making. In a sense it 'monetizes' the data. It is the cyber infrastructure that links measurements, data processing, models and users. System software must provide flexibility for multiple types of access from user queries to automated and direct links with analysis tools and decision-support systems. We are currently installing a basin-scale ground-based sensor network focusing on measurements of snowpack, solar radiation, temperature, rH and soil moisture across the American River basin. Although this is a research network, it also provides core elements of a full ground-based operational system.
Using NASA Techniques to Atmospherically Correct AWiFS Data for Carbon Sequestration Studies
NASA Technical Reports Server (NTRS)
Holekamp, Kara L.
2007-01-01
Carbon dioxide is a greenhouse gas emitted in a number of ways, including the burning of fossil fuels and the conversion of forest to agriculture. Research has begun to quantify the ability of vegetative land cover and oceans to absorb and store carbon dioxide. The USDA (U.S. Department of Agriculture) Forest Service is currently evaluating a DSS (decision support system) developed by researchers at the NASA Ames Research Center called CASA-CQUEST (Carnegie-Ames-Stanford Approach-Carbon Query and Evaluation Support Tools). CASA-CQUEST is capable of estimating levels of carbon sequestration based on different land cover types and of predicting the effects of land use change on atmospheric carbon amounts to assist land use management decisions. The CASA-CQUEST DSS currently uses land cover data acquired from MODIS (the Moderate Resolution Imaging Spectroradiometer), and the CASA-CQUEST project team is involved in several projects that use moderate-resolution land cover data derived from Landsat surface reflectance. Landsat offers higher spatial resolution than MODIS, allowing for increased ability to detect land use changes and forest disturbance. However, because of the rate at which changes occur and the fact that disturbances can be hidden by regrowth, updated land cover classifications may be required before the launch of the Landsat Data Continuity Mission, and consistent classifications will be needed after that time. This candidate solution investigates the potential of using NASA atmospheric correction techniques to produce science-quality surface reflectance data from the Indian Remote Sensing Advanced Wide-Field Sensor on the RESOURCESAT-1 mission to produce land cover classification maps for the CASA-CQUEST DSS.
Citizen centered health and lifestyle management via interactive TV: The PANACEIA-ITV health system.
Maglaveras, N; Chouvarda, I; Koutkias, V; Lekka, I; Tsakali, M; Tsetoglou, S; Maglavera, S; Leondaridis, L; Zeevi, B; Danelli, V; Kotis, T; De Moore, G; Balas, E A
2003-01-01
In the context of an IST European project with acronym PANACEIA-ITV, a home care service provisioning system is described, based on interactive TV technology. The purpose of PANACEIA-ITV is to facilitate essential lifestyle changes and to promote compliance with scientifically sound self-care recommendations, through the application of interactive digital television for family health maintenance. The means to achieve these goals are based on technological, health services and business models. PANACEIA-ITV is looking for communication of monitoring micro-devices with I-TV set-top-boxes using infrared technology, and embodiment of analogous H/W and S/W in the I-TV set-top-boxes. Intelligent agents are used to regulate data flow, user queries as well as service provisions from and to the household through the satellite digital platform, the portal and the back-end decision support mechanisms, using predominantly the Active Service Provision (ASP) model. Moreover, interactive digital TV services are developed for the delivery of health care in the home care environment.
McDaniel, Robert B; Burlison, Jonathan D; Baker, Donald K; Hasan, Murad; Robertson, Jennifer; Hartford, Christine; Howard, Scott C; Sablauer, Andras
2016-01-01
Metrics for evaluating interruptive prescribing alerts have many limitations. Additional methods are needed to identify opportunities to improve alerting systems and prevent alert fatigue. In this study, the authors determined whether alert dwell time—the time elapsed from when an interruptive alert is generated to when it is dismissed—could be calculated by using historical alert data from log files. Drug–drug interaction (DDI) alerts from 3 years of electronic health record data were queried. Alert dwell time was calculated for 25,965 alerts, including 777 unique DDIs. The median alert dwell time was 8 s (range, 1–4913 s). Resident physicians had longer median alert dwell times than other prescribers (P < .001). The 10 most frequent DDI alerts (n = 8759 alerts) had shorter median dwell times than alerts that only occurred once (P < .001). This metric can be used in future research to evaluate the effectiveness and efficiency of interruptive prescribing alerts. PMID:26499101
Meteorological disaster management and assessment system design and implementation
NASA Astrophysics Data System (ADS)
Tang, Wei; Luo, Bin; Wu, Huanping
2009-09-01
Disaster prevention and mitigation get more and more attentions by Chinese government, with the national economic development in recent years. Some problems exhibit in traditional disaster management, such as the chaotic management of data, low level of information, poor data sharing. To improve the capability of information in disaster management, Meteorological Disaster Management and Assessment System (MDMAS) was developed and is introduced in the paper. MDMAS uses three-tier C/S architecture, including the application layer, data layer and service layer. Current functions of MDMAS include the typhoon and rainstorm assessment, disaster data query and statistics, automatic cartography for disaster management. The typhoon and rainstorm assessment models can be used in both pre-assessment of pre-disaster and post-disaster assessment. Implementation of automatic cartography uses ArcGIS Geoprocessing and ModelBuilder. In practice, MDMAS has been utilized to provide warning information, disaster assessment and services products. MDMAS is an efficient tool for meteorological disaster management and assessment. It can provide decision supports for disaster prevention and mitigation.
Meteorological disaster management and assessment system design and implementation
NASA Astrophysics Data System (ADS)
Tang, Wei; Luo, Bin; Wu, Huanping
2010-11-01
Disaster prevention and mitigation get more and more attentions by Chinese government, with the national economic development in recent years. Some problems exhibit in traditional disaster management, such as the chaotic management of data, low level of information, poor data sharing. To improve the capability of information in disaster management, Meteorological Disaster Management and Assessment System (MDMAS) was developed and is introduced in the paper. MDMAS uses three-tier C/S architecture, including the application layer, data layer and service layer. Current functions of MDMAS include the typhoon and rainstorm assessment, disaster data query and statistics, automatic cartography for disaster management. The typhoon and rainstorm assessment models can be used in both pre-assessment of pre-disaster and post-disaster assessment. Implementation of automatic cartography uses ArcGIS Geoprocessing and ModelBuilder. In practice, MDMAS has been utilized to provide warning information, disaster assessment and services products. MDMAS is an efficient tool for meteorological disaster management and assessment. It can provide decision supports for disaster prevention and mitigation.
Towards Phenotyping of Clinical Trial Eligibility Criteria.
Löbe, Matthias; Stäubert, Sebastian; Goldberg, Colleen; Haffner, Ivonne; Winter, Alfred
2018-01-01
Medical plaintext documents contain important facts about patients, but they are rarely available for structured queries. The provision of structured information from natural language texts in addition to the existing structured data can significantly speed up the search for fulfilled inclusion criteria and thus improve the recruitment rate. This work is aimed at supporting clinical trial recruitment with text mining techniques to identify suitable subjects in hospitals. Based on the inclusion/exclusion criteria of 5 sample studies and a text corpus consisting of 212 doctor's letters and medical follow-up documentation from a university cancer center, a prototype was developed and technically evaluated using NLP procedures (UIMA) for the extraction of facts from medical free texts. It was found that although the extracted entities are not always correct (precision between 23% and 96%), they provide a decisive indication as to which patient file should be read preferentially. The prototype presented here demonstrates the technical feasibility. In order to find available, lucrative phenotypes, an in-depth evaluation is required.
Price, Morgan; Davies, Iryna; Rusk, Raymond; Lesperance, Mary; Weber, Jens
2017-06-15
Potentially Inappropriate Prescriptions (PIPs) are a common cause of morbidity, particularly in the elderly. We sought to understand how the Screening Tool of Older People's Prescriptions (STOPP) prescribing criteria, implemented in a routinely used primary care Electronic Medical Record (EMR), could impact PIP rates in community (non-academic) primary care practices. We conducted a mixed-method, pragmatic, cluster, randomized control trial in research naïve primary care practices. Phase 1: In the randomized controlled trial, 40 fully automated STOPP rules were implemented as EMR alerts during a 16-week intervention period. The control group did not receive the 40 STOPP rules (but received other alerts). Participants were recruited through the OSCAR EMR user group mailing list and in person at user group meetings. Results were assessed by querying EMR data PIPs. EMR data quality probes were included. Phase 2: physicians were invited to participate in 1-hour semi-structured interviews to discuss the results. In the EMR, 40 STOPP rules were successfully implemented. Phase 1: A total of 28 physicians from 8 practices were recruited (16 in intervention and 12 in control groups). The calculated PIP rate was 2.6% (138/5308) (control) and 4.11% (768/18,668) (intervention) at baseline. No change in PIPs was observed through the intervention (P=.80). Data quality probes generally showed low use of problem list and medication list. Phase 2: A total of 5 physicians participated. All the participants felt that they were aware of the alerts but commented on workflow and presentation challenges. The calculated PIP rate was markedly less than the expected rate found in literature (2.6% and 4.0% vs 20% in literature). Data quality probes highlighted issues related to completeness of data in areas of the EMR used for PIP reporting and by the decision support such as problem and medication lists. Users also highlighted areas for better integration of STOPP guidelines with prescribing workflows. Many of the STOPP criteria can be implemented in EMRs using simple logic. However, data quality in EMRs continues to be a challenge and was a limiting step in the effectiveness of the decision support in this study. This is important as decision makers continue to fund implementation and adoption of EMRs with the expectation of the use of advanced tools (such as decision support) without ongoing review of data quality and improvement. Clinicaltrials.gov NCT02130895; https://clinicaltrials.gov/ct2/show/NCT02130895 (Archived by WebCite at http://www.webcitation.org/6qyFigSYT). ©Morgan Price, Iryna Davies, Raymond Rusk, Mary Lesperance, Jens Weber. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 15.06.2017.
Enhanced Approximate Nearest Neighbor via Local Area Focused Search.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gonzales, Antonio; Blazier, Nicholas Paul
Approximate Nearest Neighbor (ANN) algorithms are increasingly important in machine learning, data mining, and image processing applications. There is a large family of space- partitioning ANN algorithms, such as randomized KD-Trees, that work well in practice but are limited by an exponential increase in similarity comparisons required to optimize recall. Additionally, they only support a small set of similarity metrics. We present Local Area Fo- cused Search (LAFS), a method that enhances the way queries are performed using an existing ANN index. Instead of a single query, LAFS performs a number of smaller (fewer similarity comparisons) queries and focuses onmore » a local neighborhood which is refined as candidates are identified. We show that our technique improves performance on several well known datasets and is easily extended to general similarity metrics using kernel projection techniques.« less
Spatial Query for Planetary Data
NASA Technical Reports Server (NTRS)
Shams, Khawaja S.; Crockett, Thomas M.; Powell, Mark W.; Joswig, Joseph C.; Fox, Jason M.
2011-01-01
Science investigators need to quickly and effectively assess past observations of specific locations on a planetary surface. This innovation involves a location-based search technology that was adapted and applied to planetary science data to support a spatial query capability for mission operations software. High-performance location-based searching requires the use of spatial data structures for database organization. Spatial data structures are designed to organize datasets based on their coordinates in a way that is optimized for location-based retrieval. The particular spatial data structure that was adapted for planetary data search is the R+ tree.
A database system to support image algorithm evaluation
NASA Technical Reports Server (NTRS)
Lien, Y. E.
1977-01-01
The design is given of an interactive image database system IMDB, which allows the user to create, retrieve, store, display, and manipulate images through the facility of a high-level, interactive image query (IQ) language. The query language IQ permits the user to define false color functions, pixel value transformations, overlay functions, zoom functions, and windows. The user manipulates the images through generic functions. The user can direct images to display devices for visual and qualitative analysis. Image histograms and pixel value distributions can also be computed to obtain a quantitative analysis of images.
CytoscapeRPC: a plugin to create, modify and query Cytoscape networks from scripting languages.
Bot, Jan J; Reinders, Marcel J T
2011-09-01
CytoscapeRPC is a plugin for Cytoscape which allows users to create, query and modify Cytoscape networks from any programming language which supports XML-RPC. This enables them to access Cytoscape functionality and visualize their data interactively without leaving the programming environment with which they are familiar. Install through the Cytoscape plugin manager or visit the web page: http://wiki.nbic.nl/index.php/CytoscapeRPC for the user tutorial and download. j.j.bot@tudelft.nl; j.j.bot@tudelft.nl.
Davidson, Judy E; Powers, Karen; Hedayat, Kamyar M; Tieszen, Mark; Kon, Alexander A; Shepard, Eric; Spuhler, Vicki; Todres, I David; Levy, Mitchell; Barr, Juliana; Ghandi, Raj; Hirsch, Gregory; Armstrong, Deborah
2007-02-01
To develop clinical practice guidelines for the support of the patient and family in the adult, pediatric, or neonatal patient-centered ICU. A multidisciplinary task force of experts in critical care practice was convened from the membership of the American College of Critical Care Medicine (ACCM) and the Society of Critical Care Medicine (SCCM) to include representation from adult, pediatric, and neonatal intensive care units. The task force members reviewed the published literature. The Cochrane library, Cinahl, and MedLine were queried for articles published between 1980 and 2003. Studies were scored according to Cochrane methodology. Where evidence did not exist or was of a low level, consensus was derived from expert opinion. The topic was divided into subheadings: decision making, family coping, staff stress related to family interactions, cultural support, spiritual/religious support, family visitation, family presence on rounds, family presence at resuscitation, family environment of care, and palliative care. Each section was led by one task force member. Each section draft was reviewed by the group and debated until consensus was achieved. The draft document was reviewed by a committee of the Board of Regents of the ACCM. After steering committee approval, the draft was approved by the SCCM Council and was again subjected to peer review by this journal. More than 300 related studies were reviewed. However, the level of evidence in most cases is at Cochrane level 4 or 5, indicating the need for further research. Forty-three recommendations are presented that include, but are not limited to, endorsement of a shared decision-making model, early and repeated care conferencing to reduce family stress and improve consistency in communication, honoring culturally appropriate requests for truth-telling and informed refusal, spiritual support, staff education and debriefing to minimize the impact of family interactions on staff health, family presence at both rounds and resuscitation, open flexible visitation, way-finding and family-friendly signage, and family support before, during, and after a death.
Using J-Query Mobile Technology to Support a Pedagogical Proficiency Course
ERIC Educational Resources Information Center
Kert, Serhat Bahadir
2013-01-01
Technology-enriched educational environments supported by different technological tools and applications are today's important research areas in the educational literature. During the educational process, different types of technologies are used in order to enhance the learning capabilities of students. Given the popularity of mobile phones, it…
Big Data Analytics with Datalog Queries on Spark.
Shkapsky, Alexander; Yang, Mohan; Interlandi, Matteo; Chiu, Hsuan; Condie, Tyson; Zaniolo, Carlo
2016-01-01
There is great interest in exploiting the opportunity provided by cloud computing platforms for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for machine learning and graph analytics. Developing efficient complex analytics in Spark requires deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX). Our BigDatalog system addresses the problem by providing concise declarative specification of complex queries amenable to efficient evaluation. Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. We perform an experimental comparison with other state-of-the-art large-scale Datalog systems and verify the efficacy of our techniques and effectiveness of Spark in supporting Datalog-based analytics.
Big Data Analytics with Datalog Queries on Spark
Shkapsky, Alexander; Yang, Mohan; Interlandi, Matteo; Chiu, Hsuan; Condie, Tyson; Zaniolo, Carlo
2017-01-01
There is great interest in exploiting the opportunity provided by cloud computing platforms for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for machine learning and graph analytics. Developing efficient complex analytics in Spark requires deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX). Our BigDatalog system addresses the problem by providing concise declarative specification of complex queries amenable to efficient evaluation. Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. We perform an experimental comparison with other state-of-the-art large-scale Datalog systems and verify the efficacy of our techniques and effectiveness of Spark in supporting Datalog-based analytics. PMID:28626296
Computing health quality measures using Informatics for Integrating Biology and the Bedside.
Klann, Jeffrey G; Murphy, Shawn N
2013-04-19
The Health Quality Measures Format (HQMF) is a Health Level 7 (HL7) standard for expressing computable Clinical Quality Measures (CQMs). Creating tools to process HQMF queries in clinical databases will become increasingly important as the United States moves forward with its Health Information Technology Strategic Plan to Stages 2 and 3 of the Meaningful Use incentive program (MU2 and MU3). Informatics for Integrating Biology and the Bedside (i2b2) is one of the analytical databases used as part of the Office of the National Coordinator (ONC)'s Query Health platform to move toward this goal. Our goal is to integrate i2b2 with the Query Health HQMF architecture, to prepare for other HQMF use-cases (such as MU2 and MU3), and to articulate the functional overlap between i2b2 and HQMF. Therefore, we analyze the structure of HQMF, and then we apply this understanding to HQMF computation on the i2b2 clinical analytical database platform. Specifically, we develop a translator between two query languages, HQMF and i2b2, so that the i2b2 platform can compute HQMF queries. We use the HQMF structure of queries for aggregate reporting, which define clinical data elements and the temporal and logical relationships between them. We use the i2b2 XML format, which allows flexible querying of a complex clinical data repository in an easy-to-understand domain-specific language. The translator can represent nearly any i2b2-XML query as HQMF and execute in i2b2 nearly any HQMF query expressible in i2b2-XML. This translator is part of the freely available reference implementation of the QueryHealth initiative. We analyze limitations of the conversion and find it covers many, but not all, of the complex temporal and logical operators required by quality measures. HQMF is an expressive language for defining quality measures, and it will be important to understand and implement for CQM computation, in both meaningful use and population health. However, its current form might allow complexity that is intractable for current database systems (both in terms of implementation and computation). Our translator, which supports the subset of HQMF currently expressible in i2b2-XML, may represent the beginnings of a practical compromise. It is being pilot-tested in two Query Health demonstration projects, and it can be further expanded to balance computational tractability with the advanced features needed by measure developers.
Computing Health Quality Measures Using Informatics for Integrating Biology and the Bedside
Murphy, Shawn N
2013-01-01
Background The Health Quality Measures Format (HQMF) is a Health Level 7 (HL7) standard for expressing computable Clinical Quality Measures (CQMs). Creating tools to process HQMF queries in clinical databases will become increasingly important as the United States moves forward with its Health Information Technology Strategic Plan to Stages 2 and 3 of the Meaningful Use incentive program (MU2 and MU3). Informatics for Integrating Biology and the Bedside (i2b2) is one of the analytical databases used as part of the Office of the National Coordinator (ONC)’s Query Health platform to move toward this goal. Objective Our goal is to integrate i2b2 with the Query Health HQMF architecture, to prepare for other HQMF use-cases (such as MU2 and MU3), and to articulate the functional overlap between i2b2 and HQMF. Therefore, we analyze the structure of HQMF, and then we apply this understanding to HQMF computation on the i2b2 clinical analytical database platform. Specifically, we develop a translator between two query languages, HQMF and i2b2, so that the i2b2 platform can compute HQMF queries. Methods We use the HQMF structure of queries for aggregate reporting, which define clinical data elements and the temporal and logical relationships between them. We use the i2b2 XML format, which allows flexible querying of a complex clinical data repository in an easy-to-understand domain-specific language. Results The translator can represent nearly any i2b2-XML query as HQMF and execute in i2b2 nearly any HQMF query expressible in i2b2-XML. This translator is part of the freely available reference implementation of the QueryHealth initiative. We analyze limitations of the conversion and find it covers many, but not all, of the complex temporal and logical operators required by quality measures. Conclusions HQMF is an expressive language for defining quality measures, and it will be important to understand and implement for CQM computation, in both meaningful use and population health. However, its current form might allow complexity that is intractable for current database systems (both in terms of implementation and computation). Our translator, which supports the subset of HQMF currently expressible in i2b2-XML, may represent the beginnings of a practical compromise. It is being pilot-tested in two Query Health demonstration projects, and it can be further expanded to balance computational tractability with the advanced features needed by measure developers. PMID:23603227
Gao, Su-qing; Wang, Zhen; Gao, Hong-wei; Liu, Peng; Wang, Ze-rui; Li, Yan-li; Zhu, Xu-guang; Li, Xin-lou; Xu, Bo; Li, Yin-jun; Yang, Hong; de Vlas, Sake J.; Shi, Tao-xing; Cao, Wu-chun
2013-01-01
Background For years, emerging infectious diseases have appeared worldwide and threatened the health of people. The emergence and spread of an infectious-disease outbreak are usually unforeseen, and have the features of suddenness and uncertainty. Timely understanding of basic information in the field, and the collection and analysis of epidemiological information, is helpful in making rapid decisions and responding to an infectious-disease emergency. Therefore, it is necessary to have an unobstructed channel and convenient tool for the collection and analysis of epidemiologic information in the field. Methodology/Principal Findings Baseline information for each county in mainland China was collected and a database was established by geo-coding information on a digital map of county boundaries throughout the country. Google Maps was used to display geographic information and to conduct calculations related to maps, and the 3G wireless network was used to transmit information collected in the field to the server. This study established a decision support system for the response to infectious-disease emergencies based on WebGIS and mobile services (DSSRIDE). The DSSRIDE provides functions including data collection, communication and analyses in real time, epidemiological detection, the provision of customized epidemiological questionnaires and guides for handling infectious disease emergencies, and the querying of professional knowledge in the field. These functions of the DSSRIDE could be helpful for epidemiological investigations in the field and the handling of infectious-disease emergencies. Conclusions/Significance The DSSRIDE provides a geographic information platform based on the Google Maps application programming interface to display information of infectious disease emergencies, and transfers information between workers in the field and decision makers through wireless transmission based on personal computers, mobile phones and personal digital assistants. After a 2-year practice and application in infectious disease emergencies, the DSSRIDE is becoming a useful platform and is a useful tool for investigations in the field carried out by response sections and individuals. The system is suitable for use in developing countries and low-income districts. PMID:23372780
Collaborative Supervised Learning for Sensor Networks
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri L.; Rebbapragada, Umaa; Lane, Terran
2011-01-01
Collaboration methods for distributed machine-learning algorithms involve the specification of communication protocols for the learners, which can query other learners and/or broadcast their findings preemptively. Each learner incorporates information from its neighbors into its own training set, and they are thereby able to bootstrap each other to higher performance. Each learner resides at a different node in the sensor network and makes observations (collects data) independently of the other learners. After being seeded with an initial labeled training set, each learner proceeds to learn in an iterative fashion. New data is collected and classified. The learner can then either broadcast its most confident classifications for use by other learners, or can query neighbors for their classifications of its least confident items. As such, collaborative learning combines elements of both passive (broadcast) and active (query) learning. It also uses ideas from ensemble learning to combine the multiple responses to a given query into a single useful label. This approach has been evaluated against current non-collaborative alternatives, including training a single classifier and deploying it at all nodes with no further learning possible, and permitting learners to learn from their own most confident judgments, absent interaction with their neighbors. On several data sets, it has been consistently found that active collaboration is the best strategy for a distributed learner network. The main advantages include the ability for learning to take place autonomously by collaboration rather than by requiring intervention from an oracle (usually human), and also the ability to learn in a distributed environment, permitting decisions to be made in situ and to yield faster response time.
Horvath, Monica M.; Rusincovitch, Shelley A.; Brinson, Stephanie; Shang, Howard C.; Evans, Steve; Ferranti, Jeffrey M.
2015-01-01
Purpose Data generated in the care of patients are widely used to support clinical research and quality improvement, which has hastened the development of self-service query tools. User interface design for such tools, execution of query activity, and underlying application architecture have not been widely reported, and existing tools reflect a wide heterogeneity of methods and technical frameworks. We describe the design, application architecture, and use of a self-service model for enterprise data delivery within Duke Medicine. Methods Our query platform, the Duke Enterprise Data Unified Content Explorer (DEDUCE), supports enhanced data exploration, cohort identification, and data extraction from our enterprise data warehouse (EDW) using a series of modular environments that interact with a central keystone module, Cohort Manager (CM). A data-driven application architecture is implemented through three components: an application data dictionary, the concept of “smart dimensions”, and dynamically-generated user interfaces. Results DEDUCE CM allows flexible hierarchies of EDW queries within a grid-like workspace. A cohort “join” functionality allows switching between filters based on criteria occurring within or across patient encounters. To date, 674 users have been trained and activated in DEDUCE, and logon activity shows a steady increase, with variability between months. A comparison of filter conditions and export criteria shows that these activities have different patterns of usage across subject areas. Conclusions Organizations with sophisticated EDWs may find that users benefit from development of advanced query functionality, complimentary to the user interfaces and infrastructure used in other well-published models. Driven by its EDW context, the DEDUCE application architecture was also designed to be responsive to source data and to allow modification through alterations in metadata rather than programming, allowing an agile response to source system changes. PMID:25051403
Horvath, Monica M; Rusincovitch, Shelley A; Brinson, Stephanie; Shang, Howard C; Evans, Steve; Ferranti, Jeffrey M
2014-12-01
Data generated in the care of patients are widely used to support clinical research and quality improvement, which has hastened the development of self-service query tools. User interface design for such tools, execution of query activity, and underlying application architecture have not been widely reported, and existing tools reflect a wide heterogeneity of methods and technical frameworks. We describe the design, application architecture, and use of a self-service model for enterprise data delivery within Duke Medicine. Our query platform, the Duke Enterprise Data Unified Content Explorer (DEDUCE), supports enhanced data exploration, cohort identification, and data extraction from our enterprise data warehouse (EDW) using a series of modular environments that interact with a central keystone module, Cohort Manager (CM). A data-driven application architecture is implemented through three components: an application data dictionary, the concept of "smart dimensions", and dynamically-generated user interfaces. DEDUCE CM allows flexible hierarchies of EDW queries within a grid-like workspace. A cohort "join" functionality allows switching between filters based on criteria occurring within or across patient encounters. To date, 674 users have been trained and activated in DEDUCE, and logon activity shows a steady increase, with variability between months. A comparison of filter conditions and export criteria shows that these activities have different patterns of usage across subject areas. Organizations with sophisticated EDWs may find that users benefit from development of advanced query functionality, complimentary to the user interfaces and infrastructure used in other well-published models. Driven by its EDW context, the DEDUCE application architecture was also designed to be responsive to source data and to allow modification through alterations in metadata rather than programming, allowing an agile response to source system changes. Copyright © 2014 Elsevier Inc. All rights reserved.
The Battlefield Commander’s Assistant Project: Research in Terrain Reasoning
1987-05-22
order dissemination. In order to restrict the survey problem to a manageable level, we made the a priori decision to focus on activities related to...models Manages tools for: Conmander , tactical a explanations * situation assessment1Lplans s plan and plan option " a query/edit capabilities...from our work on the Air Land Battle Management Study ( ’Stachnick 87:) which was tasked to compare Al planning techniques with the requirements of
Visual perception-based criminal identification: a query-based approach
NASA Astrophysics Data System (ADS)
Singh, Avinash Kumar; Nandi, G. C.
2017-01-01
The visual perception of eyewitness plays a vital role in criminal identification scenario. It helps law enforcement authorities in searching particular criminal from their previous record. It has been reported that searching a criminal record manually requires too much time to get the accurate result. We have proposed a query-based approach which minimises the computational cost along with the reduction of search space. A symbolic database has been created to perform a stringent analysis on 150 public (Bollywood celebrities and Indian cricketers) and 90 local faces (our data-set). An expert knowledge has been captured to encapsulate every criminal's anatomical and facial attributes in the form of symbolic representation. A fast query-based searching strategy has been implemented using dynamic decision tree data structure which allows four levels of decomposition to fetch respective criminal records. Two types of case studies - viewed and forensic sketches have been considered to evaluate the strength of our proposed approach. We have derived 1200 views of the entire population by taking into consideration 80 participants as eyewitness. The system demonstrates an accuracy level of 98.6% for test case I and 97.8% for test case II. It has also been reported that experimental results reduce the search space up to 30 most relevant records.
KARL: A Knowledge-Assisted Retrieval Language. M.S. Thesis Final Report, 1 Jul. 1985 - 31 Dec. 1987
NASA Technical Reports Server (NTRS)
Dominick, Wayne D. (Editor); Triantafyllopoulos, Spiros
1985-01-01
Data classification and storage are tasks typically performed by application specialists. In contrast, information users are primarily non-computer specialists who use information in their decision-making and other activities. Interaction efficiency between such users and the computer is often reduced by machine requirements and resulting user reluctance to use the system. This thesis examines the problems associated with information retrieval for non-computer specialist users, and proposes a method for communicating in restricted English that uses knowledge of the entities involved, relationships between entities, and basic English language syntax and semantics to translate the user requests into formal queries. The proposed method includes an intelligent dictionary, syntax and semantic verifiers, and a formal query generator. In addition, the proposed system has a learning capability that can improve portability and performance. With the increasing demand for efficient human-machine communication, the significance of this thesis becomes apparent. As human resources become more valuable, software systems that will assist in improving the human-machine interface will be needed and research addressing new solutions will be of utmost importance. This thesis presents an initial design and implementation as a foundation for further research and development into the emerging field of natural language database query systems.
Manchester visual query language
NASA Astrophysics Data System (ADS)
Oakley, John P.; Davis, Darryl N.; Shann, Richard T.
1993-04-01
We report a database language for visual retrieval which allows queries on image feature information which has been computed and stored along with images. The language is novel in that it provides facilities for dealing with feature data which has actually been obtained from image analysis. Each line in the Manchester Visual Query Language (MVQL) takes a set of objects as input and produces another, usually smaller, set as output. The MVQL constructs are mainly based on proven operators from the field of digital image analysis. An example is the Hough-group operator which takes as input a specification for the objects to be grouped, a specification for the relevant Hough space, and a definition of the voting rule. The output is a ranked list of high scoring bins. The query could be directed towards one particular image or an entire image database, in the latter case the bins in the output list would in general be associated with different images. We have implemented MVQL in two layers. The command interpreter is a Lisp program which maps each MVQL line to a sequence of commands which are used to control a specialized database engine. The latter is a hybrid graph/relational system which provides low-level support for inheritance and schema evolution. In the paper we outline the language and provide examples of useful queries. We also describe our solution to the engineering problems associated with the implementation of MVQL.
The Ruby UCSC API: accessing the UCSC genome database using Ruby.
Mishima, Hiroyuki; Aerts, Jan; Katayama, Toshiaki; Bonnal, Raoul J P; Yoshiura, Koh-ichiro
2012-09-21
The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. A simple application programming interface (API) in a scripting language aimed at the biologist was however not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby. The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast.The API uses the bin index-if available-when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby). Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will facilitate biologists to query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help is provided via the website at http://rubyucscapi.userecho.com/.
The Ruby UCSC API: accessing the UCSC genome database using Ruby
2012-01-01
Background The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. A simple application programming interface (API) in a scripting language aimed at the biologist was however not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby. Results The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast. The API uses the bin index—if available—when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby). Conclusions Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will facilitate biologists to query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help is provided via the website at http://rubyucscapi.userecho.com/. PMID:22994508
Brown, Jeffrey S; Holmes, John H; Shah, Kiran; Hall, Ken; Lazarus, Ross; Platt, Richard
2010-06-01
Comparative effectiveness research, medical product safety evaluation, and quality measurement will require the ability to use electronic health data held by multiple organizations. There is no consensus about whether to create regional or national combined (eg, "all payer") databases for these purposes, or distributed data networks that leave most Protected Health Information and proprietary data in the possession of the original data holders. Demonstrate functions of a distributed research network that supports research needs and also address data holders concerns about participation. Key design functions included strong local control of data uses and a centralized web-based querying interface. We implemented a pilot distributed research network and evaluated the design considerations, utility for research, and the acceptability to data holders of methods for menu-driven querying. We developed and tested a central, web-based interface with supporting network software. Specific functions assessed include query formation and distribution, query execution and review, and aggregation of results. This pilot successfully evaluated temporal trends in medication use and diagnoses at 5 separate sites, demonstrating some of the possibilities of using a distributed research network. The pilot demonstrated the potential utility of the design, which addressed the major concerns of both users and data holders. No serious obstacles were identified that would prevent development of a fully functional, scalable network. Distributed networks are capable of addressing nearly all anticipated uses of routinely collected electronic healthcare data. Distributed networks would obviate the need for centralized databases, thus avoiding numerous obstacles.
ERIC Educational Resources Information Center
Barrera, Arnold; Braley, Richard T.; Slate, John R.
2010-01-01
Teacher mentors of first-year teachers provided insight into those practices they viewed as essential for their success in the mentoring role. Specifically, they were queried about teacher involvement/support, staff development, administrative support and resource materials. Almost all of the mentor teachers believed a teacher mentoring program…
Mentors' Views of Factors Essential for the Success of Beginning Teachers
ERIC Educational Resources Information Center
Barrera, Arnold; Braley, Richard; Slate, John R.
2008-01-01
The views of 46 mentors of first-year teachers were obtained regarding practices that they viewed as essential for their success in mentoring teachers. Specifically, they were queried about teacher involvement/support, staff development, administrative support, and resource materials. Almost all of the mentor teachers believed a teacher-mentoring…
Percolator: Scalable Pattern Discovery in Dynamic Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Choudhury, Sutanay; Purohit, Sumit; Lin, Peng
We demonstrate Percolator, a distributed system for graph pattern discovery in dynamic graphs. In contrast to conventional mining systems, Percolator advocates efficient pattern mining schemes that (1) support pattern detection with keywords; (2) integrate incremental and parallel pattern mining; and (3) support analytical queries such as trend analysis. The core idea of Percolator is to dynamically decide and verify a small fraction of patterns and their in- stances that must be inspected in response to buffered updates in dynamic graphs, with a total mining cost independent of graph size. We demonstrate a) the feasibility of incremental pattern mining by walkingmore » through each component of Percolator, b) the efficiency and scalability of Percolator over the sheer size of real-world dynamic graphs, and c) how the user-friendly GUI of Percolator inter- acts with users to support keyword-based queries that detect, browse and inspect trending patterns. We also demonstrate two user cases of Percolator, in social media trend analysis and academic collaboration analysis, respectively.« less
NoSQL Based 3D City Model Management System
NASA Astrophysics Data System (ADS)
Mao, B.; Harrie, L.; Cao, J.; Wu, Z.; Shen, J.
2014-04-01
To manage increasingly complicated 3D city models, a framework based on NoSQL database is proposed in this paper. The framework supports import and export of 3D city model according to international standards such as CityGML, KML/COLLADA and X3D. We also suggest and implement 3D model analysis and visualization in the framework. For city model analysis, 3D geometry data and semantic information (such as name, height, area, price and so on) are stored and processed separately. We use a Map-Reduce method to deal with the 3D geometry data since it is more complex, while the semantic analysis is mainly based on database query operation. For visualization, a multiple 3D city representation structure CityTree is implemented within the framework to support dynamic LODs based on user viewpoint. Also, the proposed framework is easily extensible and supports geoindexes to speed up the querying. Our experimental results show that the proposed 3D city management system can efficiently fulfil the analysis and visualization requirements.
Stjernswärd, Sigrid
2015-03-01
Families living with mental illness can experience added burden and eventually own ill health. National and international guidelines support the development of web based solutions to cost-effectively address health care needs. Mapping patterns of internet use may help tailor interventions to effectively address users' needs. The study's aim was to explore the contents of relatives' and friends' queries on a psychologists' website with professional feedback. All visible questions [n = 59] classified under the website's "helping a relative/friend" category between 20 090 615-20 130 927 were printed out and analyzed using content analysis. The analysis resulted in four categories and subcategories illuminating families'/friends' areas of concern: support to help; concerns with health care; young children's welfare; and repercussions on own health. "Question & Answer" [Q&A] forums can shed light onto health seekers' online behavior and may be a way of addressing families' needs of support and information. Further studies are needed to assess the replies' therapeutic value for their recipients.
Spatial Knowledge Infrastructures - Creating Value for Policy Makers and Benefits the Community
NASA Astrophysics Data System (ADS)
Arnold, L. M.
2016-12-01
The spatial data infrastructure is arguably one of the most significant advancements in the spatial sector. It's been a game changer for governments, providing for the coordination and sharing of spatial data across organisations and the provision of accessible information to the broader community of users. Today however, end-users such as policy-makers require far more from these spatial data infrastructures. They want more than just data; they want the knowledge that can be extracted from data and they don't want to have to download, manipulate and process data in order to get the knowledge they seek. It's time for the spatial sector to reduce its focus on data in spatial data infrastructures and take a more proactive step in emphasising and delivering the knowledge value. Nowadays, decision-makers want to be able to query at will the data to meet their immediate need for knowledge. This is a new value proposal for the decision-making consumer and will require a shift in thinking. This paper presents a model for a Spatial Knowledge Infrastructure and underpinning methods that will realise a new real-time approach to delivering knowledge. The methods embrace the new capabilities afforded through the sematic web, domain and process ontologies and natural query language processing. Semantic Web technologies today have the potential to transform the spatial industry into more than just a distribution channel for data. The Semantic Web RDF (Resource Description Framework) enables meaning to be drawn from data automatically. While pushing data out to end-users will remain a central role for data producers, the power of the semantic web is that end-users have the ability to marshal a broad range of spatial resources via a query to extract knowledge from available data. This can be done without actually having to configure systems specifically for the end-user. All data producers need do is make data accessible in RDF and the spatial analytics does the rest.
NLM at TREC 2012 Medical Records Track
2012-11-01
automatic runs are not significantly above the medians. As in 2011, we conclude that the existing search engines are mature enough to support cohort selection tasks, and the quality of the queries could be
Rice, Michael; Gladstone, William; Weir, Michael
2004-01-01
We discuss how relational databases constitute an ideal framework for representing and analyzing large-scale genomic data sets in biology. As a case study, we describe a Drosophila splice-site database that we recently developed at Wesleyan University for use in research and teaching. The database stores data about splice sites computed by a custom algorithm using Drosophila cDNA transcripts and genomic DNA and supports a set of procedures for analyzing splice-site sequence space. A generic Web interface permits the execution of the procedures with a variety of parameter settings and also supports custom structured query language queries. Moreover, new analytical procedures can be added by updating special metatables in the database without altering the Web interface. The database provides a powerful setting for students to develop informatic thinking skills.
2004-01-01
We discuss how relational databases constitute an ideal framework for representing and analyzing large-scale genomic data sets in biology. As a case study, we describe a Drosophila splice-site database that we recently developed at Wesleyan University for use in research and teaching. The database stores data about splice sites computed by a custom algorithm using Drosophila cDNA transcripts and genomic DNA and supports a set of procedures for analyzing splice-site sequence space. A generic Web interface permits the execution of the procedures with a variety of parameter settings and also supports custom structured query language queries. Moreover, new analytical procedures can be added by updating special metatables in the database without altering the Web interface. The database provides a powerful setting for students to develop informatic thinking skills. PMID:15592597
Stetler, Cheryl B; McQueen, Lynn; Demakis, John; Mittman, Brian S
2008-01-01
Background The continuing gap between available evidence and current practice in health care reinforces the need for more effective solutions, in particular related to organizational context. Considerable advances have been made within the U.S. Veterans Health Administration (VA) in systematically implementing evidence into practice. These advances have been achieved through a system-level program focused on collaboration and partnerships among policy makers, clinicians, and researchers. The Quality Enhancement Research Initiative (QUERI) was created to generate research-driven initiatives that directly enhance health care quality within the VA and, simultaneously, contribute to the field of implementation science. This paradigm-shifting effort provided a natural laboratory for exploring organizational change processes. This article describes the underlying change framework and implementation strategy used to operationalize QUERI. Strategic approach to organizational change QUERI used an evidence-based organizational framework focused on three contextual elements: 1) cultural norms and values, in this case related to the role of health services researchers in evidence-based quality improvement; 2) capacity, in this case among researchers and key partners to engage in implementation research; 3) and supportive infrastructures to reinforce expectations for change and to sustain new behaviors as part of the norm. As part of a QUERI Series in Implementation Science, this article describes the framework's application in an innovative integration of health services research, policy, and clinical care delivery. Conclusion QUERI's experience and success provide a case study in organizational change. It demonstrates that progress requires a strategic, systems-based effort. QUERI's evidence-based initiative involved a deliberate cultural shift, requiring ongoing commitment in multiple forms and at multiple levels. VA's commitment to QUERI came in the form of visionary leadership, targeted allocation of resources, infrastructure refinements, innovative peer review and study methods, and direct involvement of key stakeholders. Stakeholders included both those providing and managing clinical care, as well as those producing relevant evidence within the health care system. The organizational framework and related implementation interventions used to achieve contextual change resulted in engaged investigators and enhanced uptake of research knowledge. QUERI's approach and progress provide working hypotheses for others pursuing similar system-wide efforts to routinely achieve evidence-based care. PMID:18510750
2014-01-01
Introduction: The Internet is revolutionizing tobacco control, but few have harnessed the Web for surveillance. We demonstrate for the first time an approach for analyzing aggregate Internet search queries that captures precise changes in population considerations about tobacco. Methods: We compared tobacco-related Google queries originating in the United States during the week of the State Children’s Health Insurance Program (SCHIP) 2009 cigarette excise tax increase with a historic baseline. Specific queries were then ranked according to their relative increases while also considering approximations of changes in absolute search volume. Results: Individual queries with the largest relative increases the week of the SCHIP tax were “cigarettes Indian reservations” 640% (95% CI, 472–918), “free cigarettes online” 557% (95% CI, 432–756), and “Indian reservations cigarettes” 542% (95% CI, 414–733), amounting to about 7,500 excess searches. By themes, the largest relative increases were tribal cigarettes 246% (95% CI, 228–265), “free” cigarettes 215% (95% CI, 191–242), and cigarette stores 176% (95% CI, 160–193), accounting for 21,000, 27,000, and 90,000 excess queries. All avoidance queries, including those aforementioned themes, relatively increased 150% (95% CI, 144–155) or 550,000 from their baseline. All cessation queries increased 46% (95% CI, 44–48), or 175,000, around SCHIP; including themes for “cold turkey” 19% (95% CI, 11–27) or 2,600, cessation products 47% (95% CI, 44–50) or 78,000, and dubious cessation approaches (e.g., hypnosis) 40% (95% CI, 33–47) or 2,300. Conclusions: The SCHIP tax motivated specific changes in population considerations. Our strategy can support evaluations that temporally link tobacco control measures with instantaneous population reactions, as well as serve as a springboard for traditional studies, for example, including survey questionnaire design. PMID:24323570
YAHA: fast and flexible long-read alignment with optimal breakpoint detection.
Faust, Gregory G; Hall, Ira M
2012-10-01
With improved short-read assembly algorithms and the recent development of long-read sequencers, split mapping will soon be the preferred method for structural variant (SV) detection. Yet, current alignment tools are not well suited for this. We present YAHA, a fast and flexible hash-based aligner. YAHA is as fast and accurate as BWA-SW at finding the single best alignment per query and is dramatically faster and more sensitive than both SSAHA2 and MegaBLAST at finding all possible alignments. Unlike other aligners that report all, or one, alignment per query, or that use simple heuristics to select alignments, YAHA uses a directed acyclic graph to find the optimal set of alignments that cover a query using a biologically relevant breakpoint penalty. YAHA can also report multiple mappings per defined segment of the query. We show that YAHA detects more breakpoints in less time than BWA-SW across all SV classes, and especially excels at complex SVs comprising multiple breakpoints. YAHA is currently supported on 64-bit Linux systems. Binaries and sample data are freely available for download from http://faculty.virginia.edu/irahall/YAHA. imh4y@virginia.edu.
Automated Database Mediation Using Ontological Metadata Mappings
Marenco, Luis; Wang, Rixin; Nadkarni, Prakash
2009-01-01
Objective To devise an automated approach for integrating federated database information using database ontologies constructed from their extended metadata. Background One challenge of database federation is that the granularity of representation of equivalent data varies across systems. Dealing effectively with this problem is analogous to dealing with precoordinated vs. postcoordinated concepts in biomedical ontologies. Model Description The authors describe an approach based on ontological metadata mapping rules defined with elements of a global vocabulary, which allows a query specified at one granularity level to fetch data, where possible, from databases within the federation that use different granularities. This is implemented in OntoMediator, a newly developed production component of our previously described Query Integrator System. OntoMediator's operation is illustrated with a query that accesses three geographically separate, interoperating databases. An example based on SNOMED also illustrates the applicability of high-level rules to support the enforcement of constraints that can prevent inappropriate curator or power-user actions. Summary A rule-based framework simplifies the design and maintenance of systems where categories of data must be mapped to each other, for the purpose of either cross-database query or for curation of the contents of compositional controlled vocabularies. PMID:19567801
Query-Biased Preview over Outsourced and Encrypted Data
Luo, Guangchun; Qin, Ke; Chen, Aiguo
2013-01-01
For both convenience and security, more and more users encrypt their sensitive data before outsourcing it to a third party such as cloud storage service. However, searching for the desired documents becomes problematic since it is costly to download and decrypt each possibly needed document to check if it contains the desired content. An informative query-biased preview feature, as applied in modern search engine, could help the users to learn about the content without downloading the entire document. However, when the data are encrypted, securely extracting a keyword-in-context snippet from the data as a preview becomes a challenge. Based on private information retrieval protocol and the core concept of searchable encryption, we propose a single-server and two-round solution to securely obtain a query-biased snippet over the encrypted data from the server. We achieve this novel result by making a document (plaintext) previewable under any cryptosystem and constructing a secure index to support dynamic computation for a best matched snippet when queried by some keywords. For each document, the scheme has O(d) storage complexity and O(log(d/s) + s + d/s) communication complexity, where d is the document size and s is the snippet length. PMID:24078798
Query-biased preview over outsourced and encrypted data.
Peng, Ningduo; Luo, Guangchun; Qin, Ke; Chen, Aiguo
2013-01-01
For both convenience and security, more and more users encrypt their sensitive data before outsourcing it to a third party such as cloud storage service. However, searching for the desired documents becomes problematic since it is costly to download and decrypt each possibly needed document to check if it contains the desired content. An informative query-biased preview feature, as applied in modern search engine, could help the users to learn about the content without downloading the entire document. However, when the data are encrypted, securely extracting a keyword-in-context snippet from the data as a preview becomes a challenge. Based on private information retrieval protocol and the core concept of searchable encryption, we propose a single-server and two-round solution to securely obtain a query-biased snippet over the encrypted data from the server. We achieve this novel result by making a document (plaintext) previewable under any cryptosystem and constructing a secure index to support dynamic computation for a best matched snippet when queried by some keywords. For each document, the scheme has O(d) storage complexity and O(log(d/s) + s + d/s) communication complexity, where d is the document size and s is the snippet length.
GenoMetric Query Language: a novel approach to large-scale genomic data management.
Masseroli, Marco; Pinoli, Pietro; Venco, Francesco; Kaitoua, Abdulrahman; Jalili, Vahid; Palluzzi, Fernando; Muller, Heiko; Ceri, Stefano
2015-06-15
Improvement of sequencing technologies and data processing pipelines is rapidly providing sequencing data, with associated high-level features, of many individual genomes in multiple biological and clinical conditions. They allow for data-driven genomic, transcriptomic and epigenomic characterizations, but require state-of-the-art 'big data' computing strategies, with abstraction levels beyond available tool capabilities. We propose a high-level, declarative GenoMetric Query Language (GMQL) and a toolkit for its use. GMQL operates downstream of raw data preprocessing pipelines and supports queries over thousands of heterogeneous datasets and samples; as such it is key to genomic 'big data' analysis. GMQL leverages a simple data model that provides both abstractions of genomic region data and associated experimental, biological and clinical metadata and interoperability between many data formats. Based on Hadoop framework and Apache Pig platform, GMQL ensures high scalability, expressivity, flexibility and simplicity of use, as demonstrated by several biological query examples on ENCODE and TCGA datasets. The GMQL toolkit is freely available for non-commercial use at http://www.bioinformatics.deib.polimi.it/GMQL/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
CellLineNavigator: a workbench for cancer cell line analysis
Krupp, Markus; Itzel, Timo; Maass, Thorsten; Hildebrandt, Andreas; Galle, Peter R.; Teufel, Andreas
2013-01-01
The CellLineNavigator database, freely available at http://www.medicalgenomics.org/celllinenavigator, is a web-based workbench for large scale comparisons of a large collection of diverse cell lines. It aims to support experimental design in the fields of genomics, systems biology and translational biomedical research. Currently, this compendium holds genome wide expression profiles of 317 different cancer cell lines, categorized into 57 different pathological states and 28 individual tissues. To enlarge the scope of CellLineNavigator, the database was furthermore closely linked to commonly used bioinformatics databases and knowledge repositories. To ensure easy data access and search ability, a simple data and an intuitive querying interface were implemented. It allows the user to explore and filter gene expression, focusing on pathological or physiological conditions. For a more complex search, the advanced query interface may be used to query for (i) differentially expressed genes; (ii) pathological or physiological conditions; or (iii) gene names or functional attributes, such as Kyoto Encyclopaedia of Genes and Genomes pathway maps. These queries may also be combined. Finally, CellLineNavigator allows additional advanced analysis of differentially regulated genes by a direct link to the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resources. PMID:23118487
Federated Space-Time Query for Earth Science Data Using OpenSearch Conventions
NASA Astrophysics Data System (ADS)
Lynnes, C.; Beaumont, B.; Duerr, R. E.; Hua, H.
2009-12-01
The past decade has seen a burgeoning of remote sensing and Earth science data providers, as evidenced in the growth of the Earth Science Information Partner (ESIP) federation. At the same time, the need to combine diverse data sets to enable understanding of the Earth as a system has also grown. While the expansion of data providers is in general a boon to such studies, the diversity presents a challenge to finding useful data for a given study. Locating all the data files with aerosol information for a particular volcanic eruption, for example, may involve learning and using several different search tools to execute the requisite space-time queries. To address this issue, the ESIP federation is developing a federated space-time query framework, based on the OpenSearch convention (www.opensearch.org), with Geo and Time extensions. In this framework, data providers publish OpenSearch Description Documents that describe in a machine-readable form how to execute queries against the provider. The novelty of OpenSearch is that the space-time query interface becomes both machine callable and easy enough to integrate into the web browser's search box. This flexibility, together with a simple REST (HTTP-get) interface, should allow a variety of data providers to participate in the federated search framework, from large institutional data centers to individual scientists. The simple interface enables trivial querying of multiple data sources and participation in recursive-like federated searches--all using the same common OpenSearch interface. This simplicity also makes the construction of clients easy, as does existing OpenSearch client libraries in a variety of languages. Moreover, a number of clients and aggregation services already exist and OpenSearch is already supported by a number of web browsers such as Firefox and Internet Explorer.
Bilbrey, Ann Choryan; Humber, Marika B; Plowey, Edward D; Garcia, Iliana; Chennapragada, Lakshmi; Desai, Kanchi; Rosen, Allyson; Askari, Nusha; Gallagher-Thompson, Dolores
2018-01-01
Increasing the number of Latino persons with dementia who consent to brain donation (BD) upon death is an important public health goal that has not yet been realized. This study identified the need for culturally sensitive materials to answer questions and support the decision-making process for the family. Information about existing rates of BD was obtained from the Alzheimer's Disease Centers. Several methods of data collection (query NACC database, contacting Centers, focus groups, online survey, assessing current protocol and materials) were used to give the needed background to create culturally appropriate BD materials. A decision was made that a brochure for undecided enrollees would be beneficial to discuss BD with family members. For those needing further details, a step-by-step handout would provide additional information. Through team collaboration and engagement of others in the community who work with Latinos with dementia, we believe this process allowed us to successfully create culturally appropriate informational materials that address a sensitive topic for Hispanic/Latino families. Brain tissue is needed to further knowledge about underlying biological mechanism of neurodegenerative diseases, however it is a sensitive topic. Materials assist with family discussion and facilitate the family's follow-through with BD.
Analysis of the Impact of Data Normalization on Cyber Event Correlation Query Performance
2012-03-01
2003). Organizations use it in planning, target marketing , decision-making, data analysis, and customer services (Shin, 2003). Organizations that...Following this IP address is a router message sequence number. This is a globally unique number for each router terminal and can range from...Appendix G, invokes the PERL parser for the log files from a particular USAF base, and invokes the CTL file that loads the resultant CSV file into the
High Performance Analytics with the R3-Cache
NASA Astrophysics Data System (ADS)
Eavis, Todd; Sayeed, Ruhan
Contemporary data warehouses now represent some of the world’s largest databases. As these systems grow in size and complexity, however, it becomes increasingly difficult for brute force query processing approaches to meet the performance demands of end users. Certainly, improved indexing and more selective view materialization are helpful in this regard. Nevertheless, with warehouses moving into the multi-terabyte range, it is clear that the minimization of external memory accesses must be a primary performance objective. In this paper, we describe the R 3-cache, a natively multi-dimensional caching framework designed specifically to support sophisticated warehouse/OLAP environments. R 3-cache is based upon an in-memory version of the R-tree that has been extended to support buffer pages rather than disk blocks. A key strength of the R 3-cache is that it is able to utilize multi-dimensional fragments of previous query results so as to significantly minimize the frequency and scale of disk accesses. Moreover, the new caching model directly accommodates the standard relational storage model and provides mechanisms for pro-active updates that exploit the existence of query “hot spots”. The current prototype has been evaluated as a component of the Sidera DBMS, a “shared nothing” parallel OLAP server designed for multi-terabyte analytics. Experimental results demonstrate significant performance improvements relative to simpler alternatives.
An ontology-based comparative anatomy information system
Travillian, Ravensara S.; Diatchka, Kremena; Judge, Tejinder K.; Wilamowska, Katarzyna; Shapiro, Linda G.
2010-01-01
Introduction This paper describes the design, implementation, and potential use of a comparative anatomy information system (CAIS) for querying on similarities and differences between homologous anatomical structures across species, the knowledge base it operates upon, the method it uses for determining the answers to the queries, and the user interface it employs to present the results. The relevant informatics contributions of our work include (1) the development and application of the structural difference method, a formalism for symbolically representing anatomical similarities and differences across species; (2) the design of the structure of a mapping between the anatomical models of two different species and its application to information about specific structures in humans, mice, and rats; and (3) the design of the internal syntax and semantics of the query language. These contributions provide the foundation for the development of a working system that allows users to submit queries about the similarities and differences between mouse, rat, and human anatomy; delivers result sets that describe those similarities and differences in symbolic terms; and serves as a prototype for the extension of the knowledge base to any number of species. Additionally, we expanded the domain knowledge by identifying medically relevant structural questions for the human, the mouse, and the rat, and made an initial foray into the validation of the application and its content by means of user questionnaires, software testing, and other feedback. Methods The anatomical structures of the species to be compared, as well as the mappings between species, are modeled on templates from the Foundational Model of Anatomy knowledge base, and compared using graph-matching techniques. A graphical user interface allows users to issue queries that retrieve information concerning similarities and differences between structures in the species being examined. Queries from diverse information sources, including domain experts, peer-reviewed articles, and reference books, have been used to test the system and to illustrate its potential use in comparative anatomy studies. Results 157 test queries were submitted to the CAIS system, and all of them were correctly answered. The interface was evaluated in terms of clarity and ease of use. This testing determined that the application works well, and is fairly intuitive to use, but users want to see more clarification of the meaning of the different types of possible queries. Some of the interface issues will naturally be resolved as we refine our conceptual model to deal with partial and complex homologies in the content. Conclusions The CAIS system and its associated methods are expected to be useful to biologists and translational medicine researchers. Possible applications range from supporting theoretical work in clarifying and modeling ontogenetic, physiological, pathological, and evolutionary transformations, to concrete techniques for improving the analysis of genotype–phenotype relationships among various animal models in support of a wide array of clinical and scientific initiatives. PMID:21146377
... the responding members. Read More 20th Annual White Horse Beach Charity Golf Tournament On July 29, 2017 ... and his committee held the 20th Annual White Horse Beach Charity Golf Tournament. Read More '); jQuery('#s5_ ...
Layton, Rebekah L; Brandt, Patrick D; Freeman, Ashalla M; Harrell, Jessica R; Hall, Joshua D; Sinche, Melanie
2016-01-01
A national sample of PhD-trained scientists completed training, accepted subsequent employment in academic and nonacademic positions, and were queried about their previous graduate training and current employment. Respondents indicated factors contributing to their employment decision (e.g., working conditions, salary, job security). The data indicate the relative importance of deciding factors influencing career choice, controlling for gender, initial interest in faculty careers, and number of postgraduate publications. Among both well-represented (WR; n = 3444) and underrepresented minority (URM; n = 225) respondents, faculty career choice was positively associated with desire for autonomy and partner opportunity and negatively associated with desire for leadership opportunity. Differences between groups in reasons endorsed included: variety, prestige, salary, family influence, and faculty advisor influence. Furthermore, endorsement of faculty advisor or other mentor influence and family or peer influence were surprisingly rare across groups, suggesting that formal and informal support networks could provide a missed opportunity to provide support for trainees who want to stay in faculty career paths. Reasons requiring alteration of misperceptions (e.g., limited leadership opportunity for faculty) must be distinguished from reasons requiring removal of actual barriers. Further investigation into factors that affect PhDs' career decisions can help elucidate why URM candidates are disproportionately exiting the academy. © 2016 R. L. Layton et al. CBE—Life Sciences Education © 2016 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).
Content-based image retrieval on mobile devices
NASA Astrophysics Data System (ADS)
Ahmad, Iftikhar; Abdullah, Shafaq; Kiranyaz, Serkan; Gabbouj, Moncef
2005-03-01
Content-based image retrieval area possesses a tremendous potential for exploration and utilization equally for researchers and people in industry due to its promising results. Expeditious retrieval of desired images requires indexing of the content in large-scale databases along with extraction of low-level features based on the content of these images. With the recent advances in wireless communication technology and availability of multimedia capable phones it has become vital to enable query operation in image databases and retrieve results based on the image content. In this paper we present a content-based image retrieval system for mobile platforms, providing the capability of content-based query to any mobile device that supports Java platform. The system consists of light-weight client application running on a Java enabled device and a server containing a servlet running inside a Java enabled web server. The server responds to image query using efficient native code from selected image database. The client application, running on a mobile phone, is able to initiate a query request, which is handled by a servlet in the server for finding closest match to the queried image. The retrieved results are transmitted over mobile network and images are displayed on the mobile phone. We conclude that such system serves as a basis of content-based information retrieval on wireless devices and needs to cope up with factors such as constraints on hand-held devices and reduced network bandwidth available in mobile environments.
Introducing glycomics data into the Semantic Web
2013-01-01
Background Glycoscience is a research field focusing on complex carbohydrates (otherwise known as glycans)a, which can, for example, serve as “switches” that toggle between different functions of a glycoprotein or glycolipid. Due to the advancement of glycomics technologies that are used to characterize glycan structures, many glycomics databases are now publicly available and provide useful information for glycoscience research. However, these databases have almost no link to other life science databases. Results In order to implement support for the Semantic Web most efficiently for glycomics research, the developers of major glycomics databases agreed on a minimal standard for representing glycan structure and annotation information using RDF (Resource Description Framework). Moreover, all of the participants implemented this standard prototype and generated preliminary RDF versions of their data. To test the utility of the converted data, all of the data sets were uploaded into a Virtuoso triple store, and several SPARQL queries were tested as “proofs-of-concept” to illustrate the utility of the Semantic Web in querying across databases which were originally difficult to implement. Conclusions We were able to successfully retrieve information by linking UniCarbKB, GlycomeDB and JCGGDB in a single SPARQL query to obtain our target information. We also tested queries linking UniProt with GlycoEpitope as well as lectin data with GlycomeDB through PDB. As a result, we have been able to link proteomics data with glycomics data through the implementation of Semantic Web technologies, allowing for more flexible queries across these domains. PMID:24280648
Introducing glycomics data into the Semantic Web.
Aoki-Kinoshita, Kiyoko F; Bolleman, Jerven; Campbell, Matthew P; Kawano, Shin; Kim, Jin-Dong; Lütteke, Thomas; Matsubara, Masaaki; Okuda, Shujiro; Ranzinger, Rene; Sawaki, Hiromichi; Shikanai, Toshihide; Shinmachi, Daisuke; Suzuki, Yoshinori; Toukach, Philip; Yamada, Issaku; Packer, Nicolle H; Narimatsu, Hisashi
2013-11-26
Glycoscience is a research field focusing on complex carbohydrates (otherwise known as glycans)a, which can, for example, serve as "switches" that toggle between different functions of a glycoprotein or glycolipid. Due to the advancement of glycomics technologies that are used to characterize glycan structures, many glycomics databases are now publicly available and provide useful information for glycoscience research. However, these databases have almost no link to other life science databases. In order to implement support for the Semantic Web most efficiently for glycomics research, the developers of major glycomics databases agreed on a minimal standard for representing glycan structure and annotation information using RDF (Resource Description Framework). Moreover, all of the participants implemented this standard prototype and generated preliminary RDF versions of their data. To test the utility of the converted data, all of the data sets were uploaded into a Virtuoso triple store, and several SPARQL queries were tested as "proofs-of-concept" to illustrate the utility of the Semantic Web in querying across databases which were originally difficult to implement. We were able to successfully retrieve information by linking UniCarbKB, GlycomeDB and JCGGDB in a single SPARQL query to obtain our target information. We also tested queries linking UniProt with GlycoEpitope as well as lectin data with GlycomeDB through PDB. As a result, we have been able to link proteomics data with glycomics data through the implementation of Semantic Web technologies, allowing for more flexible queries across these domains.
Exposing the cancer genome atlas as a SPARQL endpoint
Deus, Helena F.; Veiga, Diogo F.; Freire, Pablo R.; Weinstein, John N.; Mills, Gordon B.; Almeida, Jonas S.
2011-01-01
The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to characterize several types of cancer. Datasets from biomedical domains such as TCGA present a particularly challenging task for those interested in dynamically aggregating its results because the data sources are typically both heterogeneous and distributed. The Linked Data best practices offer a solution to integrate and discover data with those characteristics, namely through exposure of data as Web services supporting SPARQL, the Resource Description Framework query language. Most SPARQL endpoints, however, cannot easily be queried by data experts. Furthermore, exposing experimental data as SPARQL endpoints remains a challenging task because, in most cases, data must first be converted to Resource Description Framework triples. In line with those requirements, we have developed an infrastructure to expose clinical, demographic and molecular data elements generated by TCGA as a SPARQL endpoint by assigning elements to entities of the Simple Sloppy Semantic Database (S3DB) management model. All components of the infrastructure are available as independent Representational State Transfer (REST) Web services to encourage reusability, and a simple interface was developed to automatically assemble SPARQL queries by navigating a representation of the TCGA domain. A key feature of the proposed solution that greatly facilitates assembly of SPARQL queries is the distinction between the TCGA domain descriptors and data elements. Furthermore, the use of the S3DB management model as a mediator enables queries to both public and protected data without the need for prior submission to a single data source. PMID:20851208
NASA Astrophysics Data System (ADS)
Cody, R. P.; Kassin, A.; Kofoed, K. B.; Copenhaver, W.; Laney, C. M.; Gaylord, A. G.; Collins, J. A.; Tweedie, C. E.
2014-12-01
The Barrow area of northern Alaska is one of the most intensely researched locations in the Arctic and the Barrow Area Information Database (BAID, www.barrowmapped.org) tracks and facilitates a gamut of research, management, and educational activities in the area. BAID is a cyberinfrastructure (CI) that details much of the historic and extant research undertaken within in the Barrow region in a suite of interactive web-based mapping and information portals (geobrowsers). The BAID user community and target audience for BAID is diverse and includes research scientists, science logisticians, land managers, educators, students, and the general public. BAID contains information on more than 12,000 Barrow area research sites that extend back to the 1940's and more than 640 remote sensing images and geospatial datasets. In a web-based setting, users can zoom, pan, query, measure distance, save or print maps and query results, and filter or view information by space, time, and/or other tags. Data are described with metadata that meet Federal Geographic Data Committee standards and are archived at the University Corporation for Atmospheric Research Earth Observing Laboratory (EOL) where non-proprietary BAID data can be freely downloaded. Recent advances include the addition of more than 2000 new research sites, provision of differential global position system (dGPS) and Unmanned Aerial Vehicle (UAV) support to visiting scientists, surveying over 80 miles of coastline to document rates of erosion, training of local GIS personal to better make use of science in local decision making, deployment and near real time connectivity to a wireless micrometeorological sensor network, links to Barrow area datasets housed at national data archives and substantial upgrades to the BAID website and web mapping applications.
Automatically finding relevant citations for clinical guideline development.
Bui, Duy Duc An; Jonnalagadda, Siddhartha; Del Fiol, Guilherme
2015-10-01
Literature database search is a crucial step in the development of clinical practice guidelines and systematic reviews. In the age of information technology, the process of literature search is still conducted manually, therefore it is costly, slow and subject to human errors. In this research, we sought to improve the traditional search approach using innovative query expansion and citation ranking approaches. We developed a citation retrieval system composed of query expansion and citation ranking methods. The methods are unsupervised and easily integrated over the PubMed search engine. To validate the system, we developed a gold standard consisting of citations that were systematically searched and screened to support the development of cardiovascular clinical practice guidelines. The expansion and ranking methods were evaluated separately and compared with baseline approaches. Compared with the baseline PubMed expansion, the query expansion algorithm improved recall (80.2% vs. 51.5%) with small loss on precision (0.4% vs. 0.6%). The algorithm could find all citations used to support a larger number of guideline recommendations than the baseline approach (64.5% vs. 37.2%, p<0.001). In addition, the citation ranking approach performed better than PubMed's "most recent" ranking (average precision +6.5%, recall@k +21.1%, p<0.001), PubMed's rank by "relevance" (average precision +6.1%, recall@k +14.8%, p<0.001), and the machine learning classifier that identifies scientifically sound studies from MEDLINE citations (average precision +4.9%, recall@k +4.2%, p<0.001). Our unsupervised query expansion and ranking techniques are more flexible and effective than PubMed's default search engine behavior and the machine learning classifier. Automated citation finding is promising to augment the traditional literature search. Copyright © 2015 Elsevier Inc. All rights reserved.
Masseroli, Marco; Kaitoua, Abdulrahman; Pinoli, Pietro; Ceri, Stefano
2016-12-01
While a huge amount of (epi)genomic data of multiple types is becoming available by using Next Generation Sequencing (NGS) technologies, the most important emerging problem is the so-called tertiary analysis, concerned with sense making, e.g., discovering how different (epi)genomic regions and their products interact and cooperate with each other. We propose a paradigm shift in tertiary analysis, based on the use of the Genomic Data Model (GDM), a simple data model which links genomic feature data to their associated experimental, biological and clinical metadata. GDM encompasses all the data formats which have been produced for feature extraction from (epi)genomic datasets. We specifically describe the mapping to GDM of SAM (Sequence Alignment/Map), VCF (Variant Call Format), NARROWPEAK (for called peaks produced by NGS ChIP-seq or DNase-seq methods), and BED (Browser Extensible Data) formats, but GDM supports as well all the formats describing experimental datasets (e.g., including copy number variations, DNA somatic mutations, or gene expressions) and annotations (e.g., regarding transcription start sites, genes, enhancers or CpG islands). We downloaded and integrated samples of all the above-mentioned data types and formats from multiple sources. The GDM is able to homogeneously describe semantically heterogeneous data and makes the ground for providing data interoperability, e.g., achieved through the GenoMetric Query Language (GMQL), a high-level, declarative query language for genomic big data. The combined use of the data model and the query language allows comprehensive processing of multiple heterogeneous data, and supports the development of domain-specific data-driven computations and bio-molecular knowledge discovery. Copyright © 2016 Elsevier Inc. All rights reserved.
Array Databases: Agile Analytics (not just) for the Earth Sciences
NASA Astrophysics Data System (ADS)
Baumann, P.; Misev, D.
2015-12-01
Gridded data, such as images, image timeseries, and climate datacubes, today are managed separately from the metadata, and with different, restricted retrieval capabilities. While databases are good at metadata modelled in tables, XML hierarchies, or RDF graphs, they traditionally do not support multi-dimensional arrays.This gap is being closed by Array Databases, pioneered by the scalable rasdaman ("raster data manager") array engine. Its declarative query language, rasql, extends SQL with array operators which are optimized and parallelized on server side. Installations can easily be mashed up securely, thereby enabling large-scale location-transparent query processing in federations. Domain experts value the integration with their commonly used tools leading to a quick learning curve.Earth, Space, and Life sciences, but also Social sciences as well as business have massive amounts of data and complex analysis challenges that are answered by rasdaman. As of today, rasdaman is mature and in operational use on hundreds of Terabytes of timeseries datacubes, with transparent query distribution across more than 1,000 nodes. Additionally, its concepts have shaped international Big Data standards in the field, including the forthcoming array extension to ISO SQL, many of which are supported by both open-source and commercial systems meantime. In the geo field, rasdaman is reference implementation for the Open Geospatial Consortium (OGC) Big Data standard, WCS, now also under adoption by ISO. Further, rasdaman is in the final stage of OSGeo incubation.In this contribution we present array queries a la rasdaman, describe the architecture and novel optimization and parallelization techniques introduced in 2015, and put this in context of the intercontinental EarthServer initiative which utilizes rasdaman for enabling agile analytics on Petascale datacubes.
A Python Geospatial Language Toolkit
NASA Astrophysics Data System (ADS)
Fillmore, D.; Pletzer, A.; Galloy, M.
2012-12-01
The volume and scope of geospatial data archives, such as collections of satellite remote sensing or climate model products, has been rapidly increasing and will continue to do so in the near future. The recently launched (October 2011) Suomi National Polar-orbiting Partnership satellite (NPP) for instance, is the first of a new generation of Earth observation platforms that will monitor the atmosphere, oceans, and ecosystems, and its suite of instruments will generate several terabytes each day in the form of multi-spectral images and derived datasets. Full exploitation of such data for scientific analysis and decision support applications has become a major computational challenge. Geophysical data exploration and knowledge discovery could benefit, in particular, from intelligent mechanisms for extracting and manipulating subsets of data relevant to the problem of interest. Potential developments include enhanced support for natural language queries and directives to geospatial datasets. The translation of natural language (that is, human spoken or written phrases) into complex but unambiguous objects and actions can be based on a context, or knowledge domain, that represents the underlying geospatial concepts. This poster describes a prototype Python module that maps English phrases onto basic geospatial objects and operations. This module, along with the associated computational geometry methods, enables the resolution of natural language directives that include geographic regions of arbitrary shape and complexity.
Using web technology and Java mobile software agents to manage outside referrals.
Murphy, S. N.; Ng, T.; Sittig, D. F.; Barnett, G. O.
1998-01-01
A prototype, web-based referral application was created with the objective of providing outside primary care providers (PCP's) the means to refer patients to the Massachusetts General Hospital and the Brigham and Women's Hospital. The application was designed to achieve the two primary objectives of providing the consultant with enough data to make decisions even at the initial visit, and providing the PCP with a prompt response from the consultant. The system uses a web browser/server to initiate the referral and Java mobile software agents to support the workflow of the referral. This combination provides a light client implementation that can run on a wide variety of hardware and software platforms found in the office of the PCP. The implementation can guarantee a high degree of security for the computer of the PCP. Agents can be adapted to support the wide variety of data types that may be used in referral transactions, including reports with complex presentation needs and scanned (faxed) images Agents can be delivered to the PCP as running applications that can perform ongoing queries and alerts at the office of the PCP. Finally, the agent architecture is designed to scale in a natural and seamless manner for unforeseen future needs. PMID:9929190
Careyva, Beth; Shaak, Kyle; Mills, Geoffrey; Johnson, Melanie; Goodrich, Samantha; Stello, Brian; Wallace, Lorraine S
2016-01-01
Technology-based patient engagement strategies (such as patient portals) are increasingly available, yet little is known about current use and barriers within practice-based research networks (PBRNs). PBRN directors have unique opportunities to inform the implementation of patient-facing technology and to translate these findings into practice. PBRN directors were queried regarding technology-based patient engagement strategies as part of the 2015 CAFM Educational Research Alliance (CERA) survey of PBRN directors. A total of 102 PBRN directors were identified via the Agency for Healthcare Research and Quality's registry; 54 of 96 eligible PBRN directors completed the survey, for a response rate of 56%. Use of technology-based patient engagement strategies within PBRNs was limited, with less than half of respondents reporting experience with the most frequently named tools (risk assessments/decision aids). Information technology (IT) support was the top barrier, followed by low rates of portal enrollment. For engaging participant practices, workload and practice leadership were cited as most important, with fewer respondents noting concerns about patient privacy. Given limited use of patient-facing technologies, PBRNs have an opportunity to clarify the optimal use of these strategies. Providing IT support and addressing clinician concerns regarding workload may facilitate the inclusion of innovative technologies in PBRNs. © Copyright 2016 by the American Board of Family Medicine.
Abugessaisa, Imad; Saevarsdottir, Saedis; Tsipras, Giorgos; Lindblad, Staffan; Sandin, Charlotta; Nikamo, Pernilla; Ståhle, Mona; Malmström, Vivianne; Klareskog, Lars; Tegnér, Jesper
2014-01-01
Translational medicine is becoming increasingly dependent upon data generated from health care, clinical research, and molecular investigations. This increasing rate of production and diversity in data has brought about several challenges, including the need to integrate fragmented databases, enable secondary use of patient clinical data from health care in clinical research, and to create information systems that clinicians and biomedical researchers can readily use. Our case study effectively integrates requirements from the clinical and biomedical researcher perspectives in a translational medicine setting. Our three principal achievements are (a) a design of a user-friendly web-based system for management and integration of clinical and molecular databases, while adhering to proper de-identification and security measures; (b) providing a real-world test of the system functionalities using clinical cohorts; and (c) system integration with a clinical decision support system to demonstrate system interoperability. We engaged two active clinical cohorts, 747 psoriasis patients and 2001 rheumatoid arthritis patients, to demonstrate efficient query possibilities across the data sources, enable cohort stratification, extract variation in antibody patterns, study biomarker predictors of treatment response in RA patients, and to explore metabolic profiles of psoriasis patients. Finally, we demonstrated system interoperability by enabling integration with an established clinical decision support system in health care. To assure the usefulness and usability of the system, we followed two approaches. First, we created a graphical user interface supporting all user interactions. Secondly we carried out a system performance evaluation study where we measured the average response time in seconds for active users, http errors, and kilobits per second received and sent. The maximum response time was found to be 0.12 seconds; no server or client errors of any kind were detected. In conclusion, the system can readily be used by clinicians and biomedical researchers in a translational medicine setting. PMID:25203647
Abugessaisa, Imad; Saevarsdottir, Saedis; Tsipras, Giorgos; Lindblad, Staffan; Sandin, Charlotta; Nikamo, Pernilla; Ståhle, Mona; Malmström, Vivianne; Klareskog, Lars; Tegnér, Jesper
2014-01-01
Translational medicine is becoming increasingly dependent upon data generated from health care, clinical research, and molecular investigations. This increasing rate of production and diversity in data has brought about several challenges, including the need to integrate fragmented databases, enable secondary use of patient clinical data from health care in clinical research, and to create information systems that clinicians and biomedical researchers can readily use. Our case study effectively integrates requirements from the clinical and biomedical researcher perspectives in a translational medicine setting. Our three principal achievements are (a) a design of a user-friendly web-based system for management and integration of clinical and molecular databases, while adhering to proper de-identification and security measures; (b) providing a real-world test of the system functionalities using clinical cohorts; and (c) system integration with a clinical decision support system to demonstrate system interoperability. We engaged two active clinical cohorts, 747 psoriasis patients and 2001 rheumatoid arthritis patients, to demonstrate efficient query possibilities across the data sources, enable cohort stratification, extract variation in antibody patterns, study biomarker predictors of treatment response in RA patients, and to explore metabolic profiles of psoriasis patients. Finally, we demonstrated system interoperability by enabling integration with an established clinical decision support system in health care. To assure the usefulness and usability of the system, we followed two approaches. First, we created a graphical user interface supporting all user interactions. Secondly we carried out a system performance evaluation study where we measured the average response time in seconds for active users, http errors, and kilobits per second received and sent. The maximum response time was found to be 0.12 seconds; no server or client errors of any kind were detected. In conclusion, the system can readily be used by clinicians and biomedical researchers in a translational medicine setting.
NASA Astrophysics Data System (ADS)
Weltzin, J. F.
2017-12-01
Phenological data from a variety of platforms - across a range of spatial and temporal scales - are required to support research, natural resource management, and policy- and decision-making in a changing world. Observational and modeled phenological data, especially when integrated with associated biophysical data (e.g., climate, land-use/land-cover, hydrology) has great potential to provide multi-faceted information critical to decision support systems, vulnerability and risk assessments, change detection applications, and early-warning and forecasting systems for natural and modified ecosystems. The USA National Phenology Network (USA-NPN; www.usanpn.org) is a national-scale science and monitoring initiative focused on understanding the drivers and feedback effects of phenological variation in changing environments. The Network maintains a centralized database of over 10M ground-based observations of plants and animals for 1954-present, and leverages these data to produce operational data products for use by a variety of audiences, including researchers and resource managers. This presentation highlights our operational data products, including the tools, maps, and services that facilitate discovery, accessibility and usability of integrated phenological information. We describe (1) the data download tool, a customizable GUI that provides geospatially referenced raw, bounded or summarized organismal and climatological data and associated metadata (including calendars, time-series curves, and XY graphs), (2) the visualization tool, which provides opportunities to explore, visualize and export or download both organismal and modeled (gridded) products at daily time-steps and relatively fine spatial resolutions ( 2.5 km to 4 km) for the period 1980 to 6 days into the future, and (3) web services that enable custom query and download of map, feature and cover services in a variety of standard formats. These operational products facilitate scaling of integrated phenological and associated data to landscapes and regions, and enable novel investigations of biophysical interactions at unprecedented scales, e.g., continental-scale migrations.
Legaz-García, María Del Carmen; Dentler, Kathrin; Fernández-Breis, Jesualdo Tomás; Cornet, Ronald
2017-01-01
ArchMS is a framework that represents clinical information and knowledge using ontologies in OWL, which facilitates semantic interoperability and thereby the exploitation and secondary use of clinical data. However, it does not yet support the automated assessment of quality of care. CLIF is a stepwise method to formalize quality indicators. The method has been implemented in the CLIF tool which supports its users in generating computable queries based on a patient data model which can be based on archetypes. To enable the automated computation of quality indicators using ontologies and archetypes, we tested whether ArchMS and the CLIF tool can be integrated. We successfully automated the process of generating SPARQL queries from quality indicators that have been formalized with CLIF and integrated them into ArchMS. Hence, ontologies and archetypes can be combined for the execution of formalized quality indicators.
A Distributed Multi-Agent System for Collaborative Information Management and Learning
NASA Technical Reports Server (NTRS)
Chen, James R.; Wolfe, Shawn R.; Wragg, Stephen D.; Koga, Dennis (Technical Monitor)
2000-01-01
In this paper, we present DIAMS, a system of distributed, collaborative agents to help users access, manage, share and exchange information. A DIAMS personal agent helps its owner find information most relevant to current needs. It provides tools and utilities for users to manage their information repositories with dynamic organization and virtual views. Flexible hierarchical display is integrated with indexed query search-to support effective information access. Automatic indexing methods are employed to support user queries and communication between agents. Contents of a repository are kept in object-oriented storage to facilitate information sharing. Collaboration between users is aided by easy sharing utilities as well as automated information exchange. Matchmaker agents are designed to establish connections between users with similar interests and expertise. DIAMS agents provide needed services for users to share and learn information from one another on the World Wide Web.
Indexing and retrieving point and region objects
NASA Astrophysics Data System (ADS)
Ibrahim, Azzam T.; Fotouhi, Farshad A.
1996-03-01
R-tree and its variants are examples of spatial data structures for paged-secondary memory. To process a query, these structures require multiple path traversals. In this paper, we present a new image access method, SB+-tree which requires a single path traversal to process a query. Also, SB+-tree will allow commercial databases an access method for spatial objects without a major change, since most commercial databases already support B+-tree as an access method for text data. The SB+-tree can be used for zero and non-zero size data objects. Non-zero size objects are approximated by their minimum bounding rectangles (MBRs). The number of SB+-trees generated is dependent upon the number of dimensions of the approximation of the object. The structure supports efficient spatial operations such as regions-overlap, distance and direction. In this paper, we experimentally and analytically demonstrate the superiority of SB+-tree over R-tree.
Agrafiotis, Dimitris K; Alex, Simson; Dai, Heng; Derkinderen, An; Farnum, Michael; Gates, Peter; Izrailev, Sergei; Jaeger, Edward P; Konstant, Paul; Leung, Albert; Lobanov, Victor S; Marichal, Patrick; Martin, Douglas; Rassokhin, Dmitrii N; Shemanarev, Maxim; Skalkin, Andrew; Stong, John; Tabruyn, Tom; Vermeiren, Marleen; Wan, Jackson; Xu, Xiang Yang; Yao, Xiang
2007-01-01
We present ABCD, an integrated drug discovery informatics platform developed at Johnson & Johnson Pharmaceutical Research & Development, L.L.C. ABCD is an attempt to bridge multiple continents, data systems, and cultures using modern information technology and to provide scientists with tools that allow them to analyze multifactorial SAR and make informed, data-driven decisions. The system consists of three major components: (1) a data warehouse, which combines data from multiple chemical and pharmacological transactional databases, designed for supreme query performance; (2) a state-of-the-art application suite, which facilitates data upload, retrieval, mining, and reporting, and (3) a workspace, which facilitates collaboration and data sharing by allowing users to share queries, templates, results, and reports across project teams, campuses, and other organizational units. Chemical intelligence, performance, and analytical sophistication lie at the heart of the new system, which was developed entirely in-house. ABCD is used routinely by more than 1000 scientists around the world and is rapidly expanding into other functional areas within the J&J organization.
Tourassi, Georgia D; Harrawood, Brian; Singh, Swatee; Lo, Joseph Y; Floyd, Carey E
2007-01-01
The purpose of this study was to evaluate image similarity measures employed in an information-theoretic computer-assisted detection (IT-CAD) scheme. The scheme was developed for content-based retrieval and detection of masses in screening mammograms. The study is aimed toward an interactive clinical paradigm where physicians query the proposed IT-CAD scheme on mammographic locations that are either visually suspicious or indicated as suspicious by other cuing CAD systems. The IT-CAD scheme provides an evidence-based, second opinion for query mammographic locations using a knowledge database of mass and normal cases. In this study, eight entropy-based similarity measures were compared with respect to retrieval precision and detection accuracy using a database of 1820 mammographic regions of interest. The IT-CAD scheme was then validated on a separate database for false positive reduction of progressively more challenging visual cues generated by an existing, in-house mass detection system. The study showed that the image similarity measures fall into one of two categories; one category is better suited to the retrieval of semantically similar cases while the second is more effective with knowledge-based decisions regarding the presence of a true mass in the query location. In addition, the IT-CAD scheme yielded a substantial reduction in false-positive detections while maintaining high detection rate for malignant masses.
Performance Prediction of a MongoDB-Based Traceability System in Smart Factory Supply Chains
Kang, Yong-Shin; Park, Il-Ha; Youm, Sekyoung
2016-01-01
In the future, with the advent of the smart factory era, manufacturing and logistics processes will become more complex, and the complexity and criticality of traceability will further increase. This research aims at developing a performance assessment method to verify scalability when implementing traceability systems based on key technologies for smart factories, such as Internet of Things (IoT) and BigData. To this end, based on existing research, we analyzed traceability requirements and an event schema for storing traceability data in MongoDB, a document-based Not Only SQL (NoSQL) database. Next, we analyzed the algorithm of the most representative traceability query and defined a query-level performance model, which is composed of response times for the components of the traceability query algorithm. Next, this performance model was solidified as a linear regression model because the response times increase linearly by a benchmark test. Finally, for a case analysis, we applied the performance model to a virtual automobile parts logistics. As a result of the case study, we verified the scalability of a MongoDB-based traceability system and predicted the point when data node servers should be expanded in this case. The traceability system performance assessment method proposed in this research can be used as a decision-making tool for hardware capacity planning during the initial stage of construction of traceability systems and during their operational phase. PMID:27983654
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tourassi, Georgia D.; Harrawood, Brian; Singh, Swatee
The purpose of this study was to evaluate image similarity measures employed in an information-theoretic computer-assisted detection (IT-CAD) scheme. The scheme was developed for content-based retrieval and detection of masses in screening mammograms. The study is aimed toward an interactive clinical paradigm where physicians query the proposed IT-CAD scheme on mammographic locations that are either visually suspicious or indicated as suspicious by other cuing CAD systems. The IT-CAD scheme provides an evidence-based, second opinion for query mammographic locations using a knowledge database of mass and normal cases. In this study, eight entropy-based similarity measures were compared with respect to retrievalmore » precision and detection accuracy using a database of 1820 mammographic regions of interest. The IT-CAD scheme was then validated on a separate database for false positive reduction of progressively more challenging visual cues generated by an existing, in-house mass detection system. The study showed that the image similarity measures fall into one of two categories; one category is better suited to the retrieval of semantically similar cases while the second is more effective with knowledge-based decisions regarding the presence of a true mass in the query location. In addition, the IT-CAD scheme yielded a substantial reduction in false-positive detections while maintaining high detection rate for malignant masses.« less
Performance Prediction of a MongoDB-Based Traceability System in Smart Factory Supply Chains.
Kang, Yong-Shin; Park, Il-Ha; Youm, Sekyoung
2016-12-14
In the future, with the advent of the smart factory era, manufacturing and logistics processes will become more complex, and the complexity and criticality of traceability will further increase. This research aims at developing a performance assessment method to verify scalability when implementing traceability systems based on key technologies for smart factories, such as Internet of Things (IoT) and BigData. To this end, based on existing research, we analyzed traceability requirements and an event schema for storing traceability data in MongoDB, a document-based Not Only SQL (NoSQL) database. Next, we analyzed the algorithm of the most representative traceability query and defined a query-level performance model, which is composed of response times for the components of the traceability query algorithm. Next, this performance model was solidified as a linear regression model because the response times increase linearly by a benchmark test. Finally, for a case analysis, we applied the performance model to a virtual automobile parts logistics. As a result of the case study, we verified the scalability of a MongoDB-based traceability system and predicted the point when data node servers should be expanded in this case. The traceability system performance assessment method proposed in this research can be used as a decision-making tool for hardware capacity planning during the initial stage of construction of traceability systems and during their operational phase.
Score Fusion and Decision Fusion for the Performance Improvement of Face Recognition
2013-07-01
0.1). A Hamming distance (HD) [7] is calculated with the FP-CGF to measure the similarities among faces. The matched face has the shortest HD from...then put into a face pattern byte (FPB) pixel- by-pixel. A HD is calculated with the FPB to measure the similarities among faces, and recognition is...all query users are included in the database), the recognition performance can be measured by a verification rate (VR), the percentage of the
Design and implementation of visualization methods for the CHANGES Spatial Decision Support System
NASA Astrophysics Data System (ADS)
Cristal, Irina; van Westen, Cees; Bakker, Wim; Greiving, Stefan
2014-05-01
The CHANGES Spatial Decision Support System (SDSS) is a web-based system aimed for risk assessment and the evaluation of optimal risk reduction alternatives at local level as a decision support tool in long-term natural risk management. The SDSS use multidimensional information, integrating thematic, spatial, temporal and documentary data. The role of visualization in this context becomes of vital importance for efficiently representing each dimension. This multidimensional aspect of the required for the system risk information, combined with the diversity of the end-users imposes the use of sophisticated visualization methods and tools. The key goal of the present work is to exploit efficiently the large amount of data in relation to the needs of the end-user, utilizing proper visualization techniques. Three main tasks have been accomplished for this purpose: categorization of the end-users, the definition of system's modules and the data definition. The graphical representation of the data and the visualization tools were designed to be relevant to the data type and the purpose of the analysis. Depending on the end-users category, each user should have access to different modules of the system and thus, to the proper visualization environment. The technologies used for the development of the visualization component combine the latest and most innovative open source JavaScript frameworks, such as OpenLayers 2.13.1, ExtJS 4 and GeoExt 2. Moreover, the model-view-controller (MVC) pattern is used in order to ensure flexibility of the system at the implementation level. Using the above technologies, the visualization techniques implemented so far offer interactive map navigation, querying and comparison tools. The map comparison tools are of great importance within the SDSS and include the following: swiping tool for comparison of different data of the same location; raster subtraction for comparison of the same phenomena varying in time; linked views for comparison of data from different locations and a time slider tool for monitoring changes in spatio-temporal data. All these techniques are part of the interactive interface of the system and make use of spatial and spatio-temporal data. Further significant aspects of the visualization component include conventional cartographic techniques and visualization of non-spatial data. The main expectation from the present work is to offer efficient visualization of risk-related data in order to facilitate the decision making process, which is the final purpose of the CHANGES SDSS. This work is part of the "CHANGES" project, funded by the European Community's 7th Framework Programme.
Supporting infobuttons with terminological knowledge.
Cimino, J. J.; Elhanan, G.; Zeng, Q.
1997-01-01
We have developed several prototype applications which integrate clinical systems with on-line information resources by using patient data to drive queries in response to user information needs. We refer to these collectively as infobuttons because they are evoked with a minimum of keyboard entry. We make use of knowledge in our terminology, the Medical Entities Dictionary (MED) to assist with the selection of appropriate queries and resources, as well as the translation of patient data to forms recognized by the resources. This paper describes the kinds of knowledge in the MED, including literal attributes, hierarchical links and other semantic links, and how this knowledge is used in system integration. PMID:9357682
Hewitt, Robin; Gobbi, Alberto; Lee, Man-Ling
2005-01-01
Relational databases are the current standard for storing and retrieving data in the pharmaceutical and biotech industries. However, retrieving data from a relational database requires specialized knowledge of the database schema and of the SQL query language. At Anadys, we have developed an easy-to-use system for searching and reporting data in a relational database to support our drug discovery project teams. This system is fast and flexible and allows users to access all data without having to write SQL queries. This paper presents the hierarchical, graph-based metadata representation and SQL-construction methods that, together, are the basis of this system's capabilities.
Supporting infobuttons with terminological knowledge.
Cimino, J J; Elhanan, G; Zeng, Q
1997-01-01
We have developed several prototype applications which integrate clinical systems with on-line information resources by using patient data to drive queries in response to user information needs. We refer to these collectively as infobuttons because they are evoked with a minimum of keyboard entry. We make use of knowledge in our terminology, the Medical Entities Dictionary (MED) to assist with the selection of appropriate queries and resources, as well as the translation of patient data to forms recognized by the resources. This paper describes the kinds of knowledge in the MED, including literal attributes, hierarchical links and other semantic links, and how this knowledge is used in system integration.
Mercury Toolset for Spatiotemporal Metadata
NASA Technical Reports Server (NTRS)
Wilson, Bruce E.; Palanisamy, Giri; Devarakonda, Ranjeet; Rhyne, B. Timothy; Lindsley, Chris; Green, James
2010-01-01
Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily) harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.
Yu, Bowen; Doraiswamy, Harish; Chen, Xi; Miraldi, Emily; Arrieta-Ortiz, Mario Luis; Hafemeister, Christoph; Madar, Aviv; Bonneau, Richard; Silva, Cláudio T
2014-12-01
Elucidation of transcriptional regulatory networks (TRNs) is a fundamental goal in biology, and one of the most important components of TRNs are transcription factors (TFs), proteins that specifically bind to gene promoter and enhancer regions to alter target gene expression patterns. Advances in genomic technologies as well as advances in computational biology have led to multiple large regulatory network models (directed networks) each with a large corpus of supporting data and gene-annotation. There are multiple possible biological motivations for exploring large regulatory network models, including: validating TF-target gene relationships, figuring out co-regulation patterns, and exploring the coordination of cell processes in response to changes in cell state or environment. Here we focus on queries aimed at validating regulatory network models, and on coordinating visualization of primary data and directed weighted gene regulatory networks. The large size of both the network models and the primary data can make such coordinated queries cumbersome with existing tools and, in particular, inhibits the sharing of results between collaborators. In this work, we develop and demonstrate a web-based framework for coordinating visualization and exploration of expression data (RNA-seq, microarray), network models and gene-binding data (ChIP-seq). Using specialized data structures and multiple coordinated views, we design an efficient querying model to support interactive analysis of the data. Finally, we show the effectiveness of our framework through case studies for the mouse immune system (a dataset focused on a subset of key cellular functions) and a model bacteria (a small genome with high data-completeness).
Mercury Toolset for Spatiotemporal Metadata
NASA Astrophysics Data System (ADS)
Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce; Rhyne, B. Timothy; Lindsley, Chris
2010-06-01
Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily)harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.
Oliveira, S R M; Almeida, G V; Souza, K R R; Rodrigues, D N; Kuser-Falcão, P R; Yamagishi, M E B; Santos, E H; Vieira, F D; Jardine, J G; Neshich, G
2007-10-05
An effective strategy for managing protein databases is to provide mechanisms to transform raw data into consistent, accurate and reliable information. Such mechanisms will greatly reduce operational inefficiencies and improve one's ability to better handle scientific objectives and interpret the research results. To achieve this challenging goal for the STING project, we introduce Sting_RDB, a relational database of structural parameters for protein analysis with support for data warehousing and data mining. In this article, we highlight the main features of Sting_RDB and show how a user can explore it for efficient and biologically relevant queries. Considering its importance for molecular biologists, effort has been made to advance Sting_RDB toward data quality assessment. To the best of our knowledge, Sting_RDB is one of the most comprehensive data repositories for protein analysis, now also capable of providing its users with a data quality indicator. This paper differs from our previous study in many aspects. First, we introduce Sting_RDB, a relational database with mechanisms for efficient and relevant queries using SQL. Sting_rdb evolved from the earlier, text (flat file)-based database, in which data consistency and integrity was not guaranteed. Second, we provide support for data warehousing and mining. Third, the data quality indicator was introduced. Finally and probably most importantly, complex queries that could not be posed on a text-based database, are now easily implemented. Further details are accessible at the Sting_RDB demo web page: http://www.cbi.cnptia.embrapa.br/StingRDB.
Classification of Chemical Compounds to Support Complex Queries in a Pathway Database
Weidemann, Andreas; Kania, Renate; Peiss, Christian; Rojas, Isabel
2004-01-01
Data quality in biological databases has become a topic of great discussion. To provide high quality data and to deal with the vast amount of biochemical data, annotators and curators need to be supported by software that carries out part of their work in an (semi-) automatic manner. The detection of errors and inconsistencies is a part that requires the knowledge of domain experts, thus in most cases it is done manually, making it very expensive and time-consuming. This paper presents two tools to partially support the curation of data on biochemical pathways. The tool enables the automatic classification of chemical compounds based on their respective SMILES strings. Such classification allows the querying and visualization of biochemical reactions at different levels of abstraction, according to the level of detail at which the reaction participants are described. Chemical compounds can be classified in a flexible manner based on different criteria. The support of the process of data curation is provided by facilitating the detection of compounds that are identified as different but that are actually the same. This is also used to identify similar reactions and, in turn, pathways. PMID:18629066
The NOAA Local Climate Analysis Tool - An Application in Support of a Weather Ready Nation
NASA Astrophysics Data System (ADS)
Timofeyeva, M. M.; Horsfall, F. M.
2012-12-01
Citizens across the U.S., including decision makers from the local to the national level, have a multitude of questions about climate, such as the current state and how that state fits into the historical context, and more importantly, how climate will impact them, especially with regard to linkages to extreme weather events. Developing answers to these types of questions for locations has typically required extensive work to gather data, conduct analyses, and generate relevant explanations and graphics. Too frequently providers don't have ready access to or knowledge of reliable, trusted data sets, nor sound, scientifically accepted analysis techniques such that they can provide a rapid response to queries they receive. In order to support National Weather Service (NWS) local office forecasters with information they need to deliver timely responses to climate-related questions from their customers, we have developed the Local Climate Analysis Tool (LCAT). LCAT uses the principles of artificial intelligence to respond to queries, in particular, through use of machine technology that responds intelligently to input from users. A user translates customer questions into primary variables and issues and LCAT pulls the most relevant data and analysis techniques to provide information back to the user, who in turn responds to their customer. Most responses take on the order of 10 seconds, which includes providing statistics, graphical displays of information, translations for users, metadata, and a summary of the user request to LCAT. Applications in Phase I of LCAT, which is targeted for the NWS field offices, include Climate Change Impacts, Climate Variability Impacts, Drought Analysis and Impacts, Water Resources Applications, Attribution of Extreme Events, and analysis techniques such as time series analysis, trend analysis, compositing, and correlation and regression techniques. Data accessed by LCAT are homogenized historical COOP and Climate Prediction Center climate division data available at NCDC. Applications for other NOAA offices and Federal agencies are currently being investigated, such as incorporation of tidal data, fish stocks, sea surface temperature, health-related data, and analyses relevant to those datasets. We will describe LCAT, its basic functionality, examples of analyses, and progress being made to provide the tool to a broader audience in support of ocean, fisheries, and health applications.
State & Society: Presidential Candidates Answer Queries on Science Policy
ERIC Educational Resources Information Center
Physics Today, 1976
1976-01-01
Presents views of Gerald Ford and Jimmy Carter on the role of science advisors in the Executive Office of the President, national energy needs and the nuclear power program, and federal support for basic and applied science. (MLH)
Data Sharing in DHT Based P2P Systems
NASA Astrophysics Data System (ADS)
Roncancio, Claudia; Del Pilar Villamil, María; Labbé, Cyril; Serrano-Alvarado, Patricia
The evolution of peer-to-peer (P2P) systems triggered the building of large scale distributed applications. The main application domain is data sharing across a very large number of highly autonomous participants. Building such data sharing systems is particularly challenging because of the “extreme” characteristics of P2P infrastructures: massive distribution, high churn rate, no global control, potentially untrusted participants... This article focuses on declarative querying support, query optimization and data privacy on a major class of P2P systems, that based on Distributed Hash Table (P2P DHT). The usual approaches and the algorithms used by classic distributed systems and databases for providing data privacy and querying services are not well suited to P2P DHT systems. A considerable amount of work was required to adapt them for the new challenges such systems present. This paper describes the most important solutions found. It also identifies important future research trends in data management in P2P DHT systems.
Query optimization for graph analytics on linked data using SPARQL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hong, Seokyong; Lee, Sangkeun; Lim, Seung -Hwan
2015-07-01
Triplestores that support query languages such as SPARQL are emerging as the preferred and scalable solution to represent data and meta-data as massive heterogeneous graphs using Semantic Web standards. With increasing adoption, the desire to conduct graph-theoretic mining and exploratory analysis has also increased. Addressing that desire, this paper presents a solution that is the marriage of Graph Theory and the Semantic Web. We present software that can analyze Linked Data using graph operations such as counting triangles, finding eccentricity, testing connectedness, and computing PageRank directly on triple stores via the SPARQL interface. We describe the process of optimizing performancemore » of the SPARQL-based implementation of such popular graph algorithms by reducing the space-overhead, simplifying iterative complexity and removing redundant computations by understanding query plans. Our optimized approach shows significant performance gains on triplestores hosted on stand-alone workstations as well as hardware-optimized scalable supercomputers such as the Cray XMT.« less
Image query and indexing for digital x rays
NASA Astrophysics Data System (ADS)
Long, L. Rodney; Thoma, George R.
1998-12-01
The web-based medical information retrieval system (WebMIRS) allows interned access to databases containing 17,000 digitized x-ray spine images and associated text data from National Health and Nutrition Examination Surveys (NHANES). WebMIRS allows SQL query of the text, and viewing of the returned text records and images using a standard browser. We are now working (1) to determine utility of data directly derived from the images in our databases, and (2) to investigate the feasibility of computer-assisted or automated indexing of the images to support image retrieval of images of interest to biomedical researchers in the field of osteoarthritis. To build an initial database based on image data, we are manually segmenting a subset of the vertebrae, using techniques from vertebral morphometry. From this, we will derive and add to the database vertebral features. This image-derived data will enhance the user's data access capability by enabling the creation of combined SQL/image-content queries.
Toward An Unstructured Mesh Database
NASA Astrophysics Data System (ADS)
Rezaei Mahdiraji, Alireza; Baumann, Peter Peter
2014-05-01
Unstructured meshes are used in several application domains such as earth sciences (e.g., seismology), medicine, oceanography, cli- mate modeling, GIS as approximate representations of physical objects. Meshes subdivide a domain into smaller geometric elements (called cells) which are glued together by incidence relationships. The subdivision of a domain allows computational manipulation of complicated physical structures. For instance, seismologists model earthquakes using elastic wave propagation solvers on hexahedral meshes. The hexahedral con- tains several hundred millions of grid points and millions of hexahedral cells. Each vertex node in the hexahedrals stores a multitude of data fields. To run simulation on such meshes, one needs to iterate over all the cells, iterate over incident cells to a given cell, retrieve coordinates of cells, assign data values to cells, etc. Although meshes are used in many application domains, to the best of our knowledge there is no database vendor that support unstructured mesh features. Currently, the main tool for querying and manipulating unstructured meshes are mesh libraries, e.g., CGAL and GRAL. Mesh li- braries are dedicated libraries which includes mesh algorithms and can be run on mesh representations. The libraries do not scale with dataset size, do not have declarative query language, and need deep C++ knowledge for query implementations. Furthermore, due to high coupling between the implementations and input file structure, the implementations are less reusable and costly to maintain. A dedicated mesh database offers the following advantages: 1) declarative querying, 2) ease of maintenance, 3) hiding mesh storage structure from applications, and 4) transparent query optimization. To design a mesh database, the first challenge is to define a suitable generic data model for unstructured meshes. We proposed ImG-Complexes data model as a generic topological mesh data model which extends incidence graph model to multi-incidence relationships. We instrument ImG model with sets of optional and application-specific constraints which can be used to check validity of meshes for a specific class of object such as manifold, pseudo-manifold, and simplicial manifold. We conducted experiments to measure the performance of the graph database solution in processing mesh queries and compare it with GrAL mesh library and PostgreSQL database on synthetic and real mesh datasets. The experiments show that each system perform well on specific types of mesh queries, e.g., graph databases perform well on global path-intensive queries. In the future, we investigate database operations for the ImG model and design a mesh query language.
Sundanese ancient manuscripts search engine using probability approach
NASA Astrophysics Data System (ADS)
Suryani, Mira; Hadi, Setiawan; Paulus, Erick; Nurma Yulita, Intan; Supriatna, Asep K.
2017-10-01
Today, Information and Communication Technology (ICT) has become a regular thing for every aspect of live include cultural and heritage aspect. Sundanese ancient manuscripts as Sundanese heritage are in damage condition and also the information that containing on it. So in order to preserve the information in Sundanese ancient manuscripts and make them easier to search, a search engine has been developed. The search engine must has good computing ability. In order to get the best computation in developed search engine, three types of probabilistic approaches: Bayesian Networks Model, Divergence from Randomness with PL2 distribution, and DFR-PL2F as derivative form DFR-PL2 have been compared in this study. The three probabilistic approaches supported by index of documents and three different weighting methods: term occurrence, term frequency, and TF-IDF. The experiment involved 12 Sundanese ancient manuscripts. From 12 manuscripts there are 474 distinct terms. The developed search engine tested by 50 random queries for three types of query. The experiment results showed that for the single query and multiple query, the best searching performance given by the combination of PL2F approach and TF-IDF weighting method. The performance has been evaluated using average time responds with value about 0.08 second and Mean Average Precision (MAP) about 0.33.
Omicseq: a web-based search engine for exploring omics datasets
Sun, Xiaobo; Pittard, William S.; Xu, Tianlei; Chen, Li; Zwick, Michael E.; Jiang, Xiaoqian; Wang, Fusheng
2017-01-01
Abstract The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve ‘findability’ of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. PMID:28402462
Human motion retrieval from hand-drawn sketch.
Chao, Min-Wen; Lin, Chao-Hung; Assa, Jackie; Lee, Tong-Yee
2012-05-01
The rapid growth of motion capture data increases the importance of motion retrieval. The majority of the existing motion retrieval approaches are based on a labor-intensive step in which the user browses and selects a desired query motion clip from the large motion clip database. In this work, a novel sketching interface for defining the query is presented. This simple approach allows users to define the required motion by sketching several motion strokes over a drawn character, which requires less effort and extends the users’ expressiveness. To support the real-time interface, a specialized encoding of the motions and the hand-drawn query is required. Here, we introduce a novel hierarchical encoding scheme based on a set of orthonormal spherical harmonic (SH) basis functions, which provides a compact representation, and avoids the CPU/processing intensive stage of temporal alignment used by previous solutions. Experimental results show that the proposed approach can well retrieve the motions, and is capable of retrieve logically and numerically similar motions, which is superior to previous approaches. The user study shows that the proposed system can be a useful tool to input motion query if the users are familiar with it. Finally, an application of generating a 3D animation from a hand-drawn comics strip is demonstrated.
A rank-based Prediction Algorithm of Learning User's Intention
NASA Astrophysics Data System (ADS)
Shen, Jie; Gao, Ying; Chen, Cang; Gong, HaiPing
Internet search has become an important part in people's daily life. People can find many types of information to meet different needs through search engines on the Internet. There are two issues for the current search engines: first, the users should predetermine the types of information they want and then change to the appropriate types of search engine interfaces. Second, most search engines can support multiple kinds of search functions, each function has its own separate search interface. While users need different types of information, they must switch between different interfaces. In practice, most queries are corresponding to various types of information results. These queries can search the relevant results in various search engines, such as query "Palace" contains the websites about the introduction of the National Palace Museum, blog, Wikipedia, some pictures and video information. This paper presents a new aggregative algorithm for all kinds of search results. It can filter and sort the search results by learning three aspects about the query words, search results and search history logs to achieve the purpose of detecting user's intention. Experiments demonstrate that this rank-based method for multi-types of search results is effective. It can meet the user's search needs well, enhance user's satisfaction, provide an effective and rational model for optimizing search engines and improve user's search experience.
An Improvement to a Multi-Client Searchable Encryption Scheme for Boolean Queries.
Jiang, Han; Li, Xue; Xu, Qiuliang
2016-12-01
The migration of e-health systems to the cloud computing brings huge benefits, as same as some security risks. Searchable Encryption(SE) is a cryptography encryption scheme that can protect the confidentiality of data and utilize the encrypted data at the same time. The SE scheme proposed by Cash et al. in Crypto2013 and its follow-up work in CCS2013 are most practical SE Scheme that support Boolean queries at present. In their scheme, the data user has to generate the search tokens by the counter number one by one and interact with server repeatedly, until he meets the correct one, or goes through plenty of tokens to illustrate that there is no search result. In this paper, we make an improvement to their scheme. We allow server to send back some information and help the user to generate exact search token in the search phase. In our scheme, there are only two round interaction between server and user, and the search token has [Formula: see text] elements, where n is the keywords number in query expression, and [Formula: see text] is the minimum documents number that contains one of keyword in query expression, and the computation cost of server is [Formula: see text] modular exponentiation operation.
Designing integrated computational biology pipelines visually.
Jamil, Hasan M
2013-01-01
The long-term cost of developing and maintaining a computational pipeline that depends upon data integration and sophisticated workflow logic is too high to even contemplate "what if" or ad hoc type queries. In this paper, we introduce a novel application building interface for computational biology research, called VizBuilder, by leveraging a recent query language called BioFlow for life sciences databases. Using VizBuilder, it is now possible to develop ad hoc complex computational biology applications at throw away costs. The underlying query language supports data integration and workflow construction almost transparently and fully automatically, using a best effort approach. Users express their application by drawing it with VizBuilder icons and connecting them in a meaningful way. Completed applications are compiled and translated as BioFlow queries for execution by the data management system LifeDB, for which VizBuilder serves as a front end. We discuss VizBuilder features and functionalities in the context of a real life application after we briefly introduce BioFlow. The architecture and design principles of VizBuilder are also discussed. Finally, we outline future extensions of VizBuilder. To our knowledge, VizBuilder is a unique system that allows visually designing computational biology pipelines involving distributed and heterogeneous resources in an ad hoc manner.
Mapping analysis and planning system for the John F. Kennedy Space Center
NASA Technical Reports Server (NTRS)
Hall, C. R.; Barkaszi, M. J.; Provancha, M. J.; Reddick, N. A.; Hinkle, C. R.; Engel, B. A.; Summerfield, B. R.
1994-01-01
Environmental management, impact assessment, research and monitoring are multidisciplinary activities which are ideally suited to incorporate a multi-media approach to environmental problem solving. Geographic information systems (GIS), simulation models, neural networks and expert-system software are some of the advancing technologies being used for data management, query, analysis and display. At the 140,000 acre John F. Kennedy Space Center, the Advanced Software Technology group has been supporting development and implementation of a program that integrates these and other rapidly evolving hardware and software capabilities into a comprehensive Mapping, Analysis and Planning System (MAPS) based in a workstation/local are network environment. An expert-system shell is being developed to link the various databases to guide users through the numerous stages of a facility siting and environmental assessment. The expert-system shell approach is appealing for its ease of data access by management-level decision makers while maintaining the involvement of the data specialists. This, as well as increased efficiency and accuracy in data analysis and report preparation, can benefit any organization involved in natural resources management.
Large margin nearest neighbor classifiers.
Domeniconi, Carlotta; Gunopulos, Dimitrios; Peng, Jing
2005-07-01
The nearest neighbor technique is a simple and appealing approach to addressing classification problems. It relies on the assumption of locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with a finite number of examples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. The employment of a locally adaptive metric becomes crucial in order to keep class conditional probabilities close to uniform, thereby minimizing the bias of estimates. We propose a technique that computes a locally flexible metric by means of support vector machines (SVMs). The decision function constructed by SVMs is used to determine the most discriminant direction in a neighborhood around the query. Such a direction provides a local feature weighting scheme. We formally show that our method increases the margin in the weighted space where classification takes place. Moreover, our method has the important advantage of online computational efficiency over competing locally adaptive techniques for nearest neighbor classification. We demonstrate the efficacy of our method using both real and simulated data.
Adapting current Arden Syntax knowledge for an object oriented event monitor.
Choi, Jeeyae; Lussier, Yves A; Mendoça, Eneida A
2003-01-01
Arden Syntax for Medical Logic Module (MLM)1 was designed for writing and sharing task-specific health knowledge in 1989. Several researchers have developed frameworks to improve the sharability and adaptability of Arden Syntax MLMs, an issue known as "curly braces" problem. Karadimas et al proposed an Arden Syntax MLM-based decision support system that uses an object oriented model and the dynamic linking features of the Java platform.2 Peleg et al proposed creating a Guideline Expression Language (GEL) based on Arden Syntax's logic grammar.3 The New York Presbyterian Hospital (NYPH) has a collection of about 200 MLMs. In a process of adapting the current MLMs for an object-oriented event monitor, we identified two problems that may influence the "curly braces" one: (1) the query expressions within the curly braces of Arden Syntax used in our institution are cryptic to the physicians, institutional dependent and written ineffectively (unpublished results), and (2) the events are coded individually within a curly braces, resulting sometimes in a large number of events - up to 200.
Context-Aware Intelligent Assistant Approach to Improving Pilot's Situational Awareness
NASA Technical Reports Server (NTRS)
Spirkovska, Lilly; Lodha, Suresh K.
2004-01-01
Faulty decision making due to inaccurate or incomplete awareness of the situation tends to be the prevailing cause of fatal general aviation accidents. Of these accidents, loss of weather situational awareness accounts for the largest number of fatalities. We describe a method for improving weather situational awareness through the support of a contextaware,domain and task knowledgeable, personalized and adaptive assistant. The assistant automatically monitors weather reports for the pilot's route of flight and warns her of detected anomalies. When and how warnings are issued is determined by phase of flight, the pilot s definition of acceptable weather conditions, and the pilot's preferences for automatic notification. In addition to automatic warnings, the pilot is able to verbally query for weather and airport information. By noting the requests she makes during the approach phase of flight, our system learns to provide the information without explicit requests on subsequent flights with similar conditions. We show that our weather assistant decreases the effort required to maintain situational awareness by more than 5.5 times when compared to the conventional method of in-flight weather briefings.
Visual analytics for semantic queries of TerraSAR-X image content
NASA Astrophysics Data System (ADS)
Espinoza-Molina, Daniela; Alonso, Kevin; Datcu, Mihai
2015-10-01
With the continuous image product acquisition of satellite missions, the size of the image archives is considerably increasing every day as well as the variety and complexity of their content, surpassing the end-user capacity to analyse and exploit them. Advances in the image retrieval field have contributed to the development of tools for interactive exploration and extraction of the images from huge archives using different parameters like metadata, key-words, and basic image descriptors. Even though we count on more powerful tools for automated image retrieval and data analysis, we still face the problem of understanding and analyzing the results. Thus, a systematic computational analysis of these results is required in order to provide to the end-user a summary of the archive content in comprehensible terms. In this context, visual analytics combines automated analysis with interactive visualizations analysis techniques for an effective understanding, reasoning and decision making on the basis of very large and complex datasets. Moreover, currently several researches are focused on associating the content of the images with semantic definitions for describing the data in a format to be easily understood by the end-user. In this paper, we present our approach for computing visual analytics and semantically querying the TerraSAR-X archive. Our approach is mainly composed of four steps: 1) the generation of a data model that explains the information contained in a TerraSAR-X product. The model is formed by primitive descriptors and metadata entries, 2) the storage of this model in a database system, 3) the semantic definition of the image content based on machine learning algorithms and relevance feedback, and 4) querying the image archive using semantic descriptors as query parameters and computing the statistical analysis of the query results. The experimental results shows that with the help of visual analytics and semantic definitions we are able to explain the image content using semantic terms and the relations between them answering questions such as what is the percentage of urban area in a region? or what is the distribution of water bodies in a city?
Harvesting implementation for the GI-cat distributed catalog
NASA Astrophysics Data System (ADS)
Boldrini, Enrico; Papeschi, Fabrizio; Bigagli, Lorenzo; Mazzetti, Paolo
2010-05-01
GI-cat framework implements a distributed catalog service supporting different international standards and interoperability arrangements in use by the geoscientific community. The distribution functionality in conjunction with the mediation functionality allows to seamlessly query remote heterogeneous data sources, including OGC Web Services - e.e. OGC CSW, WCS, WFS and WMS, community standards such as UNIDATA THREDDS/OPeNDAP, SeaDataNet CDI (Common Data Index), GBIF (Global Biodiversity Information Facility) services and OpenSearch engines. In the GI-cat modular architecture a distributor component carry out the distribution functionality by query delegation to the mediator components (one for each different data source). Each of these mediator components is able to query a specific data source and convert back the results by mapping of the foreign data model to the GI-cat internal one, based on ISO 19139. In order to cope with deployment scenarios in which local data is expected, an harvesting approach has been experimented. The new strategy comes in addition to the consolidated distributed approach, allowing the user to switch between a remote and a local search at will for each federated resource; this extends GI-cat configuration possibilities. The harvesting strategy is designed in GI-cat by the use at the core of a local cache component, implemented as a native XML database and based on eXist. The different heterogeneous sources are queried for the bulk of available data; this data is then injected into the cache component after being converted to the GI-cat data model. The query and conversion steps are performed by the mediator components that were are part of the GI-cat framework. Afterward each new query can be exercised against local data that have been stored in the cache component. Considering both advantages and shortcomings that affect harvesting and query distribution approaches, it comes out that a user driven tuning is required to take the best of them. This is often related to the specific user scenarios to be implemented. GI-cat proved to be a flexible framework to address user need. The GI-cat configurator tool was updated to make such a tuning possible: each data source can be configured to enable either harvesting or query distribution approaches; in the former case an appropriate harvesting interval can be set.
Mirel, Barbara
2009-02-13
Current usability studies of bioinformatics tools suggest that tools for exploratory analysis support some tasks related to finding relationships of interest but not the deep causal insights necessary for formulating plausible and credible hypotheses. To better understand design requirements for gaining these causal insights in systems biology analyses a longitudinal field study of 15 biomedical researchers was conducted. Researchers interacted with the same protein-protein interaction tools to discover possible disease mechanisms for further experimentation. Findings reveal patterns in scientists' exploratory and explanatory analysis and reveal that tools positively supported a number of well-structured query and analysis tasks. But for several of scientists' more complex, higher order ways of knowing and reasoning the tools did not offer adequate support. Results show that for a better fit with scientists' cognition for exploratory analysis systems biology tools need to better match scientists' processes for validating, for making a transition from classification to model-based reasoning, and for engaging in causal mental modelling. As the next great frontier in bioinformatics usability, tool designs for exploratory systems biology analysis need to move beyond the successes already achieved in supporting formulaic query and analysis tasks and now reduce current mismatches with several of scientists' higher order analytical practices. The implications of results for tool designs are discussed.
Secure searching of biomarkers through hybrid homomorphic encryption scheme.
Kim, Miran; Song, Yongsoo; Cheon, Jung Hee
2017-07-26
As genome sequencing technology develops rapidly, there has lately been an increasing need to keep genomic data secure even when stored in the cloud and still used for research. We are interested in designing a protocol for the secure outsourcing matching problem on encrypted data. We propose an efficient method to securely search a matching position with the query data and extract some information at the position. After decryption, only a small amount of comparisons with the query information should be performed in plaintext state. We apply this method to find a set of biomarkers in encrypted genomes. The important feature of our method is to encode a genomic database as a single element of polynomial ring. Since our method requires a single homomorphic multiplication of hybrid scheme for query computation, it has the advantage over the previous methods in parameter size, computation complexity, and communication cost. In particular, the extraction procedure not only prevents leakage of database information that has not been queried by user but also reduces the communication cost by half. We evaluate the performance of our method and verify that the computation on large-scale personal data can be securely and practically outsourced to a cloud environment during data analysis. It takes about 3.9 s to search-and-extract the reference and alternate sequences at the queried position in a database of size 4M. Our solution for finding a set of biomarkers in DNA sequences shows the progress of cryptographic techniques in terms of their capability can support real-world genome data analysis in a cloud environment.
Exposing the cancer genome atlas as a SPARQL endpoint.
Deus, Helena F; Veiga, Diogo F; Freire, Pablo R; Weinstein, John N; Mills, Gordon B; Almeida, Jonas S
2010-12-01
The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to characterize several types of cancer. Datasets from biomedical domains such as TCGA present a particularly challenging task for those interested in dynamically aggregating its results because the data sources are typically both heterogeneous and distributed. The Linked Data best practices offer a solution to integrate and discover data with those characteristics, namely through exposure of data as Web services supporting SPARQL, the Resource Description Framework query language. Most SPARQL endpoints, however, cannot easily be queried by data experts. Furthermore, exposing experimental data as SPARQL endpoints remains a challenging task because, in most cases, data must first be converted to Resource Description Framework triples. In line with those requirements, we have developed an infrastructure to expose clinical, demographic and molecular data elements generated by TCGA as a SPARQL endpoint by assigning elements to entities of the Simple Sloppy Semantic Database (S3DB) management model. All components of the infrastructure are available as independent Representational State Transfer (REST) Web services to encourage reusability, and a simple interface was developed to automatically assemble SPARQL queries by navigating a representation of the TCGA domain. A key feature of the proposed solution that greatly facilitates assembly of SPARQL queries is the distinction between the TCGA domain descriptors and data elements. Furthermore, the use of the S3DB management model as a mediator enables queries to both public and protected data without the need for prior submission to a single data source. Copyright © 2010 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Xiong, Wei; Qiu, Bo; Tian, Qi; Mueller, Henning; Xu, Changsheng
2005-04-01
Medical image retrieval is still mainly a research domain with a large variety of applications and techniques. With the ImageCLEF 2004 benchmark, an evaluation framework has been created that includes a database, query topics and ground truth data. Eleven systems (with a total of more than 50 runs) compared their performance in various configurations. The results show that there is not any one feature that performs well on all query tasks. Key to successful retrieval is rather the selection of features and feature weights based on a specific set of input features, thus on the query task. In this paper we propose a novel method based on query topic dependent image features (QTDIF) for content-based medical image retrieval. These feature sets are designed to capture both inter-category and intra-category statistical variations to achieve good retrieval performance in terms of recall and precision. We have used Gaussian Mixture Models (GMM) and blob representation to model medical images and construct the proposed novel QTDIF for CBIR. Finally, trained multi-class support vector machines (SVM) are used for image similarity ranking. The proposed methods have been tested over the Casimage database with around 9000 images, for the given 26 image topics, used for imageCLEF 2004. The retrieval performance has been compared with the medGIFT system, which is based on the GNU Image Finding Tool (GIFT). The experimental results show that the proposed QTDIF-based CBIR can provide significantly better performance than systems based general features only.
Silva, Sara; Gouveia-Oliveira, Rodrigo; Maretzek, António; Carriço, João; Gudnason, Thorolfur; Kristinsson, Karl G; Ekdahl, Karl; Brito-Avô, António; Tomasz, Alexander; Sanches, Ilda Santos; Lencastre, Hermínia de; Almeida, Jonas
2003-01-01
Background EURIS (European Resistance Intervention Study) was launched as a multinational study in September of 2000 to identify the multitude of complex risk factors that contribute to the high carriage rate of drug resistant Streptococcus pneumoniae strains in children attending Day Care Centers in several European countries. Access to the very large number of data required the development of a web-based infrastructure – EURISWEB – that includes a relational online database, coupled with a query system for data retrieval, and allows integrative storage of demographic, clinical and molecular biology data generated in EURIS. Methods All components of the system were developed using open source programming tools: data storage management was supported by PostgreSQL, and the hypertext preprocessor to generate the web pages was implemented using PHP. The query system is based on a software agent running in the background specifically developed for EURIS. Results The website currently contains data related to 13,500 nasopharyngeal samples and over one million measures taken from 5,250 individual children, as well as over one thousand pre-made and user-made queries aggregated into several reports, approximately. It is presently in use by participating researchers from three countries (Iceland, Portugal and Sweden). Conclusion An operational model centered on a PHP engine builds the interface between the user and the database automatically, allowing an easy maintenance of the system. The query system is also sufficiently adaptable to allow the integration of several advanced data analysis procedures far more demanding than simple queries, eventually including artificial intelligence predictive models. PMID:12846930
Guhlin, Joseph; Silverstein, Kevin A T; Zhou, Peng; Tiffin, Peter; Young, Nevin D
2017-08-10
Rapid generation of omics data in recent years have resulted in vast amounts of disconnected datasets without systemic integration and knowledge building, while individual groups have made customized, annotated datasets available on the web with few ways to link them to in-lab datasets. With so many research groups generating their own data, the ability to relate it to the larger genomic and comparative genomic context is becoming increasingly crucial to make full use of the data. The Omics Database Generator (ODG) allows users to create customized databases that utilize published genomics data integrated with experimental data which can be queried using a flexible graph database. When provided with omics and experimental data, ODG will create a comparative, multi-dimensional graph database. ODG can import definitions and annotations from other sources such as InterProScan, the Gene Ontology, ENZYME, UniPathway, and others. This annotation data can be especially useful for studying new or understudied species for which transcripts have only been predicted, and rapidly give additional layers of annotation to predicted genes. In better studied species, ODG can perform syntenic annotation translations or rapidly identify characteristics of a set of genes or nucleotide locations, such as hits from an association study. ODG provides a web-based user-interface for configuring the data import and for querying the database. Queries can also be run from the command-line and the database can be queried directly through programming language hooks available for most languages. ODG supports most common genomic formats as well as generic, easy to use tab-separated value format for user-provided annotations. ODG is a user-friendly database generation and query tool that adapts to the supplied data to produce a comparative genomic database or multi-layered annotation database. ODG provides rapid comparative genomic annotation and is therefore particularly useful for non-model or understudied species. For species for which more data are available, ODG can be used to conduct complex multi-omics, pattern-matching queries.
Taboada, María; Martínez, Diego; Pilo, Belén; Jiménez-Escrig, Adriano; Robinson, Peter N; Sobrido, María J
2012-07-31
Semantic Web technology can considerably catalyze translational genetics and genomics research in medicine, where the interchange of information between basic research and clinical levels becomes crucial. This exchange involves mapping abstract phenotype descriptions from research resources, such as knowledge databases and catalogs, to unstructured datasets produced through experimental methods and clinical practice. This is especially true for the construction of mutation databases. This paper presents a way of harmonizing abstract phenotype descriptions with patient data from clinical practice, and querying this dataset about relationships between phenotypes and genetic variants, at different levels of abstraction. Due to the current availability of ontological and terminological resources that have already reached some consensus in biomedicine, a reuse-based ontology engineering approach was followed. The proposed approach uses the Ontology Web Language (OWL) to represent the phenotype ontology and the patient model, the Semantic Web Rule Language (SWRL) to bridge the gap between phenotype descriptions and clinical data, and the Semantic Query Web Rule Language (SQWRL) to query relevant phenotype-genotype bidirectional relationships. The work tests the use of semantic web technology in the biomedical research domain named cerebrotendinous xanthomatosis (CTX), using a real dataset and ontologies. A framework to query relevant phenotype-genotype bidirectional relationships is provided. Phenotype descriptions and patient data were harmonized by defining 28 Horn-like rules in terms of the OWL concepts. In total, 24 patterns of SWQRL queries were designed following the initial list of competency questions. As the approach is based on OWL, the semantic of the framework adapts the standard logical model of an open world assumption. This work demonstrates how semantic web technologies can be used to support flexible representation and computational inference mechanisms required to query patient datasets at different levels of abstraction. The open world assumption is especially good for describing only partially known phenotype-genotype relationships, in a way that is easily extensible. In future, this type of approach could offer researchers a valuable resource to infer new data from patient data for statistical analysis in translational research. In conclusion, phenotype description formalization and mapping to clinical data are two key elements for interchanging knowledge between basic and clinical research.
NASA Interactive Forms Type Interface - NIFTI
NASA Technical Reports Server (NTRS)
Jain, Bobby; Morris, Bill
2005-01-01
A flexible database query, update, modify, and delete tool was developed that provides an easy interface to Oracle forms. This tool - the NASA interactive forms type interface, or NIFTI - features on-the- fly forms creation, forms sharing among users, the capability to query the database from user-entered criteria on forms, traversal of query results, an ability to generate tab-delimited reports, viewing and downloading of reports to the user s workstation, and a hypertext-based help system. NIFTI is a very powerful ad hoc query tool that was developed using C++, X-Windows by a Motif application framework. A unique tool, NIFTI s capabilities appear in no other known commercial-off-the- shelf (COTS) tool, because NIFTI, which can be launched from the user s desktop, is a simple yet very powerful tool with a highly intuitive, easy-to-use graphical user interface (GUI) that will expedite the creation of database query/update forms. NIFTI, therefore, can be used in NASA s International Space Station (ISS) as well as within government and industry - indeed by all users of the widely disseminated Oracle base. And it will provide significant cost savings in the areas of user training and scalability while advancing the art over current COTS browsers. No COTS browser performs all the functions NIFTI does, and NIFTI is easier to use. NIFTI s cost savings are very significant considering the very large database with which it is used and the large user community with varying data requirements it will support. Its ease of use means that personnel unfamiliar with databases (e.g., managers, supervisors, clerks, and others) can develop their own personal reports. For NASA, a tool such as NIFTI was needed to query, update, modify, and make deletions within the ISS vehicle master database (VMDB), a repository of engineering data that includes an indentured parts list and associated resource data (power, thermal, volume, weight, and the like). Since the VMDB is used both as a collection point for data and as a common repository for engineering, integration, and operations teams, a tool such as NIFTI had to be designed that could expedite the creation of database query/update forms which could then be shared among users.
From Provenance Standards and Tools to Queries and Actionable Provenance
NASA Astrophysics Data System (ADS)
Ludaescher, B.
2017-12-01
The W3C PROV standard provides a minimal core for sharing retrospective provenance information for scientific workflows and scripts. PROV extensions such as DataONE's ProvONE model are necessary for linking runtime observables in retrospective provenance records with conceptual-level prospective provenance information, i.e., workflow (or dataflow) graphs. Runtime provenance recorders, such as DataONE's RunManager for R, or noWorkflow for Python capture retrospective provenance automatically. YesWorkflow (YW) is a toolkit that allows researchers to declare high-level prospective provenance models of scripts via simple inline comments (YW-annotations), revealing the computational modules and dataflow dependencies in the script. By combining and linking both forms of provenance, important queries and use cases can be supported that neither provenance model can afford on its own. We present existing and emerging provenance tools developed for the DataONE and SKOPE (Synthesizing Knowledge of Past Environments) projects. We show how the different tools can be used individually and in combination to model, capture, share, query, and visualize provenance information. We also present challenges and opportunities for making provenance information more immediately actionable for the researchers who create it in the first place. We argue that such a shift towards "provenance-for-self" is necessary to accelerate the creation, sharing, and use of provenance in support of transparent, reproducible computational and data science.
Expert system for maintenance management of a boiling water reactor power plant
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hong Shen; Liou, L.W.; Levine, S.
1992-01-01
An expert system code has been developed for the maintenance of two boiling water reactor units in Berwick, Pennsylvania, that are operated by the Pennsylvania Power and Light Company (PP and L). The objective of this expert system code, where the knowledge of experienced operators and engineers is captured and implemented, is to support the decisions regarding which components can be safely and reliably removed from service for maintenance. It can also serve as a query-answering facility for checking the plant system status and for training purposes. The operating and maintenance information of a large number of support systems, whichmore » must be available for emergencies and/or in the event of an accident, is stored in the data base of the code. It identifies the relevant technical specifications and management rules for shutting down any one of the systems or removing a component from service to support maintenance. Because of the complexity and time needed to incorporate a large number of systems and their components, the first phase of the expert system develops a prototype code, which includes only the reactor core isolation coolant system, the high-pressure core injection system, the instrument air system, the service water system, and the plant electrical system. The next phase is scheduled to expand the code to include all other systems. This paper summarizes the prototype code and the design concept of the complete expert system code for maintenance management of all plant systems and components.« less
EON: a component-based approach to automation of protocol-directed therapy.
Musen, M A; Tu, S W; Das, A K; Shahar, Y
1996-01-01
Provision of automated support for planning protocol-directed therapy requires a computer program to take as input clinical data stored in an electronic patient-record system and to generate as output recommendations for therapeutic interventions and laboratory testing that are defined by applicable protocols. This paper presents a synthesis of research carried out at Stanford University to model the therapy-planning task and to demonstrate a component-based architecture for building protocol-based decision-support systems. We have constructed general-purpose software components that (1) interpret abstract protocol specifications to construct appropriate patient-specific treatment plans; (2) infer from time-stamped patient data higher-level, interval-based, abstract concepts; (3) perform time-oriented queries on a time-oriented patient database; and (4) allow acquisition and maintenance of protocol knowledge in a manner that facilitates efficient processing both by humans and by computers. We have implemented these components in a computer system known as EON. Each of the components has been developed, evaluated, and reported independently. We have evaluated the integration of the components as a composite architecture by implementing T-HELPER, a computer-based patient-record system that uses EON to offer advice regarding the management of patients who are following clinical trial protocols for AIDS or HIV infection. A test of the reuse of the software components in a different clinical domain demonstrated rapid development of a prototype application to support protocol-based care of patients who have breast cancer. PMID:8930854
A SOA broker solution for standard discovery and access services: the GI-cat framework
NASA Astrophysics Data System (ADS)
Boldrini, Enrico
2010-05-01
GI-cat ideal users are data providers or service providers within the geoscience community. The former have their data already available through an access service (e.g. an OGC Web Service) and would have it published through a standard catalog service, in a seamless way. The latter would develop a catalog broker and let users query and access different geospatial resources through one or more standard interfaces and Application Profiles (AP) (e.g. OGC CSW ISO AP, CSW ebRIM/EO AP, etc.). GI-cat actually implements a broker components (i.e. a middleware service) which carries out distribution and mediation functionalities among "well-adopted" catalog interfaces and data access protocols. GI-cat also publishes different discovery interfaces: the OGC CSW ISO and ebRIM Application Profiles (the latter coming with support for the EO and CIM extension packages) and two different OpenSearch interfaces developed in order to explore Web 2.0 possibilities. An extended interface is also available to exploit all available GI-cat features, such as interruptible incremental queries and queries feedback. Interoperability tests performed in the context of different projects have also pointed out the importance to enforce compatibility with existing and wide-spread tools of the open source community (e.g. GeoNetwork and Deegree catalogs), which was then achieved. Based on a service-oriented framework of modular components, GI-cat can effectively be customized and tailored to support different deployment scenarios. In addition to the distribution functionality an harvesting approach has been lately experimented, allowing the user to switch between a distributed and a local search giving thus more possibilities to support different deployment scenarios. A configurator tool is available in order to enable an effective high level configuration of the broker service. A specific geobrowser was also naturally developed, for demonstrating the advanced GI-cat functionalities. This client, called GI-go, is an example of the possible applications which may be built on top of the GI-cat broker component. GI-go allows discovering and browsing of the available datasets, retrieving and evaluating their description and performing distributed queries according to any combination of the following criteria: geographic area, temporal interval, topic of interest (free-text and/or keyword selection are allowed) and data source (i.e. where, when, what, who). The results set of a query (e.g. datasets metadata) are then displayed in an incremental way leveraging the asynchronous interactions approach implemented by GI-cat. This feature allows the user to access the intermediate query results. Query interruption and feedback features are also provided to the user. Alternatively, user may perform a browsing task by selecting a catalog resource from the current configuration and navigate through its aggregated and/or leaf datasets. In both cases datasets metadata, expressed according to ISO 19139 (and also Dublin Core and ebRIM if available), are displayed for download, along with a resource portrayal and actual data access (when this is meaningful and possible). The GI-cat distributed catalog service has been successfully deployed and experimented in the framework of different projects and initiative, including the SeaDataNet FP6 project, GEOSS IP3 (Interoperability Process Pilot Project), GEOSS AIP-2 (Architectural Implementation Project - Phase 2), FP7 GENESI-DR, CNR GIIDA, FP7 EUROGEOSS and ESA HMA project.
This procedure is designed to support the collection of potentially responsive information using automated E-Discovery tools that rely on keywords, key phrases, index queries, or other technological assistance to retrieve Electronically Stored Information
Visualization and Analysis for Near-Real-Time Decision Making in Distributed Workflows
Pugmire, David; Kress, James; Choi, Jong; ...
2016-08-04
Data driven science is becoming increasingly more common, complex, and is placing tremendous stresses on visualization and analysis frameworks. Data sources producing 10GB per second (and more) are becoming increasingly commonplace in both simulation, sensor and experimental sciences. These data sources, which are often distributed around the world, must be analyzed by teams of scientists that are also distributed. Enabling scientists to view, query and interact with such large volumes of data in near-real-time requires a rich fusion of visualization and analysis techniques, middleware and workflow systems. Here, this paper discusses initial research into visualization and analysis of distributed datamore » workflows that enables scientists to make near-real-time decisions of large volumes of time varying data.« less
Boeckers, Anja; Brinkmann, Anke; Jerg-Bretzke, Lucia; Lamp, Christoph; Traue, Harald C; Boeckers, Tobias M
2010-12-20
The dissection course (DC) is an essential part of the preclinical medical curriculum that mediates professionalism. The process of dissecting, however, has an inherent additional stress potential. Our study determined student mental stress, their need of psychological support and factors influencing this need. A quantitative longitudinal query before, during and after the DC was performed including the Brief Symptom Inventory (BSI) as well as self-formulated questions used a 5-point Likert scale. Half of the students who anticipated dissection to be a stress factor reported that this declined significantly over time. Instead, student fear of not being able to cope with the work load increased significantly. As many as 64% of the students favored psychological support on the first course day, while 75% rejected this during the period of dissection and 39% appreciated this after the course. Moreover, 42% emphasized the importance of the funeral ceremony. Additionally, 75% documented their need for support in coping with stress and learning strategies. Gender, previous medical training, and BSI levels were identified as psychosocial influence factors. A majority of students named friends, members of their family or workmates as partners with whom they could talk about mental stress. Our results document the need to develop an optimum support during the DC taking into account the ascertained indicators. Exemplarily the Institute of Anatomy and Cell Biology at Ulm University suggests several options like a step by step approach for optimization. These measures reduce mental stress and help students to cope with it by the development of "detached concern" towards their "first patient" as this will decisively influence their future professional behavior. 2010 Elsevier GmbH. All rights reserved.
Query Auto-Completion Based on Word2vec Semantic Similarity
NASA Astrophysics Data System (ADS)
Shao, Taihua; Chen, Honghui; Chen, Wanyu
2018-04-01
Query auto-completion (QAC) is the first step of information retrieval, which helps users formulate the entire query after inputting only a few prefixes. Regarding the models of QAC, the traditional method ignores the contribution from the semantic relevance between queries. However, similar queries always express extremely similar search intention. In this paper, we propose a hybrid model FS-QAC based on query semantic similarity as well as the query frequency. We choose word2vec method to measure the semantic similarity between intended queries and pre-submitted queries. By combining both features, our experiments show that FS-QAC model improves the performance when predicting the user’s query intention and helping formulate the right query. Our experimental results show that the optimal hybrid model contributes to a 7.54% improvement in terms of MRR against a state-of-the-art baseline using the public AOL query logs.
A Modular Framework for Transforming Structured Data into HTML with Machine-Readable Annotations
NASA Astrophysics Data System (ADS)
Patton, E. W.; West, P.; Rozell, E.; Zheng, J.
2010-12-01
There is a plethora of web-based Content Management Systems (CMS) available for maintaining projects and data, i.a. However, each system varies in its capabilities and often content is stored separately and accessed via non-uniform web interfaces. Moving from one CMS to another (e.g., MediaWiki to Drupal) can be cumbersome, especially if a large quantity of data must be adapted to the new system. To standardize the creation, display, management, and sharing of project information, we have assembled a framework that uses existing web technologies to transform data provided by any service that supports the SPARQL Protocol and RDF Query Language (SPARQL) queries into HTML fragments, allowing it to be embedded in any existing website. The framework utilizes a two-tier XML Stylesheet Transformation (XSLT) that uses existing ontologies (e.g., Friend-of-a-Friend, Dublin Core) to interpret query results and render them as HTML documents. These ontologies can be used in conjunction with custom ontologies suited to individual needs (e.g., domain-specific ontologies for describing data records). Furthermore, this transformation process encodes machine-readable annotations, namely, the Resource Description Framework in attributes (RDFa), into the resulting HTML, so that capable parsers and search engines can extract the relationships between entities (e.g, people, organizations, datasets). To facilitate editing of content, the framework provides a web-based form system, mapping each query to a dynamically generated form that can be used to modify and create entities, while keeping the native data store up-to-date. This open framework makes it easy to duplicate data across many different sites, allowing researchers to distribute their data in many different online forums. In this presentation we will outline the structure of queries and the stylesheets used to transform them, followed by a brief walkthrough that follows the data from storage to human- and machine-accessible web page. We conclude with a discussion on content caching and steps toward performing queries across multiple domains.
Seeking Insights About Cycling Mood Disorders via Anonymized Search Logs
White, Ryen W; Horvitz, Eric
2014-01-01
Background Mood disorders affect a significant portion of the general population. Cycling mood disorders are characterized by intermittent episodes (or events) of the disease. Objective Using anonymized Web search logs, we identify a population of people with significant interest in mood stabilizing drugs (MSD) and seek evidence of mood swings in this population. Methods We extracted queries to the Microsoft Bing search engine made by 20,046 Web searchers over six months, separately explored searcher demographics using data from a large external panel of users, and sought supporting information from people with mood disorders via a survey. We analyzed changes in information needs over time relative to searches on MSD. Results Queries for MSD focused on side effects and their relation to the disease. We found evidence of significant changes in search behavior and interests coinciding with days that MSD queries are made. These include large increases (>100%) in the access of nutrition information, commercial information, and adult materials. A survey of patients diagnosed with mood disorders provided evidence that repeated queries on MSD may come with exacerbations of mood disorder. A classifier predicting the occurrence of such queries one day before they are observed obtains strong performance (AUC=0.78). Conclusions Observed patterns in search behavior align with known behaviors and those highlighted by survey respondents. These observations suggest that searchers showing intensive interest in MSD may be patients who have been prescribed these drugs. Given behavioral dynamics, we surmise that the days on which MSD queries are made may coincide with commencement of mania or depression. Although we do not have data on mood changes and whether users have been diagnosed with bipolar illness, we see evidence of cycling in people who show interest in MSD and further show that we can predict impending shifts in behavior and interest. PMID:24568936
EquiX-A Search and Query Language for XML.
ERIC Educational Resources Information Center
Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander
2002-01-01
Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)
NASA Astrophysics Data System (ADS)
Zachary, Wayne; Eggleston, Robert; Donmoyer, Jason; Schremmer, Serge
2003-09-01
Decision-making is strongly shaped and influenced by the work context in which decisions are embedded. This suggests that decision support needs to be anchored by a model (implicit or explicit) of the work process, in contrast to traditional approaches that anchor decision support to either context free decision models (e.g., utility theory) or to detailed models of the external (e.g., battlespace) environment. An architecture for cognitively-based, work centered decision support called the Work-centered Informediary Layer (WIL) is presented. WIL separates decision support into three overall processes that build and dynamically maintain an explicit context model, use the context model to identify opportunities for decision support and tailor generic decision-support strategies to the current context and offer them to the system-user/decision-maker. The generic decision support strategies include such things as activity/attention aiding, decision process structuring, work performance support (selective, contextual automation), explanation/ elaboration, infosphere data retrieval, and what if/action-projection and visualization. A WIL-based application is a work-centered decision support layer that provides active support without intent inferencing, and that is cognitively based without requiring classical cognitive task analyses. Example WIL applications are detailed and discussed.
Spatial and symbolic queries for 3D image data
NASA Astrophysics Data System (ADS)
Benson, Daniel C.; Zick, Gregory L.
1992-04-01
We present a query system for an object-oriented biomedical imaging database containing 3-D anatomical structures and their corresponding 2-D images. The graphical interface facilitates the formation of spatial queries, nonspatial or symbolic queries, and combined spatial/symbolic queries. A query editor is used for the creation and manipulation of 3-D query objects as volumes, surfaces, lines, and points. Symbolic predicates are formulated through a combination of text fields and multiple choice selections. Query results, which may include images, image contents, composite objects, graphics, and alphanumeric data, are displayed in multiple views. Objects returned by the query may be selected directly within the views for further inspection or modification, or for use as query objects in subsequent queries. Our image database query system provides visual feedback and manipulation of spatial query objects, multiple views of volume data, and the ability to combine spatial and symbolic queries. The system allows for incremental enhancement of existing objects and the addition of new objects and spatial relationships. The query system is designed for databases containing symbolic and spatial data. This paper discuses its application to data acquired in biomedical 3- D image reconstruction, but it is applicable to other areas such as CAD/CAM, geographical information systems, and computer vision.
GenoQuery: a new querying module for functional annotation in a genomic warehouse
Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine
2008-01-01
Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improving functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731
Stacey, Dawn; Chambers, Suzanne K; Jacobsen, Mary Jane; Dunn, Jeff
2008-11-01
To evaluate the effect of an intervention on healthcare professionals' perceptions of barriers influencing their provision of decision support for callers facing cancer-related decisions. A pre- and post-test study guided by the Ottawa Model of Research Use. Australian statewide cancer call center that provides public access to information and supportive cancer services. 34 nurses, psychologists, and other allied healthcare professionals at the cancer call center. Participants completed baseline measures and, subsequently, were exposed to an intervention that included a decision support tutorial, coaching protocol, and skill-building workshop. Strategies were implemented to address organizational barriers. Perceived barriers and facilitators influencing provision of decision support, decision support knowledge, quality of decision support provided to standardized callers, and call length. Postintervention participants felt more prepared, confident in providing decision support, and aware of decision support resources. They had a stronger belief that providing decision support was within their role. Participants significantly improved their knowledge and provided higher-quality decision support to standardized callers without changing call length. The implementation intervention overcame several identified barriers that influenced call center professionals when providing decision support. Nurses and other helpline professionals have the potential to provide decision support designed to help callers understand cancer information, clarify their values associated with their options, and reduce decisional conflict. However, they require targeted education and organizational interventions to reduce their perceived barriers to providing decision support.
Phillips, Andrew B; Wilson, Rosalind V; Kaushal, Rainu; Merrill, Jacqueline A
2014-01-01
Health information exchange (HIE) is a significant component of healthcare transformation strategies at both the state and national levels. HIE is expected to improve care coordination, and advance public health, but implementation is massively complex and involves significant risk. In New York, three regional health information organizations (RHIOs) implemented an HIE use case for public health reporting by demonstrating capability to deliver accurate responses to electronic queries via a set of services called the Universal Public Health Node. We investigated process and outcomes of the implementation with a comparative case study. Qualitative analysis was structured around a decision and risk matrix. Although each RHIO had a unique operational model, two common factors influenced risk management and implementation success: leadership capable of agile decision-making and commitment to a strong organizational vision. While all three RHIOs achieved certification for the public health reporting, only one has elected to deploy a production version. PMID:23975626
Phillips, Andrew B; Wilson, Rosalind V; Kaushal, Rainu; Merrill, Jacqueline A
2014-02-01
Health information exchange (HIE) is a significant component of healthcare transformation strategies at both the state and national levels. HIE is expected to improve care coordination, and advance public health, but implementation is massively complex and involves significant risk. In New York, three regional health information organizations (RHIOs) implemented an HIE use case for public health reporting by demonstrating capability to deliver accurate responses to electronic queries via a set of services called the Universal Public Health Node. We investigated process and outcomes of the implementation with a comparative case study. Qualitative analysis was structured around a decision and risk matrix. Although each RHIO had a unique operational model, two common factors influenced risk management and implementation success: leadership capable of agile decision-making and commitment to a strong organizational vision. While all three RHIOs achieved certification for the public health reporting, only one has elected to deploy a production version.
Preliminary Results on Uncertainty Quantification for Pattern Analytics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stracuzzi, David John; Brost, Randolph; Chen, Maximillian Gene
2015-09-01
This report summarizes preliminary research into uncertainty quantification for pattern ana- lytics within the context of the Pattern Analytics to Support High-Performance Exploitation and Reasoning (PANTHER) project. The primary focus of PANTHER was to make large quantities of remote sensing data searchable by analysts. The work described in this re- port adds nuance to both the initial data preparation steps and the search process. Search queries are transformed from does the specified pattern exist in the data? to how certain is the system that the returned results match the query? We show example results for both data processing and search,more » and discuss a number of possible improvements for each.« less
SPARK: Adapting Keyword Query to Semantic Search
NASA Astrophysics Data System (ADS)
Zhou, Qi; Wang, Chong; Xiong, Miao; Wang, Haofen; Yu, Yong
Semantic search promises to provide more accurate result than present-day keyword search. However, progress with semantic search has been delayed due to the complexity of its query languages. In this paper, we explore a novel approach of adapting keywords to querying the semantic web: the approach automatically translates keyword queries into formal logic queries so that end users can use familiar keywords to perform semantic search. A prototype system named 'SPARK' has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In the experiment, SPARK achieved an encouraging translation result.
Griffon, N; Schuers, M; Dhombres, F; Merabti, T; Kerdelhué, G; Rollin, L; Darmoni, S J
2016-08-02
Despite international initiatives like Orphanet, it remains difficult to find up-to-date information about rare diseases. The aim of this study is to propose an exhaustive set of queries for PubMed based on terminological knowledge and to evaluate it versus the queries based on expertise provided by the most frequently used resource in Europe: Orphanet. Four rare disease terminologies (MeSH, OMIM, HPO and HRDO) were manually mapped to each other permitting the automatic creation of expended terminological queries for rare diseases. For 30 rare diseases, 30 citations retrieved by Orphanet expert query and/or query based on terminological knowledge were assessed for relevance by two independent reviewers unaware of the query's origin. An adjudication procedure was used to resolve any discrepancy. Precision, relative recall and F-measure were all computed. For each Orphanet rare disease (n = 8982), there was a corresponding terminological query, in contrast with only 2284 queries provided by Orphanet. Only 553 citations were evaluated due to queries with 0 or only a few hits. There were no significant differences between the Orpha query and terminological query in terms of precision, respectively 0.61 vs 0.52 (p = 0.13). Nevertheless, terminological queries retrieved more citations more often than Orpha queries (0.57 vs. 0.33; p = 0.01). Interestingly, Orpha queries seemed to retrieve older citations than terminological queries (p < 0.0001). The terminological queries proposed in this study are now currently available for all rare diseases. They may be a useful tool for both precision or recall oriented literature search.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zamora, Antonio
Advanced Natural Language Processing Tools for Web Information Retrieval, Content Analysis, and Synthesis. The goal of this SBIR was to implement and evaluate several advanced Natural Language Processing (NLP) tools and techniques to enhance the precision and relevance of search results by analyzing and augmenting search queries and by helping to organize the search output obtained from heterogeneous databases and web pages containing textual information of interest to DOE and the scientific-technical user communities in general. The SBIR investigated 1) the incorporation of spelling checkers in search applications, 2) identification of significant phrases and concepts using a combination of linguisticmore » and statistical techniques, and 3) enhancement of the query interface and search retrieval results through the use of semantic resources, such as thesauri. A search program with a flexible query interface was developed to search reference databases with the objective of enhancing search results from web queries or queries of specialized search systems such as DOE's Information Bridge. The DOE ETDE/INIS Joint Thesaurus was processed to create a searchable database. Term frequencies and term co-occurrences were used to enhance the web information retrieval by providing algorithmically-derived objective criteria to organize relevant documents into clusters containing significant terms. A thesaurus provides an authoritative overview and classification of a field of knowledge. By organizing the results of a search using the thesaurus terminology, the output is more meaningful than when the results are just organized based on the terms that co-occur in the retrieved documents, some of which may not be significant. An attempt was made to take advantage of the hierarchy provided by broader and narrower terms, as well as other field-specific information in the thesauri. The search program uses linguistic morphological routines to find relevant entries regardless of whether terms are stored in singular or plural form. Implementation of additional inflectional morphology processes for verbs can enhance retrieval further, but this has to be balanced by the possibility of broadening the results too much. In addition to the DOE energy thesaurus, other sources of specialized organized knowledge such as the Medical Subject Headings (MeSH), the Unified Medical Language System (UMLS), and Wikipedia were investigated. The supporting role of the NLP thesaurus search program was enhanced by incorporating spelling aid and a part-of-speech tagger to cope with misspellings in the queries and to determine the grammatical roles of the query words and identify nouns for special processing. To improve precision, multiple modes of searching were implemented including Boolean operators, and field-specific searches. Programs to convert a thesaurus or reference file into searchable support files can be deployed easily, and the resulting files are immediately searchable to produce relevance-ranked results with builtin spelling aid, morphological processing, and advanced search logic. Demonstration systems were built for several databases, including the DOE energy thesaurus.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cheng, Z; Moore, J; Rosati, L
Purpose: In radiotherapy, size, location and proximity of the target to critical structures influence treatment decisions. It has been shown that proximity of the target predicts dosimetric sparing of critical structures. In addition to dosimetry, precise location of disease has further implications such as tumor invasion, or proximity to major arteries that inhibit surgery. Knowledge of which patients can be converted to surgical candidates by radiation may have high impact on future treat/no-treat decisions. We propose a method to improve our characterization of the location of pancreatic cancer and treatment volume extent with respect to nearby arteries with the goalmore » of developing features to improve clinical predictions and decisions. Methods: Oncospace is a local learning health system that systematically captures clinical outcomes and all aspects of radiotherapy treatment plans, including overlap volume histograms (OVH) – a measure of spatial relationships between two structures. Minimum and maximum distances of PTV and OARs based on OVH, PTV volume, anatomic location by ICD-9 code, and surgical outcome were queried. Normalized distance to center from the left and right kidney was calculated to indicate tumor location and laterality. Distance to critical arteries (celiac, superior mesenteric, common hepatic) is validated by surgical status (borderline resectable, locally advanced converted to resectable). Results: There were 205 pancreas stereotactic body radiotherapy patients treated from 2009–2015 queried. Location/laterality of tumor based on kidney OVH show strong trends between location by OVH and by ICD-9. Compared to the locally advanced group, the borderline resectable group showed larger geometrical distance from critical arteries (p=0.03). Conclusion: Our platform enabled analysis of shape/size-location relationships. These data suggest that PTV volume and attention to distance between PTVs and surrounding OARs and major arteries may be promising for improving characterization of treatment anatomy that can refine our ability for outcome predictions and decision making. Elekta, Toshiba.« less
SPARQL Query Re-writing Using Partonomy Based Transformation Rules
NASA Astrophysics Data System (ADS)
Jain, Prateek; Yeh, Peter Z.; Verma, Kunal; Henson, Cory A.; Sheth, Amit P.
Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. For querying ontology's containing spatial information, the precise relationships between spatial entities has to be specified in the basic graph pattern of SPARQL query which can result in long and complex queries. We present a novel approach to help users intuitively write SPARQL queries to query spatial data, rather than relying on knowledge of the ontology structure. Our framework re-writes queries, using transformation rules to exploit part-whole relations between geographical entities to address the mismatches between query constraints and knowledge base. Our experiments were performed on completely third party datasets and queries. Evaluations were performed on Geonames dataset using questions from National Geographic Bee serialized into SPARQL and British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in retrieval of results and ease in writing queries.
Understanding PubMed user search behavior through log analysis.
Islamaj Dogan, Rezarta; Murray, G Craig; Névéol, Aurélie; Lu, Zhiyong
2009-01-01
This article reports on a detailed investigation of PubMed users' needs and behavior as a step toward improving biomedical information retrieval. PubMed is providing free service to researchers with access to more than 19 million citations for biomedical articles from MEDLINE and life science journals. It is accessed by millions of users each day. Efficient search tools are crucial for biomedical researchers to keep abreast of the biomedical literature relating to their own research. This study provides insight into PubMed users' needs and their behavior. This investigation was conducted through the analysis of one month of log data, consisting of more than 23 million user sessions and more than 58 million user queries. Multiple aspects of users' interactions with PubMed are characterized in detail with evidence from these logs. Despite having many features in common with general Web searches, biomedical information searches have unique characteristics that are made evident in this study. PubMed users are more persistent in seeking information and they reformulate queries often. The three most frequent types of search are search by author name, search by gene/protein, and search by disease. Use of abbreviation in queries is very frequent. Factors such as result set size influence users' decisions. Analysis of characteristics such as these plays a critical role in identifying users' information needs and their search habits. In turn, such an analysis also provides useful insight for improving biomedical information retrieval.Database URL:http://www.ncbi.nlm.nih.gov/PubMed.
Implementation of Quantum Private Queries Using Nuclear Magnetic Resonance
NASA Astrophysics Data System (ADS)
Wang, Chuan; Hao, Liang; Zhao, Lian-Jie
2011-08-01
We present a modified protocol for the realization of a quantum private query process on a classical database. Using one-qubit query and CNOT operation, the query process can be realized in a two-mode database. In the query process, the data privacy is preserved as the sender would not reveal any information about the database besides her query information, and the database provider cannot retain any information about the query. We implement the quantum private query protocol in a nuclear magnetic resonance system. The density matrix of the memory registers are constructed.
Omicseq: a web-based search engine for exploring omics datasets.
Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S
2017-07-03
The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
A Query Expansion Framework in Image Retrieval Domain Based on Local and Global Analysis
Rahman, M. M.; Antani, S. K.; Thoma, G. R.
2011-01-01
We present an image retrieval framework based on automatic query expansion in a concept feature space by generalizing the vector space model of information retrieval. In this framework, images are represented by vectors of weighted concepts similar to the keyword-based representation used in text retrieval. To generate the concept vocabularies, a statistical model is built by utilizing Support Vector Machine (SVM)-based classification techniques. The images are represented as “bag of concepts” that comprise perceptually and/or semantically distinguishable color and texture patches from local image regions in a multi-dimensional feature space. To explore the correlation between the concepts and overcome the assumption of feature independence in this model, we propose query expansion techniques in the image domain from a new perspective based on both local and global analysis. For the local analysis, the correlations between the concepts based on the co-occurrence pattern, and the metrical constraints based on the neighborhood proximity between the concepts in encoded images, are analyzed by considering local feedback information. We also analyze the concept similarities in the collection as a whole in the form of a similarity thesaurus and propose an efficient query expansion based on the global analysis. The experimental results on a photographic collection of natural scenes and a biomedical database of different imaging modalities demonstrate the effectiveness of the proposed framework in terms of precision and recall. PMID:21822350
ArrayBridge: Interweaving declarative array processing with high-performance computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xing, Haoyuan; Floratos, Sofoklis; Blanas, Spyros
Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and parallelize imperative HPC kernels even for the most mundane queries. This impedance mismatch has been partly attributed to the cumbersome data loading process; in response, the database community has proposed in situ mechanisms to access data in scientific file formats. Scientists, however, desire more than a passive access method that reads arrays from files. This paper describes ArrayBridge, a bi-directional array view mechanism for scientific file formats, that aimsmore » to make declarative array manipulations interoperable with imperative file-centric analyses. Our prototype implementation of ArrayBridge uses HDF5 as the underlying array storage library and seamlessly integrates into the SciDB open-source array database system. In addition to fast querying over external array objects, ArrayBridge produces arrays in the HDF5 file format just as easily as it can read from it. ArrayBridge also supports time travel queries from imperative kernels through the unmodified HDF5 API, and automatically deduplicates between array versions for space efficiency. Our extensive performance evaluation in NERSC, a large-scale scientific computing facility, shows that ArrayBridge exhibits statistically indistinguishable performance and I/O scalability to the native SciDB storage engine.« less
Lobach, David F; Kawamoto, Kensaku; Anstrom, Kevin J; Russell, Michael L; Woods, Peter; Smith, Dwight
2007-01-01
Clinical decision support is recognized as one potential remedy for the growing crisis in healthcare quality in the United States and other industrialized nations. While decision support systems have been shown to improve care quality and reduce errors, these systems are not widely available. This lack of availability arises in part because most decision support systems are not portable or scalable. The Health Level 7 international standard development organization recently adopted a draft standard known as the Decision Support Service standard to facilitate the implementation of clinical decision support systems using software services. In this paper, we report the first implementation of a clinical decision support system using this new standard. This system provides point-of-care chronic disease management for diabetes and other conditions and is deployed throughout a large regional health system. We also report process measures and usability data concerning the system. Use of the Decision Support Service standard provides a portable and scalable approach to clinical decision support that could facilitate the more extensive use of decision support systems.
Semantics Enabled Queries in EuroGEOSS: a Discovery Augmentation Approach
NASA Astrophysics Data System (ADS)
Santoro, M.; Mazzetti, P.; Fugazza, C.; Nativi, S.; Craglia, M.
2010-12-01
One of the main challenges in Earth Science Informatics is to build interoperability frameworks which allow users to discover, evaluate, and use information from different scientific domains. This needs to address multidisciplinary interoperability challenges concerning both technological and scientific aspects. From the technological point of view, it is necessary to provide a set of special interoperability arrangement in order to develop flexible frameworks that allow a variety of loosely-coupled services to interact with each other. From a scientific point of view, it is necessary to document clearly the theoretical and methodological assumptions underpinning applications in different scientific domains, and develop cross-domain ontologies to facilitate interdisciplinary dialogue and understanding. In this presentation we discuss a brokering approach that extends the traditional Service Oriented Architecture (SOA) adopted by most Spatial Data Infrastructures (SDIs) to provide the necessary special interoperability arrangements. In the EC-funded EuroGEOSS (A European approach to GEOSS) project, we distinguish among three possible functional brokering components: discovery, access and semantics brokers. This presentation focuses on the semantics broker, the Discovery Augmentation Component (DAC), which was specifically developed to address the three thematic areas covered by the EuroGEOSS project: biodiversity, forestry and drought. The EuroGEOSS DAC federates both semantics (e.g. SKOS repositories) and ISO-compliant geospatial catalog services. The DAC can be queried using common geospatial constraints (i.e. what, where, when, etc.). Two different augmented discovery styles are supported: a) automatic query expansion; b) user assisted query expansion. In the first case, the main discovery steps are: i. the query keywords (the what constraint) are “expanded” with related concepts/terms retrieved from the set of federated semantic services. A default expansion regards the multilinguality relationship; ii. The resulting queries are submitted to the federated catalog services; iii. The DAC performs a “smart” aggregation of the queries results and provides them back to the client. In the second case, the main discovery steps are: i. the user browses the federated semantic repositories and selects the concepts/terms-of-interest; ii. The DAC creates the set of geospatial queries based on the selected concepts/terms and submits them to the federated catalog services; iii. The DAC performs a “smart” aggregation of the queries results and provides them back to the client. A Graphical User Interface (GUI) was also developed for testing and interacting with the DAC. The entire brokering framework is deployed in the context of EuroGEOSS infrastructure and it is used in a couple of GEOSS AIP-3 use scenarios: the “e-Habitat Use Scenario” for the Biodiversity and Climate Change topic, and the “Comprehensive Drought Index Use Scenario” for Water/Drought topic
Peeling the Onion: Okapi System Architecture and Software Design Issues.
ERIC Educational Resources Information Center
Jones, S.; And Others
1997-01-01
Discusses software design issues for Okapi, an information retrieval system that incorporates both search engine and user interface and supports weighted searching, relevance feedback, and query expansion. The basic search system, adjacency searching, and moving toward a distributed system are discussed. (Author/LRW)
Providing R-Tree Support for Mongodb
NASA Astrophysics Data System (ADS)
Xiang, Longgang; Shao, Xiaotian; Wang, Dehao
2016-06-01
Supporting large amounts of spatial data is a significant characteristic of modern databases. However, unlike some mature relational databases, such as Oracle and PostgreSQL, most of current burgeoning NoSQL databases are not well designed for storing geospatial data, which is becoming increasingly important in various fields. In this paper, we propose a novel method to provide R-tree index, as well as corresponding spatial range query and nearest neighbour query functions, for MongoDB, one of the most prevalent NoSQL databases. First, after in-depth analysis of MongoDB's features, we devise an efficient tabular document structure which flattens R-tree index into MongoDB collections. Further, relevant mechanisms of R-tree operations are issued, and then we discuss in detail how to integrate R-tree into MongoDB. Finally, we present the experimental results which show that our proposed method out-performs the built-in spatial index of MongoDB. Our research will greatly facilitate big data management issues with MongoDB in a variety of geospatial information applications.
Rosenbaum, Benjamin P; Silkin, Nikolay; Miller, Randolph A
2014-01-01
Real-time alerting systems typically warn providers about abnormal laboratory results or medication interactions. For more complex tasks, institutions create site-wide 'data warehouses' to support quality audits and longitudinal research. Sophisticated systems like i2b2 or Stanford's STRIDE utilize data warehouses to identify cohorts for research and quality monitoring. However, substantial resources are required to install and maintain such systems. For more modest goals, an organization desiring merely to identify patients with 'isolation' orders, or to determine patients' eligibility for clinical trials, may adopt a simpler, limited approach based on processing the output of one clinical system, and not a data warehouse. We describe a limited, order-entry-based, real-time 'pick off' tool, utilizing public domain software (PHP, MySQL). Through a web interface the tool assists users in constructing complex order-related queries and auto-generates corresponding database queries that can be executed at recurring intervals. We describe successful application of the tool for research and quality monitoring.
A study of medical and health queries to web search engines.
Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk
2004-03-01
This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i). comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii). comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii). medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i). a small percentage of web queries are medical or health related, (ii). the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii). over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggests some implications for the use of the general web search engines when seeking medical/health information.
Monitoring Moving Queries inside a Safe Region
Al-Khalidi, Haidar; Taniar, David; Alamri, Sultan
2014-01-01
With mobile moving range queries, there is a need to recalculate the relevant surrounding objects of interest whenever the query moves. Therefore, monitoring the moving query is very costly. The safe region is one method that has been proposed to minimise the communication and computation cost of continuously monitoring a moving range query. Inside the safe region the set of objects of interest to the query do not change; thus there is no need to update the query while it is inside its safe region. However, when the query leaves its safe region the mobile device has to reevaluate the query, necessitating communication with the server. Knowing when and where the mobile device will leave a safe region is widely known as a difficult problem. To solve this problem, we propose a novel method to monitor the position of the query over time using a linear function based on the direction of the query obtained by periodic monitoring of its position. Periodic monitoring ensures that the query is aware of its location all the time. This method reduces the costs associated with communications in client-server architecture. Computational results show that our method is successful in handling moving query patterns. PMID:24696652
Sahoo, Satya S.; Bodenreider, Olivier; Rutter, Joni L.; Skinner, Karen J.; Sheth, Amit P.
2008-01-01
Objectives This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. Methods We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries. Results Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins. Conclusion Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces. Resource page http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/ PMID:18395495
Sahoo, Satya S; Bodenreider, Olivier; Rutter, Joni L; Skinner, Karen J; Sheth, Amit P
2008-10-01
This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries. Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins. Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces. RESOURCE PAGE: http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/
Automatic classification and detection of clinically relevant images for diabetic retinopathy
NASA Astrophysics Data System (ADS)
Xu, Xinyu; Li, Baoxin
2008-03-01
We proposed a novel approach to automatic classification of Diabetic Retinopathy (DR) images and retrieval of clinically-relevant DR images from a database. Given a query image, our approach first classifies the image into one of the three categories: microaneurysm (MA), neovascularization (NV) and normal, and then it retrieves DR images that are clinically-relevant to the query image from an archival image database. In the classification stage, the query DR images are classified by the Multi-class Multiple-Instance Learning (McMIL) approach, where images are viewed as bags, each of which contains a number of instances corresponding to non-overlapping blocks, and each block is characterized by low-level features including color, texture, histogram of edge directions, and shape. McMIL first learns a collection of instance prototypes for each class that maximizes the Diverse Density function using Expectation- Maximization algorithm. A nonlinear mapping is then defined using the instance prototypes and maps every bag to a point in a new multi-class bag feature space. Finally a multi-class Support Vector Machine is trained in the multi-class bag feature space. In the retrieval stage, we retrieve images from the archival database who bear the same label with the query image, and who are the top K nearest neighbors of the query image in terms of similarity in the multi-class bag feature space. The classification approach achieves high classification accuracy, and the retrieval of clinically-relevant images not only facilitates utilization of the vast amount of hidden diagnostic knowledge in the database, but also improves the efficiency and accuracy of DR lesion diagnosis and assessment.
Tang, Muh-Chyun; Liu, Ying-Hsang; Wu, Wan-Ching
2013-09-01
Previous research has shown that information seekers in biomedical domain need more support in formulating their queries. A user study was conducted to evaluate the effectiveness of a metadata based query suggestion interface for PubMed bibliographic search. The study also investigated the impact of search task familiarity on search behaviors and the effectiveness of the interface. A real user, user search request and real system approach was used for the study. Unlike tradition IR evaluation, where assigned tasks were used, the participants were asked to search requests of their own. Forty-four researchers in Health Sciences participated in the evaluation - each conducted two research requests of their own, alternately with the proposed interface and the PubMed baseline. Several performance criteria were measured to assess the potential benefits of the experimental interface, including users' assessment of their original and eventual queries, the perceived usefulness of the interfaces, satisfaction with the search results, and the average relevance score of the saved records. The results show that, when searching for an unfamiliar topic, users were more likely to change their queries, indicating the effect of familiarity on search behaviors. The results also show that the interface scored higher on several of the performance criteria, such as the "goodness" of the queries, perceived usefulness, and user satisfaction. Furthermore, in line with our hypothesis, the proposed interface was relatively more effective when less familiar search requests were attempted. Results indicate that there is a selective compatibility between search familiarity and search interface. One implication of the research for system evaluation is the importance of taking into consideration task familiarity when assessing the effectiveness of interactive IR systems. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Retrieving high-resolution images over the Internet from an anatomical image database
NASA Astrophysics Data System (ADS)
Strupp-Adams, Annette; Henderson, Earl
1999-12-01
The Visible Human Data set is an important contribution to the national collection of anatomical images. To enhance the availability of these images, the National Library of Medicine has supported the design and development of a prototype object-oriented image database which imports, stores, and distributes high resolution anatomical images in both pixel and voxel formats. One of the key database modules is its client-server Internet interface. This Web interface provides a query engine with retrieval access to high-resolution anatomical images that range in size from 100KB for browser viewable rendered images, to 1GB for anatomical structures in voxel file formats. The Web query and retrieval client-server system is composed of applet GUIs, servlets, and RMI application modules which communicate with each other to allow users to query for specific anatomical structures, and retrieve image data as well as associated anatomical images from the database. Selected images can be downloaded individually as single files via HTTP or downloaded in batch-mode over the Internet to the user's machine through an applet that uses Netscape's Object Signing mechanism. The image database uses ObjectDesign's object-oriented DBMS, ObjectStore that has a Java interface. The query and retrieval systems has been tested with a Java-CDE window system, and on the x86 architecture using Windows NT 4.0. This paper describes the Java applet client search engine that queries the database; the Java client module that enables users to view anatomical images online; the Java application server interface to the database which organizes data returned to the user, and its distribution engine that allow users to download image files individually and/or in batch-mode.
How Decision Support Systems Can Benefit from a Theory of Change Approach.
Allen, Will; Cruz, Jennyffer; Warburton, Bruce
2017-06-01
Decision support systems are now mostly computer and internet-based information systems designed to support land managers with complex decision-making. However, there is concern that many environmental and agricultural decision support systems remain underutilized and ineffective. Recent efforts to improve decision support systems use have focused on enhancing stakeholder participation in their development, but a mismatch between stakeholders' expectations and the reality of decision support systems outputs continues to limit uptake. Additional challenges remain in problem-framing and evaluation. We propose using an outcomes-based approach called theory of change in conjunction with decision support systems development to support both wider problem-framing and outcomes-based monitoring and evaluation. The theory of change helps framing by placing the decision support systems within a wider context. It highlights how decision support systems use can "contribute" to long-term outcomes, and helps align decision support systems outputs with these larger goals. We illustrate the benefits of linking decision support systems development and application with a theory of change approach using an example of pest rabbit management in Australia. We develop a theory of change that outlines the activities required to achieve the outcomes desired from an effective rabbit management program, and two decision support systems that contribute to specific aspects of decision making in this wider problem context. Using a theory of change in this way should increase acceptance of the role of decision support systems by end-users, clarify their limitations and, importantly, increase effectiveness of rabbit management. The use of a theory of change should benefit those seeking to improve decision support systems design, use and, evaluation.
How Decision Support Systems Can Benefit from a Theory of Change Approach
NASA Astrophysics Data System (ADS)
Allen, Will; Cruz, Jennyffer; Warburton, Bruce
2017-06-01
Decision support systems are now mostly computer and internet-based information systems designed to support land managers with complex decision-making. However, there is concern that many environmental and agricultural decision support systems remain underutilized and ineffective. Recent efforts to improve decision support systems use have focused on enhancing stakeholder participation in their development, but a mismatch between stakeholders' expectations and the reality of decision support systems outputs continues to limit uptake. Additional challenges remain in problem-framing and evaluation. We propose using an outcomes-based approach called theory of change in conjunction with decision support systems development to support both wider problem-framing and outcomes-based monitoring and evaluation. The theory of change helps framing by placing the decision support systems within a wider context. It highlights how decision support systems use can "contribute" to long-term outcomes, and helps align decision support systems outputs with these larger goals. We illustrate the benefits of linking decision support systems development and application with a theory of change approach using an example of pest rabbit management in Australia. We develop a theory of change that outlines the activities required to achieve the outcomes desired from an effective rabbit management program, and two decision support systems that contribute to specific aspects of decision making in this wider problem context. Using a theory of change in this way should increase acceptance of the role of decision support systems by end-users, clarify their limitations and, importantly, increase effectiveness of rabbit management. The use of a theory of change should benefit those seeking to improve decision support systems design, use and, evaluation.