Rogers, Patrick; Erdal, Selnur; Santangelo, Jennifer; Liu, Jianhua; Schuster, Dara; Kamal, Jyoti
2008-11-06
The Ohio State University Medical Center (OSUMC) Information Warehouse (IW) is a comprehensive data warehousing facility incorporating operational, clinical, and biological data sets from multiple enterprise system. It is common for users of the IW to request complex ad-hoc queries that often require significant intervention by data analyst. In response to this challenge, we have designed a workflow that leverages synthesized data elements to support such queries in an more timely, efficient manner.
2013-01-01
Background Clinical Intelligence, as a research and engineering discipline, is dedicated to the development of tools for data analysis for the purposes of clinical research, surveillance, and effective health care management. Self-service ad hoc querying of clinical data is one desirable type of functionality. Since most of the data are currently stored in relational or similar form, ad hoc querying is problematic as it requires specialised technical skills and the knowledge of particular data schemas. Results A possible solution is semantic querying where the user formulates queries in terms of domain ontologies that are much easier to navigate and comprehend than data schemas. In this article, we are exploring the possibility of using SADI Semantic Web services for semantic querying of clinical data. We have developed a prototype of a semantic querying infrastructure for the surveillance of, and research on, hospital-acquired infections. Conclusions Our results suggest that SADI can support ad-hoc, self-service, semantic queries of relational data in a Clinical Intelligence context. The use of SADI compares favourably with approaches based on declarative semantic mappings from data schemas to ontologies, such as query rewriting and RDFizing by materialisation, because it can easily cope with situations when (i) some computation is required to turn relational data into RDF or OWL, e.g., to implement temporal reasoning, or (ii) integration with external data sources is necessary. PMID:23497556
Designing integrated computational biology pipelines visually.
Jamil, Hasan M
2013-01-01
The long-term cost of developing and maintaining a computational pipeline that depends upon data integration and sophisticated workflow logic is too high to even contemplate "what if" or ad hoc type queries. In this paper, we introduce a novel application building interface for computational biology research, called VizBuilder, by leveraging a recent query language called BioFlow for life sciences databases. Using VizBuilder, it is now possible to develop ad hoc complex computational biology applications at throw away costs. The underlying query language supports data integration and workflow construction almost transparently and fully automatically, using a best effort approach. Users express their application by drawing it with VizBuilder icons and connecting them in a meaningful way. Completed applications are compiled and translated as BioFlow queries for execution by the data management system LifeDB, for which VizBuilder serves as a front end. We discuss VizBuilder features and functionalities in the context of a real life application after we briefly introduce BioFlow. The architecture and design principles of VizBuilder are also discussed. Finally, we outline future extensions of VizBuilder. To our knowledge, VizBuilder is a unique system that allows visually designing computational biology pipelines involving distributed and heterogeneous resources in an ad hoc manner.
Ad-Hoc Queries over Document Collections - A Case Study
NASA Astrophysics Data System (ADS)
Löser, Alexander; Lutter, Steffen; Düssel, Patrick; Markl, Volker
We discuss the novel problem of supporting analytical business intelligence queries over web-based textual content, e.g., BI-style reports based on 100.000's of documents from an ad-hoc web search result. Neither conventional search engines nor conventional Business Intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. "Google Squared" or our system GOOLAP.info, are examples of these kinds of systems. They execute information extraction methods over one or several document collections at query time and integrate extracted records into a common view or tabular structure. Frequent extraction and object resolution failures cause incomplete records which could not be joined into a record answering the query. Our focus is the identification of join-reordering heuristics maximizing the size of complete records answering a structured query. With respect to given costs for document extraction we propose two novel join-operations: The multi-way CJ-operator joins records from multiple relationships extracted from a single document. The two-way join-operator DJ ensures data density by removing incomplete records from results. In a preliminary case study we observe that our join-reordering heuristics positively impact result size, record density and lower execution costs.
Developing A Web-based User Interface for Semantic Information Retrieval
NASA Technical Reports Server (NTRS)
Berrios, Daniel C.; Keller, Richard M.
2003-01-01
While there are now a number of languages and frameworks that enable computer-based systems to search stored data semantically, the optimal design for effective user interfaces for such systems is still uncle ar. Such interfaces should mask unnecessary query detail from users, yet still allow them to build queries of arbitrary complexity without significant restrictions. We developed a user interface supporting s emantic query generation for Semanticorganizer, a tool used by scient ists and engineers at NASA to construct networks of knowledge and dat a. Through this interface users can select node types, node attribute s and node links to build ad-hoc semantic queries for searching the S emanticOrganizer network.
A high performance, ad-hoc, fuzzy query processing system for relational databases
NASA Technical Reports Server (NTRS)
Mansfield, William H., Jr.; Fleischman, Robert M.
1992-01-01
Database queries involving imprecise or fuzzy predicates are currently an evolving area of academic and industrial research. Such queries place severe stress on the indexing and I/O subsystems of conventional database environments since they involve the search of large numbers of records. The Datacycle architecture and research prototype is a database environment that uses filtering technology to perform an efficient, exhaustive search of an entire database. It has recently been modified to include fuzzy predicates in its query processing. The approach obviates the need for complex index structures, provides unlimited query throughput, permits the use of ad-hoc fuzzy membership functions, and provides a deterministic response time largely independent of query complexity and load. This paper describes the Datacycle prototype implementation of fuzzy queries and some recent performance results.
The StarView intelligent query mechanism
NASA Technical Reports Server (NTRS)
Semmel, R. D.; Silberberg, D. P.
1993-01-01
The StarView interface is being developed to facilitate the retrieval of scientific and engineering data produced by the Hubble Space Telescope. While predefined screens in the interface can be used to specify many common requests, ad hoc requests require a dynamic query formulation capability. Unfortunately, logical level knowledge is too sparse to support this capability. In particular, essential formulation knowledge is lost when the domain of interest is mapped to a set of database relation schemas. Thus, a system known as QUICK has been developed that uses conceptual design knowledge to facilitate query formulation. By heuristically determining strongly associated objects at the conceptual level, QUICK is able to formulate semantically reasonable queries in response to high-level requests that specify only attributes of interest. Moreover, by exploiting constraint knowledge in the conceptual design, QUICK assures that queries are formulated quickly and will execute efficiently.
Information engineering for molecular diagnostics.
Sorace, J. M.; Ritondo, M.; Canfield, K.
1994-01-01
Clinical laboratories are beginning to apply the recent advances in molecular biology to the testing of patient samples. The emerging field of Molecular Diagnostics will require a new Molecular Diagnostics Laboratory Information System which handles the data types, samples and test methods found in this field. The system must be very flexible in regards to supporting ad-hoc queries. The requirements which are shaping the developments in this field are reviewed and a data model developed. Several queries which demonstrate the data models ability to support the information needs of this area have been developed and run. These results demonstrate the ability of the purposed data model to meet the current and projected needs of this rapidly expanding field. PMID:7949937
Towards Hybrid Online On-Demand Querying of Realtime Data with Stateful Complex Event Processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Qunzhi; Simmhan, Yogesh; Prasanna, Viktor K.
Emerging Big Data applications in areas like e-commerce and energy industry require both online and on-demand queries to be performed over vast and fast data arriving as streams. These present novel challenges to Big Data management systems. Complex Event Processing (CEP) is recognized as a high performance online query scheme which in particular deals with the velocity aspect of the 3-V’s of Big Data. However, traditional CEP systems do not consider data variety and lack the capability to embed ad hoc queries over the volume of data streams. In this paper, we propose H2O, a stateful complex event processing framework,more » to support hybrid online and on-demand queries over realtime data. We propose a semantically enriched event and query model to address data variety. A formal query algebra is developed to precisely capture the stateful and containment semantics of online and on-demand queries. We describe techniques to achieve the interactive query processing over realtime data featured by efficient online querying, dynamic stream data persistence and on-demand access. The system architecture is presented and the current implementation status reported.« less
Sundvall, Erik; Wei-Kleiner, Fang; Freire, Sergio M; Lambrix, Patrick
2017-01-01
Archetype-based Electronic Health Record (EHR) systems using generic reference models from e.g. openEHR, ISO 13606 or CIMI should be easy to update and reconfigure with new types (or versions) of data models or entries, ideally with very limited programming or manual database tweaking. Exploratory research (e.g. epidemiology) leading to ad-hoc querying on a population-wide scale can be a challenge in such environments. This publication describes implementation and test of an archetype-aware Dewey encoding optimization that can be used to produce such systems in environments supporting relational operations, e.g. RDBMs and distributed map-reduce frameworks like Hadoop. Initial testing was done using a nine-node 2.2 GHz quad-core Hadoop cluster querying a dataset consisting of targeted extracts from 4+ million real patient EHRs, query results with sub-minute response time were obtained.
Concept-based query language approach to enterprise information systems
NASA Astrophysics Data System (ADS)
Niemi, Timo; Junkkari, Marko; Järvelin, Kalervo
2014-01-01
In enterprise information systems (EISs) it is necessary to model, integrate and compute very diverse data. In advanced EISs the stored data often are based both on structured (e.g. relational) and semi-structured (e.g. XML) data models. In addition, the ad hoc information needs of end-users may require the manipulation of data-oriented (structural), behavioural and deductive aspects of data. Contemporary languages capable of treating this kind of diversity suit only persons with good programming skills. In this paper we present a concept-oriented query language approach to manipulate this diversity so that the programming skill requirements are considerably reduced. In our query language, the features which need technical knowledge are hidden in application-specific concepts and structures. Therefore, users need not be aware of the underlying technology. Application-specific concepts and structures are represented by the modelling primitives of the extended RDOOM (relational deductive object-oriented modelling) which contains primitives for all crucial real world relationships (is-a relationship, part-of relationship, association), XML documents and views. Our query language also supports intensional and extensional-intensional queries, in addition to conventional extensional queries. In its query formulation, the end-user combines available application-specific concepts and structures through shared variables.
Ad Hoc Information Extraction for Clinical Data Warehouses.
Dietrich, Georg; Krebs, Jonathan; Fette, Georg; Ertl, Maximilian; Kaspar, Mathias; Störk, Stefan; Puppe, Frank
2018-05-01
Clinical Data Warehouses (CDW) reuse Electronic health records (EHR) to make their data retrievable for research purposes or patient recruitment for clinical trials. However, much information are hidden in unstructured data like discharge letters. They can be preprocessed and converted to structured data via information extraction (IE), which is unfortunately a laborious task and therefore usually not available for most of the text data in CDW. The goal of our work is to provide an ad hoc IE service that allows users to query text data ad hoc in a manner similar to querying structured data in a CDW. While search engines just return text snippets, our systems also returns frequencies (e.g. how many patients exist with "heart failure" including textual synonyms or how many patients have an LVEF < 45) based on the content of discharge letters or textual reports for special investigations like heart echo. Three subtasks are addressed: (1) To recognize and to exclude negations and their scopes, (2) to extract concepts, i.e. Boolean values and (3) to extract numerical values. We implemented an extended version of the NegEx-algorithm for German texts that detects negations and determines their scope. Furthermore, our document oriented CDW PaDaWaN was extended with query functions, e.g. context sensitive queries and regex queries, and an extraction mode for computing the frequencies for Boolean and numerical values. Evaluations in chest X-ray reports and in discharge letters showed high F1-scores for the three subtasks: Detection of negated concepts in chest X-ray reports with an F1-score of 0.99 and in discharge letters with 0.97; of Boolean values in chest X-ray reports about 0.99, and of numerical values in chest X-ray reports and discharge letters also around 0.99 with the exception of the concept age. The advantages of an ad hoc IE over a standard IE are the low development effort (just entering the concept with its variants), the promptness of the results and the adaptability by the user to his or her particular question. Disadvantage are usually lower accuracy and confidence.This ad hoc information extraction approach is novel and exceeds existing systems: Roogle [1] extracts predefined concepts from texts at preprocessing and makes them retrievable at runtime. Dr. Warehouse [2] applies negation detection and indexes the produced subtexts which include affirmed findings. Our approach combines negation detection and the extraction of concepts. But the extraction does not take place during preprocessing, but at runtime. That provides an ad hoc, dynamic, interactive and adjustable information extraction of random concepts and even their values on the fly at runtime. We developed an ad hoc information extraction query feature for Boolean and numerical values within a CDW with high recall and precision based on a pipeline that detects and removes negations and their scope in clinical texts. Schattauer GmbH.
Supporting Dynamic Quantization for High-Dimensional Data Analytics.
Guzun, Gheorghi; Canahuate, Guadalupe
2017-05-01
Similarity searches are at the heart of exploratory data analysis tasks. Distance metrics are typically used to characterize the similarity between data objects represented as feature vectors. However, when the dimensionality of the data increases and the number of features is large, traditional distance metrics fail to distinguish between the closest and furthest data points. Localized distance functions have been proposed as an alternative to traditional distance metrics. These functions only consider dimensions close to query to compute the distance/similarity. Furthermore, in order to enable interactive explorations of high-dimensional data, indexing support for ad-hoc queries is needed. In this work we set up to investigate whether bit-sliced indices can be used for exploratory analytics such as similarity searches and data clustering for high-dimensional big-data. We also propose a novel dynamic quantization called Query dependent Equi-Depth (QED) quantization and show its effectiveness on characterizing high-dimensional similarity. When applying QED we observe improvements in kNN classification accuracy over traditional distance functions. Gheorghi Guzun and Guadalupe Canahuate. 2017. Supporting Dynamic Quantization for High-Dimensional Data Analytics. In Proceedings of Ex-ploreDB'17, Chicago, IL, USA, May 14-19, 2017, 6 pages. https://doi.org/http://dx.doi.org/10.1145/3077331.3077336.
Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS.
Yu, Hwanjo; Kim, Taehoon; Oh, Jinoh; Ko, Ilhwan; Kim, Sungchul; Han, Wook-Shin
2010-04-16
Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function. However, the process of learning and ranking is usually done offline without integrated with the keyword queries, and the users have to provide a large amount of training documents to get a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and a multi-level relevance feedback in real time on PubMed. RefMed supports a multi-level relevance feedback by using the RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates the RankSVM into RDBMS to support both keyword queries and the multi-level relevance feedback in real time; the tight coupling of the RankSVM and DBMS substantially improves the processing time. An efficient parameter selection method for the RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves a high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. RefMed is the first multi-level relevance feedback system for PubMed, which achieves a high accuracy with less feedback. It effectively learns an accurate relevance function from the user's feedback and efficiently processes the function to return relevant articles in real time.
Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS
2010-01-01
Background Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function. However, the process of learning and ranking is usually done offline without integrated with the keyword queries, and the users have to provide a large amount of training documents to get a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and a multi-level relevance feedback in real time on PubMed. Results RefMed supports a multi-level relevance feedback by using the RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates the RankSVM into RDBMS to support both keyword queries and the multi-level relevance feedback in real time; the tight coupling of the RankSVM and DBMS substantially improves the processing time. An efficient parameter selection method for the RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves a high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. Conclusions RefMed is the first multi-level relevance feedback system for PubMed, which achieves a high accuracy with less feedback. It effectively learns an accurate relevance function from the user’s feedback and efficiently processes the function to return relevant articles in real time. PMID:20406504
UMass at TREC WEB 2014: Entity Query Feature Expansion using Knowledge Base Links
2014-11-01
bears 270 sun tzu 274 golf instruction 291 sangre de cristo mountains 263 evidence for evolution 300 how to find the mean 262 balding cure 280 view my...internet history 294 flowering plants (b) Worst Query Title 264 tribe formerly living in alabama 295 how to tie a windsor knot 283 hayrides in pa 252...work we leverage the rich semantic knowledge available through these links to understand relevance of documents for a query. We fo- cus on the ad hoc
Virgilio, Massimiliano; Jordaens, Kurt; Breman, Floris C; Backeljau, Thierry; De Meyer, Marc
2012-01-01
We propose a general working strategy to deal with incomplete reference libraries in the DNA barcoding identification of species. Considering that (1) queries with a large genetic distance with their best DNA barcode match are more likely to be misidentified and (2) imposing a distance threshold profitably reduces identification errors, we modelled relationships between identification performances and distance thresholds in four DNA barcode libraries of Diptera (n = 4270), Lepidoptera (n = 7577), Hymenoptera (n = 2067) and Tephritidae (n = 602 DNA barcodes). In all cases, more restrictive distance thresholds produced a gradual increase in the proportion of true negatives, a gradual decrease of false positives and more abrupt variations in the proportions of true positives and false negatives. More restrictive distance thresholds improved precision, yet negatively affected accuracy due to the higher proportions of queries discarded (viz. having a distance query-best match above the threshold). Using a simple linear regression we calculated an ad hoc distance threshold for the tephritid library producing an estimated relative identification error <0.05. According to the expectations, when we used this threshold for the identification of 188 independently collected tephritids, less than 5% of queries with a distance query-best match below the threshold were misidentified. Ad hoc thresholds can be calculated for each particular reference library of DNA barcodes and should be used as cut-off mark defining whether we can proceed identifying the query with a known estimated error probability (e.g. 5%) or whether we should discard the query and consider alternative/complementary identification methods.
Virgilio, Massimiliano; Jordaens, Kurt; Breman, Floris C.; Backeljau, Thierry; De Meyer, Marc
2012-01-01
We propose a general working strategy to deal with incomplete reference libraries in the DNA barcoding identification of species. Considering that (1) queries with a large genetic distance with their best DNA barcode match are more likely to be misidentified and (2) imposing a distance threshold profitably reduces identification errors, we modelled relationships between identification performances and distance thresholds in four DNA barcode libraries of Diptera (n = 4270), Lepidoptera (n = 7577), Hymenoptera (n = 2067) and Tephritidae (n = 602 DNA barcodes). In all cases, more restrictive distance thresholds produced a gradual increase in the proportion of true negatives, a gradual decrease of false positives and more abrupt variations in the proportions of true positives and false negatives. More restrictive distance thresholds improved precision, yet negatively affected accuracy due to the higher proportions of queries discarded (viz. having a distance query-best match above the threshold). Using a simple linear regression we calculated an ad hoc distance threshold for the tephritid library producing an estimated relative identification error <0.05. According to the expectations, when we used this threshold for the identification of 188 independently collected tephritids, less than 5% of queries with a distance query-best match below the threshold were misidentified. Ad hoc thresholds can be calculated for each particular reference library of DNA barcodes and should be used as cut-off mark defining whether we can proceed identifying the query with a known estimated error probability (e.g. 5%) or whether we should discard the query and consider alternative/complementary identification methods. PMID:22359600
Database architectures for Space Telescope Science Institute
NASA Astrophysics Data System (ADS)
Lubow, Stephen
1993-08-01
At STScI nearly all large applications require database support. A general purpose architecture has been developed and is in use that relies upon an extended client-server paradigm. Processing is in general distributed across three processes, each of which generally resides on its own processor. Database queries are evaluated on one such process, called the DBMS server. The DBMS server software is provided by a database vendor. The application issues database queries and is called the application client. This client uses a set of generic DBMS application programming calls through our STDB/NET programming interface. Intermediate between the application client and the DBMS server is the STDB/NET server. This server accepts generic query requests from the application and converts them into the specific requirements of the DBMS server. In addition, it accepts query results from the DBMS server and passes them back to the application. Typically the STDB/NET server is local to the DBMS server, while the application client may be remote. The STDB/NET server provides additional capabilities such as database deadlock restart and performance monitoring. This architecture is currently in use for some major STScI applications, including the ground support system. We are currently investigating means of providing ad hoc query support to users through the above architecture. Such support is critical for providing flexible user interface capabilities. The Universal Relation advocated by Ullman, Kernighan, and others appears to be promising. In this approach, the user sees the entire database as a single table, thereby freeing the user from needing to understand the detailed schema. A software layer provides the translation between the user and detailed schema views of the database. However, many subtle issues arise in making this transformation. We are currently exploring this scheme for use in the Hubble Space Telescope user interface to the data archive system (DADS).
The Hub Population Health System: distributed ad hoc queries and alerts
Anane, Sheila; Taverna, John; Amirfar, Sam; Stubbs-Dame, Remle; Singer, Jesse
2011-01-01
The Hub Population Health System enables the creation and distribution of queries for aggregate count information, clinical decision support alerts at the point-of-care for patients who meet specified conditions, and secure messages sent directly to provider electronic health record (EHR) inboxes. Using a metronidazole medication recall, the New York City Department of Health was able to determine the number of affected patients and message providers, and distribute an alert to participating practices. As of September 2011, the system is live in 400 practices and within a year will have over 532 practices with 2500 providers, representing over 2.5 million New Yorkers. The Hub can help public health experts to evaluate population health and quality improvement activities throughout the ambulatory care network. Multiple EHR vendors are building these features in partnership with the department's regional extension center in anticipation of new meaningful use requirements. PMID:22071531
Monitoring and tracing of critical software systems: State of the work and project definition
2008-12-01
analysis, troubleshooting and debugging. Some of these subsystems already come with ad hoc tracers for events like wireless connections or SCSI disk... SQLite ). Additional synthetic events (e.g. states) are added to the database. The database thus consists in contexts (process, CPU, state), event...capability on a [operating] system-by-system basis. Additionally, the mechanics of querying the data in an ad - hoc manner outside the boundaries of the
An advanced search engine for patent analytics in medicinal chemistry.
Pasche, Emilie; Gobeill, Julien; Teodoro, Douglas; Gaudinat, Arnaud; Vishnykova, Dina; Lovis, Christian; Ruch, Patrick
2012-01-01
Patent collections contain an important amount of medical-related knowledge, but existing tools were reported to lack of useful functionalities. We present here the development of TWINC, an advanced search engine dedicated to patent retrieval in the domain of health and life sciences. Our tool embeds two search modes: an ad hoc search to retrieve relevant patents given a short query and a related patent search to retrieve similar patents given a patent. Both search modes rely on tuning experiments performed during several patent retrieval competitions. Moreover, TWINC is enhanced with interactive modules, such as chemical query expansion, which is of prior importance to cope with various ways of naming biomedical entities. While the related patent search showed promising performances, the ad-hoc search resulted in fairly contrasted results. Nonetheless, TWINC performed well during the Chemathlon task of the PatOlympics competition and experts appreciated its usability.
PiCO QL: A software library for runtime interactive queries on program data
NASA Astrophysics Data System (ADS)
Fragkoulis, Marios; Spinellis, Diomidis; Louridas, Panos
PiCO QL is an open source C/C++ software whose scientific scope is real-time interactive analysis of in-memory data through SQL queries. It exposes a relational view of a system's or application's data structures, which is queryable through SQL. While the application or system is executing, users can input queries through a web-based interface or issue web service requests. Queries execute on the live data structures through the respective relational views. PiCO QL makes a good candidate for ad-hoc data analysis in applications and for diagnostics in systems settings. Applications of PiCO QL include the Linux kernel, the Valgrind instrumentation framework, a GIS application, a virtual real-time observatory of stellar objects, and a source code analyser.
Mining the SDSS SkyServer SQL queries log
NASA Astrophysics Data System (ADS)
Hirota, Vitor M.; Santos, Rafael; Raddick, Jordan; Thakar, Ani
2016-05-01
SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) astronomic catalog, provides a set of tools that allows data access for astronomers and scientific education. One of SkyServer data access interfaces allows users to enter ad-hoc SQL statements to query the catalog. SkyServer also presents some template queries that can be used as basis for more complex queries. This interface has logged over 330 million queries submitted since 2001. It is expected that analysis of this data can be used to investigate usage patterns, identify potential new classes of queries, find similar queries, etc. and to shed some light on how users interact with the Sloan Digital Sky Survey data and how scientists have adopted the new paradigm of e-Science, which could in turn lead to enhancements on the user interfaces and experience in general. In this paper we review some approaches to SQL query mining, apply the traditional techniques used in the literature and present lessons learned, namely, that the general text mining approach for feature extraction and clustering does not seem to be adequate for this type of data, and, most importantly, we find that this type of analysis can result in very different queries being clustered together.
1982-12-01
management, plus the comments received from the faculty and staff. A major assumption in this thesis is that automated database tech- niques offer the...and major advantage of a DBMS is that of real-time, on- line data accessibility. Routine queries, reports and ad hoc queries caii be performed...used or as applications programs evolve. Such changes can have a major impact on the organization and storage of data and ultimately on the response
Active Learning by Querying Informative and Representative Examples.
Huang, Sheng-Jun; Jin, Rong; Zhou, Zhi-Hua
2014-10-01
Active learning reduces the labeling cost by iteratively selecting the most valuable data to query their labels. It has attracted a lot of interests given the abundance of unlabeled data and the high cost of labeling. Most active learning approaches select either informative or representative unlabeled instances to query their labels, which could significantly limit their performance. Although several active learning algorithms were proposed to combine the two query selection criteria, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this limitation by developing a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an unlabeled instance. Further, by incorporating the correlation among labels, we extend the QUIRE approach to multi-label learning by actively querying instance-label pairs. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches in both single-label and multi-label learning.
Optimizing Interactive Development of Data-Intensive Applications
Interlandi, Matteo; Tetali, Sai Deep; Gulzar, Muhammad Ali; Noor, Joseph; Condie, Tyson; Kim, Miryung; Millstein, Todd
2017-01-01
Modern Data-Intensive Scalable Computing (DISC) systems are designed to process data through batch jobs that execute programs (e.g., queries) compiled from a high-level language. These programs are often developed interactively by posing ad-hoc queries over the base data until a desired result is generated. We observe that there can be significant overlap in the structure of these queries used to derive the final program. Yet, each successive execution of a slightly modified query is performed anew, which can significantly increase the development cycle. Vega is an Apache Spark framework that we have implemented for optimizing a series of similar Spark programs, likely originating from a development or exploratory data analysis session. Spark developers (e.g., data scientists) can leverage Vega to significantly reduce the amount of time it takes to re-execute a modified Spark program, reducing the overall time to market for their Big Data applications. PMID:28405637
Horvath, Monica M.; Winfield, Stephanie; Evans, Steve; Slopek, Steve; Shang, Howard; Ferranti, Jeffrey
2011-01-01
In many healthcare organizations, comparative effectiveness research and quality improvement (QI) investigations are hampered by a lack of access to data created as a byproduct of patient care. Data collection often hinges upon either manual chart review or ad hoc requests to technical experts who support legacy clinical systems. In order to facilitate this needed capacity for data exploration at our institution (Duke University Health System), we have designed and deployed a robust Web application for cohort identification and data extraction—the Duke Enterprise Data Unified Content Explorer (DEDUCE). DEDUCE is envisioned as a simple, web-based environment that allows investigators access to administrative, financial, and clinical information generated during patient care. By using business intelligence tools to create a view into Duke Medicine's enterprise data warehouse, DEDUCE provides a guided query functionality using a wizard-like interface that lets users filter through millions of clinical records, explore aggregate reports, and, export extracts. Researchers and QI specialists can obtain detailed patient- and observation-level extracts without needing to understand structured query language or the underlying database model. Developers designing such tools must devote sufficient training and develop application safeguards to ensure that patient-centered clinical researchers understand when observation-level extracts should be used. This may mitigate the risk of data being misunderstood and consequently used in an improper fashion. PMID:21130181
Horvath, Monica M; Winfield, Stephanie; Evans, Steve; Slopek, Steve; Shang, Howard; Ferranti, Jeffrey
2011-04-01
In many healthcare organizations, comparative effectiveness research and quality improvement (QI) investigations are hampered by a lack of access to data created as a byproduct of patient care. Data collection often hinges upon either manual chart review or ad hoc requests to technical experts who support legacy clinical systems. In order to facilitate this needed capacity for data exploration at our institution (Duke University Health System), we have designed and deployed a robust Web application for cohort identification and data extraction--the Duke Enterprise Data Unified Content Explorer (DEDUCE). DEDUCE is envisioned as a simple, web-based environment that allows investigators access to administrative, financial, and clinical information generated during patient care. By using business intelligence tools to create a view into Duke Medicine's enterprise data warehouse, DEDUCE provides a Guided Query functionality using a wizard-like interface that lets users filter through millions of clinical records, explore aggregate reports, and, export extracts. Researchers and QI specialists can obtain detailed patient- and observation-level extracts without needing to understand structured query language or the underlying database model. Developers designing such tools must devote sufficient training and develop application safeguards to ensure that patient-centered clinical researchers understand when observation-level extracts should be used. This may mitigate the risk of data being misunderstood and consequently used in an improper fashion. Copyright © 2010 Elsevier Inc. All rights reserved.
Engineering a Multi-Purpose Test Collection for Web Retrieval Experiments.
ERIC Educational Resources Information Center
Bailey, Peter; Craswell, Nick; Hawking, David
2003-01-01
Describes a test collection that was developed as a multi-purpose testbed for experiments on the Web in distributed information retrieval, hyperlink algorithms, and conventional ad hoc retrieval. Discusses inter-server connectivity, integrity of server holdings, inclusion of documents related to a wide spread of likely queries, and distribution of…
NASA Interactive Forms Type Interface - NIFTI
NASA Technical Reports Server (NTRS)
Jain, Bobby; Morris, Bill
2005-01-01
A flexible database query, update, modify, and delete tool was developed that provides an easy interface to Oracle forms. This tool - the NASA interactive forms type interface, or NIFTI - features on-the- fly forms creation, forms sharing among users, the capability to query the database from user-entered criteria on forms, traversal of query results, an ability to generate tab-delimited reports, viewing and downloading of reports to the user s workstation, and a hypertext-based help system. NIFTI is a very powerful ad hoc query tool that was developed using C++, X-Windows by a Motif application framework. A unique tool, NIFTI s capabilities appear in no other known commercial-off-the- shelf (COTS) tool, because NIFTI, which can be launched from the user s desktop, is a simple yet very powerful tool with a highly intuitive, easy-to-use graphical user interface (GUI) that will expedite the creation of database query/update forms. NIFTI, therefore, can be used in NASA s International Space Station (ISS) as well as within government and industry - indeed by all users of the widely disseminated Oracle base. And it will provide significant cost savings in the areas of user training and scalability while advancing the art over current COTS browsers. No COTS browser performs all the functions NIFTI does, and NIFTI is easier to use. NIFTI s cost savings are very significant considering the very large database with which it is used and the large user community with varying data requirements it will support. Its ease of use means that personnel unfamiliar with databases (e.g., managers, supervisors, clerks, and others) can develop their own personal reports. For NASA, a tool such as NIFTI was needed to query, update, modify, and make deletions within the ISS vehicle master database (VMDB), a repository of engineering data that includes an indentured parts list and associated resource data (power, thermal, volume, weight, and the like). Since the VMDB is used both as a collection point for data and as a common repository for engineering, integration, and operations teams, a tool such as NIFTI had to be designed that could expedite the creation of database query/update forms which could then be shared among users.
EarthServer - an FP7 project to enable the web delivery and analysis of 3D/4D models
NASA Astrophysics Data System (ADS)
Laxton, John; Sen, Marcus; Passmore, James
2013-04-01
EarthServer aims at open access and ad-hoc analytics on big Earth Science data, based on the OGC geoservice standards Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS). The WCS model defines "coverages" as a unifying paradigm for multi-dimensional raster data, point clouds, meshes, etc., thereby addressing a wide range of Earth Science data including 3D/4D models. WCPS allows declarative SQL-style queries on coverages. The project is developing a pilot implementing these standards, and will also investigate the use of GeoSciML to describe coverages. Integration of WCPS with XQuery will in turn allow coverages to be queried in combination with their metadata and GeoSciML description. The unified service will support navigation, extraction, aggregation, and ad-hoc analysis on coverage data from SQL. Clients will range from mobile devices to high-end immersive virtual reality, and will enable 3D model visualisation using web browser technology coupled with developing web standards. EarthServer is establishing open-source client and server technology intended to be scalable to Petabyte/Exabyte volumes, based on distributed processing, supercomputing, and cloud virtualization. Implementation will be based on the existing rasdaman server technology developed. Services using rasdaman technology are being installed serving the atmospheric, oceanographic, geological, cryospheric, planetary and general earth observation communities. The geology service (http://earthserver.bgs.ac.uk/) is being provided by BGS and at present includes satellite imagery, superficial thickness data, onshore DTMs and 3D models for the Glasgow area. It is intended to extend the data sets available to include 3D voxel models. Use of the WCPS standard allows queries to be constructed against single or multiple coverages. For example on a single coverage data for a particular area can be selected or data with a particular range of pixel values. Queries on multiple surfaces can be constructed to calculate, for example, the thickness between two surfaces in a 3D model or the depth from ground surface to the top of a particular geologic unit. In the first version of the service a simple interface showing some example queries has been implemented in order to show the potential of the technologies. The project aims to develop the services available in light of user feedback, both in terms of the data available, the functionality and the interface. User feedback on the services guides the software and standards development aspects of the project, leading to enhanced versions of the software which will be implemented in upgraded versions of the services during the lifetime of the project.
Relational similarity-based model of data part 1: foundations and query systems
NASA Astrophysics Data System (ADS)
Belohlavek, Radim; Vychodil, Vilem
2017-10-01
We present a general rank-aware model of data which supports handling of similarity in relational databases. The model is based on the assumption that in many cases it is desirable to replace equalities on values in data tables by similarity relations expressing degrees to which the values are similar. In this context, we study various phenomena which emerge in the model, including similarity-based queries and similarity-based data dependencies. Central notion in our model is that of a ranked data table over domains with similarities which is our counterpart to the notion of relation on relation scheme from the classical relational model. Compared to other approaches which cover related problems, we do not propose a similarity-based or ranking module on top of the classical relational model. Instead, we generalize the very core of the model by replacing the classical, two-valued logic upon which the classical model is built by a more general logic involving a scale of truth degrees that, in addition to the classical truth degrees 0 and 1, contains intermediate truth degrees. While the classical truth degrees 0 and 1 represent nonequality and equality of values, and subsequently mismatch and match of queries, the intermediate truth degrees in the new model represent similarity of values and partial match of queries. Moreover, the truth functions of many-valued logical connectives in the new model serve to aggregate degrees of similarity. The presented approach is conceptually clean, logically sound, and retains most properties of the classical model while enabling us to employ new types of queries and data dependencies. Most importantly, similarity is not handled in an ad hoc way or by putting a "similarity module" atop the classical model in our approach. Rather, it is consistently viewed as a notion that generalizes and replaces equality in the very core of the relational model. We present fundamentals of the formal model and two equivalent query systems which are analogues of the classical relational algebra and domain relational calculus with range declarations. In the sequel to this paper, we deal with similarity-based dependencies.
Breaking the Curse of Cardinality on Bitmap Indexes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Kesheng; Wu, Kesheng; Stockinger, Kurt
2008-04-04
Bitmap indexes are known to be efficient for ad-hoc range queries that are common in data warehousing and scientific applications. However, they suffer from the curse of cardinality, that is, their efficiency deteriorates as attribute cardinalities increase. A number of strategies have been proposed, but none of them addresses the problem adequately. In this paper, we propose a novel binned bitmap index that greatly reduces the cost to answer queries, and therefore breaks the curse of cardinality. The key idea is to augment the binned index with an Order-preserving Bin-based Clustering (OrBiC) structure. This data structure significantly reduces the I/Omore » operations needed to resolve records that cannot be resolved with the bitmaps. To further improve the proposed index structure, we also present a strategy to create single-valued bins for frequent values. This strategy reduces index sizes and improves query processing speed. Overall, the binned indexes with OrBiC great improves the query processing speed, and are 3 - 25 times faster than the best available indexes for high-cardinality data.« less
Distributed XQuery-Based Integration and Visualization of Multimodality Brain Mapping Data
Detwiler, Landon T.; Suciu, Dan; Franklin, Joshua D.; Moore, Eider B.; Poliakov, Andrew V.; Lee, Eunjung S.; Corina, David P.; Ojemann, George A.; Brinkley, James F.
2008-01-01
This paper addresses the need for relatively small groups of collaborating investigators to integrate distributed and heterogeneous data about the brain. Although various national efforts facilitate large-scale data sharing, these approaches are generally too “heavyweight” for individual or small groups of investigators, with the result that most data sharing among collaborators continues to be ad hoc. Our approach to this problem is to create a “lightweight” distributed query architecture, in which data sources are accessible via web services that accept arbitrary query languages but return XML results. A Distributed XQuery Processor (DXQP) accepts distributed XQueries in which subqueries are shipped to the remote data sources to be executed, with the resulting XML integrated by DXQP. A web-based application called DXBrain accesses DXQP, allowing a user to create, save and execute distributed XQueries, and to view the results in various formats including a 3-D brain visualization. Example results are presented using distributed brain mapping data sources obtained in studies of language organization in the brain, but any other XML source could be included. The advantage of this approach is that it is very easy to add and query a new source, the tradeoff being that the user needs to understand XQuery and the schemata of the underlying sources. For small numbers of known sources this burden is not onerous for a knowledgeable user, leading to the conclusion that the system helps to fill the gap between ad hoc local methods and large scale but complex national data sharing efforts. PMID:19198662
Joint Experimentation on Scalable Parallel Processors (JESPP)
2006-04-01
made use of local embedded relational databases, implemented using sqlite on each node of an SPP to execute queries and return results via an ad hoc ...rl.af.mil 12a. DISTRIBUTION / AVAILABILITY STATEENT APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED. 12b. DISTRIBUTION CODE 13. ABSTRACT...Experimentation Directorate (J9) required expansion of its joint semi-automated forces (JSAF) code capabilities; including number of entities, behavior complexity
A Statewide Management Information System for the Control of Sexually Transmitted Diseases
Fichtner, Ronald R.; Blount, Joseph H.; Spencer, Jack N.
1983-01-01
The persistent endemicity in the U.S. of infectious syphilis and gonorrhea, together with increasing diagnoses of gonococcal-related pelvic inflammatory disease in women and genital herpes infections, have intensified pressures on state and local VD control programs to measure, analyze, and interpret the distribution and transmission of these and other sexually transmitted diseases. In response, the Division of Venereal Disease Control (DVDC) of the Centers for Disease Control (CDC) is participating in the development of three state-wide, prototype sexually transmitted disease (STD) management information systems. A systems analysis of a typical state-wide STD control program indicated that timely, comprehensive, informational support to public health managers and policy makers should be combined with rapid, direct support of program activities using an on-line, integrated data base, computer system with telecommunications capability. This methodology uses a data base management system, query facility for ad hoc inquiries, custom design philosophies, but utilizes distinct hardware and software implementations.
A practical implementation for a data dictionary in an environment of diverse data sets
Sprenger, Karla K.; Larsen, Dana M.
1993-01-01
The need for a data dictionary database at the U.S. Geological Survey's EROS Data Center (EDC) was reinforced with the Earth Observing System Data and Information System (EOSDIS) requirement for consistent field definitions of data sets residing at more than one archive center. The EDC requirement addresses the existence of multiple sets with identical field definitions using various naming conventions. The EDC is developing a data dictionary database to accomplish the following foals: to standardize field names for ease in software development; to facilitate querying and updating of the date; and to generate ad hoc reports. The structure of the EDC electronic data dictionary database supports different metadata systems as well as many different data sets. A series of reports is used to keep consistency among data sets and various metadata systems.
Frequent Itemset Mining for Query Expansion in Microblog Ad-hoc Search
2012-11-01
captin, @rayhattersley, #rhythm, almond, #lift, #white, rhythm, #blues, #sci- ence, loss*, you, #eating, #beauty, ambria, #with, weight*, diets , white...puwisdom, sci- ence, #and, @pulistbook, lift, dingle, @mir- acleweight, @tweettraffic4u, # diets , oranges, #of, @aase25, and*, berries*, #no...and, it, is, daily* MB055 benefits, with, acai, their, diet , on, of, wt, health, plans, ber, for, first, how, are, loss*, you, this, mt, lose, weight
Achieve Location Privacy-Preserving Range Query in Vehicular Sensing
Lu, Rongxing; Ma, Maode; Bao, Haiyong
2017-01-01
Modern vehicles are equipped with a plethora of on-board sensors and large on-board storage, which enables them to gather and store various local-relevant data. However, the wide application of vehicular sensing has its own challenges, among which location-privacy preservation and data query accuracy are two critical problems. In this paper, we propose a novel range query scheme, which helps the data requester to accurately retrieve the sensed data from the distributive on-board storage in vehicular ad hoc networks (VANETs) with location privacy preservation. The proposed scheme exploits structured scalars to denote the locations of data requesters and vehicles, and achieves the privacy-preserving location matching with the homomorphic Paillier cryptosystem technique. Detailed security analysis shows that the proposed range query scheme can successfully preserve the location privacy of the involved data requesters and vehicles, and protect the confidentiality of the sensed data. In addition, performance evaluations are conducted to show the efficiency of the proposed scheme, in terms of computation delay and communication overhead. Specifically, the computation delay and communication overhead are not dependent on the length of the scalar, and they are only proportional to the number of vehicles. PMID:28786943
Achieve Location Privacy-Preserving Range Query in Vehicular Sensing.
Kong, Qinglei; Lu, Rongxing; Ma, Maode; Bao, Haiyong
2017-08-08
Modern vehicles are equipped with a plethora of on-board sensors and large on-board storage, which enables them to gather and store various local-relevant data. However, the wide application of vehicular sensing has its own challenges, among which location-privacy preservation and data query accuracy are two critical problems. In this paper, we propose a novel range query scheme, which helps the data requester to accurately retrieve the sensed data from the distributive on-board storage in vehicular ad hoc networks (VANETs) with location privacy preservation. The proposed scheme exploits structured scalars to denote the locations of data requesters and vehicles, and achieves the privacy-preserving location matching with the homomorphic Paillier cryptosystem technique. Detailed security analysis shows that the proposed range query scheme can successfully preserve the location privacy of the involved data requesters and vehicles, and protect the confidentiality of the sensed data. In addition, performance evaluations are conducted to show the efficiency of the proposed scheme, in terms of computation delay and communication overhead. Specifically, the computation delay and communication overhead are not dependent on the length of the scalar, and they are only proportional to the number of vehicles.
EarthServer: Use of Rasdaman as a data store for use in visualisation of complex EO data
NASA Astrophysics Data System (ADS)
Clements, Oliver; Walker, Peter; Grant, Mike
2013-04-01
The European Commission FP7 project EarthServer is establishing open access and ad-hoc analytics on extreme-size Earth Science data, based on and extending cutting-edge Array Database technology. EarthServer is built around the Rasdaman Raster Data Manager which extends standard relational database systems with the ability to store and retrieve multi-dimensional raster data of unlimited size through an SQL style query language. Rasdaman facilitates visualisation of data by providing several Open Geospatial Consortium (OGC) standard interfaces through its web services wrapper, Petascope. These include the well established standards, Web Coverage Service (WCS) and Web Map Service (WMS) as well as the emerging standard, Web Coverage Processing Service (WCPS). The WCPS standard allows the running of ad-hoc queries on the data stored within Rasdaman, creating an infrastructure where users are not restricted by bandwidth when manipulating or querying huge datasets. Here we will show that the use of EarthServer technologies and infrastructure allows access and visualisation of massive scale data through a web client with only marginal bandwidth use as opposed to the current mechanism of copying huge amounts of data to create visualisations locally. For example if a user wanted to generate a plot of global average chlorophyll for a complete decade time series they would only have to download the result instead of Terabytes of data. Firstly we will present a brief overview of the capabilities of Rasdaman and the WCPS query language to introduce the ways in which it is used in a visualisation tool chain. We will show that there are several ways in which WCPS can be utilised to create both standard and novel web based visualisations. An example of a standard visualisation is the production of traditional 2d plots, allowing users the ability to plot data products easily. However, the query language allows the creation of novel/custom products, which can then immediately be plotted with the same system. For more complex multi-spectral data, WCPS allows the user to explore novel combinations of bands in standard band-ratio algorithms through a web browser with dynamic updating of the resultant image. To visualise very large datasets Rasdaman has the capability to dynamically scale a dataset or query result so that it can be appraised quickly for use in later unscaled queries. All of these techniques are accessible through a web based GIS interface increasing the number of potential users of the system. Lastly we will show the advances in dynamic web based 3D visualisations being explored within the EarthServer project. By utilising the emerging declarative 3D web standard X3DOM as a tool to visualise the results of WCPS queries we introduce several possible benefits, including quick appraisal of data for outliers or anomalous data points and visualisation of the uncertainty of data alongside the actual data values.
Integrated Array/Metadata Analytics
NASA Astrophysics Data System (ADS)
Misev, Dimitar; Baumann, Peter
2015-04-01
Data comes in various forms and types, and integration usually presents a problem that is often simply ignored and solved with ad-hoc solutions. Multidimensional arrays are an ubiquitous data type, that we find at the core of virtually all science and engineering domains, as sensor, model, image, statistics data. Naturally, arrays are richly described by and intertwined with additional metadata (alphanumeric relational data, XML, JSON, etc). Database systems, however, a fundamental building block of what we call "Big Data", lack adequate support for modelling and expressing these array data/metadata relationships. Array analytics is hence quite primitive or non-existent at all in modern relational DBMS. Recognizing this, we extended SQL with a new SQL/MDA part seamlessly integrating multidimensional array analytics into the standard database query language. We demonstrate the benefits of SQL/MDA with real-world examples executed in ASQLDB, an open-source mediator system based on HSQLDB and rasdaman, that already implements SQL/MDA.
NASA Astrophysics Data System (ADS)
Baumann, Peter
2013-04-01
There is a traditional saying that metadata are understandable, semantic-rich, and searchable. Data, on the other hand, are big, with no accessible semantics, and just downloadable. Not only has this led to an imbalance of search support form a user perspective, but also underneath to a deep technology divide often using relational databases for metadata and bespoke archive solutions for data. Our vision is that this barrier will be overcome, and data and metadata become searchable likewise, leveraging the potential of semantic technologies in combination with scalability technologies. Ultimately, in this vision ad-hoc processing and filtering will not distinguish any longer, forming a uniformly accessible data universe. In the European EarthServer initiative, we work towards this vision by federating database-style raster query languages with metadata search and geo broker technology. We present our approach taken, how it can leverage OGC standards, the benefits envisaged, and first results.
Query-Structure Based Web Page Indexing
2012-11-01
the massive amount of data present on the web. In our third participation in the web track at TREC 2012, we explore the idea of building an...the ad-hoc and diversity task. 1 INTRODUCTION The rapid growth and massive quantities of data on the Internet have increased the importance and...complexity of information retrieval systems. The amount and the diversity of the web data introduce shortcomings in the way search engines rank their
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.
Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel
2013-08-01
Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive.
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce
Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel
2013-01-01
Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS – a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive. PMID:24187650
Acquire: an open-source comprehensive cancer biobanking system.
Dowst, Heidi; Pew, Benjamin; Watkins, Chris; McOwiti, Apollo; Barney, Jonathan; Qu, Shijing; Becnel, Lauren B
2015-05-15
The probability of effective treatment of cancer with a targeted therapeutic can be improved for patients with defined genotypes containing actionable mutations. To this end, many human cancer biobanks are integrating more tightly with genomic sequencing facilities and with those creating and maintaining patient-derived xenografts (PDX) and cell lines to provide renewable resources for translational research. To support the complex data management needs and workflows of several such biobanks, we developed Acquire. It is a robust, secure, web-based, database-backed open-source system that supports all major needs of a modern cancer biobank. Its modules allow for i) up-to-the-minute 'scoreboard' and graphical reporting of collections; ii) end user roles and permissions; iii) specimen inventory through caTissue Suite; iv) shipping forms for distribution of specimens to pathology, genomic analysis and PDX/cell line creation facilities; v) robust ad hoc querying; vi) molecular and cellular quality control metrics to track specimens' progress and quality; vii) public researcher request; viii) resource allocation committee distribution request review and oversight and ix) linkage to available derivatives of specimen. © The Author 2015. Published by Oxford University Press.
Any information, anywhere, anytime for the warfighter
NASA Astrophysics Data System (ADS)
Lazaroff, Mark B.; Sage, Philip A.
1997-06-01
The objective of the DARPA battlefield awareness data dissemination (BADD) program is to deliver battlefield awareness information to the warfighter -- anywhere, anytime. BADD is an advanced concept technology demonstration (ACTD) to support proof of concept technology demonstrations and experiments with a goal of introducing new technology to support the operational needs and acceptance of the warfighter. BADD's information management technology provides a 'smart' push of information to the users by providing information subscription services implemented via user- generated profiles. The system also provides services for warfighter pull or 'reach-back' of information via ad hoc query support. The high bandwidth delivery of informtion via the Global Broadcast System (GBS) satellites enables users to receive battlefield awareness information virtually anywhere. Very similar goals have been established for data warehousing technology -- that is, deliver the right information, to the right user, at the right time so that effective decisions can be made. In this paper, we examine the BADD Phase II architecture and underlying information management technoloyg in the context of data warehousing technology and a data warehouse reference architecture. In particular, we foucs on the BADD segment that PSR is building, the Interface to Information Sources (I2S).
Semantic encoding of relational databases in wireless networks
NASA Astrophysics Data System (ADS)
Benjamin, David P.; Walker, Adrian
2005-03-01
Semantic Encoding is a new, patented technology that greatly increases the speed of transmission of distributed databases over networks, especially over ad hoc wireless networks, while providing a novel method of data security. It reduces bandwidth consumption and storage requirements, while speeding up query processing, encryption and computation of digital signatures. We describe the application of Semantic Encoding in a wireless setting and provide an example of its operation in which a compression of 290:1 would be achieved.
DCU@TRECMed 2012: Using Ad-Hoc Baselines for Domain-Specific Retrieval
2012-11-01
description to extend the query, for example: Patients with complicated GERD who receive endoscopy will be extended with Gastroesophageal reflux disease ... Diseases and Related Health Problems, version 9) for the patient’s admission or discharge status [1, 5]; treating negation (e.g. negative test results or...codes were mapped to a description of the code, usually a short phrase/sentence. For instance, the ICD9 code 253.5 corresponds to the disease Diabetes
2007-06-10
limited high risk OB care Pelvic Pain Routine anatomy scan Polycystic Ovarian Syndrome (PCOS) Screening for cystic fibrosis/triple screen Incontinence...Amniocentesis Abnormal Uterine Bleeding Gestational diabetes Dysplasia (colposcopy/LEEP/cone biopsy) Preeclampsia Endometriosis Fetal anomalies (some...Analysis and Reporting Tool is a powerful ad-hoc query tool used to obtain summary and detailed views of population, clinical, and financial data from all
Duplicate document detection in DocBrowse
NASA Astrophysics Data System (ADS)
Chalana, Vikram; Bruce, Andrew G.; Nguyen, Thien
1998-04-01
Duplicate documents are frequently found in large databases of digital documents, such as those found in digital libraries or in the government declassification effort. Efficient duplicate document detection is important not only to allow querying for similar documents, but also to filter out redundant information in large document databases. We have designed three different algorithm to identify duplicate documents. The first algorithm is based on features extracted from the textual content of a document, the second algorithm is based on wavelet features extracted from the document image itself, and the third algorithm is a combination of the first two. These algorithms are integrated within the DocBrowse system for information retrieval from document images which is currently under development at MathSoft. DocBrowse supports duplicate document detection by allowing (1) automatic filtering to hide duplicate documents, and (2) ad hoc querying for similar or duplicate documents. We have tested the duplicate document detection algorithms on 171 documents and found that text-based method has an average 11-point precision of 97.7 percent while the image-based method has an average 11- point precision of 98.9 percent. However, in general, the text-based method performs better when the document contains enough high-quality machine printed text while the image- based method performs better when the document contains little or no quality machine readable text.
Spatial cyberinfrastructures, ontologies, and the humanities.
Sieber, Renee E; Wellen, Christopher C; Jin, Yuan
2011-04-05
We report on research into building a cyberinfrastructure for Chinese biographical and geographic data. Our cyberinfrastructure contains (i) the McGill-Harvard-Yenching Library Ming Qing Women's Writings database (MQWW), the only online database on historical Chinese women's writings, (ii) the China Biographical Database, the authority for Chinese historical people, and (iii) the China Historical Geographical Information System, one of the first historical geographic information systems. Key to this integration is that linked databases retain separate identities as bases of knowledge, while they possess sufficient semantic interoperability to allow for multidatabase concepts and to support cross-database queries on an ad hoc basis. Computational ontologies create underlying semantics for database access. This paper focuses on the spatial component in a humanities cyberinfrastructure, which includes issues of conflicting data, heterogeneous data models, disambiguation, and geographic scale. First, we describe the methodology for integrating the databases. Then we detail the system architecture, which includes a tier of ontologies and schema. We describe the user interface and applications that allow for cross-database queries. For instance, users should be able to analyze the data, examine hypotheses on spatial and temporal relationships, and generate historical maps with datasets from MQWW for research, teaching, and publication on Chinese women writers, their familial relations, publishing venues, and the literary and social communities. Last, we discuss the social side of cyberinfrastructure development, as people are considered to be as critical as the technical components for its success.
WATCHMAN: A Data Warehouse Intelligent Cache Manager
NASA Technical Reports Server (NTRS)
Scheuermann, Peter; Shim, Junho; Vingralek, Radek
1996-01-01
Data warehouses store large volumes of data which are used frequently by decision support applications. Such applications involve complex queries. Query performance in such an environment is critical because decision support applications often require interactive query response time. Because data warehouses are updated infrequently, it becomes possible to improve query performance by caching sets retrieved by queries in addition to query execution plans. In this paper we report on the design of an intelligent cache manager for sets retrieved by queries called WATCHMAN, which is particularly well suited for data warehousing environment. Our cache manager employs two novel, complementary algorithms for cache replacement and for cache admission. WATCHMAN aims at minimizing query response time and its cache replacement policy swaps out entire retrieved sets of queries instead of individual pages. The cache replacement and admission algorithms make use of a profit metric, which considers for each retrieved set its average rate of reference, its size, and execution cost of the associated query. We report on a performance evaluation based on the TPC-D and Set Query benchmarks. These experiments show that WATCHMAN achieves a substantial performance improvement in a decision support environment when compared to a traditional LRU replacement algorithm.
2013-03-01
DSR Dynamic Source Routing DSSS Direct -sequence spread spectrum GUID Globally Unique ID MANET Mobile Ad-hoc Network NS3 Network Simulator 3 OLSR...networking schemes for safe maneuvering and data communication. Imagine needing to maintain an operational picture of an overall environment using a...as simple as O(n) where every node is sequentially queried to O log(n), or O(1). These schemes will be discussed with each individual DHT. Four of the
Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data.
Aji, Ablimit; Wang, Fusheng; Saltz, Joel H
2012-11-06
Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amount of data with fast response, which is faced with two major challenges: the "big data" challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce.
Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data
Aji, Ablimit; Wang, Fusheng; Saltz, Joel H.
2013-01-01
Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amount of data with fast response, which is faced with two major challenges: the “big data” challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce. PMID:24501719
NASA Astrophysics Data System (ADS)
Li, C.; Zhu, X.; Guo, W.; Liu, Y.; Huang, H.
2015-05-01
A method suitable for indoor complex semantic query considering the computation of indoor spatial relations is provided According to the characteristics of indoor space. This paper designs ontology model describing the space related information of humans, events and Indoor space objects (e.g. Storey and Room) as well as their relations to meet the indoor semantic query. The ontology concepts are used in IndoorSPARQL query language which extends SPARQL syntax for representing and querying indoor space. And four types specific primitives for indoor query, "Adjacent", "Opposite", "Vertical" and "Contain", are defined as query functions in IndoorSPARQL used to support quantitative spatial computations. Also a method is proposed to analysis the query language. Finally this paper adopts this method to realize indoor semantic query on the study area through constructing the ontology model for the study building. The experimental results show that the method proposed in this paper can effectively support complex indoor space semantic query.
2006-06-01
SPARQL SPARQL Protocol and RDF Query Language SQL Structured Query Language SUMO Suggested Upper Merged Ontology SW... Query optimization algorithms are implemented in the Pellet reasoner in order to ensure querying a knowledge base is efficient . These algorithms...memory as a treelike structure in order for the data to be queried . XML Query (XQuery) is the standard language used when querying XML
QRFXFreeze: Queryable Compressor for RFX.
Senthilkumar, Radha; Nandagopal, Gomathi; Ronald, Daphne
2015-01-01
The verbose nature of XML has been mulled over again and again and many compression techniques for XML data have been excogitated over the years. Some of the techniques incorporate support for querying the XML database in its compressed format while others have to be decompressed before they can be queried. XML compression in which querying is directly supported instantaneously with no compromise over time is forced to compromise over space. In this paper, we propose the compressor, QRFXFreeze, which not only reduces the space of storage but also supports efficient querying. The compressor does this without decompressing the compressed XML file. The compressor supports all kinds of XML documents along with insert, update, and delete operations. The forte of QRFXFreeze is that the textual data are semantically compressed and are indexed to reduce the querying time. Experimental results show that the proposed compressor performs much better than other well-known compressors.
Spatial cyberinfrastructures, ontologies, and the humanities
Sieber, Renee E.; Wellen, Christopher C.; Jin, Yuan
2011-01-01
We report on research into building a cyberinfrastructure for Chinese biographical and geographic data. Our cyberinfrastructure contains (i) the McGill-Harvard-Yenching Library Ming Qing Women's Writings database (MQWW), the only online database on historical Chinese women's writings, (ii) the China Biographical Database, the authority for Chinese historical people, and (iii) the China Historical Geographical Information System, one of the first historical geographic information systems. Key to this integration is that linked databases retain separate identities as bases of knowledge, while they possess sufficient semantic interoperability to allow for multidatabase concepts and to support cross-database queries on an ad hoc basis. Computational ontologies create underlying semantics for database access. This paper focuses on the spatial component in a humanities cyberinfrastructure, which includes issues of conflicting data, heterogeneous data models, disambiguation, and geographic scale. First, we describe the methodology for integrating the databases. Then we detail the system architecture, which includes a tier of ontologies and schema. We describe the user interface and applications that allow for cross-database queries. For instance, users should be able to analyze the data, examine hypotheses on spatial and temporal relationships, and generate historical maps with datasets from MQWW for research, teaching, and publication on Chinese women writers, their familial relations, publishing venues, and the literary and social communities. Last, we discuss the social side of cyberinfrastructure development, as people are considered to be as critical as the technical components for its success. PMID:21444819
SORTEZ: a relational translator for NCBI's ASN.1 database.
Hart, K W; Searls, D B; Overton, G C
1994-07-01
The National Center for Biotechnology Information (NCBI) has created a database collection that includes several protein and nucleic acid sequence databases, a biosequence-specific subset of MEDLINE, as well as value-added information such as links between similar sequences. Information in the NCBI database is modeled in Abstract Syntax Notation 1 (ASN.1) an Open Systems Interconnection protocol designed for the purpose of exchanging structured data between software applications rather than as a data model for database systems. While the NCBI database is distributed with an easy-to-use information retrieval system, ENTREZ, the ASN.1 data model currently lacks an ad hoc query language for general-purpose data access. For that reason, we have developed a software package, SORTEZ, that transforms the ASN.1 database (or other databases with nested data structures) to a relational data model and subsequently to a relational database management system (Sybase) where information can be accessed through the relational query language, SQL. Because the need to transform data from one data model and schema to another arises naturally in several important contexts, including efficient execution of specific applications, access to multiple databases and adaptation to database evolution this work also serves as a practical study of the issues involved in the various stages of database transformation. We show that transformation from the ASN.1 data model to a relational data model can be largely automated, but that schema transformation and data conversion require considerable domain expertise and would greatly benefit from additional support tools.
In-context query reformulation for failing SPARQL queries
NASA Astrophysics Data System (ADS)
Viswanathan, Amar; Michaelis, James R.; Cassidy, Taylor; de Mel, Geeth; Hendler, James
2017-05-01
Knowledge bases for decision support systems are growing increasingly complex, through continued advances in data ingest and management approaches. However, humans do not possess the cognitive capabilities to retain a bird's-eyeview of such knowledge bases, and may end up issuing unsatisfiable queries to such systems. This work focuses on the implementation of a query reformulation approach for graph-based knowledge bases, specifically designed to support the Resource Description Framework (RDF). The reformulation approach presented is instance-and schema-aware. Thus, in contrast to relaxation techniques found in the state-of-the-art, the presented approach produces in-context query reformulation.
McKenna, Thomas M; Bawa, Gagandeep; Kumar, Kamal; Reifman, Jaques
2007-04-01
The physiology analysis system (PAS) was developed as a resource to support the efficient warehousing, management, and analysis of physiology data, particularly, continuous time-series data that may be extensive, of variable quality, and distributed across many files. The PAS incorporates time-series data collected by many types of data-acquisition devices, and it is designed to free users from data management burdens. This Web-based system allows both discrete (attribute) and time-series (ordered) data to be manipulated, visualized, and analyzed via a client's Web browser. All processes occur on a server, so that the client does not have to download data or any application programs, and the PAS is independent of the client's computer operating system. The PAS contains a library of functions, written in different computer languages that the client can add to and use to perform specific data operations. Functions from the library are sequentially inserted into a function chain-based logical structure to construct sophisticated data operators from simple function building blocks, affording ad hoc query and analysis of time-series data. These features support advanced mining of physiology data.
Visually defining and querying consistent multi-granular clinical temporal abstractions.
Combi, Carlo; Oliboni, Barbara
2012-02-01
The main goal of this work is to propose a framework for the visual specification and query of consistent multi-granular clinical temporal abstractions. We focus on the issue of querying patient clinical information by visually defining and composing temporal abstractions, i.e., high level patterns derived from several time-stamped raw data. In particular, we focus on the visual specification of consistent temporal abstractions with different granularities and on the visual composition of different temporal abstractions for querying clinical databases. Temporal abstractions on clinical data provide a concise and high-level description of temporal raw data, and a suitable way to support decision making. Granularities define partitions on the time line and allow one to represent time and, thus, temporal clinical information at different levels of detail, according to the requirements coming from the represented clinical domain. The visual representation of temporal information has been considered since several years in clinical domains. Proposed visualization techniques must be easy and quick to understand, and could benefit from visual metaphors that do not lead to ambiguous interpretations. Recently, physical metaphors such as strips, springs, weights, and wires have been proposed and evaluated on clinical users for the specification of temporal clinical abstractions. Visual approaches to boolean queries have been considered in the last years and confirmed that the visual support to the specification of complex boolean queries is both an important and difficult research topic. We propose and describe a visual language for the definition of temporal abstractions based on a set of intuitive metaphors (striped wall, plastered wall, brick wall), allowing the clinician to use different granularities. A new algorithm, underlying the visual language, allows the physician to specify only consistent abstractions, i.e., abstractions not containing contradictory conditions on the component abstractions. Moreover, we propose a visual query language where different temporal abstractions can be composed to build complex queries: temporal abstractions are visually connected through the usual logical connectives AND, OR, and NOT. The proposed visual language allows one to simply define temporal abstractions by using intuitive metaphors, and to specify temporal intervals related to abstractions by using different temporal granularities. The physician can interact with the designed and implemented tool by point-and-click selections, and can visually compose queries involving several temporal abstractions. The evaluation of the proposed granularity-related metaphors consisted in two parts: (i) solving 30 interpretation exercises by choosing the correct interpretation of a given screenshot representing a possible scenario, and (ii) solving a complex exercise, by visually specifying through the interface a scenario described only in natural language. The exercises were done by 13 subjects. The percentage of correct answers to the interpretation exercises were slightly different with respect to the considered metaphors (54.4--striped wall, 73.3--plastered wall, 61--brick wall, and 61--no wall), but post hoc statistical analysis on means confirmed that differences were not statistically significant. The result of the user's satisfaction questionnaire related to the evaluation of the proposed granularity-related metaphors ratified that there are no preferences for one of them. The evaluation of the proposed logical notation consisted in two parts: (i) solving five interpretation exercises provided by a screenshot representing a possible scenario and by three different possible interpretations, of which only one was correct, and (ii) solving five exercises, by visually defining through the interface a scenario described only in natural language. Exercises had an increasing difficulty. The evaluation involved a total of 31 subjects. Results related to this evaluation phase confirmed us about the soundness of the proposed solution even in comparison with a well known proposal based on a tabular query form (the only significant difference is that our proposal requires more time for the training phase: 21 min versus 14 min). In this work we have considered the issue of visually composing and querying temporal clinical patient data. In this context we have proposed a visual framework for the specification of consistent temporal abstractions with different granularities and for the visual composition of different temporal abstractions to build (possibly) complex queries on clinical databases. A new algorithm has been proposed to check the consistency of the specified granular abstraction. From the evaluation of the proposed metaphors and interfaces and from the comparison of the visual query language with a well known visual method for boolean queries, the soundness of the overall system has been confirmed; moreover, pros and cons and possible improvements emerged from the comparison of different visual metaphors and solutions. Copyright © 2011 Elsevier B.V. All rights reserved.
Ingredients for an Integrated Dinner: Parsley, Sage, Rosemary and Thyme
NASA Astrophysics Data System (ADS)
Baumann, Peter
2013-04-01
In 1966, Simon and Garfunkel combined the English traditional "Scarborough Fair" with a counter melody. This is one of the manifold techniques of the Kontrapunktik described by Bach around 1745 in "The Art of the Fugue": combining completely different and seemingly independent melodies (or motifs) into a coherent piece of music, pleasant for the audience. This achievement, transposed into Computer Science, could be of great benefit for geo services as we look at the currently disparate situation: On the one hand, we have metadata - traditionally, they are understood as being small in volume, but rich in content and semantics, and flexibly queryable through the rich body of technologies established over several decades of database research, centering around query languages like SQL. On the other hand, we have data themselves, such as remote sensing and other measured and observed data sets - they are considered difficult to interpret, semantic-poor, and only for clumsy download, as they are the main constituent of what we today call Big Data. The traditional advantages of databases, such as information integration, query flexibility, and scalability seem to be unavailable. These are the melodies that require a kontrapunctic harmonization, leading to a Holy Grail where different information categories enjoy individually tailored support, while an overall integrating framework allows seamless and convenient access and processing by the user. Most of the data categories to be integrated are well known in fact: ontologies, geospatial meshes, spatiotemporal arrays, and free text constitute major ingredients in this orchestration. For many of them, isolated solutions have been presented, and for some of them (like ontologies and text) integration has been achieved already; a complete harmonic integration, though, is still lacking as of today. In our talk, we detail our vision on such integration through query models and languages which merge established concepts and novel paradigms in a harmonic way. We present the EarthServer initiative which has set out to demonstrate flexible ad-hoc processing and filtering on massive Earth data sets.
Secure Skyline Queries on Cloud Platform.
Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian
2017-04-01
Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.
Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng
2013-11-01
The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS - a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.
Querying and Ranking XML Documents.
ERIC Educational Resources Information Center
Schlieder, Torsten; Meuss, Holger
2002-01-01
Discussion of XML, information retrieval, precision, and recall focuses on a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Topics include a query model based on tree matching; structured queries and term-based ranking; and term frequency and…
Data management issues in mobile ad hoc networks
HARA, Takahiro
2017-01-01
Research on mobile ad hoc networks (MANETs) has become a hot research topic since the middle 1990’s. Over the first decade, most research focused on networking techniques, ignoring data management issues. We, however, realized early the importance of data management in MANETs, and have been conducting studies in this area for 15 years. In this review, we summarize some key technical issues related to data management in MANETs, and the studies we have done in addressing these issues, which include placement of data replicas, update management, and query processing with security management. The techniques proposed in our studies have been designed with deep considerations of MANET features including network partitioning, node participation/disappearance, limited network bandwidth, and energy efficiency. Our studies published in early 2000’s have developed a new research field as data management in MANETs. Also, our recent studies are expected to be significant guidelines of new research directions. We conclude the review by discussing some future directions for research. PMID:28496052
Data management issues in mobile ad hoc networks.
Hara, Takahiro
2017-01-01
Research on mobile ad hoc networks (MANETs) has become a hot research topic since the middle 1990's. Over the first decade, most research focused on networking techniques, ignoring data management issues. We, however, realized early the importance of data management in MANETs, and have been conducting studies in this area for 15 years. In this review, we summarize some key technical issues related to data management in MANETs, and the studies we have done in addressing these issues, which include placement of data replicas, update management, and query processing with security management. The techniques proposed in our studies have been designed with deep considerations of MANET features including network partitioning, node participation/disappearance, limited network bandwidth, and energy efficiency. Our studies published in early 2000's have developed a new research field as data management in MANETs. Also, our recent studies are expected to be significant guidelines of new research directions. We conclude the review by discussing some future directions for research.
Leroy, Gondy; Xu, Jennifer; Chung, Wingyan; Eggers, Shauna; Chen, Hsinchun
2007-01-01
Retrieving sufficient relevant information online is difficult for many people because they use too few keywords to search and search engines do not provide many support tools. To further complicate the search, users often ignore support tools when available. Our goal is to evaluate in a realistic setting when users use support tools and how they perceive these tools. We compared three medical search engines with support tools that require more or less effort from users to form a query and evaluate results. We carried out an end user study with 23 users who were asked to find information, i.e., subtopics and supporting abstracts, for a given theme. We used a balanced within-subjects design and report on the effectiveness, efficiency and usability of the support tools from the end user perspective. We found significant differences in efficiency but did not find significant differences in effectiveness between the three search engines. Dynamic user support tools requiring less effort led to higher efficiency. Fewer searches were needed and more documents were found per search when both query reformulation and result review tools dynamically adjust to the user query. The query reformulation tool that provided a long list of keywords, dynamically adjusted to the user query, was used most often and led to more subtopics. As hypothesized, the dynamic result review tools were used more often and led to more subtopics than static ones. These results were corroborated by the usability questionnaires, which showed that support tools that dynamically optimize output were preferred.
Supporting ontology-based keyword search over medical databases.
Kementsietsidis, Anastasios; Lim, Lipyeow; Wang, Min
2008-11-06
The proliferation of medical terms poses a number of challenges in the sharing of medical information among different stakeholders. Ontologies are commonly used to establish relationships between different terms, yet their role in querying has not been investigated in detail. In this paper, we study the problem of supporting ontology-based keyword search queries on a database of electronic medical records. We present several approaches to support this type of queries, study the advantages and limitations of each approach, and summarize the lessons learned as best practices.
Processing SPARQL queries with regular expressions in RDF databases
2011-01-01
Background As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users’ requests for extracting information from the RDF data as well as the lack of users’ knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. Results In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns. PMID:21489225
Processing SPARQL queries with regular expressions in RDF databases.
Lee, Jinsoo; Pham, Minh-Duc; Lee, Jihwan; Han, Wook-Shin; Cho, Hune; Yu, Hwanjo; Lee, Jeong-Hoon
2011-03-29
As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users' requests for extracting information from the RDF data as well as the lack of users' knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.
Graphical modeling and query language for hospitals.
Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris
2013-01-01
So far there has been little evidence that implementation of the health information technologies (HIT) is leading to health care cost savings. One of the reasons for this lack of impact by the HIT likely lies in the complexity of the business process ownership in the hospitals. The goal of our research is to develop a business model-based method for hospital use which would allow doctors to retrieve directly the ad-hoc information from various hospital databases. We have developed a special domain-specific process modelling language called the MedMod. Formally, we define the MedMod language as a profile on UML Class diagrams, but we also demonstrate it on examples, where we explain the semantics of all its elements informally. Moreover, we have developed the Process Query Language (PQL) that is based on MedMod process definition language. The purpose of PQL is to allow a doctor querying (filtering) runtime data of hospital's processes described using MedMod. The MedMod language tries to overcome deficiencies in existing process modeling languages, allowing to specify the loosely-defined sequence of the steps to be performed in the clinical process. The main advantages of PQL are in two main areas - usability and efficiency. They are: 1) the view on data through "glasses" of familiar process, 2) the simple and easy-to-perceive means of setting filtering conditions require no more expertise than using spreadsheet applications, 3) the dynamic response to each step in construction of the complete query that shortens the learning curve greatly and reduces the error rate, and 4) the selected means of filtering and data retrieving allows to execute queries in O(n) time regarding the size of the dataset. We are about to continue developing this project with three further steps. First, we are planning to develop user-friendly graphical editors for the MedMod process modeling and query languages. The second step is to do evaluation of usability the proposed language and tool involving the physicians from several hospitals in Latvia and working with real data from these hospitals. Our third step is to develop an efficient implementation of the query language.
Occam's razor: supporting visual query expression for content-based image queries
NASA Astrophysics Data System (ADS)
Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.
2005-01-01
This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).
Occam"s razor: supporting visual query expression for content-based image queries
NASA Astrophysics Data System (ADS)
Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.
2004-12-01
This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).
RDF-GL: A SPARQL-Based Graphical Query Language for RDF
NASA Astrophysics Data System (ADS)
Hogenboom, Frederik; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay
This chapter presents RDF-GL, a graphical query language (GQL) for RDF. The GQL is based on the textual query language SPARQL and mainly focuses on SPARQL SELECT queries. The advantage of a GQL over textual query languages is that complexity is hidden through the use of graphical symbols. RDF-GL is supported by a Java-based editor, SPARQLinG, which is presented as well. The editor does not only allow for RDF-GL query creation, but also converts RDF-GL queries to SPARQL queries and is able to subsequently execute these. Experiments show that using the GQL in combination with the editor makes RDF querying more accessible for end users.
Secure Skyline Queries on Cloud Platform
Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian
2017-01-01
Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions. PMID:28883710
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce
Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng
2016-01-01
The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS – a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing. PMID:27617325
EarthServer: Visualisation and use of uncertainty as a data exploration tool
NASA Astrophysics Data System (ADS)
Walker, Peter; Clements, Oliver; Grant, Mike
2013-04-01
The Ocean Science/Earth Observation community generates huge datasets from satellite observation. Until recently it has been difficult to obtain matching uncertainty information for these datasets and to apply this to their processing. In order to make use of uncertainty information when analysing "Big Data" we need both the uncertainty itself (attached to the underlying data) and a means of working with the combined product without requiring the entire dataset to be downloaded. The European Commission FP7 project EarthServer (http://earthserver.eu) is addressing the problem of accessing and ad-hoc analysis of extreme-size Earth Science data using cutting-edge Array Database technology. The core software (Rasdaman) and web services wrapper (Petascope) allow huge datasets to be accessed using Open Geospatial Consortium (OGC) standard interfaces including the well established standards, Web Coverage Service (WCS) and Web Map Service (WMS) as well as the emerging standard, Web Coverage Processing Service (WCPS). The WCPS standard allows the running of ad-hoc queries on any of the data stored within Rasdaman, creating an infrastructure where users are not restricted by bandwidth when manipulating or querying huge datasets. The ESA Ocean Colour - Climate Change Initiative (OC-CCI) project (http://www.esa-oceancolour-cci.org/), is producing high-resolution, global ocean colour datasets over the full time period (1998-2012) where high quality observations were available. This climate data record includes per-pixel uncertainty data for each variable, based on an analytic method that classifies how much and which types of water are present in a pixel, and assigns uncertainty based on robust comparisons to global in-situ validation datasets. These uncertainty values take two forms, Root Mean Square (RMS) and Bias uncertainty, respectively representing the expected variability and expected offset error. By combining the data produced through the OC-CCI project with the software from the EarthServer project we can produce a novel data offering that allows the use of traditional exploration and access mechanisms such as WMS and WCS. However the real benefits can be seen when utilising WCPS to explore the data . We will show two major benefits to this infrastructure. Firstly we will show that the visualisation of the combined chlorophyll and uncertainty datasets through a web based GIS portal gives users the ability to instantaneously assess the quality of the data they are exploring using traditional web based plotting techniques as well as through novel web based 3 dimensional visualisation. Secondly we will showcase the benefits available when combining these data with the WCPS standard. The uncertainty data can be utilised in queries using the standard WCPS query language. This allows selection of data either for download or use within the query, based on the respective uncertainty values as well as the possibility of incorporating both the chlorophyll data and uncertainty data into complex queries to produce additional novel data products. By filtering with uncertainty at the data source rather than the client we can minimise traffic over the network allowing huge datasets to be worked on with a minimal time penalty.
Intervening to Reduce Suicide Risk in Veterans with Substance Use Disorders
2015-01-01
Other Support for Dr. Valenstein: 1. Effort ended on NIH/ NIA P01 AG031098 (PI – Cutler). 2. Effort ended on DoD W81XWH-11-2-0059 (PI – Hunt). 3...Effort started on VA QUERI RRP 12-511 (PI – Zivin). 19. Effort started on VA QUERI RRP 12- 505 (PI – Pfeiffer). OTHER SUPPORT VALENSTEIN...Pfeiffer, P.) 11/01/13 – 02/28/15 1.2 calendar VA MH-QUERI RRP 12- 505 $90,942 Technology-assisted peer support for recently hospitalized
StarView: The object oriented design of the ST DADS user interface
NASA Technical Reports Server (NTRS)
Williams, J. D.; Pollizzi, J. A.
1992-01-01
StarView is the user interface being developed for the Hubble Space Telescope Data Archive and Distribution Service (ST DADS). ST DADS is the data archive for HST observations and a relational database catalog describing the archived data. Users will use StarView to query the catalog and select appropriate datasets for study. StarView sends requests for archived datasets to ST DADS which processes the requests and returns the database to the user. StarView is designed to be a powerful and extensible user interface. Unique features include an internal relational database to navigate query results, a form definition language that will work with both CRT and X interfaces, a data definition language that will allow StarView to work with any relational database, and the ability to generate adhoc queries without requiring the user to understand the structure of the ST DADS catalog. Ultimately, StarView will allow the user to refine queries in the local database for improved performance and merge in data from external sources for correlation with other query results. The user will be able to create a query from single or multiple forms, merging the selected attributes into a single query. Arbitrary selection of attributes for querying is supported. The user will be able to select how query results are viewed. A standard form or table-row format may be used. Navigation capabilities are provided to aid the user in viewing query results. Object oriented analysis and design techniques were used in the design of StarView to support the mechanisms and concepts required to implement these features. One such mechanism is the Model-View-Controller (MVC) paradigm. The MVC allows the user to have multiple views of the underlying database, while providing a consistent mechanism for interaction regardless of the view. This approach supports both CRT and X interfaces while providing a common mode of user interaction. Another powerful abstraction is the concept of a Query Model. This concept allows a single query to be built form a single or multiple forms before it is submitted to ST DADS. Supporting this concept is the adhoc query generator which allows the user to select and qualify an indeterminate number attributes from the database. The user does not need any knowledge of how the joins across various tables are to be resolved. The adhoc generator calculates the joins automatically and generates the correct SQL query.
ERIC Educational Resources Information Center
Belkin, N. J.; Cool, C.; Kelly, D.; Lin, S. -J.; Park, S. Y.; Perez-Carballo, J.; Sikora, C.
2001-01-01
Reports on the progressive investigation of techniques for supporting interactive query reformulation in the TREC (Text Retrieval Conference) Interactive Track. Highlights include methods of term suggestion; interface design to support different system functionalities; an overview of each year's TREC investigation; and relevance to the development…
Woo, Hyekyung; Cho, Youngtae; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan
2016-07-04
As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data.
Woo, Hyekyung; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan
2016-01-01
Background As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. Objective In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Methods Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. Results In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). Conclusions These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data. PMID:27377323
NASA Technical Reports Server (NTRS)
Aspinall, David; Denney, Ewen; Lueth, Christoph
2012-01-01
We motivate and introduce a query language PrQL designed for inspecting machine representations of proofs. PrQL natively supports hiproofs which express proof structure using hierarchical nested labelled trees. The core language presented in this paper is locally structured (first-order), with queries built using recursion and patterns over proof structure and rule names. We define the syntax and semantics of locally structured queries, demonstrate their power, and sketch some implementation experiments.
Improved data retrieval from TreeBASE via taxonomic and linguistic data enrichment
Anwar, Nadia; Hunt, Ela
2009-01-01
Background TreeBASE, the only data repository for phylogenetic studies, is not being used effectively since it does not meet the taxonomic data retrieval requirements of the systematics community. We show, through an examination of the queries performed on TreeBASE, that data retrieval using taxon names is unsatisfactory. Results We report on a new wrapper supporting taxon queries on TreeBASE by utilising a Taxonomy and Classification Database (TCl-Db) we created. TCl-Db holds merged and consolidated taxonomic names from multiple data sources and can be used to translate hierarchical, vernacular and synonym queries into specific query terms in TreeBASE. The query expansion supported by TCl-Db shows very significant information retrieval quality improvement. The wrapper can be accessed at the URL The methodology we developed is scalable and can be applied to new data, as those become available in the future. Conclusion Significantly improved data retrieval quality is shown for all queries, and additional flexibility is achieved via user-driven taxonomy selection. PMID:19426482
SPARQL Assist language-neutral query composer
2012-01-01
Background SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. Results We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. Conclusions To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources. PMID:22373327
SPARQL assist language-neutral query composer.
McCarthy, Luke; Vandervalk, Ben; Wilkinson, Mark
2012-01-25
SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources.
Information Network Model Query Processing
NASA Astrophysics Data System (ADS)
Song, Xiaopu
Information Networking Model (INM) [31] is a novel database model for real world objects and relationships management. It naturally and directly supports various kinds of static and dynamic relationships between objects. In INM, objects are networked through various natural and complex relationships. INM Query Language (INM-QL) [30] is designed to explore such information network, retrieve information about schema, instance, their attributes, relationships, and context-dependent information, and process query results in the user specified form. INM database management system has been implemented using Berkeley DB, and it supports INM-QL. This thesis is mainly focused on the implementation of the subsystem that is able to effectively and efficiently process INM-QL. The subsystem provides a lexical and syntactical analyzer of INM-QL, and it is able to choose appropriate evaluation strategies and index mechanism to process queries in INM-QL without the user's intervention. It also uses intermediate result structure to hold intermediate query result and other helping structures to reduce complexity of query processing.
Simulation Data as Data Streams
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abdulla, G; Arrighi, W; Critchlow, T
2003-11-18
Computational or scientific simulations are increasingly being applied to solve a variety of scientific problems. Domains such as astrophysics, engineering, chemistry, biology, and environmental studies are benefiting from this important capability. Simulations, however, produce enormous amounts of data that need to be analyzed and understood. In this overview paper, we describe scientific simulation data, its characteristics, and the way scientists generate and use the data. We then compare and contrast simulation data to data streams. Finally, we describe our approach to analyzing simulation data, present the AQSim (Ad-hoc Queries for Simulation data) system, and discuss some of the challenges thatmore » result from handling this kind of data.« less
Operational Monitoring of GOME-2 and IASI Level 1 Product Processing at EUMETSAT
NASA Astrophysics Data System (ADS)
Livschitz, Yakov; Munro, Rosemary; Lang, Rüdiger; Fiedler, Lars; Dyer, Richard; Eisinger, Michael
2010-05-01
The growing complexity of operational level 1 radiance products from Low Earth Orbiting (LEO) platforms like EUMETSATs Metop series makes near-real-time monitoring of product quality a challenging task. The main challenge is to provide a monitoring system which is flexible and robust enough to identify and to react to anomalies which may be previously unknown to the system, as well as to provide all means and parameters necessary in order to support efficient ad-hoc analysis of the incident. The operational monitoring system developed at EUMETSAT for monitoring of GOME-2 and IASI level 1 data allows to perform near-real-time monitoring of operational products and instrument's health in a robust and flexible fashion. For effective information management, the system is based on a relational database (Oracle). An Extract, Transform, Load (ETL) process transforms products in EUMETSAT Polar System (EPS) format into relational data structures. The identification of commonalities between products and instruments allows for a database structure design in such a way that different data can be analyzed using the same business intelligence functionality. An interactive analysis software implementing modern data mining techniques is also provided for a detailed look into the data. The system is effectively used for day-to-day monitoring, long-term reporting, instrument's degradation analysis as well as for ad-hoc queries in case of an unexpected instrument or processing behaviour. Having data from different sources on a single instrument and even from different instruments, platforms or numerical weather prediction within the same database allows effective cross-comparison and looking for correlated parameters. Automatic alarms raised by checking for deviation of certain parameters, for data losses and other events significantly reduce time, necessary to monitor the processing on a day-to-day basis.
Operational Monitoring of GOME-2 and IASI Level 1 Product Processing at EUMETSAT
NASA Astrophysics Data System (ADS)
Livschitz, Y.; Munro, R.; Lang, R.; Fiedler, L.; Dyer, R.; Eisinger, M.
2009-12-01
The growing complexity of operational level 1 radiance products from Low Earth Orbiting (LEO) platforms like EUMETSATs Metop series makes near-real-time monitoring of product quality a challenging task. The main challenge is to provide a monitoring system which is flexible and robust enough to identify and to react to anomalies which may be previously unknown to the system, as well as to provide all means and parameters necessary in order to support efficient ad-hoc analysis of the incident. The operational monitoring system developed at EUMETSAT for monitoring of GOME-2 and IASI level 1 data allows to perform near-real-time monitoring of operational products and instrument’s health in a robust and flexible fashion. For effective information management, the system is based on a relational database (Oracle). An Extract, Transform, Load (ETL) process transforms products in EUMETSAT Polar System (EPS) format into relational data structures. The identification of commonalities between products and instruments allows for a database structure design in such a way that different data can be analyzed using the same business intelligence functionality. An interactive analysis software implementing modern data mining techniques is also provided for a detailed look into the data. The system is effectively used for day-to-day monitoring, long-term reporting, instrument’s degradation analysis as well as for ad-hoc queries in case of an unexpected instrument or processing behaviour. Having data from different sources on a single instrument and even from different instruments, platforms or numerical weather prediction within the same database allows effective cross-comparison and looking for correlated parameters. Automatic alarms raised by checking for deviation of certain parameters, for data losses and other events significantly reduce time, necessary to monitor the processing on a day-to-day basis.
A new XML-based query language, CSRML, has been developed for representing chemical substructures, molecules, reaction rules, and reactions. CSRML queries are capable of integrating additional forms of information beyond the simple substructure (e.g., SMARTS) or reaction transfor...
WORDGRAPH: Keyword-in-Context Visualization for NETSPEAK's Wildcard Search.
Riehmann, Patrick; Gruendl, Henning; Potthast, Martin; Trenkmann, Martin; Stein, Benno; Froehlich, Benno
2012-09-01
The WORDGRAPH helps writers in visually choosing phrases while writing a text. It checks for the commonness of phrases and allows for the retrieval of alternatives by means of wildcard queries. To support such queries, we implement a scalable retrieval engine, which returns high-quality results within milliseconds using a probabilistic retrieval strategy. The results are displayed as WORDGRAPH visualization or as a textual list. The graphical interface provides an effective means for interactive exploration of search results using filter techniques, query expansion, and navigation. Our observations indicate that, of three investigated retrieval tasks, the textual interface is sufficient for the phrase verification task, wherein both interfaces support context-sensitive word choice, and the WORDGRAPH best supports the exploration of a phrase's context or the underlying corpus. Our user study confirms these observations and shows that WORDGRAPH is generally the preferred interface over the textual result list for queries containing multiple wildcards.
Content-aware network storage system supporting metadata retrieval
NASA Astrophysics Data System (ADS)
Liu, Ke; Qin, Leihua; Zhou, Jingli; Nie, Xuejun
2008-12-01
Nowadays, content-based network storage has become the hot research spot of academy and corporation[1]. In order to solve the problem of hit rate decline causing by migration and achieve the content-based query, we exploit a new content-aware storage system which supports metadata retrieval to improve the query performance. Firstly, we extend the SCSI command descriptor block to enable system understand those self-defined query requests. Secondly, the extracted metadata is encoded by extensible markup language to improve the universality. Thirdly, according to the demand of information lifecycle management (ILM), we store those data in different storage level and use corresponding query strategy to retrieval them. Fourthly, as the file content identifier plays an important role in locating data and calculating block correlation, we use it to fetch files and sort query results through friendly user interface. Finally, the experiments indicate that the retrieval strategy and sort algorithm have enhanced the retrieval efficiency and precision.
Xiao, Fuyuan; Aritsugi, Masayoshi; Wang, Qing; Zhang, Rong
2016-09-01
For efficient and sophisticated analysis of complex event patterns that appear in streams of big data from health care information systems and support for decision-making, a triaxial hierarchical model is proposed in this paper. Our triaxial hierarchical model is developed by focusing on hierarchies among nested event pattern queries with an event concept hierarchy, thereby allowing us to identify the relationships among the expressions and sub-expressions of the queries extensively. We devise a cost-based heuristic by means of the triaxial hierarchical model to find an optimised query execution plan in terms of the costs of both the operators and the communications between them. According to the triaxial hierarchical model, we can also calculate how to reuse the results of the common sub-expressions in multiple queries. By integrating the optimised query execution plan with the reuse schemes, a multi-query optimisation strategy is developed to accomplish efficient processing of multiple nested event pattern queries. We present empirical studies in which the performance of multi-query optimisation strategy was examined under various stream input rates and workloads. Specifically, the workloads of pattern queries can be used for supporting monitoring patients' conditions. On the other hand, experiments with varying input rates of streams can correspond to changes of the numbers of patients that a system should manage, whereas burst input rates can correspond to changes of rushes of patients to be taken care of. The experimental results have shown that, in Workload 1, our proposal can improve about 4 and 2 times throughput comparing with the relative works, respectively; in Workload 2, our proposal can improve about 3 and 2 times throughput comparing with the relative works, respectively; in Workload 3, our proposal can improve about 6 times throughput comparing with the relative work. The experimental results demonstrated that our proposal was able to process complex queries efficiently which can support health information systems and further decision-making. Copyright © 2016 Elsevier B.V. All rights reserved.
Federated ontology-based queries over cancer data
2012-01-01
Background Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. In particular, this is valid in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast amounts of data, however, coupled with the use of different terms - or semantic heterogeneity - in each discipline makes the retrieval and integration of information difficult. Results Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality is given at the structural metadata level, without capitalising on the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures. A graphical user interface has been developed, supporting ontology-based queries over caGrid data sources. An extensive evaluation of the query reformulation technique is included. Conclusions To support personalised medicine in oncology, it is crucial to retrieve and integrate molecular, pathology, radiology and clinical data in an efficient manner. The semantic heterogeneity of the data makes this a challenging task. Ontologies provide a formal framework to support querying and integration. This paper provides an ontology-based solution for querying distributed databases over service-oriented, model-driven infrastructures. PMID:22373043
NASA Astrophysics Data System (ADS)
Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.
2015-07-01
Existing spatiotemporal database supports spatiotemporal aggregation query over massive moving objects datasets. Due to the large amounts of data and single-thread processing method, the query speed cannot meet the application requirements. On the other hand, the query efficiency is more sensitive to spatial variation then temporal variation. In this paper, we proposed a spatiotemporal aggregation query method using multi-thread parallel technique based on regional divison and implemented it on the server. Concretely, we divided the spatiotemporal domain into several spatiotemporal cubes, computed spatiotemporal aggregation on all cubes using the technique of multi-thread parallel processing, and then integrated the query results. By testing and analyzing on the real datasets, this method has improved the query speed significantly.
ERIC Educational Resources Information Center
Clement, T.; Brown, J.; Morrison, J.; Nestel, D.
2016-01-01
General practice registrars in Australia undertake most of their vocational training in accredited general practices. They typically see patients alone from the start of their community-based training and are expected to seek timely ad hoc support from their supervisor. Such ad hoc encounters are a mechanism for ensuring patient safety, but also…
Hybrid Packet-Pheromone-Based Probabilistic Routing for Mobile Ad Hoc Networks
NASA Astrophysics Data System (ADS)
Kashkouli Nejad, Keyvan; Shawish, Ahmed; Jiang, Xiaohong; Horiguchi, Susumu
Ad-Hoc networks are collections of mobile nodes communicating using wireless media without any fixed infrastructure. Minimal configuration and quick deployment make Ad-Hoc networks suitable for emergency situations like natural disasters or military conflicts. The current Ad-Hoc networks can only support either high mobility or high transmission rate at a time because they employ static approaches in their routing schemes. However, due to the continuous expansion of the Ad-Hoc network size, node-mobility and transmission rate, the development of new adaptive and dynamic routing schemes has become crucial. In this paper we propose a new routing scheme to support high transmission rates and high node-mobility simultaneously in a big Ad-Hoc network, by combining a new proposed packet-pheromone-based approach with the Hint Based Probabilistic Protocol (HBPP) for congestion avoidance with dynamic path selection in packet forwarding process. Because of using the available feedback information, the proposed algorithm does not introduce any additional overhead. The extensive simulation-based analysis conducted in this paper indicates that the proposed algorithm offers small packet-latency and achieves a significantly higher delivery probability in comparison with the available Hint-Based Probabilistic Protocol (HBPP).
ERIC Educational Resources Information Center
Lynch, Clifford A.
1991-01-01
Describes several aspects of the problem of supporting information retrieval system query requirements in the relational database management system (RDBMS) environment and proposes an extension to query processing called nonmaterialized relations. User interactions with information retrieval systems are discussed, and nonmaterialized relations are…
How Do Children Reformulate Their Search Queries?
ERIC Educational Resources Information Center
Rutter, Sophie; Ford, Nigel; Clough, Paul
2015-01-01
Introduction: This paper investigates techniques used by children in year 4 (age eight to nine) of a UK primary school to reformulate their queries, and how they use information retrieval systems to support query reformulation. Method: An in-depth study analysing the interactions of twelve children carrying out search tasks in a primary school…
An Experimental Investigation of Complexity in Database Query Formulation Tasks
ERIC Educational Resources Information Center
Casterella, Gretchen Irwin; Vijayasarathy, Leo
2013-01-01
Information Technology professionals and other knowledge workers rely on their ability to extract data from organizational databases to respond to business questions and support decision making. Structured query language (SQL) is the standard programming language for querying data in relational databases, and SQL skills are in high demand and are…
Blind Seer: A Scalable Private DBMS
2014-05-01
searchable index terms per DB row, in time comparable to (insecure) MySQL (many practical queries can be privately executed with work 1.2-3 times slower...than MySQL , although some queries are costlier). We support a rich query set, including searching on arbitrary boolean formulas on keywords and ranges...index terms per DB row, in time comparable to (insecure) MySQL (many practical queries can be privately executed with work 1.2-3 times slower than MySQL
Debruyne, Philip R; Johnson, Philip J; Pottel, Lies; Daniels, Susanna; Greer, Rachel; Hodgkinson, Elizabeth; Kelly, Stephen; Lycke, Michelle; Samol, Jens; Mason, Julie; Kimber, Donna; Loucaides, Eileen; Parmar, Mahesh Kb; Harvey, Sally
2015-06-01
Clarity and accuracy of the pharmacy aspects of cancer clinical trial protocols is essential. Inconsistencies and ambiguities in such protocols have the potential to delay research and jeopardise both patient safety and collection of credible data. The Chemotherapy and Pharmacy Advisory Service was established by the UK National Cancer Research Network, currently known as National Institute for Health Research Clinical Research Network, to improve the quality of pharmacy-related content in cancer clinical research protocols. This article reports the scope of Chemotherapy and Pharmacy Advisory Service, its methodology of mandated protocol review and pharmacy-related guidance initiatives and its current impact. Over a 6-year period (2008-2013) since the inception of Chemotherapy and Pharmacy Advisory Service, cancer clinical trial protocols were reviewed by the service, prior to implementation at clinical trial sites. A customised Review Checklist was developed and used by a panel of experts to standardise the review process and report back queries and inconsistencies to chief investigators. Based on common queries, a Standard Protocol Template comprising specific guidance on drug-related content and a Pharmacy Manual Template were developed. In addition, a guidance framework was established to address 'ad hoc' pharmacy-related queries. The most common remarks made at protocol review have been summarised and categorised through retrospective analysis. In order to evaluate the impact of the service, chief investigators were asked to respond to queries made at protocol review and make appropriate changes to their protocols. Responses from chief investigators have been collated and acceptance rates determined. A total of 176 protocols were reviewed. The median number of remarks per protocol was 26, of which 20 were deemed clinically relevant and mainly concerned the drug regimen, support medication, frequency and type of monitoring and drug supply aspects. Further analysis revealed that 62% of chief investigators responded to the review. All responses were positive with an overall acceptance rate of 89% of the proposed protocol changes. Review of pharmacy content of cancer clinical trial protocols is feasible and exposes many undetected clinically relevant issues that could hinder efficient trial conduct. Our service audit revealed that the majority of suggestions were effectively incorporated in the final protocols. The refinement of existing and development of new pharmacy-related guidance documents by Chemotherapy and Pharmacy Advisory Service might aid in better and safer clinical research. © The Author(s) 2015.
SADI, SHARE, and the in silico scientific method
2010-01-01
Background The emergence and uptake of Semantic Web technologies by the Life Sciences provides exciting opportunities for exploring novel ways to conduct in silico science. Web Service Workflows are already becoming first-class objects in “the new way”, and serve as explicit, shareable, referenceable representations of how an experiment was done. In turn, Semantic Web Service projects aim to facilitate workflow construction by biological domain-experts such that workflows can be edited, re-purposed, and re-published by non-informaticians. However the aspects of the scientific method relating to explicit discourse, disagreement, and hypothesis generation have remained relatively impervious to new technologies. Results Here we present SADI and SHARE - a novel Semantic Web Service framework, and a reference implementation of its client libraries. Together, SADI and SHARE allow the semi- or fully-automatic discovery and pipelining of Semantic Web Services in response to ad hoc user queries. Conclusions The semantic behaviours exhibited by SADI and SHARE extend the functionalities provided by Description Logic Reasoners such that novel assertions can be automatically added to a data-set without logical reasoning, but rather by analytical or annotative services. This behaviour might be applied to achieve the “semantification” of those aspects of the in silico scientific method that are not yet supported by Semantic Web technologies. We support this suggestion using an example in the clinical research space. PMID:21210986
Flexible Decision Support in Device-Saturated Environments
2003-10-01
also output tuples to a remote MySQL or Postgres database. 3.3 GUI The GUI allows the user to pose queries using SQL and to display query...DatabaseConnection.java – handles connections to an external database (such as MySQL or Postgres ). • Debug.java – contains the code for printing out Debug messages...also provided. It is possible to output the results of queries to a MySQL or Postgres database for archival and the GUI can query those results
Trocky, NM; Fontinha, M
2005-01-01
Data collected throughout the course of a clinical research trial must be reviewed for accuracy and completeness continually. The Oracle Clinical® (OC) data management application utilized to capture clinical data facilitates data integrity through pre-programmed validations, edit and range checks, and discrepancy management modules. These functions were not enough. Coupled with the use of specially created reports in Oracle Discoverer® and Integrated Review TM, both ad-hoc query and reporting tools, research staff have enhanced their ability to clean, analyze and report more accurate data captured within and among Case Report Forms (eCRFs) by individual study or across multiple studies. PMID:16779428
Using the Weighted Keyword Model to Improve Information Retrieval for Answering Biomedical Questions
Yu, Hong; Cao, Yong-gang
2009-01-01
Physicians ask many complex questions during the patient encounter. Information retrieval systems that can provide immediate and relevant answers to these questions can be invaluable aids to the practice of evidence-based medicine. In this study, we first automatically identify topic keywords from ad hoc clinical questions with a Condition Random Field model that is trained over thousands of manually annotated clinical questions. We then report on a linear model that assigns query weights based on their automatically identified semantic roles: topic keywords, domain specific terms, and their synonyms. Our evaluation shows that this weighted keyword model improves information retrieval from the Text Retrieval Conference Genomics track data. PMID:21347188
Yu, Hong; Cao, Yong-Gang
2009-03-01
Physicians ask many complex questions during the patient encounter. Information retrieval systems that can provide immediate and relevant answers to these questions can be invaluable aids to the practice of evidence-based medicine. In this study, we first automatically identify topic keywords from ad hoc clinical questions with a Condition Random Field model that is trained over thousands of manually annotated clinical questions. We then report on a linear model that assigns query weights based on their automatically identified semantic roles: topic keywords, domain specific terms, and their synonyms. Our evaluation shows that this weighted keyword model improves information retrieval from the Text Retrieval Conference Genomics track data.
PAQ: Persistent Adaptive Query Middleware for Dynamic Environments
NASA Astrophysics Data System (ADS)
Rajamani, Vasanth; Julien, Christine; Payton, Jamie; Roman, Gruia-Catalin
Pervasive computing applications often entail continuous monitoring tasks, issuing persistent queries that return continuously updated views of the operational environment. We present PAQ, a middleware that supports applications' needs by approximating a persistent query as a sequence of one-time queries. PAQ introduces an integration strategy abstraction that allows composition of one-time query responses into streams representing sophisticated spatio-temporal phenomena of interest. A distinguishing feature of our middleware is the realization that the suitability of a persistent query's result is a function of the application's tolerance for accuracy weighed against the associated overhead costs. In PAQ, programmers can specify an inquiry strategy that dictates how information is gathered. Since network dynamics impact the suitability of a particular inquiry strategy, PAQ associates an introspection strategy with a persistent query, that evaluates the quality of the query's results. The result of introspection can trigger application-defined adaptation strategies that alter the nature of the query. PAQ's simple API makes developing adaptive querying systems easily realizable. We present the key abstractions, describe their implementations, and demonstrate the middleware's usefulness through application examples and evaluation.
RCQ-GA: RDF Chain Query Optimization Using Genetic Algorithms
NASA Astrophysics Data System (ADS)
Hogenboom, Alexander; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay
The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools. Fast query engines are needed for efficient querying of large amounts of data, usually represented using RDF. We focus on optimizing a special class of SPARQL queries, the so-called RDF chain queries. For this purpose, we devise a genetic algorithm called RCQ-GA that determines the order in which joins need to be performed for an efficient evaluation of RDF chain queries. The approach is benchmarked against a two-phase optimization algorithm, previously proposed in literature. The more complex a query is, the more RCQ-GA outperforms the benchmark in solution quality, execution time needed, and consistency of solution quality. When the algorithms are constrained by a time limit, the overall performance of RCQ-GA compared to the benchmark further improves.
An advanced web query interface for biological databases
Latendresse, Mario; Karp, Peter D.
2010-01-01
Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715
Measuring Up: Implementing a Dental Quality Measure in the Electronic Health Record Context
Bhardwaj, Aarti; Ramoni, Rachel; Kalenderian, Elsbeth; Neumann, Ana; Hebballi, Nutan B; White, Joel M; McClellan, Lyle; Walji, Muhammad F
2015-01-01
Background Quality improvement requires quality measures that are validly implementable. In this work, we assessed the feasibility and performance of an automated electronic Meaningful Use dental clinical quality measure (percentage of children who received fluoride varnish). Methods We defined how to implement the automated measure queries in a dental electronic health record (EHR). Within records identified through automated query, we manually reviewed a subsample to assess the performance of the query. Results The automated query found 71.0% of patients to have had fluoride varnish compared to 77.6% found using the manual chart review. The automated quality measure performance was 90.5% sensitivity, 90.8% specificity, 96.9% positive predictive value, and 75.2% negative predictive value. Conclusions Our findings support the feasibility of automated dental quality measure queries in the context of sufficient structured data. Information noted only in the free text rather than in structured data would require natural language processing approaches to effectively query. Practical Implications To participate in self-directed quality improvement, dental clinicians must embrace the accountability era. Commitment to quality will require enhanced documentation in order to support near-term automated calculation of quality measures. PMID:26562736
An adaptable architecture for patient cohort identification from diverse data sources.
Bache, Richard; Miles, Simon; Taweel, Adel
2013-12-01
We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. The architecture has the key feature that queries defined according to the query model are both pre and post-processed and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a data-based management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity.
A similarity-based data warehousing environment for medical images.
Teixeira, Jefferson William; Annibal, Luana Peixoto; Felipe, Joaquim Cezar; Ciferri, Ricardo Rodrigues; Ciferri, Cristina Dutra de Aguiar
2015-11-01
A core issue of the decision-making process in the medical field is to support the execution of analytical (OLAP) similarity queries over images in data warehousing environments. In this paper, we focus on this issue. We propose imageDWE, a non-conventional data warehousing environment that enables the storage of intrinsic features taken from medical images in a data warehouse and supports OLAP similarity queries over them. To comply with this goal, we introduce the concept of perceptual layer, which is an abstraction used to represent an image dataset according to a given feature descriptor in order to enable similarity search. Based on this concept, we propose the imageDW, an extended data warehouse with dimension tables specifically designed to support one or more perceptual layers. We also detail how to build an imageDW and how to load image data into it. Furthermore, we show how to process OLAP similarity queries composed of a conventional predicate and a similarity search predicate that encompasses the specification of one or more perceptual layers. Moreover, we introduce an index technique to improve the OLAP query processing over images. We carried out performance tests over a data warehouse environment that consolidated medical images from exams of several modalities. The results demonstrated the feasibility and efficiency of our proposed imageDWE to manage images and to process OLAP similarity queries. The results also demonstrated that the use of the proposed index technique guaranteed a great improvement in query processing. Copyright © 2015 Elsevier Ltd. All rights reserved.
A Visual Interface for Querying Heterogeneous Phylogenetic Databases.
Jamil, Hasan M
2017-01-01
Despite the recent growth in the number of phylogenetic databases, access to these wealth of resources remain largely tool or form-based interface driven. It is our thesis that the flexibility afforded by declarative query languages may offer the opportunity to access these repositories in a better way, and to use such a language to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way, and is used as the storage model for our PhyloBase database for which PhyQL serves as the query language. We have implemented a visual interface for the end users to pose PhyQL queries using visual icons, and drag and drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in PhyQL buffer allows secondary querying on the computed results making it a truly powerful querying architecture.
A Secure and Privacy-Preserving Navigation Scheme Using Spatial Crowdsourcing in Fog-Based VANETs
Wang, Lingling; Liu, Guozhu; Sun, Lijun
2017-01-01
Fog-based VANETs (Vehicular ad hoc networks) is a new paradigm of vehicular ad hoc networks with the advantages of both vehicular cloud and fog computing. Real-time navigation schemes based on fog-based VANETs can promote the scheme performance efficiently. In this paper, we propose a secure and privacy-preserving navigation scheme by using vehicular spatial crowdsourcing based on fog-based VANETs. Fog nodes are used to generate and release the crowdsourcing tasks, and cooperatively find the optimal route according to the real-time traffic information collected by vehicles in their coverage areas. Meanwhile, the vehicle performing the crowdsourcing task can get a reasonable reward. The querying vehicle can retrieve the navigation results from each fog node successively when entering its coverage area, and follow the optimal route to the next fog node until it reaches the desired destination. Our scheme fulfills the security and privacy requirements of authentication, confidentiality and conditional privacy preservation. Some cryptographic primitives, including the Elgamal encryption algorithm, AES, randomized anonymous credentials and group signatures, are adopted to achieve this goal. Finally, we analyze the security and the efficiency of the proposed scheme. PMID:28338620
A Secure and Privacy-Preserving Navigation Scheme Using Spatial Crowdsourcing in Fog-Based VANETs.
Wang, Lingling; Liu, Guozhu; Sun, Lijun
2017-03-24
Fog-based VANETs (Vehicular ad hoc networks) is a new paradigm of vehicular ad hoc networks with the advantages of both vehicular cloud and fog computing. Real-time navigation schemes based on fog-based VANETs can promote the scheme performance efficiently. In this paper, we propose a secure and privacy-preserving navigation scheme by using vehicular spatial crowdsourcing based on fog-based VANETs. Fog nodes are used to generate and release the crowdsourcing tasks, and cooperatively find the optimal route according to the real-time traffic information collected by vehicles in their coverage areas. Meanwhile, the vehicle performing the crowdsourcing task can get a reasonable reward. The querying vehicle can retrieve the navigation results from each fog node successively when entering its coverage area, and follow the optimal route to the next fog node until it reaches the desired destination. Our scheme fulfills the security and privacy requirements of authentication, confidentiality and conditional privacy preservation. Some cryptographic primitives, including the Elgamal encryption algorithm, AES, randomized anonymous credentials and group signatures, are adopted to achieve this goal. Finally, we analyze the security and the efficiency of the proposed scheme.
Provenance Storage, Querying, and Visualization in PBase
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kianmajd, Parisa; Ludascher, Bertram; Missier, Paolo
2015-01-01
We present PBase, a repository for scientific workflows and their corresponding provenance information that facilitates the sharing of experiments among the scientific community. PBase is interoperable since it uses ProvONE, a standard provenance model for scientific workflows. Workflows and traces are stored in RDF, and with the support of SPARQL and the tree cover encoding, the repository provides a scalable infrastructure for querying the provenance data. Furthermore, through its user interface, it is possible to: visualize workflows and execution traces; visualize reachability relations within these traces; issue SPARQL queries; and visualize query results.
Collaboration-Centred Cities through Urban Apps Based on Open and User-Generated Data
Aguilera, Unai; López-de-Ipiña, Diego; Pérez, Jorge
2016-01-01
This paper describes the IES Cities platform conceived to streamline the development of urban apps that combine heterogeneous datasets provided by diverse entities, namely, government, citizens, sensor infrastructure and other information data sources. This work pursues the challenge of achieving effective citizen collaboration by empowering them to prosume urban data across time. Particularly, this paper focuses on the query mapper; a key component of the IES Cities platform devised to democratize the development of open data-based mobile urban apps. This component allows developers not only to use available data, but also to contribute to existing datasets with the execution of SQL sentences. In addition, the component allows developers to create ad hoc storages for their applications, publishable as new datasets accessible by other consumers. As multiple users could be contributing and using a dataset, our solution also provides a data level permission mechanism to control how the platform manages the access to its datasets. We have evaluated the advantages brought forward by IES Cities from the developers’ perspective by describing an exemplary urban app created on top of it. In addition, we include an evaluation of the main functionalities of the query mapper. PMID:27376300
Collaboration-Centred Cities through Urban Apps Based on Open and User-Generated Data.
Aguilera, Unai; López-de-Ipiña, Diego; Pérez, Jorge
2016-07-01
This paper describes the IES Cities platform conceived to streamline the development of urban apps that combine heterogeneous datasets provided by diverse entities, namely, government, citizens, sensor infrastructure and other information data sources. This work pursues the challenge of achieving effective citizen collaboration by empowering them to prosume urban data across time. Particularly, this paper focuses on the query mapper; a key component of the IES Cities platform devised to democratize the development of open data-based mobile urban apps. This component allows developers not only to use available data, but also to contribute to existing datasets with the execution of SQL sentences. In addition, the component allows developers to create ad hoc storages for their applications, publishable as new datasets accessible by other consumers. As multiple users could be contributing and using a dataset, our solution also provides a data level permission mechanism to control how the platform manages the access to its datasets. We have evaluated the advantages brought forward by IES Cities from the developers' perspective by describing an exemplary urban app created on top of it. In addition, we include an evaluation of the main functionalities of the query mapper.
GEMINI: Integrative Exploration of Genetic Variation and Genome Annotations
Paila, Umadevi; Chapman, Brad A.; Kirchner, Rory; Quinlan, Aaron R.
2013-01-01
Modern DNA sequencing technologies enable geneticists to rapidly identify genetic variation among many human genomes. However, isolating the minority of variants underlying disease remains an important, yet formidable challenge for medical genetics. We have developed GEMINI (GEnome MINIng), a flexible software package for exploring all forms of human genetic variation. Unlike existing tools, GEMINI integrates genetic variation with a diverse and adaptable set of genome annotations (e.g., dbSNP, ENCODE, UCSC, ClinVar, KEGG) into a unified database to facilitate interpretation and data exploration. Whereas other methods provide an inflexible set of variant filters or prioritization methods, GEMINI allows researchers to compose complex queries based on sample genotypes, inheritance patterns, and both pre-installed and custom genome annotations. GEMINI also provides methods for ad hoc queries and data exploration, a simple programming interface for custom analyses that leverage the underlying database, and both command line and graphical tools for common analyses. We demonstrate GEMINI's utility for exploring variation in personal genomes and family based genetic studies, and illustrate its ability to scale to studies involving thousands of human samples. GEMINI is designed for reproducibility and flexibility and our goal is to provide researchers with a standard framework for medical genomics. PMID:23874191
Earth science big data at users' fingertips: the EarthServer Science Gateway Mobile
NASA Astrophysics Data System (ADS)
Barbera, Roberto; Bruno, Riccardo; Calanducci, Antonio; Fargetta, Marco; Pappalardo, Marco; Rundo, Francesco
2014-05-01
The EarthServer project (www.earthserver.eu), funded by the European Commission under its Seventh Framework Program, aims at establishing open access and ad-hoc analytics on extreme-size Earth Science data, based on and extending leading-edge Array Database technology. The core idea is to use database query languages as client/server interface to achieve barrier-free "mix & match" access to multi-source, any-size, multi-dimensional space-time data -- in short: "Big Earth Data Analytics" - based on the open standards of the Open Geospatial Consortium Web Coverage Processing Service (OGC WCPS) and the W3C XQuery. EarthServer combines both, thereby achieving a tight data/metadata integration. Further, the rasdaman Array Database System (www.rasdaman.com) is extended with further space-time coverage data types. On server side, highly effective optimizations - such as parallel and distributed query processing - ensure scalability to Exabyte volumes. In this contribution we will report on the EarthServer Science Gateway Mobile, an app for both iOS and Android-based devices that allows users to seamlessly access some of the EarthServer applications using SAML-based federated authentication and fine-grained authorisation mechanisms.
Supporting Dynamic Ad hoc Collaboration Capabilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agarwal, Deborah A.; Berket, Karlo
2003-07-14
Modern HENP experiments such as CMS and Atlas involve as many as 2000 collaborators around the world. Collaborations this large will be unable to meet often enough to support working closely together. Many of the tools currently available for collaboration focus on heavy-weight applications such as videoconferencing tools. While these are important, there is a more basic need for tools that support connecting physicists to work together on an ad hoc or continuous basis. Tools that support the day-to-day connectivity and underlying needs of a group of collaborators are important for providing light-weight, non-intrusive, and flexible ways to work collaboratively.more » Some example tools include messaging, file-sharing, and shared plot viewers. An important component of the environment is a scalable underlying communication framework. In this paper we will describe our current progress on building a dynamic and ad hoc collaboration environment and our vision for its evolution into a HENP collaboration environment.« less
An adaptable architecture for patient cohort identification from diverse data sources
Bache, Richard; Miles, Simon; Taweel, Adel
2013-01-01
Objective We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. Method The architecture has the key feature that queries defined according to the query model are both pre and post-processed and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. Results We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Discussion Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a data-based management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. Conclusions The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity. PMID:24064442
EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hasan, S. M. Shamimul; Fox, Edward A.; Bisset, Keith
Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models as well as the data produced by these models presents unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy to use decision support and analytical environments. Asmore » a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK—a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks—aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated. A query bank is developed—the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using SPARQL Protocol and RDF Query Language (SPARQL) over the EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline: from model construction to simulation output. As a result, we show that the performance of benchmark queries varies significantly with respect to the choice of hardware underlying the database and resource description framework (RDF) engine.« less
EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases
Hasan, S. M. Shamimul; Fox, Edward A.; Bisset, Keith; ...
2017-11-06
Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models as well as the data produced by these models presents unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy to use decision support and analytical environments. Asmore » a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK—a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks—aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated. A query bank is developed—the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using SPARQL Protocol and RDF Query Language (SPARQL) over the EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline: from model construction to simulation output. As a result, we show that the performance of benchmark queries varies significantly with respect to the choice of hardware underlying the database and resource description framework (RDF) engine.« less
An alternative database approach for management of SNOMED CT and improved patient data queries.
Campbell, W Scott; Pedersen, Jay; McClay, James C; Rao, Praveen; Bastola, Dhundy; Campbell, James R
2015-10-01
SNOMED CT is the international lingua franca of terminologies for human health. Based in Description Logics (DL), the terminology enables data queries that incorporate inferences between data elements, as well as, those relationships that are explicitly stated. However, the ontologic and polyhierarchical nature of the SNOMED CT concept model make it difficult to implement in its entirety within electronic health record systems that largely employ object oriented or relational database architectures. The result is a reduction of data richness, limitations of query capability and increased systems overhead. The hypothesis of this research was that a graph database (graph DB) architecture using SNOMED CT as the basis for the data model and subsequently modeling patient data upon the semantic core of SNOMED CT could exploit the full value of the terminology to enrich and support advanced data querying capability of patient data sets. The hypothesis was tested by instantiating a graph DB with the fully classified SNOMED CT concept model. The graph DB instance was tested for integrity by calculating the transitive closure table for the SNOMED CT hierarchy and comparing the results with transitive closure tables created using current, validated methods. The graph DB was then populated with 461,171 anonymized patient record fragments and over 2.1 million associated SNOMED CT clinical findings. Queries, including concept negation and disjunction, were then run against the graph database and an enterprise Oracle relational database (RDBMS) of the same patient data sets. The graph DB was then populated with laboratory data encoded using LOINC, as well as, medication data encoded with RxNorm and complex queries performed using LOINC, RxNorm and SNOMED CT to identify uniquely described patient populations. A graph database instance was successfully created for two international releases of SNOMED CT and two US SNOMED CT editions. Transitive closure tables and descriptive statistics generated using the graph database were identical to those using validated methods. Patient queries produced identical patient count results to the Oracle RDBMS with comparable times. Database queries involving defining attributes of SNOMED CT concepts were possible with the graph DB. The same queries could not be directly performed with the Oracle RDBMS representation of the patient data and required the creation and use of external terminology services. Further, queries of undefined depth were successful in identifying unknown relationships between patient cohorts. The results of this study supported the hypothesis that a patient database built upon and around the semantic model of SNOMED CT was possible. The model supported queries that leveraged all aspects of the SNOMED CT logical model to produce clinically relevant query results. Logical disjunction and negation queries were possible using the data model, as well as, queries that extended beyond the structural IS_A hierarchy of SNOMED CT to include queries that employed defining attribute-values of SNOMED CT concepts as search parameters. As medical terminologies, such as SNOMED CT, continue to expand, they will become more complex and model consistency will be more difficult to assure. Simultaneously, consumers of data will increasingly demand improvements to query functionality to accommodate additional granularity of clinical concepts without sacrificing speed. This new line of research provides an alternative approach to instantiating and querying patient data represented using advanced computable clinical terminologies. Copyright © 2015 Elsevier Inc. All rights reserved.
Profile-IQ: Web-based data query system for local health department infrastructure and activities.
Shah, Gulzar H; Leep, Carolyn J; Alexander, Dayna
2014-01-01
To demonstrate the use of National Association of County & City Health Officials' Profile-IQ, a Web-based data query system, and how policy makers, researchers, the general public, and public health professionals can use the system to generate descriptive statistics on local health departments. This article is a descriptive account of an important health informatics tool based on information from the project charter for Profile-IQ and the authors' experience and knowledge in design and use of this query system. Profile-IQ is a Web-based data query system that is based on open-source software: MySQL 5.5, Google Web Toolkit 2.2.0, Apache Commons Math library, Google Chart API, and Tomcat 6.0 Web server deployed on an Amazon EC2 server. It supports dynamic queries of National Profile of Local Health Departments data on local health department finances, workforce, and activities. Profile-IQ's customizable queries provide a variety of statistics not available in published reports and support the growing information needs of users who do not wish to work directly with data files for lack of staff skills or time, or to avoid a data use agreement. Profile-IQ also meets the growing demand of public health practitioners and policy makers for data to support quality improvement, community health assessment, and other processes associated with voluntary public health accreditation. It represents a step forward in the recent health informatics movement of data liberation and use of open source information technology solutions to promote public health.
TCW: Transcriptome Computational Workbench
Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R.
2013-01-01
Background The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. Methodology The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. Conclusion It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. TCW is freely available from www.agcol.arizona.edu/software/tcw. PMID:23874959
TCW: transcriptome computational workbench.
Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R
2013-01-01
The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. TCW is freely available from www.agcol.arizona.edu/software/tcw.
Managing and Querying Image Annotation and Markup in XML.
Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel
2010-01-01
Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid.
Managing and Querying Image Annotation and Markup in XML
Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel
2010-01-01
Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid. PMID:21218167
Genomes as geography: using GIS technology to build interactive genome feature maps
Dolan, Mary E; Holden, Constance C; Beard, M Kate; Bult, Carol J
2006-01-01
Background Many commonly used genome browsers display sequence annotations and related attributes as horizontal data tracks that can be toggled on and off according to user preferences. Most genome browsers use only simple keyword searches and limit the display of detailed annotations to one chromosomal region of the genome at a time. We have employed concepts, methodologies, and tools that were developed for the display of geographic data to develop a Genome Spatial Information System (GenoSIS) for displaying genomes spatially, and interacting with genome annotations and related attribute data. In contrast to the paradigm of horizontally stacked data tracks used by most genome browsers, GenoSIS uses the concept of registered spatial layers composed of spatial objects for integrated display of diverse data. In addition to basic keyword searches, GenoSIS supports complex queries, including spatial queries, and dynamically generates genome maps. Our adaptation of the geographic information system (GIS) model in a genome context supports spatial representation of genome features at multiple scales with a versatile and expressive query capability beyond that supported by existing genome browsers. Results We implemented an interactive genome sequence feature map for the mouse genome in GenoSIS, an application that uses ArcGIS, a commercially available GIS software system. The genome features and their attributes are represented as spatial objects and data layers that can be toggled on and off according to user preferences or displayed selectively in response to user queries. GenoSIS supports the generation of custom genome maps in response to complex queries about genome features based on both their attributes and locations. Our example application of GenoSIS to the mouse genome demonstrates the powerful visualization and query capability of mature GIS technology applied in a novel domain. Conclusion Mapping tools developed specifically for geographic data can be exploited to display, explore and interact with genome data. The approach we describe here is organism independent and is equally useful for linear and circular chromosomes. One of the unique capabilities of GenoSIS compared to existing genome browsers is the capacity to generate genome feature maps dynamically in response to complex attribute and spatial queries. PMID:16984652
Fragger: a protein fragment picker for structural queries.
Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J
2017-01-01
Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of a new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fokine, Andrei; Islam, Mohammad Z.; Zhang, Zhihong
2011-09-16
The head of bacteriophage T4 is decorated with 155 copies of the highly antigenic outer capsid protein (Hoc). One Hoc molecule binds near the center of each hexameric capsomer. Hoc is dispensable for capsid assembly and has been used to display pathogenic antigens on the surface of T4. Here we report the crystal structure of a protein containing the first three of four domains of Hoc from bacteriophage RB49, a close relative of T4. The structure shows an approximately linear arrangement of the protein domains. Each of these domains has an immunoglobulin-like fold, frequently found in cell attachment molecules. Inmore » addition, we report biochemical data suggesting that Hoc can bind to Escherichia coli, supporting the hypothesis that Hoc could attach the phage capsids to bacterial surfaces and perhaps also to other organisms. The capacity for such reversible adhesion probably provides survival advantages to the bacteriophage.« less
Relational Algebra in Spatial Decision Support Systems Ontologies.
Diomidous, Marianna; Chardalias, Kostis; Koutonias, Panagiotis; Magnita, Adrianna; Andrianopoulos, Charalampos; Zimeras, Stelios; Mechili, Enkeleint Aggelos
2017-01-01
Decision Support Systems (DSS) is a powerful tool, for facilitates researchers to choose the correct decision based on their final results. Especially in medical cases where doctors could use these systems, to overcome the problem with the clinical misunderstanding. Based on these systems, queries must be constructed based on the particular questions that doctors must answer. In this work, combination between questions and queries would be presented via relational algebra.
a Novel Approach of Indexing and Retrieving Spatial Polygons for Efficient Spatial Region Queries
NASA Astrophysics Data System (ADS)
Zhao, J. H.; Wang, X. Z.; Wang, F. Y.; Shen, Z. H.; Zhou, Y. C.; Wang, Y. L.
2017-10-01
Spatial region queries are more and more widely used in web-based applications. Mechanisms to provide efficient query processing over geospatial data are essential. However, due to the massive geospatial data volume, heavy geometric computation, and high access concurrency, it is difficult to get response in real time. Spatial indexes are usually used in this situation. In this paper, based on k-d tree, we introduce a distributed KD-Tree (DKD-Tree) suitbable for polygon data, and a two-step query algorithm. The spatial index construction is recursive and iterative, and the query is an in memory process. Both the index and query methods can be processed in parallel, and are implemented based on HDFS, Spark and Redis. Experiments on a large volume of Remote Sensing images metadata have been carried out, and the advantages of our method are investigated by comparing with spatial region queries executed on PostgreSQL and PostGIS. Results show that our approach not only greatly improves the efficiency of spatial region query, but also has good scalability, Moreover, the two-step spatial range query algorithm can also save cluster resources to support a large number of concurrent queries. Therefore, this method is very useful when building large geographic information systems.
Image Retrieval by Color Semantics with Incomplete Knowledge.
ERIC Educational Resources Information Center
Corridoni, Jacopo M.; Del Bimbo, Alberto; Vicario, Enrico
1998-01-01
Presents a system which supports image retrieval by high-level chromatic contents, the sensations that color accordances generate on the observer. Surveys Itten's theory of color semantics and discusses image description and query specification. Presents examples of visual querying. (AEF)
Experiments on Interfaces To Support Query Expansion.
ERIC Educational Resources Information Center
Beaulieu, M.
1997-01-01
Focuses on the user and human-computer interaction aspects of the research based on the Okapi text retrieval system. Three experiments implementing different approaches to query expansion are described, including the use of graphical user interfaces with different windowing techniques. (Author/LRW)
Information Sharing Modalities for Mobile Ad-Hoc Networks
NASA Astrophysics Data System (ADS)
de Spindler, Alexandre; Grossniklaus, Michael; Lins, Christoph; Norrie, Moira C.
Current mobile phone technologies have fostered the emergence of a new generation of mobile applications. Such applications allow users to interact and share information opportunistically when their mobile devices are in physical proximity or close to fixed installations. It has been shown how mobile applications such as collaborative filtering and location-based services can take advantage of ad-hoc connectivity to use physical proximity as a filter mechanism inherent to the application logic. We discuss the different modes of information sharing that arise in such settings based on the models of persistence and synchronisation. We present a platform that supports the development of applications that can exploit these modes of ad-hoc information sharing and, by means of an example, show how such an application can be realised based on the supported event model.
GELLO: an object-oriented query and expression language for clinical decision support.
Sordo, Margarita; Ogunyemi, Omolola; Boxwala, Aziz A; Greenes, Robert A
2003-01-01
GELLO is a purpose-specific, object-oriented (OO) query and expression language. GELLO is the result of a concerted effort of the Decision Systems Group (DSG) working with the HL7 Clinical Decision Support Technical Committee (CDSTC) to provide the HL7 community with a common format for data encoding and manipulation. GELLO will soon be submitted for ballot to the HL7 CDSTC for consideration as a standard.
A natural language interface plug-in for cooperative query answering in biological databases.
Jamil, Hasan M
2012-06-11
One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural language like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schema with an aim to providing cooperative responses to queries tailored to users' interpretations. Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from underlying database schema. The plug-in introduced in this paper is generic and facilitates connecting user selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.
A Semantic Basis for Proof Queries and Transformations
NASA Technical Reports Server (NTRS)
Aspinall, David; Denney, Ewen W.; Luth, Christoph
2013-01-01
We extend the query language PrQL, designed for inspecting machine representations of proofs, to also allow transformation of proofs. PrQL natively supports hiproofs which express proof structure using hierarchically nested labelled trees, which we claim is a natural way of taming the complexity of huge proofs. Query-driven transformations enable manipulation of this structure, in particular, to transform proofs produced by interactive theorem provers into forms that assist their understanding, or that could be consumed by other tools. In this paper we motivate and define basic transformation operations, using an abstract denotational semantics of hiproofs and queries. This extends our previous semantics for queries based on syntactic tree representations.We define update operations that add and remove sub-proofs, and manipulate the hierarchy to group and ungroup nodes. We show that
Supporting temporal queries on clinical relational databases: the S-WATCH-QL language.
Combi, C.; Missora, L.; Pinciroli, F.
1996-01-01
Due to the ubiquitous and special nature of time, specially in clinical datábases there's the need of particular temporal data and operators. In this paper we describe S-WATCH-QL (Structured Watch Query Language), a temporal extension of SQL, the widespread query language based on the relational model. S-WATCH-QL extends the well-known SQL by the addition of: a) temporal data types that allow the storage of information with different levels of granularity; b) historical relations that can store together both instantaneous valid times and intervals; c) some temporal clauses, functions and predicates allowing to define complex temporal queries. PMID:8947722
Accelerating Research Impact in a Learning Health Care System
Elwy, A. Rani; Sales, Anne E.; Atkins, David
2017-01-01
Background: Since 1998, the Veterans Health Administration (VHA) Quality Enhancement Research Initiative (QUERI) has supported more rapid implementation of research into clinical practice. Objectives: With the passage of the Veterans Access, Choice and Accountability Act of 2014 (Choice Act), QUERI further evolved to support VHA’s transformation into a Learning Health Care System by aligning science with clinical priority goals based on a strategic planning process and alignment of funding priorities with updated VHA priority goals in response to the Choice Act. Design: QUERI updated its strategic goals in response to independent assessments mandated by the Choice Act that recommended VHA reduce variation in care by providing a clear path to implement best practices. Specifically, QUERI updated its application process to ensure its centers (Programs) focus on cross-cutting VHA priorities and specify roadmaps for implementation of research-informed practices across different settings. QUERI also increased funding for scientific evaluations of the Choice Act and other policies in response to Commission on Care recommendations. Results: QUERI’s national network of Programs deploys effective practices using implementation strategies across different settings. QUERI Choice Act evaluations informed the law’s further implementation, setting the stage for additional rigorous national evaluations of other VHA programs and policies including community provider networks. Conclusions: Grounded in implementation science and evidence-based policy, QUERI serves as an example of how to operationalize core components of a Learning Health Care System, notably through rigorous evaluation and scientific testing of implementation strategies to ultimately reduce variation in quality and improve overall population health. PMID:27997456
Multidimensional indexing structure for use with linear optimization queries
NASA Technical Reports Server (NTRS)
Bergman, Lawrence David (Inventor); Castelli, Vittorio (Inventor); Chang, Yuan-Chi (Inventor); Li, Chung-Sheng (Inventor); Smith, John Richard (Inventor)
2002-01-01
Linear optimization queries, which usually arise in various decision support and resource planning applications, are queries that retrieve top N data records (where N is an integer greater than zero) which satisfy a specific optimization criterion. The optimization criterion is to either maximize or minimize a linear equation. The coefficients of the linear equation are given at query time. Methods and apparatus are disclosed for constructing, maintaining and utilizing a multidimensional indexing structure of database records to improve the execution speed of linear optimization queries. Database records with numerical attributes are organized into a number of layers and each layer represents a geometric structure called convex hull. Such linear optimization queries are processed by searching from the outer-most layer of this multi-layer indexing structure inwards. At least one record per layer will satisfy the query criterion and the number of layers needed to be searched depends on the spatial distribution of records, the query-issued linear coefficients, and N, the number of records to be returned. When N is small compared to the total size of the database, answering the query typically requires searching only a small fraction of all relevant records, resulting in a tremendous speedup as compared to linearly scanning the entire dataset.
SeqWare Query Engine: storing and searching sequence data in the cloud.
O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F
2010-12-21
Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets.
SeqWare Query Engine: storing and searching sequence data in the cloud
2010-01-01
Background Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. Results In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets. PMID:21210981
Time series patterns and language support in DBMS
NASA Astrophysics Data System (ADS)
Telnarova, Zdenka
2017-07-01
This contribution is focused on pattern type Time Series as a rich in semantics representation of data. Some example of implementation of this pattern type in traditional Data Base Management Systems is briefly presented. There are many approaches how to manipulate with patterns and query patterns. Crucial issue can be seen in systematic approach to pattern management and specific pattern query language which takes into consideration semantics of patterns. Query language SQL-TS for manipulating with patterns is shown on Time Series data.
Anytime query-tuned kernel machine classifiers via Cholesky factorization
NASA Technical Reports Server (NTRS)
DeCoste, D.
2002-01-01
We recently demonstrated 2 to 64-fold query-time speedups of Support Vector Machine and Kernel Fisher classifiers via a new computational geometry method for anytime output bounds (DeCoste,2002). This new paper refines our approach in two key ways. First, we introduce a simple linear algebra formulation based on Cholesky factorization, yielding simpler equations and lower computational overhead. Second, this new formulation suggests new methods for achieving additional speedups, including tuning on query samples. We demonstrate effectiveness on benchmark datasets.
Extending the Query Language of a Data Warehouse for Patient Recruitment.
Dietrich, Georg; Ertl, Maximilian; Fette, Georg; Kaspar, Mathias; Krebs, Jonathan; Mackenrodt, Daniel; Störk, Stefan; Puppe, Frank
2017-01-01
Patient recruitment for clinical trials is a laborious task, as many texts have to be screened. Usually, this work is done manually and takes a lot of time. We have developed a system that automates the screening process. Besides standard keyword queries, the query language supports extraction of numbers, time-spans and negations. In a feasibility study for patient recruitment from a stroke unit with 40 patients, we achieved encouraging extraction rates above 95% for numbers and negations and ca. 86% for time spans.
Query Health: standards-based, cross-platform population health surveillance
Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N
2014-01-01
Objective Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Materials and methods Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. Results We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. Discussions This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Conclusions Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites. PMID:24699371
Query Health: standards-based, cross-platform population health surveillance.
Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N
2014-01-01
Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
A clinical data repository enhances hospital infection control.
Samore, M.; Lichtenberg, D.; Saubermann, L.; Kawachi, C.; Carmeli, Y.
1997-01-01
We describe the benefits of a relational database of hospital clinical data (Clinical Data Repository; CDR) for an infection control program. The CDR consists of > 40 Sybase tables, and is directly accessible for ad hoc queries by members of the infection control unit who have been granted privileges for access by the Information Systems Department. The data elements and functional requirements most useful for surveillance of nosocomial infections, antibiotic use, and resistant organisms are characterized. Specific applications of the CDR are presented, including the use of automated definitions of nosocomial infection, graphical monitoring of resistant organisms with quality control limits, and prospective detection of inappropriate antibiotic use. Hospital surveillance and quality improvement activities are significantly benefited by the availability of a querable set of tables containing diverse clinical data. PMID:9357588
Measuring up: Implementing a dental quality measure in the electronic health record context.
Bhardwaj, Aarti; Ramoni, Rachel; Kalenderian, Elsbeth; Neumann, Ana; Hebballi, Nutan B; White, Joel M; McClellan, Lyle; Walji, Muhammad F
2016-01-01
Quality improvement requires using quality measures that can be implemented in a valid manner. Using guidelines set forth by the Meaningful Use portion of the Health Information Technology for Economic and Clinical Health Act, the authors assessed the feasibility and performance of an automated electronic Meaningful Use dental clinical quality measure to determine the percentage of children who received fluoride varnish. The authors defined how to implement the automated measure queries in a dental electronic health record. Within records identified through automated query, the authors manually reviewed a subsample to assess the performance of the query. The automated query results revealed that 71.0% of patients had fluoride varnish compared with the manual chart review results that indicated 77.6% of patients had fluoride varnish. The automated quality measure performance results indicated 90.5% sensitivity, 90.8% specificity, 96.9% positive predictive value, and 75.2% negative predictive value. The authors' findings support the feasibility of using automated dental quality measure queries in the context of sufficient structured data. Information noted only in free text rather than in structured data would require using natural language processing approaches to effectively query electronic health records. To participate in self-directed quality improvement, dental clinicians must embrace the accountability era. Commitment to quality will require enhanced documentation to support near-term automated calculation of quality measures. Copyright © 2016 American Dental Association. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Berriman, G. B.; Ciardi, D. R.; Good, J. C.; Laity, A. C.; Zhang, A.
2006-07-01
At ADASS XIV, we described how the W. M. Keck Observatory Archive (KOA) re-uses and extends the component based architecture of the NASA/IPAC Infrared Science Archive (IRSA) to ingest and serve level 0 observations made with HIRES, the High Resolution Echelle Spectrometer. Since August 18, the KOA has ingested 325 GB of data from 135 nights of observations. The architecture exploits a service layer between the mass storage layer and the user interface. This service layer consists of standalone utilities called through a simple executive that perform generic query and retrieval functions, such as query generation, database table sub-setting, and return page generation etc. It has been extended to implement proprietary access to data through deployment of query management middleware developed for the National Virtual Observatory. The MSC archives have recently extended this design to query and retrieve complex data sets describing the properties of potential target stars for the Terrestrial Planet Finder (TPF) missions. The archives can now support knowledge based retrieval, as well as data retrieval. This paper describes how extensions to the IRSA architecture, which is applicable across all wavelengths and astronomical datatypes, supports the design and development of the MSC NP archives at modest cost.
Analysis of Pervasive Mobile Ad Hoc Routing Protocols
NASA Astrophysics Data System (ADS)
Qadri, Nadia N.; Liotta, Antonio
Mobile ad hoc networks (MANETs) are a fundamental element of pervasive networks and therefore, of pervasive systems that truly support pervasive computing, where user can communicate anywhere, anytime and on-the-fly. In fact, future advances in pervasive computing rely on advancements in mobile communication, which includes both infrastructure-based wireless networks and non-infrastructure-based MANETs. MANETs introduce a new communication paradigm, which does not require a fixed infrastructure - they rely on wireless terminals for routing and transport services. Due to highly dynamic topology, absence of established infrastructure for centralized administration, bandwidth constrained wireless links, and limited resources in MANETs, it is challenging to design an efficient and reliable routing protocol. This chapter reviews the key studies carried out so far on the performance of mobile ad hoc routing protocols. We discuss performance issues and metrics required for the evaluation of ad hoc routing protocols. This leads to a survey of existing work, which captures the performance of ad hoc routing algorithms and their behaviour from different perspectives and highlights avenues for future research.
Spatial information semantic query based on SPARQL
NASA Astrophysics Data System (ADS)
Xiao, Zhifeng; Huang, Lei; Zhai, Xiaofang
2009-10-01
How can the efficiency of spatial information inquiries be enhanced in today's fast-growing information age? We are rich in geospatial data but poor in up-to-date geospatial information and knowledge that are ready to be accessed by public users. This paper adopts an approach for querying spatial semantic by building an Web Ontology language(OWL) format ontology and introducing SPARQL Protocol and RDF Query Language(SPARQL) to search spatial semantic relations. It is important to establish spatial semantics that support for effective spatial reasoning for performing semantic query. Compared to earlier keyword-based and information retrieval techniques that rely on syntax, we use semantic approaches in our spatial queries system. Semantic approaches need to be developed by ontology, so we use OWL to describe spatial information extracted by the large-scale map of Wuhan. Spatial information expressed by ontology with formal semantics is available to machines for processing and to people for understanding. The approach is illustrated by introducing a case study for using SPARQL to query geo-spatial ontology instances of Wuhan. The paper shows that making use of SPARQL to search OWL ontology instances can ensure the result's accuracy and applicability. The result also indicates constructing a geo-spatial semantic query system has positive efforts on forming spatial query and retrieval.
Zhou, ZhangBing; Zhao, Deng; Shu, Lei; Tsang, Kim-Fung
2015-01-01
Wireless sensor networks, serving as an important interface between physical environments and computational systems, have been used extensively for supporting domain applications, where multiple-attribute sensory data are queried from the network continuously and periodically. Usually, certain sensory data may not vary significantly within a certain time duration for certain applications. In this setting, sensory data gathered at a certain time slot can be used for answering concurrent queries and may be reused for answering the forthcoming queries when the variation of these data is within a certain threshold. To address this challenge, a popularity-based cooperative caching mechanism is proposed in this article, where the popularity of sensory data is calculated according to the queries issued in recent time slots. This popularity reflects the possibility that sensory data are interested in the forthcoming queries. Generally, sensory data with the highest popularity are cached at the sink node, while sensory data that may not be interested in the forthcoming queries are cached in the head nodes of divided grid cells. Leveraging these cooperatively cached sensory data, queries are answered through composing these two-tier cached data. Experimental evaluation shows that this approach can reduce the network communication cost significantly and increase the network capability. PMID:26131665
Roh, Sungjong; Niederdeppe, Jeff
2016-09-01
This study uses data from systematic Web image search results and two randomized survey experiments to analyze how frames commonly used in public debates about health issues, operationalized here as alternative word choices, influence public support for health policy reforms. In Study 1, analyses of Bing (N = 1,719), Google (N = 1,872), and Yahoo Images (N = 1,657) search results suggest that the images returned from the search query "sugar-sweetened beverage" are more likely to evoke health-related concepts than images returned from a search query about "soda." In contrast, "soda" search queries were more likely to incorporate brand-related concepts than "sugar-sweetened beverage" search queries. In Study 2, participants (N = 206) in a controlled Web experiment rated their support for policies to reduce consumption of these drinks. As expected, strong liberals had more support for policies designed to reduce the consumption of these drinks when the policies referenced "soda" compared to "sugar-sweetened beverage." To the contrary, items describing these drinks as "soda" produced lower policy support than items describing them as "sugar-sweetened beverage" among strong conservatives. In Study 3, participants (N = 1,000) in a national telephone survey experiment rated their support for a similar set of policies. Results conceptually replicated the previous Web-based experiment, such that strong liberals reported greater support for a penny-per-ounce taxation when labeled "soda" versus "sugar-sweetened beverages." In both Studies 2 and 3, more respondents referred to brand-related concepts in response to questions about "sugar-sweetened beverages" compared to "soda." We conclude with a discussion of theoretical and methodological implications for studying framing effects of labels.
Hruby, Gregory W; Matsoukas, Konstantina; Cimino, James J; Weng, Chunhua
2016-04-01
Electronic health records (EHR) are a vital data resource for research uses, including cohort identification, phenotyping, pharmacovigilance, and public health surveillance. To realize the promise of EHR data for accelerating clinical research, it is imperative to enable efficient and autonomous EHR data interrogation by end users such as biomedical researchers. This paper surveys state-of-art approaches and key methodological considerations to this purpose. We adapted a previously published conceptual framework for interactive information retrieval, which defines three entities: user, channel, and source, by elaborating on channels for query formulation in the context of facilitating end users to interrogate EHR data. We show the current progress in biomedical informatics mainly lies in support for query execution and information modeling, primarily due to emphases on infrastructure development for data integration and data access via self-service query tools, but has neglected user support needed during iteratively query formulation processes, which can be costly and error-prone. In contrast, the information science literature has offered elaborate theories and methods for user modeling and query formulation support. The two bodies of literature are complementary, implying opportunities for cross-disciplinary idea exchange. On this basis, we outline the directions for future informatics research to improve our understanding of user needs and requirements for facilitating autonomous interrogation of EHR data by biomedical researchers. We suggest that cross-disciplinary translational research between biomedical informatics and information science can benefit our research in facilitating efficient data access in life sciences. Copyright © 2016 Elsevier Inc. All rights reserved.
Wang, Amy Y; Lancaster, William J; Wyatt, Matthew C; Rasmussen, Luke V; Fort, Daniel G; Cimino, James J
2017-01-01
A major challenge in using electronic health record repositories for research is the difficulty matching subject eligibility criteria to query capabilities of the repositories. We propose categories for study criteria corresponding to the effort needed for querying those criteria: "easy" (supporting automated queries), mixed (initial automated querying with manual review), "hard" (fully manual record review), and "impossible" or "point of enrollment" (not typically in health repositories). We obtained a sample of 292 criteria from 20 studies from ClinicalTrials.gov. Six independent reviewers, three each from two academic research institutions, rated criteria according to our four types. We observed high interrater reliability both within and between institutions. The analysis demonstrated typical features of criteria that map with varying levels of difficulty to repositories. We propose using these features to improve enrollment workflow through more standardized study criteria, self-service repository queries, and analyst-mediated retrievals.
Wang, Amy Y.; Lancaster, William J.; Wyatt, Matthew C.; Rasmussen, Luke V.; Fort, Daniel G.; Cimino, James J.
2017-01-01
A major challenge in using electronic health record repositories for research is the difficulty matching subject eligibility criteria to query capabilities of the repositories. We propose categories for study criteria corresponding to the effort needed for querying those criteria: “easy” (supporting automated queries), mixed (initial automated querying with manual review), “hard” (fully manual record review), and “impossible” or “point of enrollment” (not typically in health repositories). We obtained a sample of 292 criteria from 20 studies from ClinicalTrials.gov. Six independent reviewers, three each from two academic research institutions, rated criteria according to our four types. We observed high interrater reliability both within and between institutions. The analysis demonstrated typical features of criteria that map with varying levels of difficulty to repositories. We propose using these features to improve enrollment workflow through more standardized study criteria, self-service repository queries, and analyst-mediated retrievals. PMID:29854246
Remembrance of inferences past: Amortization in human hypothesis generation.
Dasgupta, Ishita; Schulz, Eric; Goodman, Noah D; Gershman, Samuel J
2018-05-21
Bayesian models of cognition assume that people compute probability distributions over hypotheses. However, the required computations are frequently intractable or prohibitively expensive. Since people often encounter many closely related distributions, selective reuse of computations (amortized inference) is a computationally efficient use of the brain's limited resources. We present three experiments that provide evidence for amortization in human probabilistic reasoning. When sequentially answering two related queries about natural scenes, participants' responses to the second query systematically depend on the structure of the first query. This influence is sensitive to the content of the queries, only appearing when the queries are related. Using a cognitive load manipulation, we find evidence that people amortize summary statistics of previous inferences, rather than storing the entire distribution. These findings support the view that the brain trades off accuracy and computational cost, to make efficient use of its limited cognitive resources to approximate probabilistic inference. Copyright © 2018 Elsevier B.V. All rights reserved.
Algorithmic, LOCS and HOCS (chemistry) exam questions: performance and attitudes of college students
NASA Astrophysics Data System (ADS)
Zoller, Uri
2002-02-01
The performance of freshmen biology and physics-mathematics majors and chemistry majors as well as pre- and in-service chemistry teachers in two Israeli universities on algorithmic (ALG), lower-order cognitive skills (LOCS), and higher-order cognitive skills (HOCS) chemistry exam questions were studied. The driving force for the study was an interest in moving science and chemistry instruction from an algorithmic and factual recall orientation dominated by LOCS, to a decision-making, problem-solving and critical system thinking approach, dominated by HOCS. College students' responses to the specially designed ALG, LOCS and HOCS chemistry exam questions were scored and analysed for differences and correlation between the performance means within and across universities by the questions' category. This was followed by a combined student interview - 'speaking aloud' problem solving session for assessing the thinking processes involved in solving these types of questions and the students' attitudes towards them. The main findings were: (1) students in both universities performed consistently in each of the three categories in the order of ALG > LOCS > HOCS; their 'ideological' preference, was HOCS > algorithmic/LOCS, - referred to as 'computational questions', but their pragmatic preference was the reverse; (2) success on algorithmic/LOCS does not imply success on HOCS questions; algorithmic questions constitute a category on its own as far as students success in solving them is concerned. Our study and its results support the effort being made, worldwide, to integrate HOCS-fostering teaching and assessment strategies and, to develop HOCS-oriented science-technology-environment-society (STES)-type curricula within science and chemistry education.
UCSC genome browser: deep support for molecular biomedical research.
Mangan, Mary E; Williams, Jennifer M; Lathe, Scott M; Karolchik, Donna; Lathe, Warren C
2008-01-01
The volume and complexity of genomic sequence data, and the additional experimental data required for annotation of the genomic context, pose a major challenge for display and access for biomedical researchers. Genome browsers organize this data and make it available in various ways to extract useful information to advance research projects. The UCSC Genome Browser is one of these resources. The official sequence data for a given species forms the framework to display many other types of data such as expression, variation, cross-species comparisons, and more. Visual representations of the data are available for exploration. Data can be queried with sequences. Complex database queries are also easily achieved with the Table Browser interface. Associated tools permit additional query types or access to additional data sources such as images of in situ localizations. Support for solving researcher's issues is provided with active discussion mailing lists and by providing updated training materials. The UCSC Genome Browser provides a source of deep support for a wide range of biomedical molecular research (http://genome.ucsc.edu).
Efficient Byzantine Fault Tolerance for Scalable Storage and Services
2009-07-01
most critical applications must survive in ever harsher environments. Less synchronous networking delivers packets unreliably and unpredictably, and... synchronous environments to allowing asynchrony, and from tolerating crashes to tolerating some corruptions through ad-hoc consistency checks. Ad-hoc...servers are responsive. To support this thesis statement, this disseration takes the following steps. First, it develops a new cryptographic primitive
NASA Technical Reports Server (NTRS)
Friedman, S. Z.; Walker, R. E.; Aitken, R. B.
1986-01-01
The Image Based Information System (IBIS) has been under development at the Jet Propulsion Laboratory (JPL) since 1975. It is a collection of more than 90 programs that enable processing of image, graphical, tabular data for spatial analysis. IBIS can be utilized to create comprehensive geographic data bases. From these data, an analyst can study various attributes describing characteristics of a given study area. Even complex combinations of disparate data types can be synthesized to obtain a new perspective on spatial phenomena. In 1984, new query software was developed enabling direct Boolean queries of IBIS data bases through the submission of easily understood expressions. An improved syntax methodology, a data dictionary, and display software simplified the analysts' tasks associated with building, executing, and subsequently displaying the results of a query. The primary purpose of this report is to describe the features and capabilities of the new query software. A secondary purpose of this report is to compare this new query software to the query software developed previously (Friedman, 1982). With respect to this topic, the relative merits and drawbacks of both approaches are covered.
An exponentiation method for XML element retrieval.
Wichaiwong, Tanakorn
2014-01-01
XML document is now widely used for modelling and storing structured documents. The structure is very rich and carries important information about contents and their relationships, for example, e-Commerce. XML data-centric collections require query terms allowing users to specify constraints on the document structure; mapping structure queries and assigning the weight are significant for the set of possibly relevant documents with respect to structural conditions. In this paper, we present an extension to the MEXIR search system that supports the combination of structural and content queries in the form of content-and-structure queries, which we call the Exponentiation function. It has been shown the structural information improve the effectiveness of the search system up to 52.60% over the baseline BM25 at MAP.
A dirty word or a dirty world?: Attribute framing, political affiliation, and query theory.
Hardisty, David J; Johnson, Eric J; Weber, Elke U
2010-01-01
We explored the effect of attribute framing on choice, labeling charges for environmental costs as either an earmarked tax or an offset. Eight hundred ninety-eight Americans chose between otherwise identical products or services, where one option included a surcharge for emitted carbon dioxide. The cost framing changed preferences for self-identified Republicans and Independents, but did not affect Democrats' preferences. We explain this interaction by means of query theory and show that attribute framing can change the order in which internal queries supporting one or another option are posed. The effect of attribute labeling on query order is shown to depend on the representations of either taxes or offsets held by people with different political affiliations.
Koelmans, Albert A; Bakir, Adil; Burton, G Allen; Janssen, Colin R
2016-04-05
The hypothesis that 'microplastic will transfer hazardous hydrophobic organic chemicals (HOC) to marine animals' has been central to the perceived hazard and risk of plastic in the marine environment. The hypothesis is often cited and has gained momentum, turning it into paradigm status. We provide a critical evaluation of the scientific literature regarding this hypothesis. Using new calculations based on published studies, we explain the sometimes contrasting views and unify them in one interpretive framework. One explanation for the contrasting views among studies is that they test different hypotheses. When reframed in the context of the above hypothesis, the available data become consistent. We show that HOC microplastic-water partitioning can be assumed to be at equilibrium for most microplastic residing in the oceans. We calculate the fraction of total HOC sorbed by plastics to be small compared to that sorbed by other media in the ocean. We further demonstrate consistency among (a) measured HOC transfer from microplastic to organisms in the laboratory, (b) measured HOC desorption rates for polymers in artificial gut fluids (c) simulations by plastic-inclusive bioaccumulation models and (d) HOC desorption rates for polymers inferred from first principles. We conclude that overall the flux of HOCs bioaccumulated from natural prey overwhelms the flux from ingested microplastic for most habitats, which implies that microplastic ingestion is not likely to increase the exposure to and thus risks of HOCs in the marine environment.
High dimensional biological data retrieval optimization with NoSQL technology.
Wang, Shicai; Pandis, Ioannis; Wu, Chao; He, Sijin; Johnson, David; Emam, Ibrahim; Guitton, Florian; Guo, Yike
2014-01-01
High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, when querying relational databases for hundreds of different patient gene expression records queries are slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase on query performance on MongoDB. The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data.
High dimensional biological data retrieval optimization with NoSQL technology
2014-01-01
Background High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, when querying relational databases for hundreds of different patient gene expression records queries are slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. Results In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase on query performance on MongoDB. Conclusions The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data. PMID:25435347
NASA Astrophysics Data System (ADS)
Loyall, Joseph P.; Carvalho, Marco; Martignoni, Andrew, III; Schmidt, Douglas; Sinclair, Asher; Gillen, Matthew; Edmondson, James; Bunch, Larry; Corman, David
2009-05-01
Net-centric information spaces have become a necessary concept to support information exchange for tactical warfighting missions using a publish-subscribe-query paradigm. To support dynamic, mission-critical and time-critical operations, information spaces require quality of service (QoS)-enabled dissemination (QED) of information. This paper describes the results of research we are conducting to provide QED information exchange in tactical environments. We have developed a prototype QoS-enabled publish-subscribe-query information broker that provides timely delivery of information needed by tactical warfighters in mobile scenarios with time-critical emergent targets. This broker enables tailoring and prioritizing of information based on mission needs and responds rapidly to priority shifts and unfolding situations. This paper describes the QED architecture, prototype implementation, testing infrastructure, and empirical evaluations we have conducted based on our prototype.
Scientific Workflows + Provenance = Better (Meta-)Data Management
NASA Astrophysics Data System (ADS)
Ludaescher, B.; Cuevas-Vicenttín, V.; Missier, P.; Dey, S.; Kianmajd, P.; Wei, Y.; Koop, D.; Chirigati, F.; Altintas, I.; Belhajjame, K.; Bowers, S.
2013-12-01
The origin and processing history of an artifact is known as its provenance. Data provenance is an important form of metadata that explains how a particular data product came about, e.g., how and when it was derived in a computational process, which parameter settings and input data were used, etc. Provenance information provides transparency and helps to explain and interpret data products. Other common uses and applications of provenance include quality control, data curation, result debugging, and more generally, 'reproducible science'. Scientific workflow systems (e.g. Kepler, Taverna, VisTrails, and others) provide controlled environments for developing computational pipelines with built-in provenance support. Workflow results can then be explained in terms of workflow steps, parameter settings, input data, etc. using provenance that is automatically captured by the system. Scientific workflows themselves provide a user-friendly abstraction of the computational process and are thus a form of ('prospective') provenance in their own right. The full potential of provenance information is realized when combining workflow-level information (prospective provenance) with trace-level information (retrospective provenance). To this end, the DataONE Provenance Working Group (ProvWG) has developed an extension of the W3C PROV standard, called D-PROV. Whereas PROV provides a 'least common denominator' for exchanging and integrating provenance information, D-PROV adds new 'observables' that described workflow-level information (e.g., the functional steps in a pipeline), as well as workflow-specific trace-level information ( timestamps for each workflow step executed, the inputs and outputs used, etc.) Using examples, we will demonstrate how the combination of prospective and retrospective provenance provides added value in managing scientific data. The DataONE ProvWG is also developing tools based on D-PROV that allow scientists to get more mileage from provenance metadata. DataONE is a federation of member nodes that store data and metadata for discovery and access. By enriching metadata with provenance information, search and reuse of data is enhanced, and the 'social life' of data (being the product of many workflow runs, different people, etc.) is revealed. We are currently prototyping a provenance repository (PBase) to demonstrate what can be achieved with advanced provenance queries. The ProvExplorer and ProPub tools support advanced ad-hoc querying and visualization of provenance as well as customized provenance publications (e.g., to address privacy issues, or to focus provenance to relevant details). In a parallel line of work, we are exploring ways to add provenance support to widely-used scripting platforms (e.g. R and Python) and then expose that information via D-PROV.
An index-based algorithm for fast on-line query processing of latent semantic analysis
Li, Pohan; Wang, Wei
2017-01-01
Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantic is similar to the query of keywords. Although LSA yield promising similar results, the existing LSA algorithms involve lots of unnecessary operations in similarity computation and candidate check during on-line query processing, which is expensive in terms of time cost and cannot efficiently response the query request especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA towards efficiently searching the similar documents to a given query. We rewrite the similarity equation of LSA combined with an intermediate value called partial similarity that is stored in a designed index called partial index. For reducing the searching space, we give an approximate form of similarity equation, and then develop an efficient algorithm for building partial index, which skips the partial similarities lower than a given threshold θ. Based on partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponds to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm. PMID:28520747
An index-based algorithm for fast on-line query processing of latent semantic analysis.
Zhang, Mingxi; Li, Pohan; Wang, Wei
2017-01-01
Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantic is similar to the query of keywords. Although LSA yield promising similar results, the existing LSA algorithms involve lots of unnecessary operations in similarity computation and candidate check during on-line query processing, which is expensive in terms of time cost and cannot efficiently response the query request especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA towards efficiently searching the similar documents to a given query. We rewrite the similarity equation of LSA combined with an intermediate value called partial similarity that is stored in a designed index called partial index. For reducing the searching space, we give an approximate form of similarity equation, and then develop an efficient algorithm for building partial index, which skips the partial similarities lower than a given threshold θ. Based on partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponds to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm.
SPANG: a SPARQL client supporting generation and reuse of queries for distributed RDF databases.
Chiba, Hirokazu; Uchiyama, Ikuo
2017-02-08
Toward improved interoperability of distributed biological databases, an increasing number of datasets have been published in the standardized Resource Description Framework (RDF). Although the powerful SPARQL Protocol and RDF Query Language (SPARQL) provides a basis for exploiting RDF databases, writing SPARQL code is burdensome for users including bioinformaticians. Thus, an easy-to-use interface is necessary. We developed SPANG, a SPARQL client that has unique features for querying RDF datasets. SPANG dynamically generates typical SPARQL queries according to specified arguments. It can also call SPARQL template libraries constructed in a local system or published on the Web. Further, it enables combinatorial execution of multiple queries, each with a distinct target database. These features facilitate easy and effective access to RDF datasets and integrative analysis of distributed data. SPANG helps users to exploit RDF datasets by generation and reuse of SPARQL queries through a simple interface. This client will enhance integrative exploitation of biological RDF datasets distributed across the Web. This software package is freely available at http://purl.org/net/spang .
A Fast Healthcare Interoperability Resources (FHIR) layer implemented over i2b2.
Boussadi, Abdelali; Zapletal, Eric
2017-08-14
Standards and technical specifications have been developed to define how the information contained in Electronic Health Records (EHRs) should be structured, semantically described, and communicated. Current trends rely on differentiating the representation of data instances from the definition of clinical information models. The dual model approach, which combines a reference model (RM) and a clinical information model (CIM), sets in practice this software design pattern. The most recent initiative, proposed by HL7, is called Fast Health Interoperability Resources (FHIR). The aim of our study was to investigate the feasibility of applying the FHIR standard to modeling and exposing EHR data of the Georges Pompidou European Hospital (HEGP) integrating biology and the bedside (i2b2) clinical data warehouse (CDW). We implemented a FHIR server over i2b2 to expose EHR data in relation with five FHIR resources: DiagnosisReport, MedicationOrder, Patient, Encounter, and Medication. The architecture of the server combines a Data Access Object design pattern and FHIR resource providers, implemented using the Java HAPI FHIR API. Two types of queries were tested: query type #1 requests the server to display DiagnosticReport resources, for which the diagnosis code is equal to a given ICD-10 code. A total of 80 DiagnosticReport resources, corresponding to 36 patients, were displayed. Query type #2, requests the server to display MedicationOrder, for which the FHIR Medication identification code is equal to a given code expressed in a French coding system. A total of 503 MedicationOrder resources, corresponding to 290 patients, were displayed. Results were validated by manually comparing the results of each request to the results displayed by an ad-hoc SQL query. We showed the feasibility of implementing a Java layer over the i2b2 database model to expose data of the CDW as a set of FHIR resources. An important part of this work was the structural and semantic mapping between the i2b2 model and the FHIR RM. To accomplish this, developers must manually browse the specifications of the FHIR standard. Our source code is freely available and can be adapted for use in other i2b2 sites.
Huang, Chung-Chi; Lu, Zhiyong
2016-01-01
Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities such as ‘CHEMICAL-1 compared to CHEMICAL-2.’ With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical–disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 respectively for the CC and CD task when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality. These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order covering diverse bio-entity relations. To assess the potential utility of our automated top-ranked patterns of a given relation in semantic search, we performed a pilot study on frequently sought semantic relations in PubMed and observed improved literature retrieval effectiveness based on post-hoc human relevance evaluation. Further investigation in larger tests and in real-world scenarios is warranted. PMID:27016698
NASA Astrophysics Data System (ADS)
Edwards, Mark; Hu, Fei; Kumar, Sunil
2004-10-01
The research on the Novelty Detection System (NDS) (called as VENUS) at the authors' universities has generated exciting results. For example, we can detect an abnormal behavior (such as cars thefts from the parking lot) from a series of video frames based on the cognitively motivated theory of habituation. In this paper, we would like to describe the implementation strategies of lower layer protocols for using large-scale Wireless Sensor Networks (WSN) to NDS with Quality-of-Service (QoS) support. Wireless data collection framework, consisting of small and low-power sensor nodes, provides an alternative mechanism to observe the physical world, by using various types of sensing capabilities that include images (and even videos using Panoptos), sound and basic physical measurements such as temperature. We do not want to lose any 'data query command' packets (in the downstream direction: sink-to-sensors) or have any bit-errors in them since they are so important to the whole sensor network. In the upstream direction (sensors-to-sink), we may tolerate the loss of some sensing data packets. But the 'interested' sensing flow should be assigned a higher priority in terms of multi-hop path choice, network bandwidth allocation, and sensing data packet generation frequency (we hope to generate more sensing data packet for that novel event in the specified network area). The focus of this paper is to investigate MAC-level Quality of Service (QoS) issue in Wireless Sensor Networks (WSN) for Novelty Detection applications. Although QoS has been widely studied in other types of networks including wired Internet, general ad hoc networks and mobile cellular networks, we argue that QoS in WSN has its own characteristics. In wired Internet, the main QoS parameters include delay, jitter and bandwidth. In mobile cellular networks, two most common QoS metrics are: handoff call dropping probability and new call blocking probability. Since the main task of WSN is to detect and report events, the most important QoS parameters should include sensing data packet transmission reliability, lifetime extension degree from sensor sleeping control, event detection latency, congestion reduction level through removal of redundant sensing data. In this paper, we will focus on the following bi-directional QoS topics: (1) Downstream (sink-to-sensor) QoS: Reliable data query command forwarding to particular sensor(s). In other words, we do not want to lose the query command packets; (2) Upstream (sensor-to-sink) QoS: transmission of sensed data with priority control. The more interested data that can help in novelty detection should be transmitted on an optimal path with higher reliability. We propose the use of Differentiated Data Collection. Due to the large-scale nature and resource constraints of typical wireless sensor networks, such as limited energy, small memory (typically RAM < 4K bytes) and short communication range, the above problems become even more challenging. Besides QoS support issue, we will also describe our low-energy Sensing Data Transmission network Architecture. Our research results show the scalability and energy-efficiency of our proposed WSN QoS schemes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Czejdo, Bogdan; Bhattacharya, Sambit; Ferragut, Erik M
2012-01-01
This paper describes the syntax and semantics of multi-level state diagrams to support probabilistic behavior of cooperating robots. The techniques are presented to analyze these diagrams by querying combined robots behaviors. It is shown how to use state abstraction and transition abstraction to create, verify and process large probabilistic state diagrams.
2008-03-01
Fortunately, built into Excel is the capability to use ActiveX Data Objects (ADO), a software feature which uses VBA to interface with external...part of Excel’s ActiveX Direct Objects (ADO) functionality, Excel can execute SQL queries in Access with VBA. An SQL query statement can be written
LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions.
Chen, Jinbo; Scholz, Uwe; Zhou, Ruonan; Lange, Matthias
2018-03-01
In order to access and filter content of life-science databases, full text search is a widely applied query interface. But its high flexibility and intuitiveness is paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspelling and support expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries by their query profile like research field, geographic location, co-authorship, affiliation etc. require user's registration and its public accessibility that contradict privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstruct possible linguistic contexts of a given keyword query. The context is referred from the text records that are stored in the databases that are going to be queried or extracted for a general purpose query suggestion from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the further computation of customized distributed word vectors. The latter are used to suggest alternative keyword queries. An evaluated of the query suggestion quality was done for plant science use cases. Locally present experts enable a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function which has been performed using ontology term similarities. LAILAPS-QSM mean information content similarity for 15 representative queries is 0.70, whereas 34% have a score above 0.80. In comparison, the information content similarity for human expert made query suggestions is 0.90. The software is either available as tool set to build and train dedicated query suggestion services or as already trained general purpose RESTful web service. The service uses open interfaces to be seamless embeddable into database frontends. The JAVA implementation uses highly optimized data structures and streamlined code to provide fast and scalable response for web service calls. The source code of LAILAPS-QSM is available under GNU General Public License version 2 in Bitbucket GIT repository: https://bitbucket.org/ipk_bit_team/bioescorte-suggestion.
Ji, Yanqing; Ying, Hao; Tran, John; Dews, Peter; Massanari, R Michael
2016-07-19
Finding highly relevant articles from biomedical databases is challenging not only because it is often difficult to accurately express a user's underlying intention through keywords but also because a keyword-based query normally returns a long list of hits with many citations being unwanted by the user. This paper proposes a novel biomedical literature search system, called BiomedSearch, which supports complex queries and relevance feedback. The system employed association mining techniques to build a k-profile representing a user's relevance feedback. More specifically, we developed a weighted interest measure and an association mining algorithm to find the strength of association between a query and each concept in the article(s) selected by the user as feedback. The top concepts were utilized to form a k-profile used for the next-round search. BiomedSearch relies on Unified Medical Language System (UMLS) knowledge sources to map text files to standard biomedical concepts. It was designed to support queries with any levels of complexity. A prototype of BiomedSearch software was made and it was preliminarily evaluated using the Genomics data from TREC (Text Retrieval Conference) 2006 Genomics Track. Initial experiment results indicated that BiomedSearch increased the mean average precision (MAP) for a set of queries. With UMLS and association mining techniques, BiomedSearch can effectively utilize users' relevance feedback to improve the performance of biomedical literature search.
A high-performance spatial database based approach for pathology imaging algorithm evaluation
Wang, Fusheng; Kong, Jun; Gao, Jingjing; Cooper, Lee A.D.; Kurc, Tahsin; Zhou, Zhengwen; Adler, David; Vergara-Niedermayr, Cristobal; Katigbak, Bryan; Brat, Daniel J.; Saltz, Joel H.
2013-01-01
Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high performance computation capacity via a parallel data management infrastructure, parallel data loading and spatial indexing optimizations in this infrastructure. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared nothing parallel database architecture, which distributes data homogenously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput. Results: Our work proposes a high performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download. Conclusions: Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data. Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provide a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation. PMID:23599905
Office of university affairs management information system: Users guide and documentation
NASA Technical Reports Server (NTRS)
Distin, J.; Goodwin, D.; Greene, W. A.
1977-01-01
Data on the NASA-University relationship are reported that encompass research in over 600 schools through several thousand grants and contracts. This user-driven system is capable of producing a variety of cyclical and query-type reports describing the total NASA-University profile. The capabilities, designed as part of the system, require a minimum of user maintenance in order to ensure system efficiency and data validity to meet the recurrent Statutory and Executive Branch information requirements as well as ad hoc inquiries from NASA general management, Congress, other Federal agencies, private sector organizations, universities and individuals. The data base contains information on each university, the individual projects and the financial details, current and historic, on all contracts and grants. Complete details are given on the system from its unique design features to the actual steps required for daily operation.
Database Reports Over the Internet
NASA Technical Reports Server (NTRS)
Smith, Dean Lance
2002-01-01
Most of the summer was spent developing software that would permit existing test report forms to be printed over the web on a printer that is supported by Adobe Acrobat Reader. The data is stored in a DBMS (Data Base Management System). The client asks for the information from the database using an HTML (Hyper Text Markup Language) form in a web browser. JavaScript is used with the forms to assist the user and verify the integrity of the entered data. Queries to a database are made in SQL (Sequential Query Language), a widely supported standard for making queries to databases. Java servlets, programs written in the Java programming language running under the control of network server software, interrogate the database and complete a PDF form template kept in a file. The completed report is sent to the browser requesting the report. Some errors are sent to the browser in an HTML web page, others are reported to the server. Access to the databases was restricted since the data are being transported to new DBMS software that will run on new hardware. However, the SQL queries were made to Microsoft Access, a DBMS that is available on most PCs (Personal Computers). Access does support the SQL commands that were used, and a database was created with Access that contained typical data for the report forms. Some of the problems and features are discussed below.
An Exponentiation Method for XML Element Retrieval
2014-01-01
XML document is now widely used for modelling and storing structured documents. The structure is very rich and carries important information about contents and their relationships, for example, e-Commerce. XML data-centric collections require query terms allowing users to specify constraints on the document structure; mapping structure queries and assigning the weight are significant for the set of possibly relevant documents with respect to structural conditions. In this paper, we present an extension to the MEXIR search system that supports the combination of structural and content queries in the form of content-and-structure queries, which we call the Exponentiation function. It has been shown the structural information improve the effectiveness of the search system up to 52.60% over the baseline BM25 at MAP. PMID:24696643
Representing and querying now-relative relational medical data.
Anselma, Luca; Piovesan, Luca; Stantic, Bela; Terenziani, Paolo
2018-03-01
Temporal information plays a crucial role in medicine. Patients' clinical records are intrinsically temporal. Thus, in Medical Informatics there is an increasing need to store, support and query temporal data (particularly in relational databases), in order, for instance, to supplement decision-support systems. In this paper, we show that current approaches to relational data have remarkable limitations in the treatment of "now-relative" data (i.e., data holding true at the current time). This can severely compromise their applicability in general, and specifically in the medical context, where "now-relative" data are essential to assess the current status of the patients. We propose a theoretically grounded and application-independent relational approach to cope with now-relative data (which can be paired, e.g., with different decision support systems) overcoming such limitations. We propose a new temporal relational representation, which is the first relational model coping with the temporal indeterminacy intrinsic in now-relative data. We also propose new temporal algebraic operators to query them, supporting the distinction between possible and necessary time, and Allen's temporal relations between data. We exemplify the impact of our approach, and study the theoretical and computational properties of the new representation and algebra. Copyright © 2018 Elsevier B.V. All rights reserved.
Machine Translation-Supported Cross-Language Information Retrieval for a Consumer Health Resource
Rosemblat, Graciela; Gemoets, Darren; Browne, Allen C.; Tse, Tony
2003-01-01
The U.S. National Institutes of Health, through its National Library of Medicine, developed ClinicalTrials.gov to provide the public with easy access to information on clinical trials on a wide range of conditions or diseases. Only English language information retrieval is currently supported. Given the growing number of Spanish speakers in the U.S. and their increasing use of the Web, we anticipate a significant increase in Spanish-speaking users. This study compares the effectiveness of two common cross-language information retrieval methods using machine translation, query translation versus document translation, using a subset of genuine user queries from ClinicalTrials.gov. Preliminary results conducted with the ClinicalTrials.gov search engine show that in our environment, query translation is statistically significantly better than document translation. We discuss possible reasons for this result and we conclude with suggestions for future work. PMID:14728236
Supporting diagnosis and treatment in medical care based on Big Data processing.
Lupşe, Oana-Sorina; Crişan-Vida, Mihaela; Stoicu-Tivadar, Lăcrămioara; Bernard, Elena
2014-01-01
With information and data in all domains growing every day, it is difficult to manage and extract useful knowledge for specific situations. This paper presents an integrated system architecture to support the activity in the Ob-Gin departments with further developments in using new technology to manage Big Data processing - using Google BigQuery - in the medical domain. The data collected and processed with Google BigQuery results from different sources: two Obstetrics & Gynaecology Departments, the TreatSuggest application - an application for suggesting treatments, and a home foetal surveillance system. Data is uploaded in Google BigQuery from Bega Hospital Timişoara, Romania. The analysed data is useful for the medical staff, researchers and statisticians from public health domain. The current work describes the technological architecture and its processing possibilities that in the future will be proved based on quality criteria to lead to a better decision process in diagnosis and public health.
Reactome graph database: Efficient access to complex pathway data
Korninger, Florian; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D’Eustachio, Peter
2018-01-01
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types. PMID:29377902
Labeling RDF Graphs for Linear Time and Space Querying
NASA Astrophysics Data System (ADS)
Furche, Tim; Weinzierl, Antonius; Bry, François
Indices and data structures for web querying have mostly considered tree shaped data, reflecting the view of XML documents as tree-shaped. However, for RDF (and when querying ID/IDREF constraints in XML) data is indisputably graph-shaped. In this chapter, we first study existing indexing and labeling schemes for RDF and other graph datawith focus on support for efficient adjacency and reachability queries. For XML, labeling schemes are an important part of the widespread adoption of XML, in particular for mapping XML to existing (relational) database technology. However, the existing indexing and labeling schemes for RDF (and graph data in general) sacrifice one of the most attractive properties of XML labeling schemes, the constant time (and per-node space) test for adjacency (child) and reachability (descendant). In the second part, we introduce the first labeling scheme for RDF data that retains this property and thus achieves linear time and space processing of acyclic RDF queries on a significantly larger class of graphs than previous approaches (which are mostly limited to tree-shaped data). Finally, we show how this labeling scheme can be applied to (acyclic) SPARQL queries to obtain an evaluation algorithm with time and space complexity linear in the number of resources in the queried RDF graph.
An approach for heterogeneous and loosely coupled geospatial data distributed computing
NASA Astrophysics Data System (ADS)
Chen, Bin; Huang, Fengru; Fang, Yu; Huang, Zhou; Lin, Hui
2010-07-01
Most GIS (Geographic Information System) applications tend to have heterogeneous and autonomous geospatial information resources, and the availability of these local resources is unpredictable and dynamic under a distributed computing environment. In order to make use of these local resources together to solve larger geospatial information processing problems that are related to an overall situation, in this paper, with the support of peer-to-peer computing technologies, we propose a geospatial data distributed computing mechanism that involves loosely coupled geospatial resource directories and a term named as Equivalent Distributed Program of global geospatial queries to solve geospatial distributed computing problems under heterogeneous GIS environments. First, a geospatial query process schema for distributed computing as well as a method for equivalent transformation from a global geospatial query to distributed local queries at SQL (Structured Query Language) level to solve the coordinating problem among heterogeneous resources are presented. Second, peer-to-peer technologies are used to maintain a loosely coupled network environment that consists of autonomous geospatial information resources, thus to achieve decentralized and consistent synchronization among global geospatial resource directories, and to carry out distributed transaction management of local queries. Finally, based on the developed prototype system, example applications of simple and complex geospatial data distributed queries are presented to illustrate the procedure of global geospatial information processing.
Reactome graph database: Efficient access to complex pathway data.
Fabregat, Antonio; Korninger, Florian; Viteri, Guilherme; Sidiropoulos, Konstantinos; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning
2018-01-01
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
A journey to Semantic Web query federation in the life sciences.
Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian
2009-10-01
As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario. We have identified both the strengths and weaknesses of these technologies. While Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URI's), the proliferation of semantically-equivalent URI's hinders large scale data integration. Our work helps direct research and tool development, which will be of benefit to this community.
A journey to Semantic Web query federation in the life sciences
Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian
2009-01-01
Background As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. Methods and results We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. Conclusion We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario. We have identified both the strengths and weaknesses of these technologies. While Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URI's), the proliferation of semantically-equivalent URI's hinders large scale data integration. Our work helps direct research and tool development, which will be of benefit to this community. PMID:19796394
A Reference Implementation of the OGC CSW EO Standard for the ESA HMA-T project
NASA Astrophysics Data System (ADS)
Bigagli, Lorenzo; Boldrini, Enrico; Papeschi, Fabrizio; Vitale, Fabrizio
2010-05-01
This work was developed in the context of the ESA Heterogeneous Missions Accessibility (HMA) project, whose main objective is to involve the stakeholders, namely National space agencies, satellite or mission owners and operators, in an harmonization and standardization process of their ground segment services and related interfaces. Among HMA objectives was the specification, conformance testing, and experimentation of two Extension Packages (EPs) of the ebRIM Application Profile (AP) of the OGC Catalog Service for the Web (CSW) specification: the Earth Observation Products (EO) EP (OGC 06-131) and the Cataloguing of ISO Metadata (CIM) EP (OGC 07-038). Our contributions have included the development and deployment of Reference Implementations (RIs) for both the above specifications, and their integration with the ESA Service Support Environment (SSE). The RIs are based on the GI-cat framework, an implementation of a distributed catalog service, able to query disparate Earth and Space Science data sources (e.g. OGC Web Services, Unidata THREDDS) and to expose several standard interfaces for data discovery (e.g. OGC CSW ISO AP). Following our initial planning, the GI-cat framework has been extended in order to expose the CSW.ebRIM-CIM and CSW.ebRIM-EO interfaces, and to distribute queries to CSW.ebRIM-CIM and CSW.ebRIM-EO data sources. We expected that a mapping strategy would suffice for accommodating CIM, but this proved to be unpractical during implementation. Hence, a model extension strategy was eventually implemented for both the CIM and EO EPs, and the GI-cat federal model was enhanced in order to support the underlying ebRIM AP. This work has provided us with new insights into the different data models for geospatial data, and the technologies for their implementation. The extension is used by suitable CIM and EO profilers (front-end mediator components) and accessors (back-end mediator components), that relate ISO 19115 concepts to EO and CIM ones. Moreover, a mapping to GI-cat federal model was developed for each EP (quite limited for EO; complete for CIM), in order to enable the discovery of resources through any of GI-cat profilers. The query manager was also improved. GI-cat-EO and -CIM installation packages were made available for distribution, and two RI instances were deployed on the Amazon EC2 facility (plus an ad-hoc instance returning incorrect control data). Integration activities of the EO RI with the ESA SSE Portal for Earth Observation Products were also successfully carried on. During our work, we have contributed feedback and comments to the CIM and EO EP specification working groups. Our contributions resulted in version 0.2.5 of the EO EP, recently approved as an OGC standard, and were useful to consolidate version 0.1.11 of the CIM EP (still being developed).
On knowledge transfer management as a learning process for ad hoc teams
NASA Astrophysics Data System (ADS)
Iliescu, D.
2017-08-01
Knowledge management represents an emerging domain becoming more and more important. Concepts like knowledge codification and personalisation, knowledge life-cycle, social and technological dimensions, knowledge transfer and learning management are integral parts. Focus goes here in the process of knowledge transfer for the case of ad hoc teams. The social dimension of knowledge transfer plays an important role. No single individual actors involved in the process, but a collective one, representing the organisation. It is critically important for knowledge to be managed from the life-cycle point of view. A complex communication network needs to be in place to supports the process of knowledge transfer. Two particular concepts, the bridge tie and transactive memory, would eventually enhance the communication. The paper focuses on an informational communication platform supporting the collaborative work on knowledge transfer. The platform facilitates the creation of a topic language to be used in knowledge modelling, storage and reuse, by the ad hoc teams.
Ziccardi, Linda M; Edgington, Aaron; Hentz, Karyn; Kulacki, Konrad J; Kane Driscoll, Susan
2016-07-01
A state-of-the-science review was conducted to examine the potential for microplastics to sorb hydrophobic organic chemicals (HOCs) from the marine environment, for aquatic organisms to take up these HOCs from the microplastics, and for this exposure to result in adverse effects to ecological and human health. Despite concentrations of HOCs associated with microplastics that can be orders of magnitude greater than surrounding seawater, the relative importance of microplastics as a route of exposure is difficult to quantify because aquatic organisms are typically exposed to HOCs from various compartments, including water, sediment, and food. Results of laboratory experiments and modeling studies indicate that HOCs can partition from microplastics to organisms or from organisms to microplastics, depending on experimental conditions. Very little information is available to evaluate ecological or human health effects from this exposure. Most of the available studies measured biomarkers that are more indicative of exposure than effects, and no studies showed effects to ecologically relevant endpoints. Therefore, evidence is weak to support the occurrence of ecologically significant adverse effects on aquatic life as a result of exposure to HOCs sorbed to microplastics or to wildlife populations and humans from secondary exposure via the food chain. More data are needed to fully understand the relative importance of exposure to HOCs from microplastics compared with other exposure pathways. Environ Toxicol Chem 2016;35:1667-1676. © 2016 SETAC. © 2016 SETAC.
Semantic-based surveillance video retrieval.
Hu, Weiming; Xie, Dan; Fu, Zhouyu; Zeng, Wenrong; Maybank, Steve
2007-04-01
Visual surveillance produces large amounts of video data. Effective indexing and retrieval from surveillance video databases are very important. Although there are many ways to represent the content of video clips in current video retrieval algorithms, there still exists a semantic gap between users and retrieval systems. Visual surveillance systems supply a platform for investigating semantic-based video retrieval. In this paper, a semantic-based video retrieval framework for visual surveillance is proposed. A cluster-based tracking algorithm is developed to acquire motion trajectories. The trajectories are then clustered hierarchically using the spatial and temporal information, to learn activity models. A hierarchical structure of semantic indexing and retrieval of object activities, where each individual activity automatically inherits all the semantic descriptions of the activity model to which it belongs, is proposed for accessing video clips and individual objects at the semantic level. The proposed retrieval framework supports various queries including queries by keywords, multiple object queries, and queries by sketch. For multiple object queries, succession and simultaneity restrictions, together with depth and breadth first orders, are considered. For sketch-based queries, a method for matching trajectories drawn by users to spatial trajectories is proposed. The effectiveness and efficiency of our framework are tested in a crowded traffic scene.
Shuttle-Data-Tape XML Translator
NASA Technical Reports Server (NTRS)
Barry, Matthew R.; Osborne, Richard N.
2005-01-01
JSDTImport is a computer program for translating native Shuttle Data Tape (SDT) files from American Standard Code for Information Interchange (ASCII) format into databases in other formats. JSDTImport solves the problem of organizing the SDT content, affording flexibility to enable users to choose how to store the information in a database to better support client and server applications. JSDTImport can be dynamically configured by use of a simple Extensible Markup Language (XML) file. JSDTImport uses this XML file to define how each record and field will be parsed, its layout and definition, and how the resulting database will be structured. JSDTImport also includes a client application programming interface (API) layer that provides abstraction for the data-querying process. The API enables a user to specify the search criteria to apply in gathering all the data relevant to a query. The API can be used to organize the SDT content and translate into a native XML database. The XML format is structured into efficient sections, enabling excellent query performance by use of the XPath query language. Optionally, the content can be translated into a Structured Query Language (SQL) database for fast, reliable SQL queries on standard database server computers.
A data analysis expert system for large established distributed databases
NASA Technical Reports Server (NTRS)
Gnacek, Anne-Marie; An, Y. Kim; Ryan, J. Patrick
1987-01-01
A design for a natural language database interface system, called the Deductively Augmented NASA Management Decision support System (DANMDS), is presented. The DANMDS system components have been chosen on the basis of the following considerations: maximal employment of the existing NASA IBM-PC computers and supporting software; local structuring and storing of external data via the entity-relationship model; a natural easy-to-use error-free database query language; user ability to alter query language vocabulary and data analysis heuristic; and significant artificial intelligence data analysis heuristic techniques that allow the system to become progressively and automatically more useful.
Guided Iterative Substructure Search (GI-SSS) - A New Trick for an Old Dog.
Weskamp, Nils
2016-07-01
Substructure search (SSS) is a fundamental technique supported by various chemical information systems. Many users apply it in an iterative manner: they modify their queries to shape the composition of the retrieved hit sets according to their needs. We propose and evaluate two heuristic extensions of SSS aimed at simplifying these iterative query modifications by collecting additional information during query processing and visualizing this information in an intuitive way. This gives the user a convenient feedback on how certain changes to the query would affect the retrieved hit set and reduces the number of trial-and-error cycles needed to generate an optimal search result. The proposed heuristics are simple, yet surprisingly effective and can be easily added to existing SSS implementations. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
An efficient approach for video information retrieval
NASA Astrophysics Data System (ADS)
Dong, Daoguo; Xue, Xiangyang
2005-01-01
Today, more and more video information can be accessed through internet, satellite, etc.. Retrieving specific video information from large-scale video database has become an important and challenging research topic in the area of multimedia information retrieval. In this paper, we introduce a new and efficient index structure OVA-File, which is a variant of VA-File. In OVA-File, the approximations close to each other in data space are stored in close positions of the approximation file. The benefit is that only a part of approximations close to the query vector need to be visited to get the query result. Both shot query algorithm and video clip algorithm are proposed to support video information retrieval efficiently. The experimental results showed that the queries based on OVA-File were much faster than that based on VA-File with small loss of result quality.
Querying archetype-based EHRs by search ontology-based XPath engineering.
Kropf, Stefan; Uciteli, Alexandr; Schierle, Katrin; Krücken, Peter; Denecke, Kerstin; Herre, Heinrich
2018-05-11
Legacy data and new structured data can be stored in a standardized format as XML-based EHRs on XML databases. Querying documents on these databases is crucial for answering research questions. Instead of using free text searches, that lead to false positive results, the precision can be increased by constraining the search to certain parts of documents. A search ontology-based specification of queries on XML documents defines search concepts and relates them to parts in the XML document structure. Such query specification method is practically introduced and evaluated by applying concrete research questions formulated in natural language on a data collection for information retrieval purposes. The search is performed by search ontology-based XPath engineering that reuses ontologies and XML-related W3C standards. The key result is that the specification of research questions can be supported by the usage of search ontology-based XPath engineering. A deeper recognition of entities and a semantic understanding of the content is necessary for a further improvement of precision and recall. Key limitation is that the application of the introduced process requires skills in ontology and software development. In future, the time consuming ontology development could be overcome by implementing a new clinical role: the clinical ontologist. The introduced Search Ontology XML extension connects Search Terms to certain parts in XML documents and enables an ontology-based definition of queries. Search ontology-based XPath engineering can support research question answering by the specification of complex XPath expressions without deep syntax knowledge about XPaths.
Fast Query-Optimized Kernel-Machine Classification
NASA Technical Reports Server (NTRS)
Mazzoni, Dominic; DeCoste, Dennis
2004-01-01
A recently developed algorithm performs kernel-machine classification via incremental approximate nearest support vectors. The algorithm implements support-vector machines (SVMs) at speeds 10 to 100 times those attainable by use of conventional SVM algorithms. The algorithm offers potential benefits for classification of images, recognition of speech, recognition of handwriting, and diverse other applications in which there are requirements to discern patterns in large sets of data. SVMs constitute a subset of kernel machines (KMs), which have become popular as models for machine learning and, more specifically, for automated classification of input data on the basis of labeled training data. While similar in many ways to k-nearest-neighbors (k-NN) models and artificial neural networks (ANNs), SVMs tend to be more accurate. Using representations that scale only linearly in the numbers of training examples, while exploring nonlinear (kernelized) feature spaces that are exponentially larger than the original input dimensionality, KMs elegantly and practically overcome the classic curse of dimensionality. However, the price that one must pay for the power of KMs is that query-time complexity scales linearly with the number of training examples, making KMs often orders of magnitude more computationally expensive than are ANNs, decision trees, and other popular machine learning alternatives. The present algorithm treats an SVM classifier as a special form of a k-NN. The algorithm is based partly on an empirical observation that one can often achieve the same classification as that of an exact KM by using only small fraction of the nearest support vectors (SVs) of a query. The exact KM output is a weighted sum over the kernel values between the query and the SVs. In this algorithm, the KM output is approximated with a k-NN classifier, the output of which is a weighted sum only over the kernel values involving k selected SVs. Before query time, there are gathered statistics about how misleading the output of the k-NN model can be, relative to the outputs of the exact KM for a representative set of examples, for each possible k from 1 to the total number of SVs. From these statistics, there are derived upper and lower thresholds for each step k. These thresholds identify output levels for which the particular variant of the k-NN model already leans so strongly positively or negatively that a reversal in sign is unlikely, given the weaker SV neighbors still remaining. At query time, the partial output of each query is incrementally updated, stopping as soon as it exceeds the predetermined statistical thresholds of the current step. For an easy query, stopping can occur as early as step k = 1. For more difficult queries, stopping might not occur until nearly all SVs are touched. A key empirical observation is that this approach can tolerate very approximate nearest-neighbor orderings. In experiments, SVs and queries were projected to a subspace comprising the top few principal- component dimensions and neighbor orderings were computed in that subspace. This approach ensured that the overhead of the nearest-neighbor computations was insignificant, relative to that of the exact KM computation.
A distributed query execution engine of big attributed graphs.
Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif
2016-01-01
A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the graph vertices and edges need to be associated with descriptive attributes. Such type of graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations including reachability, pattern matching and shortest path where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation in addition to the performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine of G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes while the graph data are maintained in a relational store which is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries which are pushed to the underlying relational stores while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and the scalability of DG-SPARQL on querying massive attributed graph datasets in addition to its ability to outperform the performance of Apache Giraph, a popular distributed graph processing system, by orders of magnitudes.
SLIDE - a web-based tool for interactive visualization of large-scale -omics data.
Ghosh, Soumita; Datta, Abhik; Tan, Kaisen; Choi, Hyungwon
2018-06-28
Data visualization is often regarded as a post hoc step for verifying statistically significant results in the analysis of high-throughput data sets. This common practice leaves a large amount of raw data behind, from which more information can be extracted. However, existing solutions do not provide capabilities to explore large-scale raw datasets using biologically sensible queries, nor do they allow user interaction based real-time customization of graphics. To address these drawbacks, we have designed an open-source, web-based tool called Systems-Level Interactive Data Exploration, or SLIDE to visualize large-scale -omics data interactively. SLIDE's interface makes it easier for scientists to explore quantitative expression data in multiple resolutions in a single screen. SLIDE is publicly available under BSD license both as an online version as well as a stand-alone version at https://github.com/soumitag/SLIDE. Supplementary Information are available at Bioinformatics online.
A web-based platform for virtual screening.
Watson, Paul; Verdonk, Marcel; Hartshorn, Michael J
2003-09-01
A fully integrated, web-based, virtual screening platform has been developed to allow rapid virtual screening of large numbers of compounds. ORACLE is used to store information at all stages of the process. The system includes a large database of historical compounds from high throughput screenings (HTS) chemical suppliers, ATLAS, containing over 3.1 million unique compounds with their associated physiochemical properties (ClogP, MW, etc.). The database can be screened using a web-based interface to produce compound subsets for virtual screening or virtual library (VL) enumeration. In order to carry out the latter task within ORACLE a reaction data cartridge has been developed. Virtual libraries can be enumerated rapidly using the web-based interface to the cartridge. The compound subsets can be seamlessly submitted for virtual screening experiments, and the results can be viewed via another web-based interface allowing ad hoc querying of the virtual screening data stored in ORACLE.
A user-friendly tool for medical-related patent retrieval.
Pasche, Emilie; Gobeill, Julien; Teodoro, Douglas; Gaudinat, Arnaud; Vishnyakova, Dina; Lovis, Christian; Ruch, Patrick
2012-01-01
Health-related information retrieval is complicated by the variety of nomenclatures available to name entities, since different communities of users will use different ways to name a same entity. We present in this report the development and evaluation of a user-friendly interactive Web application aiming at facilitating health-related patent search. Our tool, called TWINC, relies on a search engine tuned during several patent retrieval competitions, enhanced with intelligent interaction modules, such as chemical query, normalization and expansion. While the functionality of related article search showed promising performances, the ad hoc search results in fairly contrasted results. Nonetheless, TWINC performed well during the PatOlympics competition and was appreciated by intellectual property experts. This result should be balanced by the limited evaluation sample. We can also assume that it can be customized to be applied in corporate search environments to process domain and company-specific vocabularies, including non-English literature and patents reports.
Quantum load balancing in ad hoc networks
NASA Astrophysics Data System (ADS)
Hasanpour, M.; Shariat, S.; Barnaghi, P.; Hoseinitabatabaei, S. A.; Vahid, S.; Tafazolli, R.
2017-06-01
This paper presents a novel approach in targeting load balancing in ad hoc networks utilizing the properties of quantum game theory. This approach benefits from the instantaneous and information-less capability of entangled particles to synchronize the load balancing strategies in ad hoc networks. The quantum load balancing (QLB) algorithm proposed by this work is implemented on top of OLSR as the baseline routing protocol; its performance is analyzed against the baseline OLSR, and considerable gain is reported regarding some of the main QoS metrics such as delay and jitter. Furthermore, it is shown that QLB algorithm supports a solid stability gain in terms of throughput which stands a proof of concept for the load balancing properties of the proposed theory.
Jupiter, Daniel; Chen, Hailin; VanBuren, Vincent
2009-01-01
Background Although expression microarrays have become a standard tool used by biologists, analysis of data produced by microarray experiments may still present challenges. Comparison of data from different platforms, organisms, and labs may involve complicated data processing, and inferring relationships between genes remains difficult. Results STARNET 2 is a new web-based tool that allows post hoc visual analysis of correlations that are derived from expression microarray data. STARNET 2 facilitates user discovery of putative gene regulatory networks in a variety of species (human, rat, mouse, chicken, zebrafish, Drosophila, C. elegans, S. cerevisiae, Arabidopsis and rice) by graphing networks of genes that are closely co-expressed across a large heterogeneous set of preselected microarray experiments. For each of the represented organisms, raw microarray data were retrieved from NCBI's Gene Expression Omnibus for a selected Affymetrix platform. All pairwise Pearson correlation coefficients were computed for expression profiles measured on each platform, respectively. These precompiled results were stored in a MySQL database, and supplemented by additional data retrieved from NCBI. A web-based tool allows user-specified queries of the database, centered at a gene of interest. The result of a query includes graphs of correlation networks, graphs of known interactions involving genes and gene products that are present in the correlation networks, and initial statistical analyses. Two analyses may be performed in parallel to compare networks, which is facilitated by the new HEATSEEKER module. Conclusion STARNET 2 is a useful tool for developing new hypotheses about regulatory relationships between genes and gene products, and has coverage for 10 species. Interpretation of the correlation networks is supported with a database of previously documented interactions, a test for enrichment of Gene Ontology terms, and heat maps of correlation distances that may be used to compare two networks. The list of genes in a STARNET network may be useful in developing a list of candidate genes to use for the inference of causal networks. The tool is freely available at , and does not require user registration. PMID:19828039
Regular paths in SparQL: querying the NCI Thesaurus.
Detwiler, Landon T; Suciu, Dan; Brinkley, James F
2008-11-06
OWL, the Web Ontology Language, provides syntax and semantics for representing knowledge for the semantic web. Many of the constructs of OWL have a basis in the field of description logics. While the formal underpinnings of description logics have lead to a highly computable language, it has come at a cognitive cost. OWL ontologies are often unintuitive to readers lacking a strong logic background. In this work we describe GLEEN, a regular path expression library, which extends the RDF query language SparQL to support complex path expressions over OWL and other RDF-based ontologies. We illustrate the utility of GLEEN by showing how it can be used in a query-based approach to defining simpler, more intuitive views of OWL ontologies. In particular we show how relatively simple GLEEN-enhanced SparQL queries can create views of the OWL version of the NCI Thesaurus that match the views generated by the web-based NCI browser.
Secure Mobile Distributed File System (MDFS)
2011-03-01
dissemination of data. In a mobile ad - hoc network, there are two classes of devices: content generators and content consumers. One im- plementation of...use of infrastructure mode is necessary because current Android implemen- tations do not support Mobile Ad - Hoc network without modification of the...NUMBER (include area code ) Standard Form 298 (Rev. 8–98) Prescribed by ANSI Std. Z39.18 24–3–2011 Master’s Thesis 2009-03-01—2011-03-31 Secure Mobile
Optimization of Extended Relational Database Systems
1986-07-23
control functions are integrated into a single system in a homogeneoua way. As a first exam - ple, consider previous work in supporting various semantic...sizes are reduced and, wnk? quently, the number of materializations that will be needed is aba lower. For exam - pie, in the above query tuple...retrieve (EMP.name) where EMP hobbies instrument = ’ violin ’ When the various entries in the hobbies field are materialized, only those queries that
Entity Bases: Large-Scale Knowledgebases for Intelligence Data
2009-02-01
declaratively expressed as Datalog rules . The EntityBase supports two query scenarios: • Free-Form Querying: A human analyst or a client program can pose...integration, Prometheus follows the Inverse Rules algo- rithm (Duschka 1997) with additional optimizations (Thakkar et al. 2005). We use the mediator...Discovery and Data Mining (PAKDD), Sydney, Australia. Crammer , K., Dekel, O., Keshet, J., Shalev-Shwartz, S., and Singer, Y. (2006). Online passive
Secure Database Management Study.
1978-12-01
covers cases Involving indus- trial economics (e.g., Industrial spies) and commercial finances (e.g., fraud). Priv¢j--Protection of date about people...California, Berke - lay [STONM76aI. * The approach to protection taken in INGRE (STOM74| has attracted a lot of Interest* Queries, in a high level query...Material Command Support Activity (NMCSA), and another DoD agency, Cullinane Corporation developed a prototype version of the IDS database system on a
Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun
2017-01-01
Linked Data (LD) aims to achieve interconnected data by representing entities using Unified Resource Identifiers (URIs), and sharing information using Resource Description Frameworks (RDFs) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. PMID:27733503
A weight based genetic algorithm for selecting views
NASA Astrophysics Data System (ADS)
Talebian, Seyed H.; Kareem, Sameem A.
2013-03-01
Data warehouse is a technology designed for supporting decision making. Data warehouse is made by extracting large amount of data from different operational systems; transforming it to a consistent form and loading it to the central repository. The type of queries in data warehouse environment differs from those in operational systems. In contrast to operational systems, the analytical queries that are issued in data warehouses involve summarization of large volume of data and therefore in normal circumstance take a long time to be answered. On the other hand, the result of these queries must be answered in a short time to enable managers to make decisions as short time as possible. As a result, an essential need in this environment is in improving the performances of queries. One of the most popular methods to do this task is utilizing pre-computed result of queries. In this method, whenever a new query is submitted by the user instead of calculating the query on the fly through a large underlying database, the pre-computed result or views are used to answer the queries. Although, the ideal option would be pre-computing and saving all possible views, but, in practice due to disk space constraint and overhead due to view updates it is not considered as a feasible choice. Therefore, we need to select a subset of possible views to save on disk. The problem of selecting the right subset of views is considered as an important challenge in data warehousing. In this paper we suggest a Weighted Based Genetic Algorithm (WBGA) for solving the view selection problem with two objectives.
Enabling complex queries to drug information sources through functional composition.
Peters, Lee; Mortensen, Jonathan; Nguyen, Thang; Bodenreider, Olivier
2013-01-01
Our objective was to enable an end-user to create complex queries to drug information sources through functional composition, by creating sequences of functions from application program interfaces (API) to drug terminologies. The development of a functional composition model seeks to link functions from two distinct APIs. An ontology was developed using Protégé to model the functions of the RxNorm and NDF-RT APIs by describing the semantics of their input and output. A set of rules were developed to define the interoperable conditions for functional composition. The operational definition of interoperability between function pairs is established by executing the rules on the ontology. We illustrate that the functional composition model supports common use cases, including checking interactions for RxNorm drugs and deploying allergy lists defined in reference to drug properties in NDF-RT. This model supports the RxMix application (http://mor.nlm.nih.gov/RxMix/), an application we developed for enabling complex queries to the RxNorm and NDF-RT APIs.
Development of a web-based video management and application processing system
NASA Astrophysics Data System (ADS)
Chan, Shermann S.; Wu, Yi; Li, Qing; Zhuang, Yueting
2001-07-01
How to facilitate efficient video manipulation and access in a web-based environment is becoming a popular trend for video applications. In this paper, we present a web-oriented video management and application processing system, based on our previous work on multimedia database and content-based retrieval. In particular, we extend the VideoMAP architecture with specific web-oriented mechanisms, which include: (1) Concurrency control facilities for the editing of video data among different types of users, such as Video Administrator, Video Producer, Video Editor, and Video Query Client; different users are assigned various priority levels for different operations on the database. (2) Versatile video retrieval mechanism which employs a hybrid approach by integrating a query-based (database) mechanism with content- based retrieval (CBR) functions; its specific language (CAROL/ST with CBR) supports spatio-temporal semantics of video objects, and also offers an improved mechanism to describe visual content of videos by content-based analysis method. (3) Query profiling database which records the `histories' of various clients' query activities; such profiles can be used to provide the default query template when a similar query is encountered by the same kind of users. An experimental prototype system is being developed based on the existing VideoMAP prototype system, using Java and VC++ on the PC platform.
Design of a lattice-based faceted classification system
NASA Technical Reports Server (NTRS)
Eichmann, David A.; Atkins, John
1992-01-01
We describe a software reuse architecture supporting component retrieval by facet classes. The facets are organized into a lattice of facet sets and facet n-tuples. The query mechanism supports precise retrieval and flexible browsing.
On the routing protocol influence on the resilience of wireless sensor networks to jamming attacks.
Del-Valle-Soto, Carolina; Mex-Perera, Carlos; Monroy, Raul; Nolazco-Flores, Juan Arturo
2015-03-27
In this work, we compare a recently proposed routing protocol, the multi-parent hierarchical (MPH) protocol, with two well-known protocols, the ad hoc on-demand distance vector (AODV) and dynamic source routing (DSR). For this purpose, we have developed a simulator, which faithfully reifies the workings of a given protocol, considering a fixed, reconfigurable ad hoc network given by the number and location of participants, and general network conditions. We consider a scenario that can be found in a large number of wireless sensor network applications, a single sink node that collects all of the information generated by the sensors. The metrics used to compare the protocols were the number of packet retransmissions, carrier sense multiple access (CSMA) inner loop retries, the number of nodes answering the queries from the coordinator (sink) node and the energy consumption. We tested the network under ordinary (without attacks) conditions (and combinations thereof) and when it is subject to different types of jamming attacks (in particular, random and reactive jamming attacks), considering several positions for the jammer. Our results report that MPH has a greater ability to tolerate such attacks than DSR and AODV, since it minimizes and encapsulates the network segment under attack. The self-configuring capabilities of MPH derived from a combination of a proactive routes update, on a periodic-time basis, and a reactive behavior provide higher resilience while offering a better performance (overhead and energy consumption) than AODV and DSR, as shown in our simulation results.
A semantic proteomics dashboard (SemPoD) for data management in translational research.
Jayapandian, Catherine P; Zhao, Meng; Ewing, Rob M; Zhang, Guo-Qiang; Sahoo, Satya S
2012-01-01
One of the primary challenges in translational research data management is breaking down the barriers between the multiple data silos and the integration of 'omics data with clinical information to complete the cycle from the bench to the bedside. The role of contextual metadata, also called provenance information, is a key factor ineffective data integration, reproducibility of results, correct attribution of original source, and answering research queries involving "What", "Where", "When", "Which", "Who", "How", and "Why" (also known as the W7 model). But, at present there is limited or no effective approach to managing and leveraging provenance information for integrating data across studies or projects. Hence, there is an urgent need for a paradigm shift in creating a "provenance-aware" informatics platform to address this challenge. We introduce an ontology-driven, intuitive Semantic Proteomics Dashboard (SemPoD) that uses provenance together with domain information (semantic provenance) to enable researchers to query, compare, and correlate different types of data across multiple projects, and allow integration with legacy data to support their ongoing research. The SemPoD platform, currently in use at the Case Center for Proteomics and Bioinformatics (CPB), consists of three components: (a) Ontology-driven Visual Query Composer, (b) Result Explorer, and (c) Query Manager. Currently, SemPoD allows provenance-aware querying of 1153 mass-spectrometry experiments from 20 different projects. SemPod uses the systems molecular biology provenance ontology (SysPro) to support a dynamic query composition interface, which automatically updates the components of the query interface based on previous user selections and efficiently prunes the result set usinga "smart filtering" approach. The SysPro ontology re-uses terms from the PROV-ontology (PROV-O) being developed by the World Wide Web Consortium (W3C) provenance working group, the minimum information required for reporting a molecular interaction experiment (MIMIx), and the minimum information about a proteomics experiment (MIAPE) guidelines. The SemPoD was evaluated both in terms of user feedback and as scalability of the system. SemPoD is an intuitive and powerful provenance ontology-driven data access and query platform that uses the MIAPE and MIMIx metadata guideline to create an integrated view over large-scale systems molecular biology datasets. SemPoD leverages the SysPro ontology to create an intuitive dashboard for biologists to compose queries, explore the results, and use a query manager for storing queries for later use. SemPoD can be deployed over many existing database applications storing 'omics data, including, as illustrated here, the LabKey data-management system. The initial user feedback evaluating the usability and functionality of SemPoD has been very positive and it is being considered for wider deployment beyond the proteomics domain, and in other 'omics' centers.
muBLASTP: database-indexed protein sequence search on multicore CPUs.
Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-Chun
2016-11-04
The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they cannot deliver the same level of sensitivity as the query-indexed BLAST, i.e., NCBI BLAST, or they can only support nucleotide sequence search, e.g., MegaBLAST. Due to different challenges and characteristics between query indexing and database indexing, the existing techniques for query-indexed search cannot be used into database indexed search. muBLASTP, a novel database-indexed BLAST for protein sequence search, delivers identical hits returned to NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, the single-threaded muBLASTP achieves up to a 4.41-fold speedup for alignment stages, and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, the multithreaded muBLASTP achieves up to a 5.7-fold speedups for alignment stages, and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. With a newly designed index structure for protein database and associated optimizations in BLASTP algorithm, we re-factored BLASTP algorithm for modern multicore processors that achieves much higher throughput with acceptable memory footprint for the database index.
Huang, Chung-Chi; Lu, Zhiyong
2016-01-01
Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities such as 'CHEMICAL-1 compared to CHEMICAL-2' With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical-disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 respectively for the CC and CD task when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality. These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order covering diverse bio-entity relations. To assess the potential utility of our automated top-ranked patterns of a given relation in semantic search, we performed a pilot study on frequently sought semantic relations in PubMed and observed improved literature retrieval effectiveness based on post-hoc human relevance evaluation. Further investigation in larger tests and in real-world scenarios is warranted. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
Comprehensive Optimal Manpower and Personnel Analytic Simulation System (COMPASS)
2009-10-01
4 The EDB consists of 4 major components (some of which are re-usable): 1. Metadata Editor ( MDE ): Also considered a leaf node, the metadata...end-user queries via the QB. The EDB supports multiple instances of the MDE , although currently, only a single instance is recommended. 2 Query...the MSB is a central collection of web services, responsible for the authentication and authorization of users, maintenance of the EDB metadata
a 3d Information System for the Documentation of Archaeologica L Excavations
NASA Astrophysics Data System (ADS)
Ardissone, P.; Bornaz, L.; Degattis, G.; Domaine, R.
2013-07-01
Documentation of archaeological and cultural heritage sites is at the heart of the archaeological process and an important component in cultural heritage research, presentation and restorations. In 2012 the Superintendence of Cultural Heritage of Aosta Valley - IT (Soprintendenza per i Beni e le Attività Culturali della Region e Autonoma Valle d'Aosta) carried out a complex archaeological excavation in a composite archaeological context, situated an urban background: the Aosta city centre. This archaeological excavation has been characterized by the classical urban archaeological issues: little space, short time, complex stratigraphy. Moreover the investigations have come out several structures and walls that required safety and conservation measures. Ad hoc 3D solutions carried out a complete 3D survey of the area in 10 different time/situations of the Archaeological digs, chosen in collaborations with the archaeological staff. In this way a multi temporal 3D description of the site has been provided for the archaeological analysis and for the project of the restorations activities. The 3D surveys has been carried out integrating GPS, laser scanner technology and photogrammetry. In order to meet the needs of the site, and its complex logistics and to obtain products that guarantee the high quality and detail required for archaeological analysis, we have developed different procedures and methodologies: hdr imaging for 3D model with correct, consistent and uniform colours, noise filtering and people filtering, for the removal of interference between laser instrument and object of the survey, Advanced laser scanner triangulation, in order to consider both artificial and natural tie points, for a correct registration of a huge amount of scans. Single image orientation on 3D data, in order to integrate the laser data with data coming from digital photogrammetry (faster on the field than the laser scanner survey, than used in certain situations). The results of all these methodologies and procedures will be presented and described in the article. For the documentation of the archaeological excavations and for the management of the conservation activities (condition assessment, planning, and conservation work). Ad Hoc 3D solutions has costumized 2 special plug-ins of its own software platform Ad Hoc: Ad Hoc Archaeology and Ad Hoc Conservation. The software platform integrates a 3D database management system. All information (measurements, plotting, areas of interests…) are organized according to their correct 3D position. They can be queried using attributes, geometric characteristics or their spatial position. The Ad Hoc Archaeology plug-in allows archeologists to fill out UUSS sheets in an internal database, put them in the correct location within the 3D model of the site, define the mutual relations between the UUSS, divide the different archaeological phases. A simple interface will facilitate the construction of the stratigraphic chart (matrix), in a 3D environment as well (matrix 3D). The Ad Hoc Conservation plug-in permits conservators and restorers to create relationships between the different approaches and descriptions of the same parts of the monument, i.e.: between stratigraphyc units or historical phases and architectural components and/or decay pathologies. The 3D DBMS conservation module uses a codified terminology based on "ICOMOS illustrated glossary of stone deterioration" and other glossary. Specific tools permits restorers to compute correctly surfaces and volumes. In this way decay extension and intensity can be measured with high precision and with an high level of detail, for a correct time and costs estimation of each conservation step.
Visual exploration of big spatio-temporal urban data: a study of New York City taxi trips.
Ferreira, Nivan; Poco, Jorge; Vo, Huy T; Freire, Juliana; Silva, Cláudio T
2013-12-01
As increasing volumes of urban data are captured and become available, new opportunities arise for data-driven analysis that can lead to improvements in the lives of citizens through evidence-based decision making and policies. In this paper, we focus on a particularly important urban data set: taxi trips. Taxis are valuable sensors and information associated with taxi trips can provide unprecedented insight into many different aspects of city life, from economic activity and human behavior to mobility patterns. But analyzing these data presents many challenges. The data are complex, containing geographical and temporal components in addition to multiple variables associated with each trip. Consequently, it is hard to specify exploratory queries and to perform comparative analyses (e.g., compare different regions over time). This problem is compounded due to the size of the data-there are on average 500,000 taxi trips each day in NYC. We propose a new model that allows users to visually query taxi trips. Besides standard analytics queries, the model supports origin-destination queries that enable the study of mobility across the city. We show that this model is able to express a wide range of spatio-temporal queries, and it is also flexible in that not only can queries be composed but also different aggregations and visual representations can be applied, allowing users to explore and compare results. We have built a scalable system that implements this model which supports interactive response times; makes use of an adaptive level-of-detail rendering strategy to generate clutter-free visualization for large results; and shows hidden details to the users in a summary through the use of overlay heat maps. We present a series of case studies motivated by traffic engineers and economists that show how our model and system enable domain experts to perform tasks that were previously unattainable for them.
ERIC Educational Resources Information Center
Zhang, Jianwei; Chen, Qi; Sun, Yanquing; Reid, David J.
2004-01-01
Learning support studies involving simulation-based scientific discovery learning have tended to adopt an ad hoc strategies-oriented approach in which the support strategies are typically pre-specified according to learners' difficulties in particular activities. This article proposes a more integrated approach, a triple scheme for learning…
Akhtar, Tahira; Hipkiss, Vicki; Stoves, John
2014-09-01
A large number of patients with advanced chronic kidney disease in Bradford, UK are of South Asian origin. Effective delivery of care for these patients demands a consistently high standard of communication between patients, their families and renal unit staff. Communication may be problematic for a number of reasons, including language barriers and an incomplete appreciation of important cultural or religious beliefs. In 2010, we received charitable funding for an Ethnic Liaison Support Worker, renamed Cultural and Health Improvement Officer (CHIO). The CHIO is able to engage with and support patients, relatives and other members of the multidisciplinary renal team. Core responsibilities include ensuring that patients and their families feel at ease whenever they are in communication with the renal services team in Bradford; contributing to formal discussion and explanation of important clinical issues in the patient's first language (typically Urdu or Punjabi), thereby encouraging patients to feel more involved in decisions about their care; listening to ad hoc patient queries and concerns and addressing these directly or indirectly through timely involvement of other members of the renal team; cultivating a greater awareness of and empathy towards the holistic needs of patients and their families within the renal MDT; and strengthening the links between renal and social care services. The involvement of the CHIO is especially important when discussing sensitive and complex issues such as conservative and end-of-life care with patients and their families, as communication needs to be detailed, precise and unequivocal. This is crucial to providing a bespoke supportive service for our all patients, especially those who have opted for conservative management. The significant contribution of the CHIO to the quality of renal services in Bradford has been recognised and a permanent CHIO post has now been established. © 2014 European Dialysis and Transplant Nurses Association/European Renal Care Association.
NASA Astrophysics Data System (ADS)
Di Felice, P.; Spadoni, M.
2013-04-01
MAHA is a database-centred software system for the storage and visualization of subsoil data used for the production of seismic microzonation maps in Italy. The application was implemented using open source software in order to grant its maximum diffusion and customization. A conceptual model of the subsoil, jointly developed by the Italian National Research Council and the National Department of Civil Protection, inspired the structure of the underlying database, consisting of 15 tables, 3 of which of spatial nature to accommodate geo-referenced data associated to points, lines and polygons. A web-GIS interface acts as a bridge between the user and the database, drives the input of geo-referenced data and enables the users to formulate different types of spatial queries. A series of forms designed "ad hoc" and enriched with combo boxes provide guided procedures to maximize the fluency of data entry and to reduce the possibility of erroneous typing. One of these procedures helps to transform the descriptions of the geological units (granular materials), given in technical paper documents by using a conversational style, into standardized numeric codes. Summary reports, produced in the pdf format, can be generated through decoding and graphic display of the parameters previously entered in the database. MAHA was approved by the national commission for seismic microzonation established by the Italian Prime Minister and, in the next years, it is expected to significantly support the entire process of map production in the urban areas more exposed to seismic hazard.
Clement, T; Brown, J; Morrison, J; Nestel, D
2016-05-01
General practice registrars in Australia undertake most of their vocational training in accredited general practices. They typically see patients alone from the start of their community-based training and are expected to seek timely ad hoc support from their supervisor. Such ad hoc encounters are a mechanism for ensuring patient safety, but also provide an opportunity for learning and teaching. Wenger's (Communities of practice: learning, meaning, and identity. Cambridge University Press, New York, 1998) social theory of learning ('communities of practice') guided a secondary analysis of audio-recordings of ad hoc encounters. Data from one encounter is re-presented as an extended sequence to maintain congruence with the theoretical perspective and enhance vicariousness. An interpretive commentary communicates key features of Wenger's theory and highlights the researchers' interpretations. We argue that one encounter can reveal universal understandings of clinical supervision and that the process of naturalistic generalisation allows readers to transfer others' experiences to their own contexts. The paper raises significant analytic, interpretive, and representational issues. We highlight that report writing is an important, but infrequently discussed, part of research design. We discuss the challenges of supporting the learning and teaching that arises from adopting a socio-cultural lens and argue that such a perspective importantly captures the complex range of issues that work-based practitioners have to grapple with. This offers a challenge to how we research and seek to influence work-based learning and teaching in health care settings.
EasyKSORD: A Platform of Keyword Search Over Relational Databases
NASA Astrophysics Data System (ADS)
Peng, Zhaohui; Li, Jing; Wang, Shan
Keyword Search Over Relational Databases (KSORD) enables casual users to use keyword queries (a set of keywords) to search relational databases just like searching the Web, without any knowledge of the database schema or any need of writing SQL queries. Based on our previous work, we design and implement a novel KSORD platform named EasyKSORD for users and system administrators to use and manage different KSORD systems in a novel and simple manner. EasyKSORD supports advanced queries, efficient data-graph-based search engines, multiform result presentations, and system logging and analysis. Through EasyKSORD, users can search relational databases easily and read search results conveniently, and system administrators can easily monitor and analyze the operations of KSORD and manage KSORD systems much better.
An intuitive graphical webserver for multiple-choice protein sequence search.
Banky, Daniel; Szalkai, Balazs; Grolmusz, Vince
2014-04-10
Every day tens of thousands of sequence searches and sequence alignment queries are submitted to webservers. The capitalized word "BLAST" becomes a verb, describing the act of performing sequence search and alignment. However, if one needs to search for sequences that contain, for example, two hydrophobic and three polar residues at five given positions, the query formation on the most frequently used webservers will be difficult. Some servers support the formation of queries with regular expressions, but most of the users are unfamiliar with their syntax. Here we present an intuitive, easily applicable webserver, the Protein Sequence Analysis server, that allows the formation of multiple choice queries by simply drawing the residues to their positions; if more than one residue are drawn to the same position, then they will be nicely stacked on the user interface, indicating the multiple choice at the given position. This computer-game-like interface is natural and intuitive, and the coloring of the residues makes possible to form queries requiring not just certain amino acids in the given positions, but also small nonpolar, negatively charged, hydrophobic, positively charged, or polar ones. The webserver is available at http://psa.pitgroup.org. Copyright © 2014 Elsevier B.V. All rights reserved.
Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun
2017-01-04
Linked Data (LD) aims to achieve interconnected data by representing entities using Unified Resource Identifiers (URIs), and sharing information using Resource Description Frameworks (RDFs) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Ontology-Driven Provenance Management in eScience: An Application in Parasite Research
NASA Astrophysics Data System (ADS)
Sahoo, Satya S.; Weatherly, D. Brent; Mutharaju, Raghava; Anantharam, Pramod; Sheth, Amit; Tarleton, Rick L.
Provenance, from the French word "provenir", describes the lineage or history of a data entity. Provenance is critical information in scientific applications to verify experiment process, validate data quality and associate trust values with scientific results. Current industrial scale eScience projects require an end-to-end provenance management infrastructure. This infrastructure needs to be underpinned by formal semantics to enable analysis of large scale provenance information by software applications. Further, effective analysis of provenance information requires well-defined query mechanisms to support complex queries over large datasets. This paper introduces an ontology-driven provenance management infrastructure for biology experiment data, as part of the Semantic Problem Solving Environment (SPSE) for Trypanosoma cruzi (T.cruzi). This provenance infrastructure, called T.cruzi Provenance Management System (PMS), is underpinned by (a) a domain-specific provenance ontology called Parasite Experiment ontology, (b) specialized query operators for provenance analysis, and (c) a provenance query engine. The query engine uses a novel optimization technique based on materialized views called materialized provenance views (MPV) to scale with increasing data size and query complexity. This comprehensive ontology-driven provenance infrastructure not only allows effective tracking and management of ongoing experiments in the Tarleton Research Group at the Center for Tropical and Emerging Global Diseases (CTEGD), but also enables researchers to retrieve the complete provenance information of scientific results for publication in literature.
Heterogeneous database integration in biomedicine.
Sujansky, W
2001-08-01
The rapid expansion of biomedical knowledge, reduction in computing costs, and spread of internet access have created an ocean of electronic data. The decentralized nature of our scientific community and healthcare system, however, has resulted in a patchwork of diverse, or heterogeneous, database implementations, making access to and aggregation of data across databases very difficult. The database heterogeneity problem applies equally to clinical data describing individual patients and biological data characterizing our genome. Specifically, databases are highly heterogeneous with respect to the data models they employ, the data schemas they specify, the query languages they support, and the terminologies they recognize. Heterogeneous database systems attempt to unify disparate databases by providing uniform conceptual schemas that resolve representational heterogeneities, and by providing querying capabilities that aggregate and integrate distributed data. Research in this area has applied a variety of database and knowledge-based techniques, including semantic data modeling, ontology definition, query translation, query optimization, and terminology mapping. Existing systems have addressed heterogeneous database integration in the realms of molecular biology, hospital information systems, and application portability.
An infrastructure for ontology-based information systems in biomedicine: RICORDO case study.
Wimalaratne, Sarala M; Grenon, Pierre; Hoehndorf, Robert; Gkoutos, Georgios V; de Bono, Bernard
2012-02-01
The article presents an infrastructure for supporting the semantic interoperability of biomedical resources based on the management (storing and inference-based querying) of their ontology-based annotations. This infrastructure consists of: (i) a repository to store and query ontology-based annotations; (ii) a knowledge base server with an inference engine to support the storage of and reasoning over ontologies used in the annotation of resources; (iii) a set of applications and services allowing interaction with the integrated repository and knowledge base. The infrastructure is being prototyped and developed and evaluated by the RICORDO project in support of the knowledge management of biomedical resources, including physiology and pharmacology models and associated clinical data. The RICORDO toolkit and its source code are freely available from http://ricordo.eu/relevant-resources. sarala@ebi.ac.uk.
Agile Datacube Analytics (not just) for the Earth Sciences
NASA Astrophysics Data System (ADS)
Misev, Dimitar; Merticariu, Vlad; Baumann, Peter
2017-04-01
Metadata are considered small, smart, and queryable; data, on the other hand, are known as big, clumsy, hard to analyze. Consequently, gridded data - such as images, image timeseries, and climate datacubes - are managed separately from the metadata, and with different, restricted retrieval capabilities. One reason for this silo approach is that databases, while good at tables, XML hierarchies, RDF graphs, etc., traditionally do not support multi-dimensional arrays well. This gap is being closed by Array Databases which extend the SQL paradigm of "any query, anytime" to NoSQL arrays. They introduce semantically rich modelling combined with declarative, high-level query languages on n-D arrays. On Server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. This way, they offer new vistas in flexibility, scalability, performance, and data integration. In this respect, the forthcoming ISO SQL extension MDA ("Multi-dimensional Arrays") will be a game changer in Big Data Analytics. We introduce concepts and opportunities through the example of rasdaman ("raster data manager") which in fact has pioneered the field of Array Databases and forms the blueprint for ISO SQL/MDA and further Big Data standards, such as OGC WCPS for querying spatio-temporal Earth datacubes. With operational installations exceeding 140 TB queries have been split across more than one thousand cloud nodes, using CPUs as well as GPUs. Installations can easily be mashed up securely, enabling large-scale location-transparent query processing in federations. Federation queries have been demonstrated live at EGU 2016 spanning Europe and Australia in the context of the intercontinental EarthServer initiative, visualized through NASA WorldWind.
Agile Datacube Analytics (not just) for the Earth Sciences
NASA Astrophysics Data System (ADS)
Baumann, P.
2016-12-01
Metadata are considered small, smart, and queryable; data, on the other hand, are known as big, clumsy, hard to analyze. Consequently, gridded data - such as images, image timeseries, and climate datacubes - are managed separately from the metadata, and with different, restricted retrieval capabilities. One reason for this silo approach is that databases, while good at tables, XML hierarchies, RDF graphs, etc., traditionally do not support multi-dimensional arrays well.This gap is being closed by Array Databases which extend the SQL paradigm of "any query, anytime" to NoSQL arrays. They introduce semantically rich modelling combined with declarative, high-level query languages on n-D arrays. On Server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. This way, they offer new vistas in flexibility, scalability, performance, and data integration. In this respect, the forthcoming ISO SQL extension MDA ("Multi-dimensional Arrays") will be a game changer in Big Data Analytics.We introduce concepts and opportunities through the example of rasdaman ("raster data manager") which in fact has pioneered the field of Array Databases and forms the blueprint for ISO SQL/MDA and further Big Data standards, such as OGC WCPS for querying spatio-temporal Earth datacubes. With operational installations exceeding 140 TB queries have been split across more than one thousand cloud nodes, using CPUs as well as GPUs. Installations can easily be mashed up securely, enabling large-scale location-transparent query processing in federations. Federation queries have been demonstrated live at EGU 2016 spanning Europe and Australia in the context of the intercontinental EarthServer initiative, visualized through NASA WorldWind.
Child pornography in peer-to-peer networks.
Steel, Chad M S
2009-08-01
The presence of child pornography in peer-to-peer networks is not disputed, but there has been little effort done to quantify and analyze the distribution and nature of that content to-date. By performing an analysis of queries and query hits on the largest peer-to-peer network, we are able to both quantify and describe the nature of querying by child pornographers as well as the content they are sharing. Child pornography related content was identified and analyzed in 235,513 user queries and 194,444 query hits. The research confirmed a large amount of peer-to-peer traffic is dedicated to child pornography, but supply and demand must be separated for a better understanding. The most prevalent query and the top two most prevalent filenames returned as query hits were child pornography related. However, it would be inaccurate to state child pornography dominates peer-to-peer as 1% of all queries were related to child pornography and 1.45% of all query hits (unique filenames) were related to child pornography, consistent with a smaller study (Hughes et al., 2008). In addition to the above, research indicates that the median age searched for was 13 years old, and the majority of queries were gender-neutral, but of those with gender-related terms, 79% were female-oriented. Distribution-wise, the vast majority of content-specific searches are for movies at 99%, though images are still the most prevalent in availability. There is no shortage of child pornography supply and demand on peer-to-peer networks and by analyzing how consumers seek and distributors advertise content we can better understand their motivations. Understanding the behavior of child pornographers and how they search for content when contrasted with those sharing content provides a basis for finding and combating that behavior. For law enforcement, knowing the specific terms used allows more timely and accurate forensics and better identification of those seeking and distributing child pornography. For Internet researchers, better filtering and monitoring is possible. For mental health professionals, understanding the preferences and behaviors of those searching supports more effective treatment.
Evaluation of relational and NoSQL database architectures to manage genomic annotations.
Schulz, Wade L; Nelson, Brent G; Felker, Donn K; Durant, Thomas J S; Torres, Richard
2016-12-01
While the adoption of next generation sequencing has rapidly expanded, the informatics infrastructure used to manage the data generated by this technology has not kept pace. Historically, relational databases have provided much of the framework for data storage and retrieval. Newer technologies based on NoSQL architectures may provide significant advantages in storage and query efficiency, thereby reducing the cost of data management. But their relative advantage when applied to biomedical data sets, such as genetic data, has not been characterized. To this end, we compared the storage, indexing, and query efficiency of a common relational database (MySQL), a document-oriented NoSQL database (MongoDB), and a relational database with NoSQL support (PostgreSQL). When used to store genomic annotations from the dbSNP database, we found the NoSQL architectures to outperform traditional, relational models for speed of data storage, indexing, and query retrieval in nearly every operation. These findings strongly support the use of novel database technologies to improve the efficiency of data management within the biological sciences. Copyright © 2016 Elsevier Inc. All rights reserved.
The Database Query Support Processor (QSP)
NASA Technical Reports Server (NTRS)
1993-01-01
The number and diversity of databases available to users continues to increase dramatically. Currently, the trend is towards decentralized, client server architectures that (on the surface) are less expensive to acquire, operate, and maintain than information architectures based on centralized, monolithic mainframes. The database query support processor (QSP) effort evaluates the performance of a network level, heterogeneous database access capability. Air Force Material Command's Rome Laboratory has developed an approach, based on ANSI standard X3.138 - 1988, 'The Information Resource Dictionary System (IRDS)' to seamless access to heterogeneous databases based on extensions to data dictionary technology. To successfully query a decentralized information system, users must know what data are available from which source, or have the knowledge and system privileges necessary to find out this information. Privacy and security considerations prohibit free and open access to every information system in every network. Even in completely open systems, time required to locate relevant data (in systems of any appreciable size) would be better spent analyzing the data, assuming the original question was not forgotten. Extensions to data dictionary technology have the potential to more fully automate the search and retrieval for relevant data in a decentralized environment. Substantial amounts of time and money could be saved by not having to teach users what data resides in which systems and how to access each of those systems. Information describing data and how to get it could be removed from the application and placed in a dedicated repository where it belongs. The result simplified applications that are less brittle and less expensive to build and maintain. Software technology providing the required functionality is off the shelf. The key difficulty is in defining the metadata required to support the process. The database query support processor effort will provide quantitative data on the amount of effort required to implement an extended data dictionary at the network level, add new systems, adapt to changing user needs, and provide sound estimates on operations and maintenance costs and savings.
A unified framework for managing provenance information in translational research
2011-01-01
Background A critical aspect of the NIH Translational Research roadmap, which seeks to accelerate the delivery of "bench-side" discoveries to patient's "bedside," is the management of the provenance metadata that keeps track of the origin and history of data resources as they traverse the path from the bench to the bedside and back. A comprehensive provenance framework is essential for researchers to verify the quality of data, reproduce scientific results published in peer-reviewed literature, validate scientific process, and associate trust value with data and results. Traditional approaches to provenance management have focused on only partial sections of the translational research life cycle and they do not incorporate "domain semantics", which is essential to support domain-specific querying and analysis by scientists. Results We identify a common set of challenges in managing provenance information across the pre-publication and post-publication phases of data in the translational research lifecycle. We define the semantic provenance framework (SPF), underpinned by the Provenir upper-level provenance ontology, to address these challenges in the four stages of provenance metadata: (a) Provenance collection - during data generation (b) Provenance representation - to support interoperability, reasoning, and incorporate domain semantics (c) Provenance storage and propagation - to allow efficient storage and seamless propagation of provenance as the data is transferred across applications (d) Provenance query - to support queries with increasing complexity over large data size and also support knowledge discovery applications We apply the SPF to two exemplar translational research projects, namely the Semantic Problem Solving Environment for Trypanosoma cruzi (T.cruzi SPSE) and the Biomedical Knowledge Repository (BKR) project, to demonstrate its effectiveness. Conclusions The SPF provides a unified framework to effectively manage provenance of translational research data during pre and post-publication phases. This framework is underpinned by an upper-level provenance ontology called Provenir that is extended to create domain-specific provenance ontologies to facilitate provenance interoperability, seamless propagation of provenance, automated querying, and analysis. PMID:22126369
NASA Astrophysics Data System (ADS)
McWhirter, J.; Boler, F. M.; Bock, Y.; Jamason, P.; Squibb, M. B.; Noll, C. E.; Blewitt, G.; Kreemer, C. W.
2010-12-01
Three geodesy Archive Centers, Scripps Orbit and Permanent Array Center (SOPAC), NASA's Crustal Dynamics Data Information System (CDDIS) and UNAVCO are engaged in a joint effort to define and develop a common Web Service Application Programming Interface (API) for accessing geodetic data holdings. This effort is funded by the NASA ROSES ACCESS Program to modernize the original GPS Seamless Archive Centers (GSAC) technology which was developed in the 1990s. A new web service interface, the GSAC-WS, is being developed to provide uniform and expanded mechanisms through which users can access our data repositories. In total, our respective archives hold tens of millions of files and contain a rich collection of site/station metadata. Though we serve similar user communities, we currently provide a range of different access methods, query services and metadata formats. This leads to a lack of consistency in the userís experience and a duplication of engineering efforts. The GSAC-WS API and its reference implementation in an underlying Java-based GSAC Service Layer (GSL) supports metadata and data queries into site/station oriented data archives. The general nature of this API makes it applicable to a broad range of data systems. The overall goals of this project include providing consistent and rich query interfaces for end users and client programs, the development of enabling technology to facilitate third party repositories in developing these web service capabilities and to enable the ability to perform data queries across a collection of federated GSAC-WS enabled repositories. A fundamental challenge faced in this project is to provide a common suite of query services across a heterogeneous collection of data yet enabling each repository to expose their specific metadata holdings. To address this challenge we are developing a "capabilities" based service where a repository can describe its specific query and metadata capabilities. Furthermore, the architecture of the GSL is based on a model-view paradigm that decouples the underlying data model semantics from particular representations of the data model. This will allow for the GSAC-WS enabled repositories to evolve their service offerings to incorporate new metadata definition formats (e.g., ISO-19115, FGDC, JSON, etc.) and new techniques for accessing their holdings. Building on the core GSAC-WS implementations the project is also developing a federated/distributed query service. This service will seamlessly integrate with the GSAC Service Layer and will support data and metadata queries across a collection of federated GSAC repositories.
A Multimedia Database Management System Supporting Contents Search in Media Data
1989-03-01
predicates: car (x), manufacturer (x, Horch ), year-built (x, 1922) to designate that the car was manufactured by Horch in the year 1922. The use of the name...manufacturer (imagel, x, " Horch "). year-built (imagel, x, 1922). query(I) :- car (I, Z), manufacturer (I, Z, " Horch "). ?- query(Image). yes. Image = image... Horch ) : imageJ 1, imageJ2, etc. In certain cases a predicate may generate entries in more than one list. For example, if the predicate of action
2011-01-01
Shotgun lipidome profiling relies on direct mass spectrometric analysis of total lipid extracts from cells, tissues or organisms and is a powerful tool to elucidate the molecular composition of lipidomes. We present a novel informatics concept of the molecular fragmentation query language implemented within the LipidXplorer open source software kit that supports accurate quantification of individual species of any ionizable lipid class in shotgun spectra acquired on any mass spectrometry platform. PMID:21247462
2015-11-20
noticed erythematous rash on the girl’s trunk. Physical examination reveals strawberry red tongue, red and cracked lips, and swollen red hands. The whites...as erythematous rash, and test results, such as revealed swollen red hands and strawberry red tongue, as well as indicates a possible di- agnosis...such as strawberry tongue (also known as Kawasaki disease). Although such queries are fairly long, only a fraction of concepts corresponding to an in
Distributed Multi-interface Catalogue for Geospatial Data
NASA Astrophysics Data System (ADS)
Nativi, S.; Bigagli, L.; Mazzetti, P.; Mattia, U.; Boldrini, E.
2007-12-01
Several geosciences communities (e.g. atmospheric science, oceanography, hydrology) have developed tailored data and metadata models and service protocol specifications for enabling online data discovery, inventory, evaluation, access and download. These specifications are conceived either profiling geospatial information standards or extending the well-accepted geosciences data models and protocols in order to capture more semantics. These artifacts have generated a set of related catalog -and inventory services- characterizing different communities, initiatives and projects. In fact, these geospatial data catalogs are discovery and access systems that use metadata as the target for query on geospatial information. The indexed and searchable metadata provide a disciplined vocabulary against which intelligent geospatial search can be performed within or among communities. There exists a clear need to conceive and achieve solutions to implement interoperability among geosciences communities, in the context of the more general geospatial information interoperability framework. Such solutions should provide search and access capabilities across catalogs, inventory lists and their registered resources. Thus, the development of catalog clearinghouse solutions is a near-term challenge in support of fully functional and useful infrastructures for spatial data (e.g. INSPIRE, GMES, NSDI, GEOSS). This implies the implementation of components for query distribution and virtual resource aggregation. These solutions must implement distributed discovery functionalities in an heterogeneous environment, requiring metadata profiles harmonization as well as protocol adaptation and mediation. We present a catalog clearinghouse solution for the interoperability of several well-known cataloguing systems (e.g. OGC CSW, THREDDS catalog and data services). The solution implements consistent resource discovery and evaluation over a dynamic federation of several well-known cataloguing and inventory systems. Prominent features include: 1)Support to distributed queries over a hierarchical data model, supporting incremental queries (i.e. query over collections, to be subsequently refined) and opaque/translucent chaining; 2)Support to several client protocols, through a compound front-end interface module. This allows to accommodate a (growing) number of cataloguing standards, or profiles thereof, including the OGC CSW interface, ebRIM Application Profile (for Core ISO Metadata and other data models), and the ISO Application Profile. The presented catalog clearinghouse supports both the opaque and translucent pattern for service chaining. In fact, the clearinghouse catalog may be configured either to completely hide the underlying federated services or to provide clients with services information. In both cases, the clearinghouse solution presents a higher level interface (i.e. OGC CSW) which harmonizes multiple lower level services (e.g. OGC CSW, WMS and WCS, THREDDS, etc.), and handles all control and interaction with them. In the translucent case, client has the option to directly access the lower level services (e.g. to improve performances). In the GEOSS context, the solution has been experimented both as a stand-alone user application and as a service framework. The first scenario allows a user to download a multi-platform client software and query a federation of cataloguing systems, that he can customize at will. The second scenario support server-side deployment and can be flexibly adapted to several use-cases, such as intranet proxy, catalog broker, etc.
A data model and database for high-resolution pathology analytical image informatics.
Wang, Fusheng; Kong, Jun; Cooper, Lee; Pan, Tony; Kurc, Tahsin; Chen, Wenjin; Sharma, Ashish; Niedermayr, Cristobal; Oh, Tae W; Brat, Daniel; Farris, Alton B; Foran, David J; Saltz, Joel
2011-01-01
The systematic analysis of imaged pathology specimens often results in a vast amount of morphological information at both the cellular and sub-cellular scales. While microscopy scanners and computerized analysis are capable of capturing and analyzing data rapidly, microscopy image data remain underutilized in research and clinical settings. One major obstacle which tends to reduce wider adoption of these new technologies throughout the clinical and scientific communities is the challenge of managing, querying, and integrating the vast amounts of data resulting from the analysis of large digital pathology datasets. This paper presents a data model, which addresses these challenges, and demonstrates its implementation in a relational database system. This paper describes a data model, referred to as Pathology Analytic Imaging Standards (PAIS), and a database implementation, which are designed to support the data management and query requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines on whole-slide images and tissue microarrays (TMAs). (1) Development of a data model capable of efficiently representing and storing virtual slide related image, annotation, markup, and feature information. (2) Development of a database, based on the data model, capable of supporting queries for data retrieval based on analysis and image metadata, queries for comparison of results from different analyses, and spatial queries on segmented regions, features, and classified objects. The work described in this paper is motivated by the challenges associated with characterization of micro-scale features for comparative and correlative analyses involving whole-slides tissue images and TMAs. Technologies for digitizing tissues have advanced significantly in the past decade. Slide scanners are capable of producing high-magnification, high-resolution images from whole slides and TMAs within several minutes. Hence, it is becoming increasingly feasible for basic, clinical, and translational research studies to produce thousands of whole-slide images. Systematic analysis of these large datasets requires efficient data management support for representing and indexing results from hundreds of interrelated analyses generating very large volumes of quantifications such as shape and texture and of classifications of the quantified features. We have designed a data model and a database to address the data management requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines. The data model represents virtual slide related image, annotation, markup and feature information. The database supports a wide range of metadata and spatial queries on images, annotations, markups, and features. We currently have three databases running on a Dell PowerEdge T410 server with CentOS 5.5 Linux operating system. The database server is IBM DB2 Enterprise Edition 9.7.2. The set of databases consists of 1) a TMA database containing image analysis results from 4740 cases of breast cancer, with 641 MB storage size; 2) an algorithm validation database, which stores markups and annotations from two segmentation algorithms and two parameter sets on 18 selected slides, with 66 GB storage size; and 3) an in silico brain tumor study database comprising results from 307 TCGA slides, with 365 GB storage size. The latter two databases also contain human-generated annotations and markups for regions and nuclei. Modeling and managing pathology image analysis results in a database provide immediate benefits on the value and usability of data in a research study. The database provides powerful query capabilities, which are otherwise difficult or cumbersome to support by other approaches such as programming languages. Standardized, semantic annotated data representation and interfaces also make it possible to more efficiently share image data and analysis results.
Knowledge Acquisition of Generic Queries for Information Retrieval
Seol, Yoon-Ho; Johnson, Stephen B.; Cimino, James J.
2002-01-01
Several studies have identified clinical questions posed by health care professionals to understand the nature of information needs during clinical practice. To support access to digital information sources, it is necessary to integrate the information needs with a computer system. We have developed a conceptual guidance approach in information retrieval, based on a knowledge base that contains the patterns of information needs. The knowledge base uses a formal representation of clinical questions based on the UMLS knowledge sources, called the Generic Query model. To improve the coverage of the knowledge base, we investigated a method for extracting plausible clinical questions from the medical literature. This poster presents the Generic Query model, shows how it is used to represent the patterns of clinical questions, and describes the framework used to extract knowledge from the medical literature.
What Is Spatio-Temporal Data Warehousing?
NASA Astrophysics Data System (ADS)
Vaisman, Alejandro; Zimányi, Esteban
In the last years, extending OLAP (On-Line Analytical Processing) systems with spatial and temporal features has attracted the attention of the GIS (Geographic Information Systems) and database communities. However, there is no a commonly agreed definition of what is a spatio-temporal data warehouse and what functionality such a data warehouse should support. Further, the solutions proposed in the literature vary considerably in the kind of data that can be represented as well as the kind of queries that can be expressed. In this paper we present a conceptual framework for defining spatio-temporal data warehouses using an extensible data type system. We also define a taxonomy of different classes of queries of increasing expressive power, and show how to express such queries using an extension of the tuple relational calculus with aggregated functions.
Linked data and provenance in biological data webs.
Zhao, Jun; Miles, Alistair; Klyne, Graham; Shotton, David
2009-03-01
The Web is now being used as a platform for publishing and linking life science data. The Web's linking architecture can be exploited to join heterogeneous data from multiple sources. However, as data are frequently being updated in a decentralized environment, provenance information becomes critical to providing reliable and trustworthy services to scientists. This article presents design patterns for representing and querying provenance information relating to mapping links between heterogeneous data from sources in the domain of functional genomics. We illustrate the use of named resource description framework (RDF) graphs at different levels of granularity to make provenance assertions about linked data, and demonstrate that these assertions are sufficient to support requirements including data currency, integrity, evidential support and historical queries.
Hynes, Denise M.; Perrin, Ruth A.; Rappaport, Steven; Stevens, Joanne M.; Demakis, John G.
2004-01-01
Information systems are increasingly important for measuring and improving health care quality. A number of integrated health care delivery systems use advanced information systems and integrated decision support to carry out quality assurance activities, but none as large as the Veterans Health Administration (VHA). The VHA's Quality Enhancement Research Initiative (QUERI) is a large-scale, multidisciplinary quality improvement initiative designed to ensure excellence in all areas where VHA provides health care services, including inpatient, outpatient, and long-term care settings. In this paper, we describe the role of information systems in the VHA QUERI process, highlight the major information systems critical to this quality improvement process, and discuss issues associated with the use of these systems. PMID:15187063
2016-03-01
Infrastructure to Support Mobile Devices (Takai, 2012, p. 2). The objectives needed in order to meet this goal are to: evolve spectrum management, expand... infrastructure to support wireless capabilities, and establish a mobile device security architecture (Takai, 2012, p. 2). By expanding infrastructure to...often used on Mobile Ad-Hoc Networks (MANETs). MANETS are infrastructure -less networks that include, but are not limited to, mobile devices. These
Learning of Multimodal Representations With Random Walks on the Click Graph.
Wu, Fei; Lu, Xinyan; Song, Jun; Yan, Shuicheng; Zhang, Zhongfei Mark; Rui, Yong; Zhuang, Yueting
2016-02-01
In multimedia information retrieval, most classic approaches tend to represent different modalities of media in the same feature space. With the click data collected from the users' searching behavior, existing approaches take either one-to-one paired data (text-image pairs) or ranking examples (text-query-image and/or image-query-text ranking lists) as training examples, which do not make full use of the click data, particularly the implicit connections among the data objects. In this paper, we treat the click data as a large click graph, in which vertices are images/text queries and edges indicate the clicks between an image and a query. We consider learning a multimodal representation from the perspective of encoding the explicit/implicit relevance relationship between the vertices in the click graph. By minimizing both the truncated random walk loss as well as the distance between the learned representation of vertices and their corresponding deep neural network output, the proposed model which is named multimodal random walk neural network (MRW-NN) can be applied to not only learn robust representation of the existing multimodal data in the click graph, but also deal with the unseen queries and images to support cross-modal retrieval. We evaluate the latent representation learned by MRW-NN on a public large-scale click log data set Clickture and further show that MRW-NN achieves much better cross-modal retrieval performance on the unseen queries/images than the other state-of-the-art methods.
Ayers, John W; Althouse, Benjamin M; Ribisl, Kurt M; Emery, Sherry
2014-05-01
The Internet is revolutionizing tobacco control, but few have harnessed the Web for surveillance. We demonstrate for the first time an approach for analyzing aggregate Internet search queries that captures precise changes in population considerations about tobacco. We compared tobacco-related Google queries originating in the United States during the week of the State Children's Health Insurance Program (SCHIP) 2009 cigarette excise tax increase with a historic baseline. Specific queries were then ranked according to their relative increases while also considering approximations of changes in absolute search volume. Individual queries with the largest relative increases the week of the SCHIP tax were "cigarettes Indian reservations" 640% (95% CI, 472-918), "free cigarettes online" 557% (95% CI, 432-756), and "Indian reservations cigarettes" 542% (95% CI, 414-733), amounting to about 7,500 excess searches. By themes, the largest relative increases were tribal cigarettes 246% (95% CI, 228-265), "free" cigarettes 215% (95% CI, 191-242), and cigarette stores 176% (95% CI, 160-193), accounting for 21,000, 27,000, and 90,000 excess queries. All avoidance queries, including those aforementioned themes, relatively increased 150% (95% CI, 144-155) or 550,000 from their baseline. All cessation queries increased 46% (95% CI, 44-48), or 175,000, around SCHIP; including themes for "cold turkey" 19% (95% CI, 11-27) or 2,600, cessation products 47% (95% CI, 44-50) or 78,000, and dubious cessation approaches (e.g., hypnosis) 40% (95% CI, 33-47) or 2,300. The SCHIP tax motivated specific changes in population considerations. Our strategy can support evaluations that temporally link tobacco control measures with instantaneous population reactions, as well as serve as a springboard for traditional studies, for example, including survey questionnaire design.
Hynes, Denise M; Weddle, Timothy; Smith, Nina; Whittier, Erika; Atkins, David; Francis, Joseph
2010-01-01
As the Department of Veterans Affairs (VA) Health Services Research and Development Service's Quality Enhancement Research Initiative (QUERI) has progressed, health information technology (HIT) has occupied a crucial role in implementation research projects. We evaluated the role of HIT in VA QUERI implementation research, including HIT use and development, the contributions implementation research has made to HIT development, and HIT-related barriers and facilitators to implementation research. Key informants from nine disease-specific QUERI Centers. Documentation analysis of 86 implementation project abstracts followed up by semi-structured interviews with key informants from each of the nine QUERI centers. We used qualitative and descriptive analyses. We found: (1) HIT provided data and information to facilitate implementation research, (2) implementation research helped to further HIT development in a variety of uses including the development of clinical decision support systems (23 of 86 implementation research projects), and (3) common HIT barriers to implementation research existed but could be overcome by collaborations with clinical and administrative leadership. Our review of the implementation research progress in the VA revealed interdependency on an HIT infrastructure and research-based development. Collaboration with multiple stakeholders is a key factor in successful use and development of HIT in implementation research efforts and in advancing evidence-based practice.
A Layered Searchable Encryption Scheme with Functional Components Independent of Encryption Methods
Luo, Guangchun; Qin, Ke
2014-01-01
Searchable encryption technique enables the users to securely store and search their documents over the remote semitrusted server, which is especially suitable for protecting sensitive data in the cloud. However, various settings (based on symmetric or asymmetric encryption) and functionalities (ranked keyword query, range query, phrase query, etc.) are often realized by different methods with different searchable structures that are generally not compatible with each other, which limits the scope of application and hinders the functional extensions. We prove that asymmetric searchable structure could be converted to symmetric structure, and functions could be modeled separately apart from the core searchable structure. Based on this observation, we propose a layered searchable encryption (LSE) scheme, which provides compatibility, flexibility, and security for various settings and functionalities. In this scheme, the outputs of the core searchable component based on either symmetric or asymmetric setting are converted to some uniform mappings, which are then transmitted to loosely coupled functional components to further filter the results. In such a way, all functional components could directly support both symmetric and asymmetric settings. Based on LSE, we propose two representative and novel constructions for ranked keyword query (previously only available in symmetric scheme) and range query (previously only available in asymmetric scheme). PMID:24719565
Segev, Elad; Baram-Tsabari, Ayelet
2012-10-01
Which extrinsic cues motivate people to search for science-related information? For many science-related search queries, media attention and time during the academic year are highly correlated with changes in information seeking behavior (expressed by changes in the proportion of Google science-related searches). The data mining analysis presented here shows that changes in the volume of searches for general and well-established science terms are strongly linked to the education system. By contrast, ad-hoc events and current concerns were better aligned with media coverage. The interest and ability to independently seek science knowledge in response to current events or concerns is one of the fundamental goals of the science literacy movement. This method provides a mirror of extrapolated behavior and as such can assist researchers in assessing the role of the media in shaping science interests, and inform the ways in which lifelong interests in science are manifested in real world situations.
A JEE RESTful service to access Conditions Data in ATLAS
NASA Astrophysics Data System (ADS)
Formica, Andrea; Gallas, E. J.
2015-12-01
Usage of condition data in ATLAS is extensive for offline reconstruction and analysis (e.g. alignment, calibration, data quality). The system is based on the LCG Conditions Database infrastructure, with read and write access via an ad hoc C++ API (COOL), a system which was developed before Run 1 data taking began. The infrastructure dictates that the data is organized into separate schemas (assigned to subsystems/groups storing distinct and independent sets of conditions), making it difficult to access information from several schemas at the same time. We have thus created PL/SQL functions containing queries to provide content extraction at multi-schema level. The PL/SQL API has been exposed to external clients by means of a Java application providing DB access via REST services, deployed inside an application server (JBoss WildFly). The services allow navigation over multiple schemas via simple URLs. The data can be retrieved either in XML or JSON formats, via simple clients (like curl or Web browsers).
NASA Astrophysics Data System (ADS)
Pollizzi, J. A.; Lezon, K.
The Hubble Space Telescope (HST) generates on the order of 7,000 telemetry values, many of which are sampled at 1Hz, and with several hundred parameters being sampled at 40Hz. Such data volumes would quickly tax even the largest of processing facilities. Yet the ability to access the telemetry data in a variety of ways, and in particular, using ad hoc (i.e., no a priori fixed) queries, is essential to assuring the long term viability and usefulness of this instrument. As part of the recent NASA initiative to re-engineer HST's ground control systems, a concept arose to apply newly available data warehousing technologies to this problem. The Space Telescope Science Institute was engaged to develop a pilot to investigate the technology and to create a proof-of-concept testbed that could be demonstrated and evaluated for operational use. This paper describes this effort and its results.
Applying Content Management to Automated Provenance Capture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schuchardt, Karen L.; Gibson, Tara D.; Stephan, Eric G.
2008-04-10
Workflows and data pipelines are becoming increasingly valuable in both computational and experimen-tal sciences. These automated systems are capable of generating significantly more data within the same amount of time than their manual counterparts. Automatically capturing and recording data prove-nance and annotation as part of these workflows is critical for data management, verification, and dis-semination. Our goal in addressing the provenance challenge was to develop and end-to-end system that demonstrates real-time capture, persistent content management, and ad-hoc searches of both provenance and metadata using open source software and standard protocols. We describe our prototype, which extends the Kepler workflow toolsmore » for the execution environment, the Scientific Annotation Middleware (SAM) content management software for data services, and an existing HTTP-based query protocol. Our implementation offers several unique capabilities, and through the use of standards, is able to pro-vide access to the provenance record to a variety of commonly available client tools.« less
BioFed: federated query processing over life sciences linked open data.
Hasnain, Ali; Mehmood, Qaiser; Sana E Zainab, Syeda; Saleem, Muhammad; Warren, Claude; Zehra, Durre; Decker, Stefan; Rebholz-Schuhmann, Dietrich
2017-03-15
Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but this still requires solutions to query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality have to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state of the art solution FedX and forms an important benchmark for the life science domain. The efficient cataloguing approach of the federated query processing system 'BioFed', the triple pattern wise source selection and the semantic source normalisation forms the core to our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Last but not least, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., Dugbank, Sider). BioFed is a solution for a single-point-of-access for a large number of SPARQL endpoints providing life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. BioFed fully supports SPARQL 1.1 and gives access to the endpoint's availability based on the EndpointData graph. Our evaluation of BioFed against FedX is based on 20 heterogeneous federated SPARQL queries and shows competitive execution performance in comparison to FedX, which can be attributed to the provision of provenance information for the source selection. Developing and testing federated query engines for life sciences data is still a challenging task. According to our findings, it is advantageous to optimise the source selection. The cataloguing of SPARQL endpoints, including type and property indexing, leads to efficient querying of data resources over the Web of Data. This could even be further improved through the use of ontologies, e.g., for abstract normalisation of query terms.
Dinh, Duy; Tamine, Lynda; Boubekeur, Fatiha
2013-02-01
The aim of this work is to evaluate a set of indexing and retrieval strategies based on the integration of several biomedical terminologies on the available TREC Genomics collections for an ad hoc information retrieval (IR) task. We propose a multi-terminology based concept extraction approach to selecting best concepts from free text by means of voting techniques. We instantiate this general approach on four terminologies (MeSH, SNOMED, ICD-10 and GO). We particularly focus on the effect of integrating terminologies into a biomedical IR process, and the utility of using voting techniques for combining the extracted concepts from each document in order to provide a list of unique concepts. Experimental studies conducted on the TREC Genomics collections show that our multi-terminology IR approach based on voting techniques are statistically significant compared to the baseline. For example, tested on the 2005 TREC Genomics collection, our multi-terminology based IR approach provides an improvement rate of +6.98% in terms of MAP (mean average precision) (p<0.05) compared to the baseline. In addition, our experimental results show that document expansion using preferred terms in combination with query expansion using terms from top ranked expanded documents improve the biomedical IR effectiveness. We have evaluated several voting models for combining concepts issued from multiple terminologies. Through this study, we presented many factors affecting the effectiveness of biomedical IR system including term weighting, query expansion, and document expansion models. The appropriate combination of those factors could be useful to improve the IR performance. Copyright © 2012 Elsevier B.V. All rights reserved.
Soulakis, Nicholas D; Carson, Matthew B; Lee, Young Ji; Schneider, Daniel H; Skeehan, Connor T; Scholtens, Denise M
2015-03-01
To visualize and describe collaborative electronic health record (EHR) usage for hospitalized patients with heart failure. We identified records of patients with heart failure and all associated healthcare provider record usage through queries of the Northwestern Medicine Enterprise Data Warehouse. We constructed a network by equating access and updates of a patient's EHR to a provider-patient interaction. We then considered shared patient record access as the basis for a second network that we termed the provider collaboration network. We calculated network statistics, the modularity of provider interactions, and provider cliques. We identified 548 patient records accessed by 5113 healthcare providers in 2012. The provider collaboration network had 1504 nodes and 83 998 edges. We identified 7 major provider collaboration modules. Average clique size was 87.9 providers. We used a graph database to demonstrate an ad hoc query of our provider-patient network. Our analysis suggests a large number of healthcare providers across a wide variety of professions access records of patients with heart failure during their hospital stay. This shared record access tends to take place not only in a pairwise manner but also among large groups of providers. EHRs encode valuable interactions, implicitly or explicitly, between patients and providers. Network analysis provided strong evidence of multidisciplinary record access of patients with heart failure across teams of 100+ providers. Further investigation may lead to clearer understanding of how record access information can be used to strategically guide care coordination for patients hospitalized for heart failure. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Identifying QT prolongation from ECG impressions using a general-purpose Natural Language Processor
Denny, Joshua C.; Miller, Randolph A.; Waitman, Lemuel Russell; Arrieta, Mark; Peterson, Joshua F.
2009-01-01
Objective Typically detected via electrocardiograms (ECGs), QT interval prolongation is a known risk factor for sudden cardiac death. Since medications can promote or exacerbate the condition, detection of QT interval prolongation is important for clinical decision support. We investigated the accuracy of natural language processing (NLP) for identifying QT prolongation from cardiologist-generated, free-text ECG impressions compared to corrected QT (QTc) thresholds reported by ECG machines. Methods After integrating negation detection to a locally-developed natural language processor, the KnowledgeMap concept identifier, we evaluated NLP-based detection of QT prolongation compared to the calculated QTc on a set of 44,318 ECGs obtained from hospitalized patients. We also created a string query using regular expressions to identify QT prolongation. We calculated sensitivity and specificity of the methods using manual physician review of the cardiologist-generated reports as the gold standard. To investigate causes of “false positive” calculated QTc, we manually reviewed randomly selected ECGs with a long calculated QTc but no mention of QT prolongation. Separately, we validated the performance of the negation detection algorithm on 5,000 manually-categorized ECG phrases for any medical concept (not limited to QT prolongation) prior to developing the NLP query for QT prolongation. Results The NLP query for QT prolongation correctly identified 2,364 of 2,373 ECGs with QT prolongation with a sensitivity of 0.996 and a positive predictive value of 1.000. There were no false positives. The regular expression query had a sensitivity of 0.999 and positive predictive value of 0.982. In contrast, the positive predictive value of common QTc thresholds derived from ECG machines was 0.07–0.25 with corresponding sensitivities of 0.994–0.046. The negation detection algorithm had a recall of 0.973 and precision of 0.982 for 10,490 concepts found within ECG impressions. Conclusions NLP and regular expression queries of cardiologists’ ECG interpretations can more effectively identify QT prolongation than the automated QTc intervals reported by ECG machines. Future clinical decision support could employ NLP queries to detect QTc prolongation and other reported ECG abnormalities. PMID:18938105
Guide to Using the Teacher Data Use Survey. REL 2017-166
ERIC Educational Resources Information Center
Wayman, Jeffrey C.; Wilkerson, Stephanie B.; Cho, Vincent; Mandinach, Ellen B.; Supovitz, Jonathan A.
2016-01-01
The Teacher Data Use Survey can be used to query teachers, administrators, and instructional support staff about how teachers use data to support instruction, their attitudes toward data, and the supports that help teachers use data. This guide provides step-by-step instructions to help district and school planners implement the survey. The…
An Efficient and QoS Supported Multichannel MAC Protocol for Vehicular Ad Hoc Networks
Tan, Guozhen; Yu, Chao
2017-01-01
Vehicular Ad Hoc Networks (VANETs) employ multichannel to provide a variety of safety and non-safety (transport efficiency and infotainment) applications, based on the IEEE 802.11p and IEEE 1609.4 protocols. Different types of applications require different levels Quality-of-Service (QoS) support. Recently, transport efficiency and infotainment applications (e.g., electronic map download and Internet access) have received more and more attention, and this kind of applications is expected to become a big market driver in a near future. In this paper, we propose an Efficient and QoS supported Multichannel Medium Access Control (EQM-MAC) protocol for VANETs in a highway environment. The EQM-MAC protocol utilizes the service channel resources for non-safety message transmissions during the whole synchronization interval, and it dynamically adjusts minimum contention window size for different non-safety services according to the traffic conditions. Theoretical model analysis and extensive simulation results show that the EQM-MAC protocol can support QoS services, while ensuring the high saturation throughput and low transmission delay for non-safety applications. PMID:28991217
Tiede, Dirk; Baraldi, Andrea; Sudmanns, Martin; Belgiu, Mariana; Lang, Stefan
2017-01-01
ABSTRACT Spatiotemporal analytics of multi-source Earth observation (EO) big data is a pre-condition for semantic content-based image retrieval (SCBIR). As a proof of concept, an innovative EO semantic querying (EO-SQ) subsystem was designed and prototypically implemented in series with an EO image understanding (EO-IU) subsystem. The EO-IU subsystem is automatically generating ESA Level 2 products (scene classification map, up to basic land cover units) from optical satellite data. The EO-SQ subsystem comprises a graphical user interface (GUI) and an array database embedded in a client server model. In the array database, all EO images are stored as a space-time data cube together with their Level 2 products generated by the EO-IU subsystem. The GUI allows users to (a) develop a conceptual world model based on a graphically supported query pipeline as a combination of spatial and temporal operators and/or standard algorithms and (b) create, save and share within the client-server architecture complex semantic queries/decision rules, suitable for SCBIR and/or spatiotemporal EO image analytics, consistent with the conceptual world model. PMID:29098143
Integrating the Department of Defense Supply Chain
2012-01-01
Summary xvii Integrate Financial Policy with System Design and Inventory Planning All DoD operating activities supported by retail and/or tactical...and Kristin J. Leuschner, The Strategic Distribution System in Support of Operation Enduring Freedom, Santa Monica, Calif.: RAND Corporation, DB...to support initial operations for all services early in OIF through ad hoc arrangements. 3 There were three GS SSAs at the time for different
Emotion recognition from EEG using higher order crossings.
Petrantonakis, Panagiotis C; Hadjileontiadis, Leontios J
2010-03-01
Electroencephalogram (EEG)-based emotion recognition is a relatively new field in the affective computing area with challenging issues regarding the induction of the emotional states and the extraction of the features in order to achieve optimum classification performance. In this paper, a novel emotion evocation and EEG-based feature extraction technique is presented. In particular, the mirror neuron system concept was adapted to efficiently foster emotion induction by the process of imitation. In addition, higher order crossings (HOC) analysis was employed for the feature extraction scheme and a robust classification method, namely HOC-emotion classifier (HOC-EC), was implemented testing four different classifiers [quadratic discriminant analysis (QDA), k-nearest neighbor, Mahalanobis distance, and support vector machines (SVMs)], in order to accomplish efficient emotion recognition. Through a series of facial expression image projection, EEG data have been collected by 16 healthy subjects using only 3 EEG channels, namely Fp1, Fp2, and a bipolar channel of F3 and F4 positions according to 10-20 system. Two scenarios were examined using EEG data from a single-channel and from combined-channels, respectively. Compared with other feature extraction methods, HOC-EC appears to outperform them, achieving a 62.3% (using QDA) and 83.33% (using SVM) classification accuracy for the single-channel and combined-channel cases, respectively, differentiating among the six basic emotions, i.e., happiness, surprise, anger, fear, disgust, and sadness. As the emotion class-set reduces its dimension, the HOC-EC converges toward maximum classification rate (100% for five or less emotions), justifying the efficiency of the proposed approach. This could facilitate the integration of HOC-EC in human machine interfaces, such as pervasive healthcare systems, enhancing their affective character and providing information about the user's emotional status (e.g., identifying user's emotion experiences, recurring affective states, time-dependent emotional trends).
Enhanced Approximate Nearest Neighbor via Local Area Focused Search.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gonzales, Antonio; Blazier, Nicholas Paul
Approximate Nearest Neighbor (ANN) algorithms are increasingly important in machine learning, data mining, and image processing applications. There is a large family of space- partitioning ANN algorithms, such as randomized KD-Trees, that work well in practice but are limited by an exponential increase in similarity comparisons required to optimize recall. Additionally, they only support a small set of similarity metrics. We present Local Area Fo- cused Search (LAFS), a method that enhances the way queries are performed using an existing ANN index. Instead of a single query, LAFS performs a number of smaller (fewer similarity comparisons) queries and focuses onmore » a local neighborhood which is refined as candidates are identified. We show that our technique improves performance on several well known datasets and is easily extended to general similarity metrics using kernel projection techniques.« less
Spatial Query for Planetary Data
NASA Technical Reports Server (NTRS)
Shams, Khawaja S.; Crockett, Thomas M.; Powell, Mark W.; Joswig, Joseph C.; Fox, Jason M.
2011-01-01
Science investigators need to quickly and effectively assess past observations of specific locations on a planetary surface. This innovation involves a location-based search technology that was adapted and applied to planetary science data to support a spatial query capability for mission operations software. High-performance location-based searching requires the use of spatial data structures for database organization. Spatial data structures are designed to organize datasets based on their coordinates in a way that is optimized for location-based retrieval. The particular spatial data structure that was adapted for planetary data search is the R+ tree.
A database system to support image algorithm evaluation
NASA Technical Reports Server (NTRS)
Lien, Y. E.
1977-01-01
The design is given of an interactive image database system IMDB, which allows the user to create, retrieve, store, display, and manipulate images through the facility of a high-level, interactive image query (IQ) language. The query language IQ permits the user to define false color functions, pixel value transformations, overlay functions, zoom functions, and windows. The user manipulates the images through generic functions. The user can direct images to display devices for visual and qualitative analysis. Image histograms and pixel value distributions can also be computed to obtain a quantitative analysis of images.
CytoscapeRPC: a plugin to create, modify and query Cytoscape networks from scripting languages.
Bot, Jan J; Reinders, Marcel J T
2011-09-01
CytoscapeRPC is a plugin for Cytoscape which allows users to create, query and modify Cytoscape networks from any programming language which supports XML-RPC. This enables them to access Cytoscape functionality and visualize their data interactively without leaving the programming environment with which they are familiar. Install through the Cytoscape plugin manager or visit the web page: http://wiki.nbic.nl/index.php/CytoscapeRPC for the user tutorial and download. j.j.bot@tudelft.nl; j.j.bot@tudelft.nl.
Using J-Query Mobile Technology to Support a Pedagogical Proficiency Course
ERIC Educational Resources Information Center
Kert, Serhat Bahadir
2013-01-01
Technology-enriched educational environments supported by different technological tools and applications are today's important research areas in the educational literature. During the educational process, different types of technologies are used in order to enhance the learning capabilities of students. Given the popularity of mobile phones, it…
Big Data Analytics with Datalog Queries on Spark.
Shkapsky, Alexander; Yang, Mohan; Interlandi, Matteo; Chiu, Hsuan; Condie, Tyson; Zaniolo, Carlo
2016-01-01
There is great interest in exploiting the opportunity provided by cloud computing platforms for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for machine learning and graph analytics. Developing efficient complex analytics in Spark requires deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX). Our BigDatalog system addresses the problem by providing concise declarative specification of complex queries amenable to efficient evaluation. Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. We perform an experimental comparison with other state-of-the-art large-scale Datalog systems and verify the efficacy of our techniques and effectiveness of Spark in supporting Datalog-based analytics.
Big Data Analytics with Datalog Queries on Spark
Shkapsky, Alexander; Yang, Mohan; Interlandi, Matteo; Chiu, Hsuan; Condie, Tyson; Zaniolo, Carlo
2017-01-01
There is great interest in exploiting the opportunity provided by cloud computing platforms for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for machine learning and graph analytics. Developing efficient complex analytics in Spark requires deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX). Our BigDatalog system addresses the problem by providing concise declarative specification of complex queries amenable to efficient evaluation. Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. We perform an experimental comparison with other state-of-the-art large-scale Datalog systems and verify the efficacy of our techniques and effectiveness of Spark in supporting Datalog-based analytics. PMID:28626296
Computing health quality measures using Informatics for Integrating Biology and the Bedside.
Klann, Jeffrey G; Murphy, Shawn N
2013-04-19
The Health Quality Measures Format (HQMF) is a Health Level 7 (HL7) standard for expressing computable Clinical Quality Measures (CQMs). Creating tools to process HQMF queries in clinical databases will become increasingly important as the United States moves forward with its Health Information Technology Strategic Plan to Stages 2 and 3 of the Meaningful Use incentive program (MU2 and MU3). Informatics for Integrating Biology and the Bedside (i2b2) is one of the analytical databases used as part of the Office of the National Coordinator (ONC)'s Query Health platform to move toward this goal. Our goal is to integrate i2b2 with the Query Health HQMF architecture, to prepare for other HQMF use-cases (such as MU2 and MU3), and to articulate the functional overlap between i2b2 and HQMF. Therefore, we analyze the structure of HQMF, and then we apply this understanding to HQMF computation on the i2b2 clinical analytical database platform. Specifically, we develop a translator between two query languages, HQMF and i2b2, so that the i2b2 platform can compute HQMF queries. We use the HQMF structure of queries for aggregate reporting, which define clinical data elements and the temporal and logical relationships between them. We use the i2b2 XML format, which allows flexible querying of a complex clinical data repository in an easy-to-understand domain-specific language. The translator can represent nearly any i2b2-XML query as HQMF and execute in i2b2 nearly any HQMF query expressible in i2b2-XML. This translator is part of the freely available reference implementation of the QueryHealth initiative. We analyze limitations of the conversion and find it covers many, but not all, of the complex temporal and logical operators required by quality measures. HQMF is an expressive language for defining quality measures, and it will be important to understand and implement for CQM computation, in both meaningful use and population health. However, its current form might allow complexity that is intractable for current database systems (both in terms of implementation and computation). Our translator, which supports the subset of HQMF currently expressible in i2b2-XML, may represent the beginnings of a practical compromise. It is being pilot-tested in two Query Health demonstration projects, and it can be further expanded to balance computational tractability with the advanced features needed by measure developers.
Computing Health Quality Measures Using Informatics for Integrating Biology and the Bedside
Murphy, Shawn N
2013-01-01
Background The Health Quality Measures Format (HQMF) is a Health Level 7 (HL7) standard for expressing computable Clinical Quality Measures (CQMs). Creating tools to process HQMF queries in clinical databases will become increasingly important as the United States moves forward with its Health Information Technology Strategic Plan to Stages 2 and 3 of the Meaningful Use incentive program (MU2 and MU3). Informatics for Integrating Biology and the Bedside (i2b2) is one of the analytical databases used as part of the Office of the National Coordinator (ONC)’s Query Health platform to move toward this goal. Objective Our goal is to integrate i2b2 with the Query Health HQMF architecture, to prepare for other HQMF use-cases (such as MU2 and MU3), and to articulate the functional overlap between i2b2 and HQMF. Therefore, we analyze the structure of HQMF, and then we apply this understanding to HQMF computation on the i2b2 clinical analytical database platform. Specifically, we develop a translator between two query languages, HQMF and i2b2, so that the i2b2 platform can compute HQMF queries. Methods We use the HQMF structure of queries for aggregate reporting, which define clinical data elements and the temporal and logical relationships between them. We use the i2b2 XML format, which allows flexible querying of a complex clinical data repository in an easy-to-understand domain-specific language. Results The translator can represent nearly any i2b2-XML query as HQMF and execute in i2b2 nearly any HQMF query expressible in i2b2-XML. This translator is part of the freely available reference implementation of the QueryHealth initiative. We analyze limitations of the conversion and find it covers many, but not all, of the complex temporal and logical operators required by quality measures. Conclusions HQMF is an expressive language for defining quality measures, and it will be important to understand and implement for CQM computation, in both meaningful use and population health. However, its current form might allow complexity that is intractable for current database systems (both in terms of implementation and computation). Our translator, which supports the subset of HQMF currently expressible in i2b2-XML, may represent the beginnings of a practical compromise. It is being pilot-tested in two Query Health demonstration projects, and it can be further expanded to balance computational tractability with the advanced features needed by measure developers. PMID:23603227
A mass spectrometry proteomics data management platform.
Sharma, Vagisha; Eng, Jimmy K; Maccoss, Michael J; Riffle, Michael
2012-09-01
Mass spectrometry-based proteomics is increasingly being used in biomedical research. These experiments typically generate a large volume of highly complex data, and the volume and complexity are only increasing with time. There exist many software pipelines for analyzing these data (each typically with its own file formats), and as technology improves, these file formats change and new formats are developed. Files produced from these myriad software programs may accumulate on hard disks or tape drives over time, with older files being rendered progressively more obsolete and unusable with each successive technical advancement and data format change. Although initiatives exist to standardize the file formats used in proteomics, they do not address the core failings of a file-based data management system: (1) files are typically poorly annotated experimentally, (2) files are "organically" distributed across laboratory file systems in an ad hoc manner, (3) files formats become obsolete, and (4) searching the data and comparing and contrasting results across separate experiments is very inefficient (if possible at all). Here we present a relational database architecture and accompanying web application dubbed Mass Spectrometry Data Platform that is designed to address the failings of the file-based mass spectrometry data management approach. The database is designed such that the output of disparate software pipelines may be imported into a core set of unified tables, with these core tables being extended to support data generated by specific pipelines. Because the data are unified, they may be queried, viewed, and compared across multiple experiments using a common web interface. Mass Spectrometry Data Platform is open source and freely available at http://code.google.com/p/msdapl/.
A semantic web framework to integrate cancer omics data with biological knowledge.
Holford, Matthew E; McCusker, James P; Cheung, Kei-Hoi; Krauthammer, Michael
2012-01-25
The RDF triple provides a simple linguistic means of describing limitless types of information. Triples can be flexibly combined into a unified data source we call a semantic model. Semantic models open new possibilities for the integration of variegated biological data. We use Semantic Web technology to explicate high throughput clinical data in the context of fundamental biological knowledge. We have extended Corvus, a data warehouse which provides a uniform interface to various forms of Omics data, by providing a SPARQL endpoint. With the querying and reasoning tools made possible by the Semantic Web, we were able to explore quantitative semantic models retrieved from Corvus in the light of systematic biological knowledge. For this paper, we merged semantic models containing genomic, transcriptomic and epigenomic data from melanoma samples with two semantic models of functional data - one containing Gene Ontology (GO) data, the other, regulatory networks constructed from transcription factor binding information. These two semantic models were created in an ad hoc manner but support a common interface for integration with the quantitative semantic models. Such combined semantic models allow us to pose significant translational medicine questions. Here, we study the interplay between a cell's molecular state and its response to anti-cancer therapy by exploring the resistance of cancer cells to Decitabine, a demethylating agent. We were able to generate a testable hypothesis to explain how Decitabine fights cancer - namely, that it targets apoptosis-related gene promoters predominantly in Decitabine-sensitive cell lines, thus conveying its cytotoxic effect by activating the apoptosis pathway. Our research provides a framework whereby similar hypotheses can be developed easily.
Archetype-based data warehouse environment to enable the reuse of electronic health record data.
Marco-Ruiz, Luis; Moner, David; Maldonado, José A; Kolstrup, Nils; Bellika, Johan G
2015-09-01
The reuse of data captured during health care delivery is essential to satisfy the demands of clinical research and clinical decision support systems. A main barrier for the reuse is the existence of legacy formats of data and the high granularity of it when stored in an electronic health record (EHR) system. Thus, we need mechanisms to standardize, aggregate, and query data concealed in the EHRs, to allow their reuse whenever they are needed. To create a data warehouse infrastructure using archetype-based technologies, standards and query languages to enable the interoperability needed for data reuse. The work presented makes use of best of breed archetype-based data transformation and storage technologies to create a workflow for the modeling, extraction, transformation and load of EHR proprietary data into standardized data repositories. We converted legacy data and performed patient-centered aggregations via archetype-based transformations. Later, specific purpose aggregations were performed at a query level for particular use cases. Laboratory test results of a population of 230,000 patients belonging to Troms and Finnmark counties in Norway requested between January 2013 and November 2014 have been standardized. Test records normalization has been performed by defining transformation and aggregation functions between the laboratory records and an archetype. These mappings were used to automatically generate open EHR compliant data. These data were loaded into an archetype-based data warehouse. Once loaded, we defined indicators linked to the data in the warehouse to monitor test activity of Salmonella and Pertussis using the archetype query language. Archetype-based standards and technologies can be used to create a data warehouse environment that enables data from EHR systems to be reused in clinical research and decision support systems. With this approach, existing EHR data becomes available in a standardized and interoperable format, thus opening a world of possibilities toward semantic or concept-based reuse, query and communication of clinical data. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Horvath, Monica M.; Rusincovitch, Shelley A.; Brinson, Stephanie; Shang, Howard C.; Evans, Steve; Ferranti, Jeffrey M.
2015-01-01
Purpose Data generated in the care of patients are widely used to support clinical research and quality improvement, which has hastened the development of self-service query tools. User interface design for such tools, execution of query activity, and underlying application architecture have not been widely reported, and existing tools reflect a wide heterogeneity of methods and technical frameworks. We describe the design, application architecture, and use of a self-service model for enterprise data delivery within Duke Medicine. Methods Our query platform, the Duke Enterprise Data Unified Content Explorer (DEDUCE), supports enhanced data exploration, cohort identification, and data extraction from our enterprise data warehouse (EDW) using a series of modular environments that interact with a central keystone module, Cohort Manager (CM). A data-driven application architecture is implemented through three components: an application data dictionary, the concept of “smart dimensions”, and dynamically-generated user interfaces. Results DEDUCE CM allows flexible hierarchies of EDW queries within a grid-like workspace. A cohort “join” functionality allows switching between filters based on criteria occurring within or across patient encounters. To date, 674 users have been trained and activated in DEDUCE, and logon activity shows a steady increase, with variability between months. A comparison of filter conditions and export criteria shows that these activities have different patterns of usage across subject areas. Conclusions Organizations with sophisticated EDWs may find that users benefit from development of advanced query functionality, complimentary to the user interfaces and infrastructure used in other well-published models. Driven by its EDW context, the DEDUCE application architecture was also designed to be responsive to source data and to allow modification through alterations in metadata rather than programming, allowing an agile response to source system changes. PMID:25051403
Horvath, Monica M; Rusincovitch, Shelley A; Brinson, Stephanie; Shang, Howard C; Evans, Steve; Ferranti, Jeffrey M
2014-12-01
Data generated in the care of patients are widely used to support clinical research and quality improvement, which has hastened the development of self-service query tools. User interface design for such tools, execution of query activity, and underlying application architecture have not been widely reported, and existing tools reflect a wide heterogeneity of methods and technical frameworks. We describe the design, application architecture, and use of a self-service model for enterprise data delivery within Duke Medicine. Our query platform, the Duke Enterprise Data Unified Content Explorer (DEDUCE), supports enhanced data exploration, cohort identification, and data extraction from our enterprise data warehouse (EDW) using a series of modular environments that interact with a central keystone module, Cohort Manager (CM). A data-driven application architecture is implemented through three components: an application data dictionary, the concept of "smart dimensions", and dynamically-generated user interfaces. DEDUCE CM allows flexible hierarchies of EDW queries within a grid-like workspace. A cohort "join" functionality allows switching between filters based on criteria occurring within or across patient encounters. To date, 674 users have been trained and activated in DEDUCE, and logon activity shows a steady increase, with variability between months. A comparison of filter conditions and export criteria shows that these activities have different patterns of usage across subject areas. Organizations with sophisticated EDWs may find that users benefit from development of advanced query functionality, complimentary to the user interfaces and infrastructure used in other well-published models. Driven by its EDW context, the DEDUCE application architecture was also designed to be responsive to source data and to allow modification through alterations in metadata rather than programming, allowing an agile response to source system changes. Copyright © 2014 Elsevier Inc. All rights reserved.
Manchester visual query language
NASA Astrophysics Data System (ADS)
Oakley, John P.; Davis, Darryl N.; Shann, Richard T.
1993-04-01
We report a database language for visual retrieval which allows queries on image feature information which has been computed and stored along with images. The language is novel in that it provides facilities for dealing with feature data which has actually been obtained from image analysis. Each line in the Manchester Visual Query Language (MVQL) takes a set of objects as input and produces another, usually smaller, set as output. The MVQL constructs are mainly based on proven operators from the field of digital image analysis. An example is the Hough-group operator which takes as input a specification for the objects to be grouped, a specification for the relevant Hough space, and a definition of the voting rule. The output is a ranked list of high scoring bins. The query could be directed towards one particular image or an entire image database, in the latter case the bins in the output list would in general be associated with different images. We have implemented MVQL in two layers. The command interpreter is a Lisp program which maps each MVQL line to a sequence of commands which are used to control a specialized database engine. The latter is a hybrid graph/relational system which provides low-level support for inheritance and schema evolution. In the paper we outline the language and provide examples of useful queries. We also describe our solution to the engineering problems associated with the implementation of MVQL.
The Ruby UCSC API: accessing the UCSC genome database using Ruby.
Mishima, Hiroyuki; Aerts, Jan; Katayama, Toshiaki; Bonnal, Raoul J P; Yoshiura, Koh-ichiro
2012-09-21
The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. A simple application programming interface (API) in a scripting language aimed at the biologist was however not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby. The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast.The API uses the bin index-if available-when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby). Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will facilitate biologists to query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help is provided via the website at http://rubyucscapi.userecho.com/.
The Ruby UCSC API: accessing the UCSC genome database using Ruby
2012-01-01
Background The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. A simple application programming interface (API) in a scripting language aimed at the biologist was however not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby. Results The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast. The API uses the bin index—if available—when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby). Conclusions Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will facilitate biologists to query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help is provided via the website at http://rubyucscapi.userecho.com/. PMID:22994508
Brown, Jeffrey S; Holmes, John H; Shah, Kiran; Hall, Ken; Lazarus, Ross; Platt, Richard
2010-06-01
Comparative effectiveness research, medical product safety evaluation, and quality measurement will require the ability to use electronic health data held by multiple organizations. There is no consensus about whether to create regional or national combined (eg, "all payer") databases for these purposes, or distributed data networks that leave most Protected Health Information and proprietary data in the possession of the original data holders. Demonstrate functions of a distributed research network that supports research needs and also address data holders concerns about participation. Key design functions included strong local control of data uses and a centralized web-based querying interface. We implemented a pilot distributed research network and evaluated the design considerations, utility for research, and the acceptability to data holders of methods for menu-driven querying. We developed and tested a central, web-based interface with supporting network software. Specific functions assessed include query formation and distribution, query execution and review, and aggregation of results. This pilot successfully evaluated temporal trends in medication use and diagnoses at 5 separate sites, demonstrating some of the possibilities of using a distributed research network. The pilot demonstrated the potential utility of the design, which addressed the major concerns of both users and data holders. No serious obstacles were identified that would prevent development of a fully functional, scalable network. Distributed networks are capable of addressing nearly all anticipated uses of routinely collected electronic healthcare data. Distributed networks would obviate the need for centralized databases, thus avoiding numerous obstacles.
A Note on Interfacing Object Warehouses and Mass Storage Systems for Data Mining Applications
NASA Technical Reports Server (NTRS)
Grossman, Robert L.; Northcutt, Dave
1996-01-01
Data mining is the automatic discovery of patterns, associations, and anomalies in data sets. Data mining requires numerically and statistically intensive queries. Our assumption is that data mining requires a specialized data management infrastructure to support the aforementioned intensive queries, but because of the sizes of data involved, this infrastructure is layered over a hierarchical storage system. In this paper, we discuss the architecture of a system which is layered for modularity, but exploits specialized lightweight services to maintain efficiency. Rather than use a full functioned database for example, we use light weight object services specialized for data mining. We propose using information repositories between layers so that components on either side of the layer can access information in the repositories to assist in making decisions about data layout, the caching and migration of data, the scheduling of queries, and related matters.
ERIC Educational Resources Information Center
Barrera, Arnold; Braley, Richard T.; Slate, John R.
2010-01-01
Teacher mentors of first-year teachers provided insight into those practices they viewed as essential for their success in the mentoring role. Specifically, they were queried about teacher involvement/support, staff development, administrative support and resource materials. Almost all of the mentor teachers believed a teacher mentoring program…
Mentors' Views of Factors Essential for the Success of Beginning Teachers
ERIC Educational Resources Information Center
Barrera, Arnold; Braley, Richard; Slate, John R.
2008-01-01
The views of 46 mentors of first-year teachers were obtained regarding practices that they viewed as essential for their success in mentoring teachers. Specifically, they were queried about teacher involvement/support, staff development, administrative support, and resource materials. Almost all of the mentor teachers believed a teacher-mentoring…
Barton, Krysta S; Tate, Tyler; Lau, Nancy; Taliesin, Karen B; Waldman, Elisha D; Rosenberg, Abby R
2018-06-01
Supporting patients' spiritual needs is central to palliative care. Adolescents and young adults (AYAs) may be developing their spiritual identities; it is unclear how to navigate conversations concerning their spiritual needs. To 1) describe spiritual narratives among AYAs based on their self-identification as religious, spiritual, both, or neither and 2) identify language to support AYAs' spiritual needs in keeping with their self-identities. In this mixed-methods, prospective, longitudinal cohort study, AYAs (14-25 years old) with newly diagnosed cancer self-reported their "religiousness" and "spirituality." One-on-one, semistructured interviews were conducted at three time points (within 60 days of diagnosis, six to 12 months, and 12-18 months later) and included queries about spirituality, God/prayer, meaning from illness, and evolving self-identity. Post hoc directed content analysis informed a framework for approaching religious/spiritual discussions. Seventeen AYAs (mean age 17.1 years, SD = 2.7, 47% male) participated in 44 interviews. Of n = 16 with concurrent survey responses, five (31%) self-identified as both "religious and spiritual," five (31%) as "spiritual, not religious," one (6%) as "religious, not spiritual," and five (31%) as neither. Those who endorsed religiousness tended to cite faith as a source of strength, whereas many who declined this self-identity explicitly questioned their preexisting beliefs. Regardless of self-identified "religiousness" or "spirituality," most participants endorsed quests for meaning, purpose, and/or legacy, and all included constructs of hope in their narratives. AYA self-identities evolve during the illness experience. When words such as "religion" and "spirituality" do not fit, explicitly exploring hopes, worries, meaning, and changing life perspectives may be a promising alternative. Copyright © 2018 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
Percolator: Scalable Pattern Discovery in Dynamic Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Choudhury, Sutanay; Purohit, Sumit; Lin, Peng
We demonstrate Percolator, a distributed system for graph pattern discovery in dynamic graphs. In contrast to conventional mining systems, Percolator advocates efficient pattern mining schemes that (1) support pattern detection with keywords; (2) integrate incremental and parallel pattern mining; and (3) support analytical queries such as trend analysis. The core idea of Percolator is to dynamically decide and verify a small fraction of patterns and their in- stances that must be inspected in response to buffered updates in dynamic graphs, with a total mining cost independent of graph size. We demonstrate a) the feasibility of incremental pattern mining by walkingmore » through each component of Percolator, b) the efficiency and scalability of Percolator over the sheer size of real-world dynamic graphs, and c) how the user-friendly GUI of Percolator inter- acts with users to support keyword-based queries that detect, browse and inspect trending patterns. We also demonstrate two user cases of Percolator, in social media trend analysis and academic collaboration analysis, respectively.« less
NoSQL Based 3D City Model Management System
NASA Astrophysics Data System (ADS)
Mao, B.; Harrie, L.; Cao, J.; Wu, Z.; Shen, J.
2014-04-01
To manage increasingly complicated 3D city models, a framework based on NoSQL database is proposed in this paper. The framework supports import and export of 3D city model according to international standards such as CityGML, KML/COLLADA and X3D. We also suggest and implement 3D model analysis and visualization in the framework. For city model analysis, 3D geometry data and semantic information (such as name, height, area, price and so on) are stored and processed separately. We use a Map-Reduce method to deal with the 3D geometry data since it is more complex, while the semantic analysis is mainly based on database query operation. For visualization, a multiple 3D city representation structure CityTree is implemented within the framework to support dynamic LODs based on user viewpoint. Also, the proposed framework is easily extensible and supports geoindexes to speed up the querying. Our experimental results show that the proposed 3D city management system can efficiently fulfil the analysis and visualization requirements.
Stjernswärd, Sigrid
2015-03-01
Families living with mental illness can experience added burden and eventually own ill health. National and international guidelines support the development of web based solutions to cost-effectively address health care needs. Mapping patterns of internet use may help tailor interventions to effectively address users' needs. The study's aim was to explore the contents of relatives' and friends' queries on a psychologists' website with professional feedback. All visible questions [n = 59] classified under the website's "helping a relative/friend" category between 20 090 615-20 130 927 were printed out and analyzed using content analysis. The analysis resulted in four categories and subcategories illuminating families'/friends' areas of concern: support to help; concerns with health care; young children's welfare; and repercussions on own health. "Question & Answer" [Q&A] forums can shed light onto health seekers' online behavior and may be a way of addressing families' needs of support and information. Further studies are needed to assess the replies' therapeutic value for their recipients.
NLM at TREC 2012 Medical Records Track
2012-11-01
automatic runs are not significantly above the medians. As in 2011, we conclude that the existing search engines are mature enough to support cohort selection tasks, and the quality of the queries could be
Rice, Michael; Gladstone, William; Weir, Michael
2004-01-01
We discuss how relational databases constitute an ideal framework for representing and analyzing large-scale genomic data sets in biology. As a case study, we describe a Drosophila splice-site database that we recently developed at Wesleyan University for use in research and teaching. The database stores data about splice sites computed by a custom algorithm using Drosophila cDNA transcripts and genomic DNA and supports a set of procedures for analyzing splice-site sequence space. A generic Web interface permits the execution of the procedures with a variety of parameter settings and also supports custom structured query language queries. Moreover, new analytical procedures can be added by updating special metatables in the database without altering the Web interface. The database provides a powerful setting for students to develop informatic thinking skills.
2004-01-01
We discuss how relational databases constitute an ideal framework for representing and analyzing large-scale genomic data sets in biology. As a case study, we describe a Drosophila splice-site database that we recently developed at Wesleyan University for use in research and teaching. The database stores data about splice sites computed by a custom algorithm using Drosophila cDNA transcripts and genomic DNA and supports a set of procedures for analyzing splice-site sequence space. A generic Web interface permits the execution of the procedures with a variety of parameter settings and also supports custom structured query language queries. Moreover, new analytical procedures can be added by updating special metatables in the database without altering the Web interface. The database provides a powerful setting for students to develop informatic thinking skills. PMID:15592597
Stetler, Cheryl B; McQueen, Lynn; Demakis, John; Mittman, Brian S
2008-01-01
Background The continuing gap between available evidence and current practice in health care reinforces the need for more effective solutions, in particular related to organizational context. Considerable advances have been made within the U.S. Veterans Health Administration (VA) in systematically implementing evidence into practice. These advances have been achieved through a system-level program focused on collaboration and partnerships among policy makers, clinicians, and researchers. The Quality Enhancement Research Initiative (QUERI) was created to generate research-driven initiatives that directly enhance health care quality within the VA and, simultaneously, contribute to the field of implementation science. This paradigm-shifting effort provided a natural laboratory for exploring organizational change processes. This article describes the underlying change framework and implementation strategy used to operationalize QUERI. Strategic approach to organizational change QUERI used an evidence-based organizational framework focused on three contextual elements: 1) cultural norms and values, in this case related to the role of health services researchers in evidence-based quality improvement; 2) capacity, in this case among researchers and key partners to engage in implementation research; 3) and supportive infrastructures to reinforce expectations for change and to sustain new behaviors as part of the norm. As part of a QUERI Series in Implementation Science, this article describes the framework's application in an innovative integration of health services research, policy, and clinical care delivery. Conclusion QUERI's experience and success provide a case study in organizational change. It demonstrates that progress requires a strategic, systems-based effort. QUERI's evidence-based initiative involved a deliberate cultural shift, requiring ongoing commitment in multiple forms and at multiple levels. VA's commitment to QUERI came in the form of visionary leadership, targeted allocation of resources, infrastructure refinements, innovative peer review and study methods, and direct involvement of key stakeholders. Stakeholders included both those providing and managing clinical care, as well as those producing relevant evidence within the health care system. The organizational framework and related implementation interventions used to achieve contextual change resulted in engaged investigators and enhanced uptake of research knowledge. QUERI's approach and progress provide working hypotheses for others pursuing similar system-wide efforts to routinely achieve evidence-based care. PMID:18510750
NASA Astrophysics Data System (ADS)
Leroy, Norah
2017-08-01
This paper addresses the theme of social inclusion through language learning. The focus is on an ad-hoc tutoring scheme set up between newly arrived British migrant pupils and French monolingual pupils in a small secondary school in the south-west of France. Though the original objective of this tutoring scheme was to improve the English skills of the younger pupils, feedback reports indicated that it also had a positive impact on the relationship between the British migrant pupils and their French peers. Teachers believed that those involved participated more fully in class, and appeared more self-assured and generally happy thanks to the interpersonal relationships this scheme helped to forge. This study demonstrates the necessity of analysing the socio-cultural context migrants may find themselves in, in order to identify potential challenges. The ad-hoc tutoring scheme described here is an example of how language learning can support the integration and inclusion of "new generation" migrants into everyday school life.
2014-01-01
Introduction: The Internet is revolutionizing tobacco control, but few have harnessed the Web for surveillance. We demonstrate for the first time an approach for analyzing aggregate Internet search queries that captures precise changes in population considerations about tobacco. Methods: We compared tobacco-related Google queries originating in the United States during the week of the State Children’s Health Insurance Program (SCHIP) 2009 cigarette excise tax increase with a historic baseline. Specific queries were then ranked according to their relative increases while also considering approximations of changes in absolute search volume. Results: Individual queries with the largest relative increases the week of the SCHIP tax were “cigarettes Indian reservations” 640% (95% CI, 472–918), “free cigarettes online” 557% (95% CI, 432–756), and “Indian reservations cigarettes” 542% (95% CI, 414–733), amounting to about 7,500 excess searches. By themes, the largest relative increases were tribal cigarettes 246% (95% CI, 228–265), “free” cigarettes 215% (95% CI, 191–242), and cigarette stores 176% (95% CI, 160–193), accounting for 21,000, 27,000, and 90,000 excess queries. All avoidance queries, including those aforementioned themes, relatively increased 150% (95% CI, 144–155) or 550,000 from their baseline. All cessation queries increased 46% (95% CI, 44–48), or 175,000, around SCHIP; including themes for “cold turkey” 19% (95% CI, 11–27) or 2,600, cessation products 47% (95% CI, 44–50) or 78,000, and dubious cessation approaches (e.g., hypnosis) 40% (95% CI, 33–47) or 2,300. Conclusions: The SCHIP tax motivated specific changes in population considerations. Our strategy can support evaluations that temporally link tobacco control measures with instantaneous population reactions, as well as serve as a springboard for traditional studies, for example, including survey questionnaire design. PMID:24323570
YAHA: fast and flexible long-read alignment with optimal breakpoint detection.
Faust, Gregory G; Hall, Ira M
2012-10-01
With improved short-read assembly algorithms and the recent development of long-read sequencers, split mapping will soon be the preferred method for structural variant (SV) detection. Yet, current alignment tools are not well suited for this. We present YAHA, a fast and flexible hash-based aligner. YAHA is as fast and accurate as BWA-SW at finding the single best alignment per query and is dramatically faster and more sensitive than both SSAHA2 and MegaBLAST at finding all possible alignments. Unlike other aligners that report all, or one, alignment per query, or that use simple heuristics to select alignments, YAHA uses a directed acyclic graph to find the optimal set of alignments that cover a query using a biologically relevant breakpoint penalty. YAHA can also report multiple mappings per defined segment of the query. We show that YAHA detects more breakpoints in less time than BWA-SW across all SV classes, and especially excels at complex SVs comprising multiple breakpoints. YAHA is currently supported on 64-bit Linux systems. Binaries and sample data are freely available for download from http://faculty.virginia.edu/irahall/YAHA. imh4y@virginia.edu.
Automated Database Mediation Using Ontological Metadata Mappings
Marenco, Luis; Wang, Rixin; Nadkarni, Prakash
2009-01-01
Objective To devise an automated approach for integrating federated database information using database ontologies constructed from their extended metadata. Background One challenge of database federation is that the granularity of representation of equivalent data varies across systems. Dealing effectively with this problem is analogous to dealing with precoordinated vs. postcoordinated concepts in biomedical ontologies. Model Description The authors describe an approach based on ontological metadata mapping rules defined with elements of a global vocabulary, which allows a query specified at one granularity level to fetch data, where possible, from databases within the federation that use different granularities. This is implemented in OntoMediator, a newly developed production component of our previously described Query Integrator System. OntoMediator's operation is illustrated with a query that accesses three geographically separate, interoperating databases. An example based on SNOMED also illustrates the applicability of high-level rules to support the enforcement of constraints that can prevent inappropriate curator or power-user actions. Summary A rule-based framework simplifies the design and maintenance of systems where categories of data must be mapped to each other, for the purpose of either cross-database query or for curation of the contents of compositional controlled vocabularies. PMID:19567801
Query-Biased Preview over Outsourced and Encrypted Data
Luo, Guangchun; Qin, Ke; Chen, Aiguo
2013-01-01
For both convenience and security, more and more users encrypt their sensitive data before outsourcing it to a third party such as cloud storage service. However, searching for the desired documents becomes problematic since it is costly to download and decrypt each possibly needed document to check if it contains the desired content. An informative query-biased preview feature, as applied in modern search engine, could help the users to learn about the content without downloading the entire document. However, when the data are encrypted, securely extracting a keyword-in-context snippet from the data as a preview becomes a challenge. Based on private information retrieval protocol and the core concept of searchable encryption, we propose a single-server and two-round solution to securely obtain a query-biased snippet over the encrypted data from the server. We achieve this novel result by making a document (plaintext) previewable under any cryptosystem and constructing a secure index to support dynamic computation for a best matched snippet when queried by some keywords. For each document, the scheme has O(d) storage complexity and O(log(d/s) + s + d/s) communication complexity, where d is the document size and s is the snippet length. PMID:24078798
Query-biased preview over outsourced and encrypted data.
Peng, Ningduo; Luo, Guangchun; Qin, Ke; Chen, Aiguo
2013-01-01
For both convenience and security, more and more users encrypt their sensitive data before outsourcing it to a third party such as cloud storage service. However, searching for the desired documents becomes problematic since it is costly to download and decrypt each possibly needed document to check if it contains the desired content. An informative query-biased preview feature, as applied in modern search engine, could help the users to learn about the content without downloading the entire document. However, when the data are encrypted, securely extracting a keyword-in-context snippet from the data as a preview becomes a challenge. Based on private information retrieval protocol and the core concept of searchable encryption, we propose a single-server and two-round solution to securely obtain a query-biased snippet over the encrypted data from the server. We achieve this novel result by making a document (plaintext) previewable under any cryptosystem and constructing a secure index to support dynamic computation for a best matched snippet when queried by some keywords. For each document, the scheme has O(d) storage complexity and O(log(d/s) + s + d/s) communication complexity, where d is the document size and s is the snippet length.
GenoMetric Query Language: a novel approach to large-scale genomic data management.
Masseroli, Marco; Pinoli, Pietro; Venco, Francesco; Kaitoua, Abdulrahman; Jalili, Vahid; Palluzzi, Fernando; Muller, Heiko; Ceri, Stefano
2015-06-15
Improvement of sequencing technologies and data processing pipelines is rapidly providing sequencing data, with associated high-level features, of many individual genomes in multiple biological and clinical conditions. They allow for data-driven genomic, transcriptomic and epigenomic characterizations, but require state-of-the-art 'big data' computing strategies, with abstraction levels beyond available tool capabilities. We propose a high-level, declarative GenoMetric Query Language (GMQL) and a toolkit for its use. GMQL operates downstream of raw data preprocessing pipelines and supports queries over thousands of heterogeneous datasets and samples; as such it is key to genomic 'big data' analysis. GMQL leverages a simple data model that provides both abstractions of genomic region data and associated experimental, biological and clinical metadata and interoperability between many data formats. Based on Hadoop framework and Apache Pig platform, GMQL ensures high scalability, expressivity, flexibility and simplicity of use, as demonstrated by several biological query examples on ENCODE and TCGA datasets. The GMQL toolkit is freely available for non-commercial use at http://www.bioinformatics.deib.polimi.it/GMQL/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
CellLineNavigator: a workbench for cancer cell line analysis
Krupp, Markus; Itzel, Timo; Maass, Thorsten; Hildebrandt, Andreas; Galle, Peter R.; Teufel, Andreas
2013-01-01
The CellLineNavigator database, freely available at http://www.medicalgenomics.org/celllinenavigator, is a web-based workbench for large scale comparisons of a large collection of diverse cell lines. It aims to support experimental design in the fields of genomics, systems biology and translational biomedical research. Currently, this compendium holds genome wide expression profiles of 317 different cancer cell lines, categorized into 57 different pathological states and 28 individual tissues. To enlarge the scope of CellLineNavigator, the database was furthermore closely linked to commonly used bioinformatics databases and knowledge repositories. To ensure easy data access and search ability, a simple data and an intuitive querying interface were implemented. It allows the user to explore and filter gene expression, focusing on pathological or physiological conditions. For a more complex search, the advanced query interface may be used to query for (i) differentially expressed genes; (ii) pathological or physiological conditions; or (iii) gene names or functional attributes, such as Kyoto Encyclopaedia of Genes and Genomes pathway maps. These queries may also be combined. Finally, CellLineNavigator allows additional advanced analysis of differentially regulated genes by a direct link to the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resources. PMID:23118487
Federated Space-Time Query for Earth Science Data Using OpenSearch Conventions
NASA Astrophysics Data System (ADS)
Lynnes, C.; Beaumont, B.; Duerr, R. E.; Hua, H.
2009-12-01
The past decade has seen a burgeoning of remote sensing and Earth science data providers, as evidenced in the growth of the Earth Science Information Partner (ESIP) federation. At the same time, the need to combine diverse data sets to enable understanding of the Earth as a system has also grown. While the expansion of data providers is in general a boon to such studies, the diversity presents a challenge to finding useful data for a given study. Locating all the data files with aerosol information for a particular volcanic eruption, for example, may involve learning and using several different search tools to execute the requisite space-time queries. To address this issue, the ESIP federation is developing a federated space-time query framework, based on the OpenSearch convention (www.opensearch.org), with Geo and Time extensions. In this framework, data providers publish OpenSearch Description Documents that describe in a machine-readable form how to execute queries against the provider. The novelty of OpenSearch is that the space-time query interface becomes both machine callable and easy enough to integrate into the web browser's search box. This flexibility, together with a simple REST (HTTP-get) interface, should allow a variety of data providers to participate in the federated search framework, from large institutional data centers to individual scientists. The simple interface enables trivial querying of multiple data sources and participation in recursive-like federated searches--all using the same common OpenSearch interface. This simplicity also makes the construction of clients easy, as does existing OpenSearch client libraries in a variety of languages. Moreover, a number of clients and aggregation services already exist and OpenSearch is already supported by a number of web browsers such as Firefox and Internet Explorer.
F. O. Kern, Elizabeth; Beischel, Scott; Stalnaker, Randal; Aron, David C.; Kirsh, Susan R.; Watts, Sharon A.
2008-01-01
Background Little information is available describing how to implement a disease registry from an electronic patient record system. The aim of this report is to describe the technology, methods, and utility of a diabetes registry populated by the Veterans Health Information Systems Architecture (VistA), which underlies the computerized patient record system of the Veterans Health Administration (VHA) in Veteran Affairs Integrated Service Network 10 (VISN 10). Methods VISN 10 data from VistA were mapped to a relational SQL-based data system using KB_SQL software. Operational definitions for diabetes, active clinical management, and responsible providers were used to create views of patient-level data in the diabetes registry. Query Analyzer was used to access the data views directly. Semicustomizable reports were created by linking the diabetes registry to a Web page using Microsoft asp.net2. A retrospective observational study design was used to analyze trends in the process of care and outcomes. Results Since October 2001, 81,227 patients with diabetes have enrolled in VISN 10: approximately 42,000 are currently under active management by VISN 10 providers. By tracking primary care visits, we assigned 91% to a clinic group responsible for diabetes care. In the Cleveland Veterans Affairs Medical Center (VAMC), the frequency of mean annual hemoglobin A1c levels ≥9% has declined significantly over 5 years. Almost 4000 patients have been seen in diabetes intervention programs in the Cleveland VAMC over the past 4 years. Conclusions A diabetes registry can be populated from the database underlying the VHA electronic patient record database system and linked to Web-based and ad hoc queries useful for quality improvement. PMID:19885172
SIRSALE: integrated video database management tools
NASA Astrophysics Data System (ADS)
Brunie, Lionel; Favory, Loic; Gelas, J. P.; Lefevre, Laurent; Mostefaoui, Ahmed; Nait-Abdesselam, F.
2002-07-01
Video databases became an active field of research during the last decade. The main objective in such systems is to provide users with capabilities to friendly search, access and playback distributed stored video data in the same way as they do for traditional distributed databases. Hence, such systems need to deal with hard issues : (a) video documents generate huge volumes of data and are time sensitive (streams must be delivered at a specific bitrate), (b) contents of video data are very hard to be automatically extracted and need to be humanly annotated. To cope with these issues, many approaches have been proposed in the literature including data models, query languages, video indexing etc. In this paper, we present SIRSALE : a set of video databases management tools that allow users to manipulate video documents and streams stored in large distributed repositories. All the proposed tools are based on generic models that can be customized for specific applications using ad-hoc adaptation modules. More precisely, SIRSALE allows users to : (a) browse video documents by structures (sequences, scenes, shots) and (b) query the video database content by using a graphical tool, adapted to the nature of the target video documents. This paper also presents an annotating interface which allows archivists to describe the content of video documents. All these tools are coupled to a video player integrating remote VCR functionalities and are based on active network technology. So, we present how dedicated active services allow an optimized video transport for video streams (with Tamanoir active nodes). We then describe experiments of using SIRSALE on an archive of news video and soccer matches. The system has been demonstrated to professionals with a positive feedback. Finally, we discuss open issues and present some perspectives.
High Performance Analytics with the R3-Cache
NASA Astrophysics Data System (ADS)
Eavis, Todd; Sayeed, Ruhan
Contemporary data warehouses now represent some of the world’s largest databases. As these systems grow in size and complexity, however, it becomes increasingly difficult for brute force query processing approaches to meet the performance demands of end users. Certainly, improved indexing and more selective view materialization are helpful in this regard. Nevertheless, with warehouses moving into the multi-terabyte range, it is clear that the minimization of external memory accesses must be a primary performance objective. In this paper, we describe the R 3-cache, a natively multi-dimensional caching framework designed specifically to support sophisticated warehouse/OLAP environments. R 3-cache is based upon an in-memory version of the R-tree that has been extended to support buffer pages rather than disk blocks. A key strength of the R 3-cache is that it is able to utilize multi-dimensional fragments of previous query results so as to significantly minimize the frequency and scale of disk accesses. Moreover, the new caching model directly accommodates the standard relational storage model and provides mechanisms for pro-active updates that exploit the existence of query “hot spots”. The current prototype has been evaluated as a component of the Sidera DBMS, a “shared nothing” parallel OLAP server designed for multi-terabyte analytics. Experimental results demonstrate significant performance improvements relative to simpler alternatives.
An ontology-based comparative anatomy information system
Travillian, Ravensara S.; Diatchka, Kremena; Judge, Tejinder K.; Wilamowska, Katarzyna; Shapiro, Linda G.
2010-01-01
Introduction This paper describes the design, implementation, and potential use of a comparative anatomy information system (CAIS) for querying on similarities and differences between homologous anatomical structures across species, the knowledge base it operates upon, the method it uses for determining the answers to the queries, and the user interface it employs to present the results. The relevant informatics contributions of our work include (1) the development and application of the structural difference method, a formalism for symbolically representing anatomical similarities and differences across species; (2) the design of the structure of a mapping between the anatomical models of two different species and its application to information about specific structures in humans, mice, and rats; and (3) the design of the internal syntax and semantics of the query language. These contributions provide the foundation for the development of a working system that allows users to submit queries about the similarities and differences between mouse, rat, and human anatomy; delivers result sets that describe those similarities and differences in symbolic terms; and serves as a prototype for the extension of the knowledge base to any number of species. Additionally, we expanded the domain knowledge by identifying medically relevant structural questions for the human, the mouse, and the rat, and made an initial foray into the validation of the application and its content by means of user questionnaires, software testing, and other feedback. Methods The anatomical structures of the species to be compared, as well as the mappings between species, are modeled on templates from the Foundational Model of Anatomy knowledge base, and compared using graph-matching techniques. A graphical user interface allows users to issue queries that retrieve information concerning similarities and differences between structures in the species being examined. Queries from diverse information sources, including domain experts, peer-reviewed articles, and reference books, have been used to test the system and to illustrate its potential use in comparative anatomy studies. Results 157 test queries were submitted to the CAIS system, and all of them were correctly answered. The interface was evaluated in terms of clarity and ease of use. This testing determined that the application works well, and is fairly intuitive to use, but users want to see more clarification of the meaning of the different types of possible queries. Some of the interface issues will naturally be resolved as we refine our conceptual model to deal with partial and complex homologies in the content. Conclusions The CAIS system and its associated methods are expected to be useful to biologists and translational medicine researchers. Possible applications range from supporting theoretical work in clarifying and modeling ontogenetic, physiological, pathological, and evolutionary transformations, to concrete techniques for improving the analysis of genotype–phenotype relationships among various animal models in support of a wide array of clinical and scientific initiatives. PMID:21146377
... the responding members. Read More 20th Annual White Horse Beach Charity Golf Tournament On July 29, 2017 ... and his committee held the 20th Annual White Horse Beach Charity Golf Tournament. Read More '); jQuery('#s5_ ...
Content-based image retrieval on mobile devices
NASA Astrophysics Data System (ADS)
Ahmad, Iftikhar; Abdullah, Shafaq; Kiranyaz, Serkan; Gabbouj, Moncef
2005-03-01
Content-based image retrieval area possesses a tremendous potential for exploration and utilization equally for researchers and people in industry due to its promising results. Expeditious retrieval of desired images requires indexing of the content in large-scale databases along with extraction of low-level features based on the content of these images. With the recent advances in wireless communication technology and availability of multimedia capable phones it has become vital to enable query operation in image databases and retrieve results based on the image content. In this paper we present a content-based image retrieval system for mobile platforms, providing the capability of content-based query to any mobile device that supports Java platform. The system consists of light-weight client application running on a Java enabled device and a server containing a servlet running inside a Java enabled web server. The server responds to image query using efficient native code from selected image database. The client application, running on a mobile phone, is able to initiate a query request, which is handled by a servlet in the server for finding closest match to the queried image. The retrieved results are transmitted over mobile network and images are displayed on the mobile phone. We conclude that such system serves as a basis of content-based information retrieval on wireless devices and needs to cope up with factors such as constraints on hand-held devices and reduced network bandwidth available in mobile environments.
Introducing glycomics data into the Semantic Web
2013-01-01
Background Glycoscience is a research field focusing on complex carbohydrates (otherwise known as glycans)a, which can, for example, serve as “switches” that toggle between different functions of a glycoprotein or glycolipid. Due to the advancement of glycomics technologies that are used to characterize glycan structures, many glycomics databases are now publicly available and provide useful information for glycoscience research. However, these databases have almost no link to other life science databases. Results In order to implement support for the Semantic Web most efficiently for glycomics research, the developers of major glycomics databases agreed on a minimal standard for representing glycan structure and annotation information using RDF (Resource Description Framework). Moreover, all of the participants implemented this standard prototype and generated preliminary RDF versions of their data. To test the utility of the converted data, all of the data sets were uploaded into a Virtuoso triple store, and several SPARQL queries were tested as “proofs-of-concept” to illustrate the utility of the Semantic Web in querying across databases which were originally difficult to implement. Conclusions We were able to successfully retrieve information by linking UniCarbKB, GlycomeDB and JCGGDB in a single SPARQL query to obtain our target information. We also tested queries linking UniProt with GlycoEpitope as well as lectin data with GlycomeDB through PDB. As a result, we have been able to link proteomics data with glycomics data through the implementation of Semantic Web technologies, allowing for more flexible queries across these domains. PMID:24280648
Introducing glycomics data into the Semantic Web.
Aoki-Kinoshita, Kiyoko F; Bolleman, Jerven; Campbell, Matthew P; Kawano, Shin; Kim, Jin-Dong; Lütteke, Thomas; Matsubara, Masaaki; Okuda, Shujiro; Ranzinger, Rene; Sawaki, Hiromichi; Shikanai, Toshihide; Shinmachi, Daisuke; Suzuki, Yoshinori; Toukach, Philip; Yamada, Issaku; Packer, Nicolle H; Narimatsu, Hisashi
2013-11-26
Glycoscience is a research field focusing on complex carbohydrates (otherwise known as glycans)a, which can, for example, serve as "switches" that toggle between different functions of a glycoprotein or glycolipid. Due to the advancement of glycomics technologies that are used to characterize glycan structures, many glycomics databases are now publicly available and provide useful information for glycoscience research. However, these databases have almost no link to other life science databases. In order to implement support for the Semantic Web most efficiently for glycomics research, the developers of major glycomics databases agreed on a minimal standard for representing glycan structure and annotation information using RDF (Resource Description Framework). Moreover, all of the participants implemented this standard prototype and generated preliminary RDF versions of their data. To test the utility of the converted data, all of the data sets were uploaded into a Virtuoso triple store, and several SPARQL queries were tested as "proofs-of-concept" to illustrate the utility of the Semantic Web in querying across databases which were originally difficult to implement. We were able to successfully retrieve information by linking UniCarbKB, GlycomeDB and JCGGDB in a single SPARQL query to obtain our target information. We also tested queries linking UniProt with GlycoEpitope as well as lectin data with GlycomeDB through PDB. As a result, we have been able to link proteomics data with glycomics data through the implementation of Semantic Web technologies, allowing for more flexible queries across these domains.
Exposing the cancer genome atlas as a SPARQL endpoint
Deus, Helena F.; Veiga, Diogo F.; Freire, Pablo R.; Weinstein, John N.; Mills, Gordon B.; Almeida, Jonas S.
2011-01-01
The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to characterize several types of cancer. Datasets from biomedical domains such as TCGA present a particularly challenging task for those interested in dynamically aggregating its results because the data sources are typically both heterogeneous and distributed. The Linked Data best practices offer a solution to integrate and discover data with those characteristics, namely through exposure of data as Web services supporting SPARQL, the Resource Description Framework query language. Most SPARQL endpoints, however, cannot easily be queried by data experts. Furthermore, exposing experimental data as SPARQL endpoints remains a challenging task because, in most cases, data must first be converted to Resource Description Framework triples. In line with those requirements, we have developed an infrastructure to expose clinical, demographic and molecular data elements generated by TCGA as a SPARQL endpoint by assigning elements to entities of the Simple Sloppy Semantic Database (S3DB) management model. All components of the infrastructure are available as independent Representational State Transfer (REST) Web services to encourage reusability, and a simple interface was developed to automatically assemble SPARQL queries by navigating a representation of the TCGA domain. A key feature of the proposed solution that greatly facilitates assembly of SPARQL queries is the distinction between the TCGA domain descriptors and data elements. Furthermore, the use of the S3DB management model as a mediator enables queries to both public and protected data without the need for prior submission to a single data source. PMID:20851208
Automatically finding relevant citations for clinical guideline development.
Bui, Duy Duc An; Jonnalagadda, Siddhartha; Del Fiol, Guilherme
2015-10-01
Literature database search is a crucial step in the development of clinical practice guidelines and systematic reviews. In the age of information technology, the process of literature search is still conducted manually, therefore it is costly, slow and subject to human errors. In this research, we sought to improve the traditional search approach using innovative query expansion and citation ranking approaches. We developed a citation retrieval system composed of query expansion and citation ranking methods. The methods are unsupervised and easily integrated over the PubMed search engine. To validate the system, we developed a gold standard consisting of citations that were systematically searched and screened to support the development of cardiovascular clinical practice guidelines. The expansion and ranking methods were evaluated separately and compared with baseline approaches. Compared with the baseline PubMed expansion, the query expansion algorithm improved recall (80.2% vs. 51.5%) with small loss on precision (0.4% vs. 0.6%). The algorithm could find all citations used to support a larger number of guideline recommendations than the baseline approach (64.5% vs. 37.2%, p<0.001). In addition, the citation ranking approach performed better than PubMed's "most recent" ranking (average precision +6.5%, recall@k +21.1%, p<0.001), PubMed's rank by "relevance" (average precision +6.1%, recall@k +14.8%, p<0.001), and the machine learning classifier that identifies scientifically sound studies from MEDLINE citations (average precision +4.9%, recall@k +4.2%, p<0.001). Our unsupervised query expansion and ranking techniques are more flexible and effective than PubMed's default search engine behavior and the machine learning classifier. Automated citation finding is promising to augment the traditional literature search. Copyright © 2015 Elsevier Inc. All rights reserved.
Masseroli, Marco; Kaitoua, Abdulrahman; Pinoli, Pietro; Ceri, Stefano
2016-12-01
While a huge amount of (epi)genomic data of multiple types is becoming available by using Next Generation Sequencing (NGS) technologies, the most important emerging problem is the so-called tertiary analysis, concerned with sense making, e.g., discovering how different (epi)genomic regions and their products interact and cooperate with each other. We propose a paradigm shift in tertiary analysis, based on the use of the Genomic Data Model (GDM), a simple data model which links genomic feature data to their associated experimental, biological and clinical metadata. GDM encompasses all the data formats which have been produced for feature extraction from (epi)genomic datasets. We specifically describe the mapping to GDM of SAM (Sequence Alignment/Map), VCF (Variant Call Format), NARROWPEAK (for called peaks produced by NGS ChIP-seq or DNase-seq methods), and BED (Browser Extensible Data) formats, but GDM supports as well all the formats describing experimental datasets (e.g., including copy number variations, DNA somatic mutations, or gene expressions) and annotations (e.g., regarding transcription start sites, genes, enhancers or CpG islands). We downloaded and integrated samples of all the above-mentioned data types and formats from multiple sources. The GDM is able to homogeneously describe semantically heterogeneous data and makes the ground for providing data interoperability, e.g., achieved through the GenoMetric Query Language (GMQL), a high-level, declarative query language for genomic big data. The combined use of the data model and the query language allows comprehensive processing of multiple heterogeneous data, and supports the development of domain-specific data-driven computations and bio-molecular knowledge discovery. Copyright © 2016 Elsevier Inc. All rights reserved.
Array Databases: Agile Analytics (not just) for the Earth Sciences
NASA Astrophysics Data System (ADS)
Baumann, P.; Misev, D.
2015-12-01
Gridded data, such as images, image timeseries, and climate datacubes, today are managed separately from the metadata, and with different, restricted retrieval capabilities. While databases are good at metadata modelled in tables, XML hierarchies, or RDF graphs, they traditionally do not support multi-dimensional arrays.This gap is being closed by Array Databases, pioneered by the scalable rasdaman ("raster data manager") array engine. Its declarative query language, rasql, extends SQL with array operators which are optimized and parallelized on server side. Installations can easily be mashed up securely, thereby enabling large-scale location-transparent query processing in federations. Domain experts value the integration with their commonly used tools leading to a quick learning curve.Earth, Space, and Life sciences, but also Social sciences as well as business have massive amounts of data and complex analysis challenges that are answered by rasdaman. As of today, rasdaman is mature and in operational use on hundreds of Terabytes of timeseries datacubes, with transparent query distribution across more than 1,000 nodes. Additionally, its concepts have shaped international Big Data standards in the field, including the forthcoming array extension to ISO SQL, many of which are supported by both open-source and commercial systems meantime. In the geo field, rasdaman is reference implementation for the Open Geospatial Consortium (OGC) Big Data standard, WCS, now also under adoption by ISO. Further, rasdaman is in the final stage of OSGeo incubation.In this contribution we present array queries a la rasdaman, describe the architecture and novel optimization and parallelization techniques introduced in 2015, and put this in context of the intercontinental EarthServer initiative which utilizes rasdaman for enabling agile analytics on Petascale datacubes.
Legaz-García, María Del Carmen; Dentler, Kathrin; Fernández-Breis, Jesualdo Tomás; Cornet, Ronald
2017-01-01
ArchMS is a framework that represents clinical information and knowledge using ontologies in OWL, which facilitates semantic interoperability and thereby the exploitation and secondary use of clinical data. However, it does not yet support the automated assessment of quality of care. CLIF is a stepwise method to formalize quality indicators. The method has been implemented in the CLIF tool which supports its users in generating computable queries based on a patient data model which can be based on archetypes. To enable the automated computation of quality indicators using ontologies and archetypes, we tested whether ArchMS and the CLIF tool can be integrated. We successfully automated the process of generating SPARQL queries from quality indicators that have been formalized with CLIF and integrated them into ArchMS. Hence, ontologies and archetypes can be combined for the execution of formalized quality indicators.
A Distributed Multi-Agent System for Collaborative Information Management and Learning
NASA Technical Reports Server (NTRS)
Chen, James R.; Wolfe, Shawn R.; Wragg, Stephen D.; Koga, Dennis (Technical Monitor)
2000-01-01
In this paper, we present DIAMS, a system of distributed, collaborative agents to help users access, manage, share and exchange information. A DIAMS personal agent helps its owner find information most relevant to current needs. It provides tools and utilities for users to manage their information repositories with dynamic organization and virtual views. Flexible hierarchical display is integrated with indexed query search-to support effective information access. Automatic indexing methods are employed to support user queries and communication between agents. Contents of a repository are kept in object-oriented storage to facilitate information sharing. Collaboration between users is aided by easy sharing utilities as well as automated information exchange. Matchmaker agents are designed to establish connections between users with similar interests and expertise. DIAMS agents provide needed services for users to share and learn information from one another on the World Wide Web.
Indexing and retrieving point and region objects
NASA Astrophysics Data System (ADS)
Ibrahim, Azzam T.; Fotouhi, Farshad A.
1996-03-01
R-tree and its variants are examples of spatial data structures for paged-secondary memory. To process a query, these structures require multiple path traversals. In this paper, we present a new image access method, SB+-tree which requires a single path traversal to process a query. Also, SB+-tree will allow commercial databases an access method for spatial objects without a major change, since most commercial databases already support B+-tree as an access method for text data. The SB+-tree can be used for zero and non-zero size data objects. Non-zero size objects are approximated by their minimum bounding rectangles (MBRs). The number of SB+-trees generated is dependent upon the number of dimensions of the approximation of the object. The structure supports efficient spatial operations such as regions-overlap, distance and direction. In this paper, we experimentally and analytically demonstrate the superiority of SB+-tree over R-tree.
Does your model weigh the same as a Duck?
NASA Astrophysics Data System (ADS)
Jain, Ajay N.; Cleves, Ann E.
2012-01-01
Computer-aided drug design is a mature field by some measures, and it has produced notable successes that underpin the study of interactions between small molecules and living systems. However, unlike a truly mature field, fallacies of logic lie at the heart of the arguments in support of major lines of research on methodology and validation thereof. Two particularly pernicious ones are cum hoc ergo propter hoc (with this, therefore because of this) and confirmation bias (seeking evidence that is confirmatory of the hypothesis at hand). These fallacies will be discussed in the context of off-target predictive modeling, QSAR, molecular similarity computations, and docking. Examples will be shown that avoid these problems.
Exchange of Veterans Affairs medical data using national and local networks.
Dayhoff, R E; Maloney, D L
1992-12-17
Remote data exchange is extremely useful to a number of medical applications. It requires an infrastructure including systems, network and software tools. With such an infrastructure, existing local applications can be extended to serve national needs. There are many approaches to providing remote data exchange. Selection of an approach for an application requires balancing of various factors, including the need for rapid interactive access to data and ad hoc queries, the adequacy of access to predefined data sets, the need for an integrated view of the data, the ability to provide adequate security protection, the amount of data required, and the time frame in which data is required. The applications described here demonstrate new ways that the VA is reaping benefits from its infrastructure and its compatible integrated hospital information systems located at its facilities. The needs that have been met are also needs of private hospitals. However, in many cases the infrastructure to allow data exchange is not present. The VA's experiences may serve to establish the benefits that can be obtained by all hospitals.
Supporting infobuttons with terminological knowledge.
Cimino, J. J.; Elhanan, G.; Zeng, Q.
1997-01-01
We have developed several prototype applications which integrate clinical systems with on-line information resources by using patient data to drive queries in response to user information needs. We refer to these collectively as infobuttons because they are evoked with a minimum of keyboard entry. We make use of knowledge in our terminology, the Medical Entities Dictionary (MED) to assist with the selection of appropriate queries and resources, as well as the translation of patient data to forms recognized by the resources. This paper describes the kinds of knowledge in the MED, including literal attributes, hierarchical links and other semantic links, and how this knowledge is used in system integration. PMID:9357682
Hewitt, Robin; Gobbi, Alberto; Lee, Man-Ling
2005-01-01
Relational databases are the current standard for storing and retrieving data in the pharmaceutical and biotech industries. However, retrieving data from a relational database requires specialized knowledge of the database schema and of the SQL query language. At Anadys, we have developed an easy-to-use system for searching and reporting data in a relational database to support our drug discovery project teams. This system is fast and flexible and allows users to access all data without having to write SQL queries. This paper presents the hierarchical, graph-based metadata representation and SQL-construction methods that, together, are the basis of this system's capabilities.
Supporting infobuttons with terminological knowledge.
Cimino, J J; Elhanan, G; Zeng, Q
1997-01-01
We have developed several prototype applications which integrate clinical systems with on-line information resources by using patient data to drive queries in response to user information needs. We refer to these collectively as infobuttons because they are evoked with a minimum of keyboard entry. We make use of knowledge in our terminology, the Medical Entities Dictionary (MED) to assist with the selection of appropriate queries and resources, as well as the translation of patient data to forms recognized by the resources. This paper describes the kinds of knowledge in the MED, including literal attributes, hierarchical links and other semantic links, and how this knowledge is used in system integration.
Mercury Toolset for Spatiotemporal Metadata
NASA Technical Reports Server (NTRS)
Wilson, Bruce E.; Palanisamy, Giri; Devarakonda, Ranjeet; Rhyne, B. Timothy; Lindsley, Chris; Green, James
2010-01-01
Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily) harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.
Yu, Bowen; Doraiswamy, Harish; Chen, Xi; Miraldi, Emily; Arrieta-Ortiz, Mario Luis; Hafemeister, Christoph; Madar, Aviv; Bonneau, Richard; Silva, Cláudio T
2014-12-01
Elucidation of transcriptional regulatory networks (TRNs) is a fundamental goal in biology, and one of the most important components of TRNs are transcription factors (TFs), proteins that specifically bind to gene promoter and enhancer regions to alter target gene expression patterns. Advances in genomic technologies as well as advances in computational biology have led to multiple large regulatory network models (directed networks) each with a large corpus of supporting data and gene-annotation. There are multiple possible biological motivations for exploring large regulatory network models, including: validating TF-target gene relationships, figuring out co-regulation patterns, and exploring the coordination of cell processes in response to changes in cell state or environment. Here we focus on queries aimed at validating regulatory network models, and on coordinating visualization of primary data and directed weighted gene regulatory networks. The large size of both the network models and the primary data can make such coordinated queries cumbersome with existing tools and, in particular, inhibits the sharing of results between collaborators. In this work, we develop and demonstrate a web-based framework for coordinating visualization and exploration of expression data (RNA-seq, microarray), network models and gene-binding data (ChIP-seq). Using specialized data structures and multiple coordinated views, we design an efficient querying model to support interactive analysis of the data. Finally, we show the effectiveness of our framework through case studies for the mouse immune system (a dataset focused on a subset of key cellular functions) and a model bacteria (a small genome with high data-completeness).
Mercury Toolset for Spatiotemporal Metadata
NASA Astrophysics Data System (ADS)
Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce; Rhyne, B. Timothy; Lindsley, Chris
2010-06-01
Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily)harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.
Oliveira, S R M; Almeida, G V; Souza, K R R; Rodrigues, D N; Kuser-Falcão, P R; Yamagishi, M E B; Santos, E H; Vieira, F D; Jardine, J G; Neshich, G
2007-10-05
An effective strategy for managing protein databases is to provide mechanisms to transform raw data into consistent, accurate and reliable information. Such mechanisms will greatly reduce operational inefficiencies and improve one's ability to better handle scientific objectives and interpret the research results. To achieve this challenging goal for the STING project, we introduce Sting_RDB, a relational database of structural parameters for protein analysis with support for data warehousing and data mining. In this article, we highlight the main features of Sting_RDB and show how a user can explore it for efficient and biologically relevant queries. Considering its importance for molecular biologists, effort has been made to advance Sting_RDB toward data quality assessment. To the best of our knowledge, Sting_RDB is one of the most comprehensive data repositories for protein analysis, now also capable of providing its users with a data quality indicator. This paper differs from our previous study in many aspects. First, we introduce Sting_RDB, a relational database with mechanisms for efficient and relevant queries using SQL. Sting_rdb evolved from the earlier, text (flat file)-based database, in which data consistency and integrity was not guaranteed. Second, we provide support for data warehousing and mining. Third, the data quality indicator was introduced. Finally and probably most importantly, complex queries that could not be posed on a text-based database, are now easily implemented. Further details are accessible at the Sting_RDB demo web page: http://www.cbi.cnptia.embrapa.br/StingRDB.
Fu, Heyun; Wei, Chenhui; Qu, Xiaolei; Li, Hui; Zhu, Dongqiang
2018-01-01
Dissolved black carbon (DBC), the soluble fraction of black carbon (BC), is an important constituent of dissolved organic matter pool. However, little is known about the binding interactions between hydrophobic organic contaminants (HOCs) and DBC and their significance in the fate process. This study determined the binding ability of DBC released from rice-derived BC for a series of apolar HOCs, including four polycyclic aromatic hydrocarbons and four chlorinated benzenes, using batch sorption and solubility enhancement techniques. Bulk BC and a dissolved soil humic acid (DSHA) were included as benchmark sorbents. The organic carbon-normalized sorption coefficient of phenanthrene to DBC was slightly lower than bulk BC, but was over ten folds higher than DSHA. Consistently, DBC was more effective than DSHA in enhancing the apparent water solubility of the tested HOCs, and the enhancement positively correlated with solute n-octanol-water partition coefficient, indicating the predominance of hydrophobic partition. The much higher binding ability of DBC relative to DSHA was mainly attributed to its higher tendency to form pseudomicellar structures as supported by the fluorescence quenching and the pH-edge data. Our findings suggest that DBC might play a significant role in the environmental fate and transport of HOCs as both sorbent and carrier. Copyright © 2017 Elsevier Ltd. All rights reserved.
Classification of Chemical Compounds to Support Complex Queries in a Pathway Database
Weidemann, Andreas; Kania, Renate; Peiss, Christian; Rojas, Isabel
2004-01-01
Data quality in biological databases has become a topic of great discussion. To provide high quality data and to deal with the vast amount of biochemical data, annotators and curators need to be supported by software that carries out part of their work in an (semi-) automatic manner. The detection of errors and inconsistencies is a part that requires the knowledge of domain experts, thus in most cases it is done manually, making it very expensive and time-consuming. This paper presents two tools to partially support the curation of data on biochemical pathways. The tool enables the automatic classification of chemical compounds based on their respective SMILES strings. Such classification allows the querying and visualization of biochemical reactions at different levels of abstraction, according to the level of detail at which the reaction participants are described. Chemical compounds can be classified in a flexible manner based on different criteria. The support of the process of data curation is provided by facilitating the detection of compounds that are identified as different but that are actually the same. This is also used to identify similar reactions and, in turn, pathways. PMID:18629066
A semantic web framework to integrate cancer omics data with biological knowledge
2012-01-01
Background The RDF triple provides a simple linguistic means of describing limitless types of information. Triples can be flexibly combined into a unified data source we call a semantic model. Semantic models open new possibilities for the integration of variegated biological data. We use Semantic Web technology to explicate high throughput clinical data in the context of fundamental biological knowledge. We have extended Corvus, a data warehouse which provides a uniform interface to various forms of Omics data, by providing a SPARQL endpoint. With the querying and reasoning tools made possible by the Semantic Web, we were able to explore quantitative semantic models retrieved from Corvus in the light of systematic biological knowledge. Results For this paper, we merged semantic models containing genomic, transcriptomic and epigenomic data from melanoma samples with two semantic models of functional data - one containing Gene Ontology (GO) data, the other, regulatory networks constructed from transcription factor binding information. These two semantic models were created in an ad hoc manner but support a common interface for integration with the quantitative semantic models. Such combined semantic models allow us to pose significant translational medicine questions. Here, we study the interplay between a cell's molecular state and its response to anti-cancer therapy by exploring the resistance of cancer cells to Decitabine, a demethylating agent. Conclusions We were able to generate a testable hypothesis to explain how Decitabine fights cancer - namely, that it targets apoptosis-related gene promoters predominantly in Decitabine-sensitive cell lines, thus conveying its cytotoxic effect by activating the apoptosis pathway. Our research provides a framework whereby similar hypotheses can be developed easily. PMID:22373303
Got, Jeanne; Cortés, María Paz; Maass, Alejandro
2018-01-01
Genome-scale metabolic models have become the tool of choice for the global analysis of microorganism metabolism, and their reconstruction has attained high standards of quality and reliability. Improvements in this area have been accompanied by the development of some major platforms and databases, and an explosion of individual bioinformatics methods. Consequently, many recent models result from “à la carte” pipelines, combining the use of platforms, individual tools and biological expertise to enhance the quality of the reconstruction. Although very useful, introducing heterogeneous tools, that hardly interact with each other, causes loss of traceability and reproducibility in the reconstruction process. This represents a real obstacle, especially when considering less studied species whose metabolic reconstruction can greatly benefit from the comparison to good quality models of related organisms. This work proposes an adaptable workspace, AuReMe, for sustainable reconstructions or improvements of genome-scale metabolic models involving personalized pipelines. At each step, relevant information related to the modifications brought to the model by a method is stored. This ensures that the process is reproducible and documented regardless of the combination of tools used. Additionally, the workspace establishes a way to browse metabolic models and their metadata through the automatic generation of ad-hoc local wikis dedicated to monitoring and facilitating the process of reconstruction. AuReMe supports exploration and semantic query based on RDF databases. We illustrate how this workspace allowed handling, in an integrated way, the metabolic reconstructions of non-model organisms such as an extremophile bacterium or eukaryote algae. Among relevant applications, the latter reconstruction led to putative evolutionary insights of a metabolic pathway. PMID:29791443
A Mass Spectrometry Proteomics Data Management Platform*
Sharma, Vagisha; Eng, Jimmy K.; MacCoss, Michael J.; Riffle, Michael
2012-01-01
Mass spectrometry-based proteomics is increasingly being used in biomedical research. These experiments typically generate a large volume of highly complex data, and the volume and complexity are only increasing with time. There exist many software pipelines for analyzing these data (each typically with its own file formats), and as technology improves, these file formats change and new formats are developed. Files produced from these myriad software programs may accumulate on hard disks or tape drives over time, with older files being rendered progressively more obsolete and unusable with each successive technical advancement and data format change. Although initiatives exist to standardize the file formats used in proteomics, they do not address the core failings of a file-based data management system: (1) files are typically poorly annotated experimentally, (2) files are “organically” distributed across laboratory file systems in an ad hoc manner, (3) files formats become obsolete, and (4) searching the data and comparing and contrasting results across separate experiments is very inefficient (if possible at all). Here we present a relational database architecture and accompanying web application dubbed Mass Spectrometry Data Platform that is designed to address the failings of the file-based mass spectrometry data management approach. The database is designed such that the output of disparate software pipelines may be imported into a core set of unified tables, with these core tables being extended to support data generated by specific pipelines. Because the data are unified, they may be queried, viewed, and compared across multiple experiments using a common web interface. Mass Spectrometry Data Platform is open source and freely available at http://code.google.com/p/msdapl/. PMID:22611296
Integrated Behavioral Health Care in Family Medicine Residencies A CERA Survey.
Jacobs, Christine; Brieler, Jay A; Salas, Joanne; Betancourt, Renée M; Cronholm, Peter F
2018-05-01
Behavioral health integration (BHI) in primary care settings is critical to mental health care in the United States. Family medicine resident experience in BHI in family medicine residency (FMR) continuity clinics is essential preparation for practice. We surveyed FMR program directors to characterize the status of BHI in FMR training. Using the Council of Academic Family Medicine Educational Research Alliance (CERA) 2017 survey, FMR program directors (n=478, 261 respondents, 54.6% response rate) were queried regarding the stage of BHI within the residency family medicine center (FMC), integration activities at the FMC, and the professions of the BH faculty. BHI was characterized by Substance Abuse and Mental Health Services Agency (SAMHSA) designations within FMRs, and chi-square or ANOVA with Tukey honest significant difference (HSD) post hoc testing was used to assess differences in reported BHI attributes. Program directors reported a high level of BHI in their FMCs (44.1% full integration, 33.7% colocated). Higher levels of BHI were associated with increased use of warm handoffs, same day consultation, shared health records, and the use of behavioral health (BH) professionals for both mental health and medical issues. Family physicians, psychiatrists, and psychologists were most likely to be training residents in BHI. Almost half of FMR programs have colocated BH care or fully integrated BH as defined by SAMHSA. Highly integrated FMRs use a diversity of behavioral professionals and activities. Residencies currently at the collaboration stage could increase BH provider types and BHI practices to better prepare residents for practice. Residencies with full BHI may consider focusing on supporting BHI-trained residents transitioning into practice, or disseminating the model in the general primary care community.
Ghandikota, Sudhir; Hershey, Gurjit K Khurana; Mersha, Tesfaye B
2018-03-24
Advances in high-throughput sequencing technologies have made it possible to generate multiple omics data at an unprecedented rate and scale. The accumulation of these omics data far outpaces the rate at which biologists can mine and generate new hypothesis to test experimentally. There is an urgent need to develop a myriad of powerful tools to efficiently and effectively search and filter these resources to address specific post-GWAS functional genomics questions. However, to date, these resources are scattered across several databases and often lack a unified portal for data annotation and analytics. In addition, existing tools to analyze and visualize these databases are highly fragmented, resulting researchers to access multiple applications and manual interventions for each gene or variant in an ad hoc fashion until all the questions are answered. In this study, we present GENEASE, a web-based one-stop bioinformatics tool designed to not only query and explore multi-omics and phenotype databases (e.g., GTEx, ClinVar, dbGaP, GWAS Catalog, ENCODE, Roadmap Epigenomics, KEGG, Reactome, Gene and Phenotype Ontology) in a single web interface but also to perform seamless post genome-wide association downstream functional and overlap analysis for non-coding regulatory variants. GENEASE accesses over 50 different databases in public domain including model organism-specific databases to facilitate gene/variant and disease exploration, enrichment and overlap analysis in real time. It is a user-friendly tool with point-and-click interface containing links for support information including user manual and examples. GENEASE can be accessed freely at http://research.cchmc.org/mershalab/genease_new/login.html. Tesfaye.Mersha@cchmc.org, Sudhir.Ghandikota@cchmc.org. Supplementary data are available at Bioinformatics online.
State & Society: Presidential Candidates Answer Queries on Science Policy
ERIC Educational Resources Information Center
Physics Today, 1976
1976-01-01
Presents views of Gerald Ford and Jimmy Carter on the role of science advisors in the Executive Office of the President, national energy needs and the nuclear power program, and federal support for basic and applied science. (MLH)
Data Sharing in DHT Based P2P Systems
NASA Astrophysics Data System (ADS)
Roncancio, Claudia; Del Pilar Villamil, María; Labbé, Cyril; Serrano-Alvarado, Patricia
The evolution of peer-to-peer (P2P) systems triggered the building of large scale distributed applications. The main application domain is data sharing across a very large number of highly autonomous participants. Building such data sharing systems is particularly challenging because of the “extreme” characteristics of P2P infrastructures: massive distribution, high churn rate, no global control, potentially untrusted participants... This article focuses on declarative querying support, query optimization and data privacy on a major class of P2P systems, that based on Distributed Hash Table (P2P DHT). The usual approaches and the algorithms used by classic distributed systems and databases for providing data privacy and querying services are not well suited to P2P DHT systems. A considerable amount of work was required to adapt them for the new challenges such systems present. This paper describes the most important solutions found. It also identifies important future research trends in data management in P2P DHT systems.
Query optimization for graph analytics on linked data using SPARQL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hong, Seokyong; Lee, Sangkeun; Lim, Seung -Hwan
2015-07-01
Triplestores that support query languages such as SPARQL are emerging as the preferred and scalable solution to represent data and meta-data as massive heterogeneous graphs using Semantic Web standards. With increasing adoption, the desire to conduct graph-theoretic mining and exploratory analysis has also increased. Addressing that desire, this paper presents a solution that is the marriage of Graph Theory and the Semantic Web. We present software that can analyze Linked Data using graph operations such as counting triangles, finding eccentricity, testing connectedness, and computing PageRank directly on triple stores via the SPARQL interface. We describe the process of optimizing performancemore » of the SPARQL-based implementation of such popular graph algorithms by reducing the space-overhead, simplifying iterative complexity and removing redundant computations by understanding query plans. Our optimized approach shows significant performance gains on triplestores hosted on stand-alone workstations as well as hardware-optimized scalable supercomputers such as the Cray XMT.« less
Image query and indexing for digital x rays
NASA Astrophysics Data System (ADS)
Long, L. Rodney; Thoma, George R.
1998-12-01
The web-based medical information retrieval system (WebMIRS) allows interned access to databases containing 17,000 digitized x-ray spine images and associated text data from National Health and Nutrition Examination Surveys (NHANES). WebMIRS allows SQL query of the text, and viewing of the returned text records and images using a standard browser. We are now working (1) to determine utility of data directly derived from the images in our databases, and (2) to investigate the feasibility of computer-assisted or automated indexing of the images to support image retrieval of images of interest to biomedical researchers in the field of osteoarthritis. To build an initial database based on image data, we are manually segmenting a subset of the vertebrae, using techniques from vertebral morphometry. From this, we will derive and add to the database vertebral features. This image-derived data will enhance the user's data access capability by enabling the creation of combined SQL/image-content queries.
Toward An Unstructured Mesh Database
NASA Astrophysics Data System (ADS)
Rezaei Mahdiraji, Alireza; Baumann, Peter Peter
2014-05-01
Unstructured meshes are used in several application domains such as earth sciences (e.g., seismology), medicine, oceanography, cli- mate modeling, GIS as approximate representations of physical objects. Meshes subdivide a domain into smaller geometric elements (called cells) which are glued together by incidence relationships. The subdivision of a domain allows computational manipulation of complicated physical structures. For instance, seismologists model earthquakes using elastic wave propagation solvers on hexahedral meshes. The hexahedral con- tains several hundred millions of grid points and millions of hexahedral cells. Each vertex node in the hexahedrals stores a multitude of data fields. To run simulation on such meshes, one needs to iterate over all the cells, iterate over incident cells to a given cell, retrieve coordinates of cells, assign data values to cells, etc. Although meshes are used in many application domains, to the best of our knowledge there is no database vendor that support unstructured mesh features. Currently, the main tool for querying and manipulating unstructured meshes are mesh libraries, e.g., CGAL and GRAL. Mesh li- braries are dedicated libraries which includes mesh algorithms and can be run on mesh representations. The libraries do not scale with dataset size, do not have declarative query language, and need deep C++ knowledge for query implementations. Furthermore, due to high coupling between the implementations and input file structure, the implementations are less reusable and costly to maintain. A dedicated mesh database offers the following advantages: 1) declarative querying, 2) ease of maintenance, 3) hiding mesh storage structure from applications, and 4) transparent query optimization. To design a mesh database, the first challenge is to define a suitable generic data model for unstructured meshes. We proposed ImG-Complexes data model as a generic topological mesh data model which extends incidence graph model to multi-incidence relationships. We instrument ImG model with sets of optional and application-specific constraints which can be used to check validity of meshes for a specific class of object such as manifold, pseudo-manifold, and simplicial manifold. We conducted experiments to measure the performance of the graph database solution in processing mesh queries and compare it with GrAL mesh library and PostgreSQL database on synthetic and real mesh datasets. The experiments show that each system perform well on specific types of mesh queries, e.g., graph databases perform well on global path-intensive queries. In the future, we investigate database operations for the ImG model and design a mesh query language.
Sundanese ancient manuscripts search engine using probability approach
NASA Astrophysics Data System (ADS)
Suryani, Mira; Hadi, Setiawan; Paulus, Erick; Nurma Yulita, Intan; Supriatna, Asep K.
2017-10-01
Today, Information and Communication Technology (ICT) has become a regular thing for every aspect of live include cultural and heritage aspect. Sundanese ancient manuscripts as Sundanese heritage are in damage condition and also the information that containing on it. So in order to preserve the information in Sundanese ancient manuscripts and make them easier to search, a search engine has been developed. The search engine must has good computing ability. In order to get the best computation in developed search engine, three types of probabilistic approaches: Bayesian Networks Model, Divergence from Randomness with PL2 distribution, and DFR-PL2F as derivative form DFR-PL2 have been compared in this study. The three probabilistic approaches supported by index of documents and three different weighting methods: term occurrence, term frequency, and TF-IDF. The experiment involved 12 Sundanese ancient manuscripts. From 12 manuscripts there are 474 distinct terms. The developed search engine tested by 50 random queries for three types of query. The experiment results showed that for the single query and multiple query, the best searching performance given by the combination of PL2F approach and TF-IDF weighting method. The performance has been evaluated using average time responds with value about 0.08 second and Mean Average Precision (MAP) about 0.33.
Omicseq: a web-based search engine for exploring omics datasets
Sun, Xiaobo; Pittard, William S.; Xu, Tianlei; Chen, Li; Zwick, Michael E.; Jiang, Xiaoqian; Wang, Fusheng
2017-01-01
Abstract The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve ‘findability’ of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. PMID:28402462
Human motion retrieval from hand-drawn sketch.
Chao, Min-Wen; Lin, Chao-Hung; Assa, Jackie; Lee, Tong-Yee
2012-05-01
The rapid growth of motion capture data increases the importance of motion retrieval. The majority of the existing motion retrieval approaches are based on a labor-intensive step in which the user browses and selects a desired query motion clip from the large motion clip database. In this work, a novel sketching interface for defining the query is presented. This simple approach allows users to define the required motion by sketching several motion strokes over a drawn character, which requires less effort and extends the users’ expressiveness. To support the real-time interface, a specialized encoding of the motions and the hand-drawn query is required. Here, we introduce a novel hierarchical encoding scheme based on a set of orthonormal spherical harmonic (SH) basis functions, which provides a compact representation, and avoids the CPU/processing intensive stage of temporal alignment used by previous solutions. Experimental results show that the proposed approach can well retrieve the motions, and is capable of retrieve logically and numerically similar motions, which is superior to previous approaches. The user study shows that the proposed system can be a useful tool to input motion query if the users are familiar with it. Finally, an application of generating a 3D animation from a hand-drawn comics strip is demonstrated.
A rank-based Prediction Algorithm of Learning User's Intention
NASA Astrophysics Data System (ADS)
Shen, Jie; Gao, Ying; Chen, Cang; Gong, HaiPing
Internet search has become an important part in people's daily life. People can find many types of information to meet different needs through search engines on the Internet. There are two issues for the current search engines: first, the users should predetermine the types of information they want and then change to the appropriate types of search engine interfaces. Second, most search engines can support multiple kinds of search functions, each function has its own separate search interface. While users need different types of information, they must switch between different interfaces. In practice, most queries are corresponding to various types of information results. These queries can search the relevant results in various search engines, such as query "Palace" contains the websites about the introduction of the National Palace Museum, blog, Wikipedia, some pictures and video information. This paper presents a new aggregative algorithm for all kinds of search results. It can filter and sort the search results by learning three aspects about the query words, search results and search history logs to achieve the purpose of detecting user's intention. Experiments demonstrate that this rank-based method for multi-types of search results is effective. It can meet the user's search needs well, enhance user's satisfaction, provide an effective and rational model for optimizing search engines and improve user's search experience.
An Improvement to a Multi-Client Searchable Encryption Scheme for Boolean Queries.
Jiang, Han; Li, Xue; Xu, Qiuliang
2016-12-01
The migration of e-health systems to the cloud computing brings huge benefits, as same as some security risks. Searchable Encryption(SE) is a cryptography encryption scheme that can protect the confidentiality of data and utilize the encrypted data at the same time. The SE scheme proposed by Cash et al. in Crypto2013 and its follow-up work in CCS2013 are most practical SE Scheme that support Boolean queries at present. In their scheme, the data user has to generate the search tokens by the counter number one by one and interact with server repeatedly, until he meets the correct one, or goes through plenty of tokens to illustrate that there is no search result. In this paper, we make an improvement to their scheme. We allow server to send back some information and help the user to generate exact search token in the search phase. In our scheme, there are only two round interaction between server and user, and the search token has [Formula: see text] elements, where n is the keywords number in query expression, and [Formula: see text] is the minimum documents number that contains one of keyword in query expression, and the computation cost of server is [Formula: see text] modular exponentiation operation.
Harvesting implementation for the GI-cat distributed catalog
NASA Astrophysics Data System (ADS)
Boldrini, Enrico; Papeschi, Fabrizio; Bigagli, Lorenzo; Mazzetti, Paolo
2010-05-01
GI-cat framework implements a distributed catalog service supporting different international standards and interoperability arrangements in use by the geoscientific community. The distribution functionality in conjunction with the mediation functionality allows to seamlessly query remote heterogeneous data sources, including OGC Web Services - e.e. OGC CSW, WCS, WFS and WMS, community standards such as UNIDATA THREDDS/OPeNDAP, SeaDataNet CDI (Common Data Index), GBIF (Global Biodiversity Information Facility) services and OpenSearch engines. In the GI-cat modular architecture a distributor component carry out the distribution functionality by query delegation to the mediator components (one for each different data source). Each of these mediator components is able to query a specific data source and convert back the results by mapping of the foreign data model to the GI-cat internal one, based on ISO 19139. In order to cope with deployment scenarios in which local data is expected, an harvesting approach has been experimented. The new strategy comes in addition to the consolidated distributed approach, allowing the user to switch between a remote and a local search at will for each federated resource; this extends GI-cat configuration possibilities. The harvesting strategy is designed in GI-cat by the use at the core of a local cache component, implemented as a native XML database and based on eXist. The different heterogeneous sources are queried for the bulk of available data; this data is then injected into the cache component after being converted to the GI-cat data model. The query and conversion steps are performed by the mediator components that were are part of the GI-cat framework. Afterward each new query can be exercised against local data that have been stored in the cache component. Considering both advantages and shortcomings that affect harvesting and query distribution approaches, it comes out that a user driven tuning is required to take the best of them. This is often related to the specific user scenarios to be implemented. GI-cat proved to be a flexible framework to address user need. The GI-cat configurator tool was updated to make such a tuning possible: each data source can be configured to enable either harvesting or query distribution approaches; in the former case an appropriate harvesting interval can be set.
Mirel, Barbara
2009-02-13
Current usability studies of bioinformatics tools suggest that tools for exploratory analysis support some tasks related to finding relationships of interest but not the deep causal insights necessary for formulating plausible and credible hypotheses. To better understand design requirements for gaining these causal insights in systems biology analyses a longitudinal field study of 15 biomedical researchers was conducted. Researchers interacted with the same protein-protein interaction tools to discover possible disease mechanisms for further experimentation. Findings reveal patterns in scientists' exploratory and explanatory analysis and reveal that tools positively supported a number of well-structured query and analysis tasks. But for several of scientists' more complex, higher order ways of knowing and reasoning the tools did not offer adequate support. Results show that for a better fit with scientists' cognition for exploratory analysis systems biology tools need to better match scientists' processes for validating, for making a transition from classification to model-based reasoning, and for engaging in causal mental modelling. As the next great frontier in bioinformatics usability, tool designs for exploratory systems biology analysis need to move beyond the successes already achieved in supporting formulaic query and analysis tasks and now reduce current mismatches with several of scientists' higher order analytical practices. The implications of results for tool designs are discussed.
Salehi, Ali; Jimenez-Berni, Jose; Deery, David M; Palmer, Doug; Holland, Edward; Rozas-Larraondo, Pablo; Chapman, Scott C; Georgakopoulos, Dimitrios; Furbank, Robert T
2015-01-01
To our knowledge, there is no software or database solution that supports large volumes of biological time series sensor data efficiently and enables data visualization and analysis in real time. Existing solutions for managing data typically use unstructured file systems or relational databases. These systems are not designed to provide instantaneous response to user queries. Furthermore, they do not support rapid data analysis and visualization to enable interactive experiments. In large scale experiments, this behaviour slows research discovery, discourages the widespread sharing and reuse of data that could otherwise inform critical decisions in a timely manner and encourage effective collaboration between groups. In this paper we present SensorDB, a web based virtual laboratory that can manage large volumes of biological time series sensor data while supporting rapid data queries and real-time user interaction. SensorDB is sensor agnostic and uses web-based, state-of-the-art cloud and storage technologies to efficiently gather, analyse and visualize data. Collaboration and data sharing between different agencies and groups is thereby facilitated. SensorDB is available online at http://sensordb.csiro.au.
Morrison, James J; Hostetter, Jason; Wang, Kenneth; Siegel, Eliot L
2015-02-01
Real-time mining of large research trial datasets enables development of case-based clinical decision support tools. Several applicable research datasets exist including the National Lung Screening Trial (NLST), a dataset unparalleled in size and scope for studying population-based lung cancer screening. Using these data, a clinical decision support tool was developed which matches patient demographics and lung nodule characteristics to a cohort of similar patients. The NLST dataset was converted into Structured Query Language (SQL) tables hosted on a web server, and a web-based JavaScript application was developed which performs real-time queries. JavaScript is used for both the server-side and client-side language, allowing for rapid development of a robust client interface and server-side data layer. Real-time data mining of user-specified patient cohorts achieved a rapid return of cohort cancer statistics and lung nodule distribution information. This system demonstrates the potential of individualized real-time data mining using large high-quality clinical trial datasets to drive evidence-based clinical decision-making.
Maras, Melissa A; Weston, Karen J; Blacksmith, Jennifer; Brophy, Chelsey
2015-03-01
Schools must possess a variety of capacities to effectively support comprehensive and coordinated school health promotion activities, and researchers have developed a district-level capacity-building framework specific to school health promotion. State-level school health coalitions often support such capacity-building efforts and should embed this work within a data-based, decision-making model. However, there is a lack of guidance for state school health coalitions on how they should collect and use data. This article uses a district-level capacity-building framework to interpret findings from a statewide coordinated school health needs/resource assessment in order to examine statewide capacity for school health promotion. Participants included school personnel (N = 643) from one state. Descriptive statistics were calculated for survey items, with further examination of subgroup differences among school administrators and nurses. Results were then interpreted via a post hoc application of a district-level capacity-building framework. Findings across districts revealed statewide strengths and gaps with regard to leadership and management capacities, internal and external supports, and an indicator of global capacity. Findings support the utility of using a common framework across local and state levels to align efforts and embed capacity-building activities within a data-driven, continuous improvement model. © 2014 Society for Public Health Education.
Secure searching of biomarkers through hybrid homomorphic encryption scheme.
Kim, Miran; Song, Yongsoo; Cheon, Jung Hee
2017-07-26
As genome sequencing technology develops rapidly, there has lately been an increasing need to keep genomic data secure even when stored in the cloud and still used for research. We are interested in designing a protocol for the secure outsourcing matching problem on encrypted data. We propose an efficient method to securely search a matching position with the query data and extract some information at the position. After decryption, only a small amount of comparisons with the query information should be performed in plaintext state. We apply this method to find a set of biomarkers in encrypted genomes. The important feature of our method is to encode a genomic database as a single element of polynomial ring. Since our method requires a single homomorphic multiplication of hybrid scheme for query computation, it has the advantage over the previous methods in parameter size, computation complexity, and communication cost. In particular, the extraction procedure not only prevents leakage of database information that has not been queried by user but also reduces the communication cost by half. We evaluate the performance of our method and verify that the computation on large-scale personal data can be securely and practically outsourced to a cloud environment during data analysis. It takes about 3.9 s to search-and-extract the reference and alternate sequences at the queried position in a database of size 4M. Our solution for finding a set of biomarkers in DNA sequences shows the progress of cryptographic techniques in terms of their capability can support real-world genome data analysis in a cloud environment.
Exposing the cancer genome atlas as a SPARQL endpoint.
Deus, Helena F; Veiga, Diogo F; Freire, Pablo R; Weinstein, John N; Mills, Gordon B; Almeida, Jonas S
2010-12-01
The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to characterize several types of cancer. Datasets from biomedical domains such as TCGA present a particularly challenging task for those interested in dynamically aggregating its results because the data sources are typically both heterogeneous and distributed. The Linked Data best practices offer a solution to integrate and discover data with those characteristics, namely through exposure of data as Web services supporting SPARQL, the Resource Description Framework query language. Most SPARQL endpoints, however, cannot easily be queried by data experts. Furthermore, exposing experimental data as SPARQL endpoints remains a challenging task because, in most cases, data must first be converted to Resource Description Framework triples. In line with those requirements, we have developed an infrastructure to expose clinical, demographic and molecular data elements generated by TCGA as a SPARQL endpoint by assigning elements to entities of the Simple Sloppy Semantic Database (S3DB) management model. All components of the infrastructure are available as independent Representational State Transfer (REST) Web services to encourage reusability, and a simple interface was developed to automatically assemble SPARQL queries by navigating a representation of the TCGA domain. A key feature of the proposed solution that greatly facilitates assembly of SPARQL queries is the distinction between the TCGA domain descriptors and data elements. Furthermore, the use of the S3DB management model as a mediator enables queries to both public and protected data without the need for prior submission to a single data source. Copyright © 2010 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Xiong, Wei; Qiu, Bo; Tian, Qi; Mueller, Henning; Xu, Changsheng
2005-04-01
Medical image retrieval is still mainly a research domain with a large variety of applications and techniques. With the ImageCLEF 2004 benchmark, an evaluation framework has been created that includes a database, query topics and ground truth data. Eleven systems (with a total of more than 50 runs) compared their performance in various configurations. The results show that there is not any one feature that performs well on all query tasks. Key to successful retrieval is rather the selection of features and feature weights based on a specific set of input features, thus on the query task. In this paper we propose a novel method based on query topic dependent image features (QTDIF) for content-based medical image retrieval. These feature sets are designed to capture both inter-category and intra-category statistical variations to achieve good retrieval performance in terms of recall and precision. We have used Gaussian Mixture Models (GMM) and blob representation to model medical images and construct the proposed novel QTDIF for CBIR. Finally, trained multi-class support vector machines (SVM) are used for image similarity ranking. The proposed methods have been tested over the Casimage database with around 9000 images, for the given 26 image topics, used for imageCLEF 2004. The retrieval performance has been compared with the medGIFT system, which is based on the GNU Image Finding Tool (GIFT). The experimental results show that the proposed QTDIF-based CBIR can provide significantly better performance than systems based general features only.
Silva, Sara; Gouveia-Oliveira, Rodrigo; Maretzek, António; Carriço, João; Gudnason, Thorolfur; Kristinsson, Karl G; Ekdahl, Karl; Brito-Avô, António; Tomasz, Alexander; Sanches, Ilda Santos; Lencastre, Hermínia de; Almeida, Jonas
2003-01-01
Background EURIS (European Resistance Intervention Study) was launched as a multinational study in September of 2000 to identify the multitude of complex risk factors that contribute to the high carriage rate of drug resistant Streptococcus pneumoniae strains in children attending Day Care Centers in several European countries. Access to the very large number of data required the development of a web-based infrastructure – EURISWEB – that includes a relational online database, coupled with a query system for data retrieval, and allows integrative storage of demographic, clinical and molecular biology data generated in EURIS. Methods All components of the system were developed using open source programming tools: data storage management was supported by PostgreSQL, and the hypertext preprocessor to generate the web pages was implemented using PHP. The query system is based on a software agent running in the background specifically developed for EURIS. Results The website currently contains data related to 13,500 nasopharyngeal samples and over one million measures taken from 5,250 individual children, as well as over one thousand pre-made and user-made queries aggregated into several reports, approximately. It is presently in use by participating researchers from three countries (Iceland, Portugal and Sweden). Conclusion An operational model centered on a PHP engine builds the interface between the user and the database automatically, allowing an easy maintenance of the system. The query system is also sufficiently adaptable to allow the integration of several advanced data analysis procedures far more demanding than simple queries, eventually including artificial intelligence predictive models. PMID:12846930
Guhlin, Joseph; Silverstein, Kevin A T; Zhou, Peng; Tiffin, Peter; Young, Nevin D
2017-08-10
Rapid generation of omics data in recent years have resulted in vast amounts of disconnected datasets without systemic integration and knowledge building, while individual groups have made customized, annotated datasets available on the web with few ways to link them to in-lab datasets. With so many research groups generating their own data, the ability to relate it to the larger genomic and comparative genomic context is becoming increasingly crucial to make full use of the data. The Omics Database Generator (ODG) allows users to create customized databases that utilize published genomics data integrated with experimental data which can be queried using a flexible graph database. When provided with omics and experimental data, ODG will create a comparative, multi-dimensional graph database. ODG can import definitions and annotations from other sources such as InterProScan, the Gene Ontology, ENZYME, UniPathway, and others. This annotation data can be especially useful for studying new or understudied species for which transcripts have only been predicted, and rapidly give additional layers of annotation to predicted genes. In better studied species, ODG can perform syntenic annotation translations or rapidly identify characteristics of a set of genes or nucleotide locations, such as hits from an association study. ODG provides a web-based user-interface for configuring the data import and for querying the database. Queries can also be run from the command-line and the database can be queried directly through programming language hooks available for most languages. ODG supports most common genomic formats as well as generic, easy to use tab-separated value format for user-provided annotations. ODG is a user-friendly database generation and query tool that adapts to the supplied data to produce a comparative genomic database or multi-layered annotation database. ODG provides rapid comparative genomic annotation and is therefore particularly useful for non-model or understudied species. For species for which more data are available, ODG can be used to conduct complex multi-omics, pattern-matching queries.
Taboada, María; Martínez, Diego; Pilo, Belén; Jiménez-Escrig, Adriano; Robinson, Peter N; Sobrido, María J
2012-07-31
Semantic Web technology can considerably catalyze translational genetics and genomics research in medicine, where the interchange of information between basic research and clinical levels becomes crucial. This exchange involves mapping abstract phenotype descriptions from research resources, such as knowledge databases and catalogs, to unstructured datasets produced through experimental methods and clinical practice. This is especially true for the construction of mutation databases. This paper presents a way of harmonizing abstract phenotype descriptions with patient data from clinical practice, and querying this dataset about relationships between phenotypes and genetic variants, at different levels of abstraction. Due to the current availability of ontological and terminological resources that have already reached some consensus in biomedicine, a reuse-based ontology engineering approach was followed. The proposed approach uses the Ontology Web Language (OWL) to represent the phenotype ontology and the patient model, the Semantic Web Rule Language (SWRL) to bridge the gap between phenotype descriptions and clinical data, and the Semantic Query Web Rule Language (SQWRL) to query relevant phenotype-genotype bidirectional relationships. The work tests the use of semantic web technology in the biomedical research domain named cerebrotendinous xanthomatosis (CTX), using a real dataset and ontologies. A framework to query relevant phenotype-genotype bidirectional relationships is provided. Phenotype descriptions and patient data were harmonized by defining 28 Horn-like rules in terms of the OWL concepts. In total, 24 patterns of SWQRL queries were designed following the initial list of competency questions. As the approach is based on OWL, the semantic of the framework adapts the standard logical model of an open world assumption. This work demonstrates how semantic web technologies can be used to support flexible representation and computational inference mechanisms required to query patient datasets at different levels of abstraction. The open world assumption is especially good for describing only partially known phenotype-genotype relationships, in a way that is easily extensible. In future, this type of approach could offer researchers a valuable resource to infer new data from patient data for statistical analysis in translational research. In conclusion, phenotype description formalization and mapping to clinical data are two key elements for interchanging knowledge between basic and clinical research.
2013-01-01
Background Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools that are oriented towards biomedical data processing. Results We have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. A dedicated search GenBank system makes use of NCBI Web services and a package of Entrez Programming Utilities (eUtils) in order to provide extended searching capabilities in NCBI data repositories. In search GenBank users can use one of the three exploration paths: simple data searching based on the specified user’s query, advanced data searching based on the specified user’s query, and advanced data exploration with the use of macros. search GenBank orchestrates calls of particular tools available through the NCBI Web service providing requested functionality, while users interactively browse selected records in search GenBank and traverse between NCBI databases using available links. On the other hand, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases. Conclusions search GenBank extends standard capabilities of the NCBI Entrez search engine in querying biomedical databases. The possibility of creating and saving macros in the search GenBank is a unique feature and has a great potential. The potential will further grow in the future with the increasing density of networks of relationships between data stored in particular databases. search GenBank is available for public use at http://sgb.biotools.pl/. PMID:23452691
Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Siążnik, Artur
2013-03-01
Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools that are oriented towards biomedical data processing. We have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. A dedicated search GenBank system makes use of NCBI Web services and a package of Entrez Programming Utilities (eUtils) in order to provide extended searching capabilities in NCBI data repositories. In search GenBank users can use one of the three exploration paths: simple data searching based on the specified user's query, advanced data searching based on the specified user's query, and advanced data exploration with the use of macros. search GenBank orchestrates calls of particular tools available through the NCBI Web service providing requested functionality, while users interactively browse selected records in search GenBank and traverse between NCBI databases using available links. On the other hand, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases. search GenBank extends standard capabilities of the NCBI Entrez search engine in querying biomedical databases. The possibility of creating and saving macros in the search GenBank is a unique feature and has a great potential. The potential will further grow in the future with the increasing density of networks of relationships between data stored in particular databases. search GenBank is available for public use at http://sgb.biotools.pl/.
From Provenance Standards and Tools to Queries and Actionable Provenance
NASA Astrophysics Data System (ADS)
Ludaescher, B.
2017-12-01
The W3C PROV standard provides a minimal core for sharing retrospective provenance information for scientific workflows and scripts. PROV extensions such as DataONE's ProvONE model are necessary for linking runtime observables in retrospective provenance records with conceptual-level prospective provenance information, i.e., workflow (or dataflow) graphs. Runtime provenance recorders, such as DataONE's RunManager for R, or noWorkflow for Python capture retrospective provenance automatically. YesWorkflow (YW) is a toolkit that allows researchers to declare high-level prospective provenance models of scripts via simple inline comments (YW-annotations), revealing the computational modules and dataflow dependencies in the script. By combining and linking both forms of provenance, important queries and use cases can be supported that neither provenance model can afford on its own. We present existing and emerging provenance tools developed for the DataONE and SKOPE (Synthesizing Knowledge of Past Environments) projects. We show how the different tools can be used individually and in combination to model, capture, share, query, and visualize provenance information. We also present challenges and opportunities for making provenance information more immediately actionable for the researchers who create it in the first place. We argue that such a shift towards "provenance-for-self" is necessary to accelerate the creation, sharing, and use of provenance in support of transparent, reproducible computational and data science.
A SOA broker solution for standard discovery and access services: the GI-cat framework
NASA Astrophysics Data System (ADS)
Boldrini, Enrico
2010-05-01
GI-cat ideal users are data providers or service providers within the geoscience community. The former have their data already available through an access service (e.g. an OGC Web Service) and would have it published through a standard catalog service, in a seamless way. The latter would develop a catalog broker and let users query and access different geospatial resources through one or more standard interfaces and Application Profiles (AP) (e.g. OGC CSW ISO AP, CSW ebRIM/EO AP, etc.). GI-cat actually implements a broker components (i.e. a middleware service) which carries out distribution and mediation functionalities among "well-adopted" catalog interfaces and data access protocols. GI-cat also publishes different discovery interfaces: the OGC CSW ISO and ebRIM Application Profiles (the latter coming with support for the EO and CIM extension packages) and two different OpenSearch interfaces developed in order to explore Web 2.0 possibilities. An extended interface is also available to exploit all available GI-cat features, such as interruptible incremental queries and queries feedback. Interoperability tests performed in the context of different projects have also pointed out the importance to enforce compatibility with existing and wide-spread tools of the open source community (e.g. GeoNetwork and Deegree catalogs), which was then achieved. Based on a service-oriented framework of modular components, GI-cat can effectively be customized and tailored to support different deployment scenarios. In addition to the distribution functionality an harvesting approach has been lately experimented, allowing the user to switch between a distributed and a local search giving thus more possibilities to support different deployment scenarios. A configurator tool is available in order to enable an effective high level configuration of the broker service. A specific geobrowser was also naturally developed, for demonstrating the advanced GI-cat functionalities. This client, called GI-go, is an example of the possible applications which may be built on top of the GI-cat broker component. GI-go allows discovering and browsing of the available datasets, retrieving and evaluating their description and performing distributed queries according to any combination of the following criteria: geographic area, temporal interval, topic of interest (free-text and/or keyword selection are allowed) and data source (i.e. where, when, what, who). The results set of a query (e.g. datasets metadata) are then displayed in an incremental way leveraging the asynchronous interactions approach implemented by GI-cat. This feature allows the user to access the intermediate query results. Query interruption and feedback features are also provided to the user. Alternatively, user may perform a browsing task by selecting a catalog resource from the current configuration and navigate through its aggregated and/or leaf datasets. In both cases datasets metadata, expressed according to ISO 19139 (and also Dublin Core and ebRIM if available), are displayed for download, along with a resource portrayal and actual data access (when this is meaningful and possible). The GI-cat distributed catalog service has been successfully deployed and experimented in the framework of different projects and initiative, including the SeaDataNet FP6 project, GEOSS IP3 (Interoperability Process Pilot Project), GEOSS AIP-2 (Architectural Implementation Project - Phase 2), FP7 GENESI-DR, CNR GIIDA, FP7 EUROGEOSS and ESA HMA project.
This procedure is designed to support the collection of potentially responsive information using automated E-Discovery tools that rely on keywords, key phrases, index queries, or other technological assistance to retrieve Electronically Stored Information
Schools Can Provide Help for the Children of Divorce.
ERIC Educational Resources Information Center
Holzman, Terry
1984-01-01
Since the mid-1970s, Newton, Massachusetts, schools have developed support groups and curricular offerings to help children of divorced parents. Newton's Ad-Hoc Committee on Separation and Divorce, developed under the leadership of John Cullinane, now serves as a systemwide clearinghouse for teacher resources. (JBM)
Coping Strategies Used by Distance Rehabilitation Counseling Interns
ERIC Educational Resources Information Center
Kampfe, Charlene M.; Smith, Mae S.; Manyibe, Edward O.; Sales, Amos P.; Moore, Susan F.
2009-01-01
This study investigated coping strategies used by distance master's level student interns from one rehabilitation counseling program. Analysis of variance revealed a significant difference among five coping strategies. Post hoc comparisons showed that interns used problem-focused and seeking social support more frequently than self-blame, wishful…
Flexible quality of service model for wireless body area sensor networks.
Liao, Yangzhe; Leeson, Mark S; Higgins, Matthew D
2016-03-01
Wireless body area sensor networks (WBASNs) are becoming an increasingly significant breakthrough technology for smart healthcare systems, enabling improved clinical decision-making in daily medical care. Recently, radio frequency ultra-wideband technology has developed substantially for physiological signal monitoring due to its advantages such as low-power consumption, high transmission data rate, and miniature antenna size. Applications of future ubiquitous healthcare systems offer the prospect of collecting human vital signs, early detection of abnormal medical conditions, real-time healthcare data transmission and remote telemedicine support. However, due to the technical constraints of sensor batteries, the supply of power is a major bottleneck for healthcare system design. Moreover, medium access control (MAC) needs to support reliable transmission links that allow sensors to transmit data safely and stably. In this Letter, the authors provide a flexible quality of service model for ad hoc networks that can support fast data transmission, adaptive schedule MAC control, and energy efficient ubiquitous WBASN networks. Results show that the proposed multi-hop communication ad hoc network model can balance information packet collisions and power consumption. Additionally, wireless communications link in WBASNs can effectively overcome multi-user interference and offer high transmission data rates for healthcare systems.
Kusunoki, Diana; Sarcevic, Aleksandra; Zhang, Zhan; Yala, Maria
2014-01-01
Prior CSCW research on awareness in clinical settings has mostly focused on higher-level team coordination spanning across longer-term trajectories at the department and inter-department levels. In this paper, we offer a perspective on what awareness means within the context of an ad hoc, time- and safety-critical medical setting by looking at teams treating severely ill patients with urgent needs. We report findings from four participatory design workshops conducted with emergency medicine clinicians at two regional emergency departments. Workshops were developed to elicit design ideas for information displays that support awareness in emergency medical situations. Through analysis of discussions and clinicians’ sketches of information displays, we identified five features of teamwork that can be used as a foundation for supporting awareness from the perspective of clinicians. Based on these findings, we contribute rich descriptions of four facets of awareness that teams manage during emergency medical situations: team member awareness, elapsed time awareness, teamwork-oriented and patient-driven task awareness, and overall progress awareness. We then discuss these four awareness types in relation to awareness facets found in the CSCW literature. PMID:25870498
Query Auto-Completion Based on Word2vec Semantic Similarity
NASA Astrophysics Data System (ADS)
Shao, Taihua; Chen, Honghui; Chen, Wanyu
2018-04-01
Query auto-completion (QAC) is the first step of information retrieval, which helps users formulate the entire query after inputting only a few prefixes. Regarding the models of QAC, the traditional method ignores the contribution from the semantic relevance between queries. However, similar queries always express extremely similar search intention. In this paper, we propose a hybrid model FS-QAC based on query semantic similarity as well as the query frequency. We choose word2vec method to measure the semantic similarity between intended queries and pre-submitted queries. By combining both features, our experiments show that FS-QAC model improves the performance when predicting the user’s query intention and helping formulate the right query. Our experimental results show that the optimal hybrid model contributes to a 7.54% improvement in terms of MRR against a state-of-the-art baseline using the public AOL query logs.
A Modular Framework for Transforming Structured Data into HTML with Machine-Readable Annotations
NASA Astrophysics Data System (ADS)
Patton, E. W.; West, P.; Rozell, E.; Zheng, J.
2010-12-01
There is a plethora of web-based Content Management Systems (CMS) available for maintaining projects and data, i.a. However, each system varies in its capabilities and often content is stored separately and accessed via non-uniform web interfaces. Moving from one CMS to another (e.g., MediaWiki to Drupal) can be cumbersome, especially if a large quantity of data must be adapted to the new system. To standardize the creation, display, management, and sharing of project information, we have assembled a framework that uses existing web technologies to transform data provided by any service that supports the SPARQL Protocol and RDF Query Language (SPARQL) queries into HTML fragments, allowing it to be embedded in any existing website. The framework utilizes a two-tier XML Stylesheet Transformation (XSLT) that uses existing ontologies (e.g., Friend-of-a-Friend, Dublin Core) to interpret query results and render them as HTML documents. These ontologies can be used in conjunction with custom ontologies suited to individual needs (e.g., domain-specific ontologies for describing data records). Furthermore, this transformation process encodes machine-readable annotations, namely, the Resource Description Framework in attributes (RDFa), into the resulting HTML, so that capable parsers and search engines can extract the relationships between entities (e.g, people, organizations, datasets). To facilitate editing of content, the framework provides a web-based form system, mapping each query to a dynamically generated form that can be used to modify and create entities, while keeping the native data store up-to-date. This open framework makes it easy to duplicate data across many different sites, allowing researchers to distribute their data in many different online forums. In this presentation we will outline the structure of queries and the stylesheets used to transform them, followed by a brief walkthrough that follows the data from storage to human- and machine-accessible web page. We conclude with a discussion on content caching and steps toward performing queries across multiple domains.
Seeking Insights About Cycling Mood Disorders via Anonymized Search Logs
White, Ryen W; Horvitz, Eric
2014-01-01
Background Mood disorders affect a significant portion of the general population. Cycling mood disorders are characterized by intermittent episodes (or events) of the disease. Objective Using anonymized Web search logs, we identify a population of people with significant interest in mood stabilizing drugs (MSD) and seek evidence of mood swings in this population. Methods We extracted queries to the Microsoft Bing search engine made by 20,046 Web searchers over six months, separately explored searcher demographics using data from a large external panel of users, and sought supporting information from people with mood disorders via a survey. We analyzed changes in information needs over time relative to searches on MSD. Results Queries for MSD focused on side effects and their relation to the disease. We found evidence of significant changes in search behavior and interests coinciding with days that MSD queries are made. These include large increases (>100%) in the access of nutrition information, commercial information, and adult materials. A survey of patients diagnosed with mood disorders provided evidence that repeated queries on MSD may come with exacerbations of mood disorder. A classifier predicting the occurrence of such queries one day before they are observed obtains strong performance (AUC=0.78). Conclusions Observed patterns in search behavior align with known behaviors and those highlighted by survey respondents. These observations suggest that searchers showing intensive interest in MSD may be patients who have been prescribed these drugs. Given behavioral dynamics, we surmise that the days on which MSD queries are made may coincide with commencement of mania or depression. Although we do not have data on mood changes and whether users have been diagnosed with bipolar illness, we see evidence of cycling in people who show interest in MSD and further show that we can predict impending shifts in behavior and interest. PMID:24568936
EquiX-A Search and Query Language for XML.
ERIC Educational Resources Information Center
Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander
2002-01-01
Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)
Spatial and symbolic queries for 3D image data
NASA Astrophysics Data System (ADS)
Benson, Daniel C.; Zick, Gregory L.
1992-04-01
We present a query system for an object-oriented biomedical imaging database containing 3-D anatomical structures and their corresponding 2-D images. The graphical interface facilitates the formation of spatial queries, nonspatial or symbolic queries, and combined spatial/symbolic queries. A query editor is used for the creation and manipulation of 3-D query objects as volumes, surfaces, lines, and points. Symbolic predicates are formulated through a combination of text fields and multiple choice selections. Query results, which may include images, image contents, composite objects, graphics, and alphanumeric data, are displayed in multiple views. Objects returned by the query may be selected directly within the views for further inspection or modification, or for use as query objects in subsequent queries. Our image database query system provides visual feedback and manipulation of spatial query objects, multiple views of volume data, and the ability to combine spatial and symbolic queries. The system allows for incremental enhancement of existing objects and the addition of new objects and spatial relationships. The query system is designed for databases containing symbolic and spatial data. This paper discuses its application to data acquired in biomedical 3- D image reconstruction, but it is applicable to other areas such as CAD/CAM, geographical information systems, and computer vision.
GenoQuery: a new querying module for functional annotation in a genomic warehouse
Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine
2008-01-01
Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improving functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731
Preliminary Results on Uncertainty Quantification for Pattern Analytics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stracuzzi, David John; Brost, Randolph; Chen, Maximillian Gene
2015-09-01
This report summarizes preliminary research into uncertainty quantification for pattern ana- lytics within the context of the Pattern Analytics to Support High-Performance Exploitation and Reasoning (PANTHER) project. The primary focus of PANTHER was to make large quantities of remote sensing data searchable by analysts. The work described in this re- port adds nuance to both the initial data preparation steps and the search process. Search queries are transformed from does the specified pattern exist in the data? to how certain is the system that the returned results match the query? We show example results for both data processing and search,more » and discuss a number of possible improvements for each.« less
SPARK: Adapting Keyword Query to Semantic Search
NASA Astrophysics Data System (ADS)
Zhou, Qi; Wang, Chong; Xiong, Miao; Wang, Haofen; Yu, Yong
Semantic search promises to provide more accurate result than present-day keyword search. However, progress with semantic search has been delayed due to the complexity of its query languages. In this paper, we explore a novel approach of adapting keywords to querying the semantic web: the approach automatically translates keyword queries into formal logic queries so that end users can use familiar keywords to perform semantic search. A prototype system named 'SPARK' has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In the experiment, SPARK achieved an encouraging translation result.
Griffon, N; Schuers, M; Dhombres, F; Merabti, T; Kerdelhué, G; Rollin, L; Darmoni, S J
2016-08-02
Despite international initiatives like Orphanet, it remains difficult to find up-to-date information about rare diseases. The aim of this study is to propose an exhaustive set of queries for PubMed based on terminological knowledge and to evaluate it versus the queries based on expertise provided by the most frequently used resource in Europe: Orphanet. Four rare disease terminologies (MeSH, OMIM, HPO and HRDO) were manually mapped to each other permitting the automatic creation of expended terminological queries for rare diseases. For 30 rare diseases, 30 citations retrieved by Orphanet expert query and/or query based on terminological knowledge were assessed for relevance by two independent reviewers unaware of the query's origin. An adjudication procedure was used to resolve any discrepancy. Precision, relative recall and F-measure were all computed. For each Orphanet rare disease (n = 8982), there was a corresponding terminological query, in contrast with only 2284 queries provided by Orphanet. Only 553 citations were evaluated due to queries with 0 or only a few hits. There were no significant differences between the Orpha query and terminological query in terms of precision, respectively 0.61 vs 0.52 (p = 0.13). Nevertheless, terminological queries retrieved more citations more often than Orpha queries (0.57 vs. 0.33; p = 0.01). Interestingly, Orpha queries seemed to retrieve older citations than terminological queries (p < 0.0001). The terminological queries proposed in this study are now currently available for all rare diseases. They may be a useful tool for both precision or recall oriented literature search.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zamora, Antonio
Advanced Natural Language Processing Tools for Web Information Retrieval, Content Analysis, and Synthesis. The goal of this SBIR was to implement and evaluate several advanced Natural Language Processing (NLP) tools and techniques to enhance the precision and relevance of search results by analyzing and augmenting search queries and by helping to organize the search output obtained from heterogeneous databases and web pages containing textual information of interest to DOE and the scientific-technical user communities in general. The SBIR investigated 1) the incorporation of spelling checkers in search applications, 2) identification of significant phrases and concepts using a combination of linguisticmore » and statistical techniques, and 3) enhancement of the query interface and search retrieval results through the use of semantic resources, such as thesauri. A search program with a flexible query interface was developed to search reference databases with the objective of enhancing search results from web queries or queries of specialized search systems such as DOE's Information Bridge. The DOE ETDE/INIS Joint Thesaurus was processed to create a searchable database. Term frequencies and term co-occurrences were used to enhance the web information retrieval by providing algorithmically-derived objective criteria to organize relevant documents into clusters containing significant terms. A thesaurus provides an authoritative overview and classification of a field of knowledge. By organizing the results of a search using the thesaurus terminology, the output is more meaningful than when the results are just organized based on the terms that co-occur in the retrieved documents, some of which may not be significant. An attempt was made to take advantage of the hierarchy provided by broader and narrower terms, as well as other field-specific information in the thesauri. The search program uses linguistic morphological routines to find relevant entries regardless of whether terms are stored in singular or plural form. Implementation of additional inflectional morphology processes for verbs can enhance retrieval further, but this has to be balanced by the possibility of broadening the results too much. In addition to the DOE energy thesaurus, other sources of specialized organized knowledge such as the Medical Subject Headings (MeSH), the Unified Medical Language System (UMLS), and Wikipedia were investigated. The supporting role of the NLP thesaurus search program was enhanced by incorporating spelling aid and a part-of-speech tagger to cope with misspellings in the queries and to determine the grammatical roles of the query words and identify nouns for special processing. To improve precision, multiple modes of searching were implemented including Boolean operators, and field-specific searches. Programs to convert a thesaurus or reference file into searchable support files can be deployed easily, and the resulting files are immediately searchable to produce relevance-ranked results with builtin spelling aid, morphological processing, and advanced search logic. Demonstration systems were built for several databases, including the DOE energy thesaurus.« less
Analysis performed in support of the Ad-Hoc Working Group of RTCA SC-159 on RAIM/FDE issues
DOT National Transportation Integrated Search
2002-01-01
In 1999, the FAA requested that RTCA SC-159 address one of the recommendations from the study performed by the Johns Hopkins University (JHU) Applied Physics Lab (APL) on the use of GPS and augmented GPS for aviation applications. This recommendation...
ERIC Educational Resources Information Center
Grosch, Audrey N.
1973-01-01
A regionally organized program for serials bibliography is proposed because of the large volume of complex data needing control and the many purposes to which the data can be put in support of regional or local needs. (2 references) (Author)
ERIC Educational Resources Information Center
Petersen, Allan L.
In March 1983, the Board of Governors of the California Community Colleges mandated the preparation of criteria for the identification of courses that should be ineligible for state support, in lieu of the continuation of specific course deletions. An Ad Hoc Committee on Course Criteria was formed to assist in the identification of the issues and…
USDA-ARS?s Scientific Manuscript database
Information to support application of hydrologic and water quality (H/WQ) models abounds, yet modelers commonly use arbitrary, ad hoc methods to conduct, document, and report model calibration, validation, and evaluation. Consistent methods are needed to improve model calibration, validation, and e...
Health Promoting Schools: A New Zealand Perspective
ERIC Educational Resources Information Center
Cushman, Penni
2008-01-01
In the last 20 years the health promoting schools movement has gained momentum internationally. Without strong national leadership and direction its development in New Zealand has been ad hoc and sporadic. However, as the evidence supporting the role of health promoting schools in contributing to students' health and academic outcomes becomes more…
Millennium III Challenges: A Major Role for Agricultural Research/Extension/Education.
ERIC Educational Resources Information Center
National Association of State Universities and Land Grant Colleges, Washington, DC.
This report by the National Association of State Universities and Land-Grant Colleges (NASULGC) ad hoc Committee on Federal Support for Agricultural Research, Extension and Education suggests solutions to major challenges in agricultural research. Three new realities provide a powerful rationale for major new investment in agriculture: dramatic…
SPARQL Query Re-writing Using Partonomy Based Transformation Rules
NASA Astrophysics Data System (ADS)
Jain, Prateek; Yeh, Peter Z.; Verma, Kunal; Henson, Cory A.; Sheth, Amit P.
Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. For querying ontology's containing spatial information, the precise relationships between spatial entities has to be specified in the basic graph pattern of SPARQL query which can result in long and complex queries. We present a novel approach to help users intuitively write SPARQL queries to query spatial data, rather than relying on knowledge of the ontology structure. Our framework re-writes queries, using transformation rules to exploit part-whole relations between geographical entities to address the mismatches between query constraints and knowledge base. Our experiments were performed on completely third party datasets and queries. Evaluations were performed on Geonames dataset using questions from National Geographic Bee serialized into SPARQL and British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in retrieval of results and ease in writing queries.
Implementation of Quantum Private Queries Using Nuclear Magnetic Resonance
NASA Astrophysics Data System (ADS)
Wang, Chuan; Hao, Liang; Zhao, Lian-Jie
2011-08-01
We present a modified protocol for the realization of a quantum private query process on a classical database. Using one-qubit query and CNOT operation, the query process can be realized in a two-mode database. In the query process, the data privacy is preserved as the sender would not reveal any information about the database besides her query information, and the database provider cannot retain any information about the query. We implement the quantum private query protocol in a nuclear magnetic resonance system. The density matrix of the memory registers are constructed.
Omicseq: a web-based search engine for exploring omics datasets.
Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S
2017-07-03
The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
A Query Expansion Framework in Image Retrieval Domain Based on Local and Global Analysis
Rahman, M. M.; Antani, S. K.; Thoma, G. R.
2011-01-01
We present an image retrieval framework based on automatic query expansion in a concept feature space by generalizing the vector space model of information retrieval. In this framework, images are represented by vectors of weighted concepts similar to the keyword-based representation used in text retrieval. To generate the concept vocabularies, a statistical model is built by utilizing Support Vector Machine (SVM)-based classification techniques. The images are represented as “bag of concepts” that comprise perceptually and/or semantically distinguishable color and texture patches from local image regions in a multi-dimensional feature space. To explore the correlation between the concepts and overcome the assumption of feature independence in this model, we propose query expansion techniques in the image domain from a new perspective based on both local and global analysis. For the local analysis, the correlations between the concepts based on the co-occurrence pattern, and the metrical constraints based on the neighborhood proximity between the concepts in encoded images, are analyzed by considering local feedback information. We also analyze the concept similarities in the collection as a whole in the form of a similarity thesaurus and propose an efficient query expansion based on the global analysis. The experimental results on a photographic collection of natural scenes and a biomedical database of different imaging modalities demonstrate the effectiveness of the proposed framework in terms of precision and recall. PMID:21822350
ArrayBridge: Interweaving declarative array processing with high-performance computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xing, Haoyuan; Floratos, Sofoklis; Blanas, Spyros
Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and parallelize imperative HPC kernels even for the most mundane queries. This impedance mismatch has been partly attributed to the cumbersome data loading process; in response, the database community has proposed in situ mechanisms to access data in scientific file formats. Scientists, however, desire more than a passive access method that reads arrays from files. This paper describes ArrayBridge, a bi-directional array view mechanism for scientific file formats, that aimsmore » to make declarative array manipulations interoperable with imperative file-centric analyses. Our prototype implementation of ArrayBridge uses HDF5 as the underlying array storage library and seamlessly integrates into the SciDB open-source array database system. In addition to fast querying over external array objects, ArrayBridge produces arrays in the HDF5 file format just as easily as it can read from it. ArrayBridge also supports time travel queries from imperative kernels through the unmodified HDF5 API, and automatically deduplicates between array versions for space efficiency. Our extensive performance evaluation in NERSC, a large-scale scientific computing facility, shows that ArrayBridge exhibits statistically indistinguishable performance and I/O scalability to the native SciDB storage engine.« less
Semantics Enabled Queries in EuroGEOSS: a Discovery Augmentation Approach
NASA Astrophysics Data System (ADS)
Santoro, M.; Mazzetti, P.; Fugazza, C.; Nativi, S.; Craglia, M.
2010-12-01
One of the main challenges in Earth Science Informatics is to build interoperability frameworks which allow users to discover, evaluate, and use information from different scientific domains. This needs to address multidisciplinary interoperability challenges concerning both technological and scientific aspects. From the technological point of view, it is necessary to provide a set of special interoperability arrangement in order to develop flexible frameworks that allow a variety of loosely-coupled services to interact with each other. From a scientific point of view, it is necessary to document clearly the theoretical and methodological assumptions underpinning applications in different scientific domains, and develop cross-domain ontologies to facilitate interdisciplinary dialogue and understanding. In this presentation we discuss a brokering approach that extends the traditional Service Oriented Architecture (SOA) adopted by most Spatial Data Infrastructures (SDIs) to provide the necessary special interoperability arrangements. In the EC-funded EuroGEOSS (A European approach to GEOSS) project, we distinguish among three possible functional brokering components: discovery, access and semantics brokers. This presentation focuses on the semantics broker, the Discovery Augmentation Component (DAC), which was specifically developed to address the three thematic areas covered by the EuroGEOSS project: biodiversity, forestry and drought. The EuroGEOSS DAC federates both semantics (e.g. SKOS repositories) and ISO-compliant geospatial catalog services. The DAC can be queried using common geospatial constraints (i.e. what, where, when, etc.). Two different augmented discovery styles are supported: a) automatic query expansion; b) user assisted query expansion. In the first case, the main discovery steps are: i. the query keywords (the what constraint) are “expanded” with related concepts/terms retrieved from the set of federated semantic services. A default expansion regards the multilinguality relationship; ii. The resulting queries are submitted to the federated catalog services; iii. The DAC performs a “smart” aggregation of the queries results and provides them back to the client. In the second case, the main discovery steps are: i. the user browses the federated semantic repositories and selects the concepts/terms-of-interest; ii. The DAC creates the set of geospatial queries based on the selected concepts/terms and submits them to the federated catalog services; iii. The DAC performs a “smart” aggregation of the queries results and provides them back to the client. A Graphical User Interface (GUI) was also developed for testing and interacting with the DAC. The entire brokering framework is deployed in the context of EuroGEOSS infrastructure and it is used in a couple of GEOSS AIP-3 use scenarios: the “e-Habitat Use Scenario” for the Biodiversity and Climate Change topic, and the “Comprehensive Drought Index Use Scenario” for Water/Drought topic
Stream traffic data archival, querying, and analysis with TransDec.
DOT National Transportation Integrated Search
2011-01-01
The goal of research was to extend the traffic data analysis of the TransDec (short for : Transportation Decision-Making) system, which was developed under METRANS 09-26 : research grant. The TransDec system is a real-data driven system to support de...
Peeling the Onion: Okapi System Architecture and Software Design Issues.
ERIC Educational Resources Information Center
Jones, S.; And Others
1997-01-01
Discusses software design issues for Okapi, an information retrieval system that incorporates both search engine and user interface and supports weighted searching, relevance feedback, and query expansion. The basic search system, adjacency searching, and moving toward a distributed system are discussed. (Author/LRW)
Providing R-Tree Support for Mongodb
NASA Astrophysics Data System (ADS)
Xiang, Longgang; Shao, Xiaotian; Wang, Dehao
2016-06-01
Supporting large amounts of spatial data is a significant characteristic of modern databases. However, unlike some mature relational databases, such as Oracle and PostgreSQL, most of current burgeoning NoSQL databases are not well designed for storing geospatial data, which is becoming increasingly important in various fields. In this paper, we propose a novel method to provide R-tree index, as well as corresponding spatial range query and nearest neighbour query functions, for MongoDB, one of the most prevalent NoSQL databases. First, after in-depth analysis of MongoDB's features, we devise an efficient tabular document structure which flattens R-tree index into MongoDB collections. Further, relevant mechanisms of R-tree operations are issued, and then we discuss in detail how to integrate R-tree into MongoDB. Finally, we present the experimental results which show that our proposed method out-performs the built-in spatial index of MongoDB. Our research will greatly facilitate big data management issues with MongoDB in a variety of geospatial information applications.
Rosenbaum, Benjamin P; Silkin, Nikolay; Miller, Randolph A
2014-01-01
Real-time alerting systems typically warn providers about abnormal laboratory results or medication interactions. For more complex tasks, institutions create site-wide 'data warehouses' to support quality audits and longitudinal research. Sophisticated systems like i2b2 or Stanford's STRIDE utilize data warehouses to identify cohorts for research and quality monitoring. However, substantial resources are required to install and maintain such systems. For more modest goals, an organization desiring merely to identify patients with 'isolation' orders, or to determine patients' eligibility for clinical trials, may adopt a simpler, limited approach based on processing the output of one clinical system, and not a data warehouse. We describe a limited, order-entry-based, real-time 'pick off' tool, utilizing public domain software (PHP, MySQL). Through a web interface the tool assists users in constructing complex order-related queries and auto-generates corresponding database queries that can be executed at recurring intervals. We describe successful application of the tool for research and quality monitoring.
A study of medical and health queries to web search engines.
Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk
2004-03-01
This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i). comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii). comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii). medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i). a small percentage of web queries are medical or health related, (ii). the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii). over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggests some implications for the use of the general web search engines when seeking medical/health information.
Monitoring Moving Queries inside a Safe Region
Al-Khalidi, Haidar; Taniar, David; Alamri, Sultan
2014-01-01
With mobile moving range queries, there is a need to recalculate the relevant surrounding objects of interest whenever the query moves. Therefore, monitoring the moving query is very costly. The safe region is one method that has been proposed to minimise the communication and computation cost of continuously monitoring a moving range query. Inside the safe region the set of objects of interest to the query do not change; thus there is no need to update the query while it is inside its safe region. However, when the query leaves its safe region the mobile device has to reevaluate the query, necessitating communication with the server. Knowing when and where the mobile device will leave a safe region is widely known as a difficult problem. To solve this problem, we propose a novel method to monitor the position of the query over time using a linear function based on the direction of the query obtained by periodic monitoring of its position. Periodic monitoring ensures that the query is aware of its location all the time. This method reduces the costs associated with communications in client-server architecture. Computational results show that our method is successful in handling moving query patterns. PMID:24696652
Sahoo, Satya S.; Bodenreider, Olivier; Rutter, Joni L.; Skinner, Karen J.; Sheth, Amit P.
2008-01-01
Objectives This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. Methods We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries. Results Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins. Conclusion Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces. Resource page http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/ PMID:18395495
Sahoo, Satya S; Bodenreider, Olivier; Rutter, Joni L; Skinner, Karen J; Sheth, Amit P
2008-10-01
This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries. Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins. Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces. RESOURCE PAGE: http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/
Automatic classification and detection of clinically relevant images for diabetic retinopathy
NASA Astrophysics Data System (ADS)
Xu, Xinyu; Li, Baoxin
2008-03-01
We proposed a novel approach to automatic classification of Diabetic Retinopathy (DR) images and retrieval of clinically-relevant DR images from a database. Given a query image, our approach first classifies the image into one of the three categories: microaneurysm (MA), neovascularization (NV) and normal, and then it retrieves DR images that are clinically-relevant to the query image from an archival image database. In the classification stage, the query DR images are classified by the Multi-class Multiple-Instance Learning (McMIL) approach, where images are viewed as bags, each of which contains a number of instances corresponding to non-overlapping blocks, and each block is characterized by low-level features including color, texture, histogram of edge directions, and shape. McMIL first learns a collection of instance prototypes for each class that maximizes the Diverse Density function using Expectation- Maximization algorithm. A nonlinear mapping is then defined using the instance prototypes and maps every bag to a point in a new multi-class bag feature space. Finally a multi-class Support Vector Machine is trained in the multi-class bag feature space. In the retrieval stage, we retrieve images from the archival database who bear the same label with the query image, and who are the top K nearest neighbors of the query image in terms of similarity in the multi-class bag feature space. The classification approach achieves high classification accuracy, and the retrieval of clinically-relevant images not only facilitates utilization of the vast amount of hidden diagnostic knowledge in the database, but also improves the efficiency and accuracy of DR lesion diagnosis and assessment.
Tang, Muh-Chyun; Liu, Ying-Hsang; Wu, Wan-Ching
2013-09-01
Previous research has shown that information seekers in biomedical domain need more support in formulating their queries. A user study was conducted to evaluate the effectiveness of a metadata based query suggestion interface for PubMed bibliographic search. The study also investigated the impact of search task familiarity on search behaviors and the effectiveness of the interface. A real user, user search request and real system approach was used for the study. Unlike tradition IR evaluation, where assigned tasks were used, the participants were asked to search requests of their own. Forty-four researchers in Health Sciences participated in the evaluation - each conducted two research requests of their own, alternately with the proposed interface and the PubMed baseline. Several performance criteria were measured to assess the potential benefits of the experimental interface, including users' assessment of their original and eventual queries, the perceived usefulness of the interfaces, satisfaction with the search results, and the average relevance score of the saved records. The results show that, when searching for an unfamiliar topic, users were more likely to change their queries, indicating the effect of familiarity on search behaviors. The results also show that the interface scored higher on several of the performance criteria, such as the "goodness" of the queries, perceived usefulness, and user satisfaction. Furthermore, in line with our hypothesis, the proposed interface was relatively more effective when less familiar search requests were attempted. Results indicate that there is a selective compatibility between search familiarity and search interface. One implication of the research for system evaluation is the importance of taking into consideration task familiarity when assessing the effectiveness of interactive IR systems. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Retrieving high-resolution images over the Internet from an anatomical image database
NASA Astrophysics Data System (ADS)
Strupp-Adams, Annette; Henderson, Earl
1999-12-01
The Visible Human Data set is an important contribution to the national collection of anatomical images. To enhance the availability of these images, the National Library of Medicine has supported the design and development of a prototype object-oriented image database which imports, stores, and distributes high resolution anatomical images in both pixel and voxel formats. One of the key database modules is its client-server Internet interface. This Web interface provides a query engine with retrieval access to high-resolution anatomical images that range in size from 100KB for browser viewable rendered images, to 1GB for anatomical structures in voxel file formats. The Web query and retrieval client-server system is composed of applet GUIs, servlets, and RMI application modules which communicate with each other to allow users to query for specific anatomical structures, and retrieve image data as well as associated anatomical images from the database. Selected images can be downloaded individually as single files via HTTP or downloaded in batch-mode over the Internet to the user's machine through an applet that uses Netscape's Object Signing mechanism. The image database uses ObjectDesign's object-oriented DBMS, ObjectStore that has a Java interface. The query and retrieval systems has been tested with a Java-CDE window system, and on the x86 architecture using Windows NT 4.0. This paper describes the Java applet client search engine that queries the database; the Java client module that enables users to view anatomical images online; the Java application server interface to the database which organizes data returned to the user, and its distribution engine that allow users to download image files individually and/or in batch-mode.
Meeker, Daniella; Jiang, Xiaoqian; Matheny, Michael E; Farcas, Claudiu; D'Arcy, Michel; Pearlman, Laura; Nookala, Lavanya; Day, Michele E; Kim, Katherine K; Kim, Hyeoneui; Boxwala, Aziz; El-Kareh, Robert; Kuo, Grace M; Resnic, Frederic S; Kesselman, Carl; Ohno-Machado, Lucila
2015-11-01
Centralized and federated models for sharing data in research networks currently exist. To build multivariate data analysis for centralized networks, transfer of patient-level data to a central computation resource is necessary. The authors implemented distributed multivariate models for federated networks in which patient-level data is kept at each site and data exchange policies are managed in a study-centric manner. The objective was to implement infrastructure that supports the functionality of some existing research networks (e.g., cohort discovery, workflow management, and estimation of multivariate analytic models on centralized data) while adding additional important new features, such as algorithms for distributed iterative multivariate models, a graphical interface for multivariate model specification, synchronous and asynchronous response to network queries, investigator-initiated studies, and study-based control of staff, protocols, and data sharing policies. Based on the requirements gathered from statisticians, administrators, and investigators from multiple institutions, the authors developed infrastructure and tools to support multisite comparative effectiveness studies using web services for multivariate statistical estimation in the SCANNER federated network. The authors implemented massively parallel (map-reduce) computation methods and a new policy management system to enable each study initiated by network participants to define the ways in which data may be processed, managed, queried, and shared. The authors illustrated the use of these systems among institutions with highly different policies and operating under different state laws. Federated research networks need not limit distributed query functionality to count queries, cohort discovery, or independently estimated analytic models. Multivariate analyses can be efficiently and securely conducted without patient-level data transport, allowing institutions with strict local data storage requirements to participate in sophisticated analyses based on federated research networks. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Goetz, Matthew B; Bowman, Candice; Hoang, Tuyen; Anaya, Henry; Osborn, Teresa; Gifford, Allen L; Asch, Steven M
2008-03-19
We describe how we used the framework of the U.S. Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) to develop a program to improve rates of diagnostic testing for the Human Immunodeficiency Virus (HIV). This venture was prompted by the observation by the CDC that 25% of HIV-infected patients do not know their diagnosis - a point of substantial importance to the VA, which is the largest provider of HIV care in the United States. Following the QUERI steps (or process), we evaluated: 1) whether undiagnosed HIV infection is a high-risk, high-volume clinical issue within the VA, 2) whether there are evidence-based recommendations for HIV testing, 3) whether there are gaps in the performance of VA HIV testing, and 4) the barriers and facilitators to improving current practice in the VA.Based on our findings, we developed and initiated a QUERI step 4/phase 1 pilot project using the precepts of the Chronic Care Model. Our improvement strategy relies upon electronic clinical reminders to provide decision support; audit/feedback as a clinical information system, and appropriate changes in delivery system design. These activities are complemented by academic detailing and social marketing interventions to achieve provider activation. Our preliminary formative evaluation indicates the need to ensure leadership and team buy-in, address facility-specific barriers, refine the reminder, and address factors that contribute to inter-clinic variances in HIV testing rates. Preliminary unadjusted data from the first seven months of our program show 3-5 fold increases in the proportion of at-risk patients who are offered HIV testing at the VA sites (stations) where the pilot project has been undertaken; no change was seen at control stations. This project demonstrates the early success of the application of the QUERI process to the development of a program to improve HIV testing rates. Preliminary unadjusted results show that the coordinated use of audit/feedback, provider activation, and organizational change can increase HIV testing rates for at-risk patients. We are refining our program prior to extending our work to a small-scale, multi-site evaluation (QUERI step 4/phase 2). We also plan to evaluate the durability/sustainability of the intervention effect, the costs of HIV testing, and the number of newly identified HIV-infected patients. Ultimately, we will evaluate this program in other geographically dispersed stations (QUERI step 4/phases 3 and 4).
Goetz, Matthew B; Bowman, Candice; Hoang, Tuyen; Anaya, Henry; Osborn, Teresa; Gifford, Allen L; Asch, Steven M
2008-01-01
Background We describe how we used the framework of the U.S. Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) to develop a program to improve rates of diagnostic testing for the Human Immunodeficiency Virus (HIV). This venture was prompted by the observation by the CDC that 25% of HIV-infected patients do not know their diagnosis – a point of substantial importance to the VA, which is the largest provider of HIV care in the United States. Methods Following the QUERI steps (or process), we evaluated: 1) whether undiagnosed HIV infection is a high-risk, high-volume clinical issue within the VA, 2) whether there are evidence-based recommendations for HIV testing, 3) whether there are gaps in the performance of VA HIV testing, and 4) the barriers and facilitators to improving current practice in the VA. Based on our findings, we developed and initiated a QUERI step 4/phase 1 pilot project using the precepts of the Chronic Care Model. Our improvement strategy relies upon electronic clinical reminders to provide decision support; audit/feedback as a clinical information system, and appropriate changes in delivery system design. These activities are complemented by academic detailing and social marketing interventions to achieve provider activation. Results Our preliminary formative evaluation indicates the need to ensure leadership and team buy-in, address facility-specific barriers, refine the reminder, and address factors that contribute to inter-clinic variances in HIV testing rates. Preliminary unadjusted data from the first seven months of our program show 3–5 fold increases in the proportion of at-risk patients who are offered HIV testing at the VA sites (stations) where the pilot project has been undertaken; no change was seen at control stations. Discussion This project demonstrates the early success of the application of the QUERI process to the development of a program to improve HIV testing rates. Preliminary unadjusted results show that the coordinated use of audit/feedback, provider activation, and organizational change can increase HIV testing rates for at-risk patients. We are refining our program prior to extending our work to a small-scale, multi-site evaluation (QUERI step 4/phase 2). We also plan to evaluate the durability/sustainability of the intervention effect, the costs of HIV testing, and the number of newly identified HIV-infected patients. Ultimately, we will evaluate this program in other geographically dispersed stations (QUERI step 4/phases 3 and 4). PMID:18353185
Peute, Linda W P; de Keizer, Nicolette F; Jaspers, Monique W M
2015-06-01
To compare the performance of the Concurrent (CTA) and Retrospective (RTA) Think Aloud method and to assess their value in a formative usability evaluation of an Intensive Care Registry-physician data query tool designed to support ICU quality improvement processes. Sixteen representative intensive care physicians participated in the usability evaluation study. Subjects were allocated to either the CTA or RTA method by a matched randomized design. Each subject performed six usability-testing tasks of varying complexity in the query tool in a real-working context. Methods were compared with regard to number and type of problems detected. Verbal protocols of CTA and RTA were analyzed in depth to assess differences in verbal output. Standardized measures were applied to assess thoroughness in usability problem detection weighted per problem severity level and method overall effectiveness in detecting usability problems with regard to the time subjects spent per method. The usability evaluation of the data query tool revealed a total of 43 unique usability problems that the intensive care physicians encountered. CTA detected unique usability problems with regard to graphics/symbols, navigation issues, error messages, and the organization of information on the query tool's screens. RTA detected unique issues concerning system match with subjects' language and applied terminology. The in-depth verbal protocol analysis of CTA provided information on intensive care physicians' query design strategies. Overall, CTA performed significantly better than RTA in detecting usability problems. CTA usability problem detection effectiveness was 0.80 vs. 0.62 (p<0.05) respectively, with an average difference of 42% less time spent per subject compared to RTA. In addition, CTA was more thorough in detecting usability problems of a moderate (0.85 vs. 0.7) and severe nature (0.71 vs. 0.57). In this study, the CTA is more effective in usability-problem detection and provided clarification of intensive care physician query design strategies to inform redesign of the query tool. However, CTA does not outperform RTA. The RTA additionally elucidated unique usability problems and new user requirements. Based on the results of this study, we recommend the use of CTA in formative usability evaluation studies of health information technology. However, we recommend further research on the application of RTA in usability studies with regard to user expertise and experience when focusing on user profile customized (re)design. Copyright © 2015 Elsevier Inc. All rights reserved.
Cumulative query method for influenza surveillance using search engine data.
Seo, Dong-Woo; Jo, Min-Woo; Sohn, Chang Hwan; Shin, Soo-Yong; Lee, JaeHo; Yu, Maengsoo; Kim, Won Young; Lim, Kyoung Soo; Lee, Sang-Il
2014-12-16
Internet search queries have become an important data source in syndromic surveillance system. However, there is currently no syndromic surveillance system using Internet search query data in South Korea. The objective of this study was to examine correlations between our cumulative query method and national influenza surveillance data. Our study was based on the local search engine, Daum (approximately 25% market share), and influenza-like illness (ILI) data from the Korea Centers for Disease Control and Prevention. A quota sampling survey was conducted with 200 participants to obtain popular queries. We divided the study period into two sets: Set 1 (the 2009/10 epidemiological year for development set 1 and 2010/11 for validation set 1) and Set 2 (2010/11 for development Set 2 and 2011/12 for validation Set 2). Pearson's correlation coefficients were calculated between the Daum data and the ILI data for the development set. We selected the combined queries for which the correlation coefficients were .7 or higher and listed them in descending order. Then, we created a cumulative query method n representing the number of cumulative combined queries in descending order of the correlation coefficient. In validation set 1, 13 cumulative query methods were applied, and 8 had higher correlation coefficients (min=.916, max=.943) than that of the highest single combined query. Further, 11 of 13 cumulative query methods had an r value of ≥.7, but 4 of 13 combined queries had an r value of ≥.7. In validation set 2, 8 of 15 cumulative query methods showed higher correlation coefficients (min=.975, max=.987) than that of the highest single combined query. All 15 cumulative query methods had an r value of ≥.7, but 6 of 15 combined queries had an r value of ≥.7. Cumulative query method showed relatively higher correlation with national influenza surveillance data than combined queries in the development and validation set.
PROTICdb: a web-based application to store, track, query, and compare plant proteome data.
Ferry-Dumazet, Hélène; Houel, Gwenn; Montalent, Pierre; Moreau, Luc; Langella, Olivier; Negroni, Luc; Vincent, Delphine; Lalanne, Céline; de Daruvar, Antoine; Plomion, Christophe; Zivy, Michel; Joets, Johann
2005-05-01
PROTICdb is a web-based application, mainly designed to store and analyze plant proteome data obtained by two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) and mass spectrometry (MS). The purposes of PROTICdb are (i) to store, track, and query information related to proteomic experiments, i.e., from tissue sampling to protein identification and quantitative measurements, and (ii) to integrate information from the user's own expertise and other sources into a knowledge base, used to support data interpretation (e.g., for the determination of allelic variants or products of post-translational modifications). Data insertion into the relational database of PROTICdb is achieved either by uploading outputs of image analysis and MS identification software, or by filling web forms. 2-D PAGE annotated maps can be displayed, queried, and compared through a graphical interface. Links to external databases are also available. Quantitative data can be easily exported in a tabulated format for statistical analyses. PROTICdb is based on the Oracle or the PostgreSQL Database Management System and is freely available upon request at the following URL: http://moulon.inra.fr/ bioinfo/PROTICdb.
Combining clinical and genomics queries using i2b2 – Three methods
Murphy, Shawn N.; Avillach, Paul; Bellazzi, Riccardo; Phillips, Lori; Gabetta, Matteo; Eran, Alal; McDuffie, Michael T.; Kohane, Isaac S.
2017-01-01
We are fortunate to be living in an era of twin biomedical data surges: a burgeoning representation of human phenotypes in the medical records of our healthcare systems, and high-throughput sequencing making rapid technological advances. The difficulty representing genomic data and its annotations has almost by itself led to the recognition of a biomedical “Big Data” challenge, and the complexity of healthcare data only compounds the problem to the point that coherent representation of both systems on the same platform seems insuperably difficult. We investigated the capability for complex, integrative genomic and clinical queries to be supported in the Informatics for Integrating Biology and the Bedside (i2b2) translational software package. Three different data integration approaches were developed: The first is based on Sequence Ontology, the second is based on the tranSMART engine, and the third on CouchDB. These novel methods for representing and querying complex genomic and clinical data on the i2b2 platform are available today for advancing precision medicine. PMID:28388645
Efficient Implementation of an Optimal Interpolator for Large Spatial Data Sets
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.
2007-01-01
Scattered data interpolation is a problem of interest in numerous areas such as electronic imaging, smooth surface modeling, and computational geometry. Our motivation arises from applications in geology and mining, which often involve large scattered data sets and a demand for high accuracy. The method of choice is ordinary kriging. This is because it is a best unbiased estimator. Unfortunately, this interpolant is computationally very expensive to compute exactly. For n scattered data points, computing the value of a single interpolant involves solving a dense linear system of size roughly n x n. This is infeasible for large n. In practice, kriging is solved approximately by local approaches that are based on considering only a relatively small'number of points that lie close to the query point. There are many problems with this local approach, however. The first is that determining the proper neighborhood size is tricky, and is usually solved by ad hoc methods such as selecting a fixed number of nearest neighbors or all the points lying within a fixed radius. Such fixed neighborhood sizes may not work well for all query points, depending on local density of the point distribution. Local methods also suffer from the problem that the resulting interpolant is not continuous. Meyer showed that while kriging produces smooth continues surfaces, it has zero order continuity along its borders. Thus, at interface boundaries where the neighborhood changes, the interpolant behaves discontinuously. Therefore, it is important to consider and solve the global system for each interpolant. However, solving such large dense systems for each query point is impractical. Recently a more principled approach to approximating kriging has been proposed based on a technique called covariance tapering. The problems arise from the fact that the covariance functions that are used in kriging have global support. Our implementations combine, utilize, and enhance a number of different approaches that have been introduced in literature for solving large linear systems for interpolation of scattered data points. For very large systems, exact methods such as Gaussian elimination are impractical since they require 0(n(exp 3)) time and 0(n(exp 2)) storage. As Billings et al. suggested, we use an iterative approach. In particular, we use the SYMMLQ method, for solving the large but sparse ordinary kriging systems that result from tapering. The main technical issue that need to be overcome in our algorithmic solution is that the points' covariance matrix for kriging should be symmetric positive definite. The goal of tapering is to obtain a sparse approximate representation of the covariance matrix while maintaining its positive definiteness. Furrer et al. used tapering to obtain a sparse linear system of the form Ax = b, where A is the tapered symmetric positive definite covariance matrix. Thus, Cholesky factorization could be used to solve their linear systems. They implemented an efficient sparse Cholesky decomposition method. They also showed if these tapers are used for a limited class of covariance models, the solution of the system converges to the solution of the original system. Matrix A in the ordinary kriging system, while symmetric, is not positive definite. Thus, their approach is not applicable to the ordinary kriging system. Therefore, we use tapering only to obtain a sparse linear system. Then, we use SYMMLQ to solve the ordinary kriging system. We show that solving large kriging systems becomes practical via tapering and iterative methods, and results in lower estimation errors compared to traditional local approaches, and significant memory savings compared to the original global system. We also developed a more efficient variant of the sparse SYMMLQ method for large ordinary kriging systems. This approach adaptively finds the correct local neighborhood for each query point in the interpolation process.
A Query Integrator and Manager for the Query Web
Brinkley, James F.; Detwiler, Landon T.
2012-01-01
We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. PMID:22531831
SCEC UCVM - Unified California Velocity Model
NASA Astrophysics Data System (ADS)
Small, P.; Maechling, P. J.; Jordan, T. H.; Ely, G. P.; Taborda, R.
2011-12-01
The SCEC Unified California Velocity Model (UCVM) is a software framework for a state-wide California velocity model. UCVM provides researchers with two new capabilities: (1) the ability to query Vp, Vs, and density from any standard regional California velocity model through a uniform interface, and (2) the ability to combine multiple velocity models into a single state-wide model. These features are crucial in order to support large-scale ground motion simulations and to facilitate improvements in the underlying velocity models. UCVM provides integrated support for the following standard velocity models: SCEC CVM-H, SCEC CVM-S and the CVM-SI variant, USGS Bay Area (cencalvm), Lin-Thurber Statewide, and other smaller regional models. New models may be easily incorporated as they become available. Two query interfaces are provided: a Linux command line program, and a C application programming interface (API). The C API query interface is simple, fully independent of any specific model, and MPI-friendly. Input coordinates are geographic longitude/latitude and the vertical coordinate may be either depth or elevation. Output parameters include Vp, Vs, and density along with the identity of the model from which these material properties were obtained. In addition to access to the standard models, UCVM also includes a high resolution statewide digital elevation model, Vs30 map, and an optional near-surface geo-technical layer (GTL) based on Ely's Vs30-derived GTL. The elevation and Vs30 information is bundled along with the returned Vp,Vs velocities and density, so that all relevant information is retrieved with a single query. When the GTL is enabled, it is blended with the underlying crustal velocity models along a configurable transition depth range with an interpolation function. Multiple, possibly overlapping, regional velocity models may be combined together into a single state-wide model. This is accomplished by tiling the regional models on top of one another in three dimensions in a researcher-specified order. No reconciliation is performed within overlapping model regions, although a post-processing tool is provided to perform a simple numerical smoothing. Lastly, a 3D region from a combined model may be extracted and exported into a CVM-Etree. This etree may then be queried by UCVM much like a standard velocity model but with less overhead and generally better performance due to the efficiency of the etree data structure.
Calculations concerning the HCO(+)/HOC(+) abundance ratio in dense interstellar clouds
NASA Technical Reports Server (NTRS)
Defrees, D. J.; Mclean, A. D.; Herbst, E.
1984-01-01
Calculations have been performed to determine the rate coefficients of several reactions involved in both the formation and depletion of interstellar HCO(+) and HOC(+). The abundance of HOC(+) deduced from these calculations is consistent with the tentative identification of HOC(+) in Sgr B2 by Woods et al. (1983). The large HCO(+)/HOC(+) abundance ratio observed by Woods et al. is due at least in part to a more rapid formation rate for HCO(+) and probably due as well to a more rapid depletion rate for HOC(+).
Data Container Study for Handling Array-based Data Using Rasdaman, Hive, Spark, and MongoDB
NASA Astrophysics Data System (ADS)
Xu, M.; Hu, F.; Yu, M.; Scheele, C.; Liu, K.; Huang, Q.; Yang, C. P.; Little, M. M.
2016-12-01
Geoscience communities have come up with various big data storage solutions, such as Rasdaman and Hive, to address the grand challenges for massive Earth observation data management and processing. To examine the readiness of current solutions in supporting big Earth observation, we propose to investigate and compare four popular data container solutions, including Rasdaman, Hive, Spark, and MongoDB. Using different types of spatial and non-spatial queries, datasets stored in common scientific data formats (e.g., NetCDF and HDF), and two applications (i.e. dust storm simulation data mining and MERRA data analytics), we systematically compare and evaluate the feature and performance of these four data containers in terms of data discover and access. The computing resources (e.g. CPU, memory, hard drive, network) consumed while performing various queries and operations are monitored and recorded for the performance evaluation. The initial results show that 1) Rasdaman has the best performance for queries on statistical and operational functions, and supports NetCDF data format better than HDF; 2) Rasdaman clustering configuration is more complex than the others; 3) Hive performs better on single pixel extraction from multiple images; and 4) Except for the single pixel extractions, Spark performs better than Hive and its performance is close to Rasdaman. A comprehensive report will detail the experimental results, and compare their pros and cons regarding system performance, ease of use, accessibility, scalability, compatibility, and flexibility.
HyQue: evaluating hypotheses using Semantic Web technologies.
Callahan, Alison; Dumontier, Michel; Shah, Nigam H
2011-05-17
Key to the success of e-Science is the ability to computationally evaluate expert-composed hypotheses for validity against experimental data. Researchers face the challenge of collecting, evaluating and integrating large amounts of diverse information to compose and evaluate a hypothesis. Confronted with rapidly accumulating data, researchers currently do not have the software tools to undertake the required information integration tasks. We present HyQue, a Semantic Web tool for querying scientific knowledge bases with the purpose of evaluating user submitted hypotheses. HyQue features a knowledge model to accommodate diverse hypotheses structured as events and represented using Semantic Web languages (RDF/OWL). Hypothesis validity is evaluated against experimental and literature-sourced evidence through a combination of SPARQL queries and evaluation rules. Inference over OWL ontologies (for type specifications, subclass assertions and parthood relations) and retrieval of facts stored as Bio2RDF linked data provide support for a given hypothesis. We evaluate hypotheses of varying levels of detail about the genetic network controlling galactose metabolism in Saccharomyces cerevisiae to demonstrate the feasibility of deploying such semantic computing tools over a growing body of structured knowledge in Bio2RDF. HyQue is a query-based hypothesis evaluation system that can currently evaluate hypotheses about the galactose metabolism in S. cerevisiae. Hypotheses as well as the supporting or refuting data are represented in RDF and directly linked to one another allowing scientists to browse from data to hypothesis and vice versa. HyQue hypotheses and data are available at http://semanticscience.org/projects/hyque.
Rapid Deployment of a RESTful Service for Oceanographic Research Cruises
NASA Astrophysics Data System (ADS)
Fu, Linyun; Arko, Robert; Leadbetter, Adam
2014-05-01
The Ocean Data Interoperability Platform (ODIP) seeks to increase data sharing across scientific domains and international boundaries, by providing a forum to harmonize diverse regional data systems. ODIP participants from the US include the Rolling Deck to Repository (R2R) program, whose mission is to capture, catalog, and describe the underway/environmental sensor data from US oceanographic research vessels and submit the data to public long-term archives. R2R publishes information online as Linked Open Data, making it widely available using Semantic Web standards. Each vessel, sensor, cruise, dataset, person, organization, funding award, log, report, etc, has a Uniform Resource Identifier (URI). Complex queries that federate results from other data providers are supported, using the SPARQL query language. To facilitate interoperability, R2R uses controlled vocabularies developed collaboratively by the science community (eg. SeaDataNet device categories) and published online by the NERC Vocabulary Server (NVS). In response to user feedback, we are developing a standard programming interface (API) and Web portal for R2R's Linked Open Data. The API provides a set of simple REST-type URLs that are translated on-the-fly into SPARQL queries, and supports common output formats (eg. JSON). We will demonstrate an implementation based on the Epimorphics Linked Data API (ELDA) open-source Java package. Our experience shows that constructing a simple portal with limited schema elements in this way can significantly reduce development time and maintenance complexity.
Toyota, Toshiaki; Morimoto, Takeshi; Shiomi, Hiroki; Ando, Kenji; Ono, Koh; Shizuta, Satoshi; Kato, Takao; Saito, Naritatsu; Furukawa, Yutaka; Nakagawa, Yoshihisa; Horie, Minoru; Kimura, Takeshi
2017-03-24
Few studies have evaluated the prevalence and clinical outcomes of ad hoc percutaneous coronary intervention (PCI), performing diagnostic coronary angiography and PCI in the same session, in stable coronary artery disease (CAD) patients.Methods and Results:From the CREDO-Kyoto PCI/CABG registry cohort-2, 6,943 patients were analyzed as having stable CAD and undergoing first PCI. Ad hoc PCI and non-ad hoc PCI were performed in 1,722 (24.8%) and 5,221 (75.1%) patients, respectively. The cumulative 5-year incidence and adjusted risk for all-cause death were not significantly different between the 2 groups (15% vs. 15%, P=0.53; hazard ratio: 1.15, 95% confidence interval: 0.98-1.35, P=0.08). Ad hoc PCI relative to non-ad hoc PCI was associated with neutral risk for myocardial infarction, any coronary revascularization, and bleeding, but was associated with a trend towards lower risk for stroke (hazard ratio: 0.78, 95% confidence interval: 0.60-1.02, P=0.06). Ad hoc PCI in stable CAD patients was associated with at least comparable 5-year clinical outcomes as with non-ad hoc PCI. Considering patients' preference and the cost-saving, the ad hoc PCI strategy might be a safe and attractive option for patients with stable CAD, although the prevalence of ad hoc PCI was low in the current study population.
On service differentiation in mobile Ad Hoc networks.
Zhang, Shun-liang; Ye, Cheng-qing
2004-09-01
A network model is proposed to support service differentiation for mobile Ad Hoc networks by combining a fully distributed admission control approach and the DIFS based differentiation mechanism of IEEE802.11. It can provide different kinds of QoS (Quality of Service) for various applications. Admission controllers determine a committed bandwidth based on the reserved bandwidth of flows and the source utilization of networks. Packets are marked when entering into networks by markers according to the committed rate. By the mark in the packet header, intermediate nodes handle the received packets in different manners to provide applications with the QoS corresponding to the pre-negotiated profile. Extensive simulation experiments showed that the proposed mechanism can provide QoS guarantee to assured service traffic and increase the channel utilization of networks.
Burg, Matthew M; Barefoot, John; Berkman, Lisa; Catellier, Diane J; Czajkowski, Susan; Saab, Patrice; Huber, Marc; DeLillo, Vicki; Mitchell, Pamela; Skala, Judy; Taylor, C Barr
2005-01-01
In post hoc analyses, to examine in low perceived social support (LPSS) patients enrolled in the Enhancing Recovery in Coronary Heart Disease (ENRICHD) clinical trial (n = 1503), the pattern of social support following myocardial infarction (MI), the impact of psychosocial intervention on perceived support, the relationship of perceived support at the time of MI to subsequent death and recurrent MI, and the relationship of change in perceived support 6 months after MI to subsequent mortality. Partner status (partner, no partner) and score (<12 = low support; >12 = moderate support) on the ENRICHD Social Support Instrument (ESSI) were used post hoc to define four levels of risk. The resulting 4 LPSS risk groups were compared on baseline characteristics, changes in social support, and medical outcomes to a group of concurrently enrolled acute myocardial infarction patients without depression or LPSS (MI comparison group, n = 408). Effects of treatment assignment on LPSS and death/recurrent MI were also examined. All 4 LPSS risk groups demonstrated improvement in perceived support, regardless of treatment assignment, with a significant treatment effect only seen in the LPSS risk group with no partner and moderate support at baseline. During an average 29-month follow-up, the combined end point of death/nonfatal MI was 10% in the MI comparison group and 23% in the ENRICHD LPSS patients; LPSS conferred a greater risk in unadjusted and adjusted models (HR = 1.74-2.39). Change in ESSI score and/or improvement in perceived social support were not found to predict subsequent mortality. Baseline LPSS predicted death/recurrent MI in the ENRICHD cohort, independent of treatment assignment. Intervention effects indicated a partner surrogacy role for the interventionist and the need for a moderate level of support at baseline for the intervention to be effective.
NASA Astrophysics Data System (ADS)
Mehic, M.; Fazio, P.; Voznak, M.; Partila, P.; Komosny, D.; Tovarek, J.; Chmelikova, Z.
2016-05-01
A mobile ad hoc network is a collection of mobile nodes which communicate without a fixed backbone or centralized infrastructure. Due to the frequent mobility of nodes, routes connecting two distant nodes may change. Therefore, it is not possible to establish a priori fixed paths for message delivery through the network. Because of its importance, routing is the most studied problem in mobile ad hoc networks. In addition, if the Quality of Service (QoS) is demanded, one must guarantee the QoS not only over a single hop but over an entire wireless multi-hop path which may not be a trivial task. In turns, this requires the propagation of QoS information within the network. The key to the support of QoS reporting is QoS routing, which provides path QoS information at each source. To support QoS for real-time traffic one needs to know not only minimum delay on the path to the destination but also the bandwidth available on it. Therefore, throughput, end-to-end delay, and routing overhead are traditional performance metrics used to evaluate the performance of routing protocol. To obtain additional information about the link, most of quality-link metrics are based on calculation of the lost probabilities of links by broadcasting probe packets. In this paper, we address the problem of including multiple routing metrics in existing routing packets that are broadcasted through the network. We evaluate the efficiency of such approach with modified version of DSDV routing protocols in ns-3 simulator.
Using an image-extended relational database to support content-based image retrieval in a PACS.
Traina, Caetano; Traina, Agma J M; Araújo, Myrian R B; Bueno, Josiane M; Chino, Fabio J T; Razente, Humberto; Azevedo-Marques, Paulo M
2005-12-01
This paper presents a new Picture Archiving and Communication System (PACS), called cbPACS, which has content-based image retrieval capabilities. The cbPACS answers range and k-nearest- neighbor similarity queries, employing a relational database manager extended to support images. The images are compared through their features, which are extracted by an image-processing module and stored in the extended relational database. The database extensions were developed aiming at efficiently answering similarity queries by taking advantage of specialized indexing methods. The main concept supporting the extensions is the definition, inside the relational manager, of distance functions based on features extracted from the images. An extension to the SQL language enables the construction of an interpreter that intercepts the extended commands and translates them to standard SQL, allowing any relational database server to be used. By now, the system implemented works on features based on color distribution of the images through normalized histograms as well as metric histograms. Metric histograms are invariant regarding scale, translation and rotation of images and also to brightness transformations. The cbPACS is prepared to integrate new image features, based on texture and shape of the main objects in the image.
SNOMED CT module-driven clinical archetype management.
Allones, J L; Taboada, M; Martinez, D; Lozano, R; Sobrido, M J
2013-06-01
To explore semantic search to improve management and user navigation in clinical archetype repositories. In order to support semantic searches across archetypes, an automated method based on SNOMED CT modularization is implemented to transform clinical archetypes into SNOMED CT extracts. Concurrently, query terms are converted into SNOMED CT concepts using the search engine Lucene. Retrieval is then carried out by matching query concepts with the corresponding SNOMED CT segments. A test collection of the 16 clinical archetypes, including over 250 terms, and a subset of 55 clinical terms from two medical dictionaries, MediLexicon and MedlinePlus, were used to test our method. The keyword-based service supported by the OpenEHR repository offered us a benchmark to evaluate the enhancement of performance. In total, our approach reached 97.4% precision and 69.1% recall, providing a substantial improvement of recall (more than 70%) compared to the benchmark. Exploiting medical domain knowledge from ontologies such as SNOMED CT may overcome some limitations of the keyword-based systems and thus improve the search experience of repository users. An automated approach based on ontology segmentation is an efficient and feasible way for supporting modeling, management and user navigation in clinical archetype repositories. Copyright © 2013 Elsevier Inc. All rights reserved.
AnnotCompute: annotation-based exploration and meta-analysis of genomics experiments
Zheng, Jie; Stoyanovich, Julia; Manduchi, Elisabetta; Liu, Junmin; Stoeckert, Christian J.
2011-01-01
The ever-increasing scale of biological data sets, particularly those arising in the context of high-throughput technologies, requires the development of rich data exploration tools. In this article, we present AnnotCompute, an information discovery platform for repositories of functional genomics experiments such as ArrayExpress. Our system leverages semantic annotations of functional genomics experiments with controlled vocabulary and ontology terms, such as those from the MGED Ontology, to compute conceptual dissimilarities between pairs of experiments. These dissimilarities are then used to support two types of exploratory analysis—clustering and query-by-example. We show that our proposed dissimilarity measures correspond to a user's intuition about conceptual dissimilarity, and can be used to support effective query-by-example. We also evaluate the quality of clustering based on these measures. While AnnotCompute can support a richer data exploration experience, its effectiveness is limited in some cases, due to the quality of available annotations. Nonetheless, tools such as AnnotCompute may provide an incentive for richer annotations of experiments. Code is available for download at http://www.cbil.upenn.edu/downloads/AnnotCompute. Database URL: http://www.cbil.upenn.edu/annotCompute/ PMID:22190598
Guiding Students to Answers: Query Recommendation
ERIC Educational Resources Information Center
Yilmazel, Ozgur
2011-01-01
This paper reports on a guided navigation system built on the textbook search engine developed at Anadolu University to support distance education students. The search engine uses Turkish Language specific language processing modules to enable searches over course material presented in Open Education Faculty textbooks. We implemented a guided…
Using Generalized Annotated Programs to Solve Social Network Diffusion Optimization Problems
2013-01-01
as follows: —Let kall be the k value for the SNDOP-ALL query and for each SNDOP query i, let ki be the k for that query. For each query i, set ki... kall − 1. —Number each element of vi ∈ V such that gI(vi) and V C(vi) are true. For the ith SNDOP query, let vi be the corresponding element of V —Let...vertices of S. PROOF. We set up |V | SNDOP-queries as follows: —Let kall be the k value for the SNDOP-ALL query and and for each SNDOP-query i, let ki be
A web-based data-querying tool based on ontology-driven methodology and flowchart-based model.
Ping, Xiao-Ou; Chung, Yufang; Tseng, Yi-Ju; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei
2013-10-08
Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, "degree of liver damage," "degree of liver damage when applying a mutually exclusive setting," and "treatments for liver cancer") was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks.
When Leadership Matters: Perspectives from a Teacher Team Implementing Response to Intervention
ERIC Educational Resources Information Center
Meyer, Michele M.; Behar-Horenstein, Linda S.
2015-01-01
Previous research investigating the effectiveness of response to intervention (RTI) has relied on post hoc data analyses and surveys, although few studies have explored interactions among teacher teams. Understanding the synergistic impact of teacher work within the RTI framework may have implications for how school leaders can support teacher…
Realizing Outdoor Independent Learning with a Butterfly-Watching Mobile Learning System
ERIC Educational Resources Information Center
Chen, Yuh-Shyan; Kao, Tai-Chien; Sheu, Jang-Ping
2005-01-01
In this article, we describe the development of a mobile butterfly-watching learning (BWL) system to realize outdoor independent learning for mobile learners. The mobile butterfly-watching learning system was designed in a wireless mobile ad-hoc learning environment. This is first result to provide a cognitive tool with supporting the independent…
Foreign Languages and Higher Education: New Structures for a Changed World
ERIC Educational Resources Information Center
Modern Language Association, 2007
2007-01-01
The Modern Language Association (MLA) supports a broad, intellectually driven approach to teaching language and culture in higher education. To study the best ways of implementing this approach in today's world, the MLA Executive Council established an Ad Hoc Committee on Foreign Languages. The committee was charged with examining the current…
2012-01-01
Background Semantic Web technology can considerably catalyze translational genetics and genomics research in medicine, where the interchange of information between basic research and clinical levels becomes crucial. This exchange involves mapping abstract phenotype descriptions from research resources, such as knowledge databases and catalogs, to unstructured datasets produced through experimental methods and clinical practice. This is especially true for the construction of mutation databases. This paper presents a way of harmonizing abstract phenotype descriptions with patient data from clinical practice, and querying this dataset about relationships between phenotypes and genetic variants, at different levels of abstraction. Methods Due to the current availability of ontological and terminological resources that have already reached some consensus in biomedicine, a reuse-based ontology engineering approach was followed. The proposed approach uses the Ontology Web Language (OWL) to represent the phenotype ontology and the patient model, the Semantic Web Rule Language (SWRL) to bridge the gap between phenotype descriptions and clinical data, and the Semantic Query Web Rule Language (SQWRL) to query relevant phenotype-genotype bidirectional relationships. The work tests the use of semantic web technology in the biomedical research domain named cerebrotendinous xanthomatosis (CTX), using a real dataset and ontologies. Results A framework to query relevant phenotype-genotype bidirectional relationships is provided. Phenotype descriptions and patient data were harmonized by defining 28 Horn-like rules in terms of the OWL concepts. In total, 24 patterns of SWQRL queries were designed following the initial list of competency questions. As the approach is based on OWL, the semantic of the framework adapts the standard logical model of an open world assumption. Conclusions This work demonstrates how semantic web technologies can be used to support flexible representation and computational inference mechanisms required to query patient datasets at different levels of abstraction. The open world assumption is especially good for describing only partially known phenotype-genotype relationships, in a way that is easily extensible. In future, this type of approach could offer researchers a valuable resource to infer new data from patient data for statistical analysis in translational research. In conclusion, phenotype description formalization and mapping to clinical data are two key elements for interchanging knowledge between basic and clinical research. PMID:22849591
Representation and Integration of Scientific Information
NASA Technical Reports Server (NTRS)
1998-01-01
The objective of this Joint Research Interchange with NASA-Ames was to investigate how the Tsimmis technology could be used to represent and integrate scientific information. The main goal of the Tsimmis project is to allow a decision maker to find information of interest from such sources, fuse it, and process it (e.g., summarize it, visualize it, discover trends). Another important goal is the easy incorporation of new sources, as well the ability to deal with sources whose structure or services evolve. During the Interchange we had research meetings approximately every month or two. The funds provided by NASA supported work that lead to the following two papers: Fusion Queries over Internet Databases; Efficient Query Subscription Processing in a Multicast Environment.
An RDF/OWL knowledge base for query answering and decision support in clinical pharmacogenetics.
Samwald, Matthias; Freimuth, Robert; Luciano, Joanne S; Lin, Simon; Powers, Robert L; Marshall, M Scott; Adlassnig, Klaus-Peter; Dumontier, Michel; Boyce, Richard D
2013-01-01
Genetic testing for personalizing pharmacotherapy is bound to become an important part of clinical routine. To address associated issues with data management and quality, we are creating a semantic knowledge base for clinical pharmacogenetics. The knowledge base is made up of three components: an expressive ontology formalized in the Web Ontology Language (OWL 2 DL), a Resource Description Framework (RDF) model for capturing detailed results of manual annotation of pharmacogenomic information in drug product labels, and an RDF conversion of relevant biomedical datasets. Our work goes beyond the state of the art in that it makes both automated reasoning as well as query answering as simple as possible, and the reasoning capabilities go beyond the capabilities of previously described ontologies.
Pattern Discovery and Change Detection of Online Music Query Streams
NASA Astrophysics Data System (ADS)
Li, Hua-Fu
In this paper, an efficient stream mining algorithm, called FTP-stream (Frequent Temporal Pattern mining of streams), is proposed to find the frequent temporal patterns over melody sequence streams. In the framework of our proposed algorithm, an effective bit-sequence representation is used to reduce the time and memory needed to slide the windows. The FTP-stream algorithm can calculate the support threshold in only a single pass based on the concept of bit-sequence representation. It takes the advantage of "left" and "and" operations of the representation. Experiments show that the proposed algorithm only scans the music query stream once, and runs significant faster and consumes less memory than existing algorithms, such as SWFI-stream and Moment.
Stracuzzi, David John; Brost, Randolph C.; Phillips, Cynthia A.; ...
2015-09-26
Geospatial semantic graphs provide a robust foundation for representing and analyzing remote sensor data. In particular, they support a variety of pattern search operations that capture the spatial and temporal relationships among the objects and events in the data. However, in the presence of large data corpora, even a carefully constructed search query may return a large number of unintended matches. This work considers the problem of calculating a quality score for each match to the query, given that the underlying data are uncertain. As a result, we present a preliminary evaluation of three methods for determining both match qualitymore » scores and associated uncertainty bounds, illustrated in the context of an example based on overhead imagery data.« less
TEQUEL: The query language of SADDLE
NASA Technical Reports Server (NTRS)
Rajan, S. D.
1984-01-01
A relational database management system is presented that is tailored for engineering applications. A wide variety of engineering data types are supported and the data definition language (DDL) and data manipulation language (DML) are extended to handle matrices. The system can be used either in the standalone mode or through a FORTRAN or PASCAL application program. The query language is of the relational calculus type and allows the user to store, retrieve, update and delete tuples from relations. The relational operations including union, intersect and differ facilitate creation of temporary relations that can be used for manipulating information in a powerful manner. Sample applications are shown to illustrate the creation of data through a FORTRAN program and data manipulation using the TEQUEL DML.
Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen
2014-01-01
Background The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. Objective The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Methods Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic’s consumer health information website. We performed analyses on “Queries with considering repetition counts (QwR)” and “Queries without considering repetition counts (QwoR)”. The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Results Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are “Symptoms” (1 in 3 search queries), “Causes”, and “Treatments & Drugs”. The distribution of search queries for different health categories differs with the device used for the search. Health queries tend to be longer and more specific than general search queries. Health queries from SDs are longer and have slightly fewer spelling mistakes than those from PCs. Users specify words related to women and children more often than that of men and any other age group. Most of the health queries are formulated using keywords; the second-most common are wh- and yes/no questions. Users ask more health questions using SDs than PCs. Almost all health queries have at least one noun and health queries from SDs are more descriptive than those from PCs. Conclusions This study is a large-scale comparative analysis of health search queries to understand the effects of device type (PCs vs SDs) used on OHISB. The study indicates that the device used for online health information search plays an important role in shaping how health information searches by consumers and patients are executed. PMID:25000537
Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen; Pathak, Jyotishman
2014-07-04
The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic's consumer health information website. We performed analyses on "Queries with considering repetition counts (QwR)" and "Queries without considering repetition counts (QwoR)". The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are "Symptoms" (1 in 3 search queries), "Causes", and "Treatments & Drugs". The distribution of search queries for different health categories differs with the device used for the search. Health queries tend to be longer and more specific than general search queries. Health queries from SDs are longer and have slightly fewer spelling mistakes than those from PCs. Users specify words related to women and children more often than that of men and any other age group. Most of the health queries are formulated using keywords; the second-most common are wh- and yes/no questions. Users ask more health questions using SDs than PCs. Almost all health queries have at least one noun and health queries from SDs are more descriptive than those from PCs. This study is a large-scale comparative analysis of health search queries to understand the effects of device type (PCs vs. SDs) used on OHISB. The study indicates that the device used for online health information search plays an important role in shaping how health information searches by consumers and patients are executed.
SkyQuery - A Prototype Distributed Query and Cross-Matching Web Service for the Virtual Observatory
NASA Astrophysics Data System (ADS)
Thakar, A. R.; Budavari, T.; Malik, T.; Szalay, A. S.; Fekete, G.; Nieto-Santisteban, M.; Haridas, V.; Gray, J.
2002-12-01
We have developed a prototype distributed query and cross-matching service for the VO community, called SkyQuery, which is implemented with hierarchichal Web Services. SkyQuery enables astronomers to run combined queries on existing distributed heterogeneous astronomy archives. SkyQuery provides a simple, user-friendly interface to run distributed queries over the federation of registered astronomical archives in the VO. The SkyQuery client connects to the portal Web Service, which farms the query out to the individual archives, which are also Web Services called SkyNodes. The cross-matching algorithm is run recursively on each SkyNode. Each archive is a relational DBMS with a HTM index for fast spatial lookups. The results of the distributed query are returned as an XML DataSet that is automatically rendered by the client. SkyQuery also returns the image cutout corresponding to the query result. SkyQuery finds not only matches between the various catalogs, but also dropouts - objects that exist in some of the catalogs but not in others. This is often as important as finding matches. We demonstrate the utility of SkyQuery with a brown-dwarf search between SDSS and 2MASS, and a search for radio-quiet quasars in SDSS, 2MASS and FIRST. The importance of a service like SkyQuery for the worldwide astronomical community cannot be overstated: data on the same objects in various archives is mapped in different wavelength ranges and looks very different due to different errors, instrument sensitivities and other peculiarities of each archive. Our cross-matching algorithm preforms a fuzzy spatial join across multiple catalogs. This type of cross-matching is currently often done by eye, one object at a time. A static cross-identification table for a set of archives would become obsolete by the time it was built - the exponential growth of astronomical data means that a dynamic cross-identification mechanism like SkyQuery is the only viable option. SkyQuery was funded by a grant from the NASA AISR program.
Software Engineering Laboratory (SEL) database organization and user's guide, revision 2
NASA Technical Reports Server (NTRS)
Morusiewicz, Linda; Bristow, John
1992-01-01
The organization of the Software Engineering Laboratory (SEL) database is presented. Included are definitions and detailed descriptions of the database tables and views, the SEL data, and system support data. The mapping from the SEL and system support data to the base table is described. In addition, techniques for accessing the database through the Database Access Manager for the SEL (DAMSEL) system and via the ORACLE structured query language (SQL) are discussed.
Software Engineering Laboratory (SEL) database organization and user's guide
NASA Technical Reports Server (NTRS)
So, Maria; Heller, Gerard; Steinberg, Sandra; Spiegel, Douglas
1989-01-01
The organization of the Software Engineering Laboratory (SEL) database is presented. Included are definitions and detailed descriptions of the database tables and views, the SEL data, and system support data. The mapping from the SEL and system support data to the base tables is described. In addition, techniques for accessing the database, through the Database Access Manager for the SEL (DAMSEL) system and via the ORACLE structured query language (SQL), are discussed.
S-MART, a software toolbox to aid RNA-Seq data analysis.
Zytnicki, Matthias; Quesneville, Hadi
2011-01-01
High-throughput sequencing is now routinely performed in many experiments. But the analysis of the millions of sequences generated, is often beyond the expertise of the wet labs who have no personnel specializing in bioinformatics. Whereas several tools are now available to map high-throughput sequencing data on a genome, few of these can extract biological knowledge from the mapped reads. We have developed a toolbox called S-MART, which handles mapped RNA-Seq data. S-MART is an intuitive and lightweight tool which performs many of the tasks usually required for the analysis of mapped RNA-Seq reads. S-MART does not require any computer science background and thus can be used by all of the biologist community through a graphical interface. S-MART can run on any personal computer, yielding results within an hour even for Gb of data for most queries. S-MART may perform the entire analysis of the mapped reads, without any need for other ad hoc scripts. With this tool, biologists can easily perform most of the analyses on their computer for their RNA-Seq data, from the mapped data to the discovery of important loci.
S-MART, A Software Toolbox to Aid RNA-seq Data Analysis
Zytnicki, Matthias; Quesneville, Hadi
2011-01-01
High-throughput sequencing is now routinely performed in many experiments. But the analysis of the millions of sequences generated, is often beyond the expertise of the wet labs who have no personnel specializing in bioinformatics. Whereas several tools are now available to map high-throughput sequencing data on a genome, few of these can extract biological knowledge from the mapped reads. We have developed a toolbox called S-MART, which handles mapped RNA-Seq data. S-MART is an intuitive and lightweight tool which performs many of the tasks usually required for the analysis of mapped RNA-Seq reads. S-MART does not require any computer science background and thus can be used by all of the biologist community through a graphical interface. S-MART can run on any personal computer, yielding results within an hour even for Gb of data for most queries. S-MART may perform the entire analysis of the mapped reads, without any need for other ad hoc scripts. With this tool, biologists can easily perform most of the analyses on their computer for their RNA-Seq data, from the mapped data to the discovery of important loci. PMID:21998740
New public dataset for spotting patterns in medieval document images
NASA Astrophysics Data System (ADS)
En, Sovann; Nicolas, Stéphane; Petitjean, Caroline; Jurie, Frédéric; Heutte, Laurent
2017-01-01
With advances in technology, a large part of our cultural heritage is becoming digitally available. In particular, in the field of historical document image analysis, there is now a growing need for indexing and data mining tools, thus allowing us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may present some variability in terms of color, shape, or context, making the spotting of patterns a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and allows the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol along with ad hoc metrics is provided for a fair comparison of the submitted approaches. We also provide some first results obtained with our baseline system on this new dataset, which show that there is room for improvement and that should encourage researchers of the document image analysis community to design new systems and submit improved results.
Face recognition: database acquisition, hybrid algorithms, and human studies
NASA Astrophysics Data System (ADS)
Gutta, Srinivas; Huang, Jeffrey R.; Singh, Dig; Wechsler, Harry
1997-02-01
One of the most important technologies absent in traditional and emerging frontiers of computing is the management of visual information. Faces are accessible `windows' into the mechanisms that govern our emotional and social lives. The corresponding face recognition tasks considered herein include: (1) Surveillance, (2) CBIR, and (3) CBIR subject to correct ID (`match') displaying specific facial landmarks such as wearing glasses. We developed robust matching (`classification') and retrieval schemes based on hybrid classifiers and showed their feasibility using the FERET database. The hybrid classifier architecture consist of an ensemble of connectionist networks--radial basis functions-- and decision trees. The specific characteristics of our hybrid architecture include (a) query by consensus as provided by ensembles of networks for coping with the inherent variability of the image formation and data acquisition process, and (b) flexible and adaptive thresholds as opposed to ad hoc and hard thresholds. Experimental results, proving the feasibility of our approach, yield (i) 96% accuracy, using cross validation (CV), for surveillance on a data base consisting of 904 images (ii) 97% accuracy for CBIR tasks, on a database of 1084 images, and (iii) 93% accuracy, using CV, for CBIR subject to correct ID match tasks on a data base of 200 images.
Siirala, Eriikka; Peltonen, Laura-Maria; Lundgrén-Laine, Heljä; Salanterä, Sanna; Junttila, Kristiina
2016-09-01
To describe the tactical and the operational decisions made by nurse managers when managing the daily unit operation in peri-operative settings. Management is challenging as situations change rapidly and decisions are constantly made. Understanding decision-making in this complex environment helps to develop decision support systems to support nurse managers' operative and tactical decision-making. Descriptive cross-sectional design. Data were collected from 20 nurse managers with the think-aloud method during the busiest working hours and analysed using thematic content analysis. Nurse managers made over 700 decisions; either ad hoc (n = 289), near future (n = 268) or long-term (n = 187) by nature. Decisions were often made simultaneously with many interruptions. Ad hoc decisions covered staff allocation, ensuring adequate staff, rescheduling surgical procedures, confirmation tangible resources and following-up the daily unit operation. Decisions in the near future were: planning of surgical procedures and tangible resources, and planning staff allocation. Long-term decisions were: human recourses, nursing development, supplies and equipment, and finances in the unit. Decision-making was vulnerable to interruptions, which sometimes complicated the managing tasks. The results can be used when planning decision support systems and when defining the nurse managers' tasks in peri-operative settings. © 2016 John Wiley & Sons Ltd.
Teng, Rui; Leibnitz, Kenji; Miura, Ryu
2013-01-01
An essential application of wireless sensor networks is to successfully respond to user queries. Query packet losses occur in the query dissemination due to wireless communication problems such as interference, multipath fading, packet collisions, etc. The losses of query messages at sensor nodes result in the failure of sensor nodes reporting the requested data. Hence, the reliable and successful dissemination of query messages to sensor nodes is a non-trivial problem. The target of this paper is to enable highly successful query delivery to sensor nodes by localized and energy-efficient discovery, and recovery of query losses. We adopt local and collective cooperation among sensor nodes to increase the success rate of distributed discoveries and recoveries. To enable the scalability in the operations of discoveries and recoveries, we employ a distributed name resolution mechanism at each sensor node to allow sensor nodes to self-detect the correlated queries and query losses, and then efficiently locally respond to the query losses. We prove that the collective discovery of query losses has a high impact on the success of query dissemination and reveal that scalability can be achieved by using the proposed approach. We further study the novel features of the cooperation and competition in the collective recovery at PHY and MAC layers, and show that the appropriate number of detectors can achieve optimal successful recovery rate. We evaluate the proposed approach with both mathematical analyses and computer simulations. The proposed approach enables a high rate of successful delivery of query messages and it results in short route lengths to recover from query losses. The proposed approach is scalable and operates in a fully distributed manner. PMID:23748172
HEALTH GeoJunction: place-time-concept browsing of health publications.
MacEachren, Alan M; Stryker, Michael S; Turton, Ian J; Pezanowski, Scott
2010-05-18
The volume of health science publications is escalating rapidly. Thus, keeping up with developments is becoming harder as is the task of finding important cross-domain connections. When geographic location is a relevant component of research reported in publications, these tasks are more difficult because standard search and indexing facilities have limited or no ability to identify geographic foci in documents. This paper introduces HEALTH GeoJunction, a web application that supports researchers in the task of quickly finding scientific publications that are relevant geographically and temporally as well as thematically. HEALTH GeoJunction is a geovisual analytics-enabled web application providing: (a) web services using computational reasoning methods to extract place-time-concept information from bibliographic data for documents and (b) visually-enabled place-time-concept query, filtering, and contextualizing tools that apply to both the documents and their extracted content. This paper focuses specifically on strategies for visually-enabled, iterative, facet-like, place-time-concept filtering that allows analysts to quickly drill down to scientific findings of interest in PubMed abstracts and to explore relations among abstracts and extracted concepts in place and time. The approach enables analysts to: find publications without knowing all relevant query parameters, recognize unanticipated geographic relations within and among documents in multiple health domains, identify the thematic emphasis of research targeting particular places, notice changes in concepts over time, and notice changes in places where concepts are emphasized. PubMed is a database of over 19 million biomedical abstracts and citations maintained by the National Center for Biotechnology Information; achieving quick filtering is an important contribution due to the database size. Including geography in filters is important due to rapidly escalating attention to geographic factors in public health. The implementation of mechanisms for iterative place-time-concept filtering makes it possible to narrow searches efficiently and quickly from thousands of documents to a small subset that meet place-time-concept constraints. Support for a more-like-this query creates the potential to identify unexpected connections across diverse areas of research. Multi-view visualization methods support understanding of the place, time, and concept components of document collections and enable comparison of filtered query results to the full set of publications.
USDA-ARS?s Scientific Manuscript database
Finding relevant knowledge and information to prevent land degradation and support restoration has historically involved researchers working from their own knowledge, querying people they know, and tediously searching topical literature reviews.To address this need we created JournalMap (http://www....
A Participants' DSS for a Management Game with a DSS Generator.
ERIC Educational Resources Information Center
Yeo, Gee Kin; Nah, Fui Hoon
1992-01-01
Describes the design of a decision support system (DSS) for a management game called MAGNUS (Management Game for National University of Singapore). Built-in models for performance analysis and decision making are explained; database query and model building are described; and future work is discussed. (11 references) (LRW)
A Database Evaluation Based on Information Needs of Academic Social Scientists.
ERIC Educational Resources Information Center
Buterbaugh, Nancy Toth
This study evaluates two databases, "Historical Abstracts" and REESWeb, to determine their effectiveness in supporting academic social science research. While many performance evaluations gather quantitative data from isolated query and response transactions, this study is a qualitative evaluation of the databases in the context of…
Query Modification through External Sources to Support Clinical Decisions
2014-11-01
takes no medications. Physical examination is normal. The EKG shows nonspecific changes. Summary 58-year-old woman with hypertension and obesity presents...algorithm for suffix stripping. Program, 14:130–137, 1980. Reprinted in Readings in Information Retrieval, pages 313–316, 1997. M. S. Simpson, E
Thematic video indexing to support video database retrieval and query processing
NASA Astrophysics Data System (ADS)
Khoja, Shakeel A.; Hall, Wendy
1999-08-01
This paper presents a novel video database system, which caters for complex and long videos, such as documentaries, educational videos, etc. As compared to relatively structured format videos like CNN news or commercial advertisements, this database system has the capacity to work with long and unstructured videos.
Worked Examples in Teaching Queries for Searching Academic Databases
ERIC Educational Resources Information Center
Kickham-Samy, Mary
2013-01-01
The worked-example effect, an application of cognitive load theory, is a well-supported method of instruction for well-structured problems (Chandler and Sweller, 1991; Cooper and Sweller, 1987; Sweller and Cooper, 1985; Tuovinen & Sweller, 1999; Ward and Sweller, 1990). One limitation is expertise-reversal effect, where advanced students…
NASA Astrophysics Data System (ADS)
Nass, A.; D'Amore, M.; Helbert, J.
2018-04-01
An archiving structure and reference level of derived and already published data supports the scientific community significantly by a constant rise of knowledge and understanding based on recent discussions within Information Science and Management.
Making Authentic Data Accessible: The Sensing the Environment Inquiry Module
ERIC Educational Resources Information Center
Griffis, Kathy; Thadani, Vandana; Wise, Joe
2008-01-01
We report on the development of a middle school life sciences inquiry module, Sensing the Environment. This "data-enriched" inquiry module includes a series of activities exploring the nature of science, photosynthesis, transpiration, and natural selection, which culminates in students' querying authentic environmental data to support a scientific…
2002-01-01
to the OODBMS approach. The ORDBMS approach produced such research prototypes as Postgres [155], and Starburst [67] and commercial products such as...Kemnitz. The POSTGRES Next-Generation Database Management System. Communications of the ACM, 34(10):78–92, 1991. [156] Michael Stonebreaker and Dorothy
Ontological Approach to Military Knowledge Modeling and Management
2004-03-01
federated search mechanism has to reformulate user queries (expressed using the ontology) in the query languages of the different sources (e.g. SQL...ontologies as a common terminology – Unified query to perform federated search • Query processing – Ontology mapping to sources reformulate queries
VISAGE: Interactive Visual Graph Querying.
Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng
2016-06-01
Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete , an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with "wildcard" nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE's ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries.
VISAGE: Interactive Visual Graph Querying
Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng
2017-01-01
Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with “wildcard” nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE’s ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries. PMID:28553670
Intelligent self-organization methods for wireless ad hoc sensor networks based on limited resources
NASA Astrophysics Data System (ADS)
Hortos, William S.
2006-05-01
A wireless ad hoc sensor network (WSN) is a configuration for area surveillance that affords rapid, flexible deployment in arbitrary threat environments. There is no infrastructure support and sensor nodes communicate with each other only when they are in transmission range. To a greater degree than the terminals found in mobile ad hoc networks (MANETs) for communications, sensor nodes are resource-constrained, with limited computational processing, bandwidth, memory, and power, and are typically unattended once in operation. Consequently, the level of information exchange among nodes, to support any complex adaptive algorithms to establish network connectivity and optimize throughput, not only deplete those limited resources and creates high overhead in narrowband communications, but also increase network vulnerability to eavesdropping by malicious nodes. Cooperation among nodes, critical to the mission of sensor networks, can thus be disrupted by the inappropriate choice of the method for self-organization. Recent published contributions to the self-configuration of ad hoc sensor networks, e.g., self-organizing mapping and swarm intelligence techniques, have been based on the adaptive control of the cross-layer interactions found in MANET protocols to achieve one or more performance objectives: connectivity, intrusion resistance, power control, throughput, and delay. However, few studies have examined the performance of these algorithms when implemented with the limited resources of WSNs. In this paper, self-organization algorithms for the initiation, operation and maintenance of a network topology from a collection of wireless sensor nodes are proposed that improve the performance metrics significant to WSNs. The intelligent algorithm approach emphasizes low computational complexity, energy efficiency and robust adaptation to change, allowing distributed implementation with the actual limited resources of the cooperative nodes of the network. Extensions of the algorithms from flat topologies to two-tier hierarchies of sensor nodes are presented. Results from a few simulations of the proposed algorithms are compared to the published results of other approaches to sensor network self-organization in common scenarios. The estimated network lifetime and extent under static resource allocations are computed.
EarthServer: Cross-Disciplinary Earth Science Through Data Cube Analytics
NASA Astrophysics Data System (ADS)
Baumann, P.; Rossi, A. P.
2016-12-01
The unprecedented increase of imagery, in-situ measurements, and simulation data produced by Earth (and Planetary) Science observations missions bears a rich, yet not leveraged potential for getting insights from integrating such diverse datasets and transform scientific questions into actual queries to data, formulated in a standardized way.The intercontinental EarthServer [1] initiative is demonstrating new directions for flexible, scalable Earth Science services based on innovative NoSQL technology. Researchers from Europe, the US and Australia have teamed up to rigorously implement the concept of the datacube. Such a datacube may have spatial and temporal dimensions (such as a satellite image time series) and may unite an unlimited number of scenes. Independently from whatever efficient data structuring a server network may perform internally, users (scientist, planners, decision makers) will always see just a few datacubes they can slice and dice.EarthServer has established client [2] and server technology for such spatio-temporal datacubes. The underlying scalable array engine, rasdaman [3,4], enables direct interaction, including 3-D visualization, common EO data processing, and general analytics. Services exclusively rely on the open OGC "Big Geo Data" standards suite, the Web Coverage Service (WCS). Conversely, EarthServer has shaped and advanced WCS based on the experience gained. The first phase of EarthServer has advanced scalable array database technology into 150+ TB services. Currently, Petabyte datacubes are being built for ad-hoc and cross-disciplinary querying, e.g. using climate, Earth observation and ocean data.We will present the EarthServer approach, its impact on OGC / ISO / INSPIRE standardization, and its platform technology, rasdaman.References: [1] Baumann, et al. (2015) DOI: 10.1080/17538947.2014.1003106 [2] Hogan, P., (2011) NASA World Wind, Proceedings of the 2nd International Conference on Computing for Geospatial Research & Applications ACM. [3] Baumann, Peter, et al. (2014) In Proc. 10th ICDM, 194-201. [4] Dumitru, A. et al. (2014) In Proc ACM SIGMOD Workshop on Data Analytics in the Cloud (DanaC'2014), 1-4.
Which factors predict the time spent answering queries to a drug information centre?
Reppe, Linda A.; Spigset, Olav
2010-01-01
Objective To develop a model based upon factors able to predict the time spent answering drug-related queries to Norwegian drug information centres (DICs). Setting and method Drug-related queries received at 5 DICs in Norway from March to May 2007 were randomly assigned to 20 employees until each of them had answered a minimum of five queries. The employees reported the number of drugs involved, the type of literature search performed, and whether the queries were considered judgmental or not, using a specifically developed scoring system. Main outcome measures The scores of these three factors were added together to define a workload score for each query. Workload and its individual factors were subsequently related to the measured time spent answering the queries by simple or multiple linear regression analyses. Results Ninety-six query/answer pairs were analyzed. Workload significantly predicted the time spent answering the queries (adjusted R2 = 0.22, P < 0.001). Literature search was the individual factor best predicting the time spent answering the queries (adjusted R2 = 0.17, P < 0.001), and this variable also contributed the most in the multiple regression analyses. Conclusion The most important workload factor predicting the time spent handling the queries in this study was the type of literature search that had to be performed. The categorisation of queries as judgmental or not, also affected the time spent answering the queries. The number of drugs involved did not significantly influence the time spent answering drug information queries. PMID:20922480
Personalized query suggestion based on user behavior
NASA Astrophysics Data System (ADS)
Chen, Wanyu; Hao, Zepeng; Shao, Taihua; Chen, Honghui
Query suggestions help users refine their queries after they input an initial query. Previous work mainly concentrated on similarity-based and context-based query suggestion approaches. However, models that focus on adapting to a specific user (personalization) can help to improve the probability of the user being satisfied. In this paper, we propose a personalized query suggestion model based on users’ search behavior (UB model), where we inject relevance between queries and users’ search behavior into a basic probabilistic model. For the relevance between queries, we consider their semantical similarity and co-occurrence which indicates the behavior information from other users in web search. Regarding the current user’s preference to a query, we combine the user’s short-term and long-term search behavior in a linear fashion and deal with the data sparse problem with Bayesian probabilistic matrix factorization (BPMF). In particular, we also investigate the impact of different personalization strategies (the combination of the user’s short-term and long-term search behavior) on the performance of query suggestion reranking. We quantify the improvement of our proposed UB model against a state-of-the-art baseline using the public AOL query logs and show that it beats the baseline in terms of metrics used in query suggestion reranking. The experimental results show that: (i) for personalized ranking, users’ behavioral information helps to improve query suggestion effectiveness; and (ii) given a query, merging information inferred from the short-term and long-term search behavior of a particular user can result in a better performance than both plain approaches.
Human Hepatocyte Growth Factor (hHGF)-Modified Hepatic Oval Cells Improve Liver Transplant Survival
Li, Li; Ran, Jiang-Hua; Li, Xue-Hua; Liu, Zhi-Heng; Liu, Gui-Jie; Gao, Yan-Chao; Zhang, Xue-Li; Sun, Hiu-Dong
2012-01-01
Despite progress in the field of immunosuppression, acute rejection is still a common postoperative complication following liver transplantation. This study aims to investigate the capacity of the human hepatocyte growth factor (hHGF) in modifying hepatic oval cells (HOCs) administered simultaneously with orthotopic liver transplantation as a means of improving graft survival. HOCs were activated and isolated using a modified 2-acetylaminofluorene/partial hepatectomy (2-AAF/PH) model in male Lewis rats. A HOC line stably expressing the HGF gene was established following stable transfection of the pBLAST2-hHGF plasmid. Our results demonstrated that hHGF-modified HOCs could efficiently differentiate into hepatocytes and bile duct epithelial cells in vitro. Administration of HOCs at the time of liver transplantation induced a wider distribution of SRY-positive donor cells in liver tissues. Administration of hHGF-HOC at the time of transplantation remarkably prolonged the median survival time and improved liver function for recipients compared to these parameters in the other treatment groups (P<0.05). Moreover, hHGF-HOC administration at the time of liver transplantation significantly suppressed elevation of interleukin-2 (IL-2), tumor necrosis factor-α (TNF-α) and interferon-γ (IFN-γ) levels while increasing the production of IL-10 and TGF-β1 (P<0.05). HOC or hHGF-HOC administration promoted cell proliferation, reduced cell apoptosis, and decreased liver allograft rejection rates. Furthermore, hHGF-modified HOCs more efficiently reduced acute allograft rejection (P<0.05 versus HOC transplantation only). Our results indicate that the combination of hHGF-modified HOCs with liver transplantation decreased host anti-graft immune responses resulting in a reduction of allograft rejection rates and prolonging graft survival in recipient rats. This suggests that HOC-based cell transplantation therapies can be developed as a means of treating severe liver injuries. PMID:23028627
Schuers, Matthieu; Joulakian, Mher; Kerdelhué, Gaetan; Segas, Léa; Grosjean, Julien; Darmoni, Stéfan J; Griffon, Nicolas
2017-07-03
MEDLINE is the most widely used medical bibliographic database in the world. Most of its citations are in English and this can be an obstacle for some researchers to access the information the database contains. We created a multilingual query builder to facilitate access to the PubMed subset using a language other than English. The aim of our study was to assess the impact of this multilingual query builder on the quality of PubMed queries for non-native English speaking physicians and medical researchers. A randomised controlled study was conducted among French speaking general practice residents. We designed a multi-lingual query builder to facilitate information retrieval, based on available MeSH translations and providing users with both an interface and a controlled vocabulary in their own language. Participating residents were randomly allocated either the French or the English version of the query builder. They were asked to translate 12 short medical questions into MeSH queries. The main outcome was the quality of the query. Two librarians blind to the arm independently evaluated each query, using a modified published classification that differentiated eight types of errors. Twenty residents used the French version of the query builder and 22 used the English version. 492 queries were analysed. There were significantly more perfect queries in the French group vs. the English group (respectively 37.9% vs. 17.9%; p < 0.01). It took significantly more time for the members of the English group than the members of the French group to build each query, respectively 194 sec vs. 128 sec; p < 0.01. This multi-lingual query builder is an effective tool to improve the quality of PubMed queries in particular for researchers whose first language is not English.
Dynamic autonomous routing technology for IP-based satellite ad hoc networks
NASA Astrophysics Data System (ADS)
Wang, Xiaofei; Deng, Jing; Kostas, Theresa; Rajappan, Gowri
2014-06-01
IP-based routing for military LEO/MEO satellite ad hoc networks is very challenging due to network and traffic heterogeneity, network topology and traffic dynamics. In this paper, we describe a traffic priority-aware routing scheme for such networks, namely Dynamic Autonomous Routing Technology (DART) for satellite ad hoc networks. DART has a cross-layer design, and conducts routing and resource reservation concurrently for optimal performance in the fluid but predictable satellite ad hoc networks. DART ensures end-to-end data delivery with QoS assurances by only choosing routing paths that have sufficient resources, supporting different packet priority levels. In order to do so, DART incorporates several resource management and innovative routing mechanisms, which dynamically adapt to best fit the prevailing conditions. In particular, DART integrates a resource reservation mechanism to reserve network bandwidth resources; a proactive routing mechanism to set up non-overlapping spanning trees to segregate high priority traffic flows from lower priority flows so that the high priority flows do not face contention from low priority flows; a reactive routing mechanism to arbitrate resources between various traffic priorities when needed; a predictive routing mechanism to set up routes for scheduled missions and for anticipated topology changes for QoS assurance. We present simulation results showing the performance of DART. We have conducted these simulations using the Iridium constellation and trajectories as well as realistic military communications scenarios. The simulation results demonstrate DART's ability to discriminate between high-priority and low-priority traffic flows and ensure disparate QoS requirements of these traffic flows.
A Web-Based Data-Querying Tool Based on Ontology-Driven Methodology and Flowchart-Based Model
Ping, Xiao-Ou; Chung, Yufang; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei
2013-01-01
Background Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. Objective The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. Methods The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. Results In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, “degree of liver damage,” “degree of liver damage when applying a mutually exclusive setting,” and “treatments for liver cancer”) was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. Conclusions The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks. PMID:25600078
Distributed Topology Organization and Transmission Scheduling in Wireless Ad Hoc Networks
2004-01-01
slot consists of two mini -slots that support that support full-duplex transfer–one for master-to-slave transmission and the other from slave-to-master...protocol storage requirement would only be 208 bits in this case. 111 Communication requirements Each half-duplex mini -slot can be used by either a data or...either a data or control packet, eq. (4.2) sets the minimum half-duplex mini -slot size in the system or equivalently, the maximum system period
Spontaneous Group Learning in Ambient Learning Environments
NASA Astrophysics Data System (ADS)
Bick, Markus; Jughardt, Achim; Pawlowski, Jan M.; Veith, Patrick
Spontaneous Group Learning is a concept to form and facilitate face-to-face, ad-hoc learning groups in collaborative settings. We show how to use Ambient Intelligence to identify, support, and initiate group processes. Learners' positions are determined by widely used technologies, e.g., Bluetooth and WLAN. As a second step, learners' positions, tasks, and interests are visualized. Finally, a group process is initiated supported by relevant documents and services. Our solution is a starting point to develop new didactical solutions for collaborative processes.
International forensic automotive paint database
NASA Astrophysics Data System (ADS)
Bishea, Gregory A.; Buckle, Joe L.; Ryland, Scott G.
1999-02-01
The Technical Working Group for Materials Analysis (TWGMAT) is supporting an international forensic automotive paint database. The Federal Bureau of Investigation and the Royal Canadian Mounted Police (RCMP) are collaborating on this effort through TWGMAT. This paper outlines the support and further development of the RCMP's Automotive Paint Database, `Paint Data Query'. This cooperative agreement augments and supports a current, validated, searchable, automotive paint database that is used to identify make(s), model(s), and year(s) of questioned paint samples in hit-and-run fatalities and other associated investigations involving automotive paint.
Mining Longitudinal Web Queries: Trends and Patterns.
ERIC Educational Resources Information Center
Wang, Peiling; Berry, Michael W.; Yang, Yiheng
2003-01-01
Analyzed user queries submitted to an academic Web site during a four-year period, using a relational database, to examine users' query behavior, to identify problems they encounter, and to develop techniques for optimizing query analysis and mining. Linguistic analyses focus on query structures, lexicon, and word associations using statistical…
Optimizing a Query by Transformation and Expansion.
Glocker, Katrin; Knurr, Alexander; Dieter, Julia; Dominick, Friederike; Forche, Melanie; Koch, Christian; Pascoe Pérez, Analie; Roth, Benjamin; Ückert, Frank
2017-01-01
In the biomedical sector not only the amount of information produced and uploaded into the web is enormous, but also the number of sources where these data can be found. Clinicians and researchers spend huge amounts of time on trying to access this information and to filter the most important answers to a given question. As the formulation of these queries is crucial, automated query expansion is an effective tool to optimize a query and receive the best possible results. In this paper we introduce the concept of a workflow for an optimization of queries in the medical and biological sector by using a series of tools for expansion and transformation of the query. After the definition of attributes by the user, the query string is compared to previous queries in order to add semantic co-occurring terms to the query. Additionally, the query is enlarged by an inclusion of synonyms. The translation into database specific ontologies ensures the optimal query formulation for the chosen database(s). As this process can be performed in various databases at once, the results are ranked and normalized in order to achieve a comparable list of answers for a question.
EmptyHeaded: A Relational Engine for Graph Processing
Aberger, Christopher R.; Tu, Susan; Olukotun, Kunle; Ré, Christopher
2016-01-01
There are two types of high-performance graph processing engines: low- and high-level engines. Low-level engines (Galois, PowerGraph, Snap) provide optimized data structures and computation models but require users to write low-level imperative code, hence ensuring that efficiency is the burden of the user. In high-level engines, users write in query languages like datalog (SociaLite) or SQL (Grail). High-level engines are easier to use but are orders of magnitude slower than the low-level graph engines. We present EmptyHeaded, a high-level engine that supports a rich datalog-like query language and achieves performance comparable to that of low-level engines. At the core of EmptyHeaded’s design is a new class of join algorithms that satisfy strong theoretical guarantees but have thus far not achieved performance comparable to that of specialized graph processing engines. To achieve high performance, EmptyHeaded introduces a new join engine architecture, including a novel query optimizer and data layouts that leverage single-instruction multiple data (SIMD) parallelism. With this architecture, EmptyHeaded outperforms high-level approaches by up to three orders of magnitude on graph pattern queries, PageRank, and Single-Source Shortest Paths (SSSP) and is an order of magnitude faster than many low-level baselines. We validate that EmptyHeaded competes with the best-of-breed low-level engine (Galois), achieving comparable performance on PageRank and at most 3× worse performance on SSSP. PMID:28077912
Murphy, SN; Barnett, GO; Chueh, HC
2000-01-01
The patient base of the Partners HealthCare System in Boston exceeds 1.8 million. Many of these patients are desirable for participation in research studies. To facilitate their discovery, we developed a data warehouse to contain clinical characteristics of these patients. The data warehouse contains diagnosis and procedures from administrative databases. The patients are indexed across institutions and their demographics provided by an Enterprise Master Patient Indexing service. Characteristics of the diagnoses and procedures such as associated providers, dates of service, inpatient/outpatient status, and other visit-related characteristics are also fed from the administrative systems. The targeted users of this system are research clinician s interested in finding patient cohorts for research studies. Their data requirements were analyzed and have been reported elsewhere. We did not expect the clinicians to become expert users of the system. Tools for querying healthcare data have traditionally been text based, although graphical interfaces have been pursued. In order to support the simple drag and drop visual model, as well as the identification and distribution of the patient data, a three-tier software architecture was developed. The user interface was developed in Visual Basic and distributed as an ActiveX object embedded in an HTML page. The middle layer was developed in Java and Microsoft COM. The queries are represented throughout their lifetime as XML objects, and the Microsoft SQL7 database is queried and managed in standard SQL. PMID:11080028
Murphy; Barnett; Chueh
2000-01-01
The patient base of the Partners HealthCare System in Boston exceeds 1.8 million. Many of these patients are desirable for participation in research studies. To facilitate their discovery, we developed a data warehouse to contain clinical characteristics of these patients. The data warehouse contains diagnosis and procedures from administrative databases. The patients are indexed across institutions and their demographics provided by an Enterprise Master Patient Indexing service. Characteristics of the diagnoses and procedures such as associated providers, dates of service, inpatient/outpatient status, and other visit-related characteristics are also fed from the administrative systems. The targeted users of this system are research clinician s interested in finding patient cohorts for research studies. Their data requirements were analyzed and have been reported elsewhere. We did not expect the clinicians to become expert users of the system. Tools for querying healthcare data have traditionally been text based, although graphical interfaces have been pursued. In order to support the simple drag and drop visual model, as well as the identification and distribution of the patient data, a three-tier software architecture was developed. The user interface was developed in Visual Basic and distributed as an ActiveX object embedded in an HTML page. The middle layer was developed in Java and Microsoft COM. The queries are represented throughout their lifetime as XML objects, and the Microsoft SQL7 database is queried and managed in standard SQL.
Assisting Consumer Health Information Retrieval with Query Recommendations
Zeng, Qing T.; Crowell, Jonathan; Plovnick, Robert M.; Kim, Eunjung; Ngo, Long; Dibble, Emily
2006-01-01
Objective: Health information retrieval (HIR) on the Internet has become an important practice for millions of people, many of whom have problems forming effective queries. We have developed and evaluated a tool to assist people in health-related query formation. Design: We developed the Health Information Query Assistant (HIQuA) system. The system suggests alternative/additional query terms related to the user's initial query that can be used as building blocks to construct a better, more specific query. The recommended terms are selected according to their semantic distance from the original query, which is calculated on the basis of concept co-occurrences in medical literature and log data as well as semantic relations in medical vocabularies. Measurements: An evaluation of the HIQuA system was conducted and a total of 213 subjects participated in the study. The subjects were randomized into 2 groups. One group was given query recommendations and the other was not. Each subject performed HIR for both a predefined and a self-defined task. Results: The study showed that providing HIQuA recommendations resulted in statistically significantly higher rates of successful queries (odds ratio = 1.66, 95% confidence interval = 1.16–2.38), although no statistically significant impact on user satisfaction or the users' ability to accomplish the predefined retrieval task was found. Conclusion: Providing semantic-distance-based query recommendations can help consumers with query formation during HIR. PMID:16221944
NASA Astrophysics Data System (ADS)
Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee
2010-04-01
The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to underlying database. We will describe the design of the query system, provide details of the language components and overview of how this component fits into the overall data discovery system architecture.
Spatial aggregation query in dynamic geosensor networks
NASA Astrophysics Data System (ADS)
Yi, Baolin; Feng, Dayang; Xiao, Shisong; Zhao, Erdun
2007-11-01
Wireless sensor networks have been widely used for civilian and military applications, such as environmental monitoring and vehicle tracking. In many of these applications, the researches mainly aim at building sensor network based systems to leverage the sensed data to applications. However, the existing works seldom exploited spatial aggregation query considering the dynamic characteristics of sensor networks. In this paper, we investigate how to process spatial aggregation query over dynamic geosensor networks where both the sink node and sensor nodes are mobile and propose several novel improvements on enabling techniques. The mobility of sensors makes the existing routing protocol based on information of fixed framework or the neighborhood infeasible. We present an improved location-based stateless implicit geographic forwarding (IGF) protocol for routing a query toward the area specified by query window, a diameter-based window aggregation query (DWAQ) algorithm for query propagation and data aggregation in the query window, finally considering the location changing of the sink node, we present two schemes to forward the result to the sink node. Simulation results show that the proposed algorithms can improve query latency and query accuracy.
ERIC Educational Resources Information Center
Lam, Wing
2007-01-01
Increasingly, organizations find that they need to integrate large number of information systems in order to support enterprise-wide business initiatives such as e-business, supply chain management and customer relationship management. To date, organizations have largely tended to address information systems (IS) integration in an ad-hoc manner.…
ERIC Educational Resources Information Center
Johnson, Mallory O.; Gandhi, Monica
2015-01-01
Mentoring is increasingly recognized as a critical element in supporting successful careers in academic research in medicine and related disciplines, particularly for trainees and early career investigators from underrepresented backgrounds. Mentoring is often executed ad hoc; there are limited programs to train faculty to become more effective…
1986-06-01
GUIDE JJ TRIMIS Program Office 5401 Westbard Avenue Bethesda, Maryland 20816 W" V NDC Federal Systems. lnc.._" 1300 Piccard Drive .. -Rockville, Marylcn...Project/Teay iUnft1. DOD A A fri-Service Medical Information Systems Program office 11. Contract(C) orGaw a 5401 Westbard Avenue (C) Bethesda, MD 20816
Improving Student Retention through Evidence Based Proactive Systems at the Open University (UK)
ERIC Educational Resources Information Center
Gibbs, Graham; Regan, Peter; Simpson, Ormond
2007-01-01
The Open University has been undertaking an extended initiative to improve student retention through enhanced support for at-risk students. This initiative has evolved through a series of stages from ad hoc small scale local interventions relying largely on tutors and student self-referral, to an institution-wide pro-active system implemented by…
School Improvement Plans and Student Learning in Jamaica
ERIC Educational Resources Information Center
Lockheed, Marlaine; Harris, Abigail; Jayasundera, Tamara
2010-01-01
A school improvement program that provided support to poor-performing schools on the basis of needs identified in a school improvement plan was implemented in 72 government schools in Jamaica, from 1998 to 2005. In this independent evaluation of the program, we use propensity score matching to create, post hoc, a control group of schools that were…
Hoogendam, Arjen; Stalenhoef, Anton FH; Robbé, Pieter F de Vries; Overbeke, A John PM
2008-01-01
Background The use of PubMed to answer daily medical care questions is limited because it is challenging to retrieve a small set of relevant articles and time is restricted. Knowing what aspects of queries are likely to retrieve relevant articles can increase the effectiveness of PubMed searches. The objectives of our study were to identify queries that are likely to retrieve relevant articles by relating PubMed search techniques and tools to the number of articles retrieved and the selection of articles for further reading. Methods This was a prospective observational study of queries regarding patient-related problems sent to PubMed by residents and internists in internal medicine working in an Academic Medical Centre. We analyzed queries, search results, query tools (Mesh, Limits, wildcards, operators), selection of abstract and full-text for further reading, using a portal that mimics PubMed. Results PubMed was used to solve 1121 patient-related problems, resulting in 3205 distinct queries. Abstracts were viewed in 999 (31%) of these queries, and in 126 (39%) of 321 queries using query tools. The average term count per query was 2.5. Abstracts were selected in more than 40% of queries using four or five terms, increasing to 63% if the use of four or five terms yielded 2–161 articles. Conclusion Queries sent to PubMed by physicians at our hospital during daily medical care contain fewer than three terms. Queries using four to five terms, retrieving less than 161 article titles, are most likely to result in abstract viewing. PubMed search tools are used infrequently by our population and are less effective than the use of four or five terms. Methods to facilitate the formulation of precise queries, using more relevant terms, should be the focus of education and research. PMID:18816391
Enhanced Weight based DSR for Mobile Ad Hoc Networks
NASA Astrophysics Data System (ADS)
Verma, Samant; Jain, Sweta
2011-12-01
Routing in ad hoc network is a great problematic, since a good routing protocol must ensure fast and efficient packet forwarding, which isn't evident in ad hoc networks. In literature there exists lot of routing protocols however they don't include all the aspects of ad hoc networks as mobility, device and medium constraints which make these protocols not efficient for some configuration and categories of ad hoc networks. Thus in this paper we propose an improvement of Weight Based DSR in order to include some of the aspects of ad hoc networks as stability, remaining battery power, load and trust factor and proposing a new approach Enhanced Weight Based DSR.
NASA Astrophysics Data System (ADS)
Antani, Sameer K.; Natarajan, Mukil; Long, Jonathan L.; Long, L. Rodney; Thoma, George R.
2005-04-01
The article describes the status of our ongoing R&D at the U.S. National Library of Medicine (NLM) towards the development of an advanced multimedia database biomedical information system that supports content-based image retrieval (CBIR). NLM maintains a collection of 17,000 digitized spinal X-rays along with text survey data from the Second National Health and Nutritional Examination Survey (NHANES II). These data serve as a rich data source for epidemiologists and researchers of osteoarthritis and musculoskeletal diseases. It is currently possible to access these through text keyword queries using our Web-based Medical Information Retrieval System (WebMIRS). CBIR methods developed specifically for biomedical images could offer direct visual searching of these images by means of example image or user sketch. We are building a system which supports hybrid queries that have text and image-content components. R&D goals include developing algorithms for robust image segmentation for localizing and identifying relevant anatomy, labeling the segmented anatomy based on its pathology, developing suitable indexing and similarity matching methods for images and image features, and associating the survey text information for query and retrieval along with the image data. Some highlights of the system developed in MATLAB and Java are: use of a networked or local centralized database for text and image data; flexibility to incorporate new research work; provides a means to control access to system components under development; and use of XML for structured reporting. The article details the design, features, and algorithms in this third revision of this prototype system, CBIR3.
CDAO-Store: Ontology-driven Data Integration for Phylogenetic Analysis
2011-01-01
Background The Comparative Data Analysis Ontology (CDAO) is an ontology developed, as part of the EvoInfo and EvoIO groups supported by the National Evolutionary Synthesis Center, to provide semantic descriptions of data and transformations commonly found in the domain of phylogenetic analysis. The core concepts of the ontology enable the description of phylogenetic trees and associated character data matrices. Results Using CDAO as the semantic back-end, we developed a triple-store, named CDAO-Store. CDAO-Store is a RDF-based store of phylogenetic data, including a complete import of TreeBASE. CDAO-Store provides a programmatic interface, in the form of web services, and a web-based front-end, to perform both user-defined as well as domain-specific queries; domain-specific queries include search for nearest common ancestors, minimum spanning clades, filter multiple trees in the store by size, author, taxa, tree identifier, algorithm or method. In addition, CDAO-Store provides a visualization front-end, called CDAO-Explorer, which can be used to view both character data matrices and trees extracted from the CDAO-Store. CDAO-Store provides import capabilities, enabling the addition of new data to the triple-store; files in PHYLIP, MEGA, nexml, and NEXUS formats can be imported and their CDAO representations added to the triple-store. Conclusions CDAO-Store is made up of a versatile and integrated set of tools to support phylogenetic analysis. To the best of our knowledge, CDAO-Store is the first semantically-aware repository of phylogenetic data with domain-specific querying capabilities. The portal to CDAO-Store is available at http://www.cs.nmsu.edu/~cdaostore. PMID:21496247
HyQue: evaluating hypotheses using Semantic Web technologies
2011-01-01
Background Key to the success of e-Science is the ability to computationally evaluate expert-composed hypotheses for validity against experimental data. Researchers face the challenge of collecting, evaluating and integrating large amounts of diverse information to compose and evaluate a hypothesis. Confronted with rapidly accumulating data, researchers currently do not have the software tools to undertake the required information integration tasks. Results We present HyQue, a Semantic Web tool for querying scientific knowledge bases with the purpose of evaluating user submitted hypotheses. HyQue features a knowledge model to accommodate diverse hypotheses structured as events and represented using Semantic Web languages (RDF/OWL). Hypothesis validity is evaluated against experimental and literature-sourced evidence through a combination of SPARQL queries and evaluation rules. Inference over OWL ontologies (for type specifications, subclass assertions and parthood relations) and retrieval of facts stored as Bio2RDF linked data provide support for a given hypothesis. We evaluate hypotheses of varying levels of detail about the genetic network controlling galactose metabolism in Saccharomyces cerevisiae to demonstrate the feasibility of deploying such semantic computing tools over a growing body of structured knowledge in Bio2RDF. Conclusions HyQue is a query-based hypothesis evaluation system that can currently evaluate hypotheses about the galactose metabolism in S. cerevisiae. Hypotheses as well as the supporting or refuting data are represented in RDF and directly linked to one another allowing scientists to browse from data to hypothesis and vice versa. HyQue hypotheses and data are available at http://semanticscience.org/projects/hyque. PMID:21624158
CDAO-store: ontology-driven data integration for phylogenetic analysis.
Chisham, Brandon; Wright, Ben; Le, Trung; Son, Tran Cao; Pontelli, Enrico
2011-04-15
The Comparative Data Analysis Ontology (CDAO) is an ontology developed, as part of the EvoInfo and EvoIO groups supported by the National Evolutionary Synthesis Center, to provide semantic descriptions of data and transformations commonly found in the domain of phylogenetic analysis. The core concepts of the ontology enable the description of phylogenetic trees and associated character data matrices. Using CDAO as the semantic back-end, we developed a triple-store, named CDAO-Store. CDAO-Store is a RDF-based store of phylogenetic data, including a complete import of TreeBASE. CDAO-Store provides a programmatic interface, in the form of web services, and a web-based front-end, to perform both user-defined as well as domain-specific queries; domain-specific queries include search for nearest common ancestors, minimum spanning clades, filter multiple trees in the store by size, author, taxa, tree identifier, algorithm or method. In addition, CDAO-Store provides a visualization front-end, called CDAO-Explorer, which can be used to view both character data matrices and trees extracted from the CDAO-Store. CDAO-Store provides import capabilities, enabling the addition of new data to the triple-store; files in PHYLIP, MEGA, nexml, and NEXUS formats can be imported and their CDAO representations added to the triple-store. CDAO-Store is made up of a versatile and integrated set of tools to support phylogenetic analysis. To the best of our knowledge, CDAO-Store is the first semantically-aware repository of phylogenetic data with domain-specific querying capabilities. The portal to CDAO-Store is available at http://www.cs.nmsu.edu/~cdaostore.