Sample records for answering complex queries

  1. Design of a Low-Cost Adaptive Question Answering System for Closed Domain Factoid Queries

    ERIC Educational Resources Information Center

    Toh, Huey Ling

    2010-01-01

    Closed domain question answering (QA) systems achieve precision and recall at the cost of complex language processing techniques to parse the answer corpus. We propose a "query-based" model for indexing answers in a closed domain factoid QA system. Further, we use a phrase term inference method for improving the ranking order of related questions.…

  2. Complex analyses on clinical information systems using restricted natural language querying to resolve time-event dependencies.

    PubMed

    Safari, Leila; Patrick, Jon D

    2018-06-01

    This paper reports on a generic framework to provide clinicians with the ability to conduct complex analyses on elaborate research topics using cascaded queries to resolve internal time-event dependencies in the research questions, as an extension to the proposed Clinical Data Analytics Language (CliniDAL). A cascaded query model is proposed to resolve internal time-event dependencies in the queries, which can have up to five levels of criteria, starting with a query to define subjects to be admitted into a study, followed by a query to define the time span of the experiment. Three more cascaded queries can be required to define control groups, control variables and output variables, which all together simulate a real scientific experiment. According to the complexity of the research questions, the cascaded query model has the flexibility of merging some lower-level queries for simple research questions or adding a nested query to each level to compose more complex queries. Three different scenarios (one of them containing two studies) are described and used for evaluation of the proposed solution. CliniDAL's complex analysis solution enables answering complex queries with time-event dependencies in at most a few hours, whereas manual analysis would take many days. An evaluation of the results of the research studies, based on a comparison between the CliniDAL and SQL solutions, reveals the high usability and efficiency of CliniDAL's solution. Copyright © 2018 Elsevier Inc. All rights reserved.
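
    A minimal sketch of the cascaded idea, in which each query level consumes the results fixed by the level above it. The table names, fields, and in-memory "database" below are invented for illustration; CliniDAL itself compiles each level against a clinical information system.

    ```python
    # Hypothetical cascaded query pipeline: level 1 fixes the study subjects,
    # level 2 fixes each subject's time span, and later levels are evaluated
    # only inside those bounds.
    def run_cascade(db):
        # Level 1: subjects admitted into the study.
        subjects = {r["patient_id"] for r in db["admissions"]
                    if r["diagnosis"] == "sepsis"}
        # Level 2: time span of the experiment per subject.
        spans = {r["patient_id"]: (r["start"], r["end"])
                 for r in db["episodes"] if r["patient_id"] in subjects}
        # Levels 3-5 (control groups/variables, outputs) would be restricted
        # the same way; a single output-variable level is shown here.
        return [r for r in db["observations"]
                if r["patient_id"] in spans
                and spans[r["patient_id"]][0] <= r["time"] <= spans[r["patient_id"]][1]]

    db = {
        "admissions":   [{"patient_id": 1, "diagnosis": "sepsis"}],
        "episodes":     [{"patient_id": 1, "start": 0, "end": 48}],
        "observations": [{"patient_id": 1, "time": 12, "lactate": 2.1}],
    }
    print(run_cascade(db))   # [{'patient_id': 1, 'time': 12, 'lactate': 2.1}]
    ```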

  3. Which factors predict the time spent answering queries to a drug information centre?

    PubMed Central

    Reppe, Linda A.; Spigset, Olav

    2010-01-01

    Objective To develop a model based upon factors able to predict the time spent answering drug-related queries to Norwegian drug information centres (DICs). Setting and method Drug-related queries received at 5 DICs in Norway from March to May 2007 were randomly assigned to 20 employees until each of them had answered a minimum of five queries. The employees reported the number of drugs involved, the type of literature search performed, and whether the queries were considered judgmental or not, using a specifically developed scoring system. Main outcome measures The scores of these three factors were added together to define a workload score for each query. Workload and its individual factors were subsequently related to the measured time spent answering the queries by simple or multiple linear regression analyses. Results Ninety-six query/answer pairs were analyzed. Workload significantly predicted the time spent answering the queries (adjusted R2 = 0.22, P < 0.001). Literature search was the individual factor best predicting the time spent answering the queries (adjusted R2 = 0.17, P < 0.001), and this variable also contributed the most in the multiple regression analyses. Conclusion The most important workload factor predicting the time spent handling the queries in this study was the type of literature search that had to be performed. The categorisation of queries as judgmental or not also affected the time spent answering the queries. The number of drugs involved did not significantly influence the time spent answering drug information queries. PMID:20922480
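
    The style of analysis is easy to reproduce: the workload score is the plain sum of the three factor scores, which is then regressed against answering time. A sketch with invented data, using numpy's least-squares routine rather than the study's statistics package:

    ```python
    import numpy as np

    # Invented query/answer pairs: columns are number of drugs, literature
    # search score, and judgmental flag; y is minutes spent answering.
    X = np.array([[1, 2, 0],
                  [3, 3, 1],
                  [2, 1, 0],
                  [1, 3, 1],
                  [4, 2, 1]], dtype=float)
    y = np.array([20.0, 55.0, 15.0, 50.0, 60.0])

    # Workload score as defined in the study: the three factor scores summed.
    workload = X.sum(axis=1)

    # Simple linear regression of time on workload via least squares.
    A = np.column_stack([workload, np.ones_like(workload)])
    (slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
    print(f"time = {slope:.1f} * workload + {intercept:.1f} (minutes)")
    ```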

  4. Merging OLTP and OLAP - Back to the Future

    NASA Astrophysics Data System (ADS)

    Lehner, Wolfgang

    When the terms "Data Warehousing" and "Online Analytical Processing" were coined in the 1990s by Kimball, Codd, and others, there was an obvious need for separating data and workload for operational transactional-style processing and decision-making implying complex analytical queries over large and historic data sets. Large data warehouse infrastructures have been set up to cope with the special requirements of analytical query answering for multiple reasons: For example, analytical thinking heavily relies on predefined navigation paths to guide the user through the data set and to provide different views on different aggregation levels.Multi-dimensional queries exploiting hierarchically structured dimensions lead to complex star queries at a relational backend, which could hardly be handled by classical relational systems.

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Madduri, Kamesh; Wu, Kesheng

    The Resource Description Framework (RDF) is a popular data model for representing linked data sets arising from the web, as well as large scientific data repositories such as UniProt. RDF data intrinsically represents a labeled and directed multi-graph. SPARQL is a query language for RDF that expresses subgraph pattern-finding queries on this implicit multigraph in a SQL-like syntax. SPARQL queries generate complex intermediate join queries; to compute these joins efficiently, we propose a new strategy based on bitmap indexes. We store the RDF data in column-oriented structures as compressed bitmaps along with two dictionaries. This paper makes three new contributions. (i) We present an efficient parallel strategy for parsing the raw RDF data, building dictionaries of unique entities, and creating compressed bitmap indexes of the data. (ii) We utilize the constructed bitmap indexes to efficiently answer SPARQL queries, simplifying the join evaluations. (iii) To quantify the performance impact of using bitmap indexes, we compare our approach to the state-of-the-art triple-store RDF-3X. We find that our bitmap index-based approach to answering queries is up to an order of magnitude faster for a variety of SPARQL queries, on gigascale RDF data sets.
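
    To make the indexing idea concrete, here is a toy dictionary-encoded triple store in which Python's arbitrary-precision integers stand in for compressed bitmaps (bit i set means triple i matches); evaluating two triple-pattern constraints then reduces to one bitwise AND. The triples are invented and no compression is applied, unlike the paper's scheme.

    ```python
    # Invented triples; Python ints play the role of (uncompressed) bitmaps.
    triples = [("alice", "knows", "bob"), ("alice", "type", "Person"),
               ("bob", "type", "Person"), ("bob", "knows", "carol")]

    dictionary = {}   # term -> integer id (one of the paper's two dictionaries)
    index = {}        # (column, term id) -> bitmap over triple rows

    def tid(term):
        return dictionary.setdefault(term, len(dictionary))

    for row, (s, p, o) in enumerate(triples):
        for column, term in (("s", s), ("p", p), ("o", o)):
            key = (column, tid(term))
            index[key] = index.get(key, 0) | (1 << row)

    # The pattern (alice, knows, ?y) becomes a single bitwise AND of bitmaps.
    hits = index[("s", tid("alice"))] & index[("p", tid("knows"))]
    print([triples[i] for i in range(len(triples)) if hits >> i & 1])
    # [('alice', 'knows', 'bob')]
    ```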

  6. Using AberOWL for fast and scalable reasoning over BioPortal ontologies.

    PubMed

    Slater, Luke; Gkoutos, Georgios V; Schofield, Paul N; Hoehndorf, Robert

    2016-08-08

    Reasoning over biomedical ontologies using their OWL semantics has traditionally been a challenging task due to the high theoretical complexity of OWL-based automated reasoning. As a consequence, ontology repositories, as well as most other tools utilizing ontologies, either provide access to ontologies without use of automated reasoning, or limit the number of ontologies for which automated reasoning-based access is provided. We apply the AberOWL infrastructure to provide automated reasoning-based access to all accessible and consistent ontologies in BioPortal (368 ontologies). We perform an extensive performance evaluation to determine query times, both for queries of different complexity and for queries that are performed in parallel over the ontologies. We demonstrate that, with the exception of a few ontologies, even complex and parallel queries can now be answered in milliseconds, therefore allowing automated reasoning to be used on a large scale, to run in parallel, and with rapid response times.

  7. KBGIS-II: A knowledge-based geographic information system

    NASA Technical Reports Server (NTRS)

    Smith, Terence; Peuquet, Donna; Menon, Sudhakar; Agarwal, Pankaj

    1986-01-01

    The architecture and working of a recently implemented Knowledge-Based Geographic Information System (KBGIS-II), designed to satisfy several general criteria for the GIS, are described. The system has four major functions, including query-answering, learning, and editing. The main query finds constrained locations for spatial objects that are describable in a predicate-calculus based spatial object language. The main search procedures include a family of constraint-satisfaction procedures that use a spatial object knowledge base to search efficiently for complex spatial objects in large, multilayered spatial data bases. These data bases are represented in quadtree form. The search strategy is designed to reduce the computational cost of search in the average case. The learning capabilities of the system include the addition of new locations of complex spatial objects to the knowledge base as queries are answered, and the ability to learn inductively definitions of new spatial objects from examples. The new definitions are added to the knowledge base by the system. The system is performing all its designated tasks successfully. Future reports will relate performance characteristics of the system.

  8. A topic clustering approach to finding similar questions from large question and answer archives.

    PubMed

    Zhang, Wei-Nan; Liu, Ting; Yang, Yang; Cao, Liujuan; Zhang, Yu; Ji, Rongrong

    2014-01-01

    With the blooming of Web 2.0, Community Question Answering (CQA) services such as Yahoo! Answers (http://answers.yahoo.com), WikiAnswer (http://wiki.answers.com), and Baidu Zhidao (http://zhidao.baidu.com), etc., have emerged as alternatives for knowledge and information acquisition. Over time, a large number of high-quality question and answer (Q&A) pairs contributed by human intelligence have accumulated into a comprehensive knowledge base. Unlike search engines, which return long lists of results, searching the CQA services can obtain correct answers to question queries by automatically finding similar questions that have already been answered by other users. Hence, it greatly improves the efficiency of online information retrieval. However, given a question query, finding the similar and well-answered questions is a non-trivial task. The main challenge is the word mismatch between the question query (query) and the candidate question for retrieval (question). To investigate this problem, in this study, we capture the word semantic similarity between query and question by introducing a topic modeling approach. We then propose an unsupervised machine-learning approach to finding similar questions on CQA Q&A archives. The experimental results show that our proposed approach significantly outperforms the state-of-the-art methods.
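
    A compact sketch of the topic-matching idea using scikit-learn's LDA implementation (not the authors' model): archive questions and the query are mapped into a shared topic space, and the most similar archived question is returned. On a toy corpus this small the learned topics are noisy, so treat the output as illustrative only.

    ```python
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.metrics.pairwise import cosine_similarity

    questions = ["how do i reset my router password",
                 "what is the capital of france",
                 "forgot wifi router admin password, how to recover"]
    query = "how to recover a router password"

    vec = CountVectorizer()
    X = vec.fit_transform(questions + [query])

    # Map archive questions and the query into a shared topic space so that
    # lexically different but topically similar questions can still match.
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    topics = lda.fit_transform(X)

    sims = cosine_similarity(topics[-1:], topics[:-1])[0]
    best = int(sims.argmax())
    print(questions[best], f"(similarity {sims[best]:.2f})")
    ```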

  9. Producing approximate answers to database queries

    NASA Technical Reports Server (NTRS)

    Vrbsky, Susan V.; Liu, Jane W. S.

    1993-01-01

    We have designed and implemented a query processor, called APPROXIMATE, that makes approximate answers available if part of the database is unavailable or if there is not enough time to produce an exact answer. The accuracy of the approximate answers produced improves monotonically with the amount of data retrieved to produce the result. The exact answer is produced if all of the needed data are available and query processing is allowed to continue until completion. The monotone query processing algorithm of APPROXIMATE works within the standard relational algebra framework and can be implemented on a relational database system with little change to the relational architecture. We describe here the approximation semantics of APPROXIMATE that serves as the basis for meaningful approximations of both set-valued and single-valued queries. We show how APPROXIMATE is implemented to make effective use of semantic information, provided by an object-oriented view of the database, and describe the additional overhead required by APPROXIMATE.
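
    The monotonicity property can be illustrated in a few lines: re-running a selection over whichever fragments of the base relation have arrived so far yields an answer set that only grows toward the exact result. The relation and its fragments below are invented.

    ```python
    # Invented relation arriving in fragments; the query selects dept == "QA".
    fragments = [
        [{"name": "Ann", "dept": "QA"}, {"name": "Bo", "dept": "Dev"}],
        [{"name": "Cy", "dept": "QA"}],
    ]

    def select_qa(rows):
        return [r for r in rows if r["dept"] == "QA"]

    retrieved = []
    for fragment in fragments:            # data becomes available piecewise
        retrieved.extend(fragment)
        approx = select_qa(retrieved)     # approximate answer so far
        print(len(approx), "certain tuple(s):", approx)
    # Once every fragment has arrived, the approximation is the exact answer.
    ```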

  10. Accuracy of telephone reference service in health sciences libraries.

    PubMed Central

    Paskoff, B M

    1991-01-01

    Six factual queries were unobtrusively telephoned to fifty-one U.S. academic health sciences and hospital libraries. The majority of the queries (63.4%) were answered accurately. Referrals to another library or information source were made for 25.2% of the queries. Eleven answers (3.6%) were inaccurate, and no answer was provided for 7.8% of the queries. There was a correlation between the number of accurate answers provided and the presence of at least one staff member with a master's degree in library and information science. The correlation between employing a librarian certified by the Medical Library Association (MLA) and providing accurate answers was significant. The majority of referrals were to specific sources. If these "helpful referrals" are counted with accurate answers as correct responses, they total 76.8% of the answers. In a follow-up survey, five libraries stated that they did not provide accurate answers because they did not own an appropriate source. Staff-related problems were given as reasons for other than accurate answers by two of the libraries, while eight indicated that library policy prevented them from providing answers to the public. PMID:2039904

  11. To compare PubMed Clinical Queries and UpToDate in teaching information mastery to clinical residents: a crossover randomized controlled trial.

    PubMed

    Sayyah Ensan, Ladan; Faghankhani, Masoomeh; Javanbakht, Anna; Ahmadi, Seyed-Foad; Baradaran, Hamid Reza

    2011-01-01

    To compare PubMed Clinical Queries and UpToDate regarding the amount and speed of information retrieval and users' satisfaction. A cross-over randomized trial was conducted in February 2009 at Tehran University of Medical Sciences that included 44 year-one or year-two residents who participated in an information mastery workshop. A one-hour lecture on the principles of information mastery was organized, followed by self-learning slide shows before using each database. Subsequently, participants were randomly assigned to answer 2 clinical scenarios using either UpToDate or PubMed Clinical Queries, then crossed over to use the other database to answer 2 different clinical scenarios. The proportion of relevantly answered clinical scenarios, time to answer retrieval, and users' satisfaction were measured for each database. Based on intention-to-treat analysis, participants retrieved answers to 67 (76%) of the questions using UpToDate and 38 (43%) using PubMed Clinical Queries (P<0.001). The median time to answer retrieval was 17 min (95% CI: 16 to 18) using UpToDate compared to 29 min (95% CI: 26 to 32) using PubMed Clinical Queries (P<0.001). The satisfaction with the accuracy of retrieved answers, interaction with UpToDate, and overall satisfaction were higher among UpToDate users compared to PubMed Clinical Queries users (P<0.001). For first-time users, UpToDate, compared to PubMed Clinical Queries, can lead to not only a higher proportion of relevant answers retrieved within a shorter time, but also higher user satisfaction. Thus, the addition of tutoring on pre-appraised sources such as UpToDate to information mastery curricula seems highly efficient.

  12. Consistent Query Answering of Conjunctive Queries under Primary Key Constraints

    ERIC Educational Resources Information Center

    Pema, Enela

    2014-01-01

    An inconsistent database is a database that violates one or more of its integrity constraints. In reality, violations of integrity constraints arise frequently under several different circumstances. Inconsistent databases have long posed the challenge to develop suitable tools for meaningful query answering. A principled approach for querying…

  13. Net Improvement of Correct Answers to Therapy Questions After PubMed Searches: Pre/Post Comparison

    PubMed Central

    Keepanasseril, Arun

    2013-01-01

    Background Clinicians search PubMed for answers to clinical questions although it is time consuming and not always successful. Objective To determine if PubMed used with its Clinical Queries feature to filter results based on study quality would improve search success (more correct answers to clinical questions related to therapy). Methods We invited 528 primary care physicians to participate, 143 (27.1%) consented, and 111 (21.0% of the total and 77.6% of those who consented) completed the study. Participants answered 14 yes/no therapy questions and were given 4 of these (2 originally answered correctly and 2 originally answered incorrectly) to search using either the PubMed main screen or PubMed Clinical Queries narrow therapy filter via a purpose-built system with identical search screens. Participants also picked 3 of the first 20 retrieved citations that best addressed each question. They were then asked to re-answer the original 14 questions. Results We found no statistically significant differences in the rates of correct or incorrect answers using the PubMed main screen or PubMed Clinical Queries. The rate of correct answers increased from 50.0% to 61.4% (95% CI 55.0%-67.8%) for the PubMed main screen searches and from 50.0% to 59.1% (95% CI 52.6%-65.6%) for Clinical Queries searches. These net absolute increases of 11.4% and 9.1%, respectively, included previously correct answers changing to incorrect at a rate of 9.5% (95% CI 5.6%-13.4%) for PubMed main screen searches and 9.1% (95% CI 5.3%-12.9%) for Clinical Queries searches, combined with increases in the rate of being correct of 20.5% (95% CI 15.2%-25.8%) for PubMed main screen searches and 17.7% (95% CI 12.7%-22.7%) for Clinical Queries searches. Conclusions PubMed can assist clinicians answering clinical questions with an approximately 10% absolute rate of improvement in correct answers. This small increase includes more correct answers partially offset by a decrease in previously correct answers. PMID:24217329

  14. Net improvement of correct answers to therapy questions after PubMed searches: pre/post comparison.

    PubMed

    McKibbon, Kathleen Ann; Lokker, Cynthia; Keepanasseril, Arun; Wilczynski, Nancy L; Haynes, R Brian

    2013-11-08

    Clinicians search PubMed for answers to clinical questions although it is time consuming and not always successful. To determine if PubMed used with its Clinical Queries feature to filter results based on study quality would improve search success (more correct answers to clinical questions related to therapy). We invited 528 primary care physicians to participate, 143 (27.1%) consented, and 111 (21.0% of the total and 77.6% of those who consented) completed the study. Participants answered 14 yes/no therapy questions and were given 4 of these (2 originally answered correctly and 2 originally answered incorrectly) to search using either the PubMed main screen or PubMed Clinical Queries narrow therapy filter via a purpose-built system with identical search screens. Participants also picked 3 of the first 20 retrieved citations that best addressed each question. They were then asked to re-answer the original 14 questions. We found no statistically significant differences in the rates of correct or incorrect answers using the PubMed main screen or PubMed Clinical Queries. The rate of correct answers increased from 50.0% to 61.4% (95% CI 55.0%-67.8%) for the PubMed main screen searches and from 50.0% to 59.1% (95% CI 52.6%-65.6%) for Clinical Queries searches. These net absolute increases of 11.4% and 9.1%, respectively, included previously correct answers changing to incorrect at a rate of 9.5% (95% CI 5.6%-13.4%) for PubMed main screen searches and 9.1% (95% CI 5.3%-12.9%) for Clinical Queries searches, combined with increases in the rate of being correct of 20.5% (95% CI 15.2%-25.8%) for PubMed main screen searches and 17.7% (95% CI 12.7%-22.7%) for Clinical Queries searches. PubMed can assist clinicians answering clinical questions with an approximately 10% absolute rate of improvement in correct answers. This small increase includes more correct answers partially offset by a decrease in previously correct answers.
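
    The reported figures decompose as net change = (newly correct) - (previously correct turned incorrect); the small gaps versus the reported net increases of 11.4% and 9.1% come from each rate being rounded independently. A quick check:

    ```python
    # Rates copied from the abstract; discrepancies versus the reported net
    # increases are rounding effects, since each rate was rounded on its own.
    for arm, gained, lost in [("PubMed main screen", 20.5, 9.5),
                              ("Clinical Queries",   17.7, 9.1)]:
        print(f"{arm}: {gained} - {lost} = {gained - lost:.1f} percentage points")
    ```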

  15. Generating and Executing Complex Natural Language Queries across Linked Data.

    PubMed

    Hamon, Thierry; Mougin, Fleur; Grabar, Natalia

    2015-01-01

    With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, the SPARQL language, and Natural Language question-answering interfaces provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions into SPARQL queries. We use Natural Language Processing tools, semantic resources, and RDF triple descriptions. The method is designed on 50 questions over 3 biomedical knowledge bases, and evaluated on 27 questions. It achieves a 0.78 F-measure on the test set. The method for translating natural language questions into SPARQL queries is implemented as a Perl module available at http://search.cpan.org/~thhamon/RDF-NLP-SPARQLQuery.
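
    For readers unfamiliar with the target representation, this is the kind of round trip the translated queries take; the tiny Turtle graph and the example vocabulary are invented, and rdflib is used here merely as a convenient SPARQL engine.

    ```python
    from rdflib import Graph

    # Invented toy knowledge base in Turtle.
    g = Graph()
    g.parse(data="""
    @prefix ex: <http://example.org/> .
    ex:aspirin ex:treats ex:headache .
    ex:aspirin ex:interactsWith ex:warfarin .
    """, format="turtle")

    # The sort of SPARQL query a question like "what treats headache?"
    # would be translated into.
    q = """
    PREFIX ex: <http://example.org/>
    SELECT ?drug WHERE { ?drug ex:treats ex:headache . }
    """
    for row in g.query(q):
        print(row.drug)   # http://example.org/aspirin
    ```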

  16. KBGIS-2: A knowledge-based geographic information system

    NASA Technical Reports Server (NTRS)

    Smith, T.; Peuquet, D.; Menon, S.; Agarwal, P.

    1986-01-01

    The architecture and working of a recently implemented knowledge-based geographic information system (KBGIS-2) that was designed to satisfy several general criteria for the geographic information system are described. The system has four major functions that include query-answering, learning, and editing. The main query finds constrained locations for spatial objects that are describable in a predicate-calculus based spatial objects language. The main search procedures include a family of constraint-satisfaction procedures that use a spatial object knowledge base to search efficiently for complex spatial objects in large, multilayered spatial data bases. These data bases are represented in quadtree form. The search strategy is designed to reduce the computational cost of search in the average case. The learning capabilities of the system include the addition of new locations of complex spatial objects to the knowledge base as queries are answered, and the ability to learn inductively definitions of new spatial objects from examples. The new definitions are added to the knowledge base by the system. The system is currently performing all its designated tasks successfully, although currently implemented on inadequate hardware. Future reports will detail the performance characteristics of the system, and various new extensions are planned in order to enhance the power of KBGIS-2.

  17. A Coding Method for Efficient Subgraph Querying on Vertex- and Edge-Labeled Graphs

    PubMed Central

    Zhu, Lei; Song, Qinbao; Guo, Yuchen; Du, Lei; Zhu, Xiaoyan; Wang, Guangtao

    2014-01-01

    Labeled graphs are widely used to model complex data in many domains, so subgraph querying has been attracting more and more attention from researchers around the world. Unfortunately, subgraph querying is very time consuming since it involves subgraph isomorphism testing that is known to be an NP-complete problem. In this paper, we propose a novel coding method for subgraph querying that is based on Laplacian spectrum and the number of walks. Our method follows the filtering-and-verification framework and works well on graph databases with frequent updates. We also propose novel two-step filtering conditions that can filter out most false positives and prove that the two-step filtering conditions satisfy the no-false-negative requirement (no dismissal in answers). Extensive experiments on both real and synthetic graphs show that, compared with six existing counterpart methods, our method can effectively improve the efficiency of subgraph querying. PMID:24853266
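
    A sketch of the filtering-and-verification flow with networkx: a cheap necessary condition prunes candidates so that the NP-complete exact test runs only on survivors. The filter below (node count plus the sum of Laplacian eigenvalues, which equals twice the edge count) is a deliberately weak stand-in for the paper's richer spectrum- and walk-based codes.

    ```python
    import networkx as nx
    import numpy as np
    from networkx.algorithms import isomorphism

    def spectral_code(g):
        # Laplacian spectrum; its sum equals twice the number of edges.
        return np.linalg.eigvalsh(nx.laplacian_matrix(g).toarray().astype(float))

    query = nx.path_graph(3)
    database = [nx.cycle_graph(4), nx.path_graph(2), nx.complete_graph(4)]

    for g in database:
        # Filter: a graph containing the query cannot have fewer nodes or a
        # smaller spectral sum (i.e., fewer edges) than the query.
        if (g.number_of_nodes() < query.number_of_nodes()
                or spectral_code(g).sum() < spectral_code(query).sum()):
            print(g, "-> pruned by filter")
            continue
        # Verify: exact (non-induced) subgraph isomorphism on survivors.
        gm = isomorphism.GraphMatcher(g, query)
        print(g, "-> contains query:", gm.subgraph_is_monomorphic())
    ```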

  18. To Compare PubMed Clinical Queries and UpToDate in Teaching Information Mastery to Clinical Residents: A Crossover Randomized Controlled Trial

    PubMed Central

    Sayyah Ensan, Ladan; Faghankhani, Masoomeh; Javanbakht, Anna; Ahmadi, Seyed-Foad; Baradaran, Hamid Reza

    2011-01-01

    Purpose To compare PubMed Clinical Queries and UpToDate regarding the amount and speed of information retrieval and users' satisfaction. Method A cross-over randomized trial was conducted in February 2009 at Tehran University of Medical Sciences that included 44 year-one or year-two residents who participated in an information mastery workshop. A one-hour lecture on the principles of information mastery was organized, followed by self-learning slide shows before using each database. Subsequently, participants were randomly assigned to answer 2 clinical scenarios using either UpToDate or PubMed Clinical Queries, then crossed over to use the other database to answer 2 different clinical scenarios. The proportion of relevantly answered clinical scenarios, time to answer retrieval, and users' satisfaction were measured for each database. Results Based on intention-to-treat analysis, participants retrieved answers to 67 (76%) of the questions using UpToDate and 38 (43%) using PubMed Clinical Queries (P<0.001). The median time to answer retrieval was 17 min (95% CI: 16 to 18) using UpToDate compared to 29 min (95% CI: 26 to 32) using PubMed Clinical Queries (P<0.001). The satisfaction with the accuracy of retrieved answers, interaction with UpToDate, and overall satisfaction were higher among UpToDate users compared to PubMed Clinical Queries users (P<0.001). Conclusions For first-time users, UpToDate, compared to PubMed Clinical Queries, can lead to not only a higher proportion of relevant answers retrieved within a shorter time, but also higher user satisfaction. Thus, the addition of tutoring on pre-appraised sources such as UpToDate to information mastery curricula seems highly efficient. PMID:21858142

  19. Incremental Query Rewriting with Resolution

    NASA Astrophysics Data System (ADS)

    Riazanov, Alexandre; Aragão, Marcelo A. T.

    We address the problem of semantic querying of relational databases (RDB) modulo knowledge bases using very expressive knowledge representation formalisms, such as full first-order logic or its various fragments. We propose to use a resolution-based first-order logic (FOL) reasoner for computing schematic answers to deductive queries, with the subsequent translation of these schematic answers to SQL queries which are evaluated using a conventional relational DBMS. We call our method incremental query rewriting, because an original semantic query is rewritten into a (potentially infinite) series of SQL queries. In this chapter, we outline the main idea of our technique - using abstractions of databases and constrained clauses for deriving schematic answers, and provide completeness and soundness proofs to justify the applicability of this technique to the case of resolution for FOL without equality. The proposed method can be directly used with regular RDBs, including legacy databases. Moreover, we propose it as a potential basis for an efficient Web-scale semantic search technology.

  20. A Natural Language Interface Concordant with a Knowledge Base.

    PubMed

    Han, Yong-Jin; Park, Seong-Bae; Park, Se-Young

    2016-01-01

    The discordance between expressions interpretable by a natural language interface (NLI) system and those answerable by a knowledge base is a critical problem in the field of NLIs. In order to solve this discordance problem, this paper proposes a method to translate natural language questions into formal queries that can be generated from a graph-based knowledge base. The proposed method considers a subgraph of a knowledge base as a formal query. Thus, all formal queries corresponding to a concept or a predicate in the knowledge base can be generated prior to query time and all possible natural language expressions corresponding to each formal query can also be collected in advance. A natural language expression has a one-to-one mapping with a formal query. Hence, a natural language question is translated into a formal query by matching the question with the most appropriate natural language expression. If the confidence of this matching is not sufficiently high, the proposed method rejects the question and does not answer it. Multipredicate queries are processed by regarding them as a set of collected expressions. The experimental results show that the proposed method thoroughly handles answerable questions from the knowledge base and rejects unanswerable ones effectively.
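
    The match-then-answer-or-reject behavior can be sketched with a crude string-similarity matcher standing in for the paper's matching model; the expressions, formal queries, and threshold are all invented.

    ```python
    from difflib import SequenceMatcher

    # Invented expression-to-query table; each collected natural language
    # expression maps one-to-one to a formal (subgraph) query.
    expression_to_query = {
        "who directed film X": "SELECT ?d WHERE { :filmX :director ?d }",
        "when was person X born": "SELECT ?b WHERE { :personX :birthDate ?b }",
    }

    def translate(question, threshold=0.75):
        # Crude similarity stands in for the paper's matching model.
        scored = [(SequenceMatcher(None, question, e).ratio(), e)
                  for e in expression_to_query]
        confidence, best = max(scored)
        if confidence < threshold:
            return None                  # reject: likely unanswerable
        return expression_to_query[best]

    print(translate("who directed film X"))          # matched formal query
    print(translate("what is the meaning of life"))  # None (rejected)
    ```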

  1. Four queries concerning the metaphysics of early human embryogenesis.

    PubMed

    Howsepian, A A

    2008-04-01

    In this essay, I attempt to provide answers to the following four queries concerning the metaphysics of early human embryogenesis. (1) Following its first cellular fission, is it coherent to claim that one and only one of two "blastomeric" twins of a human zygote is identical with that zygote? (2) Following the fusion of two human pre-embryos, is it coherent to claim that one and only one pre-fusion pre-embryo is identical with that postfusion pre-embryo? (3) Does a live human being come into existence only when its brain comes into existence? (4) At implantation, does a pre-embryo become a mere part of its mother? I argue that either if things have quidditative properties or if criterialism is false, then queries (1) and (2) can be answered in the affirmative; that in light of recent developments in theories of human death and in light of a more "functional" theory of brains, query (3) can be answered in the negative; and that plausible mereological principles require a negative answer to query (4).

  2. Modeling relief.

    PubMed

    Sumner, Walton; Xu, Jin Zhong; Roussel, Guy; Hagen, Michael D

    2007-10-11

    The American Board of Family Medicine deployed virtual patient simulations in 2004 to evaluate Diplomates' diagnostic and management skills. A previously reported dynamic process generates general symptom histories from time series data representing baseline values and reactions to medications. The simulator also must answer queries about details such as palliation and provocation. These responses often describe some recurring pattern, such as, "this medicine relieves my symptoms in a few minutes." The simulator can provide a detail stored as text, or it can evaluate a reference to a second query object. The second query object can generate details using a single Bayesian network to evaluate the effect of each drug in a virtual patient's medication list. A new medication option may not require redesign of the second query object if its implementation is consistent with related drugs. We expect this mechanism to maintain realistic responses to detail questions in complex simulations.
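
    A hand-rolled sketch of how such a second query object might combine per-drug effects; a noisy-OR combination is used here as a stand-in for the article's Bayesian network, and the drug classes and probabilities are invented.

    ```python
    # Invented relief probabilities per drug class; a noisy-OR combines the
    # drugs on the virtual patient's medication list, so adding a new drug of
    # a known class needs no redesign of the query object.
    RELIEF_PROB = {"triptan": 0.8, "nsaid": 0.6, "placebo": 0.1}

    def answer_relief_query(medication_list, rng_value=0.5):
        # Probability that no drug on the list relieves the symptom.
        p_no_relief = 1.0
        for drug in medication_list:
            p_no_relief *= 1.0 - RELIEF_PROB.get(drug, 0.0)
        if rng_value < 1.0 - p_no_relief:
            return "This medicine relieves my symptoms in a few minutes."
        return "Nothing seems to help."

    print(answer_relief_query(["triptan", "nsaid"]))   # relief likely (p=0.92)
    ```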

  3. Querying archetype-based EHRs by search ontology-based XPath engineering.

    PubMed

    Kropf, Stefan; Uciteli, Alexandr; Schierle, Katrin; Krücken, Peter; Denecke, Kerstin; Herre, Heinrich

    2018-05-11

    Legacy data and new structured data can be stored in a standardized format as XML-based EHRs on XML databases. Querying documents on these databases is crucial for answering research questions. Instead of using free-text searches, which lead to false positive results, precision can be increased by constraining the search to certain parts of documents. A search ontology-based specification of queries on XML documents defines search concepts and relates them to parts of the XML document structure. Such a query specification method is introduced in practice and evaluated by applying concrete research questions, formulated in natural language, to a data collection for information retrieval purposes. The search is performed by search ontology-based XPath engineering that reuses ontologies and XML-related W3C standards. The key result is that the specification of research questions can be supported by the use of search ontology-based XPath engineering. A deeper recognition of entities and a semantic understanding of the content are necessary for further improvement of precision and recall. A key limitation is that applying the introduced process requires skills in ontology and software development. In the future, the time-consuming ontology development could be overcome by implementing a new clinical role: the clinical ontologist. The introduced Search Ontology XML extension connects search terms to certain parts of XML documents and enables an ontology-based definition of queries. Search ontology-based XPath engineering can support research question answering through the specification of complex XPath expressions without deep knowledge of XPath syntax.
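
    The core benefit, constraining a search concept to the right part of the document, can be shown with lxml; the EHR snippet and the concept-to-XPath mapping below are invented rather than taken from the Search Ontology.

    ```python
    from lxml import etree

    # Invented EHR snippet; a search concept maps to an XPath scoped to the
    # right document part instead of a free-text scan over the whole record.
    doc = etree.fromstring(
        "<ehr>"
        "<section name='diagnosis'><item>pneumonia</item></section>"
        "<section name='history'><item>no pneumonia in family</item></section>"
        "</ehr>")

    concept_xpath = {"diagnosed_condition":
                     "//section[@name='diagnosis']/item/text()"}

    print(doc.xpath(concept_xpath["diagnosed_condition"]))   # ['pneumonia']
    # A free-text search for "pneumonia" would also hit the history section,
    # exactly the kind of false positive the scoped query avoids.
    ```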

  4. A Novel Two-Tier Cooperative Caching Mechanism for the Optimization of Multi-Attribute Periodic Queries in Wireless Sensor Networks

    PubMed Central

    Zhou, ZhangBing; Zhao, Deng; Shu, Lei; Tsang, Kim-Fung

    2015-01-01

    Wireless sensor networks, serving as an important interface between physical environments and computational systems, have been used extensively for supporting domain applications, where multiple-attribute sensory data are queried from the network continuously and periodically. For certain applications, sensory data may not vary significantly within a given time duration. In this setting, sensory data gathered at a certain time slot can be used for answering concurrent queries and may be reused for answering forthcoming queries when the variation of these data is within a certain threshold. To address this challenge, a popularity-based cooperative caching mechanism is proposed in this article, where the popularity of sensory data is calculated according to the queries issued in recent time slots. This popularity reflects the likelihood that sensory data will be requested by forthcoming queries. Generally, sensory data with the highest popularity are cached at the sink node, while sensory data less likely to be requested by forthcoming queries are cached in the head nodes of divided grid cells. Leveraging these cooperatively cached sensory data, queries are answered by composing these two tiers of cached data. Experimental evaluation shows that this approach can reduce the network communication cost significantly and increase the network capability. PMID:26131665
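
    A minimal single-tier rendering of the popularity idea (the grid-cell head-node tier is omitted): attributes queried often in recent slots are served from the sink's cache while fresh, avoiding radio traffic. The capacity, TTL, and sensor callback are invented.

    ```python
    import time
    from collections import Counter

    class SinkCache:
        def __init__(self, capacity=2, ttl=5.0):
            self.capacity, self.ttl = capacity, ttl
            self.popularity = Counter()   # queries seen in recent time slots
            self.cache = {}               # attribute -> (value, timestamp)

        def query(self, attribute, read_sensor):
            self.popularity[attribute] += 1
            hit = self.cache.get(attribute)
            if hit and time.time() - hit[1] < self.ttl:
                return hit[0]              # fresh enough: answered from cache
            value = read_sensor(attribute)  # fall through to the network
            # Only the currently most popular attributes earn a cache slot.
            if attribute in dict(self.popularity.most_common(self.capacity)):
                self.cache[attribute] = (value, time.time())
            return value

    sink = SinkCache()
    print(sink.query("temperature", lambda a: 21.5))  # miss: reads the sensor
    print(sink.query("temperature", lambda a: 99.9))  # hit: cached 21.5 returned
    ```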

  5. Visual graph query formulation and exploration: a new perspective on information retrieval at the edge

    NASA Astrophysics Data System (ADS)

    Kase, Sue E.; Vanni, Michelle; Knight, Joanne A.; Su, Yu; Yan, Xifeng

    2016-05-01

    Within operational environments decisions must be made quickly based on the information available. Identifying an appropriate knowledge base and accurately formulating a search query are critical tasks for decision-making effectiveness in dynamic situations. The spread of graph data management tools for accessing large graph databases is a rapidly emerging research area of potential benefit to the intelligence community. A graph representation provides a natural way of modeling data in a wide variety of domains. Graph structures use nodes, edges, and properties to represent and store data. This research investigates the advantages of information search by graph query initiated by the analyst and interactively refined within the contextual dimensions of the answer space toward a solution. The paper introduces SLQ, a user-friendly graph querying system enabling the visual formulation of schemaless and structureless graph queries. SLQ is demonstrated with an intelligence analyst information search scenario focused on identifying individuals responsible for manufacturing a mosquito-hosted deadly virus. The scenario highlights the interactive construction of graph queries without prior training in complex query languages or graph databases, intuitive navigation through the problem space, and visualization of results in graphical format.

  6. Big Data and Dysmenorrhea: What Questions Do Women and Men Ask About Menstrual Pain?

    PubMed

    Chen, Chen X; Groves, Doyle; Miller, Wendy R; Carpenter, Janet S

    2018-04-30

    Menstrual pain is highly prevalent among women of reproductive age. As the general public increasingly obtains health information online, Big Data from online platforms provide novel sources to understand the public's perspectives and information needs about menstrual pain. The study's purpose was to describe salient queries about dysmenorrhea using Big Data from a question and answer platform. We performed text mining of 1.9 billion queries from ChaCha, a United States-based question and answer platform. Dysmenorrhea-related queries were identified by keyword searching. Each relevant query was split into token words (i.e., meaningful words or phrases) and stop words (i.e., non-meaningful functional words). Word Adjacency Graph (WAG) modeling was used to detect clusters of queries and visualize the range of dysmenorrhea-related topics. We constructed two WAG models, one from queries by women of reproductive age and one from queries by men. Salient themes were identified by inspecting the clusters of each WAG model. We identified two subsets of queries: Subset 1 contained 507,327 queries from women aged 13-50 years. Subset 2 contained 113,888 queries from men aged 13 or above. WAG modeling revealed topic clusters for each subset. Between the female and male subsets, topic clusters overlapped on dysmenorrhea symptoms and management. Among female queries, there were distinctive topics on approaching menstrual pain at school and menstrual pain-related conditions, while among male queries, there was a distinctive cluster of queries on menstrual pain from men's perspectives. Big Data mining of the ChaCha® question and answer service revealed a series of information needs among women and men on menstrual pain. Findings may be useful in structuring the content and informing the delivery platform for educational interventions.
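
    Word Adjacency Graph construction itself is simple to sketch: nodes are token words, edge weights count adjacency across queries, and dense neighborhoods then suggest clusters. The queries and stop-word list below are invented; the real study mined 1.9 billion queries.

    ```python
    from collections import Counter
    from itertools import pairwise   # Python 3.10+

    queries = ["why do i get cramps before my period",
               "heating pad for period cramps",
               "cramps but no period"]
    stop_words = {"why", "do", "i", "get", "before", "my", "for", "but", "no"}

    # Edges connect adjacent token words; weights count co-occurrence.
    edges = Counter()
    for q in queries:
        tokens = [w for w in q.split() if w not in stop_words]
        for a, b in pairwise(tokens):
            edges[tuple(sorted((a, b)))] += 1

    for (a, b), weight in edges.most_common(3):
        print(f"{a} -- {b}: {weight}")
    # Densely connected neighborhoods in this graph suggest topic clusters.
    ```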

  7. Everything you ever wanted to know about GRM* (*but were afraid to ask)

    Treesearch

    Jeffery A. Turner

    2015-01-01

    Querying the Forest Inventory and Analysis Database (FIADB) for growth, removals, and mortality (GRM) estimates can certainly be a conundrum. Providing the flexibility necessary to produce a wide array of GRM estimates has the unfortunate side effect of added complexity. This presentation seeks to answer some recurring questions related to GRM and how our new system...

  8. PACS and electronic health records

    NASA Astrophysics Data System (ADS)

    Cohen, Simona; Gilboa, Flora; Shani, Uri

    2002-05-01

    Electronic Health Record (EHR) is a major component of the health informatics domain. An important part of the EHR is the medical images obtained over a patient's lifetime and stored in diverse PACS. The vision presented in this paper is that future medical information systems will convert data from various medical sources -- including diverse modalities, PACS, HIS, CIS, RIS, and proprietary systems -- to HL7 standard XML documents. Then, the various documents are indexed and compiled to EHRs, upon which complex queries can be posed. We describe the conversion of data retrieved from PACS systems through DICOM to HL7 standard XML documents. This enables the EHR system to answer queries such as 'Get all chest images of patients at the age of 20-30, that have blood type 'A' and are allergic to pine trees', which a single PACS cannot answer. The integration of data from multiple sources makes our approach capable of delivering such answers. It enables the correlation of medical, demographic, clinical, and even genetic information. In addition, by fully indexing all the tagged data in DICOM objects, it becomes possible to offer access to huge amounts of valuable data, which can be better exploited in the specific radiology domain.

  9. Optimizing a Query by Transformation and Expansion.

    PubMed

    Glocker, Katrin; Knurr, Alexander; Dieter, Julia; Dominick, Friederike; Forche, Melanie; Koch, Christian; Pascoe Pérez, Analie; Roth, Benjamin; Ückert, Frank

    2017-01-01

    In the biomedical sector, not only is the amount of information produced and uploaded to the web enormous, but so is the number of sources where these data can be found. Clinicians and researchers spend huge amounts of time trying to access this information and to filter out the most important answers to a given question. As the formulation of these queries is crucial, automated query expansion is an effective tool for optimizing a query and receiving the best possible results. In this paper we introduce the concept of a workflow for optimizing queries in the medical and biological sector using a series of tools for expansion and transformation of the query. After the definition of attributes by the user, the query string is compared to previous queries in order to add semantically co-occurring terms to the query. Additionally, the query is enlarged by the inclusion of synonyms. Translation into database-specific ontologies ensures the optimal query formulation for the chosen database(s). As this process can be performed on various databases at once, the results are ranked and normalized in order to achieve a comparable list of answers for a question.
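
    A minimal sketch of the expansion step with an invented synonym table and co-occurrence history; a production workflow would draw these from terminologies and prior query logs and then translate the result into each database's ontology.

    ```python
    # Invented synonym and co-occurrence tables standing in for semantic
    # resources (e.g., MeSH) and mined query history.
    synonyms = {"heart attack": ["myocardial infarction", "MI"]}
    co_occurring = {"myocardial infarction": ["troponin"]}

    def expand(query):
        terms = [query]
        terms += synonyms.get(query, [])          # synonym expansion
        for t in list(terms):
            terms += co_occurring.get(t, [])      # co-occurring terms
        # Deduplicate while preserving order, then build a Boolean query.
        return " OR ".join(f'"{t}"' for t in dict.fromkeys(terms))

    print(expand("heart attack"))
    # "heart attack" OR "myocardial infarction" OR "MI" OR "troponin"
    ```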

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Critchlow, Terence J.; Abdulla, Ghaleb; Becla, Jacek

    Data management is the organization of information to support efficient access and analysis. For data intensive computing applications, the speed at which relevant data can be accessed is a limiting factor in terms of the size and complexity of computation that can be performed. Data access speed is impacted by the size of the relevant subset of the data, the complexity of the query used to define it, and the layout of the data relative to the query. As the underlying data sets become increasingly complex, the questions asked of them become more involved as well. For example, geospatial data associated with a city is no longer limited to the map data representing its streets, but now also includes layers identifying utility lines, key points, locations and types of businesses within the city limits, tax information for each land parcel, satellite imagery, and possibly even street-level views. As a result, queries have gone from simple questions, such as "how long is Main Street?", to much more complex questions such as "taking all other factors into consideration, are the property values of houses near parks higher than those under power lines, and if so, by what percentage?". Answering these questions requires a coherent infrastructure, integrating the relevant data into a format optimized for the questions being asked.

  11. Monotonically improving approximate answers to relational algebra queries

    NASA Technical Reports Server (NTRS)

    Smith, Kenneth P.; Liu, J. W. S.

    1989-01-01

    We present here a query processing method that produces approximate answers to queries posed in standard relational algebra. This method is monotone in the sense that the accuracy of the approximate result improves with the amount of time spent producing the result. This strategy enables us to trade the time to produce the result for the accuracy of the result. An approximate relational model that characterizes approximate relations and a partial order for comparing them are developed. Relational operators which operate on and return approximate relations are defined.

  12. An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence

    PubMed Central

    Sahoo, Satya S.; Bodenreider, Olivier; Rutter, Joni L.; Skinner, Karen J.; Sheth, Amit P.

    2008-01-01

    Objectives This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. Methods We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries. Results Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins. Conclusion Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces. Resource page http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/ PMID:18395495

  13. An ontology-driven semantic mashup of gene and biological pathway information: application to the domain of nicotine dependence.

    PubMed

    Sahoo, Satya S; Bodenreider, Olivier; Rutter, Joni L; Skinner, Karen J; Sheth, Amit P

    2008-10-01

    This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries. Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins. Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces. RESOURCE PAGE: http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/
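
    The hub-gene query has a simple SPARQL shape: group gene products by gene and keep genes whose products participate in many pathways. The prefix and property names below are placeholders, not the actual EKoM/BioPAX terms.

    ```python
    # Placeholder vocabulary; runnable with rdflib via Graph().query(...)
    # after loading an integrated knowledge base.
    hub_gene_query = """
    PREFIX ex: <http://example.org/bio#>
    SELECT ?gene (COUNT(DISTINCT ?pathway) AS ?n)
    WHERE {
      ?gene     ex:encodes        ?product .
      ?product  ex:participatesIn ?pathway .
    }
    GROUP BY ?gene
    HAVING (COUNT(DISTINCT ?pathway) > 10)
    ORDER BY DESC(?n)
    """
    ```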

  14. Cooperative answers in database systems

    NASA Technical Reports Server (NTRS)

    Gaasterland, Terry; Godfrey, Parke; Minker, Jack; Novik, Lev

    1993-01-01

    A major concern of researchers who seek to improve human-computer communication involves how to move beyond literal interpretations of queries to a level of responsiveness that takes the user's misconceptions, expectations, desires, and interests into consideration. At Maryland, we are investigating how to better meet a user's needs within the framework of the cooperative answering system of Gal and Minker. We have been exploring how to use semantic information about the database to formulate coherent and informative answers. The work has two main thrusts: (1) the construction of a logic formula which embodies the content of a cooperative answer; and (2) the presentation of the logic formula to the user in a natural language form. The information that is available in a deductive database system for building cooperative answers includes integrity constraints, user constraints, the search tree for answers to the query, and false presuppositions that are present in the query. The basic cooperative answering theory of Gal and Minker forms the foundation of a cooperative answering system that integrates the new construction and presentation methods. This paper provides an overview of the cooperative answering strategies used in the CARMIN cooperative answering system, an ongoing research effort at Maryland. Section 2 gives some useful background definitions. Section 3 describes techniques for collecting cooperative logical formulae. Section 4 discusses which natural language generation techniques are useful for presenting the logic formula in natural language text. Section 5 presents a diagram of the system.

  15. A weight based genetic algorithm for selecting views

    NASA Astrophysics Data System (ADS)

    Talebian, Seyed H.; Kareem, Sameem A.

    2013-03-01

    A data warehouse is a technology designed to support decision making. A data warehouse is built by extracting large amounts of data from different operational systems, transforming them into a consistent form, and loading them into the central repository. The type of queries in a data warehouse environment differs from those in operational systems. In contrast to operational systems, the analytical queries that are issued in data warehouses involve summarization of large volumes of data and therefore, under normal circumstances, take a long time to answer. On the other hand, these queries must be answered quickly to enable managers to make decisions in as short a time as possible. As a result, an essential need in this environment is improving the performance of queries. One of the most popular methods for this task is utilizing pre-computed query results. In this method, whenever a new query is submitted by the user, instead of computing the query on the fly over a large underlying database, the pre-computed results, or views, are used to answer it. Although the ideal option would be to pre-compute and save all possible views, in practice disk space constraints and the overhead of view updates make this infeasible. Therefore, we need to select a subset of possible views to save on disk. The problem of selecting the right subset of views is considered an important challenge in data warehousing. In this paper we suggest a Weight-Based Genetic Algorithm (WBGA) for solving the view selection problem with two objectives.
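
    A toy genetic algorithm for view selection: a bitstring marks which views to materialize, and fitness rewards query-time savings while rejecting selections over a disk budget. All costs, weights, and GA parameters are invented, and the paper's particular weighting scheme is not reproduced.

    ```python
    import random

    savings = [30, 12, 25, 8, 18]   # invented per-view query-time benefit
    size    = [40, 10, 35, 5, 20]   # invented per-view storage cost
    BUDGET  = 60                    # disk space constraint

    def fitness(bits):
        used = sum(s for s, b in zip(size, bits) if b)
        gain = sum(s for s, b in zip(savings, bits) if b)
        return gain if used <= BUDGET else -1   # infeasible selections lose

    def evolve(pop_size=20, generations=50):
        pop = [[random.randint(0, 1) for _ in savings] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            survivors = pop[:pop_size // 2]
            children = []
            while len(survivors) + len(children) < pop_size:
                a, b = random.sample(survivors, 2)
                cut = random.randrange(1, len(a))      # one-point crossover
                child = a[:cut] + b[cut:]
                if random.random() < 0.1:              # bit-flip mutation
                    i = random.randrange(len(child))
                    child[i] ^= 1
                children.append(child)
            pop = survivors + children
        return max(pop, key=fitness)

    print(evolve())   # e.g. [1, 1, 0, 1, 0]: gain 50 within the 60-unit budget
    ```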

  16. Using Bitmap Indexing Technology for Combined Numerical and TextQueries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stockinger, Kurt; Cieslewicz, John; Wu, Kesheng

    2006-10-16

    In this paper, we describe a strategy of using compressed bitmap indices to speed up queries on both numerical data and text documents. By using an efficient compression algorithm, these compressed bitmap indices are compact even for indices with millions of distinct terms. Moreover, bitmap indices can be used very efficiently to answer Boolean queries over text documents involving multiple query terms. Existing inverted indices for text searches are usually inefficient for corpora with a very large number of terms as well as for queries involving a large number of hits. We demonstrate that our compressed bitmap index technology overcomes both of those shortcomings. In a performance comparison against a commonly used database system, our indices answer queries 30 times faster on average. To provide full SQL support, we integrated our indexing software, called FastBit, with MonetDB. The integrated system MonetDB/FastBit provides not only efficient searches on a single table as FastBit does, but also answers join queries efficiently. Furthermore, MonetDB/FastBit also provides a very efficient retrieval mechanism of result records.

  17. FastBit Reference Manual

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Kesheng

    2007-08-02

    An index in a database system is a data structure that utilizes redundant information about the base data to speed up common searching and retrieval operations. Most commonly used indexes are variants of B-trees, such as B+-tree and B*-tree. FastBit implements a set of alternative indexes called compressed bitmap indexes. Compared with B-tree variants, these indexes provide very efficient searching and retrieval operations by sacrificing the efficiency of updating the indexes after the modification of an individual record. In addition to the well-known strengths of bitmap indexes, FastBit has a special strength stemming from the bitmap compression scheme used. The compression method is called the Word-Aligned Hybrid (WAH) code. It reduces the bitmap indexes to reasonable sizes and at the same time allows very efficient bitwise logical operations directly on the compressed bitmaps. Compared with the well-known compression methods such as LZ77 and Byte-aligned Bitmap code (BBC), WAH sacrifices some space efficiency for a significant improvement in operational efficiency. Since the bitwise logical operations are the most important operations needed to answer queries, using WAH compression has been shown to answer queries significantly faster than using other compression schemes. Theoretical analyses showed that WAH compressed bitmap indexes are optimal for one-dimensional range queries. Only the most efficient indexing schemes such as B+-tree and B*-tree have this optimality property. However, bitmap indexes are superior because they can efficiently answer multi-dimensional range queries by combining the answers to one-dimensional queries.
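
    A greatly simplified run-length flavor of the idea: bitmaps are stored as (bit, run-length) pairs, and a logical AND walks both run lists without decompressing. Real WAH aligns runs to 31-bit words and mixes literal and fill words; this sketch keeps only runs.

    ```python
    def compress(bits):
        # Run-length encode a bit list as (bit_value, run_length) pairs.
        runs, i = [], 0
        while i < len(bits):
            j = i
            while j < len(bits) and bits[j] == bits[i]:
                j += 1
            runs.append((bits[i], j - i))
            i = j
        return runs

    def run_and(a, b):
        # Bitwise AND of two run-length bitmaps, without decompressing.
        out, ra, rb = [], list(a), list(b)
        while ra and rb:
            (va, la), (vb, lb) = ra[0], rb[0]
            step = min(la, lb)
            out.append((va & vb, step))
            ra[0], rb[0] = (va, la - step), (vb, lb - step)
            if ra[0][1] == 0:
                ra.pop(0)
            if rb[0][1] == 0:
                rb.pop(0)
        return out

    x = compress([1] * 10 + [0] * 90)
    y = compress([0] * 5 + [1] * 95)
    print(run_and(x, y))   # [(0, 5), (1, 5), (0, 90)]
    ```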

  18. Executing SPARQL Queries over the Web of Linked Data

    NASA Astrophysics Data System (ADS)

    Hartig, Olaf; Bizer, Christian; Freytag, Johann-Christoph

    The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that is not addressed by traditional research on federated query processing. In this paper we present an approach to execute SPARQL queries over the Web of Linked Data. The main idea of our approach is to discover data that might be relevant for answering a query during the query execution itself. This discovery is driven by following RDF links between data sources based on URIs in the query and in partial results. The URIs are resolved over the HTTP protocol into RDF data which is continuously added to the queried dataset. This paper describes concepts and algorithms to implement our approach using an iterator-based pipeline. We introduce a formalization of the pipelining approach and show that classical iterators may cause blocking due to the latency of HTTP requests. To avoid blocking, we propose an extension of the iterator paradigm. The evaluation of our approach shows its strengths as well as the still existing challenges.
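
    A sketch of the link-traversal idea using rdflib as the RDF machinery: URIs are dereferenced over HTTP, the returned triples are added to the queried dataset, and newly seen URIs feed the next hop. The paper's iterator pipeline and its non-blocking extension are omitted, and the seed URI in the commented example is illustrative.

    ```python
    from rdflib import Graph, URIRef

    def traverse_and_query(seed_uris, sparql, hops=2):
        g = Graph()
        frontier = set(seed_uris)
        seen = set()
        for _ in range(hops):
            next_frontier = set()
            for uri in frontier:
                seen.add(uri)
                try:
                    g.parse(uri)          # HTTP GET with RDF content negotiation
                except Exception:
                    continue              # unreachable/non-RDF source: skip it
            # URIs discovered in the growing dataset feed the next hop.
            for _, _, o in g:
                if isinstance(o, URIRef) and str(o) not in seen:
                    next_frontier.add(str(o))
            frontier = next_frontier
        return g.query(sparql)

    # Illustrative only; performs real HTTP requests when uncommented:
    # for row in traverse_and_query(["http://dbpedia.org/resource/Berlin"],
    #                               "SELECT ?p ?o WHERE { ?s ?p ?o } LIMIT 5"):
    #     print(row)
    ```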

  19. Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce

    NASA Astrophysics Data System (ADS)

    Farhan Husain, Mohammad; Doshi, Pankil; Khan, Latifur; Thuraisingham, Bhavani

    Handling huge amounts of data scalably has been a matter of concern for a long time. The same is true for semantic web data. Current semantic web frameworks lack this ability. In this paper, we describe a framework that we built using Hadoop to store and retrieve large numbers of RDF triples. We describe our schema to store RDF data in the Hadoop Distributed File System. We also present our algorithms to answer a SPARQL query. We make use of Hadoop's MapReduce framework to actually answer the queries. Our results reveal that we can store huge amounts of semantic web data in Hadoop clusters built mostly from cheap commodity-class hardware and still answer queries fast enough. We conclude that ours is a scalable framework, able to handle large amounts of RDF data efficiently.
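
    A single-process imitation of the MapReduce evaluation of a two-pattern SPARQL query such as SELECT ?x WHERE { ?x :type :Person . ?x :knows :bob . }: mappers emit (join key, pattern id) pairs, the shuffle groups them by key, and the reducer keeps subjects matched by every pattern. The triples are invented and Hadoop is simulated with plain Python.

    ```python
    from collections import defaultdict

    triples = [("alice", ":type", ":Person"), ("alice", ":knows", ":bob"),
               ("carol", ":type", ":Person")]

    def mapper(triple):
        # Emit (join key, pattern id) for every pattern the triple matches.
        s, p, o = triple
        if (p, o) == (":type", ":Person"):
            yield s, "pattern1"
        if (p, o) == (":knows", ":bob"):
            yield s, "pattern2"

    # Shuffle phase: group mapper output by key, as the framework would.
    groups = defaultdict(set)
    for t in triples:
        for key, tag in mapper(t):
            groups[key].add(tag)

    # Reduce phase: a subject answers the query iff every pattern matched it.
    answers = [k for k, tags in groups.items()
               if tags == {"pattern1", "pattern2"}]
    print(answers)   # ['alice']
    ```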

  20. Using wireless handheld computers to seek information at the point of care: an evaluation by clinicians.

    PubMed

    Hauser, Susan E; Demner-Fushman, Dina; Jacobs, Joshua L; Humphrey, Susanne M; Ford, Glenn; Thoma, George R

    2007-01-01

    To evaluate: (1) the effectiveness of wireless handheld computers for online information retrieval in clinical settings; (2) the role of MEDLINE in answering clinical questions raised at the point of care. A prospective single-cohort study: accompanying medical teams on teaching rounds, five internal medicine residents used and evaluated MD on Tap, an application for handheld computers, to seek answers in real time to clinical questions arising at the point of care. All transactions were stored by an intermediate server. Evaluators recorded clinical scenarios and questions, identified MEDLINE citations that answered the questions, and submitted daily and summative reports of their experience. A senior medical librarian corroborated the relevance of the selected citation to each scenario and question. Evaluators answered 68% of 363 background and foreground clinical questions during rounding sessions using a variety of MD on Tap features in an average session length of less than four minutes. The evaluator, the number and quality of query terms, the total number of citations found for a query, and the use of auto-spellcheck significantly contributed to the probability of query success. Handheld computers with Internet access are useful tools for healthcare providers to access MEDLINE in real time. MEDLINE citations can answer specific clinical questions when several medical terms are used to form a query. The MD on Tap application is an effective interface to MEDLINE in clinical settings, allowing clinicians to quickly find relevant citations.

  1. Using Wireless Handheld Computers to Seek Information at the Point of Care: An Evaluation by Clinicians

    PubMed Central

    Hauser, Susan E.; Demner-Fushman, Dina; Jacobs, Joshua L.; Humphrey, Susanne M.; Ford, Glenn; Thoma, George R.

    2007-01-01

    Objective To evaluate: (1) the effectiveness of wireless handheld computers for online information retrieval in clinical settings; (2) the role of MEDLINE® in answering clinical questions raised at the point of care. Design A prospective single-cohort study: accompanying medical teams on teaching rounds, five internal medicine residents used and evaluated MD on Tap, an application for handheld computers, to seek answers in real time to clinical questions arising at the point of care. Measurements All transactions were stored by an intermediate server. Evaluators recorded clinical scenarios and questions, identified MEDLINE citations that answered the questions, and submitted daily and summative reports of their experience. A senior medical librarian corroborated the relevance of the selected citation to each scenario and question. Results Evaluators answered 68% of 363 background and foreground clinical questions during rounding sessions using a variety of MD on Tap features in an average session length of less than four minutes. The evaluator, the number and quality of query terms, the total number of citations found for a query, and the use of auto-spellcheck significantly contributed to the probability of query success. Conclusion Handheld computers with Internet access are useful tools for healthcare providers to access MEDLINE in real time. MEDLINE citations can answer specific clinical questions when several medical terms are used to form a query. The MD on Tap application is an effective interface to MEDLINE in clinical settings, allowing clinicians to quickly find relevant citations. PMID:17712085

  2. Using the Weighted Keyword Model to Improve Information Retrieval for Answering Biomedical Questions

    PubMed Central

    Yu, Hong; Cao, Yong-gang

    2009-01-01

    Physicians ask many complex questions during the patient encounter. Information retrieval systems that can provide immediate and relevant answers to these questions can be invaluable aids to the practice of evidence-based medicine. In this study, we first automatically identify topic keywords from ad hoc clinical questions with a Conditional Random Field model that is trained over thousands of manually annotated clinical questions. We then report on a linear model that assigns query weights based on their automatically identified semantic roles: topic keywords, domain-specific terms, and their synonyms. Our evaluation shows that this weighted keyword model improves information retrieval from the Text Retrieval Conference Genomics track data. PMID:21347188
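    A hedged sketch of the weighting idea (the role labels and numeric weights are illustrative assumptions, not the coefficients learned in the paper):

        ROLE_WEIGHTS = {"topic": 2.0, "domain": 1.5, "synonym": 0.5}

        def score(document_tokens, weighted_query):
            """Score a document by summing role-based weights of matching terms.
            weighted_query: (term, role) pairs, roles as a CRF tagger might assign."""
            return sum(ROLE_WEIGHTS.get(role, 1.0)  # unknown roles get weight 1
                       for term, role in weighted_query
                       if term in document_tokens)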

  3. Using the weighted keyword model to improve information retrieval for answering biomedical questions.

    PubMed

    Yu, Hong; Cao, Yong-Gang

    2009-03-01

    Physicians ask many complex questions during the patient encounter. Information retrieval systems that can provide immediate and relevant answers to these questions can be invaluable aids to the practice of evidence-based medicine. In this study, we first automatically identify topic keywords from ad hoc clinical questions with a Conditional Random Field model that is trained over thousands of manually annotated clinical questions. We then report on a linear model that assigns query weights based on their automatically identified semantic roles: topic keywords, domain-specific terms, and their synonyms. Our evaluation shows that this weighted keyword model improves information retrieval from the Text Retrieval Conference Genomics track data.

  4. Improving Web Search for Difficult Queries

    ERIC Educational Resources Information Center

    Wang, Xuanhui

    2009-01-01

    Search engines have now become essential tools in all aspects of our life. Although a variety of information needs can be served very successfully, there are still a lot of queries that search engines cannot answer very effectively, and these queries always make users feel frustrated. Since it is quite often that users encounter such "difficult…

  5. Added Value of Selected Images Embedded Into Radiology Reports to Referring Clinicians

    PubMed Central

    Iyer, Veena R.; Hahn, Peter F.; Blaszkowsky, Lawrence S.; Thayer, Sarah P.; Halpern, Elkan F.; Harisinghani, Mukesh G.

    2011-01-01

    Purpose The aim of this study was to evaluate the added utility of embedding images for findings described in radiology text reports to referring clinicians. Methods Thirty-five cases referred for abdominal CT scans in 2007 and 2008 were included. Referring physicians were asked to view text-only reports, followed by the same reports with pertinent images embedded. For each pair of reports, a questionnaire was administered. A 5-point, Likert-type scale was used to assess if the clinical query was satisfactorily answered by the text-only report. A “yes-or-no” question was used to assess whether the report with images answered the clinical query better; a positive answer to this question generated “yes-or-no” queries to examine whether the report with images helped in making a more confident decision on management, whether it reduced time spent in forming the plan, and whether it altered management. The questionnaire asked whether a radiologist would be contacted with queries on reading the text-only report and the report with images. Results In 32 of 35 cases, the text-only reports satisfactorily answered the clinical queries. In these 32 cases, the reports with attached images helped in making more confident management decisions and reduced time in planning management. Attached images altered management in 2 cases. Radiologists would have been consulted for clarifications in 21 and 10 cases on reading the text-only reports and the reports with embedded images, respectively. Conclusions Providing relevant images with reports saves time, increases physicians' confidence in deciding treatment plans, and can alter management. PMID:20193926

  6. An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring.

    PubMed

    Alirezaie, Marjan; Kiselev, Andrey; Längkvist, Martin; Klügl, Franziska; Loutfi, Amy

    2017-11-05

    This paper presents a framework in which satellite images are classified and augmented with additional semantic information to enable queries about what can be found on the map at a particular location, but also about paths that can be taken. This is achieved by a reasoning framework, based on qualitative spatial reasoning, that is able to find answers to high-level queries that may vary depending on the current situation. This framework, called SemCityMap, provides the full pipeline from enriching the raw image data with rudimentary labels, to the integration of knowledge representation and reasoning methods, to user interfaces for high-level querying. To illustrate the utility of SemCityMap in a disaster scenario, we use an urban environment (central Stockholm) in combination with a flood simulation. We show that the system provides useful answers to high-level queries, also with respect to the current flood status. Examples of such queries concern path planning for vehicles or retrieval of safe regions such as "find all regions close to schools and far from the flooded area". The particular advantage of our approach lies in the fact that ontological information and reasoning are explicitly integrated, so that queries can be formulated in a natural way using concepts at an appropriate level of abstraction, including additional constraints.
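    A toy quantitative rendering of such a query (centroids and metre thresholds are assumptions; SemCityMap itself answers this through qualitative spatial reasoning over the ontology rather than raw distances):

        import math

        def safe_regions(regions, schools, flooded, near=500.0, far=2000.0):
            """Regions close to at least one school and far from every flooded
            cell; all inputs are (x, y) centroids in metres."""
            def dist(a, b):
                return math.hypot(a[0] - b[0], a[1] - b[1])
            return [r for r in regions
                    if any(dist(r, s) <= near for s in schools)
                    and all(dist(r, f) >= far for f in flooded)]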

  7. An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring

    PubMed Central

    Alirezaie, Marjan; Klügl, Franziska; Loutfi, Amy

    2017-01-01

    This paper presents a framework in which satellite images are classified and augmented with additional semantic information to enable queries about what can be found on the map at a particular location, but also about paths that can be taken. This is achieved by a reasoning framework, based on qualitative spatial reasoning, that is able to find answers to high-level queries that may vary depending on the current situation. This framework, called SemCityMap, provides the full pipeline from enriching the raw image data with rudimentary labels, to the integration of knowledge representation and reasoning methods, to user interfaces for high-level querying. To illustrate the utility of SemCityMap in a disaster scenario, we use an urban environment (central Stockholm) in combination with a flood simulation. We show that the system provides useful answers to high-level queries, also with respect to the current flood status. Examples of such queries concern path planning for vehicles or retrieval of safe regions such as “find all regions close to schools and far from the flooded area”. The particular advantage of our approach lies in the fact that ontological information and reasoning are explicitly integrated, so that queries can be formulated in a natural way using concepts at an appropriate level of abstraction, including additional constraints. PMID:29113073

  8. Sensitivity and Predictive Value of 15 PubMed Search Strategies to Answer Clinical Questions Rated Against Full Systematic Reviews

    PubMed Central

    Merglen, Arnaud; Courvoisier, Delphine S; Combescure, Christophe; Garin, Nicolas; Perrier, Arnaud; Perneger, Thomas V

    2012-01-01

    Background Clinicians perform searches in PubMed daily, but retrieving relevant studies is challenging due to the rapid expansion of medical knowledge. Little is known about the performance of search strategies when they are applied to answer specific clinical questions. Objective To compare the performance of 15 PubMed search strategies in retrieving relevant clinical trials on therapeutic interventions. Methods We used Cochrane systematic reviews to identify relevant trials for 30 clinical questions. Search terms were extracted from the abstract using a predefined procedure based on the population, interventions, comparison, outcomes (PICO) framework and combined into queries. We tested 15 search strategies that varied in their query (PIC or PICO), use of PubMed’s Clinical Queries therapeutic filters (broad or narrow), search limits, and PubMed links to related articles. We assessed sensitivity (recall) and positive predictive value (precision) of each strategy on the first 2 PubMed pages (40 articles) and on the complete search output. Results The performance of the search strategies varied widely according to the clinical question. Unfiltered searches and those using the broad filter of Clinical Queries produced large outputs and retrieved few relevant articles within the first 2 pages, resulting in a median sensitivity of only 10%–25%. In contrast, all searches using the narrow filter performed significantly better, with a median sensitivity of about 50% (all P < .001 compared with unfiltered queries) and positive predictive values of 20%–30% (P < .001 compared with unfiltered queries). This benefit was consistent for most clinical questions. Searches based on related articles retrieved about a third of the relevant studies. Conclusions The Clinical Queries narrow filter, along with well-formulated queries based on the PICO framework, provided the greatest aid in retrieving relevant clinical trials within the first 2 PubMed pages. These results can help clinicians apply effective strategies to answer their questions at the point of care. PMID:22693047

  9. Sensitivity and predictive value of 15 PubMed search strategies to answer clinical questions rated against full systematic reviews.

    PubMed

    Agoritsas, Thomas; Merglen, Arnaud; Courvoisier, Delphine S; Combescure, Christophe; Garin, Nicolas; Perrier, Arnaud; Perneger, Thomas V

    2012-06-12

    Clinicians perform searches in PubMed daily, but retrieving relevant studies is challenging due to the rapid expansion of medical knowledge. Little is known about the performance of search strategies when they are applied to answer specific clinical questions. To compare the performance of 15 PubMed search strategies in retrieving relevant clinical trials on therapeutic interventions. We used Cochrane systematic reviews to identify relevant trials for 30 clinical questions. Search terms were extracted from the abstract using a predefined procedure based on the population, interventions, comparison, outcomes (PICO) framework and combined into queries. We tested 15 search strategies that varied in their query (PIC or PICO), use of PubMed's Clinical Queries therapeutic filters (broad or narrow), search limits, and PubMed links to related articles. We assessed sensitivity (recall) and positive predictive value (precision) of each strategy on the first 2 PubMed pages (40 articles) and on the complete search output. The performance of the search strategies varied widely according to the clinical question. Unfiltered searches and those using the broad filter of Clinical Queries produced large outputs and retrieved few relevant articles within the first 2 pages, resulting in a median sensitivity of only 10%-25%. In contrast, all searches using the narrow filter performed significantly better, with a median sensitivity of about 50% (all P < .001 compared with unfiltered queries) and positive predictive values of 20%-30% (P < .001 compared with unfiltered queries). This benefit was consistent for most clinical questions. Searches based on related articles retrieved about a third of the relevant studies. The Clinical Queries narrow filter, along with well-formulated queries based on the PICO framework, provided the greatest aid in retrieving relevant clinical trials within the first 2 PubMed pages. These results can help clinicians apply effective strategies to answer their questions at the point of care.

  10. Benchmarking distributed data warehouse solutions for storing genomic variant information

    PubMed Central

    Wiewiórka, Marek S.; Wysakowicz, Dawid P.; Okoniewski, Michał J.

    2017-01-01

    Genomic-based personalized medicine encompasses storing, analysing and interpreting genomic variants as its central issues. At a time when thousands of patients' sequenced exomes and genomes are becoming available, there is a growing need for efficient database storage and querying. The answer could be the application of modern distributed storage systems and query engines, yet their application to large genomic variant databases has not been sufficiently explored in the literature so far. To investigate the effectiveness of modern columnar storage [column-oriented Database Management System (DBMS)] and query engines, we have developed a prototypic genomic variant data warehouse, populated with large generated content of genomic variants and phenotypic data. Next, we have benchmarked the performance of a number of combinations of distributed storages and query engines on a set of SQL queries that address biological questions essential for both research and medical applications. In addition, a non-distributed, analytical database (MonetDB) has been used as a baseline. Comparison of query execution times confirms that distributed data warehousing solutions outperform classic relational DBMSs. Moreover, pre-aggregation and further denormalization of data, which reduce the number of distributed join operations, significantly improve query performance by several orders of magnitude. Most of the distributed back-ends offer good performance for complex analytical queries, while the Optimized Row Columnar (ORC) format paired with Presto and Parquet paired with Spark 2 provide, on average, the lowest execution times. Apache Kudu, on the other hand, is the only solution that guarantees sub-second performance for simple genome range queries returning a small subset of data, where a low-latency response is expected, while still offering decent performance for analytical queries. In summary, research and clinical applications that require the storage and analysis of variants from thousands of samples can benefit from the scalability and performance of distributed data warehouse solutions. Database URL: https://github.com/ZSI-Bio/variantsdwh PMID:29220442
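    An example of the simple genome-range-query class mentioned above (the table and column names are assumptions, not the benchmark's actual schema; the parameter style varies by driver):

        # Runnable with any DB-API 2.0 connection, e.g. a MonetDB or Presto driver.
        RANGE_QUERY = """
            SELECT sample_id, chromosome, position, ref_allele, alt_allele
            FROM variants
            WHERE chromosome = ?
              AND position BETWEEN ? AND ?
        """

        params = ("chr17", 41196312, 41277500)
        # cursor.execute(RANGE_QUERY, params)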

  11. Improving average ranking precision in user searches for biomedical research datasets

    PubMed Central

    Gobeill, Julien; Gaudinat, Arnaud; Vachon, Thérèse; Ruch, Patrick

    2017-01-01

    Availability of research datasets is a keystone of health and life science study reproducibility and scientific progress. Due to the heterogeneity and complexity of these data, a main challenge to be overcome by research data management systems is to provide users with the best answers for their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorization method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus with 800k datasets and 21 annotated user queries, and provided competitive results when compared to the other challenge participants. In the official run, it achieved the highest infAP, +22.3% higher than the median infAP of the participants' best submissions. Overall, it is ranked in the top 2 if an aggregated metric using the best official measures per participant is considered. The query expansion method showed a positive impact on the system's performance, increasing our baseline by up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively. The similarity measure algorithm showed robust performance in different training conditions, with small performance variations compared to the Divergence from Randomness framework. Finally, the result categorization did not have a significant impact on the system's performance. We believe that our solution could be used to enhance biomedical dataset management systems. The use of data-driven expansion methods, such as those based on word embeddings, could be an alternative to the complexity of biomedical terminologies. Nevertheless, due to the limited size of the assessment set, further experiments need to be performed to draw conclusive results. Database URL: https://biocaddie.org/benchmark-data PMID:29220475
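    A minimal sketch of embedding-based query expansion (assuming `vectors` maps terms to numpy arrays, e.g. from a pre-trained word2vec model; the paper's actual expansion model may differ in detail):

        import numpy as np

        def expand(query_terms, vectors, k=3):
            """Append the k nearest neighbours (cosine similarity) of each term."""
            expanded = list(query_terms)
            for term in query_terms:
                if term not in vectors:
                    continue
                q = vectors[term]
                sims = {w: float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v))
                        for w, v in vectors.items() if w != term}
                expanded.extend(sorted(sims, key=sims.get, reverse=True)[:k])
            return expanded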

  12. Explorative search of distributed bio-data to answer complex biomedical questions

    PubMed Central

    2014-01-01

    Background The huge amount of biomedical-molecular data increasingly produced is providing scientists with potentially valuable information. Yet, such data quantity makes it difficult to find and extract those data that are most reliable and most related to the biomedical questions to be answered, which are increasingly complex and often involve many different biomedical-molecular aspects. Such questions can be addressed only by comprehensively searching and exploring different types of data, which frequently are ordered and provided by different data sources. Search Computing has been proposed for the management and integration of ranked results from heterogeneous search services. Here, we present its novel application to the explorative search of distributed biomedical-molecular data and the integration of the search results to answer complex biomedical questions. Results A set of available bioinformatics search services has been modelled and registered in the Search Computing framework, and a Bioinformatics Search Computing application (Bio-SeCo) using such services has been created and made publicly available at http://www.bioinformatics.deib.polimi.it/bio-seco/seco/. It offers an integrated environment which eases search, exploration and ranking-aware combination of heterogeneous data provided by the available registered services, and supplies global results that can support answering complex multi-topic biomedical questions. Conclusions By using Bio-SeCo, scientists can explore the very large and very heterogeneous biomedical-molecular data available. They can easily make different explorative search attempts, inspect obtained results, select the most appropriate, expand or refine them, and move forward and backward in the construction of a global complex biomedical query on multiple distributed sources that could eventually find the most relevant results. Thus, it provides extremely useful automated support for exploratory integrated bio search, which is fundamental for Life Science data-driven knowledge discovery. PMID:24564278

  13. DOGMA: A Disk-Oriented Graph Matching Algorithm for RDF Databases

    NASA Astrophysics Data System (ADS)

    Bröcheler, Matthias; Pugliese, Andrea; Subrahmanian, V. S.

    RDF is an increasingly important paradigm for the representation of information on the Web. As RDF databases increase in size to approach tens of millions of triples, and as sophisticated graph matching queries expressible in languages like SPARQL become increasingly important, scalability becomes an issue. To date, there is no graph-based indexing method for RDF data where the index was designed in a way that makes it disk-resident. There is therefore a growing need for indexes that can operate efficiently when the index itself resides on disk. In this paper, we first propose the DOGMA index for fast subgraph matching on disk and then develop a basic algorithm to answer queries over this index. This algorithm is then significantly sped up via an optimized algorithm that uses efficient (but correct) pruning strategies when combined with two different extensions of the index. We have implemented a preliminary system and tested it against four existing RDF database systems developed by others. Our experiments show that our algorithm performs very well compared to these systems, with orders of magnitude improvements for complex graph queries.

  14. Toward a Data Scalable Solution for Facilitating Discovery of Science Resources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weaver, Jesse R.; Castellana, Vito G.; Morari, Alessandro

    Science is increasingly motivated by the need to process larger quantities of data. It is facing severe challenges in data collection, management, and processing, so much so that the computational demands of “data scaling” are competing with, and in many fields surpassing, the traditional objective of decreasing processing time. Example domains with large datasets include astronomy, biology, genomics, climate/weather, and material sciences. This paper presents a real-world use case in which we wish to answer queries provided by domain scientists in order to facilitate discovery of relevant science resources. The problem is that the metadata for these science resources is very large and is growing quickly, rapidly increasing the need for a data scaling solution. We propose a system – SGEM – designed for answering graph-based queries over large datasets on cluster architectures, and we report performance results for queries on the current RDESC dataset of nearly 1.4 billion triples, and on the well-known BSBM SPARQL query benchmark.

  15. Finding the Patient's Voice Using Big Data: Analysis of Users' Health-Related Concerns in the ChaCha Question-and-Answer Service (2009-2012).

    PubMed

    Priest, Chad; Knopf, Amelia; Groves, Doyle; Carpenter, Janet S; Furrey, Christopher; Krishnan, Anand; Miller, Wendy R; Otte, Julie L; Palakal, Mathew; Wiehe, Sarah; Wilson, Jeffrey

    2016-03-09

    The development of effective health care and public health interventions requires a comprehensive understanding of the perceptions, concerns, and stated needs of health care consumers and the public at large. Big datasets from social media and question-and-answer services provide insight into the public's health concerns and priorities without the financial, temporal, and spatial encumbrances of more traditional community-engagement methods and may prove a useful starting point for public-engagement health research (infodemiology). The objective of our study was to describe user characteristics and health-related queries of the ChaCha question-and-answer platform, and discuss how these data may be used to better understand the perceptions, concerns, and stated needs of health care consumers and the public at large. We conducted a retrospective automated textual analysis of anonymous user-generated queries submitted to ChaCha between January 2009 and November 2012. A total of 2.004 billion queries were read, of which 3.50% (70,083,796/2,004,243,249) were missing 1 or more data fields, leaving 1.934 billion complete lines of data for these analyses. Males and females submitted roughly equal numbers of health queries, but content differed by sex. Questions from females predominantly focused on pregnancy, menstruation, and vaginal health. Questions from males predominantly focused on body image, drug use, and sexuality. Adolescents aged 12-19 years submitted more queries than any other age group. Their queries were largely centered on sexual and reproductive health, and pregnancy in particular. The private nature of the ChaCha service provided a perfect environment for maximum frankness among users, especially among adolescents posing sensitive health questions. Adolescents' sexual health queries reveal knowledge gaps with serious, lifelong consequences. The nature of questions to the service provides opportunities for rapid understanding of health concerns and may lead to development of more effective tailored interventions.

  16. An Expressive and Efficient Language for XML Information Retrieval.

    ERIC Educational Resources Information Center

    Chinenyanga, Taurai Tapiwa; Kushmerick, Nicholas

    2002-01-01

    Discusses XML and information retrieval and describes a query language, ELIXIR (expressive and efficient language for XML information retrieval), with a textual similarity operator that can be used for similarity joins. Explains the algorithm for answering ELIXIR queries to generate intermediate relational data. (Author/LRW)

  17. The Use of Dynamic Segment Scoring for Language-Independent Question Answering

    DTIC Science & Technology

    2001-01-01

    …initial window with one sentence is compared to scores corre-… his/PRONOUN brother/CONSANGUINITY like/SIMILARITY his/PRONOUN call/NOMENCLATURE he/PRONOUN …the query processing module. Using the differences between index numbers to specify physical distance relationships among query keywords, we can…

  18. Privacy-Preserving Location-Based Services

    ERIC Educational Resources Information Center

    Chow, Chi Yin

    2010-01-01

    Location-based services (LBS for short) providers require users' current locations to answer their location-based queries, e.g., range and nearest-neighbor queries. Revealing personal location information to potentially untrusted service providers could create privacy risks for users. To this end, our objective is to design a privacy-preserving…

  19. Ad-Hoc Queries over Document Collections - A Case Study

    NASA Astrophysics Data System (ADS)

    Löser, Alexander; Lutter, Steffen; Düssel, Patrick; Markl, Volker

    We discuss the novel problem of supporting analytical business intelligence queries over web-based textual content, e.g., BI-style reports based on 100,000s of documents from an ad-hoc web search result. Neither conventional search engines nor conventional Business Intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. "Google Squared" and our system GOOLAP.info are examples of these kinds of systems. They execute information extraction methods over one or several document collections at query time and integrate extracted records into a common view or tabular structure. Frequent extraction and object resolution failures cause incomplete records that cannot be joined into a record answering the query. Our focus is the identification of join-reordering heuristics that maximize the size of complete records answering a structured query. With respect to the given costs of document extraction, we propose two novel join operations: the multi-way CJ operator joins records from multiple relationships extracted from a single document, while the two-way join operator DJ ensures data density by removing incomplete records from results. In a preliminary case study we observe that our join-reordering heuristics positively impact result size and record density and lower execution costs.

  20. Small sum privacy and large sum utility in data publishing.

    PubMed

    Fu, Ada Wai-Chee; Wang, Ke; Wong, Raymond Chi-Wing; Wang, Jia; Jiang, Minhao

    2014-08-01

    While the study of privacy preserving data publishing has drawn a lot of interest, some recent work has shown that existing mechanisms do not limit all inferences about individuals. This paper is a positive note in response to this finding. We point out that not all inference attacks should be countered, in contrast to all existing works known to us, and based on this we propose a model called SPLU. This model protects sensitive information, by which we refer to answers for aggregate queries with small sums, while queries with large sums are answered with higher accuracy. Using SPLU, we introduce a sanitization algorithm to protect data while maintaining high data utility for queries with large sums. Empirical results show that our method behaves as desired. Copyright © 2014 Elsevier Inc. All rights reserved.

  1. CSRQ: Communication-Efficient Secure Range Queries in Two-Tiered Sensor Networks

    PubMed Central

    Dai, Hua; Ye, Qingqun; Yang, Geng; Xu, Jia; He, Ruiliang

    2016-01-01

    In recent years, we have seen many applications of secure query in two-tiered wireless sensor networks. Storage nodes are responsible for storing data from nearby sensor nodes and answering queries from Sink. It is critical to protect data security from a compromised storage node. In this paper, the Communication-efficient Secure Range Query (CSRQ)—a privacy and integrity preserving range query protocol—is proposed to prevent attackers from gaining information of both data collected by sensor nodes and queries issued by Sink. To preserve privacy and integrity, in addition to employing the encoding mechanisms, a novel data structure called encrypted constraint chain is proposed, which embeds the information of integrity verification. Sink can use this encrypted constraint chain to verify the query result. The performance evaluation shows that CSRQ has lower communication cost than the current range query protocols. PMID:26907293

  2. SPARQLGraph: a web-based platform for graphically querying biological Semantic Web databases.

    PubMed

    Schweiger, Dominik; Trajanoski, Zlatko; Pabinger, Stephan

    2014-08-15

    Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way. SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the just recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers. This new graphical way of creating queries for biological Semantic Web databases considerably facilitates usability as it removes the requirement of knowing specific query languages and database structures. The system is freely available at http://sparqlgraph.i-med.ac.at.

  3. Audiotex Information Systems: Answering Consumer Queries Electronically. TDC Research Report No. 5.

    ERIC Educational Resources Information Center

    Conlan, Sharon; And Others

    A 14-month pilot of INFO-U, a fully automated telephone information service, assessed the feasibility of the technology in Minnesota Extension Service (MES) county offices to respond to consumer telephone queries. The project was designed to: (1) explore the potential of regional Extension cooperation and resource sharing; (2) increase recognition…

  4. Using ontology databases for scalable query answering, inconsistency detection, and data integration

    PubMed Central

    Dou, Dejing

    2011-01-01

    An ontology database is a basic relational database management system that models an ontology plus its instances. To reason over the transitive closure of instances in the subsumption hierarchy, for example, an ontology database can either unfold views at query time or propagate assertions using triggers at load time. In this paper, we use existing benchmarks to evaluate our method—using triggers—and we demonstrate that by forward computing inferences, we not only improve query time, but the improvement appears to cost only more space (not time). However, we go on to show that the true penalties were simply opaque to the benchmark, i.e., the benchmark inadequately captures load-time costs. We have applied our methods to two case studies in biomedicine, using ontologies and data from genetics and neuroscience to illustrate two important applications: first, ontology databases answer ontology-based queries effectively; second, using triggers, ontology databases detect instance-based inconsistencies—something not possible using views. Finally, we demonstrate how to extend our methods to perform data integration across multiple, distributed ontology databases. PMID:22163378
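    A small Python rendering of the trigger-like, load-time alternative: materialize inherited class memberships when an instance is asserted, so queries over the transitive closure need no view unfolding (the data structures are assumptions for illustration):

        def assert_instance(store, instance, cls, superclasses):
            """Insert (instance, cls) and forward-compute inherited memberships.
            store: set of (instance, class) facts; superclasses: class -> parents."""
            stack = [cls]
            while stack:
                c = stack.pop()
                if (instance, c) not in store:
                    store.add((instance, c))               # materialize the inference
                    stack.extend(superclasses.get(c, []))  # walk up the hierarchy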

  5. SPARQLog: SPARQL with Rules and Quantification

    NASA Astrophysics Data System (ADS)

    Bry, François; Furche, Tim; Marnette, Bruno; Ley, Clemens; Linse, Benedikt; Poppe, Olga

    SPARQL has become the gold standard for RDF query languages. Nevertheless, we believe there is further room for improving RDF query languages. In this chapter, we investigate the addition of rules and quantifier alternation to SPARQL. That extension, called SPARQLog, extends previous RDF query languages with arbitrary quantifier alternation: blank nodes may occur in the scope of all, some, or none of the universal variables of a rule. In addition, SPARQLog is aware of important RDF features such as the distinction between blank nodes, literals and IRIs or the RDFS vocabulary. The semantics of SPARQLog is closed (every answer is an RDF graph), but lifts RDF's restrictions on literal and blank node occurrences for intermediary data. We show how to define a sound and complete operational semantics that can be implemented using existing logic programming techniques. While SPARQLog is Turing complete, we identify a decidable (in fact, polynomial-time) fragment, SwARQLog, ensuring polynomial data complexity, inspired by the notion of super-weak acyclicity in data exchange. Furthermore, we prove that SPARQLog with no universal quantifiers in the scope of existential ones (the ∀∃ fragment) is equivalent to full SPARQLog in the presence of graph projection. Thus, the convenience of arbitrary quantifier alternation comes, in fact, for free. These results, though here presented in the context of RDF querying, apply similarly in the more general setting of data exchange.

  6. Querying XML Data with SPARQL

    NASA Astrophysics Data System (ADS)

    Bikakis, Nikos; Gioldasis, Nektarios; Tsinaraki, Chrisa; Christodoulakis, Stavros

    SPARQL is today the standard access language for Semantic Web data. In the recent years XML databases have also acquired industrial importance due to the widespread applicability of XML in the Web. In this paper we present a framework that bridges the heterogeneity gap and creates an interoperable environment where SPARQL queries are used to access XML databases. Our approach assumes that fairly generic mappings between ontology constructs and XML Schema constructs have been automatically derived or manually specified. The mappings are used to automatically translate SPARQL queries to semantically equivalent XQuery queries which are used to access the XML databases. We present the algorithms and the implementation of SPARQL2XQuery framework, which is used for answering SPARQL queries over XML databases.
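    A toy illustration of the translation idea (the ontology, the XML schema, and the produced XQuery below are all assumptions, not the framework's actual mapping output):

        # A SPARQL basic graph pattern over an assumed ontology...
        sparql = """
            PREFIX ex: <http://example.org/>
            SELECT ?name WHERE { ?p a ex:Person . ?p ex:name ?name }
        """

        # ...and a semantically equivalent XQuery over an assumed XML schema,
        # of the kind a mapping-driven translation would emit.
        xquery = 'for $p in doc("people.xml")//person return $p/name/text()'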

  7. MYCIN II: design and implementation of a therapy reference with complex content-based indexing.

    PubMed Central

    Kim, D. K.; Fagan, L. M.; Jones, K. T.; Berrios, D. C.; Yu, V. L.

    1998-01-01

    We describe the construction of MYCIN II, a prototype system that provides for content-based markup and search of a forthcoming clinical therapeutics textbook, Antimicrobial Therapy and Vaccines. Existing commercial search technology for digital references utilizes generic tools such as textword-based searches with geographical or statistical refinements. We suggest that the drawbacks of such systems significantly restrict their use in everyday clinical practice, in spite of the fact that there is a great need for the information contained within these same references. The system we describe is intended to supplement keyword searching so that certain important questions can be asked easily and can be answered reliably (in terms of precision and recall). Our method attacks this problem in a restricted domain of knowledge: clinical infectious disease. For example, we would like to be able to answer the class of questions exemplified by the following query: "What antimicrobial agents can be used to treat endocarditis caused by Eikenella corrodens?" We have compiled and analyzed a list of such questions to develop a concept-based markup scheme. This scheme was then applied within an HTML markup to electronically "highlight" passages from three textbook chapters. We constructed a functioning web-based search interface. Our system also provides semi-automated querying of PubMed using our concept markup and the user's actions as a guide. PMID:9929205

  8. MYCIN II: design and implementation of a therapy reference with complex content-based indexing.

    PubMed

    Kim, D K; Fagan, L M; Jones, K T; Berrios, D C; Yu, V L

    1998-01-01

    We describe the construction of MYCIN II, a prototype system that provides for content-based markup and search of a forthcoming clinical therapeutics textbook, Antimicrobial Therapy and Vaccines. Existing commercial search technology for digital references utilizes generic tools such as textword-based searches with geographical or statistical refinements. We suggest that the drawbacks of such systems significantly restrict their use in everyday clinical practice, in spite of the fact that there is a great need for the information contained within these same references. The system we describe is intended to supplement keyword searching so that certain important questions can be asked easily and can be answered reliably (in terms of precision and recall). Our method attacks this problem in a restricted domain of knowledge: clinical infectious disease. For example, we would like to be able to answer the class of questions exemplified by the following query: "What antimicrobial agents can be used to treat endocarditis caused by Eikenella corrodens?" We have compiled and analyzed a list of such questions to develop a concept-based markup scheme. This scheme was then applied within an HTML markup to electronically "highlight" passages from three textbook chapters. We constructed a functioning web-based search interface. Our system also provides semi-automated querying of PubMed using our concept markup and the user's actions as a guide.

  9. Comparison of the efficacy of three PubMed search filters in finding randomized controlled trials to answer clinical questions.

    PubMed

    Yousefi-Nooraie, Reza; Irani, Shirin; Mortaz-Hedjri, Soroush; Shakiba, Behnam

    2013-10-01

    The aim of this study was to compare the performance of three search methods in the retrieval of relevant clinical trials from PubMed to answer specific clinical questions. The included studies of a sample of 100 Cochrane reviews that were recorded in PubMed were considered the reference standard. The search queries were formulated based on the systematic review titles. Precision, recall, and the number of retrieved records were compared for limiting the results to the clinical trial publication type and for using the sensitive and specific Clinical Queries filters. The number of keywords and the presence of specific names of the intervention and the syndrome among the search keywords were used in a model to predict the recalls and precisions. The Clinical Queries-sensitive search strategy retrieved the largest number of records (33) and had the highest recall (41.6%) and lowest precision (4.8%). The presence of a specific intervention name was the only significant predictor of all recalls and precisions (P = 0.016). The recall and precision of combining simple clinical search queries with methodological search filters to find clinical trials on various subjects were considerably low. The limit-field strategy yielded higher precision, fewer retrieved records, and approximately similar recall compared with the Clinical Queries-sensitive strategy. The presence of a specific intervention name in the search keywords increased both recall and precision. © 2010 John Wiley & Sons Ltd.

  10. Clinician-Oriented Access to Data - C.O.A.D.: A Natural Language Interface to a VA DHCP Database

    PubMed Central

    Levy, Christine; Rogers, Elizabeth

    1995-01-01

    Hospitals collect enormous amounts of data related to the ongoing care of patients. Unfortunately, a clinician's access to the data is limited by the complexities of the database structure and/or the programming skills required to access the database. The COAD project attempts to bridge the gap between the clinical user's need for specific information from the database and the wealth of data residing in the hospital information system. The project design includes a natural language interface to data contained in a VA DHCP database. We have developed a prototype which links natural language software to certain DHCP data elements, including patient demographics, prescriptions, diagnoses, laboratory data, and provider information. English queries can be typed into the system, and answers to the questions are returned. Future work includes refinement of the natural language/DHCP connections to enable more sophisticated queries, and optimization of the system to reduce response time to user questions.

  11. Querying databases of trajectories of differential equations: Data structures for trajectories

    NASA Technical Reports Server (NTRS)

    Grossman, Robert

    1989-01-01

    One approach to qualitative reasoning about dynamical systems is to extract qualitative information by searching or making queries on databases containing very large numbers of trajectories. The efficiency of such queries depends crucially upon finding an appropriate data structure for trajectories of dynamical systems. Suppose that a large number of parameterized trajectories γ of a dynamical system evolving in R^N are stored in a database. Let η ⊂ R^N denote a parameterized path in Euclidean space, and let ‖·‖ denote a norm on the space of paths. A data structure is defined to represent trajectories of dynamical systems, and an algorithm is sketched which answers queries.

  12. Data Parallel Bin-Based Indexing for Answering Queries on Multi-Core Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gosink, Luke; Wu, Kesheng; Bethel, E. Wes

    2009-06-02

    The multi-core trend in CPUs and general purpose graphics processing units (GPUs) offers new opportunities for the database community. The increase of cores at exponential rates is likely to affect virtually every server and client in the coming decade, and presents database management systems with a huge, compelling disruption that will radically change how processing is done. This paper presents a new parallel indexing data structure for answering queries that takes full advantage of the increasing thread-level parallelism emerging in multi-core architectures. In our approach, our Data Parallel Bin-based Index Strategy (DP-BIS) first bins the base data, and then partitions and stores the values in each bin as a separate, bin-based data cluster. In answering a query, the procedures for examining the bin numbers and the bin-based data clusters offer the maximum possible level of concurrency; each record is evaluated by a single thread and all threads are processed simultaneously in parallel. We implement and demonstrate the effectiveness of DP-BIS on two multi-core architectures: a multi-core CPU and a GPU. The concurrency afforded by DP-BIS allows us to fully utilize the thread-level parallelism provided by each architecture; for example, our GPU-based DP-BIS implementation simultaneously evaluates over 12,000 records with an equivalent number of concurrently executing threads. In comparing DP-BIS's performance across these architectures, we show that the GPU-based DP-BIS implementation requires significantly less computation time to answer a query than the CPU-based implementation. We also demonstrate in our analysis that DP-BIS provides better overall performance than the commonly utilized CPU- and GPU-based projection index. Finally, due to data encoding, we show that DP-BIS accesses significantly smaller amounts of data than index strategies that operate solely on a column's base data; this smaller data footprint is critical for parallel processors that possess limited memory resources (e.g., GPUs).
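    A minimal, single-threaded sketch of the bin-based strategy (DP-BIS itself evaluates the boundary candidates with one thread per record; this Python version shows only the binning logic):

        import bisect

        def build_bins(values, edges):
            """Assign each record id to a bin; `edges` are sorted bin boundaries."""
            bins = [[] for _ in range(len(edges) + 1)]
            for rid, v in enumerate(values):
                bins[bisect.bisect_right(edges, v)].append(rid)
            return bins

        def range_query(values, bins, edges, lo, hi):
            """Bins strictly inside (lo, hi) are hits outright; only the two
            boundary bins must be re-checked against the base data."""
            first = bisect.bisect_right(edges, lo)
            last = bisect.bisect_right(edges, hi)
            hits = {rid for b in range(first + 1, last) for rid in bins[b]}
            for b in {first, last}:  # candidate checks touch the base data
                hits.update(rid for rid in bins[b] if lo <= values[rid] <= hi)
            return sorted(hits)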

  13. An ontology-based comparative anatomy information system

    PubMed Central

    Travillian, Ravensara S.; Diatchka, Kremena; Judge, Tejinder K.; Wilamowska, Katarzyna; Shapiro, Linda G.

    2010-01-01

    Introduction This paper describes the design, implementation, and potential use of a comparative anatomy information system (CAIS) for querying on similarities and differences between homologous anatomical structures across species, the knowledge base it operates upon, the method it uses for determining the answers to the queries, and the user interface it employs to present the results. The relevant informatics contributions of our work include (1) the development and application of the structural difference method, a formalism for symbolically representing anatomical similarities and differences across species; (2) the design of the structure of a mapping between the anatomical models of two different species and its application to information about specific structures in humans, mice, and rats; and (3) the design of the internal syntax and semantics of the query language. These contributions provide the foundation for the development of a working system that allows users to submit queries about the similarities and differences between mouse, rat, and human anatomy; delivers result sets that describe those similarities and differences in symbolic terms; and serves as a prototype for the extension of the knowledge base to any number of species. Additionally, we expanded the domain knowledge by identifying medically relevant structural questions for the human, the mouse, and the rat, and made an initial foray into the validation of the application and its content by means of user questionnaires, software testing, and other feedback. Methods The anatomical structures of the species to be compared, as well as the mappings between species, are modeled on templates from the Foundational Model of Anatomy knowledge base, and compared using graph-matching techniques. A graphical user interface allows users to issue queries that retrieve information concerning similarities and differences between structures in the species being examined. Queries from diverse information sources, including domain experts, peer-reviewed articles, and reference books, have been used to test the system and to illustrate its potential use in comparative anatomy studies. Results 157 test queries were submitted to the CAIS system, and all of them were correctly answered. The interface was evaluated in terms of clarity and ease of use. This testing determined that the application works well, and is fairly intuitive to use, but users want to see more clarification of the meaning of the different types of possible queries. Some of the interface issues will naturally be resolved as we refine our conceptual model to deal with partial and complex homologies in the content. Conclusions The CAIS system and its associated methods are expected to be useful to biologists and translational medicine researchers. Possible applications range from supporting theoretical work in clarifying and modeling ontogenetic, physiological, pathological, and evolutionary transformations, to concrete techniques for improving the analysis of genotype–phenotype relationships among various animal models in support of a wide array of clinical and scientific initiatives. PMID:21146377

  14. A Query Integrator and Manager for the Query Web

    PubMed Central

    Brinkley, James F.; Detwiler, Landon T.

    2012-01-01

    We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. PMID:22531831

  15. Reference Readiness for AV Questions.

    ERIC Educational Resources Information Center

    Drolet, Leon L., Jr.

    1981-01-01

    Reviews 50 reference tools which librarians can use to answer almost any audiovisual question including queries on trivia, equipment selection, biographical information, and motion picture ratings. (LLS)

  16. An RDF/OWL knowledge base for query answering and decision support in clinical pharmacogenetics.

    PubMed

    Samwald, Matthias; Freimuth, Robert; Luciano, Joanne S; Lin, Simon; Powers, Robert L; Marshall, M Scott; Adlassnig, Klaus-Peter; Dumontier, Michel; Boyce, Richard D

    2013-01-01

    Genetic testing for personalizing pharmacotherapy is bound to become an important part of clinical routine. To address associated issues with data management and quality, we are creating a semantic knowledge base for clinical pharmacogenetics. The knowledge base is made up of three components: an expressive ontology formalized in the Web Ontology Language (OWL 2 DL), a Resource Description Framework (RDF) model for capturing detailed results of manual annotation of pharmacogenomic information in drug product labels, and an RDF conversion of relevant biomedical datasets. Our work goes beyond the state of the art in that it makes both automated reasoning and query answering as simple as possible, and its reasoning capabilities go beyond those of previously described ontologies.

  17. VPipe: Virtual Pipelining for Scheduling of DAG Stream Query Plans

    NASA Astrophysics Data System (ADS)

    Wang, Song; Gupta, Chetan; Mehta, Abhay

    There are data streams all around us that can be harnessed for tremendous business and personal advantage. For an enterprise-level stream processing system such as CHAOS [1] (Continuous, Heterogeneous Analytic Over Streams), handling of complex query plans with resource constraints is challenging. While several scheduling strategies exist for stream processing, efficient scheduling of complex DAG query plans is still largely unsolved. In this paper, we propose a novel execution scheme for scheduling complex directed acyclic graph (DAG) query plans with meta-data enriched stream tuples. Our solution, called Virtual Pipelined Chain (or VPipe Chain for short), effectively extends the "Chain" pipelining scheduling approach to complex DAG query plans.

  18. Semantic Annotations and Querying of Web Data Sources

    NASA Astrophysics Data System (ADS)

    Hornung, Thomas; May, Wolfgang

    A large part of the Web, actually holding a significant portion of the useful information throughout the Web, consists of views on hidden databases, provided by numerous heterogeneous interfaces that are partly human-oriented via Web forms ("Deep Web") and partly based on Web Services (only machine accessible). In this paper we present an approach for annotating these sources in a way that makes them citizens of the Semantic Web. We illustrate how queries can be stated in terms of the ontology, and how the annotations are used to select and access appropriate sources and to answer the queries.

  19. Querying databases of trajectories of differential equations 2: Index functions

    NASA Technical Reports Server (NTRS)

    Grossman, Robert

    1991-01-01

    Suppose that a large number of parameterized trajectories γ of a dynamical system evolving in R^N are stored in a database. Let η ⊂ R^N denote a parameterized path in Euclidean space, and let ‖·‖ denote a norm on the space of paths. Data structures and indices for trajectories are defined, and algorithms are given to answer queries of the following forms: Query 1. Given a path η, determine whether η occurs as a subtrajectory of any trajectory γ from the database. If so, return the trajectory; otherwise, return null. Query 2. Given a path η, return the trajectory γ from the database which minimizes the norm ‖η − γ‖.
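    A hedged sketch of Query 2, assuming trajectories are stored as lists of points sampled at common parameter values and taking the sup-norm over samples as ‖·‖ (an index function, as the paper proposes, would avoid this linear scan):

        import math

        def nearest_trajectory(eta, database):
            """Return the stored trajectory gamma minimizing ||eta - gamma||."""
            def dist(gamma):
                return max(math.dist(e, g) for e, g in zip(eta, gamma))
            return min(database, key=dist)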

  20. Relational Algebra in Spatial Decision Support Systems Ontologies.

    PubMed

    Diomidous, Marianna; Chardalias, Kostis; Koutonias, Panagiotis; Magnita, Adrianna; Andrianopoulos, Charalampos; Zimeras, Stelios; Mechili, Enkeleint Aggelos

    2017-01-01

    Decision Support Systems (DSS) are powerful tools that help researchers choose the correct decision based on their final results, especially in medical settings, where doctors can use these systems to overcome clinical misunderstandings. In such systems, queries must be constructed from the particular questions that doctors must answer. In this work, the combination of questions and queries is presented via relational algebra.
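
    As a toy illustration (not taken from the paper), a clinical question such as "which patients have a fasting glucose above 126?" could be phrased over two hypothetical relations Patient(pid, name) and Lab(pid, test, value) as the relational-algebra expression

      \pi_{name}\bigl(\sigma_{test = 'glucose' \wedge value > 126}(Patient \bowtie Lab)\bigr)

    which selects the matching lab rows, joins them to patients on pid, and projects the patient names.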

  1. Visual analytics for semantic queries of TerraSAR-X image content

    NASA Astrophysics Data System (ADS)

    Espinoza-Molina, Daniela; Alonso, Kevin; Datcu, Mihai

    2015-10-01

    With the continuous image product acquisition of satellite missions, the size of the image archives is considerably increasing every day, as are the variety and complexity of their content, surpassing the end-user's capacity to analyse and exploit them. Advances in the image retrieval field have contributed to the development of tools for interactive exploration and extraction of images from huge archives using different parameters like metadata, keywords, and basic image descriptors. Even though we now have more powerful tools for automated image retrieval and data analysis, we still face the problem of understanding and analyzing the results. Thus, a systematic computational analysis of these results is required in order to provide the end-user with a summary of the archive content in comprehensible terms. In this context, visual analytics combines automated analysis with interactive visualization techniques for effective understanding, reasoning and decision making on the basis of very large and complex datasets. Moreover, several current research efforts focus on associating the content of the images with semantic definitions for describing the data in a format easily understood by the end-user. In this paper, we present our approach for computing visual analytics and semantically querying the TerraSAR-X archive. Our approach is mainly composed of four steps: 1) the generation of a data model that explains the information contained in a TerraSAR-X product, formed by primitive descriptors and metadata entries; 2) the storage of this model in a database system; 3) the semantic definition of the image content based on machine learning algorithms and relevance feedback; and 4) querying the image archive using semantic descriptors as query parameters and computing the statistical analysis of the query results. The experimental results show that, with the help of visual analytics and semantic definitions, we are able to explain the image content using semantic terms and the relations between them, answering questions such as "what is the percentage of urban area in a region?" or "what is the distribution of water bodies in a city?"

  2. Data Processing on Database Management Systems with Fuzzy Query

    NASA Astrophysics Data System (ADS)

    Şimşek, Irfan; Topuz, Vedat

    In this study, a fuzzy query tool (SQLf) for non-fuzzy database management systems was developed. In addition, sample fuzzy queries were run on real data with the tool developed in this study. The performance of SQLf was tested with data about the Marmara University students' food grant. The food grant data were collected in a MySQL database by using a form filled in on the web, in which students described their social and economic conditions for the food grant request. The form consists of questions with fuzzy and crisp answers. The main purpose of the fuzzy query is to determine the students who deserve the grant. SQLf easily found the students eligible for the grant through predefined fuzzy values. The tool could also be used easily with other database systems such as Oracle and SQL Server.
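
    The heart of such a tool is evaluating fuzzy predicates over crisp rows. A minimal Python sketch (the membership function, data, and 0.5 cutoff are illustrative assumptions, not SQLf's actual definitions):

      def low_income(income):
          """Membership of 'low income': 1 below 500, 0 above 1500, linear between."""
          if income <= 500:
              return 1.0
          if income >= 1500:
              return 0.0
          return (1500 - income) / 1000.0

      students = [("Ali", 400), ("Ayse", 900), ("Mehmet", 2000)]

      # Fuzzy selection: keep rows whose membership degree exceeds a cutoff,
      # ranked by degree, as a fuzzy WHERE clause would.
      eligible = sorted(((low_income(inc), name) for name, inc in students
                         if low_income(inc) > 0.5), reverse=True)
      print(eligible)  # [(1.0, 'Ali'), (0.6, 'Ayse')]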

  3. An intelligent user interface for browsing satellite data catalogs

    NASA Technical Reports Server (NTRS)

    Cromp, Robert F.; Crook, Sharon

    1989-01-01

    A large scale domain-independent spatial data management expert system that serves as a front-end to databases containing spatial data is described. This system is unique for two reasons. First, it uses spatial search techniques to generate a list of all the primary keys that fall within a user's spatial constraints prior to invoking the database management system, thus substantially decreasing the amount of time required to answer a user's query. Second, a domain-independent query expert system uses a domain-specific rule base to preprocess the user's English query, effectively mapping a broad class of queries into a smaller subset that can be handled by a commercial natural language processing system. The methods used by the spatial search module and the query expert system are explained, and the system architecture for the spatial data management expert system is described. The system is applied to data from the International Ultraviolet Explorer (IUE) satellite, and results are given.

  4. Remembrance of inferences past: Amortization in human hypothesis generation.

    PubMed

    Dasgupta, Ishita; Schulz, Eric; Goodman, Noah D; Gershman, Samuel J

    2018-05-21

    Bayesian models of cognition assume that people compute probability distributions over hypotheses. However, the required computations are frequently intractable or prohibitively expensive. Since people often encounter many closely related distributions, selective reuse of computations (amortized inference) is a computationally efficient use of the brain's limited resources. We present three experiments that provide evidence for amortization in human probabilistic reasoning. When sequentially answering two related queries about natural scenes, participants' responses to the second query systematically depend on the structure of the first query. This influence is sensitive to the content of the queries, only appearing when the queries are related. Using a cognitive load manipulation, we find evidence that people amortize summary statistics of previous inferences, rather than storing the entire distribution. These findings support the view that the brain trades off accuracy and computational cost, to make efficient use of its limited cognitive resources to approximate probabilistic inference. Copyright © 2018 Elsevier B.V. All rights reserved.

  5. Use of e-mail for Parkinson's disease consultations: Are answers just a clic away?

    PubMed

    Viedma-Guiard, E; Agüero, P; Crespo-Araico, L; Estévez-Fraga, C; Sánchez-Díez, G; López-Sendón, J L; Aviles-Olmos, I; García-Ribas, G; Palacios Romero, M L; Masjuan Vallejo, J; Martínez-Castrillo, J C; Alonso-Cánovas, A

    2018-03-01

    The clinical problems of patients with movement disorders (MD) are complex, and the duration and frequency of face-to-face consultations may be insufficient to meet their needs. We analysed the implementation of an e-mail-based query service for our MD unit's patients and their primary care physicians (PCPs). We retrospectively reviewed all consecutive emails sent and received over a period of 4 months, one year after implementation of the e-mail inquiry system. All patients received the service's e-mail address during consultations, and PCPs received it during scheduled informative meetings. We recorded and later analysed the profile of the questioner, patients' demographic and clinical data, number of queries, reason for consultation, and actions taken. From 1 January 2015 to 30 April 2015, the service received 137 emails from 63 patients (43% male, mean age 71±10.5) diagnosed with Parkinson's disease (76%), atypical parkinsonism (10%), and others (14%); 116 responses were sent. Twenty (32%) emails were written by patients, 38 (60%) by their caregivers, and 5 (8%) by their PCPs. The reasons for consultation were clinical in 50 cases (80%): 16 (32%) described clinical deterioration, 14 (28%) onset of new symptoms, and 20 (40%) side effects or concerns about medications. In 13 cases (20%), the query was bureaucratic: 11 were related to appointments (85%) and 2 were requests for clinical reports (15%). In response, new appointments were scheduled in 9 cases (14%), while the rest of the questions were answered by email. Patients were satisfied overall and the additional care burden on specialists was not excessive. Implementing an e-mail-based consultation system is feasible in MD units. It facilitates both communication between neurologists and patients and continued care in the primary care setting. Copyright © 2016 Sociedad Española de Neurología. Publicado por Elsevier España, S.L.U. All rights reserved.

  6. Visual Turing test for computer vision systems

    PubMed Central

    Geman, Donald; Geman, Stuart; Hallonquist, Neil; Younes, Laurent

    2015-01-01

    Today, computer vision systems are tested by their accuracy in detecting and localizing instances of objects. As an alternative, and motivated by the ability of humans to provide far richer descriptions and even tell a story about an image, we construct a “visual Turing test”: an operator-assisted device that produces a stochastic sequence of binary questions from a given test image. The query engine proposes a question; the operator either provides the correct answer or rejects the question as ambiguous; the engine proposes the next question (“just-in-time truthing”). The test is then administered to the computer-vision system, one question at a time. After the system’s answer is recorded, the system is provided the correct answer and the next question. Parsing is trivial and deterministic; the system being tested requires no natural language processing. The query engine employs statistical constraints, learned from a training set, to produce questions with essentially unpredictable answers—the answer to a question, given the history of questions and their correct answers, is nearly equally likely to be positive or negative. In this sense, the test is only about vision. The system is designed to produce streams of questions that follow natural story lines, from the instantiation of a unique object, through an exploration of its properties, and on to its relationships with other uniquely instantiated objects. PMID:25755262

  7. KnowledgeLink: Impact of Context-Sensitive Information Retrieval on Clinicians' Information Needs

    PubMed Central

    Maviglia, Saverio M.; Yoon, Catherine S.; Bates, David W.; Kuperman, Gilad

    2006-01-01

    Objective: Infobuttons are message-based content search and retrieval functions embedded within other applications that dynamically return information relevant to the clinical task at hand. The objective of this study was to determine whether infobuttons effectively answer providers' questions about medications or affect patient care decisions. Design: The authors implemented and evaluated a medication infobutton application called KnowledgeLink. Health care providers at 18 outpatient clinics were randomized to one of two versions of KnowledgeLink, one that linked to information from Micromedex (Thomson Micromedex, Greenwood Village, Co) and the other to material from SkolarMD (Wolters Kluwer Health, Palo Alto, CA). Measurements: Data were collected about the frequency of use and demographics of users, patients, and drugs that were queried. Users were periodically surveyed with short questionnaires and then with a more extensive survey at the end of one year. Results: During the first year, KnowledgeLink was used 7,972 times by 359 users to look up information about 1,723 medications for 4,961 patients. Clinicians used KnowledgeLink twice a month on average, and during an average of 1.2% of patient encounters. KnowledgeLink was used by a wide variety of medical staff, not just physicians and nurse practitioners. The frequency of usage and the questions asked varied with user role (primary care physician, specialist physician, nurse practitioner). Although the median KnowledgeLink session was brief (21 seconds), KnowledgeLink answered users' queries 84% of the time, and altered patient care decisions 15% of the time. Users rated KnowledgeLink favorably on multiple scales, recommended extending KnowledgeLink to other content domains, and suggested enhancing the interface to allow refinement of the query and selection of the target resource. Conclusion: An infobutton can satisfy information needs about medications. Although used infrequently and for brief sessions, KnowledgeLink was positively received, answered most users' questions, and had a significant impact on medical decision making. The next steps would be to broaden the domains that KnowledgeLink covers to more specifically tailor results to the user type, to provide options when queries are not immediately answered, and to implement KnowledgeLink within other electronic clinical applications. PMID:16221942

  8. Multidimensional indexing structure for use with linear optimization queries

    NASA Technical Reports Server (NTRS)

    Bergman, Lawrence David (Inventor); Castelli, Vittorio (Inventor); Chang, Yuan-Chi (Inventor); Li, Chung-Sheng (Inventor); Smith, John Richard (Inventor)

    2002-01-01

    Linear optimization queries, which usually arise in various decision support and resource planning applications, are queries that retrieve the top N data records (where N is an integer greater than zero) which satisfy a specific optimization criterion. The optimization criterion is to either maximize or minimize a linear equation. The coefficients of the linear equation are given at query time. Methods and apparatus are disclosed for constructing, maintaining and utilizing a multidimensional indexing structure of database records to improve the execution speed of linear optimization queries. Database records with numerical attributes are organized into a number of layers, and each layer represents a geometric structure called a convex hull. Such linear optimization queries are processed by searching from the outer-most layer of this multi-layer indexing structure inwards. At least one record per layer will satisfy the query criterion, and the number of layers that need to be searched depends on the spatial distribution of records, the query-issued linear coefficients, and N, the number of records to be returned. When N is small compared to the total size of the database, answering the query typically requires searching only a small fraction of all relevant records, resulting in a tremendous speedup as compared to linearly scanning the entire dataset.
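
    The layered structure can be sketched with SciPy in a simplified 2-D form (an illustration of the idea, not the patented method). It rests on the property stated above: the i-th best record under a linear criterion lies on one of the first i layers, so the top N records are contained in the union of the first N layers.

      import numpy as np
      from scipy.spatial import ConvexHull

      def build_layers(points):
          """Peel convex-hull layers from an (n, 2) point set (general position assumed)."""
          layers, remaining = [], np.arange(len(points))
          while len(remaining) >= 3:          # ConvexHull needs at least dim + 1 points
              hull = ConvexHull(points[remaining])
              layers.append(remaining[hull.vertices])
              remaining = np.delete(remaining, hull.vertices)
          if len(remaining):
              layers.append(remaining)
          return layers

      def top_n(points, layers, coeffs, n):
          """Maximize coeffs . x: score only the first n layers, then take the top n."""
          candidates = np.concatenate(layers[:n])
          scores = points[candidates] @ coeffs
          return candidates[np.argsort(scores)[::-1][:n]]

      points = np.random.rand(1000, 2)
      layers = build_layers(points)
      print(top_n(points, layers, np.array([0.7, 0.3]), 5))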

  9. The development of a natural language interface to a geographical information system

    NASA Technical Reports Server (NTRS)

    Toledo, Sue Walker; Davis, Bruce

    1993-01-01

    This paper discusses a two-and-a-half-year project undertaken to develop an English-language interface for the geographical information system GRASS. The work was carried out for NASA by a small business, Netrologic, based in San Diego, California, under Phase 1 and 2 Small Business Innovative Research contracts. We consider here the potential value of this system, whose current functionality addresses numerical, categorical and boolean raster layers and includes displaying point sets defined by constraints on one or more layers, answering yes/no and numerical questions, and creating statistical reports. It also handles complex queries and lexical ambiguities, and allows temporarily switching to UNIX or GRASS.

  10. An efficient compression scheme for bitmap indices

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Kesheng; Otoo, Ekow J.; Shoshani, Arie

    2004-04-13

    When using an out-of-core indexing method to answer a query, it is generally assumed that the I/O cost dominates the overall query response time. Because of this, most research on indexing methods concentrates on reducing the sizes of indices. For bitmap indices, compression has been used for this purpose. However, in most cases, operations on these compressed bitmaps, mostly bitwise logical operations such as AND, OR, and NOT, spend more time in the CPU than in I/O. To speed up these operations, a number of specialized bitmap compression schemes have been developed, the best known of which is the byte-aligned bitmap code (BBC). They are usually faster in performing logical operations than the general purpose compression schemes, but the time spent in the CPU still dominates the total query response time. To reduce the query response time, we designed a CPU-friendly scheme named the word-aligned hybrid (WAH) code. In this paper, we prove that the sizes of WAH compressed bitmap indices are about two words per row for a large range of attributes. This size is smaller than typical sizes of commonly used indices, such as a B-tree. Therefore, WAH compressed indices are appropriate not only for low cardinality attributes but also for high cardinality attributes. In the worst case, the time to operate on compressed bitmaps is proportional to the total size of the bitmaps involved. The total size of the bitmaps required to answer a query on one attribute is proportional to the number of hits. These indicate that WAH compressed bitmap indices are optimal. To verify their effectiveness, we generated bitmap indices for four different datasets and measured the response time of many range queries. Tests confirm that sizes of compressed bitmap indices are indeed smaller than B-tree indices, and query processing with WAH compressed indices is much faster than with BBC compressed indices, projection indices and B-tree indices. In addition, we also verified that the average query response time is proportional to the index size. This indicates that the compressed bitmap indices are efficient for very large datasets.
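
    A stripped-down sketch of the word-aligned idea in Python (32-bit words, hence 31-bit groups; real WAH also maintains an uncompressed "active" tail word and packs counters differently):

      def wah_encode(bits, word=32):
          """Word-Aligned Hybrid sketch. Split the bitmap into 31-bit groups;
          a run of identical all-0 or all-1 groups becomes one fill word
          (MSB set, next bit = fill value, low bits = run length in groups);
          everything else is stored as a literal word (MSB clear)."""
          g = word - 1
          groups = [tuple(bits[i:i + g]) for i in range(0, len(bits), g)]
          words, i = [], 0
          while i < len(groups):
              grp = groups[i]
              val = int("".join(map(str, grp)), 2)
              if len(grp) == g and val in (0, 2 ** g - 1):   # fill candidate
                  run = 1
                  while i + run < len(groups) and groups[i + run] == grp:
                      run += 1
                  fill = 1 if val else 0
                  words.append((1 << (word - 1)) | (fill << (word - 2)) | run)
                  i += run
              else:                                          # literal (or short tail)
                  words.append(val)
                  i += 1
          return words

      # 100 zeros then 8 ones -> one fill word covering three zero groups, one literal.
      print([hex(w) for w in wah_encode([0] * 100 + [1] * 8)])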

  11. Towards Hybrid Online On-Demand Querying of Realtime Data with Stateful Complex Event Processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, Qunzhi; Simmhan, Yogesh; Prasanna, Viktor K.

    Emerging Big Data applications in areas like e-commerce and the energy industry require both online and on-demand queries to be performed over vast and fast data arriving as streams. These present novel challenges to Big Data management systems. Complex Event Processing (CEP) is recognized as a high performance online query scheme which in particular deals with the velocity aspect of the 3-V's of Big Data. However, traditional CEP systems do not consider data variety and lack the capability to embed ad hoc queries over the volume of data streams. In this paper, we propose H2O, a stateful complex event processing framework, to support hybrid online and on-demand queries over realtime data. We propose a semantically enriched event and query model to address data variety. A formal query algebra is developed to precisely capture the stateful and containment semantics of online and on-demand queries. We describe techniques to achieve the interactive query processing over realtime data featured by efficient online querying, dynamic stream data persistence and on-demand access. The system architecture is presented and the current implementation status reported.

  12. Voice-enabled Knowledge Engine using Flood Ontology and Natural Language Processing

    NASA Astrophysics Data System (ADS)

    Sermet, M. Y.; Demir, I.; Krajewski, W. F.

    2015-12-01

    The Iowa Flood Information System (IFIS) is a web-based platform developed by the Iowa Flood Center (IFC) to provide access to flood inundation maps, real-time flood conditions, flood forecasts, flood-related data, information and interactive visualizations for communities in Iowa. The IFIS is designed for use by the general public, often people with no domain knowledge and limited general science background. To improve effective communication with such an audience, we have introduced a voice-enabled knowledge engine on flood related issues in IFIS. Instead of navigating within many features and interfaces of the information system and web-based sources, the system provides dynamic computations based on a collection of built-in data, analysis, and methods. The IFIS Knowledge Engine connects to real-time stream gauges, in-house data sources, analysis and visualization tools to answer natural language questions. Our goal is to systematize data and modeling results on flood related issues in Iowa and to provide an interface for definitive answers to factual queries. The goal of the knowledge engine is to make all flood related knowledge in Iowa easily accessible to everyone, and to support voice-enabled natural language input. We aim to integrate and curate all flood related data, implement analytical and visualization tools, and make it possible to compute answers from questions. The IFIS explicitly implements analytical methods and models, as algorithms, and curates all flood related data and resources so that all these resources are computable. The IFIS Knowledge Engine computes the answer by deriving it from its computational knowledge base. The knowledge engine processes the statement, accesses the data warehouse, runs complex database queries on the server side, and returns outputs in various formats. This presentation provides an overview of the IFIS Knowledge Engine, its unique information interface and functionality as an educational tool, and discusses the future plans for providing knowledge on flood related issues and resources. The IFIS Knowledge Engine provides an alternative access method to the comprehensive set of tools and data resources available in IFIS. The current implementation of the system accepts free-form input and supports voice recognition within browser and mobile applications.

  13. Olelo: a web application for intuitive exploration of biomedical literature

    PubMed Central

    Niedermeier, Julian; Jankrift, Marcel; Tietböhl, Sören; Stachewicz, Toni; Folkerts, Hendrik; Uflacker, Matthias; Neves, Mariana

    2017-01-01

    Researchers usually query the large biomedical literature in PubMed via keywords, logical operators and filters, none of which is very intuitive. Question answering systems are an alternative to keyword searches. They allow questions in natural language as input, and results reflect the given type of question, such as short answers and summaries. Few of those systems are available online, but they suffer from long response times and support a limited number of question and result types. Additionally, user interfaces are usually restricted to only displaying the retrieved information. For our Olelo web application, we combined biomedical literature and terminologies in a fast in-memory database to enable real-time responses to researchers’ queries. Further, we extended the built-in natural language processing features of the database with question answering and summarization procedures. Combined with a new explorative approach of document filtering and a clean user interface, Olelo enables a fast and intelligent search through the ever-growing biomedical literature. Olelo is available at http://www.hpi.de/plattner/olelo. PMID:28472397

  14. An EHR Prototype Using Structured ISO/EN 13606 Documents to Respond to Identified Clinical Information Needs of Diabetes Specialists: A Controlled Study on Feasibility and Impact

    PubMed Central

    Huebner-Bloder, Gudrun; Duftschmid, Georg; Kohler, Michael; Rinner, Christoph; Saboor, Samrend; Ammenwerth, Elske

    2012-01-01

    Cross-institutional longitudinal Electronic Health Records (EHR), as currently being introduced in Austria, increase the challenge of information overload for healthcare professionals. We developed an innovative cross-institutional EHR query prototype that offers extended query options, including searching for specific information items or sets of information items. The available query options were derived from a systematic analysis of the information needs of diabetes specialists during patient encounters. The prototype operates in an IHE-XDS-based environment where ISO/EN 13606-structured documents are available. We conducted a controlled study with seven diabetes specialists to assess the feasibility and impact of this EHR query prototype on efficient retrieval of patient information to answer typical clinical questions. The controlled study showed that the specialists were quicker and more successful (measured as the percentage of expected information items found) in finding patient information compared to the standard full-document search options. The participants also appreciated the extended query options. PMID:23304308

  15. Research on Extension of Sparql Ontology Query Language Considering the Computation of Indoor Spatial Relations

    NASA Astrophysics Data System (ADS)

    Li, C.; Zhu, X.; Guo, W.; Liu, Y.; Huang, H.

    2015-05-01

    According to the characteristics of indoor space, a method suitable for complex indoor semantic queries that considers the computation of indoor spatial relations is provided. This paper designs an ontology model describing the space-related information of humans, events, and indoor space objects (e.g., storey and room) as well as their relations, to meet the needs of indoor semantic query. The ontology concepts are used in the IndoorSPARQL query language, which extends the SPARQL syntax for representing and querying indoor space. Four types of primitives specific to indoor query, "Adjacent", "Opposite", "Vertical" and "Contain", are defined as query functions in IndoorSPARQL to support quantitative spatial computations. A method is also proposed for analysing the query language. Finally, this paper adopts this method to realize indoor semantic queries over the study area by constructing the ontology model for the study building. The experimental results show that the proposed method can effectively support complex indoor space semantic queries.

  16. Bin-Hash Indexing: A Parallel Method for Fast Query Processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bethel, Edward W; Gosink, Luke J.; Wu, Kesheng

    2008-06-27

    This paper presents a new parallel indexing data structure for answering queries. The index, called Bin-Hash, offers extremely high levels of concurrency, and is therefore well-suited for emerging commodity parallel processors, such as multi-cores, cell processors, and general-purpose graphics processing units (GPUs). The Bin-Hash approach first bins the base data, and then partitions and separately stores the values in each bin as a perfect spatial hash table. To answer a query, we first determine whether or not a record satisfies the query conditions based on the bin boundaries. For the bins with records that cannot be resolved, we examine the spatial hash tables. The procedures for examining the bin numbers and the spatial hash tables offer the maximum possible level of concurrency; all records are able to be evaluated by our procedure independently in parallel. Additionally, our Bin-Hash procedures access much smaller amounts of data than similar parallel methods, such as the projection index. This smaller data footprint is critical for certain parallel processors, like GPUs, where memory resources are limited. To demonstrate the effectiveness of Bin-Hash, we implement it on a GPU using the data-parallel programming language CUDA. The concurrency offered by the Bin-Hash index allows us to fully utilize the GPU's massive parallelism in our work; over 12,000 records can be simultaneously evaluated at any one time. We show that our new query processing method is an order of magnitude faster than current state-of-the-art CPU-based indexing technologies. Additionally, we compare our performance to existing GPU-based projection index strategies.
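
    The two-phase test (decide by bin boundaries first, examine stored values only for boundary bins) can be sketched in NumPy; this CPU sketch stands in for the paper's GPU-resident perfect spatial hash tables.

      import numpy as np

      def build_index(values, edges):
          """Bin every record; keep per-bin row ids for candidate checks.
          edges: sorted bin boundaries covering the value range."""
          bin_ids = np.searchsorted(edges, values, side="right") - 1
          return {b: np.flatnonzero(bin_ids == b) for b in np.unique(bin_ids)}

      def range_query(values, edges, tables, lo, hi):
          """Row ids with lo <= value < hi. Bins strictly inside the range are
          hits outright; only the two boundary bins need their values examined."""
          first, last = np.searchsorted(edges, [lo, hi], side="right") - 1
          hits = []
          for b, rows in tables.items():
              if first < b < last:                  # bin fully inside the range
                  hits.append(rows)
              elif b == first or b == last:         # boundary bin: resolve by value
                  v = values[rows]
                  hits.append(rows[(v >= lo) & (v < hi)])
          return np.concatenate(hits) if hits else np.array([], dtype=int)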

  17. Visually defining and querying consistent multi-granular clinical temporal abstractions.

    PubMed

    Combi, Carlo; Oliboni, Barbara

    2012-02-01

    The main goal of this work is to propose a framework for the visual specification and query of consistent multi-granular clinical temporal abstractions. We focus on the issue of querying patient clinical information by visually defining and composing temporal abstractions, i.e., high-level patterns derived from several time-stamped raw data. In particular, we focus on the visual specification of consistent temporal abstractions with different granularities and on the visual composition of different temporal abstractions for querying clinical databases. Temporal abstractions on clinical data provide a concise and high-level description of temporal raw data, and a suitable way to support decision making. Granularities define partitions on the time line and allow one to represent time and, thus, temporal clinical information at different levels of detail, according to the requirements coming from the represented clinical domain. The visual representation of temporal information has been studied for several years in clinical domains. Proposed visualization techniques must be easy and quick to understand, and could benefit from visual metaphors that do not lead to ambiguous interpretations. Recently, physical metaphors such as strips, springs, weights, and wires have been proposed and evaluated on clinical users for the specification of temporal clinical abstractions. Visual approaches to boolean queries have been considered in recent years and have confirmed that visual support for the specification of complex boolean queries is both an important and a difficult research topic. We propose and describe a visual language for the definition of temporal abstractions based on a set of intuitive metaphors (striped wall, plastered wall, brick wall), allowing the clinician to use different granularities. A new algorithm, underlying the visual language, allows the physician to specify only consistent abstractions, i.e., abstractions not containing contradictory conditions on the component abstractions. Moreover, we propose a visual query language where different temporal abstractions can be composed to build complex queries: temporal abstractions are visually connected through the usual logical connectives AND, OR, and NOT. The proposed visual language allows one to simply define temporal abstractions by using intuitive metaphors, and to specify temporal intervals related to abstractions by using different temporal granularities. The physician can interact with the designed and implemented tool through point-and-click selections, and can visually compose queries involving several temporal abstractions. The evaluation of the proposed granularity-related metaphors consisted of two parts: (i) solving 30 interpretation exercises by choosing the correct interpretation of a given screenshot representing a possible scenario, and (ii) solving a complex exercise by visually specifying through the interface a scenario described only in natural language. The exercises were done by 13 subjects. The percentages of correct answers to the interpretation exercises differed slightly across the considered metaphors (54.4%--striped wall, 73.3%--plastered wall, 61%--brick wall, and 61%--no wall), but post hoc statistical analysis on the means confirmed that the differences were not statistically significant. The results of the user-satisfaction questionnaire on the proposed granularity-related metaphors confirmed that there was no preference for any one of them.
The evaluation of the proposed logical notation consisted of two parts: (i) solving five interpretation exercises, each providing a screenshot representing a possible scenario and three different possible interpretations, of which only one was correct, and (ii) solving five exercises by visually defining through the interface a scenario described only in natural language. The exercises had increasing difficulty. The evaluation involved a total of 31 subjects. Results from this evaluation phase confirmed the soundness of the proposed solution, even in comparison with a well-known proposal based on a tabular query form (the only significant difference being that our proposal requires more time for the training phase: 21 min versus 14 min). In this work we have considered the issue of visually composing and querying temporal clinical patient data. In this context we have proposed a visual framework for the specification of consistent temporal abstractions with different granularities and for the visual composition of different temporal abstractions to build (possibly) complex queries on clinical databases. A new algorithm has been proposed to check the consistency of the specified granular abstractions. From the evaluation of the proposed metaphors and interfaces, and from the comparison of the visual query language with a well-known visual method for boolean queries, the soundness of the overall system has been confirmed; moreover, pros and cons and possible improvements emerged from the comparison of the different visual metaphors and solutions. Copyright © 2011 Elsevier B.V. All rights reserved.
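
    The AND/OR/NOT composition the interface supports can be mirrored directly in code. In this hedged sketch, temporal abstractions are simply predicates over a patient record; the abstraction definitions are invented for illustration.

      # Hypothetical temporal abstractions as predicates over a patient record.
      def high_glucose(rec):                  # any glucose reading above 180
          return any(v > 180 for _t, v in rec["glucose"])

      def on_insulin(rec):                    # at least one insulin administration
          return bool(rec["insulin"])

      # Boolean composition mirroring the visual AND / OR / NOT connectives.
      def AND(p, q): return lambda rec: p(rec) and q(rec)
      def OR(p, q):  return lambda rec: p(rec) or q(rec)
      def NOT(p):    return lambda rec: not p(rec)

      query = AND(high_glucose, NOT(on_insulin))
      rec = {"glucose": [("08:00", 195), ("12:00", 150)], "insulin": []}
      print(query(rec))                       # True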

  18. Array Databases: Agile Analytics (not just) for the Earth Sciences

    NASA Astrophysics Data System (ADS)

    Baumann, P.; Misev, D.

    2015-12-01

    Gridded data, such as images, image timeseries, and climate datacubes, today are managed separately from the metadata, and with different, restricted retrieval capabilities. While databases are good at metadata modelled in tables, XML hierarchies, or RDF graphs, they traditionally do not support multi-dimensional arrays. This gap is being closed by Array Databases, pioneered by the scalable rasdaman ("raster data manager") array engine. Its declarative query language, rasql, extends SQL with array operators which are optimized and parallelized on the server side. Installations can easily be mashed up securely, thereby enabling large-scale location-transparent query processing in federations. Domain experts value the integration with their commonly used tools, leading to a quick learning curve. Earth, space, and life sciences, but also social sciences and business, have massive amounts of data and complex analysis challenges that rasdaman addresses. As of today, rasdaman is mature and in operational use on hundreds of Terabytes of timeseries datacubes, with transparent query distribution across more than 1,000 nodes. Additionally, its concepts have shaped international Big Data standards in the field, including the forthcoming array extension to ISO SQL, many of which are meanwhile supported by both open-source and commercial systems. In the geo field, rasdaman is the reference implementation for the Open Geospatial Consortium (OGC) Big Data standard, WCS, now also under adoption by ISO. Further, rasdaman is in the final stage of OSGeo incubation. In this contribution we present array queries à la rasdaman, describe the architecture and the novel optimization and parallelization techniques introduced in 2015, and put this in the context of the intercontinental EarthServer initiative, which utilizes rasdaman to enable agile analytics on Petascale datacubes.

  19. A high performance, ad-hoc, fuzzy query processing system for relational databases

    NASA Technical Reports Server (NTRS)

    Mansfield, William H., Jr.; Fleischman, Robert M.

    1992-01-01

    Database queries involving imprecise or fuzzy predicates are currently an evolving area of academic and industrial research. Such queries place severe stress on the indexing and I/O subsystems of conventional database environments since they involve the search of large numbers of records. The Datacycle architecture and research prototype is a database environment that uses filtering technology to perform an efficient, exhaustive search of an entire database. It has recently been modified to include fuzzy predicates in its query processing. The approach obviates the need for complex index structures, provides unlimited query throughput, permits the use of ad-hoc fuzzy membership functions, and provides a deterministic response time largely independent of query complexity and load. This paper describes the Datacycle prototype implementation of fuzzy queries and some recent performance results.

  20. Auspice: Automatic Service Planning in Cloud/Grid Environments

    NASA Astrophysics Data System (ADS)

    Chiu, David; Agrawal, Gagan

    Recent scientific advances have fostered a mounting number of services and data sets available for utilization. These resources, though scattered across disparate locations, are often loosely coupled both semantically and operationally. This loosely coupled relationship implies the possibility of linking together operations and data sets to answer queries. This task, generally known as automatic service composition, therefore abstracts the process of complex scientific workflow planning from the user. We have been exploring a metadata-driven approach toward automatic service workflow composition, among other enabling mechanisms, in our system, Auspice: Automatic Service Planning in Cloud/Grid Environments. In this paper, we present a complete overview of our system's unique features and outlooks for future deployment as the Cloud computing paradigm becomes increasingly prominent in enabling scientific computing.

  1. The Internet School of Medicine: use of electronic resources by medical trainees and the reliability of those resources.

    PubMed

    Egle, Jonathan P; Smeenge, David M; Kassem, Kamal M; Mittal, Vijay K

    2015-01-01

    Electronic sources of medical information are plentiful, and numerous studies have demonstrated the use of the Internet by patients and the variable reliability of these sources. Studies have investigated neither the use of web-based resources by residents, nor the reliability of the information available on these websites. A web-based survey was distributed to surgical residents in Michigan and third- and fourth-year medical students at an American allopathic and osteopathic medical school and a Caribbean allopathic school regarding their preferred sources of medical information in various situations. A set of 254 queries simulating those faced by medical trainees on rounds, on a written examination, or during patient care was developed. The top 5 electronic resources cited by the trainees were evaluated for their ability to answer these questions accurately, using standard textbooks as the point of reference. The respondents reported a wide variety of overall preferred resources. Most of the 73 responding medical trainees favored textbooks or board review books for prolonged studying, but electronic resources are frequently used for quick studying, clinical decision-making questions, and medication queries. The most commonly used electronic resources were UpToDate, Google, Medscape, Wikipedia, and Epocrates. UpToDate and Epocrates had the highest percentage of correct answers (47%) and Wikipedia had the lowest (26%). Epocrates also had the highest percentage of wrong answers (30%), whereas Google had the lowest percentage (18%). All resources had a significant number of questions that they were unable to answer. Though hardcopy books have not been completely replaced by electronic resources, more than half of medical students and nearly half of residents prefer web-based sources of information. For quick questions and studying, both groups prefer Internet sources. However, the most commonly used electronic resources fail to answer clinical queries more than half of the time and have an alarmingly high rate of inaccurate information. Copyright © 2014 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.

  2. State & Society: Presidential Candidates Answer Queries on Science Policy

    ERIC Educational Resources Information Center

    Physics Today, 1976

    1976-01-01

    Presents views of Gerald Ford and Jimmy Carter on the role of science advisors in the Executive Office of the President, national energy needs and the nuclear power program, and federal support for basic and applied science. (MLH)

  3. Reactome graph database: Efficient access to complex pathway data

    PubMed Central

    Fabregat, Antonio; Korninger, Florian; Viteri, Guilherme; Sidiropoulos, Konstantinos; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D’Eustachio, Peter; Hermjakob, Henning

    2018-01-01

    Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types. PMID:29377902

  5. Analysis of Information Needs of Users of MEDLINEplus, 2002 – 2003

    PubMed Central

    Scott-Wright, Alicia; Crowell, Jon; Zeng, Qing; Bates, David W.; Greenes, Robert

    2006-01-01

    We analyzed query logs from use of MEDLINEplus to answer the questions: Are consumers’ health information needs stable over time? and To what extent do users’ queries change over time? To determine log stability, we assessed an Overlap Rate (OR) defined as the number of unique queries common to two adjacent months divided by the total number of unique queries in those months. All exactly matching queries were considered as one unique query. We measured ORs for the top 10 and 100 unique queries of a month and compared these to ORs for the following month. Over ten months, users submitted 12,234,737 queries; only 2,179,571 (17.8%) were unique, and these had a mean word count of 2.73 (S.D., 0.24); 121 of the 137 (88.3%) unique queries whose exactly matching search term(s) were used at least 5,000 times consisted of only one word. We could predict with 95% confidence that the monthly OR for the top 100 unique queries would lie between 67% and 87% when compared with the top 100 from the previous month. The mean month-to-month OR for the top 10 queries was 62% (S.D., 20%), indicating significant variability; the lowest OR, 33% between the top 10 in March and April, was likely due to “new” interest in information about SARS pneumonia in April 2003. Consumers’ health information needs are relatively stable, and the 100 most common unique queries are about 77% the same from month to month. Website sponsors should provide a broad range of information about a relatively stable number of topics. Analyses of log similarity may identify media-induced, cyclical, or seasonal changes in areas of consumer interest. PMID:17238431
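
    Reading "the total number of unique queries in those months" as the union of the two months' unique-query sets (the reading consistent with the reported 67%-87% range), the OR computation is a one-liner:

      def overlap_rate(month_a, month_b):
          """OR between two adjacent months' query lists:
          |common unique queries| / |union of unique queries|."""
          a, b = set(month_a), set(month_b)
          return len(a & b) / len(a | b)

      # Two of the four distinct queries are shared: OR = 2 / 4 = 0.5.
      print(overlap_rate(["flu", "sars", "diabetes"], ["flu", "sars", "cancer"]))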

  6. Generalized query-based active learning to identify differentially methylated regions in DNA.

    PubMed

    Haque, Md Muksitul; Holder, Lawrence B; Skinner, Michael K; Cook, Diane J

    2013-01-01

    Active learning is a supervised learning technique that reduces the number of examples required for building a successful classifier, because it can choose the data it learns from. This technique holds promise for many biological domains in which classified examples are expensive and time-consuming to obtain. Most traditional active learning methods ask very specific queries of the Oracle (e.g., a human expert) to label an unlabeled example. The example may consist of numerous features, many of which are irrelevant. Removing such features will create a shorter query with only relevant features, which will be easier for the Oracle to answer. We propose a generalized query-based active learning (GQAL) approach that constructs generalized queries based on multiple instances. By constructing appropriately generalized queries, we can achieve higher accuracy compared to traditional active learning methods. We apply our active learning method to find differentially methylated DNA regions (DMRs). DMRs are DNA locations in the genome that are known to be involved in tissue differentiation, epigenetic regulation, and disease. We also apply our method to 13 other data sets and show that it outperforms another popular active learning technique.

  7. Concept locator: a client-server application for retrieval of UMLS metathesaurus concepts through complex boolean query.

    PubMed

    Nadkarni, P M

    1997-08-01

    Concept Locator (CL) is a client-server application that accesses a Sybase relational database server containing a subset of the UMLS Metathesaurus for the purpose of retrieval of concepts corresponding to one or more query expressions supplied to it. CL's query grammar permits complex Boolean expressions, wildcard patterns, and parenthesized (nested) subexpressions. CL translates the query expressions supplied to it into one or more SQL statements that actually perform the retrieval. The generated SQL is optimized by the client to take advantage of the strengths of the server's query optimizer, and sidesteps its weaknesses, so that execution is reasonably efficient.
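
    A rough sketch of the translation step in Python (the table and column names are hypothetical, and CL's real SQL generation is tuned to Sybase's optimizer and handles wildcards, negation, and nesting beyond what is shown):

      def to_sql(expr):
          """Translate a tiny boolean query AST into SQL.
          expr is a wildcard pattern string, or a tuple ('AND'|'OR', left, right)."""
          if isinstance(expr, str):             # leaf: wildcard match on concept strings
              return ("SELECT concept_id FROM concept_strings "
                      f"WHERE string LIKE '{expr}'")
          op, left, right = expr
          setop = "INTERSECT" if op == "AND" else "UNION"
          return f"({to_sql(left)}) {setop} ({to_sql(right)})"

      print(to_sql(("AND", "heart%", ("OR", "attack%", "infarct%"))))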

  8. Analysis of queries sent to PubMed at the point of care: Observation of search behaviour in a medical teaching hospital

    PubMed Central

    Hoogendam, Arjen; Stalenhoef, Anton FH; Robbé, Pieter F de Vries; Overbeke, A John PM

    2008-01-01

    Background The use of PubMed to answer daily medical care questions is limited because it is challenging to retrieve a small set of relevant articles and time is restricted. Knowing what aspects of queries are likely to retrieve relevant articles can increase the effectiveness of PubMed searches. The objectives of our study were to identify queries that are likely to retrieve relevant articles by relating PubMed search techniques and tools to the number of articles retrieved and the selection of articles for further reading. Methods This was a prospective observational study of queries regarding patient-related problems sent to PubMed by residents and internists in internal medicine working in an Academic Medical Centre. We analyzed queries, search results, query tools (MeSH, Limits, wildcards, operators), and selection of abstracts and full text for further reading, using a portal that mimics PubMed. Results PubMed was used to solve 1121 patient-related problems, resulting in 3205 distinct queries. Abstracts were viewed in 999 (31%) of these queries, and in 126 (39%) of 321 queries using query tools. The average term count per query was 2.5. Abstracts were selected in more than 40% of queries using four or five terms, increasing to 63% if the use of four or five terms yielded 2–161 articles. Conclusion Queries sent to PubMed by physicians at our hospital during daily medical care contain fewer than three terms. Queries using four to five terms, retrieving fewer than 161 article titles, are most likely to result in abstract viewing. PubMed search tools are used infrequently by our population and are less effective than the use of four or five terms. Methods to facilitate the formulation of precise queries, using more relevant terms, should be the focus of education and research. PMID:18816391

  9. Computerized Monitoring of the Inventory and Distribution of Research Chemicals

    ERIC Educational Resources Information Center

    Frycki, Stephen J.; And Others

    1973-01-01

    A one-time data-entry system, coupled with efficient use of the computer, is described; it provides inventory management, distribution and audit reporting, the ability to answer special queries, and customized reports. (3 references) (Author)

  10. An advanced web query interface for biological databases

    PubMed Central

    Latendresse, Mario; Karp, Peter D.

    2010-01-01

    Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building queries for biological DBs that are much more precise than those most web-based query forms can construct, yet that is user-friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types, without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715
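
    List comprehension is a natural query metaphor precisely because object types are connected through shared attributes rather than explicit joins. A Python analogy with toy data (not BioVelo syntax):

      # Toy records standing in for two DB object types.
      genes = [{"id": "g1", "name": "trpA", "pathway": "p1"},
               {"id": "g2", "name": "lacZ", "pathway": None}]
      pathways = [{"id": "p1", "name": "tryptophan biosynthesis"}]

      # "Names of genes and the pathways they belong to" -- no JOIN keyword needed.
      result = [(g["name"], p["name"])
                for g in genes
                for p in pathways
                if g["pathway"] == p["id"]]
      print(result)   # [('trpA', 'tryptophan biosynthesis')]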

  11. Quantum Private Queries

    NASA Astrophysics Data System (ADS)

    Giovannetti, Vittorio; Lloyd, Seth; Maccone, Lorenzo

    2008-06-01

    We propose a cheat sensitive quantum protocol to perform a private search on a classical database which is efficient in terms of communication complexity. It allows a user to retrieve an item from the database provider without revealing which item he or she retrieved: if the provider tries to obtain information on the query, the person querying the database can find it out. The protocol ensures also perfect data privacy of the database: the information that the user can retrieve in a single query is bounded and does not depend on the size of the database. With respect to the known (quantum and classical) strategies for private information retrieval, our protocol displays an exponential reduction in communication complexity and in running-time computational complexity.

  12. Building a Smart Portal for Astronomy

    NASA Astrophysics Data System (ADS)

    Derriere, S.; Boch, T.

    2011-07-01

    The development of a portal for accessing astronomical resources is not an easy task. The ever-increasing complexity of the data products can result in very complex user interfaces, requiring a lot of effort and learning from the user in order to perform searches. This is often a design choice, where the user must explicitly set many constraints, while the portal search logic remains simple. We investigated a different approach, where the query interface is kept as simple as possible (ideally, a simple text field, like for Google search), and the search logic is made much more complex to interpret the query in a relevant manner. We will present the implications of this approach in terms of interpretation and categorization of the query parameters (related to astronomical vocabularies), translation (mapping) of these concepts into the portal components metadata, identification of query schemes and use cases matching the input parameters, and delivery of query results to the user.

  13. Robust QKD-based private database queries based on alternative sequences of single-qubit measurements

    NASA Astrophysics Data System (ADS)

    Yang, YuGuang; Liu, ZhiChao; Chen, XiuBo; Zhou, YiHua; Shi, WeiMin

    2017-12-01

    In existing QKD-based quantum private query (QPQ) protocols, quantum channel noise may cause the user to obtain a wrong answer and thus misunderstand the database holder. In addition, an outside attacker may conceal his attack by exploiting the channel noise. We propose a new, robust QPQ protocol based on four-qubit decoherence-free (DF) states. In contrast to existing QPQ protocols against channel noise, only an alternative fixed sequence of single-qubit measurements is needed by the user (Alice) to measure the received DF states. This property makes it easy to implement the proposed protocol with current technologies. Moreover, to retain the advantage of flexible database queries, we reconstruct Alice's measurement operators so that Alice needs only conditioned sequences of single-qubit measurements.

  14. Deductive Coordination of Multiple Geospatial Knowledge Sources

    NASA Astrophysics Data System (ADS)

    Waldinger, R.; Reddy, M.; Culy, C.; Hobbs, J.; Jarvis, P.; Dungan, J. L.

    2002-12-01

    Deductive inference is applied to choreograph the cooperation of multiple knowledge sources to respond to geospatial queries. When no one source can provide an answer, the response may be deduced from pieces of the answer provided by many sources. Examples of sources include (1) The Alexandria Digital Library Gazetteer, a repository that gives the locations for almost six million place names, (2) The CIA World Factbook, an online almanac with basic information about more than 200 countries, (3) The SRI TerraVision 3D Terrain Visualization System, which provides a flight-simulator-like interactive display of geographic data held in a database, (4) The NASA GDACC WebGIS client for searching satellite and other geographic data available through OpenGIS Consortium (OGC) Web Map Servers, and (5) The Northern Arizona University Latitude/Longitude Distance Calculator. Queries are phrased in English and are translated into logical theorems by the Gemini Natural Language Parser. The theorems are proved by SNARK, a first-order-logic theorem prover, in the context of an axiomatic geospatial theory. The theory embodies a representational scheme that takes into account the fact that the same place may have many names, and the same name may refer to many places. SNARK has built-in procedures (RCC8 and the Allen calculus, respectively) for reasoning about spatial and temporal concepts. External knowledge sources may be consulted by SNARK as the proof is in progress, so that most knowledge need not be stored axiomatically. The Open Agent Architecture (OAA) facilitates communication between sources that may be implemented on different machines in different computer languages. An answer to the query, in the form of text or an image, is extracted from the proof. Currently, three-dimensional images are displayed by TerraVision, but other displays are possible. The combined system is called Geo-Logica. Some example queries that can be handled by Geo-Logica include: (1) show the petrified forests in Oregon north of Portland, (2) show the lake in Argentina with the highest elevation, and (3) show the IGBP land cover classification, derived using MODIS, of Montana for July 2000. Use of a theorem prover allows sources to cooperate even if they adopt different notational conventions and representation schemes and were never designed to work together. New sources can be added without reprogramming the system, by providing axioms that advertise their capabilities. Future directions include entering into a dialogue with the user to clarify ambiguities, elaborate on previous questions, or provide new information necessary to answer the question. In addition, of particular interest is dealing with temporally varying data, with answers displayed as animated images.

  15. A common layer of interoperability for biomedical ontologies based on OWL EL.

    PubMed

    Hoehndorf, Robert; Dumontier, Michel; Oellrich, Anika; Wimalaratne, Sarala; Rebholz-Schuhmann, Dietrich; Schofield, Paul; Gkoutos, Georgios V

    2011-04-01

    Ontologies are essential in biomedical research due to their ability to semantically integrate content from different scientific databases and resources. Their application improves capabilities for querying and mining biological knowledge. An increasing number of ontologies is being developed for this purpose, and considerable effort is invested into formally defining them in order to represent their semantics explicitly. However, current biomedical ontologies do not yet facilitate data integration and interoperability, since reasoning over them is very complex and cannot be performed efficiently, or is outright impossible. We propose the use of less expressive subsets of ontology representation languages to enable efficient reasoning and to achieve the goal of genuine interoperability between ontologies. We present and evaluate EL Vira, a framework that transforms OWL ontologies into the OWL EL subset, thereby enabling the use of tractable reasoning. We illustrate which OWL constructs and inferences are kept and lost following the conversion and demonstrate the performance gain in reasoning, indicated by a significant reduction in processing time. We applied EL Vira to the open biomedical ontologies and provide a repository of ontologies resulting from this conversion. EL Vira creates a common layer of ontological interoperability that, for the first time, enables the creation of software solutions that can employ biomedical ontologies to perform inferences and answer complex queries to support scientific analyses. The EL Vira software is available from http://el-vira.googlecode.com and converted OBO ontologies and their mappings are available from http://bioonto.gen.cam.ac.uk/el-ont.
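
    The toy sketch below illustrates the kind of filtering such a conversion performs: class expressions built only from conjunction and existential restriction (core OWL EL constructs) are kept, while axioms using disjunction, universal restriction, or negation fall outside the profile and are dropped. The miniature expression syntax is invented for illustration; EL Vira itself operates on real OWL files.

```python
# Conceptual sketch of restricting axioms to an EL-like profile. Class
# expressions are atomic strings or tuples like ("and", C, D) and
# ("some", role, C); this tiny AST is illustrative, not real OWL.

EL_ALLOWED = {"and", "some"}  # conjunction and existential restriction

def is_el(expr):
    """True if a class expression uses only EL constructs."""
    if isinstance(expr, str):
        return True
    op, *args = expr
    return op in EL_ALLOWED and all(is_el(a) for a in args)

def to_el(axioms):
    """Keep only subclass axioms whose sides both stay inside the profile."""
    return [(sub, sup) for sub, sup in axioms if is_el(sub) and is_el(sup)]

axioms = [
    ("Neuron", ("and", "Cell", ("some", "partOf", "NervousSystem"))),  # kept
    ("Cell", ("or", "Neuron", "Glia")),       # disjunction: dropped
    ("Axon", ("only", "partOf", "Neuron")),   # universal restriction: dropped
]
print(to_el(axioms))
```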

  16. BIOSPIDA: A Relational Database Translator for NCBI.

    PubMed

    Hagen, Matthew S; Lee, Eva K

    2010-11-13

    As the volume and availability of biological databases continue their widespread growth, it has become increasingly difficult for research scientists to identify all relevant information for biological entities of interest. Details of nucleotide sequences, gene expression, molecular interactions, and three-dimensional structures are maintained across many different databases. Retrieving all necessary information requires an integrated system that can query multiple databases with minimized overhead. This paper introduces a universal parser and relational schema translator that can be utilized for all NCBI databases in Abstract Syntax Notation (ASN.1). The data models for OMIM, Entrez-Gene, PubMed, MMDB and GenBank have been successfully converted into relational databases, and all are easily linkable, helping to answer complex biological questions. These tools enable research scientists to integrate NCBI databases locally without significant workload or development time.

  17. Geometric Representations of Condition Queries on Three-Dimensional Vector Fields

    NASA Technical Reports Server (NTRS)

    Henze, Chris

    1999-01-01

    Condition queries on distributed data ask where particular conditions are satisfied. It is possible to represent condition queries as geometric objects by plotting field data in various spaces derived from the data, and by selecting loci within these derived spaces which signify the desired conditions. Rather simple geometric partitions of derived spaces can represent complex condition queries, because much of the complexity can be encapsulated in the derived-space mapping itself. A geometric view of condition queries provides a useful conceptual unification, allowing one to intuitively understand many existing vector field feature detection algorithms -- and to design new ones -- as variations on a common theme. A geometric representation of condition queries also provides a simple and coherent basis for computer implementation, reducing a wide variety of existing and potential vector field feature detection techniques to a few simple geometric operations.
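
    A hedged illustration of the idea in Python/NumPy: a synthetic 3-D vector field is mapped into a derived (speed, vertical-component) space, and a simple geometric partition of that space (a half-plane) expresses the condition query. All field values and thresholds are arbitrary.

```python
import numpy as np

# A condition query as a geometric partition of a derived space: map each
# grid point of a synthetic 3-D vector field to (speed, vertical component),
# then select the locus where a half-plane condition holds.

rng = np.random.default_rng(0)
u, v, w = rng.standard_normal((3, 16, 16, 16))

# Derived-space mapping: each grid point -> its speed.
speed = np.sqrt(u**2 + v**2 + w**2)

# Geometric partition of the derived (speed, w) space: speed > 2 and w > 0.
mask = (speed > 2.0) & (w > 0.0)

# The answer to the condition query: grid locations satisfying the condition.
locations = np.argwhere(mask)
print(f"{len(locations)} of {mask.size} grid points satisfy the condition")
```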

  18. NeuroLex.org: an online framework for neuroscience knowledge

    PubMed Central

    Larson, Stephen D.; Martone, Maryann E.

    2013-01-01

    The ability to transmit, organize, and query information digitally has brought with it the challenge of how to best use this power to facilitate scientific inquiry. Today, few information systems are able to provide detailed answers to complex questions about neuroscience that account for multiple spatial scales, and which cross the boundaries of diverse parts of the nervous system such as molecules, cellular parts, cells, circuits, systems and tissues. As a result, investigators still primarily seek answers to their questions in an increasingly densely populated collection of articles in the literature, each of which must be digested individually. If it were easier to search a knowledge base that was structured to answer neuroscience questions, such a system would enable questions to be answered in seconds that would otherwise require hours of literature review. In this article, we describe NeuroLex.org, a wiki-based website and knowledge management system. Its goal is to bring neurobiological knowledge into a framework that allows neuroscientists to review the concepts of neuroscience, with an emphasis on multiscale descriptions of the parts of nervous systems, aggregate their understanding with that of other scientists, link them to data sources and descriptions of important concepts in neuroscience, and expose parts that are still controversial or missing. To date, the site is tracking ~25,000 unique neuroanatomical parts and concepts in neurobiology spanning experimental techniques, behavioral paradigms, anatomical nomenclature, genes, proteins and molecules. Here we show how the structuring of information about these anatomical parts in the nervous system can be reused to answer multiple neuroscience questions, such as displaying all known GABAergic neurons aggregated in NeuroLex or displaying all brain regions that are known within NeuroLex to send axons into the cerebellar cortex. PMID:24009581

  19. Fast Inbound Top-K Query for Random Walk with Restart.

    PubMed

    Zhang, Chao; Jiang, Shan; Chen, Yucheng; Sun, Yidan; Han, Jiawei

    2015-09-01

    Random walk with restart (RWR) is widely recognized as one of the most important node proximity measures for graphs, as it captures the holistic graph structure and is robust to noise in the graph. In this paper, we study a novel query based on the RWR measure, called the inbound top-k (Ink) query. Given a query node q and a number k, the Ink query aims at retrieving the k nodes in the graph that have the largest weighted RWR scores to q. Ink queries can be highly useful for various applications such as traffic scheduling, disease treatment, and targeted advertising. Nevertheless, none of the existing RWR computation techniques can accurately and efficiently process the Ink query in large graphs. We propose two algorithms, namely Squeeze and Ripple, both of which can accurately answer the Ink query in a fast and incremental manner. To identify the top-k nodes, Squeeze iteratively performs matrix-vector multiplication and estimates the lower and upper bounds for all the nodes in the graph. Ripple employs a more aggressive strategy by only estimating the RWR scores for the nodes falling in the vicinity of q; the nodes outside the vicinity need not be evaluated, because their RWR scores are propagated from the boundary of the vicinity and are thus upper bounded. Ripple incrementally expands the vicinity until the top-k result set can be obtained. Our extensive experiments on real-life graph data sets show that Ink queries can retrieve interesting results, and that the proposed algorithms are orders of magnitude faster than the state-of-the-art method.
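
    For orientation, here is the naive baseline that Squeeze and Ripple are designed to outperform, sketched in Python/NumPy: compute the RWR vector from every node by power iteration and rank nodes by their score at q. The random graph, restart probability, and iteration count are illustrative choices only.

```python
import numpy as np

# Naive baseline for the inbound top-k (Ink) query: run RWR from every
# start node and rank nodes by their score at q. This is the brute force
# the paper's algorithms avoid.

def rwr(P, s, c=0.15, iters=100):
    """RWR scores from start node s; P is a column-stochastic matrix."""
    n = P.shape[0]
    e = np.zeros(n)
    e[s] = 1.0
    r = e.copy()
    for _ in range(iters):
        r = (1 - c) * (P @ r) + c * e
    return r

rng = np.random.default_rng(0)
A = (rng.random((50, 50)) < 0.1).astype(float)
P = A / np.maximum(A.sum(axis=0), 1.0)  # column-normalize the adjacency

q, k = 7, 5
scores = np.array([rwr(P, s)[q] for s in range(A.shape[0])])  # RWR to q
topk = np.argsort(-scores)[:k]
print("inbound top-k nodes for q =", q, ":", topk)
```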

  20. Relatives' and friends' queries on a psychologists' "question and answer" forum online - authorship and contents.

    PubMed

    Stjernswärd, Sigrid

    2015-03-01

    Families living with mental illness can experience an added burden and, eventually, ill health of their own. National and international guidelines support the development of web-based solutions to cost-effectively address health care needs. Mapping patterns of internet use may help tailor interventions to effectively address users' needs. The study's aim was to explore the contents of relatives' and friends' queries on a psychologists' website offering professional feedback. All visible questions [n = 59] classified under the website's "helping a relative/friend" category between 2009-06-15 and 2013-09-27 were printed out and analyzed using content analysis. The analysis resulted in four categories and subcategories illuminating families'/friends' areas of concern: support to help; concerns with health care; young children's welfare; and repercussions on own health. "Question & Answer" [Q&A] forums can shed light on health seekers' online behavior and may be a way of addressing families' needs for support and information. Further studies are needed to assess the replies' therapeutic value for their recipients.

  1. Copyright information queries in the health sciences: trends and implications from the Ohio State University

    PubMed Central

    Gilliland, Anne T.; Bradigan, Pamela S.

    2014-01-01

    Objective: This paper presents the results of data gathered on copyright questions asked at an academic health sciences library. Methods: Collected data include questioner's status or discipline, the subject of the questions, the types of activities that the questioners were engaged in, the communication mode, and the length of time it took to answer the questions. Results: Overall results showed most questions were about permissions. Staff asked the most questions, followed by faculty and students. Conclusions: Copyright education is needed at universities, and further analysis of queries will determine the direction of the education. PMID:24860269

  2. My Corporis Fabrica: an ontology-based tool for reasoning and querying on complex anatomical models

    PubMed Central

    2014-01-01

    Background Multiple models of anatomy have been developed independently and for different purposes. In particular, 3D graphical models are especially useful for visualizing the different organs composing the human body, while ontologies such as FMA (Foundational Model of Anatomy) are symbolic models that provide a unified formal description of anatomy. Despite its comprehensive coverage of anatomical structures, the lack of formal descriptions of anatomical functions in FMA limits its usage in many applications. In addition, the absence of a connection between 3D models and anatomical ontologies makes it difficult and time-consuming to set up and access the anatomical content of complex 3D objects. Results First, we provide a new ontology of anatomy called My Corporis Fabrica (MyCF), which conforms to FMA but extends it by making explicit how anatomical structures are composed, how they contribute to functions, and also how they can be related to complex 3D objects. Second, we have equipped MyCF with automatic reasoning capabilities that enable model checking and the answering of complex queries. We illustrate the added value of such a declarative approach for interactive simulation and visualization as well as for teaching applications. Conclusions The novel vision of ontologies that we have developed in this paper enables a declarative assembly of different models to obtain composed models guaranteed to be anatomically valid while capturing the complexity of human anatomy. The main interest of this approach is its declarativity, which makes it possible for domain experts to enrich the knowledge base at any moment through simple editors without having to change the algorithmic machinery. This gives the MyCF software environment the flexibility to process and add semantics on purpose for various applications that incorporate not only symbolic information but also 3D geometric models representing anatomical entities, as well as other symbolic information such as anatomical functions. PMID:24936286

  3. Searching Electronic Health Records for Temporal Patterns in Patient Histories: A Case Study with Microsoft Amalga

    PubMed Central

    Plaisant, Catherine; Lam, Stanley; Shneiderman, Ben; Smith, Mark S.; Roseman, David; Marchand, Greg; Gillam, Michael; Feied, Craig; Handler, Jonathan; Rappaport, Hank

    2008-01-01

    As electronic health records (EHR) become more widespread, they enable clinicians and researchers to pose complex queries that can benefit immediate patient care and deepen understanding of medical treatment and outcomes. However, current query tools make complex temporal queries difficult to pose, and physicians have to rely on computer professionals to specify the queries for them. This paper describes our efforts to develop a novel query tool implemented in a large operational system at the Washington Hospital Center (Microsoft Amalga, formerly known as Azyxxi). We describe our design of the interface to specify temporal patterns and the visual presentation of results, and report on a pilot user study looking for adverse reactions following radiology studies using contrast. PMID:18999158

  4. Chemical-text hybrid search engines.

    PubMed

    Zhou, Yingyao; Zhou, Bin; Jiang, Shumei; King, Frederick J

    2010-01-01

    As the amount of chemical literature increases, it is critical that researchers be enabled to accurately locate documents related to a particular aspect of a given compound. Existing solutions, based on text and chemical search engines alone, suffer from the inclusion of "false negative" and "false positive" results, and cannot accommodate the diverse repertoire of formats currently available for chemical documents. To address these concerns, we developed an approach called Entity-Canonical Keyword Indexing (ECKI), which converts a chemical entity embedded in a data source into its canonical keyword representation prior to being indexed by text search engines. We implemented ECKI using Microsoft Office SharePoint Server Search, and the resultant hybrid search engine not only supported complex mixed chemical and keyword queries but was also applied to both intranet and Internet environments. We envision that the adoption of ECKI will empower researchers to pose more complex search questions than were readily attainable previously and to obtain answers at much improved speed and accuracy.
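
    As a hedged illustration of the canonical-keyword idea, the sketch below uses the open-source RDKit as a stand-in canonicalizer (the paper's pipeline is not described as RDKit-based): each chemical entity is replaced by a canonical keyword before indexing, so different surface forms of the same compound hit the same index term. The name-resolution table is a hypothetical placeholder.

```python
from rdkit import Chem

# ECKI-style canonicalization sketch: resolve an entity to SMILES, then
# index its canonical SMILES as a keyword. Non-chemical tokens pass
# through unchanged (RDKit logs a parse error and returns None for them).

NAME_TO_SMILES = {"ethanol": "CCO"}  # hypothetical name-resolution table

def canonical_keyword(entity):
    smiles = NAME_TO_SMILES.get(entity.lower(), entity)
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return entity                        # not a chemical: index as-is
    return "CHEM_" + Chem.MolToSmiles(mol)   # canonical SMILES keyword

index_terms = [canonical_keyword(t) for t in ["ethanol", "OCC", "widget"]]
print(index_terms)  # "ethanol" and "OCC" map to the same CHEM_CCO term
```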

  5. An evidential approach to problem solving when a large number of knowledge systems is available

    NASA Technical Reports Server (NTRS)

    Dekorvin, Andre

    1989-01-01

    Some recent problems are no longer formulated in terms of imprecise facts, missing data, or inadequate measuring devices. Instead, questions pertaining to knowledge and information itself arise and can be phrased independently of any particular area of knowledge. The problem considered in the present work is how to model a problem solver that is trying to find the answer to some query. The problem solver has access to a large number of knowledge systems that specialize in diverse features. In this context, a feature is an indicator of what the possibilities for the answer are. The knowledge systems should not be accessed more than once, in order to have truly independent sources of information. Moreover, these systems are allowed to run in parallel. Since access might be expensive, it is necessary to construct a management policy for accessing these knowledge systems. To help with the access policy, some control knowledge systems are available. Control knowledge systems have knowledge about the performance parameters and status of the knowledge systems. To carry out the double goal of deciding which units to access and answering the given query, diverse pieces of evidence must be fused. The Dempster-Shafer theory of evidence is used to pool the knowledge bases.
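
    As a concrete illustration of the fusion step, the sketch below implements Dempster's rule of combination in Python for two mass functions over a common frame of candidate answers; the two example mass assignments are hypothetical.

```python
from itertools import product

# Dempster's rule of combination: fuse evidence from two independently
# accessed knowledge systems. Focal elements are frozensets of candidate
# answers; conflicting mass (empty intersections) is renormalized away.

def combine(m1, m2):
    """Fuse two basic mass assignments with Dempster's rule."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources fully disagree")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}

# Two knowledge systems reporting on candidate answers {A, B, C}.
m1 = {frozenset("A"): 0.6, frozenset("AB"): 0.3, frozenset("ABC"): 0.1}
m2 = {frozenset("B"): 0.5, frozenset("AB"): 0.4, frozenset("ABC"): 0.1}
print(combine(m1, m2))
```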

  6. SEQUOIA: significance enhanced network querying through context-sensitive random walk and minimization of network conductance.

    PubMed

    Jeong, Hyundoo; Yoon, Byung-Jun

    2017-03-14

    Network querying algorithms provide computational means to identify conserved network modules in large-scale biological networks that are similar to known functional modules, such as pathways or molecular complexes. Two main challenges for network querying algorithms are the high computational complexity of detecting potential isomorphism between the query and target graphs and ensuring the biological significance of the query results. In this paper, we propose SEQUOIA, a novel network querying algorithm that effectively addresses these issues by utilizing a context-sensitive random walk (CSRW) model for network comparison and minimizing the network conductance of potential matches in the target network. The CSRW model, inspired by the pair hidden Markov model (pair-HMM) that has been widely used for sequence comparison and alignment, can accurately assess the node-to-node correspondence between different graphs by accounting for node insertions and deletions. The proposed algorithm identifies high-scoring network regions based on the CSRW scores, which are subsequently extended by maximally reducing the network conductance of the identified subnetworks. Performance assessment based on real PPI networks and known molecular complexes shows that SEQUOIA outperforms existing methods and clearly enhances the biological significance of the query results. The source code and datasets can be downloaded from http://www.ece.tamu.edu/~bjyoon/SEQUOIA .

  7. NOUS: Construction and Querying of Dynamic Knowledge Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Choudhury, Sutanay; Agarwal, Khushbu; Purohit, Sumit

    The ability to construct domain-specific knowledge graphs (KG) and perform question-answering or hypothesis generation is a transformative capability. Despite their value, automated construction of knowledge graphs remains an expensive technical challenge that is beyond the reach of most enterprises and academic institutions. We propose an end-to-end framework for developing custom knowledge graph driven analytics for arbitrary application domains. The uniqueness of our system lies in A) its combination of curated KGs along with knowledge extracted from unstructured text, B) support for advanced trending and explanatory questions on a dynamic KG, and C) the ability to answer queries where the answer is embedded across multiple data sources.

  8. Semantic Services in e-Learning: An Argumentation Case Study

    ERIC Educational Resources Information Center

    Moreale, Emanuela; Vargas-Vera, Maria

    2004-01-01

    This paper outlines an e-Learning services architecture offering semantic-based services to students and tutors, in particular ways to browse and obtain information through web services. Services could include registration, authentication, tutoring systems, smart question answering for students' queries, automated marking systems and a student…

  9. Newspapers and Electronic Databases: Present Technology.

    ERIC Educational Resources Information Center

    Newcombe, Barbara; Trivedi, Harish

    1984-01-01

    Discusses technology used to preserve, control, index, and retrieve information in newspapers, highlighting ways to record analyses of news stories, storage/indexing systems based on computers, information as salable commodity, preparation of news for electronic storage, answering in-house queries, questions of copyright and invasion of privacy,…

  10. Guiding Students to Answers: Query Recommendation

    ERIC Educational Resources Information Center

    Yilmazel, Ozgur

    2011-01-01

    This paper reports on a guided navigation system built on the textbook search engine developed at Anadolu University to support distance education students. The search engine uses Turkish Language specific language processing modules to enable searches over course material presented in Open Education Faculty textbooks. We implemented a guided…

  11. Beyond Information Retrieval—Medical Question Answering

    PubMed Central

    Lee, Minsuk; Cimino, James; Zhu, Hai Ran; Sable, Carl; Shanker, Vijay; Ely, John; Yu, Hong

    2006-01-01

    Physicians have many questions when caring for patients, and frequently need to seek answers to their questions. Information retrieval systems (e.g., PubMed) typically return a list of documents in response to a user’s query. Frequently the number of returned documents is large and makes physicians’ information seeking “practical only ‘after hours’ and not in the clinical settings”. Question answering techniques are based on automatically analyzing thousands of electronic documents to generate short-text answers in response to clinical questions posed by physicians. The authors address physicians’ information needs and describe the design, implementation, and evaluation of the medical question answering system MedQA. Although our long-term goal is to enable MedQA to answer all types of medical questions, we currently implement MedQA to integrate information retrieval, extraction, and summarization techniques to automatically generate paragraph-level text for definitional questions (i.e., “What is X?”). MedQA can be accessed at http://www.dbmi.columbia.edu/~yuh9001/research/MedQA.html. PMID:17238385

  12. Comparative study on the customization of natural language interfaces to databases.

    PubMed

    Pazos R, Rodolfo A; Aguirre L, Marco A; González B, Juan J; Martínez F, José A; Pérez O, Joaquín; Verástegui O, Andrés A

    2016-01-01

    In the last decades the popularity of natural language interfaces to databases (NLIDBs) has increased, because in many cases information obtained from them is used for making important business decisions. Unfortunately, the complexity of their customization by database administrators makes them difficult to use. In order for an NLIDB to obtain a high percentage of correctly translated queries, it must be correctly customized for the database to be queried. In most cases the performance reported in the NLIDB literature is the highest possible, i.e., the performance obtained when the interfaces were customized by the implementers. For end users, however, the performance that the interface can yield when customized by someone other than the implementers is more important. Unfortunately, very few articles report NLIDB performance when the NLIDBs are not customized by the implementers. This article presents a semantically-enriched data dictionary (which permits solving many of the problems that occur when translating from natural language to SQL) and an experiment in which two groups of undergraduate students customized our NLIDB and English Language Frontend (ELF), considered one of the best available commercial NLIDBs. The experimental results show that, when customized by the first group, our NLIDB correctly answered 44.69% of queries and ELF 11.83% for the ATIS database, and when customized by the second group, our NLIDB attained 77.05% and ELF 13.48%. The performance attained by our NLIDB when customized by ourselves was 90%.

  13. RDF-GL: A SPARQL-Based Graphical Query Language for RDF

    NASA Astrophysics Data System (ADS)

    Hogenboom, Frederik; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay

    This chapter presents RDF-GL, a graphical query language (GQL) for RDF. The GQL is based on the textual query language SPARQL and mainly focuses on SPARQL SELECT queries. The advantage of a GQL over textual query languages is that complexity is hidden through the use of graphical symbols. RDF-GL is supported by a Java-based editor, SPARQLinG, which is presented as well. The editor not only allows for RDF-GL query creation, but also converts RDF-GL queries to SPARQL queries and can subsequently execute them. Experiments show that using the GQL in combination with the editor makes RDF querying more accessible for end users.

  14. Getting Answers to Natural Language Questions on the Web.

    ERIC Educational Resources Information Center

    Radev, Dragomir R.; Libner, Kelsey; Fan, Weiguo

    2002-01-01

    Describes a study that investigated the use of natural language questions on Web search engines. Highlights include query languages; differences in search engine syntax; and results of logistic regression and analysis of variance that showed aspects of questions that predicted significantly different performances, including the number of words,…

  15. "Just the Answers, Please": Choosing a Web Search Service.

    ERIC Educational Resources Information Center

    Feldman, Susan

    1997-01-01

    Presents guidelines for selecting World Wide Web search engines. Real-life questions were used to test six search engines. Queries sought company information, product reviews, medical information, foreign information, technical reports, and current events. Compares performance and features of AltaVista, Excite, HotBot, Infoseek, Lycos, and Open…

  16. AlzPharm: integration of neurodegeneration data using RDF.

    PubMed

    Lam, Hugo Y K; Marenco, Luis; Clark, Tim; Gao, Yong; Kinoshita, June; Shepherd, Gordon; Miller, Perry; Wu, Elizabeth; Wong, Gwendolyn T; Liu, Nian; Crasto, Chiquito; Morse, Thomas; Stephens, Susie; Cheung, Kei-Hoi

    2007-05-09

    Neuroscientists often need to access a wide range of data sets distributed over the Internet. These data sets, however, are typically neither integrated nor interoperable, resulting in a barrier to answering complex neuroscience research questions. Domain ontologies can enable the querying of heterogeneous data sets, but they are not sufficient for neuroscience, since the data of interest commonly span multiple research domains. To this end, e-Neuroscience seeks to provide an integrated platform for neuroscientists to discover new knowledge through seamless integration of the very diverse types of neuroscience data. Here we present a Semantic Web approach to building this e-Neuroscience framework by using the Resource Description Framework (RDF) and its vocabulary description language, RDF Schema (RDFS), as a standard data model to facilitate both representation and integration of the data. We have constructed a pilot ontology for BrainPharm (a subset of SenseLab) using RDFS and then converted a subset of the BrainPharm data into RDF according to the ontological structure. We have also integrated the converted BrainPharm data with existing RDF hypothesis and publication data from a pilot version of SWAN (Semantic Web Applications in Neuromedicine). Our implementation uses the RDF Data Model in Oracle Database 10g release 2 for data integration, query, and inference, while our Web interface allows users to query the data and retrieve the results in a convenient fashion. Accessing and integrating biomedical data that cut across multiple disciplines will be increasingly indispensable and beneficial to neuroscience researchers. The Semantic Web approach we undertook has demonstrated a promising way to semantically integrate data sets created independently. It also shows how advanced queries and inferences can be performed over the integrated data, which are hard to achieve using traditional data integration approaches. Our pilot results suggest that our Semantic Web approach is suitable for realizing e-Neuroscience and generic enough to be applied in other biomedical fields.
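
    The integration pattern can be sketched with the open-source rdflib library (the implementation described above uses Oracle Database 10g instead): a small RDFS hierarchy and instance triples stand in for the BrainPharm/SWAN data, and a SPARQL query runs over the merged graph. The vocabulary below is invented for illustration.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

# Toy RDF/RDFS integration: schema triples and instance triples that might
# come from independently built data sets, merged into one queryable graph.
EX = Namespace("http://example.org/neuro#")
g = Graph()
g.bind("ex", EX)

# RDFS schema: a small class hierarchy.
g.add((EX.Receptor, RDFS.subClassOf, EX.Protein))

# Instance data from two hypothetical sources.
g.add((EX.NMDAR1, RDF.type, EX.Receptor))
g.add((EX.NMDAR1, EX.implicatedIn, EX.AlzheimersDisease))
g.add((EX.NMDAR1, RDFS.label, Literal("NMDA receptor subunit 1")))

# SPARQL query across the integrated graph.
q = """
SELECT ?label WHERE {
    ?x ex:implicatedIn ex:AlzheimersDisease ;
       rdfs:label ?label .
}
"""
for row in g.query(q, initNs={"ex": EX, "rdfs": RDFS}):
    print(row.label)
```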

  17. AlzPharm: integration of neurodegeneration data using RDF

    PubMed Central

    Lam, Hugo YK; Marenco, Luis; Clark, Tim; Gao, Yong; Kinoshita, June; Shepherd, Gordon; Miller, Perry; Wu, Elizabeth; Wong, Gwendolyn T; Liu, Nian; Crasto, Chiquito; Morse, Thomas; Stephens, Susie; Cheung, Kei-Hoi

    2007-01-01

    Background Neuroscientists often need to access a wide range of data sets distributed over the Internet. These data sets, however, are typically neither integrated nor interoperable, resulting in a barrier to answering complex neuroscience research questions. Domain ontologies can enable the querying of heterogeneous data sets, but they are not sufficient for neuroscience, since the data of interest commonly span multiple research domains. To this end, e-Neuroscience seeks to provide an integrated platform for neuroscientists to discover new knowledge through seamless integration of the very diverse types of neuroscience data. Here we present a Semantic Web approach to building this e-Neuroscience framework by using the Resource Description Framework (RDF) and its vocabulary description language, RDF Schema (RDFS), as a standard data model to facilitate both representation and integration of the data. Results We have constructed a pilot ontology for BrainPharm (a subset of SenseLab) using RDFS and then converted a subset of the BrainPharm data into RDF according to the ontological structure. We have also integrated the converted BrainPharm data with existing RDF hypothesis and publication data from a pilot version of SWAN (Semantic Web Applications in Neuromedicine). Our implementation uses the RDF Data Model in Oracle Database 10g release 2 for data integration, query, and inference, while our Web interface allows users to query the data and retrieve the results in a convenient fashion. Conclusion Accessing and integrating biomedical data that cut across multiple disciplines will be increasingly indispensable and beneficial to neuroscience researchers. The Semantic Web approach we undertook has demonstrated a promising way to semantically integrate data sets created independently. It also shows how advanced queries and inferences can be performed over the integrated data, which are hard to achieve using traditional data integration approaches. Our pilot results suggest that our Semantic Web approach is suitable for realizing e-Neuroscience and generic enough to be applied in other biomedical fields. PMID:17493287

  18. Capturing domain knowledge from multiple sources: the rare bone disorders use case.

    PubMed

    Groza, Tudor; Tudorache, Tania; Robinson, Peter N; Zankl, Andreas

    2015-01-01

    Lately, ontologies have become a fundamental building block in the process of formalising and storing complex biomedical information. The community-driven ontology curation process, however, ignores the possibility of multiple communities building, in parallel, conceptualisations of the same domain, and thus providing slightly different perspectives on the same knowledge. The individual nature of this effort leads to the need for a mechanism that enables us to create an overarching and comprehensive overview of the different perspectives on the domain knowledge. We introduce an approach that enables the loose integration of knowledge emerging from diverse sources under a single, coherent, interoperable resource. To accurately track the original knowledge statements, we record provenance at very granular levels. We exemplify the approach in the rare bone disorders domain by proposing the Rare Bone Disorders Ontology (RBDO). Using RBDO, researchers are able to answer queries such as: "What phenotypes describe a particular disorder and are common to all sources?" or to understand similarities between disorders based on divergent groupings (classifications) provided by the underlying sources. RBDO is available at http://purl.org/skeletome/rbdo. To support lightweight query and integration, the knowledge captured by RBDO has also been made available as a SPARQL endpoint at http://bio-lark.org/se_skeldys.html.

  19. SPARK: Adapting Keyword Query to Semantic Search

    NASA Astrophysics Data System (ADS)

    Zhou, Qi; Wang, Chong; Xiong, Miao; Wang, Haofen; Yu, Yong

    Semantic search promises to provide more accurate result than present-day keyword search. However, progress with semantic search has been delayed due to the complexity of its query languages. In this paper, we explore a novel approach of adapting keywords to querying the semantic web: the approach automatically translates keyword queries into formal logic queries so that end users can use familiar keywords to perform semantic search. A prototype system named 'SPARK' has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In the experiment, SPARK achieved an encouraging translation result.
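
    A drastically simplified sketch of the first two translation steps, in Python: keywords are mapped to ontology terms via a toy dictionary, and the resulting classes and properties are assembled into a SPARQL SELECT query. SPARK's actual term mapping and probabilistic ranking are far richer; every term below is a placeholder.

```python
# Toy keyword-to-SPARQL translation: term mapping, then query graph
# construction by chaining class variables along the mapped properties.

TERM_MAP = {                      # keyword -> (kind, ontology term)
    "professor": ("class", "ex:Professor"),
    "teaches":   ("property", "ex:teaches"),
    "course":    ("class", "ex:Course"),
}

def keywords_to_sparql(keywords):
    classes, props = [], []
    for kw in keywords:
        kind, term = TERM_MAP[kw]
        (classes if kind == "class" else props).append(term)
    vars_ = [f"?v{i}" for i in range(len(classes))]
    patterns = [f"{v} a {c} ." for v, c in zip(vars_, classes)]
    for p, (v1, v2) in zip(props, zip(vars_, vars_[1:])):
        patterns.append(f"{v1} {p} {v2} .")
    body = "\n  ".join(patterns)
    return f"SELECT {' '.join(vars_)} WHERE {{\n  {body}\n}}"

print(keywords_to_sparql(["professor", "teaches", "course"]))
```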

  20. BEANS - a software package for distributed Big Data analysis

    NASA Astrophysics Data System (ADS)

    Hypki, Arkadiusz

    2018-07-01

    BEANS software is a web-based, easy to install and maintain, new tool to store and analyse in a distributed way a massive amount of data. It provides a clear interface for querying, filtering, aggregating, and plotting data from an arbitrary number of data sets. Its main purpose is to simplify the process of storing, examining, and finding new relations in huge data sets. The software is an answer to a growing need of the astronomical community to have a versatile tool to store, analyse, and compare the complex astrophysical numerical simulations with observations (e.g. simulations of the Galaxy or star clusters with the Gaia archive). However, this software was built in a general form and it is ready to use in any other research field. It can be used as a building block for other open-source software too.

  1. BEANS - a software package for distributed Big Data analysis

    NASA Astrophysics Data System (ADS)

    Hypki, Arkadiusz

    2018-03-01

    BEANS software is a web based, easy to install and maintain, new tool to store and analyse in a distributed way a massive amount of data. It provides a clear interface for querying, filtering, aggregating, and plotting data from an arbitrary number of datasets. Its main purpose is to simplify the process of storing, examining and finding new relations in huge datasets. The software is an answer to a growing need of the astronomical community to have a versatile tool to store, analyse and compare the complex astrophysical numerical simulations with observations (e.g. simulations of the Galaxy or star clusters with the Gaia archive). However, this software was built in a general form and it is ready to use in any other research field. It can be used as a building block for other open source software too.

  2. Taking Open Innovation to the Molecular Level - Strengths and Limitations.

    PubMed

    Zdrazil, Barbara; Blomberg, Niklas; Ecker, Gerhard F

    2012-08-01

    The ever-growing availability and maturation of large-scale open data is having a significant impact on industrial drug discovery, as well as on academic and non-profit research. As industry is changing to an 'open innovation' business concept, precompetitive initiatives and strong public-private partnerships that include academic research partners are gaining more and more importance. The bioinformatics and cheminformatics communities are now seeking web tools that allow the integration of the large volume of life-science datasets available in the public domain. Such a data exploitation tool would ideally be able to answer complex biological questions by formulating only one search query. In this short review/perspective, we outline the use of semantic web approaches for data and knowledge integration. Further, we discuss the strengths and current limitations of publicly available data retrieval tools and integrated platforms.

  3. Finding and accessing diagrams in biomedical publications.

    PubMed

    Kuhn, Tobias; Luong, ThaiBinh; Krauthammer, Michael

    2012-01-01

    Complex relationships in biomedical publications are often communicated by diagrams such as bar and line charts, which are a very effective way of summarizing and communicating multi-faceted data sets. Given the ever-increasing amount of published data, we argue that the precise retrieval of such diagrams is of great value for answering specific and otherwise hard-to-meet information needs. To this end, we demonstrate the use of advanced image processing and classification for identifying bar and line charts by the shape and relative location of the different image elements that make up the charts. With recall and precision close to 90% for the detection of relevant figures, we discuss the use of this technology in an existing biomedical image search engine, and outline how it enables new forms of literature queries over biomedical relationships that are represented in these charts.

  4. BIOSPIDA: A Relational Database Translator for NCBI

    PubMed Central

    Hagen, Matthew S.; Lee, Eva K.

    2010-01-01

    As the volume and availability of biological databases continue their widespread growth, it has become increasingly difficult for research scientists to identify all relevant information for biological entities of interest. Details of nucleotide sequences, gene expression, molecular interactions, and three-dimensional structures are maintained across many different databases. Retrieving all necessary information requires an integrated system that can query multiple databases with minimized overhead. This paper introduces a universal parser and relational schema translator that can be utilized for all NCBI databases in Abstract Syntax Notation (ASN.1). The data models for OMIM, Entrez-Gene, PubMed, MMDB and GenBank have been successfully converted into relational databases, and all are easily linkable, helping to answer complex biological questions. These tools enable research scientists to integrate NCBI databases locally without significant workload or development time. PMID:21347013

  5. Use of synthesized data to support complex ad-hoc queries in an enterprise information warehouse: a diabetes use case.

    PubMed

    Rogers, Patrick; Erdal, Selnur; Santangelo, Jennifer; Liu, Jianhua; Schuster, Dara; Kamal, Jyoti

    2008-11-06

    The Ohio State University Medical Center (OSUMC) Information Warehouse (IW) is a comprehensive data warehousing facility incorporating operational, clinical, and biological data sets from multiple enterprise systems. It is common for users of the IW to request complex ad-hoc queries that often require significant intervention by a data analyst. In response to this challenge, we have designed a workflow that leverages synthesized data elements to support such queries in a more timely, efficient manner.

  6. Development of a replicated database of DHCP data for evaluation of drug use.

    PubMed Central

    Graber, S E; Seneker, J A; Stahl, A A; Franklin, K O; Neel, T E; Miller, R A

    1996-01-01

    This case report describes development and testing of a method to extract clinical information stored in the Veterans Affairs (VA) Decentralized Hospital Computer System (DHCP) for the purpose of analyzing data about groups of patients. The authors used a microcomputer-based, structured query language (SQL)-compatible, relational database system to replicate a subset of the Nashville VA Hospital's DHCP patient database. This replicated database contained the complete current Nashville DHCP prescription, provider, patient, and drug data sets, and a subset of the laboratory data. A pilot project employed this replicated database to answer questions that might arise in drug-use evaluation, such as identification of cases of polypharmacy, suboptimal drug regimens, and inadequate laboratory monitoring of drug therapy. These database queries included as candidates for review all prescriptions for all outpatients. The queries demonstrated that specific drug-use events could be identified for any time interval represented in the replicated database. PMID:8653451

  7. Development of a replicated database of DHCP data for evaluation of drug use.

    PubMed

    Graber, S E; Seneker, J A; Stahl, A A; Franklin, K O; Neel, T E; Miller, R A

    1996-01-01

    This case report describes development and testing of a method to extract clinical information stored in the Veterans Affairs (VA) Decentralized Hospital Computer System (DHCP) for the purpose of analyzing data about groups of patients. The authors used a microcomputer-based, structured query language (SQL)-compatible, relational database system to replicate a subset of the Nashville VA Hospital's DHCP patient database. This replicated database contained the complete current Nashville DHCP prescription, provider, patient, and drug data sets, and a subset of the laboratory data. A pilot project employed this replicated database to answer questions that might arise in drug-use evaluation, such as identification of cases of polypharmacy, suboptimal drug regimens, and inadequate laboratory monitoring of drug therapy. These database queries included as candidates for review all prescriptions for all outpatients. The queries demonstrated that specific drug-use events could be identified for any time interval represented in the replicated database.

  8. Breaking the Curse of Cardinality on Bitmap Indexes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Kesheng; Wu, Kesheng; Stockinger, Kurt

    2008-04-04

    Bitmap indexes are known to be efficient for the ad-hoc range queries common in data warehousing and scientific applications. However, they suffer from the curse of cardinality: their efficiency deteriorates as attribute cardinalities increase. A number of strategies have been proposed, but none of them addresses the problem adequately. In this paper, we propose a novel binned bitmap index that greatly reduces the cost of answering queries, and therefore breaks the curse of cardinality. The key idea is to augment the binned index with an Order-preserving Bin-based Clustering (OrBiC) structure. This data structure significantly reduces the I/O operations needed to resolve records that cannot be resolved with the bitmaps. To further improve the proposed index structure, we also present a strategy to create single-valued bins for frequent values. This strategy reduces index sizes and improves query processing speed. Overall, the binned indexes with OrBiC greatly improve query processing speed, and are 3-25 times faster than the best available indexes for high-cardinality data.
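
    The following Python/NumPy sketch shows the binned-bitmap pattern with a candidate check: inner bins of a range query are answered from bitmaps alone, and only the two edge bins touch the base data (the part OrBiC lays out contiguously to keep cheap). Data, bin count, and query bounds are synthetic.

```python
import numpy as np

# Binned bitmap index with candidate checks on the edge bins only.

data = np.random.default_rng(1).uniform(0, 100, 10_000)
edges = np.linspace(0, 100, 11)              # 10 equi-width bins
bin_of = np.digitize(data, edges) - 1
bitmaps = [bin_of == b for b in range(10)]   # one bitmap per bin

def range_query(lo, hi):
    """Answer lo <= x < hi using bitmaps plus edge-bin candidate checks."""
    blo, bhi = np.clip(np.digitize([lo, hi], edges) - 1, 0, 9)
    hits = np.zeros(data.size, dtype=bool)
    for b in range(blo + 1, bhi):            # fully covered inner bins
        hits |= bitmaps[b]
    for b in {blo, bhi}:                     # edge bins: verify base data
        hits |= bitmaps[b] & (data >= lo) & (data < hi)
    return hits

res = range_query(13.0, 57.5)
assert res.sum() == ((data >= 13.0) & (data < 57.5)).sum()
print(int(res.sum()), "rows match")
```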

  9. Using discordance to improve classification in narrative clinical databases: an application to community-acquired pneumonia.

    PubMed

    Hripcsak, George; Knirsch, Charles; Zhou, Li; Wilcox, Adam; Melton, Genevieve B

    2007-03-01

    Data mining in electronic medical records may facilitate clinical research, but much of the structured data may be miscoded, incomplete, or non-specific. The exploitation of narrative data using natural language processing may help, although nesting, varying granularity, and repetition remain challenges. In a study of community-acquired pneumonia using electronic records, these issues led to poor classification. Limiting queries to accurate, complete records led to vastly reduced, possibly biased samples. We exploited knowledge latent in the electronic records to improve classification. A similarity metric was used to cluster cases. We defined discordance as the degree to which cases within a cluster give different answers for some query that addresses a classification task of interest. Cases with higher discordance are more likely to be incorrectly classified, and can be reviewed manually to adjust the classification, improve the query, or estimate the likely accuracy of the query. In a study of pneumonia--in which the ICD9-CM coding was found to be very poor--the discordance measure was statistically significantly correlated with classification correctness (.45; 95% CI .15-.62).
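
    A minimal sketch of the discordance measure in Python, assuming cases have already been clustered by the similarity metric: within each cluster, discordance is the fraction of cases disagreeing with the cluster's majority answer, and the disagreeing cases are flagged for review. The cluster labels and answers below are hypothetical.

```python
from collections import Counter, defaultdict

# Discordance sketch: high-discordance clusters point at cases that are
# likely misclassified and worth manual review.

cases = [  # (case_id, cluster_id, query answer, e.g. "has pneumonia?")
    (1, "a", True), (2, "a", True), (3, "a", False),
    (4, "b", False), (5, "b", False), (6, "b", False),
]

clusters = defaultdict(list)
for cid, cl, ans in cases:
    clusters[cl].append((cid, ans))

for cl, members in clusters.items():
    counts = Counter(ans for _, ans in members)
    majority, n_major = counts.most_common(1)[0]
    discordance = 1 - n_major / len(members)
    flagged = [cid for cid, ans in members if ans != majority]
    print(f"cluster {cl}: discordance={discordance:.2f}, review {flagged}")
```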

  10. Information Network Model Query Processing

    NASA Astrophysics Data System (ADS)

    Song, Xiaopu

    The Information Networking Model (INM) [31] is a novel database model for the management of real-world objects and relationships. It naturally and directly supports various kinds of static and dynamic relationships between objects. In INM, objects are networked through various natural and complex relationships. The INM Query Language (INM-QL) [30] is designed to explore such information networks; retrieve information about schemas, instances, their attributes, relationships, and context-dependent information; and process query results in a user-specified form. The INM database management system has been implemented using Berkeley DB, and it supports INM-QL. This thesis is mainly focused on the implementation of the subsystem that effectively and efficiently processes INM-QL. The subsystem provides a lexical and syntactic analyzer for INM-QL, and it is able to choose appropriate evaluation strategies and index mechanisms to process INM-QL queries without the user's intervention. It also uses an intermediate result structure to hold intermediate query results, and other helper structures to reduce the complexity of query processing.

  11. SPARQL Query Re-writing Using Partonomy Based Transformation Rules

    NASA Astrophysics Data System (ADS)

    Jain, Prateek; Yeh, Peter Z.; Verma, Kunal; Henson, Cory A.; Sheth, Amit P.

    Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. For querying ontologies containing spatial information, the precise relationships between spatial entities have to be specified in the basic graph pattern of a SPARQL query, which can result in long and complex queries. We present a novel approach that helps users intuitively write SPARQL queries to query spatial data, rather than relying on knowledge of the ontology structure. Our framework re-writes queries, using transformation rules to exploit part-whole relations between geographical entities and thereby address the mismatches between query constraints and the knowledge base. Our experiments were performed on completely third-party datasets and queries. Evaluations were performed on the Geonames dataset using questions from the National Geographic Bee serialized into SPARQL, and on the British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in the retrieval of results and ease in writing queries.

  12. Aesthetic Inquiry in Education: Community, Transcendence, and the Meaning of Pedagogy

    ERIC Educational Resources Information Center

    Alexander, Hanan A.

    2003-01-01

    What does it mean to understand education as an art, to conceive inquiry in education aesthetically, or to assess pedagogy artistically? Answers to these queries are often grounded in Deweyan instrumentalism, neo-Marxist critical theory, or postmodern skepticism that tend to fall prey to the paradoxes of radical relativism and extreme…

  13. "Don't Know" Responding to Answerable and Unanswerable Questions during Misleading and Hypnotic Interviews

    ERIC Educational Resources Information Center

    Scoboria, Alan; Mazzoni, Giuliana; Kirsch, Irving

    2008-01-01

    "Don't know" (DK) responses to interview questions are conceptually heterogeneous, and may represent uncertainty or clear statements about the contents of memory. A study examined the subjective intent of DK responses in relation to the objective status of information queried, in the context of memory distorting procedures. Participants…

  14. Efficient Instant Search

    ERIC Educational Resources Information Center

    Ji, Shengyue

    2011-01-01

    Traditional information systems return answers after a user submits a complete query. Users often feel "left in the dark" when they have limited knowledge about the underlying data, and have to use a try-and-see approach for finding information. The trend of supporting autocomplete in these systems is a first step towards solving this problem. A…

  15. A Semantic Parsing Method for Mapping Clinical Questions to Logical Forms

    PubMed Central

    Roberts, Kirk; Patra, Braja Gopal

    2017-01-01

    This paper presents a method for converting natural language questions about structured data in the electronic health record (EHR) into logical forms. The logical forms can then subsequently be converted to EHR-dependent structured queries. The natural language processing task, known as semantic parsing, has the potential to convert questions to logical forms with extremely high precision, resulting in a system that is usable and trusted by clinicians for real-time use in clinical settings. We propose a hybrid semantic parsing method, combining rule-based methods with a machine learning-based classifier. The overall semantic parsing precision on a set of 212 questions is 95.6%. The parser’s rules furthermore allow it to “know what it does not know”, enabling the system to indicate when unknown terms prevent it from understanding the question’s full logical structure. When combined with a module for converting a logical form into an EHR-dependent query, this high-precision approach allows for a question answering system to provide a user with a single, verifiably correct answer. PMID:29854217
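
    A toy version of the rule-based half of such a parser, in Python: regular-expression rules map a question to an invented logical-form syntax, and questions matching no rule are reported as not understood rather than guessed at. The rules, predicates, and questions are illustrative only.

```python
import re

# Hybrid-parser sketch, rule-based half only: each rule pairs a question
# pattern with a builder for a made-up logical-form string. Unmatched
# questions return None ("know what it does not know").

RULES = [
    (re.compile(r"what (?:is|was) the latest (?P<test>[\w ]+)\??", re.I),
     lambda m: f"latest(lab_event(test={m['test'].strip()!r}))"),
    (re.compile(r"has the patient ever taken (?P<drug>[\w ]+)\??", re.I),
     lambda m: f"exists(medication(drug={m['drug'].strip()!r}))"),
]

def parse(question):
    for pattern, build in RULES:
        m = pattern.fullmatch(question.strip())
        if m:
            return build(m)
    return None  # unknown structure: defer to a human rather than guess

print(parse("What is the latest creatinine?"))
print(parse("Has the patient ever taken warfarin?"))
print(parse("Why is the sky blue?"))  # -> None
```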

  16. Multi-Level Bitmap Indexes for Flash Memory Storage

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Kesheng; Madduri, Kamesh; Canon, Shane

    2010-07-23

    Due to their low access latency, high read speed, and power-efficient operation, flash memory storage devices are rapidly emerging as an attractive alternative to traditional magnetic storage devices. However, tests show that the most efficient indexing methods are not able to take advantage of the flash memory storage devices. In this paper, we present a set of multi-level bitmap indexes that can effectively take advantage of flash storage devices. These indexing methods use coarsely binned indexes to answer queries approximately, and then use finely binned indexes to refine the answers. Our new methods read significantly lower volumes of data at the expense of an increased disk access count, thus taking full advantage of the improved read speed and low access latency of flash devices. To demonstrate the advantage of these new indexes, we measure their performance on a number of storage systems using a standard data warehousing benchmark called the Set Query Benchmark. We observe that multi-level strategies on flash drives are up to 3 times faster than traditional indexing strategies on magnetic disk drives.

  17. Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records.

    PubMed

    Luo, Yuan; Szolovits, Peter

    2016-01-01

    In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing to narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations under position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem as the interval query problem, for which the optimal query/update time is in general logarithmic. We next perform a tight time complexity analysis of the basic interval tree query algorithm and show its non-optimality when applied to a collection of 13 query types from Allen's interval algebra. We then study two closely related state-of-the-art interval query algorithms, and propose query reformulations and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic stabbing-max query time complexity and solves the stabbing-interval query tasks on all of Allen's relations in logarithmic time, attaining the theoretical lower bound. Updating time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and in higher dimensions.
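
    For illustration, here is a naive stand-off annotation store in Python answering an overlap query with a sorted list and a binary search on start positions; this pruned scan does not achieve the logarithmic bounds of the augmented interval trees discussed above, but it shows the query being solved. The text and annotations are invented.

```python
import bisect

# Stand-off annotations: (start, end, label) anchored to the text, with
# content stored separately. Query: which annotations overlap [lo, hi)?

text = "pt denies chest pain but reports shortness of breath"
annotations = sorted([
    (10, 20, "SYMPTOM:chest pain"),
    (33, 52, "SYMPTOM:shortness of breath"),
    (3, 9, "NEGATION:denies"),
])
starts = [a[0] for a in annotations]

def overlapping(lo, hi):
    """All annotations whose span intersects [lo, hi)."""
    # No annotation starting at or after hi can overlap; scan the prefix.
    cut = bisect.bisect_left(starts, hi)
    return [a for a in annotations[:cut] if a[1] > lo]

for start, end, label in overlapping(0, 25):
    print(label, "->", repr(text[start:end]))
```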

  18. Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records

    PubMed Central

    Luo, Yuan; Szolovits, Peter

    2016-01-01

    In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing to narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations under position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem as the interval query problem, for which the optimal query/update time is in general logarithmic. We next perform a tight time complexity analysis of the basic interval tree query algorithm and show its non-optimality when applied to a collection of 13 query types from Allen’s interval algebra. We then study two closely related state-of-the-art interval query algorithms, and propose query reformulations and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic stabbing-max query time complexity and solves the stabbing-interval query tasks on all of Allen’s relations in logarithmic time, attaining the theoretical lower bound. Updating time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and in higher dimensions. PMID:27478379

  19. Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data.

    PubMed

    Aji, Ablimit; Wang, Fusheng; Saltz, Joel H

    2012-11-06

    Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven not only by geospatial problems in numerous fields, but also by emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such an enormous amount of data with fast response, which faces two major challenges: the "big data" challenge and the high computational complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on-demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for micro-anatomic objects. To reduce query response time, we propose cost-based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce.

  20. Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data

    PubMed Central

    Aji, Ablimit; Wang, Fusheng; Saltz, Joel H.

    2013-01-01

    Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven not only by geospatial problems in numerous fields, but also by emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such an enormous amount of data with fast response, which faces two major challenges: the “big data” challenge and the high computational complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on-demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for micro-anatomic objects. To reduce query response time, we propose cost-based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce. PMID:24501719

  1. Query-Time Optimization Techniques for Structured Queries in Information Retrieval

    ERIC Educational Resources Information Center

    Cartright, Marc-Allen

    2013-01-01

    The use of information retrieval (IR) systems is evolving towards larger, more complicated queries. Both the IR industrial and research communities have generated significant evidence indicating that in order to continue improving retrieval effectiveness, increases in retrieval model complexity may be unavoidable. From an operational perspective,…

  2. Managing and Querying Image Annotation and Markup in XML.

    PubMed

    Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

    2010-01-01

    Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of the AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of the XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid.

  3. Managing and Querying Image Annotation and Markup in XML

    PubMed Central

    Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

    2010-01-01

    Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of the AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of the XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid. PMID:21218167
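
    For flavor, here is a toy Python/XPath stand-in for the kind of hierarchical annotation query the paper supports through its XQuery extensions; the element names below are invented for the example and are not the actual AIM schema.

      import xml.etree.ElementTree as ET

      doc = ET.fromstring("""
      <imageAnnotations>
        <annotation id="a1">
          <imagingObservation code="nucleus"/>
          <calculation name="area" value="42.5"/>
        </annotation>
        <annotation id="a2">
          <imagingObservation code="vessel"/>
          <calculation name="area" value="7.1"/>
        </annotation>
      </imageAnnotations>
      """)

      # Find annotations of nuclei whose computed area exceeds a threshold.
      for ann in doc.findall("annotation"):
          obs = ann.find("imagingObservation")
          calc = ann.find("calculation[@name='area']")
          if obs.get("code") == "nucleus" and float(calc.get("value")) > 10.0:
              print(ann.get("id"))  # prints: a1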

  4. BioWarehouse: a bioinformatics database warehouse toolkit

    PubMed Central

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D

    2006-01-01

    Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and Java languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315

  5. BioWarehouse: a bioinformatics database warehouse toolkit.

    PubMed

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David W J; Tenenbaum, Jessica D; Karp, Peter D

    2006-03-23

    This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and Java languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.
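
    A toy sketch of the kind of cross-source SQL query such a warehouse enables, in the spirit of the orphan-activity example above. The schema and rows are hypothetical stand-ins, not the actual BioWarehouse schema.

      import sqlite3

      db = sqlite3.connect(":memory:")
      db.executescript("""
      CREATE TABLE enzyme  (ec_number TEXT, name TEXT, source_db TEXT);
      CREATE TABLE protein (id TEXT, ec_number TEXT, has_sequence INTEGER);
      INSERT INTO enzyme  VALUES ('1.1.1.1', 'alcohol dehydrogenase', 'ENZYME'),
                                 ('9.9.9.9', 'orphan activity',       'KEGG');
      INSERT INTO protein VALUES ('P1', '1.1.1.1', 1);
      """)

      -- comment in Python form below --
      # Enzyme activities with an assigned EC number but no sequenced protein.
      rows = db.execute("""
          SELECT e.ec_number, e.name
          FROM enzyme e
          LEFT JOIN protein p ON p.ec_number = e.ec_number AND p.has_sequence = 1
          WHERE p.id IS NULL
      """).fetchall()
      print(rows)  # [('9.9.9.9', 'orphan activity')]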

  6. Impact of Scientific Versus Emotional Wording of Patient Questions on Doctor-Patient Communication in an Internet Forum: A Randomized Controlled Experiment with Medical Students.

    PubMed

    Bientzle, Martina; Griewatz, Jan; Kimmerle, Joachim; Küppers, Julia; Cress, Ulrike; Lammerding-Koeppel, Maria

    2015-11-25

    Medical expert forums on the Internet play an increasing role in patient counseling. Therefore, it is important to understand how doctor-patient communication is influenced in such forums both by features of the patients or advice seekers, as expressed in their forum queries, and by characteristics of the medical experts involved. In this experimental study, we aimed to examine in what way (1) the particular wording of patient queries and (2) medical experts' therapeutic health concepts (for example, beliefs around adhering to a distinctly scientific understanding of diagnosis and treatment and a clear focus on evidence-based medicine) impact communication behavior of the medical experts in an Internet forum. Advanced medical students (in their ninth semester of medical training) were recruited as participants. Participation in the online forum was part of a communication training embedded in a gynecology course. We first measured their biomedical therapeutic health concept (hereinafter called "biomedical concept"). Then they participated in an online forum where they answered fictitious patient queries about mammography screening that either included scientific or emotional wording in a between-group design. We analyzed participants' replies with regard to the following dimensions: their use of scientific or emotional wording, the amount of communicated information, and their attempt to build a positive doctor-patient relationship. This study was carried out with 117 medical students (73 women, 41 men, 3 did not indicate their sex). We found evidence that both the wording of patient queries and the participants' biomedical concept influenced participants' response behavior. They answered emotional patient queries in a more emotional way (mean 0.92, SD 1.02) than scientific patient queries (mean 0.26, SD 0.55; t(74)=3.48, P<.001, d=0.81). We also found a significant interaction effect between participants' use of scientific or emotional wording and type of patient query (F(2,74)=10.29, P<.01, partial η²=0.12), indicating that participants used scientific wording independently of the type of patient query, whereas they used emotional wording particularly when replying to emotional patient queries. In addition, the more pronounced the medical experts' biomedical concept was, the more scientifically (adjusted β=.20; F(1,75)=2.95, P=.045) and the less emotionally (adjusted β=-.22; F(1,74)=3.66, P=.03) they replied to patient queries. Finally, we found that participants' biomedical concept predicted their engagement in relationship building (adjusted β=-.26): the more pronounced their biomedical concept was, the less they attempted to build a positive doctor-patient relationship (F(1,74)=5.39, P=.02). Communication training for medical experts could aim to address this issue of recognizing patients' communication styles and needs in certain situations in order to teach medical experts how to take those aspects adequately into account. In addition, communication training should also make medical experts aware of their individual therapeutic health concepts and the consequential implications in communication situations.

  7. An Experimental Investigation of Complexity in Database Query Formulation Tasks

    ERIC Educational Resources Information Center

    Casterella, Gretchen Irwin; Vijayasarathy, Leo

    2013-01-01

    Information Technology professionals and other knowledge workers rely on their ability to extract data from organizational databases to respond to business questions and support decision making. Structured query language (SQL) is the standard programming language for querying data in relational databases, and SQL skills are in high demand and are…

  8. Multi-Bit Quantum Private Query

    NASA Astrophysics Data System (ADS)

    Shi, Wei-Xu; Liu, Xing-Tong; Wang, Jian; Tang, Chao-Jing

    2015-09-01

    Most of the existing Quantum Private Queries (QPQ) protocols provide only single-bit query service, and thus have to be repeated several times when more bits are retrieved. Wei et al.'s scheme for block queries requires a high-dimensional quantum key distribution system to sustain it, which is still restricted to the laboratory. Here, based on Jakobi et al.'s single-bit QPQ protocol, we propose a multi-bit quantum private query protocol, in which the user can get access to several bits within one single query. We also extend the proposed protocol to block queries, using a binary matrix to guard database security. Analysis in this paper shows that our protocol has better communication complexity and implementability, and can achieve a considerable level of security.

  9. Reference Transactions Analysis: The Cost-Effectiveness of Staffing a Traditional Academic Reference Desk

    ERIC Educational Resources Information Center

    Ryan, Susan M.

    2008-01-01

    This study categorizes 6959 reference desk transactions to determine how many of the queries require the attention of a librarian. Results indicate that 89% could likely be answered by non-librarians. From the results of this and other studies, the author explores the cost-effectiveness of staffing a traditional reference desk with librarians.…

  10. Intersystem Compatibility and Convertibility of Subject Vocabularies.

    ERIC Educational Resources Information Center

    Wall, E.; Barnes, J.

    This is the fifth in a series of eight reports of a research study for the National Agricultural Library (NAL) on the effective utilization of bibliographic data bases in machine readable form. NAL desires ultimately to develop techniques of interacting with other data bases so that queries put to NAL may be answered with documents or document…

  11. Targeting and Structuring Information Resource Use: A Path toward Informed Clinical Decisions

    ERIC Educational Resources Information Center

    Mangrulkar, Rajesh S.

    2004-01-01

    A core skill for all physicians to master is that of information manager. Despite a rapidly expanding set of electronic and print-based information resources, clinicians continue to answer their clinical queries predominantly through informal or formal consultation. Even as new tools are brought to market, the majority of them present information…

  12. Measuring Our Success: How to Gauge the "Value Added" by an Independent School Education

    ERIC Educational Resources Information Center

    Gulla, John; Jorgenson, Olaf

    2014-01-01

    Having addressed variations of the question--How can a school's success be "measured"?--with mixed results across a collective five decades of service to boards, John Gulla and Olaf Jorgenson endeavored to develop a more helpful answer. To this end, they queried 200-plus leaders of California Association of Independent Schools…

  13. A Different Pitch to Slope

    ERIC Educational Resources Information Center

    Wolbert, William

    2017-01-01

    The query "When are we ever going to use this?" is easily answered when discussing the slope of a line. The pitch of a roof, the grade of a road, and stair stringers are three applications of slope that are used extensively. The concept of slope, which is introduced fairly early in the mathematics curriculum has hands-on applications…

  14. Finding and Accessing Diagrams in Biomedical Publications

    PubMed Central

    Kuhn, Tobias; Luong, ThaiBinh; Krauthammer, Michael

    2012-01-01

    Complex relationships in biomedical publications are often communicated by diagrams such as bar and line charts, which are a very effective way of summarizing and communicating multi-faceted data sets. Given the ever-increasing amount of published data, we argue that the precise retrieval of such diagrams is of great value for answering specific and otherwise hard-to-meet information needs. To this end, we demonstrate the use of advanced image processing and classification for identifying bar and line charts by the shape and relative location of the different image elements that make up the charts. With recall and precision close to 90% for the detection of relevant figures, we discuss the use of this technology in an existing biomedical image search engine, and outline how it enables new forms of literature queries over biomedical relationships that are represented in these charts. PMID:23304318

  15. Understanding the Scalability of Bayesian Network Inference using Clique Tree Growth Curves

    NASA Technical Reports Server (NTRS)

    Mengshoel, Ole Jakob

    2009-01-01

    Bayesian networks (BNs) are used to represent and efficiently compute with multi-variate probability distributions in a wide range of disciplines. One of the main approaches to perform computation in BNs is clique tree clustering and propagation. In this approach, BN computation consists of propagation in a clique tree compiled from a Bayesian network. There is a lack of understanding of how clique tree computation time, and BN computation time more generally, depend on variations in BN size and structure. On the one hand, complexity results tell us that many interesting BN queries are NP-hard or worse to answer, and it is not hard to find application BNs where the clique tree approach in practice cannot be used. On the other hand, it is well known that tree-structured BNs can be used to answer probabilistic queries in polynomial time. In this article, we develop an approach to characterizing clique tree growth as a function of parameters that can be computed in polynomial time from BNs, specifically: (i) the ratio of the number of a BN's non-root nodes to the number of root nodes, or (ii) the expected number of moral edges in their moral graphs. Our approach is based on combining analytical and experimental results. Analytically, we partition the set of cliques in a clique tree into different sets, and introduce a growth curve for each set. For the special case of bipartite BNs, we consequently have two growth curves, a mixed clique growth curve and a root clique growth curve. In experiments, we systematically increase the degree of the root nodes in bipartite Bayesian networks, and find that root clique growth is well approximated by Gompertz growth curves. It is believed that this research improves the understanding of the scaling behavior of clique tree clustering, provides a foundation for benchmarking and developing improved BN inference and machine learning algorithms, and presents an aid for analytical trade-off studies of clique tree clustering using growth curves.
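
    For reference, the Gompertz curve used to approximate root clique growth has the closed form f(x) = a*exp(-b*exp(-c*x)). The sketch below evaluates it with placeholder parameters; the a, b, c values are illustrative, not fitted values from the article.

      import math

      def gompertz(x, a, b, c):
          """Gompertz curve: a is the asymptote, b shifts, c scales growth."""
          return a * math.exp(-b * math.exp(-c * x))

      # Clique-size growth as root-node degree increases (illustrative numbers).
      for degree in range(0, 25, 4):
          print(degree, round(gompertz(degree, a=100.0, b=5.0, c=0.3), 2))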

  16. Development of a Search Strategy for an Evidence Based Retrieval Service

    PubMed Central

    Ho, Gah Juan; Liew, Su May; Ng, Chirk Jenn; Hisham Shunmugam, Ranita; Glasziou, Paul

    2016-01-01

    Background Physicians are often encouraged to locate answers for their clinical queries via an evidence-based literature search approach. The methods used are often not clearly specified. Inappropriate search strategies, time constraints and contradictory information complicate evidence retrieval. Aims Our study aimed to develop a search strategy to answer clinical queries among physicians in a primary care setting. Methods Six clinical questions of different medical conditions seen in primary care were formulated. A series of experimental searches to answer each question was conducted on 3 commonly advocated medical databases. We compared search results from a PICO (patients, intervention, comparison, outcome) framework for questions using different combinations of PICO elements. We also compared outcomes from doing searches using text words, Medical Subject Headings (MeSH), or a combination of both. All searches were documented using screenshots and saved search strategies. Results Answers to all 6 questions using the PICO framework were found. A higher number of systematic reviews was obtained using a 2-PICO-element search compared to a 4-element search. A better choice of search is a combination of both text words and MeSH terms. Despite searching using the Systematic Review filter, many non-systematic or narrative reviews were found in PubMed. There was poor overlap between the outcomes of searches using different databases. The duration of search and screening for the 6 questions ranged from 1 to 4 hours. Conclusion This strategy has been shown to be feasible and can provide evidence to answer doctors’ clinical questions. It has the potential to be incorporated into an interventional study to determine the impact of an online evidence retrieval system. PMID:27935993
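
    A minimal sketch of composing a 2-PICO-element search that combines a MeSH heading with the same term as a text word, as the study recommends. The field tags are standard PubMed syntax; the terms themselves are examples, not the study's actual queries.

      def pico_query(population, intervention):
          def block(term):
              # Combine a MeSH heading with the same term as a text word.
              return f'("{term}"[MeSH Terms] OR "{term}"[Title/Abstract])'
          return f"{block(population)} AND {block(intervention)}"

      print(pico_query("hypertension", "exercise therapy"))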

  17. A natural language interface plug-in for cooperative query answering in biological databases.

    PubMed

    Jamil, Hasan M

    2012-06-11

    One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural-language-like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schemas, with the aim of providing cooperative responses to queries tailored to users' interpretations. Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from the underlying database schema. The plug-in introduced in this paper is generic and facilitates connecting user-selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.

  18. Ontological interpretation of biomedical database content.

    PubMed

    Santana da Silva, Filipe; Jansen, Ludger; Freitas, Fred; Schulz, Stefan

    2017-06-26

    Biological databases store data about laboratory experiments, together with semantic annotations, in order to support data aggregation and retrieval. The exact meaning of such annotations in the context of a database record is often ambiguous. We address this problem by grounding implicit and explicit database content in a formal-ontological framework. By using a typical extract from the databases UniProt and Ensembl, annotated with content from GO, PR, ChEBI and NCBI Taxonomy, we created four ontological models (in OWL), which generate explicit, distinct interpretations under the BioTopLite2 (BTL2) upper-level ontology. The first three models interpret database entries as individuals (IND), defined classes (SUBC), and classes with dispositions (DISP), respectively; the fourth model (HYBR) is a combination of SUBC and DISP. For the evaluation of these four models, we consider (i) database content retrieval, using ontologies as query vocabulary; (ii) information completeness; and, (iii) DL complexity and decidability. The models were tested under these criteria against four competency questions (CQs). IND does not raise any ontological claim, besides asserting the existence of sample individuals and relations among them. Modelling patterns have to be created for each type of annotation referent. SUBC is interpreted regarding maximally fine-grained defined subclasses under the classes referred to by the data. DISP attempts to extract truly ontological statements from the database records, claiming the existence of dispositions. HYBR is a hybrid of SUBC and DISP and is more parsimonious regarding expressiveness and query answering complexity. For each of the four models, the four CQs were submitted as DL queries. This shows the ability to retrieve individuals with IND, and classes in SUBC and HYBR. DISP does not retrieve anything because the axioms with disposition are embedded in General Class Inclusion (GCI) statements. Ambiguity of biological database content is addressed by a method that identifies implicit knowledge behind semantic annotations in biological databases and grounds it in an expressive upper-level ontology. The result is a seamless representation of database structure, content and annotations as OWL models.

  19. VISAGE: Interactive Visual Graph Querying.

    PubMed

    Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

    2016-06-01

    Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with "wildcard" nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE's ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries.

  20. VISAGE: Interactive Visual Graph Querying

    PubMed Central

    Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

    2017-01-01

    Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with “wildcard” nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE’s ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries. PMID:28553670
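
    The kind of labeled pattern such a visual query compiles down to can be approximated with an off-the-shelf subgraph matcher; the sketch below uses networkx as a stand-in for the paper's own engine, with invented node labels.

      import networkx as nx
      from networkx.algorithms import isomorphism

      data = nx.Graph()
      data.add_nodes_from([(1, {"role": "banker"}), (2, {"role": "owner"}),
                           (3, {"role": "banker"}), (4, {"role": "owner"})])
      data.add_edges_from([(1, 2), (2, 3), (3, 4)])

      # The query pattern: an "owner" connected to a "banker".
      query = nx.Graph()
      query.add_nodes_from([("q1", {"role": "owner"}), ("q2", {"role": "banker"})])
      query.add_edge("q1", "q2")

      gm = isomorphism.GraphMatcher(
          data, query, node_match=isomorphism.categorical_node_match("role", None))
      for mapping in gm.subgraph_isomorphisms_iter():
          print(mapping)  # maps data nodes to query nodes, e.g. {2: 'q1', 1: 'q2'}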

  1. In-context query reformulation for failing SPARQL queries

    NASA Astrophysics Data System (ADS)

    Viswanathan, Amar; Michaelis, James R.; Cassidy, Taylor; de Mel, Geeth; Hendler, James

    2017-05-01

    Knowledge bases for decision support systems are growing increasingly complex, through continued advances in data ingest and management approaches. However, humans do not possess the cognitive capabilities to retain a bird's-eye view of such knowledge bases, and may end up issuing unsatisfiable queries to such systems. This work focuses on the implementation of a query reformulation approach for graph-based knowledge bases, specifically designed to support the Resource Description Framework (RDF). The reformulation approach presented is instance- and schema-aware. Thus, in contrast to relaxation techniques found in the state-of-the-art, the presented approach produces in-context query reformulation.

  2. A Rule-Based Spatial Reasoning Approach for OpenStreetMap Data Quality Enrichment; Case Study of Routing and Navigation

    PubMed Central

    2017-01-01

    Finding relevant geospatial information is increasingly critical because of the growing volume of geospatial data available within the emerging “Big Data” era. Users are expecting that the availability of massive datasets will create more opportunities to uncover hidden information and answer more complex queries. This is especially the case with routing and navigation services, where the ability to retrieve points of interest and landmarks makes the routing service personalized, precise, and relevant. In this paper, we propose a new geospatial information approach that enables the retrieval of implicit information, i.e., geospatial entities that do not exist explicitly in the available source. We present an information broker that uses a rule-based spatial reasoning algorithm to detect topological relations. The information broker is embedded into a framework where annotations and mappings between OpenStreetMap data attributes and external resources, such as taxonomies, support the enrichment of queries to improve the ability of the system to retrieve information. Our method is tested with two case studies that lead to enriching the completeness of OpenStreetMap data with footway-crossing points of interest as well as building entrances for routing and navigation purposes. It is concluded that the proposed approach can uncover implicit entities and contribute to extracting required information from the existing datasets. PMID:29088125

  3. EarthServer: a Summary of Achievements in Technology, Services, and Standards

    NASA Astrophysics Data System (ADS)

    Baumann, Peter

    2015-04-01

    Big Data in the Earth sciences, the Tera- to Exabyte archives, mostly are made up from coverage data, according to ISO and OGC defined as the digital representation of some space-time varying phenomenon. Common examples include 1-D sensor timeseries, 2-D remote sensing imagery, 3-D x/y/t image timeseries and x/y/z geology data, and 4-D x/y/z/t atmosphere and ocean data. Analytics on such data requires on-demand processing of sometimes significant complexity, such as getting the Fourier transform of satellite images. As network bandwidth limits prohibit transfer of such Big Data, it is indispensable to devise protocols allowing clients to task flexible and fast processing on the server. The transatlantic EarthServer initiative, running from 2011 through 2014, has united 11 partners to establish Big Earth Data Analytics. A key ingredient has been flexibility for users to ask whatever they want, not impeded and complicated by system internals. The EarthServer answer to this is to use high-level, standards-based query languages which unify data and metadata search in a simple, yet powerful way. A second key ingredient is scalability. Without any doubt, scalability ultimately can only be achieved through parallelization. In the past, parallelizing code has been done at compile time and usually with manual intervention. The EarthServer approach is to perform a semantic-based dynamic distribution of query fragments based on network optimization and further criteria. The EarthServer platform comprises rasdaman, the pioneer and leading Array DBMS built for any-size multi-dimensional raster data, extended with support for irregular grids and general meshes; in-situ retrieval (evaluation of database queries on existing archive structures, avoiding data import and, hence, duplication); and the aforementioned distributed query processing. Additionally, Web clients for multi-dimensional data visualization are being established. Client/server interfaces are strictly based on OGC and W3C standards, in particular the Web Coverage Processing Service (WCPS), which defines a high-level coverage query language. Reviewers have attested that "With no doubt the project has been shaping the Big Earth Data landscape through the standardization activities within OGC, ISO and beyond". We present the project approach, its outcomes and impact on standardization and Big Data technology, and vistas for the future.

  4. Seeking Web-Based Information About Attention Deficit Hyperactivity Disorder: Where, What, and When

    PubMed Central

    2017-01-01

    Background Attention Deficit Hyperactivity Disorder (ADHD) is a common neurodevelopmental disorder, prevalent among 2-10% of the population. Objective The objective of this study was to describe where, what, and when people search online for topics related to ADHD. Methods Data were collected from Microsoft’s Bing search engine and from the community question and answer site, Yahoo Answers. The questions were analyzed based on keywords and using further statistical methods. Results Our results revealed that the Internet indeed constitutes a source of information for people searching the topic of ADHD, and that they search for information mostly about ADHD symptoms. Furthermore, individuals personally affected by the disorder asked 2.0 times more questions about ADHD compared with others. Questions begin when children reach 2 years of age, with an average age of 5.1 years. Most of the websites searched were not specifically related to ADHD, and the timing of searches as well as the query content differed between those prediagnosis and those postdiagnosis. Conclusions The study results shed light on the features of ADHD-related searches. Thus, they may help improve the Internet as a source of reliable information, and promote improved awareness and knowledge about ADHD as well as quality of life for populations dealing with the complex phenomena of ADHD. PMID:28432038

  5. Raising the IQ in full-text searching via intelligent querying

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kero, R.; Russell, L.; Swietlik, C.

    1994-11-01

    Current Information Retrieval (IR) technologies allow for efficient access to relevant information, provided that user-selected query terms coincide with the specific linguistic choices made by the authors whose works constitute the text-base. Therefore, the challenge is to enhance the limited searching capability of state-of-the-practice IR. This can be done either with augmented clients that overcome current server searching deficiencies, or with added capabilities that can augment searching algorithms on the servers. The technology being investigated is that of deductive databases, with a set of new techniques called cooperative answering. This technology utilizes semantic networks to allow for navigation between possible query search term alternatives. The augmented search terms are passed to an IR engine and the results can be compared. The project utilizes the OSTI Environment, Safety and Health Thesaurus to populate the domain specific semantic network and the text base of ES&H related documents from the Facility Profile Information Management System as the domain specific search space.
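
    A minimal sketch of the cooperative-answering idea, assuming a tiny thesaurus-like semantic network (a stand-in for the ES&H Thesaurus): when a term misses, related terms reachable within a few hops are proposed as augmented search terms.

      THESAURUS = {  # hypothetical broader/narrower/related links
          "groundwater contamination": ["aquifer pollution", "water pollution"],
          "water pollution": ["surface water contamination"],
      }

      def expand(term, depth=2):
          """Collect terms reachable from `term` within `depth` hops."""
          seen, frontier = {term}, [term]
          for _ in range(depth):
              frontier = [n for t in frontier for n in THESAURUS.get(t, [])
                          if n not in seen]
              seen.update(frontier)
          return seen

      print(expand("groundwater contamination"))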

  6. GBA manager: an online tool for querying low-complexity regions in proteins.

    PubMed

    Bandyopadhyay, Nirmalya; Kahveci, Tamer

    2010-01-01

    We developed GBA Manager, online software that facilitates the Graph-Based Algorithm (GBA) we proposed in our earlier work. GBA identifies the low-complexity regions (LCR) of protein sequences. GBA exploits a similarity matrix, such as BLOSUM62, to compute the complexity of the subsequences of the input protein sequence. It uses a graph-based algorithm to accurately compute the regions that have low complexities. GBA Manager is a user friendly web-service that enables online querying of protein sequences using GBA. In addition to the querying capabilities of the existing GBA algorithm, GBA Manager computes the p-values of the LCRs identified. The p-value gives an estimate of the possibility that the region appears by chance. GBA Manager presents the output in three different understandable formats. GBA Manager is freely accessible at http://bioinformatics.cise.ufl.edu/GBA/GBA.htm.

  7. The Philosopher as Parent: John Dewey's Observations of His Children's Language Development and the Development of His Thinking about Communication

    ERIC Educational Resources Information Center

    Dyehouse, Jeremiah; Manke, Krysten

    2017-01-01

    Can John Dewey's experiments at the University of Chicago's Laboratory School teach contemporary inquirers about "learning by making?" This article warrants an affirmative answer to this query. Unlike intellectual historians who trace the source of Dewey's and his colleagues' 1890s pedagogies to their cultural biases, we contend that…

  8. A Response to Jordan's (2004) "Explanatory Adequacy and Theories of Second Language Acquisition"

    ERIC Educational Resources Information Center

    Gregg, Kevin R.

    2005-01-01

    In a recent paper (Jordan, 2004), Geoff Jordan takes issue with some of my claims about second language acquisition (SLA) theory. Specifically, he queries the necessity of a property theory, and he finds my discussion of explanation unsatisfactory. In this brief reply, I try to answer his criticisms. In a brief but interesting paper, Geoff Jordan (2004:…

  9. What Is the Purpose? Reflections on Inclusion and Special Education from a Capability Perspective

    ERIC Educational Resources Information Center

    Reindal, Solveig Magnus

    2010-01-01

    This article investigated what the capability approach developed by Amartya Sen and Martha Nussbaum can contribute to the issue of inclusion as a new theoretical framework for special education. By posing the question: "What is the purpose of inclusion?", I have proposed to answer this query by investigating how the capability approach is able to…

  10. A Reply? A Response to Penny Thompson

    ERIC Educational Resources Information Center

    Doble, Peter

    2007-01-01

    Penny Thompson's "reply" to the author's article (Doble, 2005) briefly tells readers that for an answer to some of the author's queries, readers may turn to her book; for the rest, she proposes to take "the argument" further. One of the problems with her earlier article was that it had no discernible argument, so it is not easy to see how it may…

  11. Solving problems by interrogating sets of knowledge systems: Toward a theory of multiple knowledge systems

    NASA Technical Reports Server (NTRS)

    Dekorvin, Andre

    1989-01-01

    The main purpose is to develop a theory for multiple knowledge systems. A knowledge system could be a sensor or an expert system, but it must specialize in one feature. The problem is that we have an exhaustive list of possible answers to some query (such as what object is it). By collecting different feature values, in principle, it should be possible to give an answer to the query, or at least narrow down the list. Since a sensor, or for that matter an expert system, does not in most cases yield a precise value for the feature, uncertainty must be built into the model. Also, researchers must have a formal mechanism to be able to put the information together. Researchers chose to use the Dempster-Shafer approach to handle the problems mentioned above. Researchers introduce the concept of a state of recognition and point out that there is a relation between receiving updates and defining a set valued Markov Chain. Also, deciding what the value of the next set valued variable is can be phrased in terms of classical decision making theory such as minimizing the maximum regret. Other related problems are examined.

  12. Using an image-extended relational database to support content-based image retrieval in a PACS.

    PubMed

    Traina, Caetano; Traina, Agma J M; Araújo, Myrian R B; Bueno, Josiane M; Chino, Fabio J T; Razente, Humberto; Azevedo-Marques, Paulo M

    2005-12-01

    This paper presents a new Picture Archiving and Communication System (PACS), called cbPACS, which has content-based image retrieval capabilities. The cbPACS answers range and k-nearest-neighbor similarity queries, employing a relational database manager extended to support images. The images are compared through their features, which are extracted by an image-processing module and stored in the extended relational database. The database extensions were developed aiming at efficiently answering similarity queries by taking advantage of specialized indexing methods. The main concept supporting the extensions is the definition, inside the relational manager, of distance functions based on features extracted from the images. An extension to the SQL language enables the construction of an interpreter that intercepts the extended commands and translates them to standard SQL, allowing any relational database server to be used. Currently, the implemented system works on features based on the color distribution of the images, through normalized histograms as well as metric histograms. Metric histograms are invariant regarding scale, translation and rotation of images, and also to brightness transformations. The cbPACS is prepared to integrate new image features, based on texture and the shape of the main objects in the image.
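
    A minimal sketch of a feature-based range query over normalized histograms, in the spirit of cbPACS's distance-function extensions; the 4-bin histograms and L1 distance are illustrative simplifications, not the system's actual features.

      def l1(h1, h2):
          """L1 distance between two normalized histograms."""
          return sum(abs(a - b) for a, b in zip(h1, h2))

      images = {
          "img1": [0.40, 0.30, 0.20, 0.10],
          "img2": [0.38, 0.32, 0.18, 0.12],
          "img3": [0.05, 0.15, 0.30, 0.50],
      }

      def range_query(query_hist, radius):
          """Return all images whose histogram is within `radius` of the query."""
          return [name for name, h in images.items() if l1(query_hist, h) <= radius]

      print(range_query([0.40, 0.30, 0.20, 0.10], radius=0.1))  # ['img1', 'img2']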

  13. [Response of Pharmaceutical Companies to the Crisis of Post-Marketing Clinical Trials of Anti-Cancer Agents -- Results of Questionnaires to Pharmaceutical Companies].

    PubMed

    Nakajima, Toshifusa

    2016-04-01

    Investigator-oriented post-marketing clinical trials of anti-cancer agents are facing a financial crisis due to a drastic decrease in research funds from pharmaceutical companies caused by a scandal in 2013. In order to assess the balance of research funds between 2012 and 2014, we made queries to 26 companies manufacturing anti-cancer agents, and only 10 of 26 responded to our queries. A decrease in funds was observed in 5 of 10, no change in 1, an increase in 3, and no answer in 1. Companies showed a passive attitude toward carrying out doctor-oriented clinical trials of off-patent drugs or unapproved drugs under the advanced medical care B program, though some companies answered that they would proceed with approval routines for these drugs if clinical trials showed good results. Most companies declined to make comments on the activity of the Japan Agency for Medical Research and Development (AMED), but some insisted on good collaboration between AMED and pharmaceutical companies in order to improve the quality of trials. Further collaboration for this purpose is necessary among researchers, governmental administrative organs, pharmaceutical companies, patients' groups, and mass media.

  14. Air Markets Program Data (AMPD)

    EPA Pesticide Factsheets

    The Air Markets Program Data tool allows users to search EPA data to answer scientific, general, policy, and regulatory questions about industry emissions. Air Markets Program Data (AMPD) is a web-based application that allows users easy access to both current and historical data collected as part of EPA's emissions trading programs. This site allows you to create and view reports and to download emissions data for further analysis. AMPD provides a query tool so users can create custom queries of industry source emissions data, allowance data, compliance data, and facility attributes. In addition, AMPD provides interactive maps, charts, reports, and pre-packaged datasets. AMPD does not require any additional software, plug-ins, or security controls and can be accessed using a standard web browser.

  15. RCQ-GA: RDF Chain Query Optimization Using Genetic Algorithms

    NASA Astrophysics Data System (ADS)

    Hogenboom, Alexander; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay

    The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools. Fast query engines are needed for efficient querying of large amounts of data, usually represented using RDF. We focus on optimizing a special class of SPARQL queries, the so-called RDF chain queries. For this purpose, we devise a genetic algorithm called RCQ-GA that determines the order in which joins need to be performed for an efficient evaluation of RDF chain queries. The approach is benchmarked against a two-phase optimization algorithm, previously proposed in literature. The more complex a query is, the more RCQ-GA outperforms the benchmark in solution quality, execution time needed, and consistency of solution quality. When the algorithms are constrained by a time limit, the overall performance of RCQ-GA compared to the benchmark further improves.
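
    A toy genetic algorithm over join orders, showing the mechanics RCQ-GA applies to RDF chain queries; the cost model below is an invented stand-in for real join selectivities.

      import random

      random.seed(1)
      SIZES = [1000, 50, 400, 10, 800]  # hypothetical intermediate-result sizes

      def cost(order):
          # Pretend cost: running product of sizes encountered so far.
          total, running = 0, 1
          for i in order:
              running = min(running * SIZES[i], 10**9)
              total += running
          return total

      def mutate(order):
          # Swap two positions in the join order.
          a, b = random.sample(range(len(order)), 2)
          child = list(order)
          child[a], child[b] = child[b], child[a]
          return child

      pop = [random.sample(range(5), 5) for _ in range(20)]
      for _ in range(50):
          pop.sort(key=cost)                      # fitter (cheaper) orders first
          pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(10)]
      best = min(pop, key=cost)
      print(best, cost(best))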

  16. The CMS DBS query language

    NASA Astrophysics Data System (ADS)

    Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee

    2010-04-01

    The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to underlying database. We will describe the design of the query system, provide details of the language components and overview of how this component fits into the overall data discovery system architecture.

  17. WATCHMAN: A Data Warehouse Intelligent Cache Manager

    NASA Technical Reports Server (NTRS)

    Scheuermann, Peter; Shim, Junho; Vingralek, Radek

    1996-01-01

    Data warehouses store large volumes of data which are used frequently by decision support applications. Such applications involve complex queries. Query performance in such an environment is critical because decision support applications often require interactive query response time. Because data warehouses are updated infrequently, it becomes possible to improve query performance by caching sets retrieved by queries in addition to query execution plans. In this paper we report on the design of WATCHMAN, an intelligent cache manager for sets retrieved by queries, which is particularly well suited for the data warehousing environment. Our cache manager employs two novel, complementary algorithms for cache replacement and for cache admission. WATCHMAN aims at minimizing query response time, and its cache replacement policy swaps out entire retrieved sets of queries instead of individual pages. The cache replacement and admission algorithms make use of a profit metric, which considers for each retrieved set its average rate of reference, its size, and the execution cost of the associated query. We report on a performance evaluation based on the TPC-D and Set Query benchmarks. These experiments show that WATCHMAN achieves a substantial performance improvement in a decision support environment when compared to a traditional LRU replacement algorithm.
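
    A hedged sketch of profit-based admission and replacement in the spirit of WATCHMAN; the exact profit formula below (reference rate times recomputation cost over size) is an assumption consistent with the abstract, not the paper's precise metric.

      def profit(entry):
          # Assumed metric: frequently used, expensive-to-recompute, small
          # retrieved sets are the most profitable to keep.
          return entry["rate"] * entry["cost"] / entry["size"]

      cache, capacity = [], 100

      def admit(entry):
          cache.append(entry)
          # Evict whole retrieved sets, lowest profit first, until we fit.
          cache.sort(key=profit, reverse=True)
          while sum(e["size"] for e in cache) > capacity:
              evicted = cache.pop()
              if evicted is entry:      # admission control: new set may lose
                  break

      admit({"name": "q1", "rate": 0.9, "cost": 50, "size": 60})
      admit({"name": "q2", "rate": 0.1, "cost": 5, "size": 70})
      print([e["name"] for e in cache])  # q2's low profit keeps it out: ['q1']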

  18. Quantum algorithms on Walsh transform and Hamming distance for Boolean functions

    NASA Astrophysics Data System (ADS)

    Xie, Zhengwei; Qiu, Daowen; Cai, Guangya

    2018-06-01

    Walsh spectrum or Walsh transform is an alternative description of Boolean functions. In this paper, we explore quantum algorithms to approximate the absolute value of the Walsh transform W_f at a single point z_0 (i.e., |W_f(z_0)|) for n-variable Boolean functions with probability at least 8/π^2, using O(1/(|W_f(z_0)|ε)) queries, promised that the accuracy is ε, while the best known classical algorithm requires O(2^n) queries. The Hamming distance between Boolean functions is used to study the linearity testing and other important problems. We take advantage of the Walsh transform to calculate the Hamming distance between two n-variable Boolean functions f and g using O(1) queries in some cases. Then, we exploit another quantum algorithm which converts computing the Hamming distance between two Boolean functions to quantum amplitude estimation (i.e., approximate counting). If Ham(f,g) = t ≠ 0, we can approximately compute Ham(f,g) with probability at least 2/3 by combining our algorithm and the Approx-Count(f,ε) algorithm, using an expected number of Θ(√(N/(⌊εt⌋+1)) + √(t(N-t))/(⌊εt⌋+1)) queries, promised that the accuracy is ε. Moreover, our algorithm is optimal, while the exact query complexity for the above problem is Θ(N), and the query complexity with accuracy ε is O((1/ε^2)(N/(t+1))) in the classical setting, where N = 2^n. Finally, we present three exact quantum query algorithms for two promise problems on Hamming distance using O(1) queries, while any classical deterministic algorithm solving the problem uses Ω(2^n) queries.
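
    As a classical reference point for the quantities these quantum algorithms estimate, the sketch below computes W_f(z) = Σ_x (-1)^(f(x) ⊕ x·z) and the Hamming distance by brute force over all N = 2^n inputs, which is exactly the exponential cost the quantum versions avoid.

      def walsh(f, z, n):
          """W_f(z) = sum over x of (-1)^(f(x) XOR parity(x AND z))."""
          total = 0
          for x in range(2 ** n):
              dot = bin(x & z).count("1") % 2
              total += (-1) ** (f(x) ^ dot)
          return total

      def hamming(f, g, n):
          """Number of inputs on which f and g disagree."""
          return sum(f(x) != g(x) for x in range(2 ** n))

      n = 3
      f = lambda x: bin(x).count("1") % 2   # parity function
      g = lambda x: (x >> 2) & 1            # first bit
      print(walsh(f, 0b111, n), hamming(f, g, n))  # 8 4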

  19. [Answers to queries on absence of "heart related to fire" theory in the western Han Dynasty].

    PubMed

    Tian, S; Wang, J

    1995-01-01

    This paper claims that the materials in question, including the first chapter of Lu's Chun Qiu Shi Er Ji, the theory of Heart Related to Fire in Huai Nan Zi Di Xing Xun, and "Chi Di Zi kills Bai Di Zi", are all later interpolations and are, therefore, unconvincing. The author offers ten questions that conflict with the old sayings.

  20. A Survey of Kurdish Students' Sound Segment & Syllabic Pattern Errors in the Course of Learning EFL

    ERIC Educational Resources Information Center

    Mohammadi, Jahangir

    2014-01-01

    This paper is devoted to finding adequate answers to the following queries: (A) what are the segmental and syllabic pattern errors made by Kurdish students in their pronunciation? (B) Can the problematic areas in pronunciation be predicted by a systematic comparison of the sound systems of both native and target languages? (C) Can there be any…

  1. The ethics of complexity. Genetics and autism, a literature review.

    PubMed

    Hens, Kristien; Peeters, Hilde; Dierickx, Kris

    2016-04-01

    It is commonly believed that the etiology of autism is at least partly explained through genetics. Given the complexity of autism and the variability of the autistic phenotype, genetic research and counseling in this field are also complex and associated with specific ethical questions. Although the ethics of autism genetics, especially with regard to reproductive choices, has been widely discussed on public fora, an in-depth philosophical or bioethical reflection on all aspects of the theme seems to be missing. With this literature review we wanted to map the basic questions and answers that exist in the bioethical literature on autism genetics, research, counseling and reproduction, and provide suggestions as to how the discussion can proceed. We found 19 papers that fitted the description of "bioethics literature focusing on autism genetics," and analyzed their content to distill arguments and themes. We concluded that because of the complexity of autism, and the uncertainty with regard to its status, more ethical reflection is needed before definite conclusions and recommendations can be drawn. Moreover, there is a dearth of empirical bioethical studies querying the opinions of all parties, including people with autism themselves. Such empirical bioethical studies should urgently be done before bioethical conclusions regarding the aims and desirability of research into autism genes can be drawn. Also, fundamental philosophical reflection on concepts of disease should accompany research into the etiology of autism. © 2016 Wiley Periodicals, Inc.

  2. A Firefly Algorithm-based Approach for Pseudo-Relevance Feedback: Application to Medical Database.

    PubMed

    Khennak, Ilyes; Drias, Habiba

    2016-11-01

    The difficulty of disambiguating the sense of the incomplete and imprecise keywords that are extensively used in search queries has caused the failure of search systems to retrieve the desired information. One of the most powerful and promising methods to overcome this shortcoming and improve the performance of search engines is Query Expansion, whereby the user's original query is augmented by new keywords that best characterize the user's information needs and produce a more useful query. In this paper, a new Firefly Algorithm-based approach is proposed to enhance the retrieval effectiveness of query expansion while maintaining low computational complexity. In contrast to the existing literature, the proposed approach uses a Firefly Algorithm to find the best expanded query among a set of expanded query candidates. Moreover, this new approach allows the determination of the length of the expanded query empirically. Experimental results on MEDLINE, the on-line medical information database, show that our proposed approach is more effective and efficient compared to the state-of-the-art.
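
    A compact, generic firefly-algorithm sketch on a continuous toy objective, to show the attraction-and-random-walk mechanics the paper applies to scoring candidate expanded queries; the objective and parameters are arbitrary stand-ins for retrieval effectiveness.

      import math, random

      random.seed(0)

      def brightness(x):                  # stand-in fitness to maximize
          return -(x - 3.0) ** 2

      fireflies = [random.uniform(-10, 10) for _ in range(8)]
      beta0, gamma, alpha = 1.0, 0.1, 0.3

      for _ in range(100):
          for i in range(len(fireflies)):
              for j in range(len(fireflies)):
                  if brightness(fireflies[j]) > brightness(fireflies[i]):
                      # Dimmer firefly i moves toward brighter firefly j.
                      r = abs(fireflies[i] - fireflies[j])
                      beta = beta0 * math.exp(-gamma * r * r)  # attractiveness
                      fireflies[i] += (beta * (fireflies[j] - fireflies[i])
                                       + alpha * random.uniform(-0.5, 0.5))

      print(round(max(fireflies, key=brightness), 2))  # typically near 3.0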

  3. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    PubMed Central

    2011-01-01

    Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. Conclusions Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners. PMID:21682895

  4. GEMINI: a computationally-efficient search engine for large gene expression datasets.

    PubMed

    DeFreitas, Timothy; Saddiki, Hachem; Flaherty, Patrick

    2016-02-24

    Low-cost DNA sequencing allows organizations to accumulate massive amounts of genomic data and use that data to answer a diverse range of research questions. Presently, users must search for relevant genomic data using a keyword, accession number, or meta-data tag. However, in this search paradigm the form of the query - a text-based string - is mismatched with the form of the target - a genomic profile. To improve access to massive genomic data resources, we have developed a fast search engine, GEMINI, that uses a genomic profile as a query to search for similar genomic profiles. GEMINI implements a nearest-neighbor search algorithm using a vantage-point tree to store a database of n profiles and in certain circumstances achieves an [Formula: see text] expected query time in the limit. We tested GEMINI on breast and ovarian cancer gene expression data from The Cancer Genome Atlas project and show that it achieves a query time that scales as the logarithm of the number of records in practice on genomic data. In a database with 10^5 samples, GEMINI identifies the nearest neighbor in 0.05 sec compared to a brute force search time of 0.6 sec. GEMINI is a fast search engine that uses a query genomic profile to search for similar profiles in a very large genomic database. It enables users to identify similar profiles independent of sample label, data origin or other meta-data information.
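
    A hedged sketch of the vantage-point-tree search GEMINI builds on: pick a vantage point, split the remaining points by median distance, and prune a subtree whenever the triangle inequality rules it out. One-dimensional toy "profiles" stand in for expression vectors here.

      import random

      random.seed(0)

      def build(points):
          """Build a VP-tree node: (vantage, median distance, inner, outer)."""
          if not points:
              return None
          vp, rest = points[0], points[1:]
          if not rest:
              return (vp, 0.0, None, None)
          dists = sorted(abs(p - vp) for p in rest)
          mu = dists[len(dists) // 2]
          inner = [p for p in rest if abs(p - vp) < mu]
          outer = [p for p in rest if abs(p - vp) >= mu]
          return (vp, mu, build(inner), build(outer))

      def nearest(node, q, best=None):
          if node is None:
              return best
          vp, mu, inner, outer = node
          d = abs(q - vp)
          if best is None or d < abs(q - best):
              best = vp
          near, far = (inner, outer) if d < mu else (outer, inner)
          best = nearest(near, q, best)
          if abs(d - mu) < abs(q - best):   # can the other side still win?
              best = nearest(far, q, best)
          return best

      data = [random.uniform(0, 100) for _ in range(1000)]
      tree = build(data)
      q = 42.0
      assert nearest(tree, q) == min(data, key=lambda p: abs(p - q))
      print(nearest(tree, q))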

  5. Computing health quality measures using Informatics for Integrating Biology and the Bedside.

    PubMed

    Klann, Jeffrey G; Murphy, Shawn N

    2013-04-19

    The Health Quality Measures Format (HQMF) is a Health Level 7 (HL7) standard for expressing computable Clinical Quality Measures (CQMs). Creating tools to process HQMF queries in clinical databases will become increasingly important as the United States moves forward with its Health Information Technology Strategic Plan to Stages 2 and 3 of the Meaningful Use incentive program (MU2 and MU3). Informatics for Integrating Biology and the Bedside (i2b2) is one of the analytical databases used as part of the Office of the National Coordinator (ONC)'s Query Health platform to move toward this goal. Our goal is to integrate i2b2 with the Query Health HQMF architecture, to prepare for other HQMF use-cases (such as MU2 and MU3), and to articulate the functional overlap between i2b2 and HQMF. Therefore, we analyze the structure of HQMF, and then we apply this understanding to HQMF computation on the i2b2 clinical analytical database platform. Specifically, we develop a translator between two query languages, HQMF and i2b2, so that the i2b2 platform can compute HQMF queries. We use the HQMF structure of queries for aggregate reporting, which define clinical data elements and the temporal and logical relationships between them. We use the i2b2 XML format, which allows flexible querying of a complex clinical data repository in an easy-to-understand domain-specific language. The translator can represent nearly any i2b2-XML query as HQMF and execute in i2b2 nearly any HQMF query expressible in i2b2-XML. This translator is part of the freely available reference implementation of the Query Health initiative. We analyze limitations of the conversion and find it covers many, but not all, of the complex temporal and logical operators required by quality measures. HQMF is an expressive language for defining quality measures, and it will be important to understand and implement for CQM computation, in both meaningful use and population health. However, its current form might allow complexity that is intractable for current database systems (both in terms of implementation and computation). Our translator, which supports the subset of HQMF currently expressible in i2b2-XML, may represent the beginnings of a practical compromise. It is being pilot-tested in two Query Health demonstration projects, and it can be further expanded to balance computational tractability with the advanced features needed by measure developers.
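
    The real HQMF and i2b2-XML schemas are far richer than anything shown here, but the following toy translator illustrates the flavor of the mapping and why it covers "many, but not all" constructs: i2b2 ANDs a list of panels while items inside a panel are ORed, so an AND-of-ORs expression translates cleanly while, for example, an OR over ANDs does not. The concept codes are hypothetical.

      def to_panels(expr):
          """Translate a restricted boolean query into i2b2-style panels:
          panels are ANDed together; codes within a panel are ORed."""
          if isinstance(expr, str):              # a single concept code
              return [[expr]]
          op, *args = expr
          if op == 'and':                        # AND -> concatenate panels
              return [p for a in args for p in to_panels(a)]
          if op == 'or':                         # OR -> merge into one panel
              panel = []
              for a in args:
                  sub = to_panels(a)
                  if len(sub) != 1:              # OR over AND: untranslatable
                      raise ValueError("not expressible as a single panel")
                  panel.extend(sub[0])
              return [panel]
          raise ValueError(op)

      # (diabetes OR prediabetes) AND an HbA1c lab test
      print(to_panels(('and', ('or', 'ICD:E11', 'ICD:R73'), 'LOINC:4548-4')))
      # [['ICD:E11', 'ICD:R73'], ['LOINC:4548-4']]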

  6. Computing Health Quality Measures Using Informatics for Integrating Biology and the Bedside

    PubMed Central

    Murphy, Shawn N

    2013-01-01

    Background The Health Quality Measures Format (HQMF) is a Health Level 7 (HL7) standard for expressing computable Clinical Quality Measures (CQMs). Creating tools to process HQMF queries in clinical databases will become increasingly important as the United States moves forward with its Health Information Technology Strategic Plan to Stages 2 and 3 of the Meaningful Use incentive program (MU2 and MU3). Informatics for Integrating Biology and the Bedside (i2b2) is one of the analytical databases used as part of the Office of the National Coordinator (ONC)’s Query Health platform to move toward this goal. Objective Our goal is to integrate i2b2 with the Query Health HQMF architecture, to prepare for other HQMF use-cases (such as MU2 and MU3), and to articulate the functional overlap between i2b2 and HQMF. Therefore, we analyze the structure of HQMF, and then we apply this understanding to HQMF computation on the i2b2 clinical analytical database platform. Specifically, we develop a translator between two query languages, HQMF and i2b2, so that the i2b2 platform can compute HQMF queries. Methods We use the HQMF structure of queries for aggregate reporting, which define clinical data elements and the temporal and logical relationships between them. We use the i2b2 XML format, which allows flexible querying of a complex clinical data repository in an easy-to-understand domain-specific language. Results The translator can represent nearly any i2b2-XML query as HQMF and execute in i2b2 nearly any HQMF query expressible in i2b2-XML. This translator is part of the freely available reference implementation of the Query Health initiative. We analyze limitations of the conversion and find it covers many, but not all, of the complex temporal and logical operators required by quality measures. Conclusions HQMF is an expressive language for defining quality measures, and it will be important to understand and implement for CQM computation, in both meaningful use and population health. However, its current form might allow complexity that is intractable for current database systems (both in terms of implementation and computation). Our translator, which supports the subset of HQMF currently expressible in i2b2-XML, may represent the beginnings of a practical compromise. It is being pilot-tested in two Query Health demonstration projects, and it can be further expanded to balance computational tractability with the advanced features needed by measure developers. PMID:23603227

  7. Secure Skyline Queries on Cloud Platform.

    PubMed

    Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

    2017-04-01

    Outsourcing data and computation to a cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can also be used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.
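
    For readers unfamiliar with skyline queries, the sketch below computes a skyline in the clear: a point survives if no other point dominates it, i.e., is at least as good in every dimension and strictly better in one (smaller-is-better semantics assumed here). The secure protocol in the paper evaluates this kind of dominance test over encrypted values; the sketch makes no attempt at that.

      def dominates(p, q):
          """p dominates q: no worse in every dimension, strictly better in
          at least one (smaller values preferred)."""
          return (all(a <= b for a, b in zip(p, q))
                  and any(a < b for a, b in zip(p, q)))

      def skyline(points):
          return [p for p in points
                  if not any(dominates(q, p) for q in points if q != p)]

      # Hypothetical records scored as (price, distance):
      print(skyline([(3, 9), (5, 4), (4, 10), (6, 3), (7, 7)]))
      # [(3, 9), (5, 4), (6, 3)]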

  8. Distributed query plan generation using multiobjective genetic algorithm.

    PubMed

    Panicker, Shina; Kumar, T V Vijay

    2014-01-01

    A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using a single-objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with two objectives: minimizing total LPC and minimizing total CC. These objectives are simultaneously optimized using the multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single-objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability.
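
    A minimal sketch of the biobjective view the paper takes: candidate query plans are scored by a (total LPC, total CC) pair and ranked by Pareto dominance, the same non-dominated sorting that NSGA-II builds on. The plan costs below are made up, and the full NSGA-II machinery (crowding distance, crossover, mutation) is deliberately omitted.

      def non_dominated_fronts(costs):
          """Sort (LPC, CC) cost pairs into Pareto fronts; front 0 holds the
          non-dominated plans (NSGA-II's first ranking step)."""
          def dominates(a, b):
              return all(x <= y for x, y in zip(a, b)) and a != b
          n = len(costs)
          dominated = [{j for j in range(n)
                        if j != i and dominates(costs[i], costs[j])}
                       for i in range(n)]
          counts = [sum(j in d for d in dominated) for j in range(n)]
          fronts, current = [], [i for i in range(n) if counts[i] == 0]
          while current:
              fronts.append(current)
              nxt = []
              for i in current:
                  for j in dominated[i]:
                      counts[j] -= 1
                      if counts[j] == 0:
                          nxt.append(j)
              current = nxt
          return fronts

      # Hypothetical plans scored as (total LPC, total CC):
      plans = [(10, 40), (12, 30), (20, 20), (15, 35), (25, 25)]
      print(non_dominated_fronts(plans))  # [[0, 1, 2], [3, 4]]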

  9. Distributed Query Plan Generation Using Multiobjective Genetic Algorithm

    PubMed Central

    Panicker, Shina; Vijay Kumar, T. V.

    2014-01-01

    A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using a single-objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with two objectives: minimizing total LPC and minimizing total CC. These objectives are simultaneously optimized using the multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single-objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability. PMID:24963513

  10. Solving the problem of Trans-Genomic Query with alignment tables.

    PubMed

    Parker, Douglass Stott; Hsiao, Ruey-Lung; Xing, Yi; Resch, Alissa M; Lee, Christopher J

    2008-01-01

    The trans-genomic query (TGQ) problem--enabling the free query of biological information, even across genomes--is a central challenge facing bioinformatics. Solutions to this problem can alter the nature of the field, moving it beyond the jungle of data integration and expanding the number and scope of questions that can be answered. An alignment table is a binary relationship on locations (sequence segments). An important special case of alignment tables is the hit table: a table of pairs of highly similar segments produced by alignment tools like BLAST. However, alignment tables also include general binary relationships, and can represent any useful connection between sequence locations. They can be curated, and provide a high-quality queryable backbone of connections between biological information. Alignment tables thus can be a natural foundation for TGQ, as they permit a central part of the TGQ problem to be reduced to purely technical problems involving tables of locations. Key challenges in implementing alignment tables include efficient representation and indexing of sequence locations. We define a location datatype that can be incorporated naturally into common off-the-shelf database systems. We also describe an implementation of alignment tables in BLASTGRES, an extension of the open-source POSTGRESQL database system that provides indexing and operators on locations required for querying alignment tables. This paper also reviews several successful large-scale applications of alignment tables for Trans-Genomic Query. Tables with millions of alignments have been used in queries about alternative splicing, an area of genomic analysis concerning the way in which a single gene can yield multiple transcripts. Comparative genomics is a large potential application area for TGQ and alignment tables.
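
    A toy of the alignment-table idea, using Python's built-in sqlite3 in place of the paper's BLASTGRES/PostgreSQL location datatype: locations are flattened into (sequence, start, end) columns, and an alignment table is joined to an annotation table by interval overlap. The schema and rows are hypothetical.

      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
      CREATE TABLE alignment (      -- a binary relationship on locations
        q_seq TEXT, q_lo INT, q_hi INT,
        t_seq TEXT, t_lo INT, t_hi INT);
      CREATE TABLE exon (           -- annotations on target sequences
        seq TEXT, lo INT, hi INT, name TEXT);
      """)
      con.executemany("INSERT INTO alignment VALUES (?,?,?,?,?,?)",
                      [("geneA", 1, 200, "chr1", 5000, 5199),
                       ("geneA", 201, 350, "chr1", 7000, 7149)])
      con.executemany("INSERT INTO exon VALUES (?,?,?,?)",
                      [("chr1", 5100, 5250, "exon2"),
                       ("chr1", 9000, 9100, "exon9")])

      # Trans-genomic-style query: which annotated features does geneA hit?
      print(con.execute("""
          SELECT a.q_seq, e.name
          FROM alignment a JOIN exon e
            ON a.t_seq = e.seq
           AND a.t_lo <= e.hi AND e.lo <= a.t_hi
      """).fetchall())  # [('geneA', 'exon2')]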

  11. The BioPrompt-box: an ontology-based clustering tool for searching in biological databases.

    PubMed

    Corsi, Claudio; Ferragina, Paolo; Marangoni, Roberto

    2007-03-08

    High-throughput molecular biology provides new data at an incredible rate, so that the increase in the size of biological databanks is enormous and very rapid. This scenario generates severe problems not only at indexing time, where suitable algorithmic techniques for data indexing and retrieval are required, but also at query time, since a user query may produce such a large set of results that their browsing and "understanding" becomes humanly impractical. This problem is well known to the Web community, where a new generation of Web search engines is being developed, like Vivisimo. These tools organize on-the-fly the results of a user query in a hierarchy of labeled folders that ease their browsing and knowledge extraction. We investigate this approach on biological data, and propose the so-called BioPrompt-box software system, which deploys ontology-driven clustering strategies for making the searching process of biologists more efficient and effective. The BioPrompt-box (Bpb) defines a document as a biological sequence plus its associated meta-data taken from the underlying databank--like references to ontologies or to external databanks, and plain texts as comments of researchers and (title, abstracts or even body of) papers. Bpb offers several tools to customize the search and the clustering process over its indexed documents. The user can search a set of keywords within a specific field of the document schema, or can execute Blast to find documents related to homologous sequences. In both cases the search task returns a set of documents (hits) which constitute the answer to the user query. Since the number of hits may be large, Bpb clusters them into groups of homogeneous content, organized as a hierarchy of labeled clusters. The user can actually choose among several ontology-based hierarchical clustering strategies, each offering a different "view" of the returned hits. Bpb computes these views by exploiting the meta-data present within the retrieved documents such as the references to Gene Ontology, the taxonomy lineage, the organism and the keywords. Of course, the approach is flexible enough to leave room for future additions of other meta-information. The ultimate goal of the clustering process is to provide the user with several different readings of the (maybe numerous) query results and show possible hidden correlations among them, thus improving their browsing and understanding. Bpb is a powerful search engine that makes it very easy to perform complex queries over the indexed databanks (currently only UNIPROT is considered). The ontology-based clustering approach is efficient and effective, and could thus be applied successfully to larger databanks, like GenBank or EMBL.

  12. The BioPrompt-box: an ontology-based clustering tool for searching in biological databases

    PubMed Central

    Corsi, Claudio; Ferragina, Paolo; Marangoni, Roberto

    2007-01-01

    Background High-throughput molecular biology provides new data at an incredible rate, so that the increase in the size of biological databanks is enormous and very rapid. This scenario generates severe problems not only at indexing time, where suitable algorithmic techniques for data indexing and retrieval are required, but also at query time, since a user query may produce such a large set of results that their browsing and "understanding" becomes humanly impractical. This problem is well known to the Web community, where a new generation of Web search engines is being developed, like Vivisimo. These tools organize on-the-fly the results of a user query in a hierarchy of labeled folders that ease their browsing and knowledge extraction. We investigate this approach on biological data, and propose the so-called BioPrompt-box software system, which deploys ontology-driven clustering strategies for making the searching process of biologists more efficient and effective. Results The BioPrompt-box (Bpb) defines a document as a biological sequence plus its associated meta-data taken from the underlying databank – like references to ontologies or to external databanks, and plain texts as comments of researchers and (title, abstracts or even body of) papers. Bpb offers several tools to customize the search and the clustering process over its indexed documents. The user can search a set of keywords within a specific field of the document schema, or can execute Blast to find documents related to homologous sequences. In both cases the search task returns a set of documents (hits) which constitute the answer to the user query. Since the number of hits may be large, Bpb clusters them into groups of homogeneous content, organized as a hierarchy of labeled clusters. The user can actually choose among several ontology-based hierarchical clustering strategies, each offering a different "view" of the returned hits. Bpb computes these views by exploiting the meta-data present within the retrieved documents such as the references to Gene Ontology, the taxonomy lineage, the organism and the keywords. Of course, the approach is flexible enough to leave room for future additions of other meta-information. The ultimate goal of the clustering process is to provide the user with several different readings of the (maybe numerous) query results and show possible hidden correlations among them, thus improving their browsing and understanding. Conclusion Bpb is a powerful search engine that makes it very easy to perform complex queries over the indexed databanks (currently only UNIPROT is considered). The ontology-based clustering approach is efficient and effective, and could thus be applied successfully to larger databanks, like GenBank or EMBL. PMID:17430575

  13. Use of Graph Database for the Integration of Heterogeneous Biological Data.

    PubMed

    Yoon, Byoung-Ha; Kim, Seon-Kyu; Kim, Seon-Young

    2017-03-01

    Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
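
    A small, hypothetical illustration of the contrast the authors measure: in a relational store, every hop in a relationship chain costs an explicit join, whereas a graph query language expresses the traversal directly. The runnable part uses Python's sqlite3; the equivalent Cypher is shown as a string one would run against a Neo4j server, and the schema and rows are invented for the example.

      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
      CREATE TABLE drug_target (drug TEXT, gene TEXT);
      CREATE TABLE gene_disease (gene TEXT, disease TEXT);
      """)
      con.executemany("INSERT INTO drug_target VALUES (?,?)",
                      [("drugX", "GENE1"), ("drugY", "GENE2")])
      con.executemany("INSERT INTO gene_disease VALUES (?,?)",
                      [("GENE1", "diseaseA"), ("GENE2", "diseaseB")])

      # Relational style: one join per hop in the relationship chain.
      print(con.execute("""
          SELECT dt.drug, gd.disease
          FROM drug_target dt JOIN gene_disease gd ON dt.gene = gd.gene
      """).fetchall())

      # The same two-hop traversal as a Cypher pattern (for a Neo4j server):
      CYPHER = """
      MATCH (d:Drug)-[:TARGETS]->(:Gene)-[:ASSOCIATED_WITH]->(x:Disease)
      RETURN d.name, x.name
      """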

  14. Use of Graph Database for the Integration of Heterogeneous Biological Data

    PubMed Central

    Yoon, Byoung-Ha; Kim, Seon-Kyu

    2017-01-01

    Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data. PMID:28416946

  15. Sharing the Burden and Risk: An Operational Assessment of the Reserve Components in Operation Iraqi Freedom

    DTIC Science & Technology

    2016-10-01

    Manpower Data Center (DMDC) for data extracts identifying monthly deployments from September 2001 through December 2014. This data would answer questions... Manpower Data Center (DMDC) databases captured which service members were mobilized and deployed. Government history offices, lessons learned...develop MOEs and MOPs to conduct assessments. 1. Data Extracts Concurrent with engagement efforts, IDA queried the Defense Manpower Data Center (DMDC

  16. Seeking Web-Based Information About Attention Deficit Hyperactivity Disorder: Where, What, and When.

    PubMed

    Rosenblum, Sara; Yom-Tov, Elad

    2017-04-21

    Attention Deficit Hyperactivity Disorder (ADHD) is a common neurodevelopmental disorder, prevalent among 2-10% of the population. The objective of this study was to describe where, what, and when people search online for topics related to ADHD. Data were collected from Microsoft's Bing search engine and from the community question and answer site, Yahoo Answers. The questions were analyzed based on keywords and using further statistical methods. Our results revealed that the Internet indeed constitutes a source of information for people searching the topic of ADHD, and that they search for information mostly about ADHD symptoms. Furthermore, individuals personally affected by the disorder asked 2.0 times more questions about ADHD than others. Questions begin when children reach 2 years of age, with an average age of 5.1 years. Most of the websites searched were not specifically related to ADHD, and the timing of searches as well as the query content differed between prediagnosis and postdiagnosis individuals. The study results shed light on the features of ADHD-related searches. Thus, they may help improve the Internet as a source of reliable information, and promote improved awareness and knowledge about ADHD as well as quality of life for populations dealing with the complex phenomena of ADHD. ©Sara Rosenblum, Elad Yom-Tov. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 21.04.2017.

  17. Developing the personal narratives of children with complex communication needs associated with intellectual disabilities: What is the potential of Storysharing®?

    PubMed

    Bunning, Karen; Gooch, Lynsey; Johnson, Miranda

    2017-07-01

    Sharing personal experience in narrative is challenging for individuals with intellectual disabilities. The aim was to investigate the potential of the Storysharing® intervention (an innovative communication method based on personal narrative, developed to support conversations with people who have severe difficulties in communication). The study involved eleven pupil-educational supporter dyads at a special school. Storysharing® was implemented over a 15-week period. Personal narratives were captured on video pre- and post-intervention. The data were analysed for discourse and narrative. Significant differences revealed a decline in 'query-answer' sequences and an increase in supporter use of 'prompts'. After intervention, there were fewer story episodes. Narrative structure showed gains in action sequences leading to climax, and in closing elements, indicating a more complete narrative. The Storysharing® intervention appears to be associated with changes to the dyadic, personal narratives illustrating its potential. © 2016 John Wiley & Sons Ltd.

  18. Efficient processing of multiple nested event pattern queries over multi-dimensional event streams based on a triaxial hierarchical model.

    PubMed

    Xiao, Fuyuan; Aritsugi, Masayoshi; Wang, Qing; Zhang, Rong

    2016-09-01

    To support decision-making and enable efficient, sophisticated analysis of the complex event patterns that appear in big data streams from health care information systems, this paper proposes a triaxial hierarchical model. Our triaxial hierarchical model is developed by focusing on hierarchies among nested event pattern queries with an event concept hierarchy, thereby allowing us to identify the relationships among the expressions and sub-expressions of the queries extensively. We devise a cost-based heuristic by means of the triaxial hierarchical model to find an optimised query execution plan in terms of the costs of both the operators and the communications between them. According to the triaxial hierarchical model, we can also calculate how to reuse the results of the common sub-expressions in multiple queries. By integrating the optimised query execution plan with the reuse schemes, a multi-query optimisation strategy is developed to accomplish efficient processing of multiple nested event pattern queries. We present empirical studies in which the performance of the multi-query optimisation strategy was examined under various stream input rates and workloads. Specifically, the workloads of pattern queries can be used to monitor patients' conditions; experiments with varying stream input rates correspond to changes in the number of patients a system must manage, and burst input rates correspond to sudden rushes of patients requiring care. The experimental results show that our proposal improves throughput by factors of about 4 and 2 over the two related approaches in Workload 1, by factors of about 3 and 2 in Workload 2, and by a factor of about 6 over the related approach in Workload 3. These results demonstrate that our proposal can process complex queries efficiently, supporting health information systems and further decision-making. Copyright © 2016 Elsevier B.V. All rights reserved.
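
    One ingredient of the multi-query optimisation, reusing the results of common sub-expressions across queries, can be sketched with memoisation. The toy SEQ operator, event list, and timestamps below are invented stand-ins for the paper's nested pattern operators, not its triaxial model.

      from functools import lru_cache

      EVENTS = [("admit", 1), ("fever", 3), ("tachycardia", 4), ("fever", 9)]

      @lru_cache(maxsize=None)        # shared sub-patterns evaluated only once
      def eval_pattern(expr):
          if isinstance(expr, str):   # primitive event type -> its timestamps
              return tuple(t for k, t in EVENTS if k == expr)
          op, left, right = expr
          if op == "SEQ":             # left precedes right; report right's time
              lt = eval_pattern(left)
              return tuple(t for t in eval_pattern(right)
                           if any(l < t for l in lt))
          raise ValueError(op)

      # Two queries sharing the sub-pattern ("SEQ", "admit", "fever"):
      q1 = ("SEQ", ("SEQ", "admit", "fever"), "tachycardia")
      q2 = ("SEQ", ("SEQ", "admit", "fever"), "fever")
      print(eval_pattern(q1), eval_pattern(q2))   # (4,) (9,)
      print(eval_pattern.cache_info().hits, "cache hits from reuse")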

  19. Towards Big Earth Data Analytics: The EarthServer Approach

    NASA Astrophysics Data System (ADS)

    Baumann, Peter

    2013-04-01

    Big Data in the Earth sciences, the Tera- to Exabyte archives, mostly are made up from coverage data whereby the term "coverage", according to ISO and OGC, is defined as the digital representation of some space-time varying phenomenon. Common examples include 1-D sensor timeseries, 2-D remote sensing imagery, 3-D x/y/t image timeseries and x/y/z geology data, and 4-D x/y/z/t atmosphere and ocean data. Analytics on such data requires on-demand processing of sometimes significant complexity, such as getting the Fourier transform of satellite images. As network bandwidth limits prohibit transfer of such Big Data it is indispensable to devise protocols allowing clients to task flexible and fast processing on the server. The EarthServer initiative, funded by EU FP7 eInfrastructures, unites 11 partners from computer and earth sciences to establish Big Earth Data Analytics. One key ingredient is flexibility for users to ask what they want, not impeded and complicated by system internals. The EarthServer answer to this is to use high-level query languages; these have proven tremendously successful on tabular and XML data, and we extend them with a central geo data structure, multi-dimensional arrays. A second key ingredient is scalability. Without any doubt, scalability ultimately can only be achieved through parallelization. In the past, parallelizing code has been done at compile time and usually with manual intervention. The EarthServer approach is to perform a semantics-based dynamic distribution of query fragments based on network optimization and further criteria. The EarthServer platform is built on rasdaman, an Array DBMS enabling efficient storage and retrieval of any-size, any-type multi-dimensional raster data. In the project, rasdaman is being extended with several functionality and scalability features, including: support for irregular grids and general meshes; in-situ retrieval (evaluation of database queries on existing archive structures, avoiding data import and, hence, duplication); the aforementioned distributed query processing. Additionally, Web clients for multi-dimensional data visualization are being established. Client/server interfaces are strictly based on OGC and W3C standards, in particular the Web Coverage Processing Service (WCPS) which defines a high-level raster query language. We present the EarthServer project with its vision and approaches, relate it to the current state of standardization, and demonstrate it by way of large-scale data centers and their services using rasdaman.
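
    The sketch below shows what tasking server-side processing through a WCPS query can look like from a client, under stated assumptions: the endpoint URL and coverage name are placeholders, the request parameters follow the WCS 2.0 "ProcessCoverages" convention commonly used by rasdaman deployments, and the axis names (ansi, Lat, Long) vary per dataset.

      import requests  # third-party: pip install requests

      ENDPOINT = "https://example.org/rasdaman/ows"   # placeholder URL

      # Ask the server for a yearly average over a spatio-temporal slice,
      # so only one number crosses the network instead of the whole cube.
      wcps = """
      for c in (SatelliteImageTimeseries)
      return avg(c[ansi("2012-01-01":"2012-12-31"), Lat(40:41), Long(10:11)])
      """

      resp = requests.get(ENDPOINT, params={
          "service": "WCS", "version": "2.0.1",
          "request": "ProcessCoverages", "query": wcps,
      })
      print(resp.status_code, resp.text[:200])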

  20. Ontology-Driven Provenance Management in eScience: An Application in Parasite Research

    NASA Astrophysics Data System (ADS)

    Sahoo, Satya S.; Weatherly, D. Brent; Mutharaju, Raghava; Anantharam, Pramod; Sheth, Amit; Tarleton, Rick L.

    Provenance, from the French word "provenir", describes the lineage or history of a data entity. Provenance is critical information in scientific applications to verify experiment process, validate data quality and associate trust values with scientific results. Current industrial scale eScience projects require an end-to-end provenance management infrastructure. This infrastructure needs to be underpinned by formal semantics to enable analysis of large scale provenance information by software applications. Further, effective analysis of provenance information requires well-defined query mechanisms to support complex queries over large datasets. This paper introduces an ontology-driven provenance management infrastructure for biology experiment data, as part of the Semantic Problem Solving Environment (SPSE) for Trypanosoma cruzi (T.cruzi). This provenance infrastructure, called T.cruzi Provenance Management System (PMS), is underpinned by (a) a domain-specific provenance ontology called Parasite Experiment ontology, (b) specialized query operators for provenance analysis, and (c) a provenance query engine. The query engine uses a novel optimization technique based on materialized views called materialized provenance views (MPV) to scale with increasing data size and query complexity. This comprehensive ontology-driven provenance infrastructure not only allows effective tracking and management of ongoing experiments in the Tarleton Research Group at the Center for Tropical and Emerging Global Diseases (CTEGD), but also enables researchers to retrieve the complete provenance information of scientific results for publication in literature.
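
    A provenance lineage query of the kind such an infrastructure must answer can be written as a SPARQL property path; the rdflib sketch below uses a made-up ex: vocabulary in place of the Parasite Experiment ontology and does not reproduce the materialized-view optimization.

      import rdflib  # third-party: pip install rdflib

      g = rdflib.Graph()
      g.parse(data="""
      @prefix ex: <http://example.org/prov#> .
      ex:result42 ex:derivedFrom ex:sample7 .
      ex:sample7  ex:generatedBy ex:seqRun3 .
      ex:seqRun3  ex:performedBy ex:tech_jane .
      """, format="turtle")

      # Trace the complete lineage of a result with one property-path query.
      q = """
      PREFIX ex: <http://example.org/prov#>
      SELECT ?ancestor WHERE {
        ex:result42 (ex:derivedFrom|ex:generatedBy|ex:performedBy)+ ?ancestor .
      }
      """
      for row in g.query(q):
          print(row.ancestor)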

  1. Mining the SDSS SkyServer SQL queries log

    NASA Astrophysics Data System (ADS)

    Hirota, Vitor M.; Santos, Rafael; Raddick, Jordan; Thakar, Ani

    2016-05-01

    SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) astronomical catalog, provides a set of tools that allows data access for astronomers and scientific education. One of SkyServer's data access interfaces allows users to enter ad-hoc SQL statements to query the catalog. SkyServer also presents some template queries that can be used as a basis for more complex queries. This interface has logged over 330 million queries submitted since 2001. It is expected that analysis of this data can be used to investigate usage patterns, identify potential new classes of queries, find similar queries, etc., and to shed some light on how users interact with the Sloan Digital Sky Survey data and how scientists have adopted the new paradigm of e-Science, which could in turn lead to enhancements to the user interfaces and experience in general. In this paper we review some approaches to SQL query mining, apply the traditional techniques used in the literature and present lessons learned, namely, that the general text mining approach for feature extraction and clustering does not seem to be adequate for this type of data, and, most importantly, we find that this type of analysis can result in very different queries being clustered together.
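
    The "traditional techniques" referred to, treating each SQL string as a text document, vectorizing, and clustering, can be sketched in a few lines with scikit-learn; the toy queries are invented, and (as the paper's lesson learned suggests) plain TF-IDF features ignore SQL structure entirely.

      from sklearn.cluster import KMeans
      from sklearn.feature_extraction.text import TfidfVectorizer

      queries = [
          "SELECT ra, dec FROM PhotoObj WHERE r < 19",
          "SELECT ra, dec FROM PhotoObj WHERE g < 20",
          "SELECT z FROM SpecObj WHERE class = 'QSO'",
          "SELECT z FROM SpecObj WHERE zWarning = 0",
      ]
      X = TfidfVectorizer(token_pattern=r"[A-Za-z_]+").fit_transform(queries)
      labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
      print(labels)  # queries over the same table tend to co-cluster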

  2. Secure Skyline Queries on Cloud Platform

    PubMed Central

    Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

    2017-01-01

    Outsourcing data and computation to a cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can also be used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions. PMID:28883710

  3. Limits on efficient computation in the physical world

    NASA Astrophysics Data System (ADS)

    Aaronson, Scott Joel

    More than a speculative technology, quantum computing seems to challenge our most basic intuitions about how the physical world should behave. In this thesis I show that, while some intuitions from classical computer science must be jettisoned in the light of modern physics, many others emerge nearly unscathed; and I use powerful tools from computational complexity theory to help determine which are which. In the first part of the thesis, I attack the common belief that quantum computing resembles classical exponential parallelism, by showing that quantum computers would face serious limitations on a wider range of problems than was previously known. In particular, any quantum algorithm that solves the collision problem---that of deciding whether a sequence of n integers is one-to-one or two-to-one---must query the sequence Ω(n^(1/5)) times. This resolves a question that was open for years; previously no lower bound better than constant was known. A corollary is that there is no "black-box" quantum algorithm to break cryptographic hash functions or solve the Graph Isomorphism problem in polynomial time. I also show that relative to an oracle, quantum computers could not solve NP-complete problems in polynomial time, even with the help of nonuniform "quantum advice states"; and that any quantum algorithm needs Ω(2^(n/4)/n) queries to find a local minimum of a black-box function on the n-dimensional hypercube. Surprisingly, the latter result also leads to new classical lower bounds for the local search problem. Finally, I give new lower bounds on quantum one-way communication complexity, and on the quantum query complexity of total Boolean functions and recursive Fourier sampling. The second part of the thesis studies the relationship of the quantum computing model to physical reality. I first examine the arguments of Leonid Levin, Stephen Wolfram, and others who believe quantum computing to be fundamentally impossible. I find their arguments unconvincing without a "Sure/Shor separator"---a criterion that separates the already-verified quantum states from those that appear in Shor's factoring algorithm. I argue that such a separator should be based on a complexity classification of quantum states, and go on to create such a classification. Next I ask what happens to the quantum computing model if we take into account that the speed of light is finite---and in particular, whether Grover's algorithm still yields a quadratic speedup for searching a database. Refuting a claim by Benioff, I show that the surprising answer is yes. Finally, I analyze hypothetical models of computation that go even beyond quantum computing. I show that many such models would be as powerful as the complexity class PP, and use this fact to give a simple, quantum computing based proof that PP is closed under intersection. On the other hand, I also present one model---wherein we could sample the entire history of a hidden variable---that appears to be more powerful than standard quantum computing, but only slightly so.

  4. An Intelligent Pictorial Information System

    NASA Astrophysics Data System (ADS)

    Lee, Edward T.; Chang, B.

    1987-05-01

    In examining the history of computer application, we discover that early computer systems were developed primarily for applications related to scientific computation, as in weather prediction, aerospace applications, and nuclear physics applications. At this stage, the computer system served as a big calculator to perform, in the main, manipulation of numbers. Then it was found that computer systems could also be used for business applications, information storage and retrieval, word processing, and report generation. The history of computer application is summarized in Table I. The complexity of pictures makes picture processing much more difficult than number and alphanumerical processing. Therefore, new techniques, new algorithms, and above all, new pictorial knowledge, [1] are needed to overcome the limitations of existing computer systems. New frontiers in designing computer systems are the ways to handle the representation,[2,3] classification, manipulation, processing, storage, and retrieval of pictures. Especially, the ways to deal with similarity measures and the meaning of the word "approximate" and the phrase "approximate reasoning" are an important and indispensable part of an intelligent pictorial information system. [4,5] The main objective of this paper is to investigate the mathematical foundation for the effective organization and efficient retrieval of pictures in similarity-directed pictorial databases, [6] based on similarity retrieval techniques [7] and fuzzy languages [8]. The main advantage of this approach is that similar pictures are stored logically close to each other by using quantitative similarity measures. Thus, for answering queries, the amount of picture data needed to be searched can be reduced and the retrieval time can be improved. In addition, in a pictorial database, very often it is desired to find pictures (or feature vectors, histograms, etc.) that are most similar to or most dissimilar [9] to a test picture (or feature vector). Using similarity measures, one can not only store similar pictures logically or physically close to each other in order to improve retrieval or updating efficiency, but also use such similarity measures to answer fuzzy queries involving nonexact retrieval conditions. In this paper, similarity directed pictorial databases involving geometric figures, chromosome images, [10] leukocyte images, cardiomyopathy images, and satellite images [11] are presented as illustrative examples.

  5. Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.

    PubMed

    Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

    2013-08-01

    Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries, and as an integrated software package in Hive.
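
    The partitioning idea at the heart of Hadoop-GIS can be miniaturised as follows: assign each bounding box to the grid tiles it touches (the "map" side), join within each tile (the "reduce" side), and deduplicate pairs produced twice by boundary objects that span several tiles. The tile size, boxes, and names are hypothetical, and the real system's R*-tree indexing and MapReduce execution are not reproduced.

      from collections import defaultdict

      TILE = 10.0  # grid cell size

      def tiles_for(box):
          """All grid cells a bounding box (xmin, ymin, xmax, ymax) touches."""
          xmin, ymin, xmax, ymax = box
          return [(i, j)
                  for i in range(int(xmin // TILE), int(xmax // TILE) + 1)
                  for j in range(int(ymin // TILE), int(ymax // TILE) + 1)]

      def overlaps(a, b):
          return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

      def spatial_join(layer_a, layer_b):
          buckets = defaultdict(lambda: ([], []))    # "map": partition by tile
          for name, box in layer_a.items():
              for t in tiles_for(box):
                  buckets[t][0].append((name, box))
          for name, box in layer_b.items():
              for t in tiles_for(box):
                  buckets[t][1].append((name, box))
          result = set()                             # set dedups boundary pairs
          for a_objs, b_objs in buckets.values():    # "reduce": join per tile
              for na, ba in a_objs:
                  for nb, bb in b_objs:
                      if overlaps(ba, bb):
                          result.add((na, nb))
          return result

      print(spatial_join({"nucleus1": (8, 8, 12, 12)},   # crosses a boundary
                         {"vessel7": (11, 11, 25, 14)}))
      # {('nucleus1', 'vessel7')}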

  6. Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce

    PubMed Central

    Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

    2013-01-01

    Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS – a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries, and as an integrated software package in Hive. PMID:24187650

  7. Usage of the Jess Engine, Rules and Ontology to Query a Relational Database

    NASA Astrophysics Data System (ADS)

    Bak, Jaroslaw; Jedrzejek, Czeslaw; Falkowski, Maciej

    We present a prototypical implementation of a library tool, the Semantic Data Library (SDL), which integrates the Jess (Java Expert System Shell) engine, rules and ontology to query a relational database. The tool extends the functionality of previous OWL2Jess with SWRL implementations and takes full advantage of the Jess engine by separating forward and backward reasoning. The optimized integration of all these technologies is an advance over previous tools. We discuss the complexity of the query algorithm. As a demonstration of the capability of the SDL library, we execute queries using a crime ontology being developed in the Polish PPBW project.

  8. Supporting temporal queries on clinical relational databases: the S-WATCH-QL language.

    PubMed Central

    Combi, C.; Missora, L.; Pinciroli, F.

    1996-01-01

    Due to the ubiquitous and special nature of time, clinical databases in particular need dedicated temporal data types and operators. In this paper we describe S-WATCH-QL (Structured Watch Query Language), a temporal extension of SQL, the widespread query language based on the relational model. S-WATCH-QL extends the well-known SQL by the addition of: a) temporal data types that allow the storage of information with different levels of granularity; b) historical relations that can store together both instantaneous valid times and intervals; c) temporal clauses, functions and predicates that allow complex temporal queries to be defined. PMID:8947722
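
    The flavour of query such temporal extensions make convenient can be emulated in plain SQL over interval columns; the sketch below (hypothetical schema and rows, standard sqlite3, not S-WATCH-QL syntax) finds overlapping drug exposures, i.e., a temporal OVERLAPS predicate written out as comparisons.

      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
      CREATE TABLE therapy (patient TEXT, drug TEXT,
                            valid_from TEXT, valid_to TEXT);  -- ISO dates
      """)
      con.executemany("INSERT INTO therapy VALUES (?,?,?,?)", [
          ("p1", "warfarin", "2024-01-01", "2024-03-01"),
          ("p1", "aspirin",  "2024-02-15", "2024-04-01"),
          ("p2", "aspirin",  "2024-05-01", "2024-06-01"),
      ])

      # Temporal OVERLAPS emulated with comparisons: concurrent exposures.
      print(con.execute("""
          SELECT a.patient, a.drug, b.drug
          FROM therapy a JOIN therapy b
            ON a.patient = b.patient AND a.drug < b.drug
           AND a.valid_from < b.valid_to AND b.valid_from < a.valid_to
      """).fetchall())  # [('p1', 'aspirin', 'warfarin')]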

  9. Differential privacy based on importance weighting

    PubMed Central

    Ji, Zhanglong

    2014-01-01

    This paper analyzes a novel method for publishing data while still protecting privacy. The method is based on computing weights that make an existing dataset, for which there are no confidentiality issues, analogous to the dataset that must be kept private. The existing dataset may be genuine but public already, or it may be synthetic. The weights are importance sampling weights, but to protect privacy, they are regularized and have noise added. The weights allow statistical queries to be answered approximately while provably guaranteeing differential privacy. We derive an expression for the asymptotic variance of the approximate answers. Experiments show that the new mechanism performs well even when the privacy budget is small, and when the public and private datasets are drawn from different populations. PMID:24482559
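
    A numerical sketch of the mechanism as the abstract describes it: estimate importance weights that make the public sample mimic the private one, regularize them by clipping, add noise, and answer a statistical query from the weighted public data. The clipping bound and Laplace noise scale here are illustrative choices, not the paper's calibration, and the density ratio is computed from known Gaussians rather than estimated.

      import numpy as np

      rng = np.random.default_rng(0)
      private = rng.normal(1.0, 1.0, 2000)   # sensitive data (never released)
      public  = rng.normal(0.0, 1.2, 2000)   # public data, other population

      def dens(x, mu, sd):                   # Gaussian density
          return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

      # Importance weights: density ratio private/public on the public sample.
      w = dens(public, 1.0, 1.0) / dens(public, 0.0, 1.2)
      W = 5.0                                # regularize: clip the weights
      w = np.clip(w, 0.0, W)
      epsilon = 1.0                          # privacy budget (illustrative)
      w_noisy = w + rng.laplace(scale=W / epsilon, size=w.size)

      # Answer "what fraction of the private data exceeds 1.5?" approximately.
      q = public > 1.5
      print("weighted estimate :", np.sum(w_noisy * q) / np.sum(w_noisy))
      print("true private value:", np.mean(private > 1.5))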

  10. A Semantic Basis for Proof Queries and Transformations

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen W.; Luth, Christoph

    2013-01-01

    We extend the query language PrQL, designed for inspecting machine representations of proofs, to also allow transformation of proofs. PrQL natively supports hiproofs which express proof structure using hierarchically nested labelled trees, which we claim is a natural way of taming the complexity of huge proofs. Query-driven transformations enable manipulation of this structure, in particular, to transform proofs produced by interactive theorem provers into forms that assist their understanding, or that could be consumed by other tools. In this paper we motivate and define basic transformation operations, using an abstract denotational semantics of hiproofs and queries. This extends our previous semantics for queries based on syntactic tree representations. We define update operations that add and remove sub-proofs, and manipulate the hierarchy to group and ungroup nodes. We show that

  11. Shedding Light on Thirteen Years of Darkness: Content Analysis of Articles Pertaining to Transgender Issues in Marriage/Couple and Family Therapy Journals

    ERIC Educational Resources Information Center

    Blumer, Markie L. C.; Green, Mary S.; Knowles, Sarah J.; Williams, April

    2012-01-01

    What is the extent to which marriage/couple and family therapy (M/CFT) journals address transgender issues and how many of them say they are inclusive of transgender persons when they are not? To answer these queries, a content analysis was conducted on articles published in M/CFT literature from 1997 through 2009. Of the 10,739 articles examined…

  12. In the Jungle of Astronomical On--line Data Services

    NASA Astrophysics Data System (ADS)

    Egret, D.

    The author tried to survive in the jungle of astronomical on--line data services. In order to find efficient answers to common scientific data retrieval requests, he had to collect many pieces of information, in order to formulate typical user scenarios, and try them against a number of different data bases, catalogue services, or information systems. He discovered soon how frustrating treasure coffers may be when their keys are not available, but he realized also that nice widgets and gadgets are of no help when the information is not there. And, before long, he knew he would have to navigate through several systems because no one was yet offering a general answer to all his questions. I will present examples of common user scenarios and show how they were tested against a number of services. I will propose some elements of classification which should help the end-user to evaluate how adequate the different services may be for providing satisfying answers to specific queries. For that, many aspects of the user interaction will be considered: documentation, access, query formulation, functionalities, qualification of the data, overall efficiency, etc. I will also suggest possible improvements to the present situation: the first of them being to encourage system managers to increase collaboration between one another, for the benefit of the whole astronomical community. The subjective review I will present, is based on publicly available astronomical on--line services from the U.S. and from Europe, most of which (excepting the newcomers) were described in ``Databases and On-Line Data in Astronomy", (Albrecht & Egret, eds, 1991): this includes databases (such as NED and Simbad ), catalog services ( StarCat , DIRA , XCatScan , etc.), and information systems ( ADS and ESIS ).

  13. Multi-INT Complex Event Processing using Approximate, Incremental Graph Pattern Search

    DTIC Science & Technology

    2012-06-01

    Compares approximate, incremental graph pattern search with SPARQL queries. [Figure residue; recoverable content: "Initial Performance Comparisons" (09/18/11) - total execution time for 10 executions each of 5 random pattern searches in synthetic data sets of 1,000 to 100,000 RDF triples, time in seconds, for the graph pattern algorithm versus SPARQL queries.]

  14. Graph-Based Weakly-Supervised Methods for Information Extraction & Integration

    ERIC Educational Resources Information Center

    Talukdar, Partha Pratim

    2010-01-01

    The variety and complexity of potentially-related data resources available for querying--webpages, databases, data warehouses--has been growing ever more rapidly. There is a growing need to pose integrative queries "across" multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse…

  15. Enabling complex queries to drug information sources through functional composition.

    PubMed

    Peters, Lee; Mortensen, Jonathan; Nguyen, Thang; Bodenreider, Olivier

    2013-01-01

    Our objective was to enable an end-user to create complex queries to drug information sources through functional composition, by creating sequences of functions from application program interfaces (APIs) to drug terminologies. The development of a functional composition model seeks to link functions from two distinct APIs. An ontology was developed using Protégé to model the functions of the RxNorm and NDF-RT APIs by describing the semantics of their input and output. A set of rules was developed to define the interoperable conditions for functional composition. The operational definition of interoperability between function pairs is established by executing the rules on the ontology. We illustrate that the functional composition model supports common use cases, including checking interactions for RxNorm drugs and deploying allergy lists defined in reference to drug properties in NDF-RT. This model supports the RxMix application (http://mor.nlm.nih.gov/RxMix/), an application we developed for enabling complex queries to the RxNorm and NDF-RT APIs.
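
    The composition model can be sketched generically: each API function is annotated with the semantic type it consumes and produces, and a rule declares a pair composable when the output type of one matches the input type of the next. The function names below echo RxNorm-style calls but are toy stand-ins with invented signatures and canned data, not the actual RxNorm/NDF-RT APIs.

      from dataclasses import dataclass
      from typing import Callable

      @dataclass
      class ApiFn:
          name: str
          in_type: str        # semantic type consumed
          out_type: str       # semantic type produced
          fn: Callable

      # Toy stand-ins for terminology-API calls (canned lookup tables).
      find_rxcui  = ApiFn("findRxcuiByName", "drug_name", "rxcui",
                          lambda name: {"aspirin": "1191"}.get(name))
      get_classes = ApiFn("getDrugClasses", "rxcui", "drug_class",
                          lambda cui: {"1191": ["NSAID"]}.get(cui, []))

      def composable(f, g):
          """Interoperability rule: g may follow f if the types line up."""
          return f.out_type == g.in_type

      def compose(*fns):
          assert all(composable(a, b) for a, b in zip(fns, fns[1:]))
          def pipeline(x):
              for f in fns:
                  x = f.fn(x)
              return x
          return pipeline

      drug_to_classes = compose(find_rxcui, get_classes)
      print(drug_to_classes("aspirin"))  # ['NSAID']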

  16. Saying What You're Looking For: Linguistics Meets Video Search.

    PubMed

    Barrett, Daniel Paul; Barbu, Andrei; Siddharth, N; Siskind, Jeffrey Mark

    2016-10-01

    We present an approach to searching large video corpora for clips which depict a natural-language query in the form of a sentence. Compositional semantics is used to encode subtle meaning differences lost in other approaches, such as the difference between two sentences which have identical words but entirely different meaning: The person rode the horse versus The horse rode the person. Given a sentential query and a natural-language parser, we produce a score indicating how well a video clip depicts that sentence for each clip in a corpus and return a ranked list of clips. Two fundamental problems are addressed simultaneously: detecting and tracking objects, and recognizing whether those tracks depict the query. Because both tracking and object detection are unreliable, our approach uses the sentential query to focus the tracker on the relevant participants and ensures that the resulting tracks are described by the sentential query. While most earlier work was limited to single-word queries which correspond to either verbs or nouns, we search for complex queries which contain multiple phrases, such as prepositional phrases, and modifiers, such as adverbs. We demonstrate this approach by searching for 2,627 naturally elicited sentential queries in 10 Hollywood movies.

  17. Secure quantum private information retrieval using phase-encoded queries

    NASA Astrophysics Data System (ADS)

    Olejnik, Lukasz

    2011-08-01

    We propose a quantum solution to the classical private information retrieval (PIR) problem, which allows one to query a database in a private manner. The protocol offers privacy thresholds and allows the user to obtain information from a database in a way that offers the potential adversary, in this model the database owner, no possibility of deterministically establishing the query contents. This protocol may also be viewed as a solution to the symmetrically private information retrieval problem in that it can offer database security (inability for a querying user to steal its contents). Compared to classical solutions, the protocol offers substantial improvement in terms of communication complexity. In comparison with the recent quantum private queries [Phys. Rev. Lett. 100, 230502 (2008)] protocol, it is more efficient in terms of communication complexity and the number of rounds, while offering a clear privacy parameter. We discuss the security of the protocol and analyze its strengths and conclude that using this technique makes it challenging to obtain the unconditional (in the information-theoretic sense) privacy degree; nevertheless, in addition to being simple, the protocol still offers a privacy level. The oracle used in the protocol is inspired both by the classical computational PIR solutions as well as the Deutsch-Jozsa oracle.

  18. Secure quantum private information retrieval using phase-encoded queries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Olejnik, Lukasz

    We propose a quantum solution to the classical private information retrieval (PIR) problem, which allows one to query a database in a private manner. The protocol offers privacy thresholds and allows the user to obtain information from a database in a way that offers the potential adversary, in this model the database owner, no possibility of deterministically establishing the query contents. This protocol may also be viewed as a solution to the symmetrically private information retrieval problem in that it can offer database security (inability for a querying user to steal its contents). Compared to classical solutions, the protocol offers substantial improvement in terms of communication complexity. In comparison with the recent quantum private queries [Phys. Rev. Lett. 100, 230502 (2008)] protocol, it is more efficient in terms of communication complexity and the number of rounds, while offering a clear privacy parameter. We discuss the security of the protocol and analyze its strengths and conclude that using this technique makes it challenging to obtain the unconditional (in the information-theoretic sense) privacy degree; nevertheless, in addition to being simple, the protocol still offers a privacy level. The oracle used in the protocol is inspired both by the classical computational PIR solutions as well as the Deutsch-Jozsa oracle.

  19. Test-state approach to the quantum search problem

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sehrawat, Arun; Nguyen, Le Huy; Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore 117597

    2011-05-15

    The search for 'a quantum needle in a quantum haystack' is a metaphor for the problem of finding out which one of a permissible set of unitary mappings - the oracles - is implemented by a given black box. Grover's algorithm solves this problem with quadratic speedup as compared with the analogous search for 'a classical needle in a classical haystack'. Since the outcome of Grover's algorithm is probabilistic - it gives the correct answer with high probability, not with certainty - the answer requires verification. For this purpose we introduce specific test states, one for each oracle. These test states can also be used to realize 'a classical search for the quantum needle' which is deterministic - it always gives a definite answer after a finite number of steps - and 3.41 times as fast as the purely classical search. Since the test-state search and Grover's algorithm look for the same quantum needle, the average number of oracle queries of the test-state search is the classical benchmark for Grover's algorithm.
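
    For context on the scale of the speedup discussed above (standard textbook figures, not taken from this record), the oracle-query counts for unstructured search over N items are:

    ```latex
    \[
    \underbrace{\tfrac{N}{2}}_{\text{classical average}}
    \qquad \text{vs.} \qquad
    \underbrace{\big\lceil \tfrac{\pi}{4}\sqrt{N} \,\big\rceil}_{\text{Grover iterations}}
    \]
    % For N = 10^6: about 500,000 classical queries versus roughly 785 Grover
    % iterations, before any verification such as the test states proposed here.
    ```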

  20. Developing A Web-based User Interface for Semantic Information Retrieval

    NASA Technical Reports Server (NTRS)

    Berrios, Daniel C.; Keller, Richard M.

    2003-01-01

    While there are now a number of languages and frameworks that enable computer-based systems to search stored data semantically, the optimal design for effective user interfaces for such systems is still unclear. Such interfaces should mask unnecessary query detail from users, yet still allow them to build queries of arbitrary complexity without significant restrictions. We developed a user interface supporting semantic query generation for SemanticOrganizer, a tool used by scientists and engineers at NASA to construct networks of knowledge and data. Through this interface users can select node types, node attributes and node links to build ad-hoc semantic queries for searching the SemanticOrganizer network.

  1. Array Processing in the Cloud: the rasdaman Approach

    NASA Astrophysics Data System (ADS)

    Merticariu, Vlad; Dumitru, Alex

    2015-04-01

    The multi-dimensional array data model is gaining more and more attention when dealing with Big Data challenges in a variety of domains such as climate simulations, geographic information systems, medical imaging or astronomical observations. Solutions provided by classical Big Data tools such as Key-Value Stores and MapReduce, as well as traditional relational databases, proved to be limited in domains associated with multi-dimensional data. This problem has been addressed by the field of array databases, in which systems provide database services for raster data without imposing limitations on the number of dimensions that a dataset can have. Examples of datasets commonly handled by array databases include 1-dimensional sensor data, 2-D satellite imagery, 3-D x/y/t image time series, x/y/z geophysical voxel data, and 4-D x/y/z/t weather data; in astrophysics, such datasets can grow as large as simulations of the whole universe. rasdaman is a well-established array database which implements many optimizations for dealing with large data volumes and operation complexity. The latest among these is intra-query parallelization support: a network of machines collaborates to answer a single array database query by dividing it into independent sub-queries sent to different servers. This enables massive processing speed-ups, which promise solutions to research challenges on multi-Petabyte data cubes. Several correlated factors influence the speedup that intra-query parallelisation brings: the number of servers, the capabilities of each server, the quality of the network, the availability of the data to the server that needs it in order to compute the result, and more. In the effort of adapting the engine to cloud processing patterns, two main components have been identified: one that handles communication and gathers information about the arrays residing on every server, and a processing unit responsible for dividing work among available nodes and executing operations on local data. The federation daemon collects and stores statistics from the other network nodes and provides real-time updates about local changes. Information exchanged includes available datasets, CPU load and memory usage per host. The processing component is represented by the rasdaman server. Using information from the federation daemon, it breaks queries into sub-queries to be executed on peer nodes, ships them, and assembles the intermediate results. Thus, we define a rasdaman network node as a pair of a federation daemon and a rasdaman server. Any node can receive a query and will subsequently act as that query's dispatcher, so all peers are at the same level and there is no single point of failure. Should a node become inaccessible, the peers recognize this and no longer consider it for distribution; conversely, a peer can join the network at any time. To assess the feasibility of our approach, we deployed a rasdaman network in the Amazon Elastic Cloud environment on 1001 nodes, and observed that this feature can greatly increase the performance and scalability of the system, offering a large throughput of processed data.
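
    A minimal sketch of the intra-query parallelization pattern described above (our own illustration; rasdaman's actual query language and federation protocol are not reproduced here): the dispatching node partitions the array domain into tiles, ships one sub-query per peer, and merges the partial aggregates.

    ```python
    # Split an aggregate over a large 2-D array into per-node sub-queries.
    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    def sub_query(tile: np.ndarray) -> float:
        # Each peer evaluates the aggregate on its local tile.
        return float(tile.sum())

    def parallel_query(array: np.ndarray, n_nodes: int) -> float:
        tiles = np.array_split(array, n_nodes, axis=0)  # partition the domain
        with ThreadPoolExecutor(max_workers=n_nodes) as pool:
            partials = list(pool.map(sub_query, tiles))  # one sub-query per peer
        return sum(partials)  # the dispatching node assembles the result

    data = np.random.rand(4000, 4000)
    assert np.isclose(parallel_query(data, 8), data.sum())
    ```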

  2. Genomes as geography: using GIS technology to build interactive genome feature maps

    PubMed Central

    Dolan, Mary E; Holden, Constance C; Beard, M Kate; Bult, Carol J

    2006-01-01

    Background Many commonly used genome browsers display sequence annotations and related attributes as horizontal data tracks that can be toggled on and off according to user preferences. Most genome browsers use only simple keyword searches and limit the display of detailed annotations to one chromosomal region of the genome at a time. We have employed concepts, methodologies, and tools that were developed for the display of geographic data to develop a Genome Spatial Information System (GenoSIS) for displaying genomes spatially, and interacting with genome annotations and related attribute data. In contrast to the paradigm of horizontally stacked data tracks used by most genome browsers, GenoSIS uses the concept of registered spatial layers composed of spatial objects for integrated display of diverse data. In addition to basic keyword searches, GenoSIS supports complex queries, including spatial queries, and dynamically generates genome maps. Our adaptation of the geographic information system (GIS) model in a genome context supports spatial representation of genome features at multiple scales with a versatile and expressive query capability beyond that supported by existing genome browsers. Results We implemented an interactive genome sequence feature map for the mouse genome in GenoSIS, an application that uses ArcGIS, a commercially available GIS software system. The genome features and their attributes are represented as spatial objects and data layers that can be toggled on and off according to user preferences or displayed selectively in response to user queries. GenoSIS supports the generation of custom genome maps in response to complex queries about genome features based on both their attributes and locations. Our example application of GenoSIS to the mouse genome demonstrates the powerful visualization and query capability of mature GIS technology applied in a novel domain. Conclusion Mapping tools developed specifically for geographic data can be exploited to display, explore and interact with genome data. The approach we describe here is organism independent and is equally useful for linear and circular chromosomes. One of the unique capabilities of GenoSIS compared to existing genome browsers is the capacity to generate genome feature maps dynamically in response to complex attribute and spatial queries. PMID:16984652
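
    On a linear chromosome, the core of such a spatial query reduces to combining an attribute predicate with an interval-overlap predicate. The following sketch uses made-up features, not the GenoSIS/ArcGIS implementation:

    ```python
    # Attribute + spatial query over genome features (hypothetical data).
    features = [
        {"name": "geneA", "type": "gene", "chrom": "chr1", "start": 1000, "end": 5000},
        {"name": "snp17", "type": "SNP",  "chrom": "chr1", "start": 3500, "end": 3501},
        {"name": "geneB", "type": "gene", "chrom": "chr2", "start": 2000, "end": 9000},
    ]

    def spatial_query(chrom, win_start, win_end, feature_type=None):
        return [f for f in features
                if f["chrom"] == chrom
                and f["start"] < win_end and f["end"] > win_start  # intervals overlap
                and (feature_type is None or f["type"] == feature_type)]

    print(spatial_query("chr1", 3000, 4000))          # both chr1 features
    print(spatial_query("chr1", 3000, 4000, "gene"))  # geneA only
    ```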

  3. Monitoring of bacteria growth using a wireless, remote query resonant-circuit sensor: application to environmental sensing

    NASA Technical Reports Server (NTRS)

    Ong, K. G.; Wang, J.; Singh, R. S.; Bachas, L. G.; Grimes, C. A.; Daunert, S. (Principal Investigator)

    2001-01-01

    A new technique is presented for in-vivo remote query measurement of the complex permittivity spectra of a biological culture solution. A sensor comprising a printed inductor-capacitor resonant circuit is placed within the culture solution of interest, with the impedance spectrum of the sensor measured using a remotely located loop antenna; the complex permittivity spectra of the culture are calculated from the measured impedance spectrum. The remote query nature of the sensor platform enables, for example, the in-vivo real-time monitoring of bacteria or yeast growth from within sealed opaque containers. The wireless monitoring technique does not require a specific alignment between sensor and antenna. Results are presented for studies conducted on laboratory strains of Bacillus subtilis, Escherichia coli JM109, Pseudomonas putida and Saccharomyces cerevisiae.

  4. Seismic Constraints on Shallow Crustal Processes at the East Pacific Rise.

    DTIC Science & Technology

    1994-02-01

    also put me up in his house and introduced me to the local microbreweries. Will Wilcock provided the codes for attenuation tomography used in Chapter 4... and also kept the internet busy answering all of my questions. John Collins has patiently responded to all my queries about reflectivity. Through the... Gerard Fryer. Chapter 4 is in press in Geophysical Research Letters, with co-authors Will Wilcock and G.M. Purdy. I plan on submitting a revised

  5. Insulation insomnia: a cure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heyman, M.

    1978-09-01

    Some answers to questions are provided on home insulation. The first asks about measures to cut home energy use, followed by queries on the kinds of insulation available and their differences, a description of the R Factor, selecting the best kind, how much insulation to install and the money it will save, fire safety properties, the use of a vapor barrier, the plugging of attic vents in the winter, and installation information. Some research projects on fire safety, properties, and installation are reviewed.

  6. Molecular implementation of simple logic programs.

    PubMed

    Ran, Tom; Kaplan, Shai; Shapiro, Ehud

    2009-10-01

    Autonomous programmable computing devices made of biomolecules could interact with a biological environment and be used in future biological and medical applications. Biomolecular implementations of finite automata and logic gates have already been developed. Here, we report an autonomous programmable molecular system based on the manipulation of DNA strands that is capable of performing simple logical deductions. Using molecular representations of facts such as Man(Socrates) and rules such as Mortal(X) <-- Man(X) (Every Man is Mortal), the system can answer molecular queries such as Mortal(Socrates)? (Is Socrates Mortal?) and Mortal(X)? (Who is Mortal?). This biomolecular computing system compares favourably with previous approaches in terms of expressive power, performance and precision. A compiler translates facts, rules and queries into their molecular representations and subsequently operates a robotic system that assembles the logical deductions and delivers the result. This prototype is the first simple programming language with a molecular-scale implementation.
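
    A software analogue of the deduction the molecular system performs (an illustrative sketch of forward chaining, not the DNA implementation) makes the query semantics concrete:

    ```python
    # Facts Man(Socrates), Man(Plato); rule Mortal(X) <- Man(X).
    facts = {("Man", "Socrates"), ("Man", "Plato")}
    rules = [(("Mortal", "X"), ("Man", "X"))]  # (head, body), X is a variable

    def forward_chain(facts, rules):
        """Apply every rule until no new unary facts can be derived."""
        derived = set(facts)
        changed = True
        while changed:
            changed = False
            for (head_pred, _), (body_pred, _) in rules:
                for pred, arg in list(derived):
                    if pred == body_pred and (head_pred, arg) not in derived:
                        derived.add((head_pred, arg))
                        changed = True
        return derived

    kb = forward_chain(facts, rules)
    print(("Mortal", "Socrates") in kb)               # Mortal(Socrates)? -> True
    print(sorted(a for p, a in kb if p == "Mortal"))  # Mortal(X)? -> who is mortal
    ```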

  7. SPARQL-enabled identifier conversion with Identifiers.org

    PubMed Central

    Wimalaratne, Sarala M.; Bolleman, Jerven; Juty, Nick; Katayama, Toshiaki; Dumontier, Michel; Redaschi, Nicole; Le Novère, Nicolas; Hermjakob, Henning; Laibe, Camille

    2015-01-01

    Motivation: On the semantic web, in life sciences in particular, data is often distributed via multiple resources. Each of these sources is likely to use their own International Resource Identifier for conceptually the same resource or database record. The lack of correspondence between identifiers introduces a barrier when executing federated SPARQL queries across life science data. Results: We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate a plurality of identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier integration approach by answering queries across major producers of life science Linked Data. Availability and implementation: The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql. Contact: sarala@ebi.ac.uk PMID:25638809
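
    The conversion idea can be pictured as pattern-driven URI expansion (a sketch; the URI patterns below are illustrative examples, not the Registry's actual contents):

    ```python
    # Expand one local identifier into the variant URIs used by different datasets,
    # so a federated query can join records that name the same entity differently.
    registry_patterns = [
        "http://identifiers.org/uniprot/{id}",
        "http://purl.uniprot.org/uniprot/{id}",
        "http://www.uniprot.org/uniprot/{id}",
    ]

    def identifier_variants(local_id: str) -> list[str]:
        return [pattern.format(id=local_id) for pattern in registry_patterns]

    print(identifier_variants("P12345"))  # candidates to match across endpoints
    ```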

  8. Arenani: pointing and information query system for object beyond your reach

    NASA Astrophysics Data System (ADS)

    Adachi, Mariko; Sakamoto, Kunio

    2008-03-01

    The authors developed a prototype information query system. It is easy to get information about an object within your reach, but it is troublesome to do the same when the object is far away. If someone is around you, you can ask an easy question with a finger pointing: "What is that?" Our developed system realizes this approach using information technologies. The system consists of a laser pointer and transmitter and receiver units for optical communication. The laser pointer is used for pointing at an object. Moreover, the laser light is modulated to send the user's identification (ID) code, identifying who asks a question. Each object has a receiver for laser-light communication and sends the user's identification to a main computer. After pointing at an object, the questioner receives an answer through a wireless information network, for example as an email on a cellular phone.

  9. SPARQL-enabled identifier conversion with Identifiers.org.

    PubMed

    Wimalaratne, Sarala M; Bolleman, Jerven; Juty, Nick; Katayama, Toshiaki; Dumontier, Michel; Redaschi, Nicole; Le Novère, Nicolas; Hermjakob, Henning; Laibe, Camille

    2015-06-01

    On the semantic web, in life sciences in particular, data is often distributed via multiple resources. Each of these sources is likely to use their own International Resource Identifier for conceptually the same resource or database record. The lack of correspondence between identifiers introduces a barrier when executing federated SPARQL queries across life science data. We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate a plurality of identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier integration approach by answering queries across major producers of life science Linked Data. The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql. © The Author 2015. Published by Oxford University Press.

  10. 1927 reference in the new millennium: where is the Automat?

    PubMed Central

    Worel, Sunny Lynn; Rethlefsen, Melissa Lyle

    2003-01-01

    James Ballard, director at the Boston Medical Library, tracked questions he received at the reference desk in 1927 to recognize the trend of queries and to record the information for future use. He presented a paper on reference services that listed sixty of his reference questions at the Thirtieth Annual Meeting of the Medical Library Association (MLA) in 1927. During a two-month period in 2001, the authors examined Ballard's questions by attempting to answer them with print sources from the 1920s and with the Internet. The searchers answered 85% of the questions with the Internet and 80% with 1920s reference sources. The authors compared Internet and 1920s print resources for practical use. When answering the questions with 1920s resources, the searchers rediscovered a time in health sciences libraries when there was no Ulrich's Periodicals Directory, no standardized subject headings, and no comprehensive listings of available books. Yet, the authors found many of the 1920s reference materials to be quite useful and often multifunctional. The authors recorded observations regarding the impact of automation on answering reference questions. Even though the Internet has changed the outward appearance of reference services, many things remain the same. PMID:12883575

  11. Do not hesitate to use Tversky-and other hints for successful active analogue searches with feature count descriptors.

    PubMed

    Horvath, Dragos; Marcou, Gilles; Varnek, Alexandre

    2013-07-22

    This study is an exhaustive analysis of the neighborhood behavior over a large coherent data set (ChEMBL target/ligand pairs of known Ki, for 165 targets with >50 associated ligands each). It focuses on similarity-based virtual screening (SVS) success defined by the ascertained optimality index. This is a weighted compromise between purity and retrieval rate of active hits in the neighborhood of an active query. One key issue addressed here is the impact of Tversky asymmetric weighting of query vs candidate features (represented as integer-value ISIDA colored fragment/pharmacophore triplet count descriptor vectors). Nearly three-quarters of a million independent SVS runs showed that Tversky scores with a strong bias in favor of query-specific features are, by far, the most successful and the least failure-prone out of a set of nine other dissimilarity scores. These include classical Tanimoto, which failed to defend its privileged status in practical SVS applications. Tversky performance is not significantly conditioned by tuning of its bias parameter α. Both initial "guesses" of α = 0.9 and 0.7 were more successful than Tanimoto (in its turn, better than Euclid). Tversky was eventually tested in exhaustive similarity searching within the library of 1.6 M commercial + bioactive molecules at http://infochim.u-strasbg.fr/webserv/VSEngine.html, comparing favorably to Tanimoto in terms of "scaffold hopping" propensity. Therefore, it should be used at least as often as, and perhaps in parallel to, Tanimoto in SVS. Analysis with respect to query subclasses highlighted relationships of query complexity (simply expressed in terms of pharmacophore pattern counts) and/or target nature vs SVS success likelihood. SVS runs using more complex queries are more robust with respect to the choice of their operational premises (descriptors, metric). Yet, they are best handled by "pro-query" Tversky scores at α > 0.5. Among simpler queries, one may distinguish between "growable" queries (allowing for active analogs with additional features) and a few "conservative" queries not allowing any growth. These (typically bioactive amine transporter ligands) form the specific application domain of "pro-candidate" biased Tversky scores at α < 0.5.
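
    For reference, the asymmetric score at the heart of this record is the Tversky index, shown here in its standard two-parameter form (a textbook formula; reading the study's single bias parameter as the pair (α, 1−α) is our assumption):

    ```latex
    \[
    T_{\alpha,\beta}(Q, C) =
      \frac{|Q \cap C|}{|Q \cap C| + \alpha\,|Q \setminus C| + \beta\,|C \setminus Q|}
    \]
    % With beta = 1 - alpha, choosing alpha > 0.5 penalizes features present in the
    % query Q but absent from the candidate C ("pro-query" bias); alpha < 0.5 gives
    % the "pro-candidate" bias. Classical Tanimoto is the symmetric case
    % alpha = beta = 1, i.e. |Q cap C| / |Q cup C|.
    ```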

  12. Constraint-based Data Mining

    NASA Astrophysics Data System (ADS)

    Boulicaut, Jean-Francois; Jeudy, Baptiste

    Knowledge Discovery in Databases (KDD) is a complex interactive process. The promising theoretical framework of inductive databases considers this to be essentially a querying process. It is enabled by a query language which can deal either with raw data or with patterns which hold in the data. Mining patterns turns out to be the so-called inductive query evaluation process, for which constraint-based Data Mining techniques have to be designed. An inductive query specifies declaratively the desired constraints, and algorithms are used to compute the patterns satisfying the constraints in the data. We survey important results of this active research domain. This chapter emphasizes a real breakthrough for hard problems concerning local pattern mining under various constraints, and it points out current directions of research as well.
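
    A minimal sketch of evaluating an inductive query (our own toy example): enumerate the itemsets satisfying a conjunction of an anti-monotonic constraint (minimum support) and a syntactic constraint (must contain a given item). Real constraint-based miners prune the search space instead of enumerating it exhaustively.

    ```python
    from itertools import combinations

    transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    items = sorted(set().union(*transactions))

    def support(itemset):
        return sum(itemset <= t for t in transactions)  # transactions containing it

    def inductive_query(min_support=2, must_contain="a"):
        results = []
        for k in range(1, len(items) + 1):
            for candidate in combinations(items, k):
                s = set(candidate)
                if must_contain in s and support(s) >= min_support:
                    results.append((candidate, support(s)))
        return results

    print(inductive_query())  # all patterns satisfying both constraints
    ```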

  13. Bat-Inspired Algorithm Based Query Expansion for Medical Web Information Retrieval.

    PubMed

    Khennak, Ilyes; Drias, Habiba

    2017-02-01

    With the increasing amount of medical data available on the Web, looking for health information has become one of the most widely searched topics on the Internet. Patients and people of several backgrounds are now using Web search engines to acquire medical information, including information about a specific disease, medical treatment or professional advice. Nonetheless, due to a lack of medical knowledge, many laypeople have difficulties in forming appropriate queries to articulate their inquiries, which renders their search queries imprecise due to the use of unclear keywords. The use of these ambiguous and vague queries to describe the patients' needs has resulted in a failure of Web search engines to retrieve accurate and relevant information. One of the most natural and promising methods to overcome this drawback is Query Expansion. In this paper, an original approach based on the Bat Algorithm is proposed to improve the retrieval effectiveness of query expansion in the medical field. In contrast to the existing literature, the proposed approach uses the Bat Algorithm to find the best expanded query among a set of expanded query candidates, while maintaining low computational complexity. Moreover, this new approach allows the length of the expanded query to be determined empirically. Numerical results on MEDLINE, the on-line medical information database, show that the proposed approach is more effective and efficient compared to the baseline.
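
    The optimizer named above follows Yang's Bat Algorithm; a minimal sketch of its core update equations on a toy continuous objective is shown below (the mapping from candidate expanded queries to a fitness score, which the paper supplies, is not reproduced here):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def bat_algorithm(objective, dim=2, n_bats=20, iterations=200,
                      f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9):
        x = rng.uniform(-5, 5, (n_bats, dim))   # bat positions (candidate solutions)
        v = np.zeros((n_bats, dim))             # velocities
        loud = np.ones(n_bats)                  # loudness A_i
        pulse = np.full(n_bats, 0.5)            # pulse emission rate r_i
        best = min(x, key=objective).copy()
        for t in range(1, iterations + 1):
            for i in range(n_bats):
                f = f_min + (f_max - f_min) * rng.random()   # random frequency
                v[i] += (x[i] - best) * f
                candidate = x[i] + v[i]
                if rng.random() > pulse[i]:                  # local walk around best
                    candidate = best + 0.01 * loud.mean() * rng.normal(size=dim)
                if rng.random() < loud[i] and objective(candidate) < objective(x[i]):
                    x[i] = candidate
                    loud[i] *= alpha                           # quieter over time...
                    pulse[i] = 0.5 * (1 - np.exp(-gamma * t))  # ...but pulses more
                if objective(x[i]) < objective(best):
                    best = x[i].copy()
        return best

    print(bat_algorithm(lambda p: float((p ** 2).sum())))  # converges near [0, 0]
    ```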

  14. BioFed: federated query processing over life sciences linked open data.

    PubMed

    Hasnain, Ali; Mehmood, Qaiser; Sana E Zainab, Syeda; Saleem, Muhammad; Warren, Claude; Zehra, Durre; Decker, Stefan; Rebholz-Schuhmann, Dietrich

    2017-03-15

    Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but it still requires solutions to query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality has to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state-of-the-art solution FedX and forms an important benchmark for the life science domain. The efficient cataloguing approach of the federated query processing system 'BioFed', the triple-pattern-wise source selection and the semantic source normalisation form the core of our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. In addition, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., Drugbank, Sider). BioFed is a solution for a single point of access to a large number of SPARQL endpoints providing life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. BioFed fully supports SPARQL 1.1 and gives access to the endpoints' availability based on the EndpointData graph. Our evaluation of BioFed against FedX is based on 20 heterogeneous federated SPARQL queries and shows competitive execution performance in comparison to FedX, which can be attributed to the provision of provenance information for the source selection. Developing and testing federated query engines for life sciences data is still a challenging task. According to our findings, it is advantageous to optimise the source selection. The cataloguing of SPARQL endpoints, including type and property indexing, leads to efficient querying of data resources over the Web of Data. This could be further improved through the use of ontologies, e.g., for abstract normalisation of query terms.

  15. Executing Complexity-Increasing Queries in Relational (MySQL) and NoSQL (MongoDB and EXist) Size-Growing ISO/EN 13606 Standardized EHR Databases

    PubMed Central

    Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario

    2018-01-01

    This research shows a protocol to assess the computational complexity of querying relational and non-relational (NoSQL (not only Structured Query Language)) standardized electronic health record (EHR) medical information database systems (DBMS). It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed a linear behavior in the NoSQL cases. In the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate to maintain standardized medical information systems due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results of improved relational systems such as archetype relational mapping (ARM) with the same data. However, the interpolation of doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems to be solved. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, or edition and visualization, or situations where the aim is not only to query medical information, but also to restore the EHR in exactly its original form. PMID:29608174
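
    The measurement protocol itself is simple to reproduce (a sketch; the query runner below is a placeholder, not the study's MySQL/MongoDB/eXist back ends): time each complexity-increasing query repeatedly against each database size and average.

    ```python
    import time
    from statistics import mean

    def avg_response_time(run_query, repetitions=10):
        """Average wall-clock time of one query over several runs."""
        times = []
        for _ in range(repetitions):
            start = time.perf_counter()
            run_query()
            times.append(time.perf_counter() - start)
        return mean(times)

    def placeholder_query():      # stands in for e.g. a MongoDB find() call
        sum(i * i for i in range(100_000))

    print(f"{avg_response_time(placeholder_query):.4f} s")
    ```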

  16. Executing Complexity-Increasing Queries in Relational (MySQL) and NoSQL (MongoDB and EXist) Size-Growing ISO/EN 13606 Standardized EHR Databases.

    PubMed

    Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario

    2018-03-19

    This research shows a protocol to assess the computational complexity of querying relational and non-relational (NoSQL (not only Structured Query Language)) standardized electronic health record (EHR) medical information database systems (DBMS). It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed a linear behavior in the NoSQL cases. In the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate to maintain standardized medical information systems due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results of improved relational systems such as archetype relational mapping (ARM) with the same data. However, the interpolation of doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems to be solved. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, or edition and visualization, or situations where the aim is not only to query medical information, but also to restore the EHR in exactly its original form.

  17. Query by example video based on fuzzy c-means initialized by fixed clustering center

    NASA Astrophysics Data System (ADS)

    Hou, Sujuan; Zhou, Shangbo; Siddique, Muhammad Abubakar

    2012-04-01

    Currently, the high complexity of video content has posed the following major challenges for fast retrieval: (1) efficient similarity measurements, and (2) efficient indexing on compact representations. A video-retrieval strategy based on fuzzy c-means (FCM) is presented for querying by example. Initially, the query video is segmented and represented by a set of shots, each shot is represented by a key frame, and video processing techniques are used to find visual cues to represent the key frame. Next, because the FCM algorithm is sensitive to its initialization, we initialize the cluster centers with the shots of the query video so that users can achieve appropriate convergence. After the FCM cluster is initialized by the query video, each shot of the query video is considered a benchmark point in the aforesaid cluster, and each shot in the database possesses a class label. The similarity between a shot in the database with the same class label and the benchmark point can be transformed into the distance between them. Finally, the similarity between the query video and a video in the database is transformed into the number of similar shots. Our experimental results demonstrate the performance of the proposed approach.
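
    A compact sketch of the clustering step (synthetic two-dimensional features; the paper's visual cues are much richer): fuzzy c-means with the cluster centers seeded from the query video's shots, using the standard membership and center updates.

    ```python
    import numpy as np

    def fcm(X, init_centers, m=2.0, iterations=50, eps=1e-9):
        C = init_centers.astype(float).copy()
        for _ in range(iterations):
            d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + eps
            u = 1.0 / d ** (2.0 / (m - 1.0))   # unnormalized memberships
            u /= u.sum(axis=1, keepdims=True)  # rows sum to 1
            C = (u.T ** m @ X) / (u.T ** m).sum(axis=1, keepdims=True)
        return u, C

    query_shots = np.array([[0.0, 0.0], [5.0, 5.0]])  # key-frame features of the query
    database_shots = np.vstack([np.random.randn(20, 2),
                                np.random.randn(20, 2) + 5.0])
    memberships, centers = fcm(database_shots, init_centers=query_shots)
    labels = memberships.argmax(axis=1)  # each database shot labelled by query shot
    print(centers)
    ```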

  18. Army technology development. IBIS query. Software to support the Image Based Information System (IBIS) expansion for mapping, charting and geodesy

    NASA Technical Reports Server (NTRS)

    Friedman, S. Z.; Walker, R. E.; Aitken, R. B.

    1986-01-01

    The Image Based Information System (IBIS) has been under development at the Jet Propulsion Laboratory (JPL) since 1975. It is a collection of more than 90 programs that enable processing of image, graphical, and tabular data for spatial analysis. IBIS can be utilized to create comprehensive geographic data bases. From these data, an analyst can study various attributes describing the characteristics of a given study area. Even complex combinations of disparate data types can be synthesized to obtain a new perspective on spatial phenomena. In 1984, new query software was developed enabling direct Boolean queries of IBIS data bases through the submission of easily understood expressions. An improved syntax methodology, a data dictionary, and display software simplified the analyst's tasks associated with building, executing, and subsequently displaying the results of a query. The primary purpose of this report is to describe the features and capabilities of the new query software. A secondary purpose is to compare this new query software with the query software developed previously (Friedman, 1982). With respect to this topic, the relative merits and drawbacks of both approaches are covered.

  19. NCBI2RDF: enabling full RDF-based access to NCBI databases.

    PubMed

    Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor

    2013-01-01

    RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments.

  20. Integrating unified medical language system and association mining techniques into relevance feedback for biomedical literature search.

    PubMed

    Ji, Yanqing; Ying, Hao; Tran, John; Dews, Peter; Massanari, R Michael

    2016-07-19

    Finding highly relevant articles from biomedical databases is challenging not only because it is often difficult to accurately express a user's underlying intention through keywords but also because a keyword-based query normally returns a long list of hits with many citations being unwanted by the user. This paper proposes a novel biomedical literature search system, called BiomedSearch, which supports complex queries and relevance feedback. The system employed association mining techniques to build a k-profile representing a user's relevance feedback. More specifically, we developed a weighted interest measure and an association mining algorithm to find the strength of association between a query and each concept in the article(s) selected by the user as feedback. The top concepts were utilized to form a k-profile used for the next-round search. BiomedSearch relies on Unified Medical Language System (UMLS) knowledge sources to map text files to standard biomedical concepts. It was designed to support queries with any level of complexity. A prototype of the BiomedSearch software was implemented and preliminarily evaluated using the Genomics data from the TREC (Text Retrieval Conference) 2006 Genomics Track. Initial experiment results indicated that BiomedSearch increased the mean average precision (MAP) for a set of queries. With UMLS and association mining techniques, BiomedSearch can effectively utilize users' relevance feedback to improve the performance of biomedical literature search.
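
    The paper's weighted interest measure is not spelled out in this abstract, so the sketch below substitutes plain co-occurrence frequency as a stand-in to show the k-profile mechanics (concept names are hypothetical):

    ```python
    from collections import Counter

    def build_k_profile(feedback_articles, k=3):
        """feedback_articles: lists of UMLS concepts from user-selected articles."""
        counts = Counter(c for article in feedback_articles for c in set(article))
        return [concept for concept, _ in counts.most_common(k)]

    feedback = [
        ["myocardial infarction", "troponin", "aspirin"],
        ["myocardial infarction", "troponin", "statin"],
        ["myocardial infarction", "ecg"],
    ]
    print(build_k_profile(feedback))  # top concepts to expand the next-round query
    ```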

  1. MIMO: an efficient tool for molecular interaction maps overlap

    PubMed Central

    2013-01-01

    Background Molecular pathways represent an ensemble of interactions occurring among molecules within the cell and between cells. The identification of similarities between molecular pathways across organisms and functions has a critical role in understanding complex biological processes. For the inference of such novel information, the comparison of molecular pathways requires to account for imperfect matches (flexibility) and to efficiently handle complex network topologies. To date, these characteristics are only partially available in tools designed to compare molecular interaction maps. Results Our approach MIMO (Molecular Interaction Maps Overlap) addresses the first problem by allowing the introduction of gaps and mismatches between query and template pathways and permits, when necessary, supervised queries incorporating a priori biological information. It then addresses the second issue by relying directly on the rich graph topology described in the Systems Biology Markup Language (SBML) standard, and uses multidigraphs to efficiently handle multiple queries on biological graph databases. The algorithm has been successfully used here to highlight the contact point between various human pathways in the Reactome database. Conclusions MIMO offers a flexible and efficient graph-matching tool for comparing complex biological pathways. PMID:23672344

  2. Complex dynamics of our economic life on different scales: insights from search engine query data.

    PubMed

    Preis, Tobias; Reith, Daniel; Stanley, H Eugene

    2010-12-28

    Search engine query data deliver insight into the behaviour of individuals who are the smallest possible scale of our economic life. Individuals are submitting several hundred million search engine queries around the world each day. We study weekly search volume data for various search terms from 2004 to 2010 that are offered by the search engine Google for scientific use, providing information about our economic life on an aggregated collective level. We ask the question whether there is a link between search volume data and financial market fluctuations on a weekly time scale. Both collective 'swarm intelligence' of Internet users and the group of financial market participants can be regarded as a complex system of many interacting subunits that react quickly to external changes. We find clear evidence that weekly transaction volumes of S&P 500 companies are correlated with weekly search volume of corresponding company names. Furthermore, we apply a recently introduced method for quantifying complex correlations in time series with which we find a clear tendency that search volume time series and transaction volume time series show recurring patterns.
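
    The core computation is a weekly-series correlation; a sketch with hypothetical file and column names:

    ```python
    import pandas as pd

    # Hypothetical inputs: one row per week, for one company.
    search = pd.read_csv("weekly_search_volume.csv", parse_dates=["week"])
    trades = pd.read_csv("weekly_transaction_volume.csv", parse_dates=["week"])

    merged = search.merge(trades, on="week")
    r = merged["search_volume"].corr(merged["transaction_volume"])  # Pearson r
    print(f"Pearson correlation: {r:.3f}")
    ```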

  3. Combining clinical and genomics queries using i2b2 – Three methods

    PubMed Central

    Murphy, Shawn N.; Avillach, Paul; Bellazzi, Riccardo; Phillips, Lori; Gabetta, Matteo; Eran, Alal; McDuffie, Michael T.; Kohane, Isaac S.

    2017-01-01

    We are fortunate to be living in an era of twin biomedical data surges: a burgeoning representation of human phenotypes in the medical records of our healthcare systems, and high-throughput sequencing making rapid technological advances. The difficulty of representing genomic data and its annotations has almost by itself led to the recognition of a biomedical "Big Data" challenge, and the complexity of healthcare data only compounds the problem, to the point that coherent representation of both systems on the same platform seems insuperably difficult. We investigated the capability for complex, integrative genomic and clinical queries to be supported in the Informatics for Integrating Biology and the Bedside (i2b2) translational software package. Three different data integration approaches were developed: the first is based on the Sequence Ontology, the second on the tranSMART engine, and the third on CouchDB. These novel methods for representing and querying complex genomic and clinical data on the i2b2 platform are available today for advancing precision medicine. PMID:28388645

  4. A Deductive Capability for Data Management,

    DTIC Science & Technology

    1976-01-01

    A deductive processor can answer 'what-if' and other kinds of high-level queries that are difficult if not impossible for present-day data management... search space. This led to a host of resolution strategies (Chang and Lee (1973)). Some recent approaches to... as complete and accurate a picture as possible of the operation of a large business organization. This will include the need to understand the

  5. Quantum Private Query Based on Bell State and Single Photons

    NASA Astrophysics Data System (ADS)

    Gao, Xiang; Chang, Yan; Zhang, Shi-Bin; Yang, Fan; Zhang, Yan

    2018-03-01

    Quantum private query (QPQ) can protect both user's and database holder's privacy. In this paper, we propose a novel quantum private query protocol based on Bell state and single photons. As far as we know, no one has ever proposed the QPQ based on Bell state. By using the decoherence-free (DF) states, our protocol can resist the collective noise. Besides that, our protocol is a one-way quantum protocol, which can resist the Trojan horse attack and reduce the communication complexity. Our protocol can not only guarantee the participants' privacy but also stand against an external eavesdropper.

  6. Presurgical cleft lip and palate orthopedics: an overview

    PubMed Central

    Alzain, Ibtesam; Batwa, Waeil; Cash, Alex; Murshid, Zuhair A

    2017-01-01

    Patients with cleft lip and/or palate go through a lifelong journey of multidisciplinary care, starting from before birth and extending until adulthood. Presurgical orthopedic (PSO) treatment is one of the earliest stages of this care plan. In this paper we provide a review of the PSO treatment. This review should help general and specialist dentists to better understand the cleft patient care path and to be able to answer patient queries more efficiently. The objectives of this paper were to review the basic principles of PSO treatment, the various types of techniques used in this therapy, and the protocol followed, and to critically evaluate the advantages and disadvantages of some of these techniques. In conclusion, we believe that PSO treatment, specifically nasoalveolar molding, does help to approximate the segments of the cleft maxilla and does reduce the intersegment space in readiness for the surgical closure of cleft sites. However, what we remain unable to prove unequivocally at this point is whether the reduction in the dimensions of the cleft presurgically and the manipulation of the nasal complex benefit our patients in the long term. PMID:28615974

  7. Trends in Social Science: The Impact of Computational and Simulative Models

    NASA Astrophysics Data System (ADS)

    Conte, Rosaria; Paolucci, Mario; Cecconi, Federico

    This paper discusses current progress in the computational social sciences. Specifically, it examines the following questions: Are the computational social sciences exhibiting positive or negative developments? What are the roles of agent-based models and simulation (ABM), network analysis, and other "computational" methods within this dynamic? (Conte, The necessity of intelligent agents in social simulation, Advances in Complex Systems, 3(1-4), 19-38, 2000; Conte 2010; Macy, Annual Review of Sociology, 143-166, 2002). Are there objective indicators of scientific growth that can be applied to different scientific areas, allowing for comparison among them? In this paper, some answers to these questions are presented and discussed. In particular, comparisons among different disciplines in the social and computational sciences are shown, taking into account their respective growth trends in the number of publication citations over the last few decades (culled from Google Scholar). After a short discussion of the methodology adopted, results of keyword-based queries are presented, unveiling some unexpected local impacts of simulation on the takeoff of traditionally poorly productive disciplines.

  8. Combining conceptual graphs and argumentation for aiding in the teleexpertise.

    PubMed

    Doumbouya, Mamadou Bilo; Kamsu-Foguem, Bernard; Kenfack, Hugues; Foguem, Clovis

    2015-08-01

    Current medical information systems are too complex to be meaningfully exploited. Hence there is a need to develop new strategies for maximising the exploitation of medical data to the benefit of medical professionals. It is against this backdrop that we want to propose a tangible contribution by providing a tool which combines conceptual graphs and Dung's argumentation system in order to assist medical professionals in their decision making process. The proposed tool allows medical professionals to easily manipulate and visualise queries and answers for making decisions during the practice of teleexpertise. The knowledge modelling is done using an open application programming interface (API) called CoGui, which offers the means to build structured knowledge bases with dedicated functionalities for graph-based reasoning over data retrieved from different institutions (hospitals, national security centre, and nursing homes). The tool that we have described in this study supports a formal traceable structure of the reasoning with acceptable arguments to elucidate some ethical problems that occur very often in the telemedicine domain. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. An Examination of Natural Language as a Query Formation Tool for Retrieving Information on E-Health from Pub Med.

    ERIC Educational Resources Information Center

    Peterson, Gabriel M.; Su, Kuichun; Ries, James E.; Sievert, Mary Ellen C.

    2002-01-01

    Discussion of Internet use for information searches on health-related topics focuses on a study that examined complexity and variability of natural language in using search terms that express the concept of electronic health (e-health). Highlights include precision of retrieved information; shift in terminology; and queries using the Pub Med…

  10. Query-Biased Preview over Outsourced and Encrypted Data

    PubMed Central

    Luo, Guangchun; Qin, Ke; Chen, Aiguo

    2013-01-01

    For both convenience and security, more and more users encrypt their sensitive data before outsourcing it to a third party such as cloud storage service. However, searching for the desired documents becomes problematic since it is costly to download and decrypt each possibly needed document to check if it contains the desired content. An informative query-biased preview feature, as applied in modern search engine, could help the users to learn about the content without downloading the entire document. However, when the data are encrypted, securely extracting a keyword-in-context snippet from the data as a preview becomes a challenge. Based on private information retrieval protocol and the core concept of searchable encryption, we propose a single-server and two-round solution to securely obtain a query-biased snippet over the encrypted data from the server. We achieve this novel result by making a document (plaintext) previewable under any cryptosystem and constructing a secure index to support dynamic computation for a best matched snippet when queried by some keywords. For each document, the scheme has O(d) storage complexity and O(log(d/s) + s + d/s) communication complexity, where d is the document size and s is the snippet length. PMID:24078798
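
    A back-of-envelope reading of that communication bound (our own observation, treating the snippet length s as tunable and ignoring the logarithmic term): the s + d/s part is minimized when s is the square root of the document size.

    ```latex
    \[
    \frac{d}{ds}\!\left(s + \frac{d}{s}\right) = 1 - \frac{d}{s^{2}} = 0
    \quad\Longrightarrow\quad s^{*} = \sqrt{d}
    \]
    % Example: for a document of d = 10^6 characters, s* = 1000, giving
    % s + d/s = 2000 transferred characters in addition to the log(d/s) term.
    ```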

  11. Query-biased preview over outsourced and encrypted data.

    PubMed

    Peng, Ningduo; Luo, Guangchun; Qin, Ke; Chen, Aiguo

    2013-01-01

    For both convenience and security, more and more users encrypt their sensitive data before outsourcing it to a third party such as cloud storage service. However, searching for the desired documents becomes problematic since it is costly to download and decrypt each possibly needed document to check if it contains the desired content. An informative query-biased preview feature, as applied in modern search engine, could help the users to learn about the content without downloading the entire document. However, when the data are encrypted, securely extracting a keyword-in-context snippet from the data as a preview becomes a challenge. Based on private information retrieval protocol and the core concept of searchable encryption, we propose a single-server and two-round solution to securely obtain a query-biased snippet over the encrypted data from the server. We achieve this novel result by making a document (plaintext) previewable under any cryptosystem and constructing a secure index to support dynamic computation for a best matched snippet when queried by some keywords. For each document, the scheme has O(d) storage complexity and O(log(d/s) + s + d/s) communication complexity, where d is the document size and s is the snippet length.

  12. An alternative database approach for management of SNOMED CT and improved patient data queries.

    PubMed

    Campbell, W Scott; Pedersen, Jay; McClay, James C; Rao, Praveen; Bastola, Dhundy; Campbell, James R

    2015-10-01

    SNOMED CT is the international lingua franca of terminologies for human health. Based in Description Logics (DL), the terminology enables data queries that incorporate inferences between data elements, as well as those relationships that are explicitly stated. However, the ontologic and polyhierarchical nature of the SNOMED CT concept model makes it difficult to implement in its entirety within electronic health record systems that largely employ object oriented or relational database architectures. The result is a reduction of data richness, limitations of query capability and increased systems overhead. The hypothesis of this research was that a graph database (graph DB) architecture using SNOMED CT as the basis for the data model and subsequently modeling patient data upon the semantic core of SNOMED CT could exploit the full value of the terminology to enrich and support advanced data querying capability of patient data sets. The hypothesis was tested by instantiating a graph DB with the fully classified SNOMED CT concept model. The graph DB instance was tested for integrity by calculating the transitive closure table for the SNOMED CT hierarchy and comparing the results with transitive closure tables created using current, validated methods. The graph DB was then populated with 461,171 anonymized patient record fragments and over 2.1 million associated SNOMED CT clinical findings. Queries, including concept negation and disjunction, were then run against the graph database and an enterprise Oracle relational database (RDBMS) of the same patient data sets. The graph DB was then populated with laboratory data encoded using LOINC, as well as medication data encoded with RxNorm, and complex queries performed using LOINC, RxNorm and SNOMED CT to identify uniquely described patient populations. A graph database instance was successfully created for two international releases of SNOMED CT and two US SNOMED CT editions. Transitive closure tables and descriptive statistics generated using the graph database were identical to those using validated methods. Patient queries produced identical patient count results to the Oracle RDBMS with comparable times. Database queries involving defining attributes of SNOMED CT concepts were possible with the graph DB. The same queries could not be directly performed with the Oracle RDBMS representation of the patient data and required the creation and use of external terminology services. Further, queries of undefined depth were successful in identifying unknown relationships between patient cohorts. The results of this study supported the hypothesis that a patient database built upon and around the semantic model of SNOMED CT was possible. The model supported queries that leveraged all aspects of the SNOMED CT logical model to produce clinically relevant query results. Logical disjunction and negation queries were possible using the data model, as well as queries that extended beyond the structural IS_A hierarchy of SNOMED CT to include queries that employed defining attribute-values of SNOMED CT concepts as search parameters. As medical terminologies, such as SNOMED CT, continue to expand, they will become more complex and model consistency will be more difficult to assure. Simultaneously, consumers of data will increasingly demand improvements to query functionality to accommodate additional granularity of clinical concepts without sacrificing speed. This new line of research provides an alternative approach to instantiating and querying patient data represented using advanced computable clinical terminologies. Copyright © 2015 Elsevier Inc. All rights reserved.
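
    The integrity check described above hinges on the transitive closure of the IS_A hierarchy; a sketch on a toy hierarchy (not the SNOMED CT release files):

    ```python
    from functools import lru_cache

    is_a = {  # child concept -> direct parents
        "bacterial pneumonia": ["pneumonia"],
        "pneumonia": ["lung disease"],
        "lung disease": ["disease"],
        "disease": [],
    }

    @lru_cache(maxsize=None)
    def ancestors(concept):
        """All concepts reachable upward from `concept`."""
        found = set()
        for parent in is_a[concept]:
            found.add(parent)
            found |= ancestors(parent)
        return frozenset(found)

    closure = {(child, anc) for child in is_a for anc in ancestors(child)}
    print(sorted(closure))  # the rows of a transitive closure table
    ```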

  13. XGI: a graphical interface for XQuery creation.

    PubMed

    Li, Xiang; Gennari, John H; Brinkley, James F

    2007-10-11

    XML has become the default standard for data exchange among heterogeneous data sources, and in January 2007 XQuery (XML Query language) was recommended by the World Wide Web Consortium as the query language for XML. However, XQuery is a complex language that is difficult for non-programmers to learn. We have therefore developed XGI (XQuery Graphical Interface), a visual interface for graphically generating XQuery. In this paper we demonstrate the functionality of XGI through its application to a biomedical XML dataset. We describe the system architecture and the features of XGI in relation to several existing querying systems, we demonstrate the system's usability through a sample query construction, and we discuss a preliminary evaluation of XGI. Finally, we describe some limitations of the system, and our plans for future improvements.

  14. Pattern search in multi-structure data: a framework for the next-generation evidence-based medicine

    NASA Astrophysics Data System (ADS)

    Sukumar, Sreenivas R.; Ainsworth, Keela C.

    2014-03-01

    With the impetus towards personalized and evidence-based medicine, the need for a framework to analyze/interpret quantitative measurements (blood work, toxicology, etc.) with qualitative descriptions (specialist reports after reading images, bio-medical knowledgebase, etc.) to predict diagnostic risks is fast emerging. Addressing this need, we pose and answer the following questions: (i) How can we jointly analyze and explore measurement data in context with qualitative domain knowledge? (ii) How can we search and hypothesize patterns (not known a priori) from such multi-structure data? (iii) How can we build predictive models by integrating weakly-associated multi-relational multi-structure data? We propose a framework towards answering these questions. We describe a software solution that leverages hardware for scalable in-memory analytics and applies next-generation semantic query tools on medical data.

  15. A question-answer pair (QAP) database integrated with websites to answer complex questions submitted to the Regional Medicines Information and Pharmacovigilance Centres in Norway (RELIS): a descriptive study.

    PubMed

    Schjøtt, Jan; Reppe, Linda A; Roland, Pål-Didrik H; Westergren, Tone

    2012-01-01

    To assess whether a question-answer pair (QAP) database integrated with websites, developed for drug information centres, enables complex questions to be answered effectively. Descriptive study with comparison of two consecutive 6-year periods (1995-2000 and 2001-2006). The Regional Medicines Information and Pharmacovigilance Centres in Norway (RELIS). A randomised sample of QAPs from the RELIS database. Answer time in days compared with the Mann-Whitney U test. Number of drugs involved (one, two, three or more), complexity (judgemental and/or patient-related or not) and literature search (none, simple or advanced) compared with χ² tests. 842 QAPs (312 from 1995 to 2000 and 530 from 2001 to 2006) were compared. The fraction of judgemental and patient-related questions increased (66%-75% and 54%-72%, respectively, p<0.01). The number of drugs and literature search (>50% advanced) was similar in the two periods, but the fraction of answers referring to the RELIS database increased (13%-31%, p<0.01). Median answer time was reduced from 2 days to 1 (p<0.01), although the fraction of complex questions increased from the first to the second period. Furthermore, the mean number of questions per employee per year increased from 66 to 89 from the first to the second period. The authors conclude that RELIS has the potential to answer complex questions efficiently. The model is relevant for the organisation of drug information centres.

  16. Integrating user profile in medical CBIR systems to answer perceptual similarity queries

    NASA Astrophysics Data System (ADS)

    Bugatti, Pedro H.; Kaster, Daniel S.; Ponciano-Silva, Marcelo; Traina, Agma J. M.; Traina, Caetano, Jr.

    2011-03-01

    Techniques for Content-Based Image Retrieval (CBIR) have been intensively explored due to the increase in the amount of captured images and the need for fast retrieval of them. The medical field is a specific example that generates a large flow of information, especially digital images employed for diagnosing. One issue that remains unsolved is how to achieve perceptual similarity: that is, to achieve effective retrieval, one must characterize and quantify the perceptual similarity as judged by the specialist in the field. Therefore, the present paper was conceived to fill this gap, creating consistent support for performing similarity queries over medical images while maintaining the semantics of a given query desired by the user. CBIR systems relying on relevance feedback techniques usually request the users to label relevant images. In this paper, we present a simple but highly effective strategy to survey user profiles, taking advantage of such labeling to implicitly gather the user's perceptual similarity. The user profiles maintain the settings desired by each user, allowing the similarity assessment to be tuned, which encompasses dynamically changing the distance function employed through an interactive process. Experiments using computed tomography lung images show that the proposed approach is effective in capturing the users' perception.

  17. Griffin: A Tool for Symbolic Inference of Synchronous Boolean Molecular Networks.

    PubMed

    Muñoz, Stalin; Carrillo, Miguel; Azpeitia, Eugenio; Rosenblueth, David A

    2018-01-01

    Boolean networks are important models of biochemical systems, located at the high end of the abstraction spectrum. A number of Boolean gene networks have been inferred following essentially the same method. Such a method first considers experimental data for a typically underdetermined "regulation" graph. Next, Boolean networks are inferred by using biological constraints to narrow the search space, such as a desired set of (fixed-point or cyclic) attractors. We describe Griffin, a computer tool enhancing this method. Griffin incorporates a number of well-established algorithms, such as Dubrova and Teslenko's algorithm for finding attractors in synchronous Boolean networks. In addition, a formal definition of regulation allows Griffin to employ "symbolic" techniques, able to represent both large sets of network states and Boolean constraints. We observe that when the set of attractors is required to be an exact set, prohibiting additional attractors, a naive Boolean coding of this constraint may be infeasible. Such cases may be intractable even with symbolic methods, as the number of Boolean constraints may be astronomically large. To overcome this problem, we employ an Artificial Intelligence technique known as "clause learning", considerably increasing Griffin's scalability. Without clause learning, only toy examples prohibiting additional attractors are solvable: only one out of the seven queries reported here is answered. With clause learning, by contrast, all seven queries are answered. We illustrate Griffin with three case studies drawn from the Arabidopsis thaliana literature. Griffin is available at: http://turing.iimas.unam.mx/griffin.
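
    Griffin's symbolic machinery (SAT solving over Boolean constraints) is not reproduced here, but the attractors it searches for can be made concrete with an explicit-state sketch in Python: enumerate every state of a toy synchronous Boolean network, follow each trajectory until a state repeats, and collect the resulting fixed points and cycles. The three update rules are invented for illustration.

        from itertools import product

        # Toy 3-gene synchronous Boolean network (hypothetical update rules).
        def step(state):
            a, b, c = state
            return (b and not c,  # A' = B AND NOT C
                    a,            # B' = A
                    a or b)       # C' = A OR B

        # Follow each trajectory until it revisits a state; the revisited
        # segment is an attractor (a fixed point if its length is 1).
        attractors = set()
        for init in product([False, True], repeat=3):
            seen = []
            s = init
            while s not in seen:
                seen.append(s)
                s = step(s)
            cycle = tuple(seen[seen.index(s):])
            # Canonicalize the rotation so each cycle is counted once.
            rotations = [cycle[i:] + cycle[:i] for i in range(len(cycle))]
            attractors.add(min(rotations))

        for att in attractors:
            kind = "fixed point" if len(att) == 1 else f"cycle of length {len(att)}"
            print(kind, att)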

  18. Foundations of RDF Databases

    NASA Astrophysics Data System (ADS)

    Arenas, Marcelo; Gutierrez, Claudio; Pérez, Jorge

    The goal of this paper is to give an overview of the basics of the theory of RDF databases. We provide a formal definition of RDF that includes the features that distinguish this model from other graph data models. We then move into the fundamental issue of querying RDF data. We start by considering the RDF query language SPARQL, which has been a W3C Recommendation since January 2008. We provide an algebraic syntax and a compositional semantics for this language, study the complexity of the evaluation problem for different fragments of SPARQL, and consider the problem of optimizing the evaluation of SPARQL queries, showing that a natural fragment of this language has some good properties in this respect. We furthermore study the expressive power of SPARQL by comparing it with some well-known query languages such as relational algebra. We conclude by considering the issue of querying RDF data in the presence of RDFS vocabulary. In particular, we present a recently proposed extension of SPARQL with navigational capabilities.

  19. A Dimensional Bus model for integrating clinical and research data.

    PubMed

    Wade, Ted D; Hum, Richard C; Murphy, James R

    2011-12-01

    Many clinical research data integration platforms rely on the Entity-Attribute-Value model because of its flexibility, even though it presents problems in query formulation and execution time. The authors sought more balance in these traits. Borrowing concepts from Entity-Attribute-Value and from enterprise data warehousing, the authors designed an alternative called the Dimensional Bus model and used it to integrate electronic medical record, sponsored study, and biorepository data. Each type of observational collection has its own table, and the structure of these tables varies to suit the source data. The observational tables are linked to the Bus, which holds provenance information and links to various classificatory dimensions that amplify the meaning of the data or facilitate its query and exposure management. The authors implemented a Bus-based clinical research data repository with a query system that flexibly manages data access and confidentiality, facilitates catalog search, and readily formulates and compiles complex queries. The design provides a workable way to manage and query mixed schemas in a data warehouse.

  20. A semantic proteomics dashboard (SemPoD) for data management in translational research.

    PubMed

    Jayapandian, Catherine P; Zhao, Meng; Ewing, Rob M; Zhang, Guo-Qiang; Sahoo, Satya S

    2012-01-01

    One of the primary challenges in translational research data management is breaking down the barriers between the multiple data silos and the integration of 'omics data with clinical information to complete the cycle from the bench to the bedside. The role of contextual metadata, also called provenance information, is a key factor in effective data integration, reproducibility of results, correct attribution of the original source, and answering research queries involving "What", "Where", "When", "Which", "Who", "How", and "Why" (also known as the W7 model). But at present there is limited or no effective approach to managing and leveraging provenance information for integrating data across studies or projects. Hence, there is an urgent need for a paradigm shift in creating a "provenance-aware" informatics platform to address this challenge. We introduce an ontology-driven, intuitive Semantic Proteomics Dashboard (SemPoD) that uses provenance together with domain information (semantic provenance) to enable researchers to query, compare, and correlate different types of data across multiple projects, and to allow integration with legacy data to support their ongoing research. The SemPoD platform, currently in use at the Case Center for Proteomics and Bioinformatics (CPB), consists of three components: (a) Ontology-driven Visual Query Composer, (b) Result Explorer, and (c) Query Manager. Currently, SemPoD allows provenance-aware querying of 1153 mass-spectrometry experiments from 20 different projects. SemPoD uses the systems molecular biology provenance ontology (SysPro) to support a dynamic query composition interface, which automatically updates the components of the query interface based on previous user selections and efficiently prunes the result set using a "smart filtering" approach. The SysPro ontology re-uses terms from the PROV ontology (PROV-O) being developed by the World Wide Web Consortium (W3C) provenance working group, the minimum information required for reporting a molecular interaction experiment (MIMIx), and the minimum information about a proteomics experiment (MIAPE) guidelines. SemPoD was evaluated in terms of both user feedback and system scalability. SemPoD is an intuitive and powerful provenance-ontology-driven data access and query platform that uses the MIAPE and MIMIx metadata guidelines to create an integrated view over large-scale systems molecular biology datasets. SemPoD leverages the SysPro ontology to create an intuitive dashboard for biologists to compose queries, explore the results, and use a query manager for storing queries for later use. SemPoD can be deployed over many existing database applications storing 'omics data, including, as illustrated here, the LabKey data-management system. The initial user feedback evaluating the usability and functionality of SemPoD has been very positive, and it is being considered for wider deployment beyond the proteomics domain and in other 'omics centers.

  1. Labeling RDF Graphs for Linear Time and Space Querying

    NASA Astrophysics Data System (ADS)

    Furche, Tim; Weinzierl, Antonius; Bry, François

    Indices and data structures for web querying have mostly considered tree-shaped data, reflecting the view of XML documents as tree-shaped. However, for RDF (and when querying ID/IDREF constraints in XML), data is indisputably graph-shaped. In this chapter, we first study existing indexing and labeling schemes for RDF and other graph data, with a focus on support for efficient adjacency and reachability queries. For XML, labeling schemes have been an important part of its widespread adoption, in particular for mapping XML to existing (relational) database technology. However, the existing indexing and labeling schemes for RDF (and graph data in general) sacrifice one of the most attractive properties of XML labeling schemes: the constant-time (and per-node space) test for adjacency (child) and reachability (descendant). In the second part, we introduce the first labeling scheme for RDF data that retains this property and thus achieves linear time and space processing of acyclic RDF queries on a significantly larger class of graphs than previous approaches (which are mostly limited to tree-shaped data). Finally, we show how this labeling scheme can be applied to (acyclic) SPARQL queries to obtain an evaluation algorithm with time and space complexity linear in the number of resources in the queried RDF graph.
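
    For tree-shaped data, the constant-time reachability test mentioned above can be obtained with a classic pre/post-order interval labeling, sketched below in Python for a toy tree; the chapter's contribution is extending such constant-time, constant-space-per-node schemes beyond trees to (acyclic) RDF graphs.

        # Interval (pre/post) labeling of a tree: u is an ancestor of v
        # iff pre(u) <= pre(v) and post(v) <= post(u). The test takes
        # constant time and each node stores only two integers.
        def label(tree, root):
            pre, post, clock = {}, {}, [0]
            def dfs(u):
                pre[u] = clock[0]; clock[0] += 1
                for v in tree.get(u, []):
                    dfs(v)
                post[u] = clock[0]; clock[0] += 1
            dfs(root)
            return pre, post

        def reaches(u, v, pre, post):
            return pre[u] <= pre[v] and post[v] <= post[u]

        tree = {"r": ["a", "b"], "a": ["c"], "b": []}
        pre, post = label(tree, "r")
        print(reaches("r", "c", pre, post))  # True: r is an ancestor of c
        print(reaches("a", "b", pre, post))  # False: a does not reach b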

  2. NCBI2RDF: Enabling Full RDF-Based Access to NCBI Databases

    PubMed Central

    Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor

    2013-01-01

    RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting the query results to users in the SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves biomedical researchers time and effort. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments. PMID:23984425

  3. An approach for heterogeneous and loosely coupled geospatial data distributed computing

    NASA Astrophysics Data System (ADS)

    Chen, Bin; Huang, Fengru; Fang, Yu; Huang, Zhou; Lin, Hui

    2010-07-01

    Most GIS (Geographic Information System) applications tend to have heterogeneous and autonomous geospatial information resources, and the availability of these local resources is unpredictable and dynamic under a distributed computing environment. In order to make use of these local resources together to solve larger geospatial information processing problems that relate to an overall situation, in this paper, with the support of peer-to-peer computing technologies, we propose a geospatial data distributed computing mechanism that involves loosely coupled geospatial resource directories and a concept termed the Equivalent Distributed Program of a global geospatial query to solve geospatial distributed computing problems under heterogeneous GIS environments. First, we present a geospatial query processing schema for distributed computing, together with a method for equivalently transforming a global geospatial query into distributed local queries at the SQL (Structured Query Language) level, to solve the coordination problem among heterogeneous resources, as sketched below. Second, peer-to-peer technologies are used to maintain a loosely coupled network environment consisting of autonomous geospatial information resources, achieving decentralized and consistent synchronization among global geospatial resource directories and carrying out distributed transaction management of local queries. Finally, based on the developed prototype system, example applications of simple and complex geospatial data distributed queries are presented to illustrate the procedure of global geospatial information processing.
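
    The SQL-level equivalent transformation can be illustrated for the simple case of an aggregate that distributes over partitions: the global query becomes one identical local query per peer plus a merge step. The Python sketch below uses two in-memory SQLite databases as stand-ins for autonomous peers; the schema and query are invented, and the paper's Equivalent Distributed Program additionally handles resource directories, synchronization, and transactions.

        import sqlite3

        # Two autonomous "local" stores standing in for peers. The global query
        # "count roads longer than 10 km" decomposes into the same local query
        # on each peer, followed by a merge (summation) of the partial results.
        peers = []
        for rows in ([(12.0,), (3.5,)], [(22.1,)]):
            con = sqlite3.connect(":memory:")
            con.execute("CREATE TABLE roads (length_km REAL)")
            con.executemany("INSERT INTO roads VALUES (?)", rows)
            peers.append(con)

        local_sql = "SELECT COUNT(*) FROM roads WHERE length_km > 10"
        global_count = sum(con.execute(local_sql).fetchone()[0] for con in peers)
        print(global_count)  # 2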

  4. Finding Complex Roots: Can You Trust Your Calculator?

    ERIC Educational Resources Information Center

    Ciesla, Barbara A.; Watson, John W.

    2006-01-01

    This article investigates a specific instance in which the textbook answer for finding a root of a complex number differed from the answer given by the TI-83. After explaining the reason for the difference, the article expands the definition of the integral root of a complex number to an arbitrary complex power of a complex number.
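
    The discrepancy is presumably the standard one: in complex mode, a calculator returns the principal root. Writing a nonzero complex number in polar form with its principal argument, De Moivre's formula gives the principal n-th root (a hedged reconstruction of the underlying mathematics, not the article's own notation):

        \[
          z = r e^{i\theta}, \quad -\pi < \theta \le \pi
          \qquad\Longrightarrow\qquad
          z^{1/n} = r^{1/n} e^{i\theta/n}.
        \]

    For example, for $z = -8$ one has $r = 8$ and $\theta = \pi$, so the principal cube root is $2 e^{i\pi/3} = 1 + i\sqrt{3}$, whereas the real cube root expected from a textbook is $-2$.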

  5. DREAM: Classification scheme for dialog acts in clinical research query mediation.

    PubMed

    Hoxha, Julia; Chandar, Praveen; He, Zhe; Cimino, James; Hanauer, David; Weng, Chunhua

    2016-02-01

    Clinical data access involves complex but opaque communication between medical researchers and query analysts. Understanding such communication is indispensable for designing intelligent human-machine dialog systems that automate query formulation. This study investigates email communication and proposes a novel scheme for classifying dialog acts in clinical research query mediation. We analyzed 315 email messages exchanged in the communication for 20 data requests obtained from three institutions. The messages were segmented into 1333 utterance units. Through a rigorous process, we developed a classification scheme and applied it for dialog act annotation of the extracted utterances. Evaluation results with high inter-annotator agreement demonstrate the reliability of this scheme. This dataset is used to contribute preliminary understanding of dialog acts distribution and conversation flow in this dialog space. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. Querying graphs in protein-protein interactions networks using feedback vertex set.

    PubMed

    Blin, Guillaume; Sikora, Florian; Vialette, Stéphane

    2010-01-01

    Recent techniques are rapidly increasing the amount of our knowledge of interactions between proteins. The interpretation of this new information depends on our ability to retrieve known substructures in the data, the Protein-Protein Interaction (PPI) networks. From an algorithmic point of view, this is a hard task, since it often leads to NP-hard problems. To overcome this difficulty, many authors have provided tools for querying patterns with a restricted topology, i.e., paths or trees, in PPI networks. Such restrictions lead to the development of fixed-parameter tractable (FPT) algorithms, which can be practical for restricted query sizes. Unfortunately, Graph Homomorphism is a W[1]-hard problem, and hence no FPT algorithm can be found when patterns are in the shape of general graphs. However, Dost et al. gave an algorithm (which is not implemented) to query graphs with bounded treewidth in PPI networks (the treewidth of the query being involved in the time complexity). In this paper, we propose another algorithm for querying patterns in the shape of graphs, also based on dynamic programming and the color-coding technique. To transform graph queries into trees without loss of information, we use a feedback vertex set coupled with a node duplication mechanism. Hence, our algorithm is FPT for querying graphs with a bounded feedback vertex set size. This gives an alternative to the treewidth parameter, which can be better or worse for a given query. We provide a Python implementation which allows us to validate our approach on real data. In particular, we retrieve some human queries in the shape of graphs in the fly PPI network.
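
    The parameter can be made concrete with a brute-force sketch (exponential, unlike the paper's FPT color-coding algorithm): a feedback vertex set is any vertex subset whose removal leaves a forest, and the size of the minimum such set is what bounds the paper's running time. The graph below is invented for illustration.

        from itertools import combinations

        def is_forest(nodes, edges):
            # Union-find: a graph is a forest iff no edge joins two nodes
            # that are already connected.
            parent = {v: v for v in nodes}
            def find(v):
                while parent[v] != v:
                    parent[v] = parent[parent[v]]
                    v = parent[v]
                return v
            for u, v in edges:
                ru, rv = find(u), find(v)
                if ru == rv:
                    return False
                parent[ru] = rv
            return True

        def min_feedback_vertex_set(nodes, edges):
            # Brute force: try vertex subsets in order of increasing size.
            for k in range(len(nodes) + 1):
                for S in combinations(nodes, k):
                    rest = set(nodes) - set(S)
                    kept = [(u, v) for u, v in edges if u in rest and v in rest]
                    if is_forest(rest, kept):
                        return set(S)

        # Triangle plus a pendant vertex: deleting any one triangle vertex
        # leaves a forest, so the minimum feedback vertex set has size 1.
        nodes = ["a", "b", "c", "d"]
        edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
        print(min_feedback_vertex_set(nodes, edges))  # e.g. {'a'}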

  7. The Evaluation of a Temporal Reasoning System in Processing Clinical Discharge Summaries

    PubMed Central

    Zhou, Li; Parsons, Simon; Hripcsak, George

    2008-01-01

    Context TimeText is a temporal reasoning system designed to represent, extract, and reason about temporal information in clinical text. Objective To measure the accuracy of TimeText for processing clinical discharge summaries. Design Six physicians with biomedical informatics training served as domain experts. Twenty discharge summaries were randomly selected for the evaluation. For each of the first 14 reports, 5 to 8 clinically important medical events were chosen. The temporal reasoning system generated temporal relations about the endpoints (start or finish) of pairs of medical events. Two experts (subjects) manually generated temporal relations for these medical events. The system- and expert-generated results were assessed by four other experts (raters). All twenty discharge summaries were used to assess the system’s accuracy in answering time-oriented clinical questions. For each report, five to ten clinically plausible temporal questions about events were generated. Two experts generated answers to the questions to serve as the gold standard. We wrote queries to retrieve answers from the system’s output. Measurements Correctness of generated temporal relations, recall of clinically important relations, and accuracy in answering temporal questions. Results The raters determined that 97% of the subjects’ 295 generated temporal relations were correct and that 96.5% of the system’s 995 generated temporal relations were correct. The system captured 79% of the 307 temporal relations determined to be clinically important by the subjects and raters. The system answered 84% of the temporal questions correctly. Conclusion The system encoded the majority of information identified by experts and was able to answer simple temporal questions. PMID:17947618

  8. UCSC genome browser: deep support for molecular biomedical research.

    PubMed

    Mangan, Mary E; Williams, Jennifer M; Lathe, Scott M; Karolchik, Donna; Lathe, Warren C

    2008-01-01

    The volume and complexity of genomic sequence data, and the additional experimental data required for annotation of the genomic context, pose a major challenge for display and access for biomedical researchers. Genome browsers organize this data and make it available in various ways to extract useful information to advance research projects. The UCSC Genome Browser is one of these resources. The official sequence data for a given species forms the framework for displaying many other types of data, such as expression, variation, cross-species comparisons, and more. Visual representations of the data are available for exploration. Data can be queried with sequences. Complex database queries are also easily achieved with the Table Browser interface. Associated tools permit additional query types or access to additional data sources, such as images of in situ localizations. Support for solving researchers' issues is provided through active discussion mailing lists and updated training materials. The UCSC Genome Browser provides a source of deep support for a wide range of biomedical molecular research (http://genome.ucsc.edu).

  9. Web information retrieval based on ontology

    NASA Astrophysics Data System (ADS)

    Zhang, Jian

    2013-03-01

    The purpose of Information Retrieval (IR) is to find a set of documents relevant to a specific information need of a user. The traditional Information Retrieval model commonly used in commercial search engines is based on keyword indexing and Boolean logic queries. One big drawback of traditional information retrieval is that it typically retrieves information without an explicitly defined domain of interest to the user, so that a great deal of irrelevant information is returned, burdening the user with picking useful answers out of irrelevant results. In order to tackle this issue, many semantic web information retrieval models have been proposed recently. The main advantage of the Semantic Web is to enhance search mechanisms with the use of ontology mechanisms. In this paper, we present our approach to personalizing a web search engine based on ontology. In addition, key techniques are also discussed. Compared to previous research, our work concentrates on semantic similarity and the whole process, including query submission and information annotation.

  10. An adaptable architecture for patient cohort identification from diverse data sources.

    PubMed

    Bache, Richard; Miles, Simon; Taweel, Adel

    2013-12-01

    We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. The architecture has the key feature that queries defined according to the query model are both pre- and post-processed, and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a database management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once, no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses, thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity.
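
    To illustrate what expressing eligibility criteria independently of the data representation might look like, the Python sketch below evaluates a hypothetical temporal criterion over clinical facts that an adaptor has already extracted into a neutral form. The field names and the criterion are invented for illustration and are not the paper's query model.

        from datetime import date, timedelta

        # Hypothetical criterion: "HbA1c > 7.0 recorded within 90 days
        # before enrolment". The reasoning sees only neutral facts, not
        # the source warehouse's schema.
        def eligible(observations, enrolment, code="HbA1c",
                     threshold=7.0, window=90):
            for obs in observations:
                recency = enrolment - obs["date"]
                if (obs["code"] == code and obs["value"] > threshold
                        and timedelta(0) <= recency <= timedelta(days=window)):
                    return True
            return False

        facts = [{"code": "HbA1c", "value": 7.8, "date": date(2013, 1, 10)}]
        print(eligible(facts, date(2013, 3, 1)))  # True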

  11. What types of astronomy images are most popular?

    NASA Astrophysics Data System (ADS)

    Allen, Alice; Bonnell, Jerry T.; Connelly, Paul; Haring, Ralf; Lowe, Stuart R.; Nemiroff, Robert J.

    2015-01-01

    Stunning imagery helps make astronomy one of the most popular sciences -- but what types of astronomy images are most popular? To help answer this question, public response to images posted to various public venues of the Astronomy Picture of the Day (APOD) is investigated. APOD portals queried included the main NASA website and the social media mirrors on Facebook, Google Plus, and Twitter. Popularity measures include polls, downloads, page views, likes, shares, and retweets; these measures are used to assess how image popularity varies in relation to various image attributes, including topic and topicality.

  12. A semantic problem solving environment for integrative parasite research: identification of intervention targets for Trypanosoma cruzi.

    PubMed

    Parikh, Priti P; Minning, Todd A; Nguyen, Vinh; Lalithsena, Sarasi; Asiaee, Amir H; Sahoo, Satya S; Doshi, Prashant; Tarleton, Rick; Sheth, Amit P

    2012-01-01

    Research on the biology of parasites requires a sophisticated and integrated computational platform to query and analyze large volumes of data, representing both unpublished (internal) and public (external) data sources. Effective analysis of an integrated data resource using knowledge discovery tools would significantly aid biologists in conducting their research, for example, through identifying various intervention targets in parasites and in deciding the future direction of ongoing as well as planned projects. A key challenge in achieving this objective is the heterogeneity between the internal lab data, usually stored as flat files, Excel spreadsheets or custom-built databases, and the external databases. Reconciling the different forms of heterogeneity and effectively integrating data from disparate sources is a nontrivial task for biologists and requires a dedicated informatics infrastructure. Thus, we developed an integrated environment using Semantic Web technologies that provides biologists with the tools for managing and analyzing their data, without the need for acquiring in-depth computer science knowledge. We developed a semantic problem-solving environment (SPSE) that uses ontologies to integrate internal lab data with external resources in a Parasite Knowledge Base (PKB), which has the ability to query across these resources in a unified manner. The SPSE includes Web Ontology Language (OWL)-based ontologies, experimental data with its provenance information represented using the Resource Description Framework (RDF), and a visual querying tool, Cuebee, that features integrated use of Web services. We demonstrate the use and benefit of SPSE using example queries for identifying gene knockout targets of Trypanosoma cruzi for vaccine development. Answers to these queries involve looking up multiple sources of data, linking them together, and presenting the results. The SPSE helps parasitologists leverage the growing, but disparate, parasite data resources by offering an integrative platform that utilizes Semantic Web techniques, while keeping their workload increase minimal.

  13. Transcriptional responses to complex mixtures - A review

    EPA Science Inventory

    Exposure of people to hazardous compounds is primarily through complex environmental mixtures, those that occur through media such as air, soil, water, food, cigarette smoke, and combustion emissions. Microarray technology offers the ability to query the entire genome after expos...

  14. Regular paths in SparQL: querying the NCI Thesaurus.

    PubMed

    Detwiler, Landon T; Suciu, Dan; Brinkley, James F

    2008-11-06

    OWL, the Web Ontology Language, provides syntax and semantics for representing knowledge for the semantic web. Many of the constructs of OWL have a basis in the field of description logics. While the formal underpinnings of description logics have led to a highly computable language, this has come at a cognitive cost: OWL ontologies are often unintuitive to readers lacking a strong logic background. In this work we describe GLEEN, a regular path expression library, which extends the RDF query language SparQL to support complex path expressions over OWL and other RDF-based ontologies. We illustrate the utility of GLEEN by showing how it can be used in a query-based approach to defining simpler, more intuitive views of OWL ontologies. In particular, we show how relatively simple GLEEN-enhanced SparQL queries can create views of the OWL version of the NCI Thesaurus that match the views generated by the web-based NCI browser.
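
    GLEEN's own path syntax is not reproduced here, but SPARQL 1.1 later standardized a similar facility as property paths, which modern engines such as rdflib support. The Python sketch below follows a transitive subclass-of chain over an invented two-triple ontology, the same flavor of regular-path query used to build simpler views of a terminology like the NCI Thesaurus.

        import rdflib

        TTL = """
        @prefix ex: <http://example.org/> .
        ex:Melanoma ex:subClassOf ex:SkinNeoplasm .
        ex:SkinNeoplasm ex:subClassOf ex:Neoplasm .
        """
        g = rdflib.Graph()
        g.parse(data=TTL, format="turtle")

        # Transitive ancestors via a SPARQL 1.1 property path (one-or-more, '+').
        q = """
        PREFIX ex: <http://example.org/>
        SELECT ?ancestor WHERE { ex:Melanoma ex:subClassOf+ ?ancestor }
        """
        for row in g.query(q):
            print(row.ancestor)  # ex:SkinNeoplasm, then ex:Neoplasm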

  15. Using a data base management system for modelling SSME test history data

    NASA Technical Reports Server (NTRS)

    Abernethy, K.

    1985-01-01

    The usefulness of a data base management system (DBMS) for modelling historical test data for the complete series of static test firings of the Space Shuttle Main Engine (SSME) was assessed. From an analysis of user data base query requirements, it became clear that a relational DBMS which included a relationally complete query language would permit a model satisfying the query requirements. Representative models and sample queries are discussed. A list of environment-particular evaluation criteria for the desired DBMS was constructed; these criteria include requirements in the areas of user-interface complexity, program independence, flexibility, modifiability, and output capability. The evaluation process included the construction of several prototype data bases for user assessment. The systems studied, representing the three major DBMS conceptual models, were: MIRADS, a hierarchical system; DMS-1100, a CODASYL-based network system; ORACLE, a relational system; and DATATRIEVE, a relational-type system.

  16. Analysis of Technique to Extract Data from the Web for Improved Performance

    NASA Astrophysics Data System (ADS)

    Gupta, Neena; Singh, Manish

    2010-11-01

    The World Wide Web is rapidly guiding the world into an amazing new electronic world, where anyone can publish anything in electronic form and extract almost any information. Extraction of information from semi-structured or unstructured documents, such as web pages, is a useful yet complex task. Data extraction, which is important for many applications, extracts records from HTML files automatically. Ontologies can achieve a high degree of accuracy in data extraction. We analyze a method for data extraction, OBDE (Ontology-Based Data Extraction), which automatically extracts query result records from the web with the help of agents. OBDE first constructs an ontology for a domain according to information matching between the query interfaces and query result pages from different web sites within the same domain. Then, the constructed domain ontology is used during data extraction to identify the query result section in a query result page and to align and label the data values in the extracted records. The ontology-assisted data extraction method is fully automatic and overcomes many of the deficiencies of current automatic data extraction methods.

  17. A Dimensional Bus model for integrating clinical and research data

    PubMed Central

    Hum, Richard C; Murphy, James R

    2011-01-01

    Objectives Many clinical research data integration platforms rely on the Entity–Attribute–Value model because of its flexibility, even though it presents problems in query formulation and execution time. The authors sought more balance in these traits. Materials and Methods Borrowing concepts from Entity–Attribute–Value and from enterprise data warehousing, the authors designed an alternative called the Dimensional Bus model and used it to integrate electronic medical record, sponsored study, and biorepository data. Each type of observational collection has its own table, and the structure of these tables varies to suit the source data. The observational tables are linked to the Bus, which holds provenance information and links to various classificatory dimensions that amplify the meaning of the data or facilitate its query and exposure management. Results The authors implemented a Bus-based clinical research data repository with a query system that flexibly manages data access and confidentiality, facilitates catalog search, and readily formulates and compiles complex queries. Conclusion The design provides a workable way to manage and query mixed schemas in a data warehouse. PMID:21856687

  18. Improving biomedical information retrieval by linear combinations of different query expansion techniques.

    PubMed

    Abdulla, Ahmed AbdoAziz Ahmed; Lin, Hongfei; Xu, Bo; Banbhrani, Santosh Kumar

    2016-07-25

    Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information Retrieval (IR) programs scour unstructured materials such as text documents in large reserves of data that are usually stored on computers. IR is related to the representation, storage, and organization of information items, as well as to access. One of the main problems in IR is to determine which documents are relevant to the user's needs and which are not. Under the current regime, users cannot construct queries precisely enough to retrieve particular pieces of data from large reserves of data, and basic information retrieval systems produce low-quality search results. In this paper we present a new technique to refine information retrieval searches so that they better represent the user's information need: we apply different query expansion techniques and linearly combine them, combining two expansion results at a time. Query expansions expand the search query, for example, by finding synonyms and reweighting original terms. They provide significantly more focused, particularized search results than do basic search queries. Retrieval performance is measured by some variants of MAP (Mean Average Precision), and according to our experimental results, the combination of the best query expansion results enhances the retrieved documents and outperforms our baseline by 21.06%; it even outperforms a previous study by 7.12%. We propose several query expansion techniques and their (linear) combinations to make user queries more cognizable to search engines and to produce higher-quality search results.
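
    The linear combination of two expansion runs can be sketched in a few lines of Python: each run assigns retrieval scores to documents, and the combined score is a convex mixture of the two. The weight lam below is an invented illustration, not the paper's tuned value.

        def combine(scores_a, scores_b, lam=0.6):
            """Linearly combine two expansion runs' document scores:
            s(d) = lam * s_a(d) + (1 - lam) * s_b(d)."""
            docs = set(scores_a) | set(scores_b)
            return {d: lam * scores_a.get(d, 0.0)
                       + (1 - lam) * scores_b.get(d, 0.0)
                    for d in docs}

        # Hypothetical scores from a synonym-based run and a reweighting run.
        synonym_run = {"d1": 0.9, "d2": 0.4}
        reweight_run = {"d1": 0.5, "d3": 0.7}
        ranked = sorted(combine(synonym_run, reweight_run).items(),
                        key=lambda kv: kv[1], reverse=True)
        print(ranked)  # d1 first: it scores well in both runs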

  19. Expectations of Benefit in Early-Phase Clinical Trials: Implications for Assessing the Adequacy of Informed Consent

    PubMed Central

    Weinfurt, Kevin P.; Seils, Damon M.; Tzeng, Janice P.; Compton, Kate L.; Sulmasy, Daniel P.; Astrow, Alan B.; Solarino, Nicholas A.; Schulman, Kevin A.; Meropol, Neal J.

    2009-01-01

    Background Participants in early-phase clinical trials have reported high expectations of benefit from their participation. There is concern that participants misunderstand the trials to which they have consented. Such concern is based on assumptions about what patients mean when they respond to questions about likelihood of benefit. Methods Participants were 27 women and 18 men in early-phase oncology trials at 2 academic medical centers in the United States. To determine whether expectations of benefit differ depending on how patients are queried, we randomly assigned participants to 1 of 3 interviews corresponding to 3 questions about likelihood of benefit: frequency-type, belief-type, and vague. In semistructured interviews, we queried participants about how they understood and answered the question. Participants then answered and discussed one of the other questions. Results Expectations of benefit in response to the belief-type question were significantly greater than expectations in response to the frequency-type and vague questions (P = .02). The most common justifications involved positive attitude (n = 27 [60%]) and references to physical health (n = 23 [51%]). References to positive attitude were most common among participants with higher (> 70%) expectations (n = 11 [85%]) and least common among those with lower (< 50%) expectations (n = 3 [27%]). Conclusions The wording of questions about likelihood of benefit shapes the expectations that patients express. Also, patients who express high expectations may not do so to communicate understanding, but rather to register optimism. Ongoing research will clarify the meaning of high expectations and examine methods for assessing understanding in this context. PMID:18378940

  20. Knowledge Management Framework for Emerging Infectious Diseases Preparedness and Response: Design and Development of Public Health Document Ontology

    PubMed Central

    Zhang, Zhizun; Gonzalez, Mila C; Morse, Stephen S

    2017-01-01

    Background There are increasing concerns about our preparedness and timely coordinated response across the globe to cope with emerging infectious diseases (EIDs). This poses practical challenges that require exploiting novel knowledge management approaches effectively. Objective This work aims to develop an ontology-driven knowledge management framework that addresses the existing challenges in sharing and reusing public health knowledge. Methods We propose a systems engineering-inspired ontology-driven knowledge management approach. It decomposes public health knowledge into concepts and relations and organizes the elements of knowledge based on the teleological functions. Both knowledge and semantic rules are stored in an ontology and retrieved to answer queries regarding EID preparedness and response. Results A hybrid concept extraction was implemented in this work. The quality of the ontology was evaluated using the formal evaluation method Ontology Quality Evaluation Framework. Conclusions Our approach is a potentially effective methodology for managing public health knowledge. Accuracy and comprehensiveness of the ontology can be improved as more knowledge is stored. In the future, a survey will be conducted to collect queries from public health practitioners. The reasoning capacity of the ontology will be evaluated using the queries and hypothetical outbreaks. We suggest the importance of developing a knowledge sharing standard like the Gene Ontology for the public health domain. PMID:29021130

  1. Nanocubes for real-time exploration of spatiotemporal datasets.

    PubMed

    Lins, Lauro; Klosowski, James T; Scheidegger, Carlos

    2013-12-01

    Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop's main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.
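
    The data-cube idea underlying nanocubes can be sketched naively in Python: precompute one group-by count table for every subset of dimensions so that any aggregate query becomes a lookup. This naive version materializes all 2^d tables and does not fit in memory at scale; nanocubes obtain their savings by sharing structure across the hierarchical spatial and temporal dimensions. The event schema below is invented.

        from collections import Counter
        from itertools import combinations

        # Events: (region, hour, device).
        events = [("east", 9, "phone"), ("east", 9, "tablet"),
                  ("west", 11, "phone")]
        dims = ("region", "hour", "device")

        # One count table per subset of dimension indices.
        cube = {}
        for r in range(len(dims) + 1):
            for keyset in combinations(range(len(dims)), r):
                cube[keyset] = Counter(tuple(e[i] for i in keyset)
                                       for e in events)

        print(cube[(0,)][("east",)])       # events in the east -> 2
        print(cube[(1, 2)][(9, "phone")])  # phone events at hour 9 -> 1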

  2. STARS 2.0: 2nd-generation open-source archiving and query software

    NASA Astrophysics Data System (ADS)

    Winegar, Tom

    2008-07-01

    The Subaru Telescope is in the process of developing an open-source alternative to the 1st-generation software and databases (STARS 1) used for archiving and query. For STARS 2, we have chosen PHP and Python for scripting and MySQL as the database software. We have collected feedback from staff and observers, and used this feedback to significantly improve the design and functionality of our future archiving and query software. Archiving - We identified two weaknesses in the 1st-generation STARS archiving software: a complex and inflexible table structure, and uncoordinated system administration for our business model of taking pictures from the summit and archiving them in both Hawaii and Japan. We adopted a simplified and normalized table structure with passive keyword collection, and we are designing an archive-to-archive file transfer system that automatically reports real-time status and error conditions and permits error recovery. Query - We identified several weaknesses in the 1st-generation STARS query software: inflexible query tools, poor sharing of calibration data, and no automatic file transfer mechanisms to observers. We are developing improved query tools and sharing of calibration data, and multi-protocol unassisted file transfer mechanisms for observers. In the process, we have redefined a 'query': from an invisible search result that can be transferred only once, in-house, with little status and error reporting and no error recovery, to a stored search result that can be monitored, transferred to different locations with multiple protocols, reporting status and error conditions, and permitting recovery from errors.

  3. An adaptable architecture for patient cohort identification from diverse data sources

    PubMed Central

    Bache, Richard; Miles, Simon; Taweel, Adel

    2013-01-01

    Objective We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. Method The architecture has the key feature that queries defined according to the query model are both pre- and post-processed, and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. Results We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Discussion Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a database management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once, no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. Conclusions The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses, thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity. PMID:24064442

  4. A semantically-aided architecture for a web-based monitoring system for carotid atherosclerosis.

    PubMed

    Kolias, Vassileios D; Stamou, Giorgos; Golemati, Spyretta; Stoitsis, Giannis; Gkekas, Christos D; Liapis, Christos D; Nikita, Konstantina S

    2015-08-01

    Carotid atherosclerosis is a multifactorial disease, and its clinical diagnosis depends on the evaluation of heterogeneous clinical data, such as imaging exams, biochemical tests and the patient's clinical history. The lack of interoperability between Health Information Systems (HIS) does not allow physicians to acquire all the necessary data for the diagnostic process. In this paper, a semantically-aided architecture is proposed for a web-based monitoring system for carotid atherosclerosis that is able to gather and unify heterogeneous data with the use of an ontology, and to create a common interface for data access, enhancing the interoperability of HIS. The architecture is based on an application ontology of carotid atherosclerosis that is used to (a) integrate heterogeneous data sources on the basis of semantic representation and ontological reasoning and (b) access the critical information using SPARQL query rewriting and ontology-based data access services. The architecture was tested over a carotid atherosclerosis dataset consisting of the imaging exams and the clinical profiles of 233 patients, using a set of complex queries constructed by the physicians. The proposed architecture was evaluated with respect to the complexity of the queries that the physicians could make and the retrieval speed. The proposed architecture gave promising results in terms of interoperability, data integration of heterogeneous sources in an ontological way, and expanded capabilities of query and retrieval in HIS.

  5. ProBiS-CHARMMing: Web Interface for Prediction and Optimization of Ligands in Protein Binding Sites.

    PubMed

    Konc, Janez; Miller, Benjamin T; Štular, Tanja; Lešnik, Samo; Woodcock, H Lee; Brooks, Bernard R; Janežič, Dušanka

    2015-11-23

    Proteins often exist only as apo structures (unligated) in the Protein Data Bank, with their corresponding holo structures (with ligands) unavailable. However, apoproteins may not represent the amino-acid residue arrangement upon ligand binding well, which is especially problematic for molecular docking. We developed the ProBiS-CHARMMing web interface by connecting the ProBiS ( http://probis.cmm.ki.si ) and CHARMMing ( http://www.charmming.org ) web servers into one functional unit that enables prediction of protein-ligand complexes and allows for their geometry optimization and interaction energy calculation. The ProBiS web server predicts ligands (small compounds, proteins, nucleic acids, and single-atom ligands) that may bind to a query protein. This is achieved by comparing its surface structure against a nonredundant database of protein structures and finding those that have binding sites similar to that of the query protein. Existing ligands found in the similar binding sites are then transposed to the query according to predictions from ProBiS. The CHARMMing web server enables, among other things, minimization and potential energy calculation for a wide variety of biomolecular systems, and it is used here to optimize the geometry of the predicted protein-ligand complex structures using the CHARMM force field and to calculate their interaction energies with the corresponding query proteins. We show how ProBiS-CHARMMing can be used to predict ligands and their poses for a particular binding site, and minimize the predicted protein-ligand complexes to obtain representations of holoproteins. The ProBiS-CHARMMing web interface is freely available for academic users at http://probis.nih.gov.

  6. A Linguistic Truth-Valued Temporal Reasoning Formalism and Its Implementation

    NASA Astrophysics Data System (ADS)

    Lu, Zhirui; Liu, Jun; Augusto, Juan C.; Wang, Hui

    Temporality and uncertainty are important features of many real-world systems. Solving problems in such systems requires the use of formal mechanisms such as logic systems, statistical methods, or other reasoning and decision-making methods. In this paper, we propose a linguistic truth-valued temporal reasoning formalism to enable the management of both features concurrently, using a linguistic truth-valued logic and a temporal logic. We also provide a backward reasoning algorithm which allows the answering of user queries. A simple but realistic scenario in a smart home application is used to illustrate our work.

  7. Of oil and ice: interview with Prince Mohammed Al-Faisal. [Towing icebergs for Middle East water supplies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1978-01-01

    Prince Mohammed Al-Faisal, son of the late King Faisal of Saudi Arabia, answered queries on his proposal to tow icebergs from the South Pole to enhance water supplies. He says the project is more economically lucrative than their oil industry. Oil is a diminishing resource, but with ample water supplies (of which they are now short) additional food may be produced and may be utilized in industry. The prince was then questioned on the country's economics, industrial concerns, the workforce in Saudi Arabia, education, and traditional changes in the country. (MCW)

  8. SKYLAB (SL)-3 CREWMEN - IN-ORBIT PRESS CONFERENCE - JSC

    NASA Image and Video Library

    1973-09-21

    S73-34339 (21 Sept. 1973) --- Astronaut Alan L. Bean, right, Skylab 3 commander, answers a question during the Sept. 21, 1973 press conference from the Skylab space station in Earth orbit. This is a black and white reproduction taken from a television transmission made by a TV camera aboard the Skylab space station. Scientist-astronaut Owen K. Garriott, center, science pilot; and astronaut Jack R. Lousma, left, pilot, await queries from newsmen on the ground to be sent up by scientist-astronaut Story Musgrave, CAPCOM for this shift of Skylab 3. Photo credit: NASA

  9. Foundations for Streaming Model Transformations by Complex Event Processing.

    PubMed

    Dávid, István; Ráth, István; Varró, Dániel

    2018-01-01

    Streaming model transformations represent a novel class of transformations to manipulate models whose elements are continuously produced or modified in high volume and with rapid rate of change. Executing streaming transformations requires efficient techniques to recognize activated transformation rules over a live model and a potentially infinite stream of events. In this paper, we propose foundations of streaming model transformations by innovatively integrating incremental model query, complex event processing (CEP) and reactive (event-driven) transformation techniques. Complex event processing allows to identify relevant patterns and sequences of events over an event stream. Our approach enables event streams to include model change events which are automatically and continuously populated by incremental model queries. Furthermore, a reactive rule engine carries out transformations on identified complex event patterns. We provide an integrated domain-specific language with precise semantics for capturing complex event patterns and streaming transformations together with an execution engine, all of which is now part of the Viatra reactive transformation framework. We demonstrate the feasibility of our approach with two case studies: one in an advanced model engineering workflow; and one in the context of on-the-fly gesture recognition.
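
    Viatra's declarative event-pattern language is not reproduced here, but the core CEP primitive it builds on, a windowed "followed-by" over a stream of model-change events, can be sketched imperatively in Python; the event shape (timestamp, kind, element) is invented for illustration.

        # Toy complex-event pattern: "a model element is created and then
        # modified within `window` ticks" over a live event stream.
        def match_followed_by(stream, first, second, window):
            pending = []  # (timestamp, element) of unmatched 'first' events
            for t, kind, elem in stream:
                # Drop 'first' events that have fallen out of the window.
                pending = [(ts, e) for ts, e in pending if t - ts <= window]
                if kind == first:
                    pending.append((t, elem))
                elif kind == second:
                    for ts, e in pending:
                        if e == elem:
                            yield (ts, t, elem)

        stream = [(1, "create", "n1"), (2, "create", "n2"), (3, "modify", "n1")]
        print(list(match_followed_by(stream, "create", "modify", window=3)))
        # [(1, 3, 'n1')]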

  10. Deep Question Answering for protein annotation

    PubMed Central

    Gobeill, Julien; Gaudinat, Arnaud; Pasche, Emilie; Vishnyakova, Dina; Gaudet, Pascale; Bairoch, Amos; Ruch, Patrick

    2015-01-01

    Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. From this perspective, question-answering (QA) engines are designed to display answers, which were automatically extracted from the retrieved documents. Standard QA engines in the literature process a user question, then retrieve relevant documents and finally extract some possible answers out of these documents using various named-entity recognition processes. In our study, we try to answer complex genomics questions, which can be adequately answered only using Gene Ontology (GO) concepts. Such complex answers cannot be found using state-of-the-art dictionary- and redundancy-based QA engines. We compare the effectiveness of two dictionary-based classifiers for extracting correct GO answers from a large set of 100 retrieved abstracts per question. In the same way, we also investigate the power of GOCat, a GO supervised classifier. GOCat exploits the GOA database to propose GO concepts that were annotated by curators for similar abstracts. This approach is called deep QA, as it adds an original classification step and exploits curated biological data to infer answers which are not explicitly mentioned in the retrieved documents. We show that for complex answers such as protein functional descriptions, the redundancy phenomenon has a limited effect. Similarly, usual dictionary-based approaches are relatively ineffective. In contrast, we demonstrate how existing curated data, beyond information extraction, can be exploited by a supervised classifier, such as GOCat, to massively improve both the quantity and the quality of the answers, with a +100% improvement for both recall and precision. Database URL: http://eagl.unige.ch/DeepQA4PA/ PMID:26384372

  11. Deep Question Answering for protein annotation.

    PubMed

    Gobeill, Julien; Gaudinat, Arnaud; Pasche, Emilie; Vishnyakova, Dina; Gaudet, Pascale; Bairoch, Amos; Ruch, Patrick

    2015-01-01

    Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. From this perspective, question-answering (QA) engines are designed to display answers, which were automatically extracted from the retrieved documents. Standard QA engines in the literature process a user question, then retrieve relevant documents and finally extract some possible answers out of these documents using various named-entity recognition processes. In our study, we try to answer complex genomics questions, which can be adequately answered only using Gene Ontology (GO) concepts. Such complex answers cannot be found using state-of-the-art dictionary- and redundancy-based QA engines. We compare the effectiveness of two dictionary-based classifiers for extracting correct GO answers from a large set of 100 retrieved abstracts per question. In the same way, we also investigate the power of GOCat, a GO supervised classifier. GOCat exploits the GOA database to propose GO concepts that were annotated by curators for similar abstracts. This approach is called deep QA, as it adds an original classification step and exploits curated biological data to infer answers which are not explicitly mentioned in the retrieved documents. We show that for complex answers such as protein functional descriptions, the redundancy phenomenon has a limited effect. Similarly, usual dictionary-based approaches are relatively ineffective. In contrast, we demonstrate how existing curated data, beyond information extraction, can be exploited by a supervised classifier, such as GOCat, to massively improve both the quantity and the quality of the answers, with a +100% improvement for both recall and precision. Database URL: http://eagl.unige.ch/DeepQA4PA/. © The Author(s) 2015. Published by Oxford University Press.

  12. Enhancing user privacy in SARG04-based private database query protocols

    NASA Astrophysics Data System (ADS)

    Yu, Fang; Qiu, Daowen; Situ, Haozhen; Wang, Xiaoming; Long, Shun

    2015-11-01

    The well-known SARG04 protocol can be used in a private query application to generate an oblivious key. Using the key, the user can retrieve one out of N items from a database without revealing which one he/she is interested in. However, the existing SARG04-based private query protocols are vulnerable to attacks in which the database sends faked data, since in its canonical form the SARG04 protocol lacks means for one party to defend against attacks from the other. As such attacks can cause significant loss of user privacy, a variant of the SARG04 protocol is proposed in this paper with new mechanisms designed to help the user protect its privacy in private query applications. In the protocol, it is the user who starts the session with the database, trying to learn from it bits of a raw key in an oblivious way. An honesty test is used to detect a cheating database that has transmitted faked data. The whole private query protocol has O(N) communication complexity for conveying at least N encrypted items. Compared with the existing SARG04-based protocols, it is efficient in communication for per-bit learning.
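
    The quantum part of the protocol (the SARG04 state exchange and the proposed honesty test) is omitted here, but the classical use of the resulting oblivious key, common to private-query schemes in the style of Jakobi et al., can be sketched in Python: the user knows the key bit at one random position j, announces a shift aligning that position with the wanted item i, and the database XORs every item with the shifted key, so the user decrypts exactly one item while the database cannot tell which. A hedged toy sketch with 1-bit items:

        import secrets

        N = 8
        # Oblivious key: the database knows all N bits; the user ends up
        # knowing only a few positions (here, one), and the database does
        # not learn which.
        key = [secrets.randbits(1) for _ in range(N)]
        j = 5                 # the position the user happens to know
        user_bit = key[j]

        i = 2                 # the item the user wants
        s = (j - i) % N       # announced shift: reveals nothing about i alone
        db = [secrets.randbits(1) for _ in range(N)]  # 1-bit database items
        cipher = [db[k] ^ key[(k + s) % N] for k in range(N)]

        # The user decrypts exactly item i; all other items stay masked.
        print(cipher[i] ^ user_bit == db[i])  # True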

  13. Querying and Computing with BioCyc Databases

    PubMed Central

    Krummenacker, Markus; Paley, Suzanne; Mueller, Lukas; Yan, Thomas; Karp, Peter D.

    2006-01-01

    Summary We describe multiple methods for accessing and querying the complex and integrated cellular data in the BioCyc family of databases: access through multiple file formats, access through Application Program Interfaces (APIs) for LISP, Perl and Java, and SQL access through the BioWarehouse relational database. Availability The Pathway Tools software and 20 BioCyc DBs in Tiers 1 and 2 are freely available to academic users; fees apply to some types of commercial use. For download instructions see http://BioCyc.org/download.shtml PMID:15961440

  14. Systematic Clinical Reasoning in Physical Therapy (SCRIPT): Tool for the Purposeful Practice of Clinical Reasoning in Orthopedic Manual Physical Therapy.

    PubMed

    Baker, Sarah E; Painter, Elizabeth E; Morgan, Brandon C; Kaus, Anna L; Petersen, Evan J; Allen, Christopher S; Deyle, Gail D; Jensen, Gail M

    2017-01-01

    Clinical reasoning is essential to physical therapist practice. Solid clinical reasoning processes may lead to greater understanding of the patient condition, early diagnostic hypothesis development, and well-tolerated examination and intervention strategies, as well as mitigate the risk of diagnostic error. However, the complex and often subconscious nature of clinical reasoning can impede the development of this skill. Protracted tools have been published to help guide self-reflection on clinical reasoning but might not be feasible in typical clinical settings. This case illustrates how the Systematic Clinical Reasoning in Physical Therapy (SCRIPT) tool can be used to guide the clinical reasoning process and prompt a physical therapist to search the literature to answer a clinical question and facilitate formal mentorship sessions in postprofessional physical therapist training programs. The SCRIPT tool enabled the mentee to generate appropriate hypotheses, plan the examination, query the literature to answer a clinical question, establish a physical therapist diagnosis, and design an effective treatment plan. The SCRIPT tool also facilitated the mentee's clinical reasoning and provided the mentor insight into the mentee's clinical reasoning. The reliability and validity of the SCRIPT tool have not been formally studied. Clinical mentorship is a cornerstone of postprofessional training programs and intended to develop advanced clinical reasoning skills. However, clinical reasoning is often subconscious and, therefore, a challenging skill to develop. The use of a tool such as the SCRIPT may facilitate developing clinical reasoning skills by providing a systematic approach to data gathering and making clinical judgments to bring clinical reasoning to the conscious level, facilitate self-reflection, and make a mentored physical therapist's thought processes explicit to his or her clinical mentor. © 2017 American Physical Therapy Association

  15. EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hasan, S. M. Shamimul; Fox, Edward A.; Bisset, Keith

    Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models, as well as the data produced by these models, present unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy-to-use decision support and analytical environments. As a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK—a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks—aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated. A query bank is developed—the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using the SPARQL Protocol and RDF Query Language (SPARQL) over EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline: from model construction to simulation output. Finally, we show that the performance of benchmark queries varies significantly with the choice of hardware underlying the database and the RDF engine.
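
    Because EpiK is queried with SPARQL over RDF, a client-side query has the shape sketched below (Python with rdflib). The epi: vocabulary, property names and toy data are hypothetical stand-ins for the actual EpiK terms; the aggregation pattern is the point.

        from rdflib import Graph

        # Inline Turtle standing in for an EpiK RDF extract.
        ttl = """
        @prefix epi: <http://example.org/epik#> .
        epi:run1 epi:hasOutput epi:obs1, epi:obs2 .
        epi:obs1 epi:inCounty "Montgomery" ; epi:onDay 3  ; epi:newInfections 17 .
        epi:obs2 epi:inCounty "Fairfax"    ; epi:onDay 12 ; epi:newInfections 40 .
        """
        g = Graph()
        g.parse(data=ttl, format="turtle")

        # A typical case-study query: total new infections per county
        # over the first 30 simulated days.
        q = """
        PREFIX epi: <http://example.org/epik#>
        SELECT ?county (SUM(?cases) AS ?total)
        WHERE {
          ?run epi:hasOutput ?obs .
          ?obs epi:inCounty ?county ;
               epi:onDay ?day ;
               epi:newInfections ?cases .
          FILTER (?day <= 30)
        }
        GROUP BY ?county
        ORDER BY DESC(?total)
        """
        for row in g.query(q):
            print(row.county, row.total)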

  16. EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases

    DOE PAGES

    Hasan, S. M. Shamimul; Fox, Edward A.; Bisset, Keith; ...

    2017-11-06

    Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models, as well as the data produced by these models, present unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy-to-use decision support and analytical environments. As a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK—a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks—aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated. A query bank is developed—the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using the SPARQL Protocol and RDF Query Language (SPARQL) over EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline: from model construction to simulation output. Finally, we show that the performance of benchmark queries varies significantly with the choice of hardware underlying the database and the RDF engine.

  17. A medical enigma : what "gale chinoise (Chinese scabies)" meant 150 years ago.

    PubMed

    Gaüzere, B-A; Aubry, P

    2017-06-01

    The meaning of the term "gale chinoise", mentioned in some articles about French overseas territories in the 19th century, remains unclear. In response to a query from an American colleague dermatologist trying to find out what it meant 150 years ago, we attempted to elucidate the nature of this ancient disease, whose name would today be translated literally as Chinese scabies. We submitted the query to a panel of civilian and military French tropical medicine specialists, including dermatologists, through two networks: the Association Amicale Santé Navale et d'Outre-Mer and the Société de Pathologie Exotique. Very few answers were received from the approximately 400 colleagues in these networks. They mentioned ciguatera, other types of ichthyosarcotoxism, smallpox, and leprosy. Several said they had never encountered this term during many years spent in French Polynesia, and none was able to find irrefutable proof of their suggestion. Discussion and conclusion: Leprosy, smallpox, or ciguatera? The identity of "gale chinoise" remains an enigma; it might have been intended to designate several different diseases.

  18. Incremental Ontology-Based Extraction and Alignment in Semi-structured Documents

    NASA Astrophysics Data System (ADS)

    Thiam, Mouhamadou; Bennacer, Nacéra; Pernelle, Nathalie; Lô, Moussa

    SHIRI is an ontology-based system for the integration of semi-structured documents related to a specific domain. The system's purpose is to allow users to access relevant parts of documents as answers to their queries. SHIRI uses RDF/OWL for the representation of resources and SPARQL for querying them. It relies on an automatic, unsupervised and ontology-driven approach for the extraction, alignment and semantic annotation of tagged elements of documents. In this paper, we focus on the Extract-Align algorithm, which exploits a set of named entity and term patterns to extract term candidates to be aligned with the ontology. It proceeds in an incremental manner in order to populate the ontology with terms describing instances of the domain and to reduce access to external resources such as the Web. We evaluated it on an HTML corpus related to calls for papers in computer science, and the results we obtained are very promising. They show how the incremental behaviour of the Extract-Align algorithm enriches the ontology and increases the number of terms (or named entities) aligned directly with the ontology.
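
    The incremental flavor of Extract-Align can be conveyed in a few lines: candidate terms are extracted with surface patterns, aligned against ontology labels by string similarity, and accepted alignments enrich the ontology so that later documents match directly. The Python sketch below is illustrative only; SHIRI's named-entity and term patterns and its similarity measure are more elaborate.

        import re
        from difflib import SequenceMatcher

        def extract_align(documents, ontology_terms, threshold=0.85):
            # Naive capitalized-phrase pattern standing in for SHIRI's
            # named entity and term patterns.
            pattern = re.compile(r"[A-Z][\w-]*(?:\s+[A-Z][\w-]*)+")
            aligned = {}
            for doc in documents:
                for candidate in pattern.findall(doc):
                    sim = lambda t: SequenceMatcher(
                        None, candidate.lower(), t.lower()).ratio()
                    best = max(ontology_terms, key=sim)
                    if sim(best) >= threshold:
                        aligned[candidate] = best
                        # Incremental step: the accepted term now belongs
                        # to the ontology and helps align later documents.
                        ontology_terms.add(candidate)
            return aligned

        print(extract_align(["PKDD Conference: call for papers, deadline June 1"],
                            {"PKDD Conference"}))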

  19. TOM: a web-based integrated approach for identification of candidate disease genes.

    PubMed

    Rossi, Simona; Masotti, Daniele; Nardini, Christine; Bonora, Elena; Romeo, Giovanni; Macii, Enrico; Benini, Luca; Volinia, Stefano

    2006-07-01

    The massive production of biological data by means of highly parallel devices like microarrays for gene expression has paved the way to new possible approaches in molecular genetics. Among them is the possibility of inferring biological answers by querying large amounts of expression data. Based on this principle, we present here TOM, a web-based resource for the efficient extraction of candidate genes for hereditary diseases. The service requires prior knowledge of at least one other gene responsible for the disease and the linkage area, or else of two disease-associated genetic intervals. The algorithm uses the information stored in public resources, including mapping, expression and functional databases. Given the queries, TOM will select and list one or more candidate genes. This approach allows the geneticist to bypass the costly and time-consuming tracing of genetic markers through entire families and might improve the chance of identifying disease genes, particularly for rare diseases. We present here the tool and the results obtained on known benchmarks and on hereditary predisposition to familial thyroid cancer. Our algorithm is available at http://www-micrel.deis.unibo.it/~tom/.

  20. X-ray effects on pacemaker type circuits

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blamires, N.G.; Myatt, J.

    1982-03-01

    Queries have been raised concerning the potential hazards of X-ray irradiation for patients using the new generation of heart pacemakers based on digital circuitry. The present study was undertaken to provide some answers to these queries. The work was conducted in two parts: first, a literature search was performed, and second, circuits using current state-of-the-art digital technology were irradiated with X-rays. Watch circuits were chosen because of their availability and their built-in facilities for testing their function. Doses of up to 330 rads were administered using energies of 46, 114, and 141 keV. The conclusion drawn from both parts of the study was that X-rays used for diagnostic purposes were unlikely to affect the performance of this type of circuit in any way. It was accepted that doses far in excess of this are administered for therapeutic purposes, and circuit malfunctions are then likely to occur. To assess the probability of a digital pacemaker malfunctioning, samples of the particular type in question would have to be irradiated at the relevant dose.

  1. A probabilistic NF2 relational algebra for integrated information retrieval and database systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fuhr, N.; Roelleke, T.

    The integration of information retrieval (IR) and database systems requires a data model which allows for modelling documents as entities, representing uncertainty and vagueness, and performing uncertain inference. For this purpose, we present a probabilistic data model based on relations in non-first-normal-form (NF2). Here, tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Thus, the set of weighted index terms of a document is represented as a probabilistic subrelation. In a similar way, imprecise attribute values are modelled as a set-valued attribute. We redefine the relational operators for this type of relation such that the result of each operator is again a probabilistic NF2 relation, where the weight of a tuple gives the probability that this tuple belongs to the result. By ordering the tuples according to decreasing probabilities, the model yields a ranking of answers as in most IR models. This effect can also be used for typical database queries involving imprecise attribute values as well as for combinations of database and IR queries.
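
    The weighted-tuple semantics can be made concrete in a few lines. The Python sketch below represents a probabilistic relation as a mapping from tuples to membership probabilities and redefines selection and join so that each result is again such a relation, ranked by decreasing probability. The product rule in the join reflects an independence assumption, and the toy data are invented; the paper's full algebra is richer.

        from itertools import product

        # A probabilistic relation: tuple -> probability of membership.
        docs = {("d1", "ir"): 0.9, ("d1", "db"): 0.5, ("d2", "ir"): 0.3}
        terms = {("ir", "retrieval"): 1.0, ("db", "databases"): 1.0}

        def select(rel, pred):
            # Selection keeps qualifying tuples with unchanged weights.
            return {t: w for t, w in rel.items() if pred(t)}

        def join(r, s, on):
            # Independent join: a result weight is the product of inputs.
            return {t1 + t2: w1 * w2
                    for (t1, w1), (t2, w2) in product(r.items(), s.items())
                    if on(t1, t2)}

        def rank(rel):
            # Order answers by decreasing membership probability.
            return sorted(rel.items(), key=lambda kv: -kv[1])

        answer = join(select(docs, lambda t: t[0] == "d1"), terms,
                      on=lambda t1, t2: t1[1] == t2[0])
        print(rank(answer))  # ('d1','ir','ir','retrieval') with 0.9 ranks first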

  2. Perceived Barriers to Information Access Among Medical Residents in Iran: Obstacles to Answering Clinical Queries in Settings with Limited Internet Accessibility

    PubMed Central

    Mazloomdoost, Danesh; Mehregan, Shervineh; Mahmoudi, Hilda; Soltani, Akbar; Embi, Peter J.

    2007-01-01

    Studies performed in the US and other Western countries have documented that physicians generate many clinical questions during a typical day and rely on various information sources for answers. Little is known about the information-seeking behaviors of physicians practicing in other countries, particularly those with limited Internet connectivity. We conducted this study to document the perceived barriers to the information resources used by medical residents in Iran. Our findings reveal that different perceived barriers exist for electronic versus paper-based resources. Notably, paper-based resources are perceived to be limited by residents' time constraints and by resource availability, whereas electronic resources are limited by the cost of decentralized resources (such as PDAs) and by the accessibility of centralized Internet access. These findings add to the limited literature regarding health information-seeking activities in international healthcare settings, particularly those with limited Internet connectivity, and will supplement future studies of, and interventions in, such settings. PMID:18693891

  3. Perceived barriers to information access among medical residents in Iran: obstacles to answering clinical queries in settings with limited Internet accessibility.

    PubMed

    Mazloomdoost, Danesh; Mehregan, Shervineh; Mahmoudi, Hilda; Soltani, Akbar; Embi, Peter J

    2007-10-11

    Studies performed in the US and other Western countries have documented that physicians generate many clinical questions during a typical day and rely on various information sources for answers. Little is known about the information-seeking behaviors of physicians practicing in other countries, particularly those with limited Internet connectivity. We conducted this study to document the perceived barriers to the information resources used by medical residents in Iran. Our findings reveal that different perceived barriers exist for electronic versus paper-based resources. Notably, paper-based resources are perceived to be limited by residents' time constraints and by resource availability, whereas electronic resources are limited by the cost of decentralized resources (such as PDAs) and by the accessibility of centralized Internet access. These findings add to the limited literature regarding health information-seeking activities in international healthcare settings, particularly those with limited Internet connectivity, and will supplement future studies of, and interventions in, such settings.

  4. Social media and the classroom?

    NASA Astrophysics Data System (ADS)

    2015-02-01

    Many years ago, I learned (through eavesdropping on a conversation during lab) that my students had set up their own Facebook group. They told me they were using it to help each other with homework assignments. This year, my daughter took physics at a university. She and her friends were struggling a bit with the online quizzes. I suggested that she set up a Facebook community and add me as a member. I would answer questions and help the group study. My daughter's group used Facebook to get answers to specific questions from the quizzes. They often ended up helping each other because the questions were posed quite late in the evening. Questions ranged from exact copies of the original queries to "Does anyone know what equation to use for this?" I began to think that, although their grades were improving on the quizzes, they were not gaining any content knowledge. To combat this, I made and posted a few short video clips reteaching the content.

  5. Muscle Logic: New Knowledge Resource for Anatomy Enables Comprehensive Searches of the Literature on the Feeding Muscles of Mammals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Druzinsky, Robert E.; Balhoff, James P.; Crompton, Alfred W.

    Here we present the Mammalian Feeding Muscle Ontology (MFMO), a multi-species ontology focused on anatomical structures that participate in feeding and other oral/pharyngeal behaviors. A unique feature of the MFMO is that a simple, computable definition of each muscle, which includes its attachments and innervation, is true across mammals. This construction mirrors the logical foundation of comparative anatomy and permits searches using language familiar to biologists. Further, it provides a template for muscles that will be useful in extending any anatomy ontology. The MFMO is developed to support the Feeding Experiments End-User Database Project (FEED, https://feedexp.org/), a publicly-available, online repository for physiological data collected from in vivo studies of feeding (e.g., mastication, biting, swallowing) in mammals. Currently the MFMO is integrated into FEED and also into two literature-specific implementations of Textpresso, a text-mining system that facilitates powerful searches of a corpus of scientific publications. We evaluate the MFMO by asking questions that test the ability of the ontology to return appropriate answers (competency questions). Lastly, we compare the results of queries of the MFMO to results from similar searches in PubMed and Google Scholar. Our tests demonstrate that the MFMO is competent to answer queries formed in the common language of comparative anatomy, but PubMed and Google Scholar are not. Overall, our results show that by incorporating anatomical ontologies into searches, an expanded and anatomically comprehensive set of results can be obtained. The broader scientific and publishing communities should consider taking up the challenge of semantically enabled search capabilities.

  6. Information Communication using Knowledge Engine on Flood Issues

    NASA Astrophysics Data System (ADS)

    Demir, I.; Krajewski, W. F.

    2012-04-01

    The Iowa Flood Information System (IFIS) is a web-based platform developed by the Iowa Flood Center (IFC) to provide access to and visualization of flood inundation maps, real-time flood conditions, short-term and seasonal flood forecasts, and other flood-related data for communities in Iowa. The system is designed for use by the general public, often people with no domain knowledge and a poor general science background. To communicate more effectively with such an audience, we have introduced a new way to get information on flood-related issues in IFIS: instead of navigating among hundreds of features and interfaces of the information system and web-based sources, users receive dynamic computations based on a collection of built-in data, analyses, and methods. The IFIS Knowledge Engine connects to distributed sources of real-time stream gauges and to in-house data sources, analysis and visualization tools to answer questions grouped into several categories. Users can provide input for queries within the categories of rainfall, flood conditions, forecasts, inundation maps, flood risk and data sensors. Our goal is the systematization of knowledge on flood-related issues and the provision of a single source for definitive answers to factual queries. The long-term goal of this knowledge engine is to make all flood-related knowledge easily accessible to everyone and to provide an educational geoinformatics tool. A future implementation of the system will accept free-form input and offer voice recognition capabilities within browser and mobile applications. We intend to deliver increasing capabilities for the system over the coming releases of IFIS. This presentation provides an overview of our Knowledge Engine, its unique information interface and functionality as an educational tool, and discusses future plans for providing knowledge on flood-related issues and resources.

  7. Muscle Logic: New Knowledge Resource for Anatomy Enables Comprehensive Searches of the Literature on the Feeding Muscles of Mammals

    DOE PAGES

    Druzinsky, Robert E.; Balhoff, James P.; Crompton, Alfred W.; ...

    2016-02-12

    Here we present the Mammalian Feeding Muscle Ontology (MFMO), a multi-species ontology focused on anatomical structures that participate in feeding and other oral/pharyngeal behaviors. A unique feature of the MFMO is that a simple, computable definition of each muscle, which includes its attachments and innervation, is true across mammals. This construction mirrors the logical foundation of comparative anatomy and permits searches using language familiar to biologists. Further, it provides a template for muscles that will be useful in extending any anatomy ontology. The MFMO is developed to support the Feeding Experiments End-User Database Project (FEED, https://feedexp.org/), a publicly-available, online repository for physiological data collected from in vivo studies of feeding (e.g., mastication, biting, swallowing) in mammals. Currently the MFMO is integrated into FEED and also into two literature-specific implementations of Textpresso, a text-mining system that facilitates powerful searches of a corpus of scientific publications. We evaluate the MFMO by asking questions that test the ability of the ontology to return appropriate answers (competency questions). Lastly, we compare the results of queries of the MFMO to results from similar searches in PubMed and Google Scholar. Our tests demonstrate that the MFMO is competent to answer queries formed in the common language of comparative anatomy, but PubMed and Google Scholar are not. Overall, our results show that by incorporating anatomical ontologies into searches, an expanded and anatomically comprehensive set of results can be obtained. The broader scientific and publishing communities should consider taking up the challenge of semantically enabled search capabilities.

  8. Advice from a Medical Expert through the Internet on Queries about AIDS and Hepatitis: Analysis of a Pilot Experiment

    PubMed Central

    Marco, Javier; Barba, Raquel; Losa, Juan E; de la Serna, Carlos Martínez; Sainz, María; Lantigua, Isabel Fernández; de la Serna, Jose Luis

    2006-01-01

    Background: Advice from a medical expert on concerns and queries expressed anonymously through the Internet by patients, and later posted on the Web, offers a new type of patient-doctor relationship. The aim of the current study was to perform a descriptive analysis of questions about AIDS and hepatitis made to an infectious disease expert and sent through the Internet to a consumer-oriented Web site in the Spanish language. Methods and Findings: Questions were e-mailed, and the questions and answers were posted anonymously in the "expert-advice" section of a Web site focused on AIDS and hepatitis. We performed a descriptive study and a temporal analysis of the questions received in the first 12 months after the launch of the site. A total of 899 questions were received from December 2003 to November 2004, with a marked linear growth pattern. Questions originated in Spain in 68% of cases and 32% came from Latin America (the Caribbean, Central America, and South America). Eighty percent of the senders were male. Most of the questions concerned HIV infection (79%), with many fewer on hepatitis (17%). The highest numbers of questions were submitted just after the weekend (37% of questions were made on Mondays and Tuesdays). Risk factors for contracting HIV infection were the most frequent concern (69%), followed by the window period for detection (12.6%), laboratory results (5.9%), symptoms (4.7%), diagnosis (2.7%), and treatment (2.2%). Conclusions: Our results confirm a great demand for this type of "ask-the-expert" Internet service, at least for AIDS and hepatitis. Factors such as anonymity, free access, and immediate answers have been key factors in its success. PMID:16796404

  9. Effectiveness of national evidence-based medicine competition in Taiwan

    PubMed Central

    2013-01-01

    Background: Competition and education are intimately related and can be combined in many ways. The role of competition in medical education on evidence-based medicine (EBM) has not been investigated. In order to enhance the dissemination and implementation of EBM in Taiwan, EBM competitions have been established among healthcare professionals. This study evaluated the impact of competition on EBM learning. Methods: The EBM competition used PICO (patient, intervention, comparison, and outcome) queries to examine participants' skills in framing an answerable question, literature searching, critical appraisal and clinical application among interdisciplinary teams. A structured questionnaire survey was conducted among participants in 2009 and 2011. Participants completed a baseline questionnaire survey three months prior to the competition and completed the same questionnaire right after the competition. Results: Valid questionnaires were collected from 358 participants, including 162 physicians, 71 nurses, 101 pharmacists, and 24 other allied healthcare professionals. There were significant increases in participants' knowledge of and skills in EBM (p < 0.001). Their barriers to literature searching and forming answerable questions decreased significantly (p < 0.01). Furthermore, there were significant increases in their access to the evidence-based retrieval databases, including the Cochrane Library (p < 0.001), MD Consult (p < 0.001), ProQuest (p < 0.001), UpToDate (p = 0.001), CINAHL (p = 0.001), and MicroMedex (p = 0.024). Conclusions: The current study demonstrates a method that successfully enhanced knowledge of, skills in, and behavior related to EBM. The data suggest that competition using PICO queries may serve as an effective way to facilitate the learning of EBM. PMID:23651869

  10. Bilastine in allergic rhinoconjunctivitis and urticaria: a practical approach to treatment decisions based on queries received by the medical information department

    PubMed Central

    Leceta, Amalia; Sologuren, Ander; Valiente, Román; Campo, Cristina; Labeaga, Luis

    2017-01-01

    Background: Bilastine is a safe, effective, and commonly prescribed non-sedating H1-antihistamine approved for symptomatic treatment in patients with allergic disorders such as rhinoconjunctivitis and urticaria. It was evaluated in many patients throughout the clinical development required for its approval, but clinical trials generally exclude many patients who will benefit in everyday clinical practice (especially those with coexisting diseases and/or being treated with concomitant drugs). Following its introduction into clinical practice, the Medical Information Specialists at Faes Farma have received many practical queries regarding the optimal use of bilastine in different circumstances. Data sources and methods: Queries received by the Medical Information Department and the responses provided to the senders of those queries. Results: The most frequent questions received by the Medical Information Department included the potential for drug-drug interactions between bilastine and commonly used agents such as anticoagulants (including the novel oral anticoagulants), antiretrovirals, antituberculosis regimens, corticosteroids, digoxin, oral contraceptives, and proton pump inhibitors. Most of these medicines are not usually allowed in clinical trials, and so advice needs to be based upon the pharmacological profiles of the drugs involved and expert opinion. The pharmacokinetic profile of bilastine appears favourable, since it undergoes negligible metabolism, is almost exclusively eliminated via renal excretion, and neither induces nor inhibits the activity of several isoenzymes of the CYP 450 system. Consequently, bilastine does not interact with cytochrome metabolic pathways. Other queries involved specific patient groups such as subjects with renal impairment, women who are breastfeeding or who are trying to become pregnant, and patients with other concomitant diseases. Interestingly, several questions related to topics that are well covered in the Summary of Product Characteristics (SmPC), which suggests that this resource is not being well used. Conclusions: Overall, this analysis highlights gaps in our knowledge regarding the optimal use of bilastine. Expert opinion based upon an understanding of the science can help in decision-making, but more research is needed to provide evidence-based answers in certain circumstances. PMID:28210286

  11. Bilastine in allergic rhinoconjunctivitis and urticaria: a practical approach to treatment decisions based on queries received by the medical information department.

    PubMed

    Leceta, Amalia; Sologuren, Ander; Valiente, Román; Campo, Cristina; Labeaga, Luis

    2017-01-01

    Bilastine is a safe, effective, and commonly prescribed non-sedating H1-antihistamine approved for symptomatic treatment in patients with allergic disorders such as rhinoconjunctivitis and urticaria. It was evaluated in many patients throughout the clinical development required for its approval, but clinical trials generally exclude many patients who will benefit in everyday clinical practice (especially those with coexisting diseases and/or being treated with concomitant drugs). Following its introduction into clinical practice, the Medical Information Specialists at Faes Farma have received many practical queries regarding the optimal use of bilastine in different circumstances. We analysed the queries received by the Medical Information Department and the responses provided to the senders of those queries. The most frequent questions received by the Medical Information Department included the potential for drug-drug interactions between bilastine and commonly used agents such as anticoagulants (including the novel oral anticoagulants), antiretrovirals, antituberculosis regimens, corticosteroids, digoxin, oral contraceptives, and proton pump inhibitors. Most of these medicines are not usually allowed in clinical trials, and so advice needs to be based upon the pharmacological profiles of the drugs involved and expert opinion. The pharmacokinetic profile of bilastine appears favourable, since it undergoes negligible metabolism, is almost exclusively eliminated via renal excretion, and neither induces nor inhibits the activity of several isoenzymes of the CYP 450 system. Consequently, bilastine does not interact with cytochrome metabolic pathways. Other queries involved specific patient groups such as subjects with renal impairment, women who are breastfeeding or who are trying to become pregnant, and patients with other concomitant diseases. Interestingly, several questions related to topics that are well covered in the Summary of Product Characteristics (SmPC), which suggests that this resource is not being well used. Overall, this analysis highlights gaps in our knowledge regarding the optimal use of bilastine. Expert opinion based upon an understanding of the science can help in decision-making, but more research is needed to provide evidence-based answers in certain circumstances.

  12. Knowledge Management Framework for Emerging Infectious Diseases Preparedness and Response: Design and Development of Public Health Document Ontology.

    PubMed

    Zhang, Zhizun; Gonzalez, Mila C; Morse, Stephen S; Venkatasubramanian, Venkat

    2017-10-11

    There are increasing concerns about our preparedness and timely coordinated response across the globe to cope with emerging infectious diseases (EIDs). This poses practical challenges that require exploiting novel knowledge management approaches effectively. This work aims to develop an ontology-driven knowledge management framework that addresses the existing challenges in sharing and reusing public health knowledge. We propose a systems engineering-inspired ontology-driven knowledge management approach. It decomposes public health knowledge into concepts and relations and organizes the elements of knowledge based on the teleological functions. Both knowledge and semantic rules are stored in an ontology and retrieved to answer queries regarding EID preparedness and response. A hybrid concept extraction was implemented in this work. The quality of the ontology was evaluated using the formal evaluation method Ontology Quality Evaluation Framework. Our approach is a potentially effective methodology for managing public health knowledge. Accuracy and comprehensiveness of the ontology can be improved as more knowledge is stored. In the future, a survey will be conducted to collect queries from public health practitioners. The reasoning capacity of the ontology will be evaluated using the queries and hypothetical outbreaks. We suggest the importance of developing a knowledge sharing standard like the Gene Ontology for the public health domain. ©Zhizun Zhang, Mila C Gonzalez, Stephen S Morse, Venkat Venkatasubramanian. Originally published in JMIR Research Protocols (http://www.researchprotocols.org), 11.10.2017.

  13. Protecting count queries in study design

    PubMed Central

    Vinterbo, Staal A; Sarwate, Anand D; Boxwala, Aziz A

    2012-01-01

    Objective: Today's clinical research institutions provide tools for researchers to query their data warehouses for counts of patients. To protect patient privacy, counts are perturbed before reporting; this trades utility for increased privacy. The goal of this study is to extend current query-answering systems to guarantee a quantifiable level of privacy and to allow users to tailor perturbations to maximize usefulness according to their needs. Methods: A perturbation mechanism was designed in which users are given options with respect to the scale and direction of the perturbation. The mechanism translates the true count, user preferences, and a privacy level within administrator-specified bounds into a probability distribution from which the perturbed count is drawn. Results: Users can significantly impact the scale and direction of the count perturbation and can receive more accurate final cohort estimates. Strong and semantically meaningful differential privacy is guaranteed, providing a unified privacy accounting system that can support role-based trust levels. This study provides an open source, web-enabled tool for investigating visually and numerically the interaction between system parameters, including the required privacy level and user preference settings. Conclusions: Quantifying privacy allows system administrators to provide users with a privacy budget and to monitor its expenditure, enabling users to control the inevitable loss of utility. While current measures of privacy are conservative, this system can take advantage of future advances in privacy measurement. The system provides new ways of trading off privacy and utility that are not provided in current study design systems. PMID:22511018
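
    The flavor of such a mechanism fits in a few lines. The Python sketch below adds two-sided geometric (discrete Laplace) noise, which gives epsilon-differential privacy for counts (sensitivity 1), and then applies a user-chosen, data-independent direction offset, which preserves the guarantee by post-processing. It illustrates the idea only; the paper's mechanism and its user preference options are specified differently.

        import math, random

        def perturbed_count(true_count, epsilon, direction=0):
            # Two-sided geometric noise with parameter alpha = e^-epsilon
            # is epsilon-DP for a sensitivity-1 count query.
            alpha = math.exp(-epsilon)

            def geometric():
                # P(G = k) proportional to alpha^k for k = 0, 1, 2, ...
                return int(math.log(1 - random.random()) / math.log(alpha))

            noise = geometric() - geometric()
            # `direction` is chosen by the user independently of the data,
            # so the shift does not weaken the privacy guarantee.
            return max(0, true_count + noise + direction)

        # A cautious researcher pads the cohort count upward, at a privacy
        # level kept within administrator-specified bounds.
        print(perturbed_count(42, epsilon=0.5, direction=+2))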

  14. Protecting count queries in study design.

    PubMed

    Vinterbo, Staal A; Sarwate, Anand D; Boxwala, Aziz A

    2012-01-01

    Today's clinical research institutions provide tools for researchers to query their data warehouses for counts of patients. To protect patient privacy, counts are perturbed before reporting; this trades utility for increased privacy. The goal of this study is to extend current query-answering systems to guarantee a quantifiable level of privacy and to allow users to tailor perturbations to maximize usefulness according to their needs. A perturbation mechanism was designed in which users are given options with respect to the scale and direction of the perturbation. The mechanism translates the true count, user preferences, and a privacy level within administrator-specified bounds into a probability distribution from which the perturbed count is drawn. Users can significantly impact the scale and direction of the count perturbation and can receive more accurate final cohort estimates. Strong and semantically meaningful differential privacy is guaranteed, providing a unified privacy accounting system that can support role-based trust levels. This study provides an open source, web-enabled tool for investigating visually and numerically the interaction between system parameters, including the required privacy level and user preference settings. Quantifying privacy allows system administrators to provide users with a privacy budget and to monitor its expenditure, enabling users to control the inevitable loss of utility. While current measures of privacy are conservative, this system can take advantage of future advances in privacy measurement. The system provides new ways of trading off privacy and utility that are not provided in current study design systems.

  15. Developing a kidney and urinary pathway knowledge base

    PubMed Central

    2011-01-01

    Background: Chronic renal disease is a global health problem. The identification of suitable biomarkers could facilitate early detection and diagnosis and allow a better understanding of the underlying pathology. One of the challenges in meeting this goal is the necessary integration of experimental results from multiple biological levels for further analysis by data mining. Data integration in the life sciences is still a struggle, and many groups are looking to the benefits promised by the Semantic Web for data integration. Results: We present a Semantic Web approach to developing a knowledge base that integrates data from high-throughput experiments on kidney and urine. A specialised KUP ontology is used to tie the various layers together, whilst background knowledge from external databases is incorporated by conversion into RDF. Using SPARQL as a query mechanism, we are able to query for proteins expressed in urine and place these back into the context of genes expressed in regions of the kidney. Conclusions: The KUPKB gives KUP biologists the means to ask queries across many resources in order to aggregate knowledge that is necessary for answering biological questions. The Semantic Web technologies we use, together with the background knowledge from the domain's ontologies, allow both rapid conversion and integration of this knowledge base. The KUPKB is still relatively small, but questions remain about scalability, maintenance and availability of the knowledge itself. Availability: The KUPKB may be accessed via http://www.e-lico.eu/kupkb. PMID:21624162

  16. Big Data Analytics with Datalog Queries on Spark.

    PubMed

    Shkapsky, Alexander; Yang, Mohan; Interlandi, Matteo; Chiu, Hsuan; Condie, Tyson; Zaniolo, Carlo

    2016-01-01

    There is great interest in exploiting the opportunity provided by cloud computing platforms for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for machine learning and graph analytics. Developing efficient complex analytics in Spark requires deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX). Our BigDatalog system addresses the problem by providing concise declarative specification of complex queries amenable to efficient evaluation. Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. We perform an experimental comparison with other state-of-the-art large-scale Datalog systems and verify the efficacy of our techniques and effectiveness of Spark in supporting Datalog-based analytics.
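
    Recursion is the crux: the classic Datalog transitive-closure program must be evaluated to a fixpoint, which is what BigDatalog compiles into Spark jobs. The pure-Python sketch below shows the semi-naive fixpoint such a compiler has to realize; it is illustrative only and uses no actual BigDatalog or Spark API.

        # tc(X, Y) <- edge(X, Y).
        # tc(X, Y) <- tc(X, Z), edge(Z, Y).
        def transitive_closure(edges):
            tc = set(edges)
            delta = set(edges)
            while delta:  # semi-naive: only join the newest facts
                delta = {(x, w) for (x, z) in delta
                                for (z2, w) in edges if z == z2} - tc
                tc |= delta
            return tc

        print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
        # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]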

  17. Big Data Analytics with Datalog Queries on Spark

    PubMed Central

    Shkapsky, Alexander; Yang, Mohan; Interlandi, Matteo; Chiu, Hsuan; Condie, Tyson; Zaniolo, Carlo

    2017-01-01

    There is great interest in exploiting the opportunity provided by cloud computing platforms for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for machine learning and graph analytics. Developing efficient complex analytics in Spark requires deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX). Our BigDatalog system addresses the problem by providing concise declarative specification of complex queries amenable to efficient evaluation. Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. We perform an experimental comparison with other state-of-the-art large-scale Datalog systems and verify the efficacy of our techniques and effectiveness of Spark in supporting Datalog-based analytics. PMID:28626296

  18. Partitioning medical image databases for content-based queries on a Grid.

    PubMed

    Montagnat, J; Breton, V; E Magnin, I

    2005-01-01

    In this paper we study the impact of executing a medical image database query application on the grid. To lower the total computation time, the image database is partitioned into subsets to be processed on different grid nodes. A theoretical model of the application complexity and estimates of the grid execution overhead are used to partition the database efficiently. We show results demonstrating that smart partitioning of the database can lead to significant improvements in total computation time. Grids are promising for content-based image retrieval in medical databases.
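
    Under a simple linear cost model, balancing per-node finishing times yields the partition directly. The Python sketch below assumes each node i pays a fixed overhead o_i plus a constant per-image cost c, equalizes o_i + k_i * c across nodes, and distributes rounding leftovers; the paper's model of application complexity and grid overhead is richer than this.

        def partition(n_images, per_image_cost, node_overheads):
            c = per_image_cost
            nodes = len(node_overheads)
            # Common finishing time T solves sum_i (T - o_i) / c = n.
            T = (n_images * c + sum(node_overheads)) / nodes
            shares = [max(0, int((T - o) / c)) for o in node_overheads]
            # Hand rounding leftovers to the least-loaded nodes
            # (assumes every overhead is below T).
            leftovers = n_images - sum(shares)
            order = sorted(range(nodes),
                           key=lambda i: node_overheads[i] + shares[i] * c)
            for i in order:
                if leftovers <= 0:
                    break
                shares[i] += 1
                leftovers -= 1
            return shares

        print(partition(1000, per_image_cost=0.2, node_overheads=[5.0, 20.0, 8.0]))
        # e.g. [364, 288, 348]: nodes with higher overhead get fewer images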

  19. Knowledge and Theme Discovery across Very Large Biological Data Sets Using Distributed Queries: A Prototype Combining Unstructured and Structured Data

    PubMed Central

    Repetski, Stephen; Venkataraman, Girish; Che, Anney; Luke, Brian T.; Girard, F. Pascal; Stephens, Robert M.

    2013-01-01

    As the discipline of biomedical science continues to apply new technologies capable of producing unprecedented volumes of noisy and complex biological data, it has become evident that available methods for deriving meaningful information from such data are simply not keeping pace. In order to achieve useful results, researchers require methods that consolidate, store and query combinations of structured and unstructured data sets efficiently and effectively. As we move towards personalized medicine, the need to combine unstructured data, such as medical literature, with large amounts of highly structured and high-throughput data such as human variation or expression data from very large cohorts, is especially urgent. For our study, we investigated a likely biomedical query using the Hadoop framework. We ran queries using native MapReduce tools we developed as well as other open source and proprietary tools. Our results suggest that the available technologies within the Big Data domain can reduce the time and effort needed to utilize and apply distributed queries over large datasets in practical clinical applications in the life sciences domain. The methodologies and technologies discussed in this paper set the stage for a more detailed evaluation that investigates how various data structures and data models are best mapped to the proper computational framework. PMID:24312478
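
    The shape of such a distributed query is easy to show in miniature. The pure-Python sketch below imitates the map, shuffle and reduce stages of a Hadoop job joining structured variant records with unstructured literature mentions on a gene symbol; the data layout is hypothetical and no actual Hadoop API is used.

        from collections import defaultdict

        variants = [("BRCA1", "rs799917"), ("TP53", "rs1042522")]
        abstracts = [("PMID1", "BRCA1 variants and breast cancer risk"),
                     ("PMID2", "TP53 in tumour suppression")]

        def mapper():
            # Emit key-value pairs keyed by gene symbol.
            for gene, rsid in variants:
                yield gene, ("variant", rsid)
            for pmid, text in abstracts:
                for token in text.split():
                    yield token, ("mention", pmid)

        groups = defaultdict(list)  # shuffle: group values by key
        for key, value in mapper():
            groups[key].append(value)

        for gene, values in groups.items():  # reduce: join on the key
            rsids = [v for kind, v in values if kind == "variant"]
            papers = [v for kind, v in values if kind == "mention"]
            if rsids and papers:
                print(gene, rsids, papers)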

  20. Knowledge and theme discovery across very large biological data sets using distributed queries: a prototype combining unstructured and structured data.

    PubMed

    Mudunuri, Uma S; Khouja, Mohamad; Repetski, Stephen; Venkataraman, Girish; Che, Anney; Luke, Brian T; Girard, F Pascal; Stephens, Robert M

    2013-01-01

    As the discipline of biomedical science continues to apply new technologies capable of producing unprecedented volumes of noisy and complex biological data, it has become evident that available methods for deriving meaningful information from such data are simply not keeping pace. In order to achieve useful results, researchers require methods that consolidate, store and query combinations of structured and unstructured data sets efficiently and effectively. As we move towards personalized medicine, the need to combine unstructured data, such as medical literature, with large amounts of highly structured and high-throughput data such as human variation or expression data from very large cohorts, is especially urgent. For our study, we investigated a likely biomedical query using the Hadoop framework. We ran queries using native MapReduce tools we developed as well as other open source and proprietary tools. Our results suggest that the available technologies within the Big Data domain can reduce the time and effort needed to utilize and apply distributed queries over large datasets in practical clinical applications in the life sciences domain. The methodologies and technologies discussed in this paper set the stage for a more detailed evaluation that investigates how various data structures and data models are best mapped to the proper computational framework.

  1. Modeling and mining term association for improving biomedical information retrieval performance.

    PubMed

    Hu, Qinmin; Huang, Jimmy Xiangji; Hu, Xiaohua

    2012-06-11

    The growth of biomedical information requires most information retrieval systems to provide short and specific answers in response to complex user queries. Semantic information in the form of free text is structured in a way that makes it straightforward for humans to read but difficult for computers to interpret automatically and search efficiently. One of the reasons is that most traditional information retrieval models assume terms are conditionally independent given a document/passage. We are therefore motivated to consider term associations within different contexts to help the models understand semantic information and use it to improve biomedical information retrieval performance. We propose a term association approach to discover associations among the keywords of a query. The experiments are conducted on the TREC 2004-2007 Genomics data sets and the TREC 2004 HARD data set. The proposed approach is promising and achieves superiority over the baselines and the GSP results. We investigate parameter settings and different indices, finding that the sentence-based index produces the best results at the document level, the word-based index the best results at the passage level, and the paragraph-based index the best results at the passage2 level. Furthermore, the best term association results always come from the best baseline. The tuning number k in the proposed recursive re-ranking algorithm is discussed and locally optimized to 10. First, modelling term association for improving biomedical information retrieval using factor analysis is one of the major contributions of our work. Second, the experiments confirm that term association considering co-occurrence and dependency among the keywords can produce better results than baselines that treat the keywords independently. Third, the baselines are re-ranked according to the importance of, and reliance on, latent factors behind term associations. These latent factors are determined by the proposed model and by their term appearances in the first-round retrieved passages.

  2. Modeling and mining term association for improving biomedical information retrieval performance

    PubMed Central

    2012-01-01

    Background: The growth of biomedical information requires most information retrieval systems to provide short and specific answers in response to complex user queries. Semantic information in the form of free text is structured in a way that makes it straightforward for humans to read but difficult for computers to interpret automatically and search efficiently. One of the reasons is that most traditional information retrieval models assume terms are conditionally independent given a document/passage. We are therefore motivated to consider term associations within different contexts to help the models understand semantic information and use it to improve biomedical information retrieval performance. Results: We propose a term association approach to discover associations among the keywords of a query. The experiments are conducted on the TREC 2004-2007 Genomics data sets and the TREC 2004 HARD data set. The proposed approach is promising and achieves superiority over the baselines and the GSP results. We investigate parameter settings and different indices, finding that the sentence-based index produces the best results at the document level, the word-based index the best results at the passage level, and the paragraph-based index the best results at the passage2 level. Furthermore, the best term association results always come from the best baseline. The tuning number k in the proposed recursive re-ranking algorithm is discussed and locally optimized to 10. Conclusions: First, modelling term association for improving biomedical information retrieval using factor analysis is one of the major contributions of our work. Second, the experiments confirm that term association considering co-occurrence and dependency among the keywords can produce better results than baselines that treat the keywords independently. Third, the baselines are re-ranked according to the importance of, and reliance on, latent factors behind term associations. These latent factors are determined by the proposed model and by their term appearances in the first-round retrieved passages. PMID:22901087

  3. Stennis visits Lake Cormorant school

    NASA Image and Video Library

    2010-03-30

    Alexis Harry, assistant director of Astro Camp at NASA's John C. Stennis Space Center, talks with students at Lake Cormorant (Miss.) Elementary School during a 'Living and Working in Space' presentation March 30. Stennis hosted the school presentation during a visit to the Oxford area. Harry, who also is a high school biology teacher in Slidell, La., spent time discussing space travel with students and answering questions they had about the experience, including queries about how astronauts eat, sleep and drink in space. The presentation was sponsored by the NASA Office of External Affairs and Education at Stennis. For more information about NASA education initiatives, visit: http://education.ssc.nasa.gov/.

  4. Stennis visits Lake Cormorant school

    NASA Technical Reports Server (NTRS)

    2010-01-01

    Alexis Harry, assistant director of Astro Camp at NASA's John C. Stennis Space Center, talks with students at Lake Cormorant (Miss.) Elementary School during a 'Living and Working in Space' presentation March 30. Stennis hosted the school presentation during a visit to the Oxford area. Harry, who also is a high school biology teacher in Slidell, La., spent time discussing space travel with students and answering questions they had about the experience, including queries about how astronauts eat, sleep and drink in space. The presentation was sponsored by the NASA Office of External Affairs and Education at Stennis. For more information about NASA education initiatives, visit: http://education.ssc.nasa.gov/.

  5. CrossQuery: a web tool for easy associative querying of transcriptome data.

    PubMed

    Wagner, Toni U; Fischer, Andreas; Thoma, Eva C; Schartl, Manfred

    2011-01-01

    Enormous amounts of data are being generated by modern methods such as transcriptome or exome sequencing and microarray profiling. Primary analyses such as quality control, normalization, statistics and mapping are highly complex and need to be performed by specialists. Thereafter, results are handed back to biomedical researchers, who are then confronted with complicated data lists. For rather simple tasks like data filtering, sorting and cross-association, there is a need for new tools that can be used by non-specialists. Here, we describe CrossQuery, a web tool that enables straightforward, simple-syntax queries to be executed on transcriptome sequencing and microarray datasets. We provide deep-sequencing data sets of stem cell lines derived from the model fish medaka and microarray data of human endothelial cells. In the example datasets provided, mRNA expression levels, gene, transcript and sample identification numbers, GO terms and gene descriptions can be freely correlated, filtered and sorted. Queries can be saved for later reuse, and results can be exported to standard formats that allow copy-and-paste to all widespread data visualization tools such as Microsoft Excel. CrossQuery enables researchers to work quickly and freely with transcriptome and microarray data sets while requiring only minimal computer skills. Furthermore, CrossQuery allows growing association of multiple datasets as long as at least one common point of correlated information, such as transcript identification numbers or GO terms, is shared between samples. For advanced users, the object-oriented plug-in and event-driven code design of both server-side and client-side scripts allows easy addition of new features, data sources and data types.
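
    The associative queries CrossQuery targets amount to filter, join and sort over tabular exports. As a rough Python equivalent of such a query (column names and data invented for illustration):

        import pandas as pd

        expr = pd.DataFrame({"transcript_id": ["t1", "t2", "t3"],
                             "gene": ["sox2", "pou5f1", "nanog"],
                             "fpkm": [12.5, 0.3, 48.1]})
        go = pd.DataFrame({"gene": ["sox2", "nanog"],
                           "go_term": ["GO:0003700", "GO:0003700"]})

        # Filter expressed transcripts, cross-associate on the shared
        # 'gene' field, and sort by expression level.
        hits = (expr[expr.fpkm > 1.0]
                .merge(go, on="gene")
                .sort_values("fpkm", ascending=False))
        hits.to_csv("hits.csv", index=False)  # copy-and-paste friendly export
        print(hits)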

  6. Virtual Solar Observatory Distributed Query Construction

    NASA Technical Reports Server (NTRS)

    Gurman, J. B.; Dimitoglou, G.; Bogart, R.; Davey, A.; Hill, F.; Martens, P.

    2003-01-01

    Through a prototype implementation (Tian et al., this meeting) the VSO has already demonstrated the capability of unifying geographically distributed data sources following the Web Services paradigm and utilizing mechanisms such as the Simple Object Access Protocol (SOAP). So far, four participating sites (Stanford, Montana State University, National Solar Observatory and the Solar Data Analysis Center) permit Web-accessible, time-based searches that allow browse access to a number of diverse data sets. Our latest work includes the extension of the simple, time-based queries to include numerous other searchable observation parameters. For VSO users, this extended functionality enables more refined searches. For the VSO, it is a proof of concept that more complex, distributed queries can be effectively constructed and that results from heterogeneous, remote sources can be synthesized and presented to users as a single, virtual data product.

  7. XML Reconstruction View Selection in XML Databases: Complexity Analysis and Approximation Scheme

    NASA Astrophysics Data System (ADS)

    Chebotko, Artem; Fu, Bin

    Query evaluation in an XML database requires reconstructing XML subtrees rooted at nodes found by an XML query. Since XML subtree reconstruction can be expensive, one approach to improve query response time is to use reconstruction views - materialized XML subtrees of an XML document whose nodes are frequently accessed by XML queries. For this approach to be efficient, the principal requirement is a framework for view selection. In this work, we are the first to formalize and study the problem of XML reconstruction view selection. The input is a tree T, in which every node i has a size $c_i$ and profit $p_i$, and a size limitation C. The target is to find a subset of subtrees rooted at nodes $i_1, \ldots, i_k$ such that $c_{i_1}+\cdots+c_{i_k} \le C$ and $p_{i_1}+\cdots+p_{i_k}$ is maximal. Furthermore, there is no overlap between any two subtrees selected in the solution. We prove that this problem is NP-hard and present a fully polynomial-time approximation scheme (FPTAS) as a solution.
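
    Setting the approximation step aside, the underlying selection problem admits a pseudo-polynomial tree dynamic program: at each node, either select its whole subtree (which enforces non-overlap by excluding every descendant choice) or combine the children's budget tables knapsack-style. The Python sketch below shows that DP, assuming selecting node i costs $c_i$ and earns $p_i$; the paper's FPTAS additionally scales profits to bound the table size.

        def best_views(tree, root, C):
            # tree: node -> (size, profit, children); C: size budget.
            def dp(v):
                size, profit, children = tree[v]
                table = [0] * (C + 1)  # budget -> best profit below v
                for ch in children:
                    ctab = dp(ch)
                    table = [max(table[b - k] + ctab[k] for k in range(b + 1))
                             for b in range(C + 1)]
                # Alternative: select the whole subtree rooted at v,
                # which forbids selecting any of its descendants.
                for b in range(size, C + 1):
                    table[b] = max(table[b], profit)
                return table
            return max(dp(root))

        tree = {"r": (10, 4, ["a", "b"]),
                "a": (4, 3, []),
                "b": (5, 3, [])}
        print(best_views(tree, "r", 9))  # 6: subtrees a and b fit, r does not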

  8. Architecture and prototypical implementation of a semantic querying system for big Earth observation image bases

    PubMed Central

    Tiede, Dirk; Baraldi, Andrea; Sudmanns, Martin; Belgiu, Mariana; Lang, Stefan

    2017-01-01

    Spatiotemporal analytics of multi-source Earth observation (EO) big data is a precondition for semantic content-based image retrieval (SCBIR). As a proof of concept, an innovative EO semantic querying (EO-SQ) subsystem was designed and prototypically implemented in series with an EO image understanding (EO-IU) subsystem. The EO-IU subsystem automatically generates ESA Level 2 products (a scene classification map, up to basic land cover units) from optical satellite data. The EO-SQ subsystem comprises a graphical user interface (GUI) and an array database embedded in a client-server model. In the array database, all EO images are stored as a space-time data cube together with their Level 2 products generated by the EO-IU subsystem. The GUI allows users to (a) develop a conceptual world model based on a graphically supported query pipeline, combining spatial and temporal operators and/or standard algorithms, and (b) create, save and share within the client-server architecture complex semantic queries/decision rules suitable for SCBIR and/or spatiotemporal EO image analytics, consistent with the conceptual world model. PMID:29098143

  9. Clinician search behaviors may be influenced by search engine design.

    PubMed

    Lau, Annie Y S; Coiera, Enrico; Zrimec, Tatjana; Compton, Paul

    2010-06-30

    Searching the Web for documents using information retrieval systems plays an important part in clinicians' practice of evidence-based medicine. While much research focuses on the design of methods to retrieve documents, there has been little examination of the way different search engine capabilities influence clinician search behaviors. Previous studies have shown that use of task-based search engines allows for faster searches with no loss of decision accuracy compared with resource-based engines. We hypothesized that changes in search behaviors may explain these differences. In all, 75 clinicians (44 doctors and 31 clinical nurse consultants) were randomized to use either a resource-based or a task-based version of a clinical information retrieval system to answer questions about 8 clinical scenarios in a controlled setting in a university computer laboratory. Clinicians using the resource-based system could select 1 of 6 resources, such as PubMed; clinicians using the task-based system could select 1 of 6 clinical tasks, such as diagnosis. Clinicians in both systems could reformulate search queries. System logs unobtrusively capturing clinicians' interactions with the systems were coded and analyzed for clinicians' search actions and query reformulation strategies. The most frequent search action of clinicians using the resource-based system was to explore a new resource with the same query, that is, these clinicians exhibited a "breadth-first" search behavior. Of 1398 search actions, clinicians using the resource-based system conducted 401 (28.7%, 95% confidence interval [CI] 26.37-31.11) in this way. In contrast, the majority of clinicians using the task-based system exhibited a "depth-first" search behavior in which they reformulated query keywords while keeping to the same task profiles. Of 585 search actions conducted by clinicians using the task-based system, 379 (64.8%, 95% CI 60.83-68.55) were conducted in this way. This study provides evidence that different search engine designs are associated with different user search behaviors.

  10. BOSS: context-enhanced search for biomedical objects

    PubMed Central

    2012-01-01

    Background There exist many academic search solutions and most of them can be put on either end of a spectrum: general-purpose search and domain-specific "deep" search systems. The general-purpose search systems, such as PubMed, offer a flexible query interface, but churn out a list of matching documents that users have to go through in order to find the answers to their queries. On the other hand, the "deep" search systems, such as PPI Finder and iHOP, return precompiled results in a structured way. Their results, however, are often found only within some predefined contexts. To alleviate these problems, we introduce a new search engine, BOSS, the Biomedical Object Search System. Methods Unlike conventional search systems, BOSS indexes segments, rather than documents. A segment refers to a Maximal Coherent Semantic Unit (MCSU) such as a phrase, clause or sentence that is semantically coherent in the given context (e.g., biomedical objects or their relations). For a user query, BOSS finds all matching segments, identifies the objects appearing in those segments, and aggregates the segments for each object. Finally, it returns the ranked list of the objects along with their matching segments. Results The working prototype of BOSS is available at http://boss.korea.ac.kr. The current version of BOSS has indexed the abstracts of more than 20 million articles published during the 16 years from 1996 to 2011 across all science disciplines. Conclusion BOSS fills the gap between the two ends of the spectrum by allowing users to pose context-free queries and by returning a structured set of results. Furthermore, BOSS exhibits good scalability, just as conventional document search engines do, because it is designed to use a standard document-indexing model with minimal modifications. Given these features, BOSS raises the technological level of traditional solutions for search on biomedical information. PMID:22595092
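
    To make the segment-indexing idea concrete, the toy sketch below matches segments rather than documents, then aggregates matching segments per object and ranks the objects; the scoring scheme and records are illustrative, not the BOSS implementation.

      # Toy BOSS-style object search: match segments, aggregate per object.
      from collections import defaultdict

      segments = [  # (segment text, biomedical objects annotated in it)
          ("TP53 binds MDM2 under stress", ["TP53", "MDM2"]),
          ("MDM2 inhibitors restore p53 activity", ["MDM2"]),
          ("BRCA1 participates in DNA repair", ["BRCA1"]),
      ]

      def search(query):
          terms = set(query.lower().split())
          by_object = defaultdict(list)
          for text, objects in segments:
              score = len(terms & set(text.lower().split()))  # naive overlap
              if score:
                  for obj in objects:
                      by_object[obj].append((score, text))
          # rank objects by the total score of their matching segments
          return sorted(by_object.items(),
                        key=lambda kv: -sum(s for s, _ in kv[1]))

      for obj, segs in search("MDM2 stress"):
          print(obj, segs)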

  11. Semantics-based plausible reasoning to extend the knowledge coverage of medical knowledge bases for improved clinical decision support.

    PubMed

    Mohammadhassanzadeh, Hossein; Van Woensel, William; Abidi, Samina Raza; Abidi, Syed Sibte Raza

    2017-01-01

    Capturing complete medical knowledge is challenging, often due to incomplete patient Electronic Health Records (EHR), but also because of valuable, tacit medical knowledge hidden away in physicians' experiences. To extend the coverage of incomplete medical knowledge-based systems beyond their deductive closure, and thus enhance their decision-support capabilities, we argue that innovative, multi-strategy reasoning approaches should be applied. In particular, plausible reasoning mechanisms apply patterns from human thought processes, such as generalization, similarity and interpolation, based on attributional, hierarchical, and relational knowledge. Plausible reasoning mechanisms include inductive reasoning, which generalizes the commonalities among the data to induce new rules, and analogical reasoning, which is guided by data similarities to infer new facts. By further leveraging rich, biomedical Semantic Web ontologies to represent medical knowledge, both known and tentative, we increase the accuracy and expressivity of plausible reasoning, and cope with issues such as data heterogeneity, inconsistency and interoperability. In this paper, we present a Semantic Web-based, multi-strategy reasoning approach, which integrates deductive and plausible reasoning and exploits Semantic Web technology to solve complex clinical decision support queries. We evaluated our system using a real-world medical dataset of patients with hepatitis, from which we randomly removed different percentages of data (5%, 10%, 15%, and 20%) to reflect scenarios with increasing amounts of incomplete medical knowledge. To increase the reliability of the results, we generated 5 independent datasets for each percentage of missing values, which resulted in 20 experimental datasets (in addition to the original dataset). The results show that plausibly inferred knowledge extends the coverage of the knowledge base by, on average, 2%, 7%, 12%, and 16% for datasets with, respectively, 5%, 10%, 15%, and 20% of missing values. This expansion in the KB coverage allowed solving complex disease diagnostic queries that were previously unresolvable, without losing the correctness of the answers. However, compared to deductive reasoning, data-intensive plausible reasoning mechanisms yield a significant performance overhead. We observed that plausible reasoning approaches, by generating tentative inferences and leveraging domain knowledge of experts, allow us to extend the coverage of medical knowledge bases, resulting in improved clinical decision support. Second, by leveraging OWL ontological knowledge, we are able to increase the expressivity and accuracy of plausible reasoning methods. Third, our approach is applicable to clinical decision support systems for a range of chronic diseases.

  12. A Restoration-Oriented HBIM System for Cultural Heritage Documentation: The Case Study of Parma Cathedral

    NASA Astrophysics Data System (ADS)

    Bruno, N.; Roncella, R.

    2018-05-01

    The need to safeguard and preserve Cultural Heritage (CH) is increasing, and especially in Italy, where the amount of historical buildings is considerable, having efficient and standardized processes of CH management and conservation becomes strategic. At present, there are no tools capable of fulfilling all the specific functions required by Cultural Heritage documentation and, due to the complexity of historical assets, there are no solutions as flexible and customizable as CH-specific needs require. Nevertheless, BIM methodology can represent the most effective solution, on condition that proper methodologies, tools and functions are made available. The paper describes ongoing research on the implementation of a Historical BIM (HBIM) system for the Parma cathedral, aimed at its maintenance, conservation and restoration. Its main goal was to give a concrete answer to the lack of specific tools required by Cultural Heritage documentation: organized and coordinated storage and management of historical data, easy analysis and query, time management, 3D modelling of irregular shapes, flexibility, user-friendliness, etc. The paper describes the project and the implemented methodology, focusing mainly on the survey and modelling phases. In describing the methodology, critical issues about the creation of an HBIM will be highlighted, trying to outline a workflow applicable also in other similar contexts.

  13. Interactive access to LP DAAC satellite data archives through a combination of open-source and custom middleware web services

    USGS Publications Warehouse

    Davis, Brian N.; Werpy, Jason; Friesz, Aaron M.; Impecoven, Kevin; Quenzer, Robert; Maiersperger, Tom; Meyer, David J.

    2015-01-01

    Current methods of searching for and retrieving data from satellite land remote sensing archives do not allow for interactive information extraction. Instead, Earth science data users are required to download files over low-bandwidth networks to local workstations and process data before science questions can be addressed. New methods of extracting information from data archives need to become more interactive to meet user demands for deriving increasingly complex information from rapidly expanding archives. Moving the tools required for processing data to computer systems of data providers, and away from systems of the data consumer, can improve turnaround times for data processing workflows. The implementation of middleware services was used to provide interactive access to archive data. The goal of this middleware services development is to enable Earth science data users to access remote sensing archives for immediate answers to science questions instead of links to large volumes of data to download and process. Exposing data and metadata to web-based services enables machine-driven queries and data interaction. Also, product quality information can be integrated to enable additional filtering and sub-setting. Only the reduced content required to complete an analysis is then transferred to the user.
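
    The sketch below illustrates the kind of machine-driven, provider-side subset request such middleware enables, so that only the reduced content crosses the network. The endpoint URL and parameter names are hypothetical placeholders, not the actual LP DAAC service.

      # Hypothetical subset request against a provider-side middleware service.
      import requests

      resp = requests.get(
          "https://lpdaac.example.gov/subset",            # hypothetical endpoint
          params={"product": "MOD13Q1", "band": "NDVI",
                  "lat": 45.0, "lon": -93.0,
                  "start": "2014-01-01", "end": "2014-12-31",
                  "quality": "good"},                     # quality filtering
          timeout=60)
      resp.raise_for_status()
      print(resp.json())  # only the filtered time series is transferred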

  14. Learning multiple relative attributes with humans in the loop.

    PubMed

    Qian, Buyue; Wang, Xiang; Cao, Nan; Jiang, Yu-Gang; Davidson, Ian

    2014-12-01

    Semantic attributes have been recognized as a more spontaneous manner to describe and annotate image content. It is widely accepted that image annotation using semantic attributes is a significant improvement over traditional binary or multiclass annotation due to their naturally continuous and relative properties. Though useful, existing approaches rely on abundant supervision and high-quality training data, which limits their applicability. Two standard methods to overcome small amounts of guidance and low-quality training data are transfer and active learning. In the context of relative attributes, this entails learning multiple relative attributes simultaneously and actively querying a human for additional information. This paper addresses the two main limitations in existing work: 1) it actively adds humans to the learning loop so that minimal additional guidance can be given and 2) it learns multiple relative attributes simultaneously and thereby leverages dependence amongst them. We formulate a joint active learning-to-rank framework with pairwise supervision to achieve these two aims, which also has other benefits such as the ability to be kernelized. The proposed framework optimizes over a set of ranking functions (measuring the strength of the presence of attributes) simultaneously and dependently on each other. The proposed pairwise queries take the form "Which one of these two pictures is more natural?" and can be easily answered by humans. An extensive empirical study on real image data sets shows that our proposed method, compared with several state-of-the-art methods, achieves superior retrieval performance while requiring significantly less human input.
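
    The pairwise supervision described above can be sketched with a linear ranking function trained by subgradient descent on a hinge loss over ordered pairs; the data, margin and step size below are illustrative, not the paper's kernelized joint framework.

      # Learn w so that w.x_i > w.x_j for each supervised pair (i stronger than j).
      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(100, 5))                 # image features
      w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
      s = X @ w_true
      pairs = [(i, j) if s[i] > s[j] else (j, i)
               for i, j in rng.integers(0, 100, size=(200, 2)) if i != j]

      w = np.zeros(5)
      for _ in range(50):                           # subgradient descent
          grad = np.zeros(5)
          for i, j in pairs:                        # hinge: want margin >= 1
              if w @ (X[i] - X[j]) < 1:
                  grad -= X[i] - X[j]
          w -= 0.01 * grad / len(pairs)

      print(sum((w @ (X[i] - X[j])) <= 0 for i, j in pairs), "pairs mis-ordered")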

  15. Vaccine-criticism on the internet: new insights based on French-speaking websites.

    PubMed

    Ward, Jeremy K; Peretti-Watel, Patrick; Larson, Heidi J; Raude, Jocelyn; Verger, Pierre

    2015-02-18

    The internet is playing an increasingly important part in fueling vaccine-related controversies and in generating vaccine-hesitant behaviors. English-language anti-vaccination websites have been thoroughly analyzed; however, little is known of the arguments presented in other languages on the internet. This study presents three types of results: (1) The authors apply a time-tested content analysis methodology to describe the information diffused by French-language vaccine-critical websites in comparison with English-language websites. The contents of French-language vaccine-critical websites are very similar to those of English-language websites except for the relative absence of moral and religious arguments. (2) The authors evaluate the likelihood that internet users will find those websites through vaccine-related queries on a variety of French-language versions of Google. Queries on controversial vaccines generated many more vaccine-critical websites than queries on vaccination in general. (3) The authors propose a typology of vaccine-critical websites, distinguishing between (a) websites that criticize all vaccines ("antivaccine" websites) and websites that criticize only some vaccines ("vaccine-selective" websites), and between (b) websites that focus on vaccines ("vaccine-focused" websites) and those for which vaccines are only a secondary topic of interest ("generalist" websites). The differences in stances by groups and websites affect the likelihood that they will be believed and by whom. This study therefore helps in understanding the different information landscapes that may contribute to the variety of forms of vaccine hesitancy. Public authorities should have better awareness and understanding of these stances to bring appropriate answers to the different controversies about vaccination. Copyright © 2014 Elsevier Ltd. All rights reserved.

  16. Bottom-Up Evaluation of Twig Join Pattern Queries in XML Document Databases

    NASA Astrophysics Data System (ADS)

    Chen, Yangjun

    Since the extensible markup language XML emerged as a new standard for information representation and exchange on the Internet, the problem of storing, indexing, and querying XML documents has been among the major issues of database research. In this paper, we study twig pattern matching and discuss a new algorithm for processing ordered twig pattern queries. The time complexity of the algorithm is bounded by O(|D|·|Q| + |T|·leaf_Q) and its space overhead by O(leaf_T·leaf_Q), where T stands for a document tree, Q for a twig pattern, and D is the largest data stream associated with a node q of Q, containing the database nodes that match the node predicate at q; leaf_T (resp. leaf_Q) denotes the number of leaf nodes of T (resp. Q). In addition, the algorithm can be adapted to an indexing environment with XB-trees being used.

  17. Time and Space Efficient Algorithms for Two-Party Authenticated Data Structures

    NASA Astrophysics Data System (ADS)

    Papamanthou, Charalampos; Tamassia, Roberto

    Authentication is increasingly relevant to data management. Data is being outsourced to untrusted servers and clients want to securely update and query their data. For example, in database outsourcing, a client's database is stored and maintained by an untrusted server. Also, in simple storage systems, clients can store very large amounts of data but, at the same time, want assurance of its integrity when they retrieve it. In this paper, we present a model and protocol for two-party authentication of data structures. Namely, a client outsources its data structure and verifies that the answers to the queries have not been tampered with. We provide efficient algorithms to securely outsource a skip list with logarithmic time overhead at the server and client and logarithmic communication cost, thus providing an efficient authentication primitive for outsourced data, both structured (e.g., relational databases) and semi-structured (e.g., XML documents). In our technique, the client stores only a constant amount of space, which is optimal. Our two-party authentication framework can be deployed on top of existing storage applications, thus providing an efficient authentication service. Finally, we present experimental results that demonstrate the practical efficiency and scalability of our scheme.
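
    The client-side check at the heart of such schemes can be sketched with a Merkle tree: the client keeps only a constant-size digest and verifies a logarithmic-size proof returned with each answer. The paper uses an authenticated skip list rather than the plain Merkle tree below, so this is an analogy, not the authors' construction.

      # Verify a query answer against a stored root digest (Merkle-style).
      import hashlib

      def h(data: bytes) -> bytes:
          return hashlib.sha256(data).digest()

      def verify(leaf: bytes, proof, root: bytes) -> bool:
          """proof: list of (sibling_hash, sibling_is_left) from leaf to root."""
          node = h(leaf)
          for sibling, is_left in proof:
              node = h(sibling + node) if is_left else h(node + sibling)
          return node == root

      leaves = [h(x) for x in (b"item0", b"item1", b"item2", b"item3")]
      l01, l23 = h(leaves[0] + leaves[1]), h(leaves[2] + leaves[3])
      root = h(l01 + l23)                        # the client's constant state
      proof = [(leaves[3], False), (l01, True)]  # log-size proof for b"item2"
      print(verify(b"item2", proof, root))       # -> True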

  18. Linked Registries: Connecting Rare Diseases Patient Registries through a Semantic Web Layer

    PubMed Central

    González-Castro, Lorena; Carta, Claudio; van der Horst, Eelke; Lopes, Pedro; Kaliyaperumal, Rajaram; Thompson, Mark; Thompson, Rachel; Queralt-Rosinach, Núria; Lopez, Estrella; Wood, Libby; Robertson, Agata; Lamanna, Claudia; Gilling, Mette; Orth, Michael; Merino-Martinez, Roxana; Taruscio, Domenica; Lochmüller, Hanns

    2017-01-01

    Patient registries are an essential tool to increase current knowledge regarding rare diseases. Understanding these data is a vital step to improve patient treatments and to create the most adequate tools for personalized medicine. However, the growing number of disease-specific patient registries brings also new technical challenges. Usually, these systems are developed as closed data silos, with independent formats and models, lacking comprehensive mechanisms to enable data sharing. To tackle these challenges, we developed a Semantic Web based solution that allows connecting distributed and heterogeneous registries, enabling the federation of knowledge between multiple independent environments. This semantic layer creates a holistic view over a set of anonymised registries, supporting semantic data representation, integrated access, and querying. The implemented system gave us the opportunity to answer challenging questions across disperse rare disease patient registries. The interconnection between those registries using Semantic Web technologies benefits our final solution in a way that we can query single or multiple instances according to our needs. The outcome is a unique semantic layer, connecting miscellaneous registries and delivering a lightweight holistic perspective over the wealth of knowledge stemming from linked rare disease patient registries. PMID:29214177
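
    A federated SPARQL query of the kind this semantic layer enables is sketched below using the SPARQLWrapper client; the endpoint URLs and the rr: vocabulary are hypothetical placeholders, not the project's actual schema.

      # Ask one registry endpoint and federate part of the pattern to another.
      from SPARQLWrapper import SPARQLWrapper, JSON

      query = """
      PREFIX rr: <http://example.org/rare-registry#>
      SELECT ?patient ?diagnosis WHERE {
        ?patient rr:hasDiagnosis ?diagnosis .
        SERVICE <https://registry-b.example.org/sparql> {
          ?patient rr:enrolledIn ?trial .
        }
      } LIMIT 10
      """
      endpoint = SPARQLWrapper("https://registry-a.example.org/sparql")
      endpoint.setQuery(query)
      endpoint.setReturnFormat(JSON)
      for row in endpoint.query().convert()["results"]["bindings"]:
          print(row["patient"]["value"], row["diagnosis"]["value"])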

  19. Research on spatio-temporal database techniques for spatial information service

    NASA Astrophysics Data System (ADS)

    Zhao, Rong; Wang, Liang; Li, Yuxiang; Fan, Rongshuang; Liu, Ping; Li, Qingyuan

    2007-06-01

    Geographic data should be described by spatial, temporal and attribute components, but spatio-temporal queries are difficult to answer within current GIS. This paper describes research into the development and application of a spatio-temporal data management system based upon the GeoWindows GIS software platform developed by the Chinese Academy of Surveying and Mapping (CASM). To meet current practical requirements of spatial information applications, and building on the existing GIS platform, we first established a spatio-temporal data model that integrates vector and grid data. Second, we solved the key problem of building temporal data topology and developed a suite of spatio-temporal database management tools using object-oriented methods. The system provides temporal data collection, data storage, data management, data display and query functions. Finally, as a case study, we explored the application of the spatio-temporal data management system using administrative region data from multiple historical periods of China as the base data. With these efforts, the capacity of GIS to manage and manipulate temporal and attribute data has been enhanced, and a technical reference has been provided for the further development of temporal geographic information systems (TGIS).

  20. Linked Registries: Connecting Rare Diseases Patient Registries through a Semantic Web Layer.

    PubMed

    Sernadela, Pedro; González-Castro, Lorena; Carta, Claudio; van der Horst, Eelke; Lopes, Pedro; Kaliyaperumal, Rajaram; Thompson, Mark; Thompson, Rachel; Queralt-Rosinach, Núria; Lopez, Estrella; Wood, Libby; Robertson, Agata; Lamanna, Claudia; Gilling, Mette; Orth, Michael; Merino-Martinez, Roxana; Posada, Manuel; Taruscio, Domenica; Lochmüller, Hanns; Robinson, Peter; Roos, Marco; Oliveira, José Luís

    2017-01-01

    Patient registries are an essential tool to increase current knowledge regarding rare diseases. Understanding these data is a vital step to improve patient treatments and to create the most adequate tools for personalized medicine. However, the growing number of disease-specific patient registries brings also new technical challenges. Usually, these systems are developed as closed data silos, with independent formats and models, lacking comprehensive mechanisms to enable data sharing. To tackle these challenges, we developed a Semantic Web based solution that allows connecting distributed and heterogeneous registries, enabling the federation of knowledge between multiple independent environments. This semantic layer creates a holistic view over a set of anonymised registries, supporting semantic data representation, integrated access, and querying. The implemented system gave us the opportunity to answer challenging questions across disperse rare disease patient registries. The interconnection between those registries using Semantic Web technologies benefits our final solution in a way that we can query single or multiple instances according to our needs. The outcome is a unique semantic layer, connecting miscellaneous registries and delivering a lightweight holistic perspective over the wealth of knowledge stemming from linked rare disease patient registries.

  1. Partial automation of database processing of simulation outputs from L-systems models of plant morphogenesis.

    PubMed

    Chen, Yi-Ping Phoebe; Hanan, Jim

    2002-01-01

    Models of plant architecture allow us to explore how genotype-environment interactions affect the development of plant phenotypes. Such models generate masses of data organised in complex hierarchies. This paper presents a generic system for creating and automatically populating a relational database from data generated by the widely used L-system approach to modelling plant morphogenesis. Techniques from compiler technology are applied to generate attributes (new fields) in the database, to simplify query development for the recursively structured branching relationship. Use of biological terminology in an interactive query builder contributes towards making the system biologist-friendly.

  2. SPLICE: A program to assemble partial query solutions from three-dimensional database searches into novel ligands

    NASA Astrophysics Data System (ADS)

    Ho, Chris M. W.; Marshall, Garland R.

    1993-12-01

    SPLICE is a program that processes partial query solutions retrieved from 3D structural databases to generate novel, aggregate ligands. It is designed to interface with the database searching program FOUNDATION, which retrieves fragments containing any combination of a user-specified minimum number of matching query elements. SPLICE eliminates aspects of structures that are physically incapable of binding within the active site. Then, a systematic rule-based procedure is performed upon the remaining fragments to ensure receptor complementarity. All modifications are automated and remain transparent to the user. Ligands are then assembled by linking components into composite structures through overlapping bonds. As a control experiment, FOUNDATION and SPLICE were used to reconstruct a known HIV-1 protease inhibitor after it had been fragmented, reoriented, and added to a sham database of fifty different small molecules. To illustrate the capabilities of this program, a 3D search query containing the pharmacophoric elements of an aspartic proteinase-inhibitor crystal complex was searched using FOUNDATION against a subset of the Cambridge Structural Database. One hundred thirty-one compounds were retrieved, each containing any combination of at least four query elements. Compounds were automatically screened and edited for receptor complementarity. Numerous combinations of fragments were discovered that could be linked to form novel structures containing a greater number of pharmacophoric elements than any single retrieved fragment.

  3. Virtual Observatory Interfaces to the Chandra Data Archive

    NASA Astrophysics Data System (ADS)

    Tibbetts, M.; Harbo, P.; Van Stone, D.; Zografou, P.

    2014-05-01

    The Chandra Data Archive (CDA) plays a central role in the operation of the Chandra X-ray Center (CXC) by providing access to Chandra data. Proprietary interfaces have been the backbone of the CDA throughout the Chandra mission. While these interfaces continue to provide the depth and breadth of mission specific access Chandra users expect, the CXC has been adding Virtual Observatory (VO) interfaces to the Chandra proposal catalog and observation catalog. VO interfaces provide standards-based access to Chandra data through simple positional queries or more complex queries using the Astronomical Data Query Language. Recent development at the CDA has generalized our existing VO services to create a suite of services that can be configured to provide VO interfaces to any dataset. This approach uses a thin web service layer for the individual VO interfaces, a middle-tier query component which is shared among the VO interfaces for parsing, scheduling, and executing queries, and existing web services for file and data access. The CXC VO services provide Simple Cone Search (SCS), Simple Image Access (SIA), and Table Access Protocol (TAP) implementations for both the Chandra proposal and observation catalogs within the existing archive architecture. Our work with the Chandra proposal and observation catalogs, as well as additional datasets beyond the CDA, illustrates how we can provide configurable VO services to extend core archive functionality.
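
    A positional TAP query of the kind these VO interfaces support can be issued with the pyvo client, as sketched below; the service URL and table/column names are hypothetical placeholders for the real CDA schema.

      # Cone-style ADQL search against a TAP service (names are placeholders).
      import pyvo

      tap = pyvo.dal.TAPService("https://cda.example.edu/tap")  # hypothetical
      adql = """
      SELECT TOP 10 obs_id, target_name, exposure_time
      FROM chandra.observations
      WHERE 1 = CONTAINS(POINT('ICRS', ra, dec),
                         CIRCLE('ICRS', 83.633, 22.014, 0.5))
      """
      for row in tap.search(adql):
          print(row["obs_id"], row["target_name"])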

  4. Design and Implementation of a Metadata-rich File System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ames, S; Gokhale, M B; Maltzahn, C

    2010-01-19

    Despite continual improvements in the performance and reliability of large scale file systems, the management of user-defined file system metadata has changed little in the past decade. The mismatch between the size and complexity of large scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and semantic metadata requires considerable effort to maintain consistency and can result in complex, slow, and inflexible system operation. To address these problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, user-defined attributes, and file relationships are all first class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS incorporates Quasar, an XPATH-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the de facto standard, the QFS prototype shows superior ingest performance and comparable query performance on user metadata-intensive operations and superior performance on normal file metadata operations.

  5. Demonstration of quantum advantage in machine learning

    NASA Astrophysics Data System (ADS)

    Ristè, Diego; da Silva, Marcus P.; Ryan, Colm A.; Cross, Andrew W.; Córcoles, Antonio D.; Smolin, John A.; Gambetta, Jay M.; Chow, Jerry M.; Johnson, Blake R.

    2017-04-01

    The main promise of quantum computing is to efficiently solve certain problems that are prohibitively expensive for a classical computer. Most problems with a proven quantum advantage involve the repeated use of a black box, or oracle, whose structure encodes the solution. One measure of the algorithmic performance is the query complexity, i.e., the scaling of the number of oracle calls needed to find the solution with a given probability. Few-qubit demonstrations of quantum algorithms, such as Deutsch-Jozsa and Grover, have been implemented across diverse physical systems such as nuclear magnetic resonance, trapped ions, optical systems, and superconducting circuits. However, at the small scale, these problems can already be solved classically with a few oracle queries, limiting the obtained advantage. Here we solve an oracle-based problem, known as learning parity with noise, on a five-qubit superconducting processor. Executing classical and quantum algorithms using the same oracle, we observe a large gap in query count in favor of quantum processing. We find that this gap grows by orders of magnitude as a function of the error rates and the problem size. This result demonstrates that, while complex fault-tolerant architectures will be required for universal quantum computing, a significant quantum advantage already emerges in existing noisy systems.

  6. Avoiding the Complex History, Simple Answer Syndrome: A Lesson Plan for Providing Depth and Analysis in the High School History Classroom

    ERIC Educational Resources Information Center

    Lindquist, David H.

    2012-01-01

    Examining history from the perspective of investigators who wrestle with involved scenarios for which no simple answers exist, or from which no obvious conclusions can be drawn, allows students to understand the historiographic process and the complex nature of historical events, while gaining valuable practice in applying analytical and critical…

  7. Keys to Success: School Facilities Primer, Questions & Answers 101.

    ERIC Educational Resources Information Center

    Brady, Jim

    This publication provides answers to basic questions to help school board members more fully address the complexities of the planning, design, and construction process in order to maximize the goal of student success. The 101 questions and answers are in the areas of: facility planning; learning environment; information technology; safe schools;…

  8. Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal

    PubMed Central

    Gao, Jianjiong; Aksoy, Bülent Arman; Dogrusoz, Ugur; Dresdner, Gideon; Gross, Benjamin; Sumer, S. Onur; Sun, Yichao; Jacobsen, Anders; Sinha, Rileen; Larsson, Erik; Cerami, Ethan; Sander, Chris; Schultz, Nikolaus

    2014-01-01

    The cBioPortal for Cancer Genomics (http://cbioportal.org) provides a Web resource for exploring, visualizing, and analyzing multidimensional cancer genomics data. The portal reduces molecular profiling data from cancer tissues and cell lines into readily understandable genetic, epigenetic, gene expression, and proteomic events. The query interface combined with customized data storage enables researchers to interactively explore genetic alterations across samples, genes, and pathways and, when available in the underlying data, to link these to clinical outcomes. The portal provides graphical summaries of gene-level data from multiple platforms, network visualization and analysis, survival analysis, patient-centric queries, and software programmatic access. The intuitive Web interface of the portal makes complex cancer genomics profiles accessible to researchers and clinicians without requiring bioinformatics expertise, thus facilitating biological discoveries. Here, we provide a practical guide to the analysis and visualization features of the cBioPortal for Cancer Genomics. PMID:23550210

  9. YAHA: fast and flexible long-read alignment with optimal breakpoint detection.

    PubMed

    Faust, Gregory G; Hall, Ira M

    2012-10-01

    With improved short-read assembly algorithms and the recent development of long-read sequencers, split mapping will soon be the preferred method for structural variant (SV) detection. Yet, current alignment tools are not well suited for this. We present YAHA, a fast and flexible hash-based aligner. YAHA is as fast and accurate as BWA-SW at finding the single best alignment per query and is dramatically faster and more sensitive than both SSAHA2 and MegaBLAST at finding all possible alignments. Unlike other aligners that report all, or one, alignment per query, or that use simple heuristics to select alignments, YAHA uses a directed acyclic graph to find the optimal set of alignments that cover a query using a biologically relevant breakpoint penalty. YAHA can also report multiple mappings per defined segment of the query. We show that YAHA detects more breakpoints in less time than BWA-SW across all SV classes, and especially excels at complex SVs comprising multiple breakpoints. YAHA is currently supported on 64-bit Linux systems. Binaries and sample data are freely available for download from http://faculty.virginia.edu/irahall/YAHA. imh4y@virginia.edu.

  10. Integrated databanks access and sequence/structure analysis services at the PBIL.

    PubMed

    Perrière, Guy; Combet, Christophe; Penel, Simon; Blanchet, Christophe; Thioulouse, Jean; Geourjon, Christophe; Grassot, Julien; Charavay, Céline; Gouy, Manolo; Duret, Laurent; Deléage, Gilbert

    2003-07-01

    The World Wide Web server of the PBIL (Pôle Bioinformatique Lyonnais) provides on-line access to sequence databanks and to many tools of nucleic acid and protein sequence analysis. This server allows querying nucleotide sequence banks in the EMBL and GenBank formats and protein sequence banks in the SWISS-PROT and PIR formats. The query engine on which our databank access is based is the ACNUC system. It makes it possible to build complex queries to access functional zones of biological interest and to retrieve large sequence sets. Of special interest are the unique features provided by this system to query the databanks of gene families developed at the PBIL. The server also provides access to a wide range of sequence analysis methods: similarity search programs, multiple alignments, protein structure prediction and multivariate statistics. A distinctive feature of this server is the integration of these two aspects: sequence retrieval and sequence analysis. Indeed, thanks to the introduction of re-usable lists, it is possible to perform treatments on large sets of data. The PBIL server can be reached at: http://pbil.univ-lyon1.fr.

  11. IMGT/3Dstructure-DB and IMGT/StructuralQuery, a database and a tool for immunoglobulin, T cell receptor and MHC structural data

    PubMed Central

    Kaas, Quentin; Ruiz, Manuel; Lefranc, Marie-Paule

    2004-01-01

    IMGT/3Dstructure-DB and IMGT/StructuralQuery are a novel 3D structure database and a new tool for immunological proteins. They are part of IMGT, the international ImMunoGenetics information system®, a high-quality integrated knowledge resource specializing in immunoglobulins (IG), T cell receptors (TR), major histocompatibility complex (MHC) and related proteins of the immune system (RPI) of human and other vertebrate species, which consists of databases, Web resources and interactive on-line tools. IMGT/3Dstructure-DB data are described according to the IMGT Scientific chart rules based on the IMGT-ONTOLOGY concepts. IMGT/3Dstructure-DB provides IMGT gene and allele identification of IG, TR and MHC proteins with known 3D structures, domain delimitations, amino acid positions according to the IMGT unique numbering and renumbered coordinate flat files. Moreover, IMGT/3Dstructure-DB provides 2D graphical representations (or Collier de Perles) and results of contact analysis. The IMGT/StructuralQuery tool allows searching this database based on specific structural characteristics. IMGT/3Dstructure-DB and IMGT/StructuralQuery are freely available at http://imgt.cines.fr. PMID:14681396

  12. CellLineNavigator: a workbench for cancer cell line analysis

    PubMed Central

    Krupp, Markus; Itzel, Timo; Maass, Thorsten; Hildebrandt, Andreas; Galle, Peter R.; Teufel, Andreas

    2013-01-01

    The CellLineNavigator database, freely available at http://www.medicalgenomics.org/celllinenavigator, is a web-based workbench for large scale comparisons of a large collection of diverse cell lines. It aims to support experimental design in the fields of genomics, systems biology and translational biomedical research. Currently, this compendium holds genome-wide expression profiles of 317 different cancer cell lines, categorized into 57 different pathological states and 28 individual tissues. To enlarge the scope of CellLineNavigator, the database was furthermore closely linked to commonly used bioinformatics databases and knowledge repositories. To ensure easy data access and searchability, a simple data interface and an intuitive querying interface were implemented. They allow the user to explore and filter gene expression, focusing on pathological or physiological conditions. For a more complex search, the advanced query interface may be used to query for (i) differentially expressed genes; (ii) pathological or physiological conditions; or (iii) gene names or functional attributes, such as Kyoto Encyclopaedia of Genes and Genomes pathway maps. These queries may also be combined. Finally, CellLineNavigator allows additional advanced analysis of differentially regulated genes by a direct link to the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resources. PMID:23118487

  13. Shedding light on thirteen years of darkness: content analysis of articles pertaining to transgender issues in marriage/couple and family therapy journals.

    PubMed

    Blumer, Markie L C; Green, Mary S; Knowles, Sarah J; Williams, April

    2012-06-01

    What is the extent to which marriage/couple and family therapy (M/CFT) journals address transgender issues, and how many of them say they are inclusive of transgender persons when they are not? To answer these queries, a content analysis was conducted on articles published in the M/CFT literature from 1997 through 2009. Of the 10,739 articles examined in 17 journals, only nine (0.08%) focused on transgender issues or used gender variance as a variable. Findings support the assertion that transgender issues are ignored and marginalized by M/CFT scholars and researchers alike. © 2012 American Association for Marriage and Family Therapy.

  14. FastBit: Interactively Searching Massive Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Kesheng; Ahern, Sean; Bethel, E. Wes

    2009-06-23

    As scientific instruments and computer simulations produce more and more data, the task of locating the essential information to gain insight becomes increasingly difficult. FastBit is an efficient software tool to address this challenge. In this article, we present a summary of the key underlying technologies, namely bitmap compression, encoding, and binning. Together these techniques enable FastBit to answer structured (SQL) queries orders of magnitude faster than popular database systems. To illustrate how FastBit is used in applications, we present three examples involving a high-energy physics experiment, a combustion simulation, and an accelerator simulation. In each case, FastBit significantly reduces the response time and enables interactive exploration on terabytes of data.
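
    The bitmap-index idea behind FastBit can be sketched in a few lines: one bitmap per attribute value (or bin) turns a structured query into bitwise AND/OR operations. FastBit's contribution lies in compression (WAH), encoding and binning on top of this basic scheme; the sketch below is illustrative only.

      # Toy bitmap index: a query becomes a single bitwise AND over bitmaps.
      energy_bin = [0, 2, 1, 2, 0, 1]   # binned attribute, one value per row
      flag       = [1, 0, 1, 1, 0, 1]   # boolean attribute, one value per row

      def bitmap(values, value):
          """Integer whose bit i is set where values[i] == value."""
          bits = 0
          for i, v in enumerate(values):
              if v == value:
                  bits |= 1 << i
          return bits

      # Query: energy_bin == 2 AND flag == 1
      hits = bitmap(energy_bin, 2) & bitmap(flag, 1)
      print([i for i in range(len(flag)) if hits >> i & 1])  # -> [3]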

  15. Additions to Pollard's "fun with Gompertz".

    PubMed

    Krishnamoorthy, S; Kulkarni, P M

    1993-01-01

    "In a recent paper, Pollard (1991) has demonstrated that under the Gompertz law of mortality quick accurate or approximate answers can be obtained to many queries on survival. Some of Pollard's formulae can also be developed in the context of multiple decrement life tables so as to arrive at simple solutions to problems on the probability of death due to a given cause and the effect of the elimination of a cause of death. It is realized that the cause-specific force of mortality may not obey the Gompertz law. Still, it may be possible to group the causes in such a way that for each group the Gompertz curve provides a good approximation." excerpt

  16. Automatically exposing OpenLifeData via SADI semantic Web Services.

    PubMed

    González, Alejandro Rodríguez; Callahan, Alison; Cruz-Toledo, José; Garcia, Adrian; Egaña Aranguren, Mikel; Dumontier, Michel; Wilkinson, Mark D

    2014-01-01

    Two distinct trends are emerging with respect to how data is shared, collected, and analyzed within the bioinformatics community. First, Linked Data, exposed as SPARQL endpoints, promises to make data easier to collect and integrate by moving towards the harmonization of data syntax, descriptive vocabularies, and identifiers, as well as providing a standardized mechanism for data access. Second, Web Services, often linked together into workflows, normalize data access and create transparent, reproducible scientific methodologies that can, in principle, be re-used and customized to suit new scientific questions. Constructing queries that traverse semantically rich Linked Data requires substantial expertise, yet traditional RESTful or SOAP Web Services cannot adequately describe the content of a SPARQL endpoint. We propose that content-driven Semantic Web Services can enable facile discovery of Linked Data, independent of their location. We use a well-curated Linked Dataset - OpenLifeData - and utilize its descriptive metadata to automatically configure a series of more than 22,000 Semantic Web Services that expose all of its content via the SADI set of design principles. The OpenLifeData SADI services are discoverable via queries to the SHARE registry and easy to integrate into new or existing bioinformatics workflows and analytical pipelines. We demonstrate the utility of this system through a comparison of Web Service-mediated data access with traditional SPARQL, and note that this approach not only simplifies data retrieval, but simultaneously provides protection against resource-intensive queries. We show, through a variety of different clients and examples of varying complexity, that data from the myriad OpenLifeData services can be recovered without any need for prior knowledge of the content or structure of the SPARQL endpoints. We also demonstrate that, via clients such as SHARE, the complexity of federated SPARQL queries is dramatically reduced.

  17. Examining database persistence of ISO/EN 13606 standardized electronic health record extracts: relational vs. NoSQL approaches.

    PubMed

    Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Lozano-Rubí, Raimundo; Serrano-Balazote, Pablo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario

    2017-08-18

    The objective of this research is to compare relational and non-relational (NoSQL) database systems as approaches to storing, recovering, querying and persisting standardized medical information in the form of ISO/EN 13606 normalized Electronic Health Record XML extracts, both in isolation and concurrently. NoSQL database systems have recently attracted much attention, but few studies in the literature address their direct comparison with relational databases when applied to build the persistence layer of a standardized medical information system. One relational and two NoSQL databases (one document-based and one native XML database), each in three different sizes, were created in order to evaluate and compare the response times (algorithmic complexity) of six queries of growing complexity, which were performed on them. Similar appropriate results available in the literature were also considered. Both relational and non-relational (NoSQL) database systems show almost linear growth in query execution time; however, the slopes differ greatly, that of the relational system being much steeper than those of the two NoSQL systems. Document-based NoSQL databases perform better in concurrency than in isolation, and also better than relational databases in concurrency. Non-relational NoSQL databases seem to be more appropriate than standard relational SQL databases when database size is extremely high (secondary use, research applications). Document-based NoSQL databases perform in general better than native XML NoSQL databases. EHR extract visualization and editing are also document-based tasks better suited to NoSQL database systems. However, the appropriate database solution depends very much on each particular situation and specific problem.

  18. EarthServer: Use of Rasdaman as a data store for use in visualisation of complex EO data

    NASA Astrophysics Data System (ADS)

    Clements, Oliver; Walker, Peter; Grant, Mike

    2013-04-01

    The European Commission FP7 project EarthServer is establishing open access and ad-hoc analytics on extreme-size Earth Science data, based on and extending cutting-edge Array Database technology. EarthServer is built around the Rasdaman Raster Data Manager, which extends standard relational database systems with the ability to store and retrieve multi-dimensional raster data of unlimited size through an SQL-style query language. Rasdaman facilitates visualisation of data by providing several Open Geospatial Consortium (OGC) standard interfaces through its web services wrapper, Petascope. These include the well-established standards Web Coverage Service (WCS) and Web Map Service (WMS) as well as the emerging standard Web Coverage Processing Service (WCPS). The WCPS standard allows the running of ad-hoc queries on the data stored within Rasdaman, creating an infrastructure where users are not restricted by bandwidth when manipulating or querying huge datasets. Here we will show that the use of EarthServer technologies and infrastructure allows access and visualisation of massive-scale data through a web client with only marginal bandwidth use, as opposed to the current mechanism of copying huge amounts of data to create visualisations locally. For example, if a user wanted to generate a plot of global average chlorophyll for a complete decade time series, they would only have to download the result instead of terabytes of data. First, we will present a brief overview of the capabilities of Rasdaman and the WCPS query language to introduce the ways in which it is used in a visualisation tool chain. We will show that there are several ways in which WCPS can be utilised to create both standard and novel web-based visualisations. An example of a standard visualisation is the production of traditional 2D plots, allowing users the ability to plot data products easily. However, the query language allows the creation of novel/custom products, which can then immediately be plotted with the same system. For more complex multi-spectral data, WCPS allows the user to explore novel combinations of bands in standard band-ratio algorithms through a web browser, with dynamic updating of the resultant image. To visualise very large datasets, Rasdaman can dynamically scale a dataset or query result so that it can be appraised quickly for use in later unscaled queries. All of these techniques are accessible through a web-based GIS interface, increasing the number of potential users of the system. Lastly, we will show the advances in dynamic web-based 3D visualisations being explored within the EarthServer project. By utilising the emerging declarative 3D web standard X3DOM as a tool to visualise the results of WCPS queries, we introduce several possible benefits, including quick appraisal of data for outliers or anomalous data points and visualisation of the uncertainty of data alongside the actual data values.
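
    An ad-hoc WCPS request of the kind described above is sketched below: the aggregation runs server-side and only the small result is transferred. The endpoint and coverage name are hypothetical placeholders; the query follows WCPS condenser syntax.

      # Server-side decade mean via WCPS; only the result crosses the network.
      import requests

      wcps = ('for c in (chlorophyll_daily) '
              'return avg(c[ansi("2000-01-01":"2009-12-31")])')
      resp = requests.get(
          "https://earthserver.example.org/rasdaman/ows",  # hypothetical
          params={"service": "WCS", "version": "2.0.1",
                  "request": "ProcessCoverages", "query": wcps},
          timeout=300)
      print(resp.text)  # a single aggregate instead of terabytes of data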

  19. So, What Is the Answer? Questions

    ERIC Educational Resources Information Center

    Hall, Don

    2005-01-01

    Leaders are barraged daily by teachers, administrators, and students seeking answers to questions ranging from the simplistic to the metaphysical in their complexity. The author is sure leaders wonder at times, "How in the world can I free up enough time to answer them all?" Well, the author states that he hates to break the news, but it is…

  20. Automation and integration of components for generalized semantic markup of electronic medical texts.

    PubMed

    Dugan, J M; Berrios, D C; Liu, X; Kim, D K; Kaizer, H; Fagan, L M

    1999-01-01

    Our group has built an information retrieval system based on a complex semantic markup of medical textbooks. We describe the construction of a set of web-based knowledge-acquisition tools that expedites the collection and maintenance of the concepts required for text markup and the search interface required for information retrieval from the marked text. In the text markup system, domain experts (DEs) identify sections of text that contain one or more elements from a finite set of concepts. End users can then query the text using a predefined set of questions, each of which identifies a subset of complementary concepts. The search process matches that subset of concepts to relevant points in the text. The current process requires that the DE invest significant time to generate the required concepts and questions. We propose a new system--called ACQUIRE (Acquisition of Concepts and Queries in an Integrated Retrieval Environment)--that assists a DE in two essential tasks in the text-markup process. First, it helps her to develop, edit, and maintain the concept model: the set of concepts with which she marks the text. Second, ACQUIRE helps her to develop a query model: the set of specific questions that end users can later use to search the marked text. The DE incorporates concepts from the concept model when she creates the questions in the query model. The major benefit of the ACQUIRE system is a reduction in the time and effort required for the text-markup process. We compared the process of concept- and query-model creation using ACQUIRE to the process used in previous work by rebuilding two existing models that we previously constructed manually. We observed a significant decrease in the time required to build and maintain the concept and query models.

  1. START: a system for flexible analysis of hundreds of genomic signal tracks in few lines of SQL-like queries.

    PubMed

    Zhu, Xinjie; Zhang, Qiang; Ho, Eric Dun; Yu, Ken Hung-On; Liu, Chris; Huang, Tim H; Cheng, Alfred Sze-Lok; Kao, Ben; Lo, Eric; Yip, Kevin Y

    2017-09-22

    A genomic signal track is a set of genomic intervals associated with values of various types, such as measurements from high-throughput experiments. Analysis of signal tracks requires complex computational methods, which often make the analysts focus too much on the detailed computational steps rather than on their biological questions. Here we propose Signal Track Query Language (STQL) for simple analysis of signal tracks. It is a Structured Query Language (SQL)-like declarative language, which means one only specifies what computations need to be done but not how these computations are to be carried out. STQL provides a rich set of constructs for manipulating genomic intervals and their values. To run STQL queries, we have developed the Signal Track Analytical Research Tool (START, http://yiplab.cse.cuhk.edu.hk/start/ ), a system that includes a Web-based user interface and a back-end execution system. The user interface helps users select data from our database of around 10,000 commonly-used public signal tracks, manage their own tracks, and construct, store and share STQL queries. The back-end system automatically translates STQL queries into optimized low-level programs and runs them on a computer cluster in parallel. We use STQL to perform 14 representative analytical tasks. By repeating these analyses using bedtools, Galaxy and custom Python scripts, we show that the STQL solution is usually the simplest, and the parallel execution achieves significant speed-up with large data files. Finally, we describe how a biologist with minimal formal training in computer programming self-learned STQL to analyze DNA methylation data we produced from 60 pairs of hepatocellular carcinoma (HCC) samples. Overall, STQL and START provide a generic way for analyzing a large number of genomic signal tracks in parallel easily.
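
    For contrast with STQL's declarative style, the plain-Python sketch below computes the mean value of track A intervals that overlap any track B interval, a typical signal-track task that STQL expresses as a short SQL-like query; the data is illustrative.

      # Mean value of intervals in track A overlapping any interval in track B.
      track_a = [("chr1", 100, 200, 5.0), ("chr1", 300, 400, 7.0),
                 ("chr2", 50, 80, 1.0)]          # (chrom, start, end, value)
      track_b = [("chr1", 150, 350), ("chr2", 500, 600)]

      def overlaps(chrom, start, end, others):
          return any(chrom == oc and start < oe and os_ < end
                     for oc, os_, oe in others)

      hits = [v for chrom, s, e, v in track_a if overlaps(chrom, s, e, track_b)]
      print(sum(hits) / len(hits))               # -> 6.0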

  2. Immediate detailed feedback to test-enhanced learning: an effective online educational tool.

    PubMed

    Wojcikowski, Ken; Kirk, Leslie

    2013-11-01

    Test-enhanced learning has gained popularity because it is an effective way to increase retention of knowledge, provided the student receives the correct answer soon after the test is taken. Our aim was to determine whether detailed feedback provided with test-enhanced learning questions is an effective online educational tool for improving performance on complex biomedical information exams. A series of online multiple-choice tests was developed to test knowledge of biomedical information that students were expected to know after each patient case. Following submission of the student answers, one cohort (n = 52) received answers only, while the following year a second cohort (n = 51) received the answers with detailed feedback explaining why each answer was correct or incorrect. Students in both groups progressed through the series of online tests with little assessor intervention. Students receiving the answers along with the explanations within their feedback performed significantly better in the final biomedical information exam than those students receiving correct answers only. This pilot study found that detailed feedback to test-enhanced learning questions is an important online learning tool. The increase in student performance in the complex biomedical information exam in this study suggests that detailed feedback should be investigated not only for increasing knowledge, but also for its effect on retention and application of knowledge.

  3. A Match Method Based on Latent Semantic Analysis for Earthquake Hazard Emergency Plans

    NASA Astrophysics Data System (ADS)

    Sun, D.; Zhao, S.; Zhang, Z.; Shi, X.

    2017-09-01

    The structure of earthquake emergency plans is complex, and it is difficult for a decision maker to reach a decision in a short time. To address this problem, this paper presents a match method based on Latent Semantic Analysis (LSA). After word segmentation preprocessing of the emergency plan, we extract keywords according to part-of-speech and word frequency. Then, through LSA, we map the documents and query information into a semantic space and calculate the correlation between documents and queries from the relation between their vectors. The experimental results indicate that LSA can efficiently improve the accuracy of emergency plan retrieval.
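
    The LSA pipeline described above - a term-document matrix, truncated SVD into a latent semantic space, and vector similarity for retrieval - can be sketched with scikit-learn; the corpus below is illustrative.

      # TF-IDF -> truncated SVD (LSA) -> cosine similarity for retrieval.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.decomposition import TruncatedSVD
      from sklearn.metrics.pairwise import cosine_similarity

      plans = [
          "evacuate residents and open emergency shelters after the earthquake",
          "inspect bridges and restore transport lifelines",
          "deploy medical teams and field hospitals for the injured",
      ]
      vectorizer = TfidfVectorizer()
      X = vectorizer.fit_transform(plans)            # term-document matrix
      svd = TruncatedSVD(n_components=2, random_state=0)
      plan_vecs = svd.fit_transform(X)               # plans in latent space

      q = svd.transform(vectorizer.transform(["treat injured people"]))
      sims = cosine_similarity(q, plan_vecs)[0]
      print(max(range(len(plans)), key=sims.__getitem__))  # best-matching plan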

  4. Quantum private query with perfect user privacy against a joint-measurement attack

    NASA Astrophysics Data System (ADS)

    Yang, Yu-Guang; Liu, Zhi-Chao; Li, Jian; Chen, Xiu-Bo; Zuo, Hui-Juan; Zhou, Yi-Hua; Shi, Wei-Min

    2016-12-01

    The joint-measurement (JM) attack is the most powerful threat to database security for existing quantum-key-distribution (QKD)-based quantum private query (QPQ) protocols. Wei et al. (2016) [28] proposed a novel QPQ protocol secure against the JM attack. However, their protocol relies on two-way quantum communication, which hampers its practical implementation and communication efficiency. Moreover, it cannot ensure perfect user privacy. In this paper, we present a new one-way QPQ protocol in which a special classical post-processing of the oblivious key ensures security against the JM attack. Furthermore, it achieves perfect user privacy and lower communication complexity.

  5. Introducing glycomics data into the Semantic Web

    PubMed Central

    2013-01-01

    Background Glycoscience is a research field focusing on complex carbohydrates (otherwise known as glycans), which can, for example, serve as “switches” that toggle between different functions of a glycoprotein or glycolipid. Due to the advancement of glycomics technologies that are used to characterize glycan structures, many glycomics databases are now publicly available and provide useful information for glycoscience research. However, these databases have almost no links to other life science databases. Results In order to implement support for the Semantic Web most efficiently for glycomics research, the developers of major glycomics databases agreed on a minimal standard for representing glycan structure and annotation information using RDF (Resource Description Framework). Moreover, all of the participants implemented this standard prototype and generated preliminary RDF versions of their data. To test the utility of the converted data, all of the data sets were uploaded into a Virtuoso triple store, and several SPARQL queries were tested as “proofs-of-concept” to illustrate the utility of the Semantic Web in querying across databases, which was originally difficult to implement. Conclusions We were able to successfully retrieve information by linking UniCarbKB, GlycomeDB and JCGGDB in a single SPARQL query to obtain our target information. We also tested queries linking UniProt with GlycoEpitope as well as lectin data with GlycomeDB through PDB. As a result, we have been able to link proteomics data with glycomics data through the implementation of Semantic Web technologies, allowing for more flexible queries across these domains. PMID:24280648

  6. Introducing glycomics data into the Semantic Web.

    PubMed

    Aoki-Kinoshita, Kiyoko F; Bolleman, Jerven; Campbell, Matthew P; Kawano, Shin; Kim, Jin-Dong; Lütteke, Thomas; Matsubara, Masaaki; Okuda, Shujiro; Ranzinger, Rene; Sawaki, Hiromichi; Shikanai, Toshihide; Shinmachi, Daisuke; Suzuki, Yoshinori; Toukach, Philip; Yamada, Issaku; Packer, Nicolle H; Narimatsu, Hisashi

    2013-11-26

    Glycoscience is a research field focusing on complex carbohydrates (otherwise known as glycans), which can, for example, serve as "switches" that toggle between different functions of a glycoprotein or glycolipid. Due to the advancement of glycomics technologies that are used to characterize glycan structures, many glycomics databases are now publicly available and provide useful information for glycoscience research. However, these databases have almost no links to other life science databases. In order to implement support for the Semantic Web most efficiently for glycomics research, the developers of major glycomics databases agreed on a minimal standard for representing glycan structure and annotation information using RDF (Resource Description Framework). Moreover, all of the participants implemented this standard prototype and generated preliminary RDF versions of their data. To test the utility of the converted data, all of the data sets were uploaded into a Virtuoso triple store, and several SPARQL queries were tested as "proofs-of-concept" to illustrate the utility of the Semantic Web in performing queries across databases that were previously difficult to implement. We were able to successfully retrieve information by linking UniCarbKB, GlycomeDB and JCGGDB in a single SPARQL query to obtain our target information. We also tested queries linking UniProt with GlycoEpitope as well as lectin data with GlycomeDB through PDB. As a result, we have been able to link proteomics data with glycomics data through the implementation of Semantic Web technologies, allowing for more flexible queries across these domains.

  7. Open Data, Jupyter Notebooks and Geospatial Data Standards Combined - Opening up large volumes of marine and climate data to other communities

    NASA Astrophysics Data System (ADS)

    Clements, O.; Siemen, S.; Wagemann, J.

    2017-12-01

    The EU-funded EarthServer-2 project aims to offer on-demand access to large volumes of environmental data (Earth Observation, marine, climate, and planetary data) via the Web Coverage Service (WCS) interface standard defined by the Open Geospatial Consortium. Providing access to data via OGC web services (e.g. WCS and WMS) has the potential to open up services to a wider audience, especially to users outside the respective communities. WCS 2.0, with its processing extension the Web Coverage Processing Service (WCPS), is particularly valuable for making large volumes accessible to non-expert communities. Users do not have to deal with custom community data formats, such as GRIB for the meteorological community, but can directly access the data in a format they are more familiar with, such as NetCDF, JSON or CSV. Data requests can be integrated directly into custom processing routines, and users are no longer required to download gigabytes of data. WCS supports trim (reduction of data extent) and slice (reduction of data dimension) operations on multi-dimensional data, giving users very flexible on-demand access. WCPS allows the user to craft queries on the data in a text-based query language similar to SQL. These queries can be very powerful, e.g. condensing a three-dimensional data cube into its two-dimensional mean; however, the more processing-intensive the request, the more complex the query. As part of the EarthServer-2 project, we developed a Python library that helps users generate complex WCPS queries from Python, a programming language they are more familiar with. The interactive presentation gives practical examples of how users can benefit from two specific WCS services from the marine and climate communities. Use cases from the two communities show different approaches to taking advantage of a Web Coverage (Processing) Service. The entire content is available as Jupyter Notebooks, which prove to be a highly beneficial tool for generating reproducible workflows for environmental data analysis.
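
    A WCPS request is ultimately a query string sent to a WCS endpoint over HTTP, so the kind of boilerplate such a helper library removes looks roughly like the sketch below; the endpoint, coverage name and time axis label are hypothetical, and the KVP convention (request=ProcessCoverages) follows rasdaman's usage.

    ```python
    # Minimal sketch: build and send a WCPS query that averages a 3-D coverage
    # over one year. Endpoint, coverage name and "ansi" axis are hypothetical.
    import requests

    endpoint = "https://example.org/rasdaman/ows"
    wcps_query = """
    for $c in (AirTemperature)
    return avg($c[ansi("2000-01-01":"2000-12-31")])
    """

    resp = requests.get(endpoint, params={
        "service": "WCS",
        "version": "2.0.1",
        "request": "ProcessCoverages",   # KVP binding for WCPS queries
        "query": wcps_query,
    })
    print(resp.text)   # a single scalar: the annual mean
    ```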

  8. Muscle Logic: New Knowledge Resource for Anatomy Enables Comprehensive Searches of the Literature on the Feeding Muscles of Mammals.

    PubMed

    Druzinsky, Robert E; Balhoff, James P; Crompton, Alfred W; Done, James; German, Rebecca Z; Haendel, Melissa A; Herrel, Anthony; Herring, Susan W; Lapp, Hilmar; Mabee, Paula M; Muller, Hans-Michael; Mungall, Christopher J; Sternberg, Paul W; Van Auken, Kimberly; Vinyard, Christopher J; Williams, Susan H; Wall, Christine E

    2016-01-01

    In recent years large bibliographic databases have made much of the published literature of biology available for searches. However, the capabilities of the search engines integrated into these databases for text-based bibliographic searches are limited. To enable searches that deliver the results expected by comparative anatomists, an underlying logical structure known as an ontology is required. Here we present the Mammalian Feeding Muscle Ontology (MFMO), a multi-species ontology focused on anatomical structures that participate in feeding and other oral/pharyngeal behaviors. A unique feature of the MFMO is that a simple, computable definition of each muscle, which includes its attachments and innervation, is true across mammals. This construction mirrors the logical foundation of comparative anatomy and permits searches using language familiar to biologists. Further, it provides a template for muscles that will be useful in extending any anatomy ontology. The MFMO was developed to support the Feeding Experiments End-User Database Project (FEED, https://feedexp.org/), a publicly available online repository for physiological data collected from in vivo studies of feeding (e.g., mastication, biting, swallowing) in mammals. Currently the MFMO is integrated into FEED and also into two literature-specific implementations of Textpresso, a text-mining system that facilitates powerful searches of a corpus of scientific publications. We evaluate the MFMO by asking questions that test the ability of the ontology to return appropriate answers (competency questions). We compare the results of queries of the MFMO to results from similar searches in PubMed and Google Scholar. Our tests demonstrate that the MFMO is competent to answer queries formed in the common language of comparative anatomy, but PubMed and Google Scholar are not. Overall, our results show that by incorporating anatomical ontologies into searches, an expanded and anatomically comprehensive set of results can be obtained. The broader scientific and publishing communities should consider taking up the challenge of semantically enabled search capabilities.

  9. The neural basis of deictic shifting in linguistic perspective-taking in high-functioning autism

    PubMed Central

    Liu, Yanni; Williams, Diane L.; Keller, Timothy A.; Minshew, Nancy J.; Just, Marcel Adam

    2011-01-01

    Personal pronouns, such as ‘I’ and ‘you’, require a speaker/listener to continuously re-map their reciprocal relation to their referent, depending on who is saying the pronoun. This process, called ‘deictic shifting’, may underlie the incorrect production of these pronouns, or ‘pronoun reversals’, such as referring to oneself with the pronoun ‘you’, which has been reported in children with autism. The underlying neural basis of deictic shifting, however, is not understood, nor has the processing of pronouns been studied in adults with autism. The present study compared the brain activation pattern and functional connectivity (synchronization of activation across brain areas) of adults with high-functioning autism and control participants using functional magnetic resonance imaging in a linguistic perspective-taking task that required deictic shifting. The results revealed significantly diminished frontal (right anterior insula) to posterior (precuneus) functional connectivity during deictic shifting in the autism group, as well as reliably slower and less accurate behavioural responses. A comparison of two types of deictic shifting revealed that the functional connectivity between the right anterior insula and precuneus was lower in autism while answering a question that contained the pronoun ‘you’, querying something about the participant’s view, but not when answering a query about someone else’s view. In addition to the functional connectivity between the right anterior insula and precuneus being lower in autism, activation in each region was atypical, suggesting over-reliance on individual regions as a potential compensation for the lower level of collaborative interregional processing. These findings indicate that deictic shifting constitutes a challenge for adults with high-functioning autism, particularly when reference to one’s self is involved, and that the functional collaboration of two critical nodes, right anterior insula and precuneus, may play a critical role in deictic shifting by supporting an attention shift between oneself and others. PMID:21733887

  10. The neural basis of deictic shifting in linguistic perspective-taking in high-functioning autism.

    PubMed

    Mizuno, Akiko; Liu, Yanni; Williams, Diane L; Keller, Timothy A; Minshew, Nancy J; Just, Marcel Adam

    2011-08-01

    Personal pronouns, such as 'I' and 'you', require a speaker/listener to continuously re-map their reciprocal relation to their referent, depending on who is saying the pronoun. This process, called 'deictic shifting', may underlie the incorrect production of these pronouns, or 'pronoun reversals', such as referring to oneself with the pronoun 'you', which has been reported in children with autism. The underlying neural basis of deictic shifting, however, is not understood, nor has the processing of pronouns been studied in adults with autism. The present study compared the brain activation pattern and functional connectivity (synchronization of activation across brain areas) of adults with high-functioning autism and control participants using functional magnetic resonance imaging in a linguistic perspective-taking task that required deictic shifting. The results revealed significantly diminished frontal (right anterior insula) to posterior (precuneus) functional connectivity during deictic shifting in the autism group, as well as reliably slower and less accurate behavioural responses. A comparison of two types of deictic shifting revealed that the functional connectivity between the right anterior insula and precuneus was lower in autism while answering a question that contained the pronoun 'you', querying something about the participant's view, but not when answering a query about someone else's view. In addition to the functional connectivity between the right anterior insula and precuneus being lower in autism, activation in each region was atypical, suggesting over-reliance on individual regions as a potential compensation for the lower level of collaborative interregional processing. These findings indicate that deictic shifting constitutes a challenge for adults with high-functioning autism, particularly when reference to one's self is involved, and that the functional collaboration of two critical nodes, right anterior insula and precuneus, may play a critical role in deictic shifting by supporting an attention shift between oneself and others.

  11. MEDCIS: Multi-Modality Epilepsy Data Capture and Integration System

    PubMed Central

    Zhang, Guo-Qiang; Cui, Licong; Lhatoo, Samden; Schuele, Stephan U.; Sahoo, Satya S.

    2014-01-01

    Sudden Unexpected Death in Epilepsy (SUDEP) is the leading mode of epilepsy-related death and is most common in patients with intractable, frequent, and continuing seizures. A statistically significant cohort of patients for SUDEP study requires meticulous, prospective follow-up of a large population that is at elevated risk, best represented by the Epilepsy Monitoring Unit (EMU) patient population. Multiple EMUs need to collaborate and share data to build a larger cohort of potential SUDEP patients using a state-of-the-art informatics infrastructure. To address the challenges of data integration and data access across multiple EMUs, we developed the Multi-Modality Epilepsy Data Capture and Integration System (MEDCIS), which combines retrospective clinical free-text processing using NLP, prospective structured data capture using an ontology-driven interface, and interfaces for cohort search and signal visualization, all in a single integrated environment. A dedicated Epilepsy and Seizure Ontology (EpSO) has been used to streamline the user interfaces, enhance usability, and enable mappings across distributed databases so that federated queries can be executed. MEDCIS contained 936 patient data sets from the EMUs of University Hospitals Case Medical Center (UH CMC) in Cleveland and Northwestern Memorial Hospital (NMH) in Chicago. Patients from UH CMC and NMH were stored in different databases and then federated through MEDCIS using EpSO and our mapping module. More than 77 GB of multi-modal signal data were processed using the Cloudwave pipeline and made available for rendering through the web interface. About 74% of the 40 open clinical questions of interest were answerable accurately using the EpSO-driven VISual AGgregator and Explorer (VISAGE) interface. Questions that were not directly answerable failed either because of their inherent computational complexity, the unavailability of primary information, or because the concept lay outside the scope of the existing EpSO terminology system. PMID:25954436

  12. Adverse-drug-event data provided by pharmaceutical companies.

    PubMed

    Cudny, Magdalena E; Graham, Angie S

    2008-06-01

    Pharmaceutical company drug information center (PCDIC) responses to queries about adverse drug events (ADEs) were studied to determine whether PCDICs search sources other than the prescribing information on the package insert (PI) and whether the PCDICs' approach differs according to whether an ADE is listed in the PI (labeled) or not (unlabeled). Companies were selected from a list of PCDICs in the Physicians' Desk Reference. One oral or injectable prescription drug from each company was selected. For each drug, a labeled ADE and an unlabeled ADE about which to query the PCDICs were randomly selected from the index of an annual publication on ADEs. The investigators telephoned the PCDICs with an open-ended inquiry about the incidence, timing, and management of the ADE as reported in the literature and the company's internal data; they clarified that the request did not concern a specific patient. Whether or not information was provided, the source searched was recorded (PI, literature, internal database), and the percentages of PCDICs that used each source for labeled and for unlabeled ADEs were analyzed. Results were obtained from 100 companies to questions about 100 drugs (200 ADEs). For ADEs overall, 80% used the PI, 50% the medical literature, and 38% internal data. For labeled versus unlabeled ADEs, respectively, the PI was used by 84% and 76%; literature, both 50%; and internal data, 35% and 41%. The PCDIC specialists referencing the PI did not always provide accurate or up-to-date information. Some specialists, when asked to query internal databases, said that was not an option. For both labeled and unlabeled ADEs, the PI was the primary source used by PCDICs to answer safety questions about their products, and internal data were the least-used source. Most resources used by PCDICs are readily available to practicing pharmacists.

  13. Aftermath of the Bustamante attack on the genomic beacon service.

    PubMed

    Aziz, Md Momin Al; Ghasemi, Reza; Waliullah, Md; Mohammed, Noman

    2017-07-26

    With the enormous need for a federated ecosystem for holding global genomic and clinical data, the Global Alliance for Genomics and Health (GA4GH) has created an international web service called the beacon service, which allows researchers to find out beforehand whether a specific dataset will be useful for their research. This simple web service supports queries such as whether a certain position of a target chromosome has a specific nucleotide. However, the increasing integration of individuals' genomic data into clinical practice and research has raised serious privacy concerns. Though the answers to such queries in a beacon network are only yes or no, they have serious privacy implications, as demonstrated in recent work by Shringarpure and Bustamante. In their attack model, the authors demonstrated that with a limited number of queries, the presence of an individual in any dataset can be determined. We propose two lightweight algorithms (based on randomized response) which preserve utility while protecting the privacy of the participants in a genomic beacon service. We also elaborate on the strengths and weaknesses of the attack by explaining some of its statistical and mathematical models using a real-world genomic database. We extend their experimental simulations to different adversarial assumptions and parameters. We experimentally evaluated the solutions on the original attack model with different parameters for a better understanding of the privacy and utility trade-offs provided by these two methods. The statistical analysis further elaborates the different aspects of the prior attack, which leads to better risk management for the participants in a beacon service. The differentially private and lightweight solutions discussed here will make the attack much more difficult to succeed while maintaining the fundamental motivation of the beacon database network.
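
    Randomized response, which the two proposed algorithms build on, is easy to illustrate: the beacon flips its truthful yes/no answer with a calibrated probability, giving each participant plausible deniability while preserving aggregate utility. The sketch below is a generic textbook mechanism, not the authors' exact algorithms; the epsilon-to-probability mapping is one common convention.

    ```python
    # Minimal sketch of a randomized-response beacon answer (generic
    # differential-privacy mechanism, not the paper's exact algorithms).
    import math
    import random

    def private_answer(truthful: bool, epsilon: float) -> bool:
        """Return the truthful answer with probability e^eps/(1+e^eps), else flip it."""
        p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
        return truthful if random.random() < p_truth else not truthful

    # Smaller epsilon -> more noise -> stronger privacy, weaker utility.
    print([private_answer(True, 1.0) for _ in range(10)])
    ```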

  14. Integration of relational and textual biomedical sources. A pilot experiment using a semi-automated method for logical schema acquisition.

    PubMed

    García-Remesal, M; Maojo, V; Billhardt, H; Crespo, J

    2010-01-01

    Bringing together structured and text-based sources is an exciting challenge for biomedical informaticians, since most relevant biomedical sources belong to one of these categories. In this paper we evaluate the feasibility of integrating relational and text-based biomedical sources using: i) an original logical schema acquisition method for textual databases developed by the authors, and ii) OntoFusion, a system originally designed by the authors for the integration of relational sources. We conducted an integration experiment involving a test set of seven differently structured sources covering the domain of genetic diseases. We used our logical schema acquisition method to generate schemas for all textual sources. The sources were integrated using the methods and tools provided by OntoFusion. The integration was validated using a test set of 500 queries. A panel of experts answered a questionnaire to evaluate i) the quality of the extracted schemas, ii) the query processing performance of the integrated set of sources, and iii) the relevance of the retrieved results. The results of the survey show that our method extracts coherent and representative logical schemas. Experts' feedback on the performance of the integrated system and the relevance of the retrieved results was also positive. Regarding the validation of the integration, the system successfully provided correct results for all queries in the test set. The results of the experiment suggest that text-based sources including a logical schema can be regarded as equivalent to structured databases. Using our method, previous research and existing tools designed for the integration of structured databases can be reused - possibly subject to minor modifications - to integrate differently structured sources.

  15. Privacy Risks from Genomic Data-Sharing Beacons.

    PubMed

    Shringarpure, Suyash S; Bustamante, Carlos D

    2015-11-05

    The human genetics community needs robust protocols that enable secure sharing of genomic data from participants in genetic research. Beacons are web servers that answer allele-presence queries--such as "Do you have a genome that has a specific nucleotide (e.g., A) at a specific genomic position (e.g., position 11,272 on chromosome 1)?"--with either "yes" or "no." Here, we show that individuals in a beacon are susceptible to re-identification even if the only data shared include presence or absence information about alleles in a beacon. Specifically, we propose a likelihood-ratio test of whether a given individual is present in a given genetic beacon. Our test is not dependent on allele frequencies and is the most powerful test for a specified false-positive rate. Through simulations, we showed that in a beacon with 1,000 individuals, re-identification is possible with just 5,000 queries. Relatives can also be identified in the beacon. Re-identification is possible even in the presence of sequencing errors and variant-calling differences. In a beacon constructed with 65 European individuals from the 1000 Genomes Project, we demonstrated that it is possible to detect membership in the beacon with just 250 SNPs. With just 1,000 SNP queries, we were able to detect the presence of an individual genome from the Personal Genome Project in an existing beacon. Our results show that beacons can disclose membership and implied phenotypic information about participants and do not protect privacy a priori. We discuss risk mitigation through policies and standards such as not allowing anonymous pings of genetic beacons and requiring minimum beacon sizes. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
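
    The membership test can be imitated with a toy log-likelihood ratio over the beacon's yes/no answers, comparing the hypotheses "target in the beacon" versus "target not in the beacon". The per-query probabilities below are illustrative, not the paper's calibrated values; the published test additionally models allele frequencies and sequencing error.

    ```python
    # Toy log-likelihood-ratio membership test against a beacon.
    # p_yes_in: chance of a "yes" when the target genome IS in the beacon;
    # p_yes_out: chance of a "yes" when it is NOT. Both values are illustrative.
    import math

    def llr(responses, p_yes_in=0.99, p_yes_out=0.6):
        """Positive scores favor 'the target is in the beacon'."""
        score = 0.0
        for yes in responses:
            p_in = p_yes_in if yes else 1.0 - p_yes_in
            p_out = p_yes_out if yes else 1.0 - p_yes_out
            score += math.log(p_in / p_out)
        return score

    # 5,000 queries at positions where the target carries the rare allele.
    responses = [True] * 4900 + [False] * 100
    print(llr(responses))   # strongly positive: membership is inferred
    ```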

  16. Efficient strategies to find diagnostic test accuracy studies in kidney journals.

    PubMed

    Rogerson, Thomas E; Ladhani, Maleeka; Mitchell, Ruth; Craig, Jonathan C; Webster, Angela C

    2015-08-01

    Nephrologists looking for quick answers to diagnostic clinical questions in MEDLINE can use a range of published search strategies or Clinical Query limits to improve the precision of their searches. We aimed to evaluate existing search strategies for finding diagnostic test accuracy studies in nephrology journals. We assessed the accuracy of 14 search strategies for retrieving diagnostic test accuracy studies from three nephrology journals indexed in MEDLINE. Two investigators hand searched the same journals to create a reference set of diagnostic test accuracy studies to compare search strategy results against. We identified 103 diagnostic test accuracy studies, accounting for 2.1% of all studies published. The most specific search strategy was the Narrow Clinical Queries limit (sensitivity: 0.20, 95% CI 0.13-0.29; specificity: 0.99, 95% CI 0.99-0.99). Using the Narrow Clinical Queries limit, a searcher would need to screen three (95% CI 2-6) articles to find one diagnostic study. The most sensitive search strategy was van der Weijden 1999 Extended (sensitivity: 0.95; 95% CI 0.89-0.98; specificity 0.55, 95% CI 0.53-0.56) but required a searcher to screen 24 (95% CI 23-26) articles to find one diagnostic study. Bachmann 2002 was the best balanced search strategy, which was sensitive (0.88, 95% CI 0.81-0.94), but also specific (0.74, 95% CI 0.73-0.75), with a number needed to screen of 15 (95% CI 14-17). Diagnostic studies are infrequently published in nephrology journals. The addition of a strategy for diagnostic studies to a subject search strategy in MEDLINE may reduce the records needed to screen while preserving adequate search sensitivity for routine clinical use. © 2015 Asian Pacific Society of Nephrology.
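
    The "number needed to screen" figures follow from sensitivity, specificity and the 2.1% prevalence of diagnostic studies: it is the reciprocal of the positive predictive value. The quick check below approximately reproduces the abstract's numbers; small differences arise because the paper computes them from actual retrieval counts rather than from rounded summary statistics.

    ```python
    # Number needed to screen = 1 / PPV, with PPV from sensitivity, specificity,
    # and prevalence via Bayes' rule. Inputs are taken from the abstract.
    def needed_to_screen(sens, spec, prev):
        ppv = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
        return 1 / ppv

    prev = 0.021  # diagnostic test accuracy studies were 2.1% of publications
    print(needed_to_screen(0.20, 0.99, prev))  # Narrow Clinical Queries: ~3
    print(needed_to_screen(0.95, 0.55, prev))  # van der Weijden Extended: ~24
    print(needed_to_screen(0.88, 0.74, prev))  # Bachmann 2002: ~15
    ```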

  17. Querying Provenance Information: Basic Notions and an Example from Paleoclimate Reconstruction

    NASA Astrophysics Data System (ADS)

    Stodden, V.; Ludaescher, B.; Bocinsky, K.; Kintigh, K.; Kohler, T.; McPhillips, T.; Rush, J.

    2016-12-01

    Computational models are used to reconstruct and explain past environments and to predict likely future environments. For example, Bocinsky and Kohler have performed a 2,000-year reconstruction of the rain-fed maize agricultural niche in the US Southwest. The resulting academic publications not only contain traditional method descriptions, figures, etc. but also links to code and data for basic transparency and reproducibility. Examples include ResearchCompendia.org and the new project "Merging Science and Cyberinfrastructure Pathways: The Whole Tale." Provenance information provides a further critical element for understanding a published study and possibly extending or challenging the findings of the original authors. We present different notions and uses of provenance information using a computational archaeology example: the common use of "provenance for others" (for transparency and reproducibility), but also the more elusive yet equally important use of "provenance for self". To this end, we distinguish prospective provenance (a.k.a. workflow) from retrospective provenance (a.k.a. data lineage) and show how combinations of both forms of provenance can be used to answer different kinds of important questions about a workflow and its execution. Since many workflows are developed in scripting or special-purpose languages such as Python and R, we employ an approach and toolkit called YesWorkflow that brings provenance modeling, capture, and querying into the realm of scripting. YesWorkflow employs the basic W3C PROV standard, as well as the ProvONE extension, for sharing and exchanging retrospective and prospective provenance information, respectively. Finally, we argue that the utility of provenance information should be maximized by developing different kinds of provenance questions and queries during the early phases of computational workflow design and implementation.
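
    YesWorkflow recovers prospective provenance from structured comments embedded in ordinary scripts. A minimal Python example of the annotation style might look like the following; the tag vocabulary (@BEGIN, @IN, @OUT, @END, @URI) is YesWorkflow's, while the script content and file names are hypothetical.

    ```python
    # @BEGIN reconstruct_maize_niche
    # @IN precipitation_grid @URI file:data/precip.nc
    # @IN temperature_grid   @URI file:data/temp.nc
    # @OUT niche_map         @URI file:results/maize_niche.tif

    # @BEGIN compute_growing_degree_days
    # @IN temperature_grid
    # @OUT gdd_grid
    def compute_growing_degree_days(temperature_grid):
        ...  # hypothetical computation
    # @END compute_growing_degree_days

    # @BEGIN classify_niche
    # @IN gdd_grid
    # @IN precipitation_grid
    # @OUT niche_map
    def classify_niche(gdd_grid, precipitation_grid):
        ...  # hypothetical classification
    # @END classify_niche

    # @END reconstruct_maize_niche
    ```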

  18. YEASTRACT: providing a programmatic access to curated transcriptional regulatory associations in Saccharomyces cerevisiae through a web services interface

    PubMed Central

    Abdulrehman, Dário; Monteiro, Pedro Tiago; Teixeira, Miguel Cacho; Mira, Nuno Pereira; Lourenço, Artur Bastos; dos Santos, Sandra Costa; Cabrito, Tânia Rodrigues; Francisco, Alexandre Paulo; Madeira, Sara Cordeiro; Aires, Ricardo Santos; Oliveira, Arlindo Limede; Sá-Correia, Isabel; Freitas, Ana Teresa

    2011-01-01

    The YEAst Search for Transcriptional Regulators And Consensus Tracking (YEASTRACT) information system (http://www.yeastract.com) was developed to support the analysis of transcription regulatory associations in Saccharomyces cerevisiae. Last updated in June 2010, this database contains over 48 200 regulatory associations between transcription factors (TFs) and target genes, including 298 specific DNA-binding sites for 110 characterized TFs. All regulatory associations stored in the database were revisited, and detailed information on the experimental evidence supporting those associations was added and classified as direct or indirect. The inclusion of this new data, gathered in response to the requests of YEASTRACT users, allows users to restrict their queries to subsets of the data based on whether or not there is experimental evidence for the direct action of the TFs in the promoter region of their target genes. Another new feature of this release is the availability of all data through a machine-readable web services interface. Users are no longer restricted to the set of predefined queries offered through the existing web interface, and can use the web services interface to query, retrieve and exploit the YEASTRACT data using their own implementations of additional functionalities. The YEASTRACT information system is further complemented by several computational tools that facilitate the use of the curated data when answering a number of important biological questions. Since its first release in 2006, YEASTRACT has been extensively used by hundreds of researchers from all over the world. We expect that by making the new data and services available, the system will continue to be instrumental for yeast biologists and systems biology researchers. PMID:20972212

  19. GO2PUB: Querying PubMed with semantic expansion of gene ontology terms

    PubMed Central

    2012-01-01

    Background With the development of high-throughput methods of gene analysis, there is a growing need for mining tools to retrieve relevant articles in PubMed. As PubMed grows, literature searches become more complex and time-consuming. Automated search tools with good precision and recall are necessary. We developed GO2PUB to automatically enrich PubMed queries with gene names, symbols and synonyms annotated by a GO term of interest or one of its descendants. Results GO2PUB enriches PubMed queries based on selected GO terms and keywords. It processes the results and displays the PMID, title, authors, abstract and bibliographic references of the articles. Gene names, symbols and synonyms that have been generated as extra keywords from the GO terms are also highlighted. GO2PUB is based on a semantic expansion of PubMed queries using the semantic inheritance between terms through the GO graph. Two experts manually assessed the relevance of GO2PUB, GoPubMed and PubMed on three queries about lipid metabolism. Experts’ agreement was high (kappa = 0.88). GO2PUB returned 69% of the relevant articles, GoPubMed 40% and PubMed 29%. GO2PUB and GoPubMed have 17% of their results in common, corresponding to 24% of the total number of relevant results. 70% of the articles returned by more than one tool were relevant. 36% of the relevant articles were returned only by GO2PUB, 17% only by GoPubMed and 14% only by PubMed. To determine whether these results can be generalized, we generated twenty queries based on random GO terms with a granularity similar to that of the first three queries and compared the proportions of GO2PUB and GoPubMed results. These were 77% and 40%, respectively, for the first queries, and 70% and 38% for the random queries. The two experts also assessed the relevance of seven of the twenty queries (the three related to lipid metabolism and four related to other domains). Expert agreement was high (0.93 and 0.8). GO2PUB and GoPubMed performances were similar to those of the first queries. Conclusions We demonstrated that the use of genes annotated by either GO terms of interest or a descendant of these GO terms yields some relevant articles ignored by other tools. The comparison of GO2PUB, based on semantic expansion, with GoPubMed, based on text-mining techniques, showed that the two tools are complementary. The analysis of the randomly-generated queries suggests that the results obtained for lipid metabolism can be generalized to other biological processes. GO2PUB is available at http://go2pub.genouest.org. PMID:22958570
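
    The core mechanism (expand a GO term to its descendants, collect the annotated gene names, and OR them into the PubMed query) can be sketched with Biopython's Entrez module; the GO-graph traversal is stubbed out here and the gene symbols are hypothetical, so this illustrates only the query-enrichment step.

    ```python
    # Sketch: enrich a PubMed query with gene symbols annotated by a GO term
    # or its descendants. GO traversal is stubbed; gene list is hypothetical.
    from Bio import Entrez

    Entrez.email = "you@example.org"  # NCBI requires a contact address

    def genes_for_go_term(go_id):
        # Stub: in GO2PUB this walks the GO graph plus an annotation database.
        return ["FASN", "ACACA", "SCD"]

    keywords = "lipogenesis"
    gene_clause = " OR ".join(f"{g}[Title/Abstract]" for g in genes_for_go_term("GO:0008610"))
    query = f"({keywords}) AND ({gene_clause})"

    handle = Entrez.esearch(db="pubmed", term=query, retmax=20)
    record = Entrez.read(handle)
    print(record["Count"], record["IdList"])
    ```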

  20. Performance Prediction of a MongoDB-Based Traceability System in Smart Factory Supply Chains

    PubMed Central

    Kang, Yong-Shin; Park, Il-Ha; Youm, Sekyoung

    2016-01-01

    In the future, with the advent of the smart factory era, manufacturing and logistics processes will become more complex, and the complexity and criticality of traceability will further increase. This research aims at developing a performance assessment method to verify scalability when implementing traceability systems based on key technologies for smart factories, such as the Internet of Things (IoT) and Big Data. To this end, based on existing research, we analyzed traceability requirements and an event schema for storing traceability data in MongoDB, a document-based Not Only SQL (NoSQL) database. Next, we analyzed the algorithm of the most representative traceability query and defined a query-level performance model, which is composed of response times for the components of the traceability query algorithm. This performance model was then solidified as a linear regression model, because a benchmark test showed that the response times increase linearly. Finally, for a case analysis, we applied the performance model to a virtual automobile parts logistics scenario. As a result of the case study, we verified the scalability of a MongoDB-based traceability system and predicted the point at which data node servers should be expanded in this case. The traceability system performance assessment method proposed in this research can be used as a decision-making tool for hardware capacity planning during the initial stage of construction of traceability systems and during their operational phase. PMID:27983654
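
    The query-level performance model is a linear regression of response time against load, which also yields the predicted expansion point; the sketch below fits such a model to made-up benchmark points, so the numbers are illustrative only.

    ```python
    # Fit a linear response-time model from benchmark points and find the data
    # volume at which a latency budget is exceeded. Numbers are illustrative.
    import numpy as np

    records = np.array([1e6, 2e6, 4e6, 8e6])   # stored traceability events
    latency = np.array([0.8, 1.5, 3.1, 6.2])   # traceability-query seconds

    slope, intercept = np.polyfit(records, latency, 1)  # latency ~ a*records + b

    budget = 4.0                                # seconds allowed per query
    breakpoint_records = (budget - intercept) / slope
    print(f"expand the cluster near {breakpoint_records:,.0f} records")
    ```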

  1. Performance Prediction of a MongoDB-Based Traceability System in Smart Factory Supply Chains.

    PubMed

    Kang, Yong-Shin; Park, Il-Ha; Youm, Sekyoung

    2016-12-14

    In the future, with the advent of the smart factory era, manufacturing and logistics processes will become more complex, and the complexity and criticality of traceability will further increase. This research aims at developing a performance assessment method to verify scalability when implementing traceability systems based on key technologies for smart factories, such as the Internet of Things (IoT) and Big Data. To this end, based on existing research, we analyzed traceability requirements and an event schema for storing traceability data in MongoDB, a document-based Not Only SQL (NoSQL) database. Next, we analyzed the algorithm of the most representative traceability query and defined a query-level performance model, which is composed of response times for the components of the traceability query algorithm. This performance model was then solidified as a linear regression model, because a benchmark test showed that the response times increase linearly. Finally, for a case analysis, we applied the performance model to a virtual automobile parts logistics scenario. As a result of the case study, we verified the scalability of a MongoDB-based traceability system and predicted the point at which data node servers should be expanded in this case. The traceability system performance assessment method proposed in this research can be used as a decision-making tool for hardware capacity planning during the initial stage of construction of traceability systems and during their operational phase.

  2. Query optimization for graph analytics on linked data using SPARQL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hong, Seokyong; Lee, Sangkeun; Lim, Seung -Hwan

    2015-07-01

    Triplestores that support query languages such as SPARQL are emerging as the preferred and scalable solution for representing data and metadata as massive heterogeneous graphs using Semantic Web standards. With increasing adoption, the desire to conduct graph-theoretic mining and exploratory analysis has also increased. Addressing that desire, this paper presents a solution that is the marriage of Graph Theory and the Semantic Web. We present software that can analyze Linked Data using graph operations such as counting triangles, finding eccentricity, testing connectedness, and computing PageRank directly on triplestores via the SPARQL interface. We describe the process of optimizing the performance of the SPARQL-based implementation of such popular graph algorithms by reducing the space overhead, simplifying iterative complexity, and removing redundant computations through an understanding of query plans. Our optimized approach shows significant performance gains on triplestores hosted on stand-alone workstations as well as on hardware-optimized scalable supercomputers such as the Cray XMT.
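
    Triangle counting, one of the graph operations named, reduces to a single SPARQL basic graph pattern; the sketch below runs it through SPARQLWrapper against a hypothetical endpoint and a hypothetical :connectedTo predicate, with an ordering filter so each triangle is counted once.

    ```python
    # Sketch: count triangles in a linked-data graph directly via SPARQL.
    # Endpoint and the :connectedTo predicate are hypothetical stand-ins.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://example.org/sparql")
    sparql.setQuery("""
    PREFIX : <http://example.org/net#>
    SELECT (COUNT(*) AS ?triangles) WHERE {
      ?a :connectedTo ?b .
      ?b :connectedTo ?c .
      ?c :connectedTo ?a .
      FILTER (STR(?a) < STR(?b) && STR(?b) < STR(?c))  # count each triangle once
    }
    """)
    sparql.setReturnFormat(JSON)
    result = sparql.query().convert()
    print(result["results"]["bindings"][0]["triangles"]["value"])
    ```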

  3. Toward automated consumer question answering: automatically separating consumer questions from professional questions in the healthcare domain.

    PubMed

    Liu, Feifan; Antieau, Lamont D; Yu, Hong

    2011-12-01

    Both healthcare professionals and healthcare consumers have information needs that can be met through the use of computers, specifically via medical question answering systems. However, the information needs of both groups are different in terms of literacy levels and technical expertise, and an effective question answering system must be able to account for these differences if it is to formulate the most relevant responses for users from each group. In this paper, we propose that a first step toward answering the queries of different users is automatically classifying questions according to whether they were asked by healthcare professionals or consumers. We obtained two sets of consumer questions (~10,000 questions in total) from Yahoo answers. The professional questions consist of two question collections: 4654 point-of-care questions (denoted as PointCare) obtained from interviews of a group of family doctors following patient visits and 5378 questions from physician practices through professional online services (denoted as OnlinePractice). With more than 20,000 questions combined, we developed supervised machine-learning models for automatic classification between consumer questions and professional questions. To evaluate the robustness of our models, we tested the model that was trained on the Consumer-PointCare dataset on the Consumer-OnlinePractice dataset. We evaluated both linguistic features and statistical features and examined how the characteristics in two different types of professional questions (PointCare vs. OnlinePractice) may affect the classification performance. We explored information gain for feature reduction and the back-off linguistic category features. The 10-fold cross-validation results showed the best F1-measure of 0.936 and 0.946 on Consumer-PointCare and Consumer-OnlinePractice respectively, and the best F1-measure of 0.891 when testing the Consumer-PointCare model on the Consumer-OnlinePractice dataset. Healthcare consumer questions posted at Yahoo online communities can be reliably classified from professional questions posted by point-of-care clinicians and online physicians. The supervised machine-learning models are robust for this task. Our study will significantly benefit further development in automated consumer question answering. Copyright © 2011 Elsevier Inc. All rights reserved.
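
    A simple baseline for this consumer-versus-professional split is a bag-of-words linear classifier; the sketch below uses scikit-learn with a few hypothetical questions in place of the Yahoo! Answers and point-of-care corpora.

    ```python
    # Baseline consumer-vs-professional question classifier (illustrative data).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    questions = [
        "why does my arm hurt after a flu shot",                      # consumer
        "is it bad to take ibuprofen every day",                      # consumer
        "optimal anticoagulation bridging for afib patient pre-op",   # professional
        "first-line empiric antibiotic for community-acquired pneumonia",
    ]
    labels = ["consumer", "consumer", "professional", "professional"]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(questions, labels)
    print(model.predict(["should I worry about a mild fever"]))
    ```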

  4. Scale-Independent Relational Query Processing

    DTIC Science & Technology

    2013-10-04

    Open-source options are also available, including PostgreSQL, MySQL, and SQLite. These modern relational databases are generally very complex software systems.

  5. OpenSearch technology for geospatial resources discovery

    NASA Astrophysics Data System (ADS)

    Papeschi, Fabrizio; Boldrini, Enrico; Mazzetti, Paolo

    2010-05-01

    In 2005, the term Web 2.0 was coined by Tim O'Reilly to describe a quickly growing set of Web-based applications that share a common philosophy of "mutually maximizing collective intelligence and added value for each participant by formalized and dynamic information sharing". Around this same period, OpenSearch, a new Web 2.0 technology, was developed. More properly, OpenSearch is a collection of technologies that allow publishing of search results in a format suitable for syndication and aggregation. It is a way for websites and search engines to publish search results in a standard and accessible format. Due to its strong impact on the way the Web is perceived by users, and also due to its relevance for businesses, Web 2.0 has attracted the attention of both the mass media and the scientific community. This explosive growth in the popularity of Web 2.0 technologies like OpenSearch, together with practical applications of Service Oriented Architecture (SOA), has resulted in an increased interest in the similarities, convergence, and potential synergy of these two concepts. SOA can be considered the philosophy of encapsulating application logic in services with a uniformly defined interface and making these publicly available via discovery mechanisms. Service consumers may then retrieve these services, and compose and use them according to their current needs. The great degree of similarity between SOA and Web 2.0 may be leading to a convergence of the two paradigms. They also expose divergent elements, such as Web 2.0's support for human interaction as opposed to SOA's typical machine-to-machine interaction. Following these considerations, the Geospatial Information (GI) domain is also taking its first steps towards a new approach to data publishing and discovery, in particular taking advantage of the OpenSearch technology. A specific GI niche is represented by the OGC Catalog Service for the Web (CSW), part of the OGC Web Services (OWS) specification suite, which provides a set of services for discovery, access, and processing of geospatial resources in a SOA framework. GI-cat is a distributed CSW framework implementation developed by the ESSI Lab of the Italian National Research Council (CNR-IMAA) and the University of Florence. It provides brokering and mediation functionalities towards heterogeneous resources and inventories, exposing several standard interfaces for query distribution. This work focuses on a new GI-cat interface which allows the catalog to be queried according to the OpenSearch syntax specification, thus filling the gap between the SOA architectural design of the CSW and Web 2.0. At the moment, there is no OGC standard specification on this topic, but an official change request has been proposed in order to enable OGC catalogues to support OpenSearch queries. This change request proposes an OpenSearch extension providing a standard mechanism to query a resource based on temporal and geographic extents. Two new catalog operations are also proposed, in order to publish a suitable OpenSearch interface. This extended interface is implemented by the modular GI-cat architecture by adding a new profiling module called the "OpenSearch profiler". Since GI-cat also acts as a clearinghouse catalog, another component called the "OpenSearch accessor" is added in order to access OpenSearch-compliant services. An important role in the GI-cat extension is played by the adopted mapping strategy. Two different kinds of mapping are required: query mapping and response-element mapping.
    Query mapping is provided in order to fit the simple OpenSearch query syntax to the complex CSW query expressed in the OGC Filter syntax. The GI-cat internal data model is based on the ISO 19115 profile, which is more complex than the simple XML syndication formats, such as RSS 2.0 and Atom 1.0, suggested by OpenSearch. Once response elements are available, they need to be translated from the GI-cat internal data model to the above-mentioned syndication formats in order to be presented; the mapping process is bidirectional. When GI-cat is used to access OpenSearch-compliant services, the CSW query must be mapped to the OpenSearch query, and the response elements must be translated according to the GI-cat internal data model. As a result of these extensions, GI-cat provides a user-friendly facade to the complex CSW interface, enabling it to be queried, for example, using a browser toolbar.
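
    An OpenSearch description document advertises a URL template whose placeholders the client substitutes, so a geospatial query amounts to parameter substitution; in the sketch below the endpoint is hypothetical, while the placeholder names follow the OpenSearch core ({searchTerms}) and the Geo and Time extensions ({geo:box}, {time:start}, {time:end}).

    ```python
    # Fill an OpenSearch URL template with search terms plus geo/time values.
    # Endpoint is hypothetical; placeholders follow the core and Geo/Time extensions.
    from urllib.parse import quote

    template = ("https://example.org/gi-cat/opensearch?"
                "q={searchTerms}&bbox={geo:box}&start={time:start}&end={time:end}")

    def fill(template, params):
        """Substitute template placeholders with URL-encoded values."""
        url = template
        for key, value in params.items():
            url = url.replace("{" + key + "}", quote(str(value), safe=","))
        return url

    print(fill(template, {
        "searchTerms": "sea surface temperature",
        "geo:box": "-10,35,15,60",     # west,south,east,north (Geo extension)
        "time:start": "2009-01-01",    # Time extension
        "time:end": "2009-12-31",
    }))
    ```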

  6. Time-related patient data retrieval for the case studies from the pharmacogenomics research network

    PubMed Central

    Zhu, Qian; Tao, Cui; Ding, Ying; Chute, Christopher G.

    2012-01-01

    There are many question-based data elements in pharmacogenomics research network (PGRN) studies, and many of them contain temporal information. Representing these elements semantically so that they are machine-processable is a challenging problem for the following reasons: (1) the designers of these studies usually have no knowledge of computer modeling and query languages, so the original data elements are usually recorded in spreadsheets in natural language; and (2) the time aspects in these data elements can be too complex to be represented faithfully in a machine-understandable way. In this paper, we introduce our efforts on representing these data elements using Semantic Web technologies. We have developed an ontology, CNTRO, for representing clinical events and their temporal relations in the Web Ontology Language (OWL). Here we use CNTRO to represent the time aspects of the data elements. We have evaluated 720 time-related data elements from PGRN studies. We adapted and extended the knowledge representation requirements of EliXR-TIME to categorize our data elements. A CNTRO-based SPARQL query builder has been developed so that users can customize their own SPARQL queries for each knowledge representation requirement. The SPARQL query builder has been evaluated with a simulated EHR triple store to ensure its functionality. PMID:23076712

  7. Time-related patient data retrieval for the case studies from the pharmacogenomics research network.

    PubMed

    Zhu, Qian; Tao, Cui; Ding, Ying; Chute, Christopher G

    2012-11-01

    There are many question-based data elements in pharmacogenomics research network (PGRN) studies, and many of them contain temporal information. Representing these elements semantically so that they are machine-processable is a challenging problem for the following reasons: (1) the designers of these studies usually have no knowledge of computer modeling and query languages, so the original data elements are usually recorded in spreadsheets in natural language; and (2) the time aspects in these data elements can be too complex to be represented faithfully in a machine-understandable way. In this paper, we introduce our efforts on representing these data elements using Semantic Web technologies. We have developed an ontology, CNTRO, for representing clinical events and their temporal relations in the Web Ontology Language (OWL). Here we use CNTRO to represent the time aspects of the data elements. We have evaluated 720 time-related data elements from PGRN studies. We adapted and extended the knowledge representation requirements of EliXR-TIME to categorize our data elements. A CNTRO-based SPARQL query builder has been developed so that users can customize their own SPARQL queries for each knowledge representation requirement. The SPARQL query builder has been evaluated with a simulated EHR triple store to ensure its functionality.

  8. The NASA Navigator Program Ground Based Archives at the Michelson Science Center: Supporting the Search for Habitable Planets

    NASA Astrophysics Data System (ADS)

    Berriman, G. B.; Ciardi, D. R.; Good, J. C.; Laity, A. C.; Zhang, A.

    2006-07-01

    At ADASS XIV, we described how the W. M. Keck Observatory Archive (KOA) re-uses and extends the component-based architecture of the NASA/IPAC Infrared Science Archive (IRSA) to ingest and serve level 0 observations made with HIRES, the High Resolution Echelle Spectrometer. Since August 18, the KOA has ingested 325 GB of data from 135 nights of observations. The architecture exploits a service layer between the mass storage layer and the user interface. This service layer consists of standalone utilities, called through a simple executive, that perform generic query and retrieval functions such as query generation, database table sub-setting, and return page generation. It has been extended to implement proprietary access to data through deployment of query management middleware developed for the National Virtual Observatory. The MSC archives have recently extended this design to query and retrieve complex data sets describing the properties of potential target stars for the Terrestrial Planet Finder (TPF) missions. The archives can now support knowledge-based retrieval as well as data retrieval. This paper describes how extensions to the IRSA architecture, which is applicable across all wavelengths and astronomical data types, support the design and development of the MSC NP archives at modest cost.

  9. Designing integrated computational biology pipelines visually.

    PubMed

    Jamil, Hasan M

    2013-01-01

    The long-term cost of developing and maintaining a computational pipeline that depends upon data integration and sophisticated workflow logic is too high to even contemplate "what if" or ad hoc queries. In this paper, we introduce a novel application-building interface for computational biology research, called VizBuilder, by leveraging a recent query language called BioFlow for life sciences databases. Using VizBuilder, it is now possible to develop ad hoc, complex computational biology applications at throwaway cost. The underlying query language supports data integration and workflow construction almost transparently and fully automatically, using a best-effort approach. Users express their application by drawing it with VizBuilder icons and connecting them in a meaningful way. Completed applications are compiled and translated as BioFlow queries for execution by the data management system LifeDB, for which VizBuilder serves as a front end. We discuss VizBuilder features and functionalities in the context of a real-life application after we briefly introduce BioFlow. The architecture and design principles of VizBuilder are also discussed. Finally, we outline future extensions of VizBuilder. To our knowledge, VizBuilder is a unique system that allows visually designing computational biology pipelines involving distributed and heterogeneous resources in an ad hoc manner.

  10. Monte-Carlo methods make Dempster-Shafer formalism feasible

    NASA Technical Reports Server (NTRS)

    Kreinovich, Vladik YA.; Bernat, Andrew; Borrett, Walter; Mariscal, Yvonne; Villa, Elsa

    1991-01-01

    One of the main obstacles to applications of the Dempster-Shafer formalism is its computational complexity. If we combine m different pieces of knowledge, then in the general case we have to perform up to 2^m computational steps, which is infeasible for large m. For several important special cases, algorithms with smaller running times have been proposed. We prove, however, that if we want to compute the belief bel(Q) in any given query Q, then exponential time is inevitable. It remains inevitable if we want to compute bel(Q) to a given precision epsilon. This restriction corresponds to the natural idea that, since the initial masses are known only approximately, there is no sense in trying to compute bel(Q) precisely. A further idea is that there is always some doubt in the whole knowledge, so there is always a probability p_0 that the expert's knowledge is wrong. In view of that, it is sufficient to have an algorithm that gives a correct answer with probability greater than 1 - p_0. If we use the original Dempster combination rule, this possibility diminishes the running time but still leaves the problem infeasible in the general case. We show that for the alternative combination rules proposed by Smets and Yager, feasible methods exist. We also show how these methods can be parallelized, and which parallelization model fits this problem best.
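
    The Monte-Carlo idea is straightforward to sketch: draw one focal set from each evidence source according to its mass function, intersect the draws (rejecting empty intersections, which corresponds to Dempster normalization), and count how often the intersection entails the query. This is a generic illustration of the approach with made-up mass functions, not the paper's exact algorithm.

    ```python
    # Monte-Carlo approximation of bel(Q) under Dempster's combination rule.
    # Each source is a mass function: {focal set (frozenset): mass}. Data is
    # illustrative.
    import random

    def mc_belief(sources, query, trials=100_000):
        hits = kept = 0
        for _ in range(trials):
            # Sample one focal set per source, weighted by its mass.
            sets = [random.choices(list(m.keys()), weights=list(m.values()))[0]
                    for m in sources]
            inter = frozenset.intersection(*sets)
            if not inter:
                continue            # rejection step = Dempster normalization
            kept += 1
            if inter <= query:      # intersection supports Q
                hits += 1
        return hits / kept if kept else 0.0

    theta = frozenset({"a", "b", "c"})
    src1 = {frozenset({"a"}): 0.6, theta: 0.4}
    src2 = {frozenset({"a", "b"}): 0.7, theta: 0.3}
    print(mc_belief([src1, src2], query=frozenset({"a", "b"})))  # about 0.88
    ```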

  11. An artificially intelligent chat agent that answers adolescents' questions related to sex, drugs, and alcohol: an exploratory study.

    PubMed

    Crutzen, Rik; Peters, Gjalt-Jorn Y; Portugal, Sarah Dias; Fisser, Erwin M; Grolleman, Jorne J

    2011-05-01

    The aim of this study was to investigate if and how an artificially intelligent chat agent (chatbot) that answers questions about sex, drugs, and alcohol is used and evaluated by adolescents, especially in comparison with information lines and search engines. A sample of 929 adolescents (64% girls, mean age = 15), varying in urbanization level and educational level, participated in this study. Use of the chatbot was objectively tracked through server registrations (e.g., frequency and duration of conversations with the chatbot, the number and topics of queries), and a web-based questionnaire was used to evaluate the chatbot (e.g., the perception of anonymity, conciseness, ease of use, fun, quality and quantity of information, and speed) and to compare it with information lines and search engines. The chatbot reached high school attendees in general and not only adolescents with previous experience related to sex, drugs, or alcohol; this is promising from an informed decision-making point of view. Frequency (M = 11) and duration of conversations (3:57 minutes) were high, and the chatbot was evaluated positively, especially in comparison with information lines and search engines. The use of chatbots within the field of health promotion has large potential to reach a varied group of adolescents and to provide them with answers to their questions related to sex, drugs, and alcohol. Copyright © 2011 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.

  12. Network-Physics (NP) BEC DIGITAL(#)-VULNERABILITY; ``Q-Computing"=Simple-Arithmetic;Modular-Congruences=SignalXNoise PRODUCTS=Clock-model;BEC-Factorization;RANDOM-# Definition;P=/=NP TRIVIAL Proof!!!

    NASA Astrophysics Data System (ADS)

    Pi, E. I.; Siegel, E.

    2010-03-01

    Siegel[AMS Natl.Mtg.(2002)-Abs.973-60-124] digits logarithmic- law inversion to ONLY BEQS BEC:Quanta/Bosons=#: EMP-like SEVERE VULNERABILITY of ONLY #-networks(VS.ANALOG INvulnerability) via Barabasi NP(VS.dynamics[Not.AMS(5/2009)] critique);(so called)``quantum-computing''(QC) = simple-arithmetic (sansdivision);algorithmiccomplexities:INtractibility/UNdecidabi lity/INefficiency/NONcomputability/HARDNESS(so MIScalled) ``noise''-induced-phase-transition(NIT)ACCELERATION:Cook-Levin theorem Reducibility = RG fixed-points; #-Randomness DEFINITION via WHAT? Query(VS. Goldreich[Not.AMS(2002)] How? mea culpa)= ONLY MBCS hot-plasma v #-clumping NON-random BEC; Modular-Arithmetic Congruences = Signal x Noise PRODUCTS = clock-model; NON-Shor[Physica A,341,586(04)]BEC logarithmic-law inversion factorization: Watkins #-theory U statistical- physics); P=/=NP C-S TRIVIAL Proof: Euclid!!! [(So Miscalled) computational-complexity J-O obviation(3 millennia AGO geometry: NO:CC,``CS'';``Feet of Clay!!!'']; Query WHAT?:Definition: (so MIScalled)``complexity''=UTTER-SIMPLICITY!! v COMPLICATEDNESS MEASURE(S).

  13. Interaction and Communication of Agents in Networks and Language Complexity Estimates

    NASA Technical Reports Server (NTRS)

    Smid, Jan; Obitko, Marek; Fisher, David; Truszkowski, Walt

    2004-01-01

    Knowledge acquisition and sharing are arguably the most critical activities of communicating agents. We report on our ongoing project featuring knowledge acquisition and sharing among communicating agents embedded in a network. The applications we target range from hardware robots to virtual entities such as internet agents. Agent experiments can be simulated using a convenient simulation language. We analyzed the complexity of communicating-agent simulations using Java and Easel. The scenarios we have studied are listed below. The communication among agents can range from declarative queries to sub-natural-language queries. 1) A set of agents monitoring an object are asked to build activity profiles based on exchanging elementary observations; 2) A set of car drivers form a line, where every car follows its predecessor. An unsafe distance can create a strong wave in the line. Individual agents are asked to incorporate and apply directions on how to avoid the wave; 3) A set of micro-vehicles form a grid and are asked to propagate information and concepts to a central server.

  14. Easily configured real-time CPOE Pick Off Tool supporting focused clinical research and quality improvement.

    PubMed

    Rosenbaum, Benjamin P; Silkin, Nikolay; Miller, Randolph A

    2014-01-01

    Real-time alerting systems typically warn providers about abnormal laboratory results or medication interactions. For more complex tasks, institutions create site-wide 'data warehouses' to support quality audits and longitudinal research. Sophisticated systems like i2b2 or Stanford's STRIDE utilize data warehouses to identify cohorts for research and quality monitoring. However, substantial resources are required to install and maintain such systems. For more modest goals, an organization desiring merely to identify patients with 'isolation' orders, or to determine patients' eligibility for clinical trials, may adopt a simpler, limited approach based on processing the output of one clinical system, and not a data warehouse. We describe a limited, order-entry-based, real-time 'pick off' tool, utilizing public domain software (PHP, MySQL). Through a web interface the tool assists users in constructing complex order-related queries and auto-generates corresponding database queries that can be executed at recurring intervals. We describe successful application of the tool for research and quality monitoring.
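
    The underlying pattern (a web form that compiles user-selected order criteria into one parameterized SQL query executed on a schedule) can be sketched briefly; the table and column names are hypothetical, and Python with SQLite stands in for the tool's PHP/MySQL.

    ```python
    # Sketch: compile selected order criteria into one parameterized SQL query.
    # Table/column names are hypothetical stand-ins for a CPOE order table.
    import sqlite3

    def build_order_query(order_names, since):
        placeholders = ", ".join("?" for _ in order_names)
        sql = (f"SELECT patient_id, order_name, entered_at FROM orders "
               f"WHERE order_name IN ({placeholders}) AND entered_at >= ?")
        return sql, [*order_names, since]

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (patient_id, order_name, entered_at)")
    conn.execute("INSERT INTO orders VALUES (1, 'CONTACT ISOLATION', '2014-01-02')")

    sql, params = build_order_query(["CONTACT ISOLATION", "AIRBORNE ISOLATION"],
                                    "2014-01-01")
    print(conn.execute(sql, params).fetchall())   # rows matching the saved criteria
    ```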

  15. Access and use of the GUDMAP database of genitourinary development.

    PubMed

    Davies, Jamie A; Little, Melissa H; Aronow, Bruce; Armstrong, Jane; Brennan, Jane; Lloyd-MacGilp, Sue; Armit, Chris; Harding, Simon; Piu, Xinjun; Roochun, Yogmatee; Haggarty, Bernard; Houghton, Derek; Davidson, Duncan; Baldock, Richard

    2012-01-01

    The Genitourinary Development Molecular Atlas Project (GUDMAP) aims to document gene expression across time and space in the developing urogenital system of the mouse, and to provide access to a variety of relevant practical and educational resources. Data come from microarray gene expression profiling (from laser-dissected and FACS-sorted samples) and in situ hybridization at both low (whole-mount) and high (section) resolutions. Data are annotated to a published, high-resolution anatomical ontology and can be accessed using a variety of search interfaces. Here, we explain how to run typical queries on the database, by gene or anatomical location, how to view data, how to perform complex queries, and how to submit data.

  16. Secure searching of biomarkers through hybrid homomorphic encryption scheme.

    PubMed

    Kim, Miran; Song, Yongsoo; Cheon, Jung Hee

    2017-07-26

    As genome sequencing technology develops rapidly, there has lately been an increasing need to keep genomic data secure even when stored in the cloud and still used for research. We are interested in designing a protocol for the secure outsourcing of the matching problem on encrypted data. We propose an efficient method to securely search for a matching position with the query data and extract some information at that position. After decryption, only a small number of comparisons with the query information need to be performed in the plaintext state. We apply this method to find a set of biomarkers in encrypted genomes. The important feature of our method is that it encodes a genomic database as a single element of a polynomial ring (see the toy illustration below). Since our method requires a single homomorphic multiplication of the hybrid scheme for query computation, it has an advantage over previous methods in parameter size, computation complexity, and communication cost. In particular, the extraction procedure not only prevents leakage of database information that has not been queried by the user but also reduces the communication cost by half. We evaluate the performance of our method and verify that computation on large-scale personal data can be securely and practically outsourced to a cloud environment during data analysis. It takes about 3.9 s to search-and-extract the reference and alternate sequences at the queried position in a database of size 4M. Our solution for finding a set of biomarkers in DNA sequences shows that the progress of cryptographic techniques can support real-world genome data analysis in a cloud environment.
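
    The encoding idea, packing the whole database into one ring element so that a position can be read off with a single multiplication, can be illustrated without any cryptography. The toy below works in the plaintext ring Z[X]/(X^n - 1) (real schemes use an encrypted ring such as Z_q[X]/(X^n + 1)); the database values and ring dimension are made up, and no security properties are claimed.

    ```python
    import numpy as np

    n = 8                                     # ring dimension (toy size)
    db = np.array([2, 0, 3, 1, 0, 2, 1, 3])   # encoded values at 8 genome positions

    def query(i):
        """Read position i via one multiplication by X^(n-i) mod X^n - 1."""
        monomial = np.zeros(n, dtype=int)
        monomial[(n - i) % n] = 1
        prod = np.zeros(n, dtype=int)
        for a in range(n):                    # naive cyclic polynomial product
            for b in range(n):
                prod[(a + b) % n] += db[a] * monomial[b]
        return prod[0]                        # coefficient i lands in slot 0

    assert query(3) == db[3]
    print(query(3))                           # -> 1
    ```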

  17. Semi-automatic semantic annotation of PubMed Queries: a study on quality, efficiency, satisfaction

    PubMed Central

    Névéol, Aurélie; Islamaj-Doğan, Rezarta; Lu, Zhiyong

    2010-01-01

    Information processing algorithms require significant amounts of annotated data for training and testing. The availability of such data is often hindered by the complexity and high cost of production. In this paper, we investigate the benefits of a state-of-the-art tool to help with the semantic annotation of a large set of biomedical information queries. Seven annotators were recruited to annotate a set of 10,000 PubMed® queries with 16 biomedical and bibliographic categories. About half of the queries were annotated from scratch, while the other half were automatically pre-annotated and manually corrected. The impact of the automatic pre-annotations was assessed on several aspects of the task: time, number of actions, annotator satisfaction, inter-annotator agreement, and the quality and number of the resulting annotations. The analysis of annotation results showed that 28.9% fewer hand annotations were required when pre-annotated results from automatic tools were used. As a result, the overall annotation time was substantially lower when pre-annotations were used, while inter-annotator agreement was significantly higher. In addition, there was no statistically significant difference in the semantic distribution or number of annotations produced when pre-annotations were used. The annotated query corpus is freely available to the research community. This study shows that automatic pre-annotations are found helpful by most annotators. Our experience suggests using an automatic tool to assist large-scale manual annotation projects. This helps speed up annotation time and improve annotation consistency while maintaining high quality of the final annotations. PMID:21094696

  18. EURISWEB – Web-based epidemiological surveillance of antibiotic-resistant pneumococci in Day Care Centers

    PubMed Central

    Silva, Sara; Gouveia-Oliveira, Rodrigo; Maretzek, António; Carriço, João; Gudnason, Thorolfur; Kristinsson, Karl G; Ekdahl, Karl; Brito-Avô, António; Tomasz, Alexander; Sanches, Ilda Santos; Lencastre, Hermínia de; Almeida, Jonas

    2003-01-01

    Background EURIS (European Resistance Intervention Study) was launched as a multinational study in September of 2000 to identify the multitude of complex risk factors that contribute to the high carriage rate of drug-resistant Streptococcus pneumoniae strains in children attending Day Care Centers in several European countries. Access to the very large amount of data required the development of a web-based infrastructure – EURISWEB – that includes a relational online database, coupled with a query system for data retrieval, and allows integrative storage of demographic, clinical and molecular biology data generated in EURIS. Methods All components of the system were developed using open source programming tools: data storage management was supported by PostgreSQL, and the hypertext preprocessor used to generate the web pages was implemented using PHP. The query system is based on a software agent running in the background, developed specifically for EURIS. Results The website currently contains data related to 13,500 nasopharyngeal samples and over one million measures taken from 5,250 individual children, as well as approximately one thousand pre-made and user-made queries aggregated into several reports. It is presently in use by participating researchers from three countries (Iceland, Portugal and Sweden). Conclusion An operational model centered on a PHP engine builds the interface between the user and the database automatically, allowing easy maintenance of the system. The query system is also sufficiently adaptable to allow the integration of several advanced data analysis procedures far more demanding than simple queries, eventually including artificial intelligence predictive models. PMID:12846930

  19. Toward An Unstructured Mesh Database

    NASA Astrophysics Data System (ADS)

    Rezaei Mahdiraji, Alireza; Baumann, Peter Peter

    2014-05-01

    Unstructured meshes are used in several application domains such as earth sciences (e.g., seismology), medicine, oceanography, climate modeling, and GIS as approximate representations of physical objects. Meshes subdivide a domain into smaller geometric elements (called cells) which are glued together by incidence relationships. The subdivision of a domain allows computational manipulation of complicated physical structures. For instance, seismologists model earthquakes using elastic wave propagation solvers on hexahedral meshes. Such a mesh contains several hundred million grid points and millions of hexahedral cells, and each vertex node stores a multitude of data fields. To run simulations on such meshes, one needs to iterate over all the cells, iterate over the cells incident to a given cell, retrieve coordinates of cells, assign data values to cells, etc. Although meshes are used in many application domains, to the best of our knowledge there is no database vendor that supports unstructured mesh features. Currently, the main tools for querying and manipulating unstructured meshes are mesh libraries, e.g., CGAL and GRAL. Mesh libraries are dedicated libraries which include mesh algorithms and can be run on mesh representations. The libraries do not scale with dataset size, do not have a declarative query language, and need deep C++ knowledge for query implementations. Furthermore, due to the high coupling between the implementations and the input file structure, the implementations are less reusable and costly to maintain. A dedicated mesh database offers the following advantages: 1) declarative querying, 2) ease of maintenance, 3) hiding of the mesh storage structure from applications, and 4) transparent query optimization. To design a mesh database, the first challenge is to define a suitable generic data model for unstructured meshes. We proposed the ImG-Complexes data model as a generic topological mesh data model which extends the incidence graph model to multi-incidence relationships (see the sketch below). We instrument the ImG model with sets of optional and application-specific constraints which can be used to check the validity of meshes for a specific class of objects such as manifolds, pseudo-manifolds, and simplicial manifolds. We conducted experiments to measure the performance of a graph database solution in processing mesh queries and compared it with the GrAL mesh library and the PostgreSQL database on synthetic and real mesh datasets. The experiments show that each system performs well on specific types of mesh queries; e.g., graph databases perform well on global path-intensive queries. In the future, we will investigate database operations for the ImG model and design a mesh query language.
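
    To make the incidence-graph idea behind the ImG model concrete, the following is a minimal Python sketch in which cells of each dimension are nodes and incidence relationships are edges, so "iterate over the cells incident to a given cell" becomes graph traversal. Class and method names are illustrative, not the paper's API.

    ```python
    # A small incidence-graph mesh: cells are nodes keyed by id, incidence
    # relationships are stored as upward/downward edges between dimensions.
    from collections import defaultdict

    class IncidenceMesh:
        def __init__(self):
            self.dim = {}                      # cell id -> dimension (0=vertex, 1=edge, ...)
            self.up = defaultdict(set)         # cell -> incident higher-dimensional cells
            self.down = defaultdict(set)       # cell -> incident lower-dimensional cells

        def add_cell(self, cell, dimension, boundary=()):
            self.dim[cell] = dimension
            for b in boundary:                 # record incidence in both directions
                self.down[cell].add(b)
                self.up[b].add(cell)

        def incident_cells(self, cell, dimension):
            """Cells of a given dimension directly incident to `cell`."""
            pool = self.up[cell] | self.down[cell]
            return {c for c in pool if self.dim[c] == dimension}

    m = IncidenceMesh()
    m.add_cell("v0", 0); m.add_cell("v1", 0); m.add_cell("v2", 0)
    m.add_cell("e01", 1, boundary=("v0", "v1"))
    m.add_cell("e12", 1, boundary=("v1", "v2"))
    print(m.incident_cells("v1", 1))           # -> {'e01', 'e12'}
    ```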

  20. A Web 2.0 Application for Executing Queries and Services on Climatic Data

    NASA Astrophysics Data System (ADS)

    Abad-Mota, S.; Ruckhaus, E.; Garboza, A.; Tepedino, G.

    2007-12-01

    For many years countries have collected data in order to understand climate, to study its effect on living species, and to predict future behavior. Nowadays, terabytes of data are collected by governmental agencies and academic institutions, and the current challenge is how to provide appropriate access to this vast amount of climatic data. Each country has a different situation with respect to the collection and use of these data. In particular, in Venezuela, a few institutions have systematically gathered observational and hydrology data, but the data are mostly registered in non-digital media which have been lost or have deteriorated over the years; all of this restricts data availability. In 2006 a joint project between two major Venezuelan universities, Universidad Simón Bolívar (USB) and Universidad Central de Venezuela (UCV), was initiated. The goal of the project is to develop a digital repository of the country's climatic and hydrology data, and to build an application that provides querying and service execution capabilities over these data. The repository has been conceptually modeled as a database which integrates observational data and metadata. Among the metadata we have an inventory of all the stations where data have been collected, and the description of the measurements themselves, for instance, the instruments used for the collection, the time granularity of the measurements, and their units of measure. The resulting data model combines traditional entity-relationship concepts with star and snowflake schemas from data warehouses (a minimal schema sketch follows below). The model allows the inclusion of historic or current data, and each kind of data requires a different loading process. A special emphasis has been given to the representation of the quality of the data stored in the repository. Quality attributes can be attached to each individual value or to sets of values; these attributes can represent the statistical or semantic quality of the data. Values can be stored at any level of aggregation (hourly, daily, monthly), so that they can be provided to the user at the desired level. This means that additional caution has to be exercised in query answering, in order to distinguish between primary and derived data. On the other hand, a Web 2.0 application is being designed to provide a front-end to the repository. This design focuses on two important aspects: the use of metadata structures, and the definition of collaborative Web 2.0 features that can be integrated into a project of this nature. Metadata descriptors include, for a set of measurements, its quality, granularity and other dimension information. With these descriptors it is possible to establish relationships between different sets of measurements and provide scientists with efficient searching mechanisms that determine the related sets of measurements that contribute to a query answer. Unlike traditional applications for climatic data, our approach not only satisfies the requirements of researchers specialized in this domain, but also those of anyone interested in this area; one of the objectives is to build an informal knowledge base that can be improved and consolidated with the usage of the system.
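
    A minimal sketch of the kind of star schema described here, with a fact table of measurements carrying per-value quality, aggregation-level, and primary-versus-derived attributes, is given below. All table and column names are illustrative assumptions, not the project's actual schema.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE station   (station_id INTEGER PRIMARY KEY, name TEXT,
                            latitude REAL, longitude REAL);
    CREATE TABLE variable  (variable_id INTEGER PRIMARY KEY, name TEXT,
                            unit TEXT, instrument TEXT);
    CREATE TABLE measurement (
        station_id  INTEGER REFERENCES station,
        variable_id INTEGER REFERENCES variable,
        observed_at TEXT,                 -- ISO timestamp
        value       REAL,
        level       TEXT CHECK (level IN ('hourly','daily','monthly')),
        is_primary  INTEGER,              -- 1 = observed, 0 = derived/aggregated
        quality     TEXT                  -- per-value quality attribute
    );
    """)

    # A query that answers only from primary (observed) daily values,
    # illustrating the primary-vs-derived distinction in query answering:
    cur = conn.execute("""
        SELECT s.name, v.name, m.observed_at, m.value
        FROM measurement m
        JOIN station s  ON s.station_id  = m.station_id
        JOIN variable v ON v.variable_id = m.variable_id
        WHERE m.level = 'daily' AND m.is_primary = 1
    """)
    print(cur.fetchall())
    ```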

  1. Subliminal perception of complex visual stimuli.

    PubMed

    Ionescu, Mihai Radu

    2016-01-01

    Rationale: Unconscious perception of various sensory modalities is an active subject of research, though its function and effect on behavior are uncertain. Objective: The present study tried to assess whether unconscious visual perception could occur with more complex visual stimuli than previously utilized. Methods and Results: Videos containing slideshows of indifferent complex images with interspersed frames of interest of various durations were presented to 24 healthy volunteers. The perception of the stimulus was evaluated with a forced-choice questionnaire, while awareness was quantified by self-assessment with a modified awareness scale annexed to each question with 4 categories of awareness. At a stimulus duration of 16.66 ms, conscious awareness was not possible and answers regarding the stimulus were random. At 50 ms, nonrandom answers were coupled with no self-reported awareness, suggesting unconscious perception of the stimulus. At longer durations of stimulus presentation, significantly correct answers were coupled with a degree of conscious awareness. Discussion: At values of 50 ms, unconscious perception is possible even with complex visual stimuli. Further studies are recommended with a focus on stimulus durations between 16.66 and 50 ms.

  2. Graphical modeling and query language for hospitals.

    PubMed

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    So far there has been little evidence that implementation of health information technologies (HIT) is leading to health care cost savings. One of the reasons for this lack of impact by HIT likely lies in the complexity of business process ownership in hospitals. The goal of our research is to develop a business model-based method for hospital use which would allow doctors to retrieve ad-hoc information directly from various hospital databases. We have developed a special domain-specific process modelling language called MedMod. Formally, we define the MedMod language as a profile on UML Class diagrams, but we also demonstrate it on examples, where we explain the semantics of all its elements informally. Moreover, we have developed the Process Query Language (PQL), which is based on the MedMod process definition language. The purpose of PQL is to allow a doctor to query (filter) runtime data of a hospital's processes described using MedMod. The MedMod language tries to overcome deficiencies in existing process modeling languages by allowing specification of the loosely defined sequence of steps to be performed in a clinical process. The main advantages of PQL lie in two areas - usability and efficiency. They are: 1) the view on data through the "glasses" of a familiar process, 2) simple and easy-to-perceive means of setting filtering conditions that require no more expertise than using spreadsheet applications, 3) a dynamic response to each step in the construction of the complete query, which shortens the learning curve greatly and reduces the error rate, and 4) the selected means of filtering and data retrieval allow queries to be executed in O(n) time with regard to the size of the dataset. We plan to continue this project with three further steps. First, we are planning to develop user-friendly graphical editors for the MedMod process modeling and query languages. The second step is to evaluate the usability of the proposed language and tool with physicians from several hospitals in Latvia, working with real data from these hospitals. Our third step is to develop an efficient implementation of the query language.

  3. ODG: Omics database generator - a tool for generating, querying, and analyzing multi-omics comparative databases to facilitate biological understanding.

    PubMed

    Guhlin, Joseph; Silverstein, Kevin A T; Zhou, Peng; Tiffin, Peter; Young, Nevin D

    2017-08-10

    The rapid generation of omics data in recent years has resulted in vast amounts of disconnected datasets without systemic integration and knowledge building, while individual groups have made customized, annotated datasets available on the web with few ways to link them to in-lab datasets. With so many research groups generating their own data, the ability to relate it to the larger genomic and comparative genomic context is becoming increasingly crucial to make full use of the data. The Omics Database Generator (ODG) allows users to create customized databases that utilize published genomics data integrated with experimental data, which can be queried using a flexible graph database. When provided with omics and experimental data, ODG will create a comparative, multi-dimensional graph database. ODG can import definitions and annotations from other sources such as InterProScan, the Gene Ontology, ENZYME, UniPathway, and others. This annotation data can be especially useful for studying new or understudied species for which transcripts have only been predicted, rapidly giving additional layers of annotation to predicted genes. In better-studied species, ODG can perform syntenic annotation translations or rapidly identify characteristics of a set of genes or nucleotide locations, such as hits from an association study. ODG provides a web-based user interface for configuring the data import and for querying the database. Queries can also be run from the command line, and the database can be queried directly through programming-language hooks available for most languages. ODG supports most common genomic formats as well as a generic, easy-to-use tab-separated value format for user-provided annotations. ODG is a user-friendly database generation and query tool that adapts to the supplied data to produce a comparative genomic database or multi-layered annotation database. ODG provides rapid comparative genomic annotation and is therefore particularly useful for non-model or understudied species. For species for which more data are available, ODG can be used to conduct complex multi-omics, pattern-matching queries.

  4. Titanbrowse: a new paradigm for access, visualization and analysis of hyperspectral imaging

    NASA Astrophysics Data System (ADS)

    Penteado, Paulo F.

    2016-10-01

    Currently there are archives and tools to explore remote sensing imaging, but these lack some functionality needed for hyperspectral imagers: 1) Querying and serving only whole datacubes is not enough, since in each cube there is typically a large variation in observation geometry over the spatial pixels. Thus, often the most useful unit for selecting observations of interest is not a whole cube but rather a single spectrum. 2) Pixel-specific geometric data included in the standard pipelines is calculated at only one point per pixel. Particularly for selections of pixels from many different cubes, or observations near the limb, it is necessary to know the actual extent of each pixel. 3) Database queries need to be driven not only by metadata, but also by the spectral data. For instance, one query might look for atypical values of some band, or atypical relations between bands denoting spectral features (such as ratios or differences between bands); a minimal sketch of such a query follows below. 4) There is the need to evaluate arbitrary, dynamically-defined, complex functions of the data (beyond just simple arithmetic operations), both for selection in the queries and for visualization, to interactively tune the queries to the observations of interest. 5) Making the most useful query for some analysis often requires interactive visualization integrated with data selection and processing, because the user needs to explore how different functions of the data vary over the observations without having to download data and import it into visualization software. 6) Complementary to interactive use, an API allowing programmatic access to the system is needed for systematic data analyses. 7) Direct access to calibrated and georeferenced data is needed, without having to download data and software and learn to process it. We present titanbrowse, a database, exploration and visualization system for Cassini VIMS observations of Titan, designed to fulfill the aforementioned needs. While it originally ran on data in the user's computer, we are now developing an online version, so that users do not need to download software and data. The server, which we maintain, processes the queries and communicates the results to the client the user runs. http://ppenteado.net/titanbrowse.
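
    Point (3), querying by the spectral data itself at the granularity of single spectra, can be sketched in a few lines of NumPy: select every spectrum in a cube whose ratio between two bands exceeds a threshold. Band indices and the threshold are arbitrary choices for illustration; this is not the titanbrowse implementation.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    cube = rng.random((64, 64, 96))            # (lines, samples, bands) datacube

    band_a, band_b, threshold = 10, 40, 1.5    # hypothetical query parameters
    ratio = cube[:, :, band_a] / cube[:, :, band_b]
    mask = ratio > threshold                   # per-spectrum, not per-cube, selection

    spectra = cube[mask]                       # one row per selected spectrum
    print(spectra.shape)                       # (n_selected, 96)
    ```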

  5. Mystery #25 Answer

    Atmospheric Science Data Center

    2013-04-22

    ... to the northeast of this region, less than a week away by car. Answer: TRUE. Table Mountain is not far from Victoria Falls, a ... For those who want a more relaxing experience, a cable car can take visitors to the top where they will find a restaurant complex. ...

  6. Rapid Deployment of a RESTful Service for Oceanographic Research Cruises

    NASA Astrophysics Data System (ADS)

    Fu, Linyun; Arko, Robert; Leadbetter, Adam

    2014-05-01

    The Ocean Data Interoperability Platform (ODIP) seeks to increase data sharing across scientific domains and international boundaries by providing a forum to harmonize diverse regional data systems. ODIP participants from the US include the Rolling Deck to Repository (R2R) program, whose mission is to capture, catalog, and describe the underway/environmental sensor data from US oceanographic research vessels and submit the data to public long-term archives. R2R publishes information online as Linked Open Data, making it widely available using Semantic Web standards. Each vessel, sensor, cruise, dataset, person, organization, funding award, log, report, etc. has a Uniform Resource Identifier (URI). Complex queries that federate results from other data providers are supported, using the SPARQL query language. To facilitate interoperability, R2R uses controlled vocabularies developed collaboratively by the science community (e.g., SeaDataNet device categories) and published online by the NERC Vocabulary Server (NVS). In response to user feedback, we are developing a standard programming interface (API) and Web portal for R2R's Linked Open Data. The API provides a set of simple REST-type URLs that are translated on-the-fly into SPARQL queries, and supports common output formats (e.g., JSON). We will demonstrate an implementation based on the Epimorphics Linked Data API (ELDA) open-source Java package. Our experience shows that constructing a simple portal with limited schema elements in this way can significantly reduce development time and maintenance complexity.
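
    The sketch below illustrates the on-the-fly REST-to-SPARQL translation idea with a hypothetical URL pattern and placeholder endpoint and vocabulary; the actual portal is built on the ELDA Java package, so SPARQLWrapper in Python is used here only as a stand-in.

    ```python
    # Map a simple REST-type path to a SPARQL query and return JSON results.
    # ENDPOINT and the vocab/identifier IRIs are placeholders, not R2R's API.
    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINT = "http://example.org/sparql"     # placeholder endpoint

    def rest_to_sparql(path):
        """Translate e.g. '/cruise/AB1234/datasets' into a SPARQL query."""
        _, resource, ident, relation = path.split("/")
        return f"""
            SELECT ?item WHERE {{
                ?s a <http://example.org/vocab/{resource}> ;
                   <http://purl.org/dc/terms/identifier> "{ident}" ;
                   <http://example.org/vocab/{relation}> ?item .
            }}"""

    def run(path):
        sparql = SPARQLWrapper(ENDPOINT)
        sparql.setQuery(rest_to_sparql(path))
        sparql.setReturnFormat(JSON)           # common output format, as above
        return sparql.query().convert()

    print(rest_to_sparql("/cruise/AB1234/datasets"))
    ```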

  7. Visual exploration of big spatio-temporal urban data: a study of New York City taxi trips.

    PubMed

    Ferreira, Nivan; Poco, Jorge; Vo, Huy T; Freire, Juliana; Silva, Cláudio T

    2013-12-01

    As increasing volumes of urban data are captured and become available, new opportunities arise for data-driven analysis that can lead to improvements in the lives of citizens through evidence-based decision making and policies. In this paper, we focus on a particularly important urban data set: taxi trips. Taxis are valuable sensors, and information associated with taxi trips can provide unprecedented insight into many different aspects of city life, from economic activity and human behavior to mobility patterns. But analyzing these data presents many challenges. The data are complex, containing geographical and temporal components in addition to multiple variables associated with each trip. Consequently, it is hard to specify exploratory queries and to perform comparative analyses (e.g., compare different regions over time). This problem is compounded by the size of the data: there are on average 500,000 taxi trips each day in NYC. We propose a new model that allows users to visually query taxi trips. Besides standard analytics queries, the model supports origin-destination queries that enable the study of mobility across the city (see the sketch below). We show that this model is able to express a wide range of spatio-temporal queries, and it is also flexible in that not only can queries be composed but also different aggregations and visual representations can be applied, allowing users to explore and compare results. We have built a scalable system that implements this model which supports interactive response times; makes use of an adaptive level-of-detail rendering strategy to generate clutter-free visualizations for large results; and shows hidden details to the users in a summary through the use of overlay heat maps. We present a series of case studies motivated by traffic engineers and economists that show how our model and system enable domain experts to perform tasks that were previously unattainable for them.
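
    An origin-destination query of the kind the model supports reduces, in its simplest batch form, to a grouped count over a temporal slice. The pandas sketch below shows that reduction with made-up column and region names; the paper's system evaluates such queries interactively over far larger data.

    ```python
    import pandas as pd

    trips = pd.DataFrame({
        "pickup_time": pd.to_datetime(["2013-05-01 08:10", "2013-05-01 08:40",
                                       "2013-05-01 17:05", "2013-05-02 09:00"]),
        "origin":      ["Midtown", "Midtown", "JFK", "Harlem"],
        "destination": ["JFK", "Harlem", "Midtown", "Midtown"],
    })

    morning = trips[trips["pickup_time"].dt.hour.between(6, 10)]  # temporal slice
    od_matrix = (morning.groupby(["origin", "destination"])       # OD aggregation
                        .size()
                        .rename("n_trips")
                        .reset_index())
    print(od_matrix)
    ```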

  8. Query-by-example surgical activity detection.

    PubMed

    Gao, Yixin; Vedula, S Swaroop; Lee, Gyusung I; Lee, Mija R; Khudanpur, Sanjeev; Hager, Gregory D

    2016-06-01

    Easy acquisition of surgical data opens many opportunities to automate skill evaluation and teaching. Current technology to search tool motion data for surgical activity segments of interest is limited by the need for manual pre-processing, which can be prohibitive at scale. We developed a content-based information retrieval method, query-by-example (QBE), to automatically detect activity segments within surgical data recordings of long duration that match a query. The example segment of interest (query) and the surgical data recording (target trial) are time series of kinematics. Our approach includes an unsupervised feature learning module using a stacked denoising autoencoder (SDAE), two scoring modules based on asymmetric subsequence dynamic time warping (AS-DTW) and template matching, respectively, and a detection module. A distance matrix of the query against the trial is computed using the SDAE features, followed by AS-DTW combined with template scoring, to generate a ranked list of candidate subsequences (substrings). To evaluate the quality of the ranked list against the ground-truth, thresholding conventional DTW distances and bipartite matching are applied. We computed the recall, precision, F1-score, and a Jaccard index-based score on three experimental setups. We evaluated our QBE method using a suture throw maneuver as the query, on two tool motion datasets (JIGSAWS and MISTIC-SL) captured in a training laboratory. We observed a recall of 93, 90 and 87 % and a precision of 93, 91, and 88 % with same surgeon same trial (SSST), same surgeon different trial (SSDT) and different surgeon (DS) experiment setups on JIGSAWS, and a recall of 87, 81 and 75 % and a precision of 72, 61, and 53 % with SSST, SSDT and DS experiment setups on MISTIC-SL, respectively. We developed a novel, content-based information retrieval method to automatically detect multiple instances of an activity within long surgical recordings. Our method demonstrated adequate recall across different complexity datasets and experimental conditions.
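
    The subsequence-matching step can be illustrated with a plain unconstrained-start dynamic time warping pass of a query against a longer trial. The NumPy sketch below uses ordinary Euclidean distances and symmetric DTW steps, standing in for the paper's learned SDAE features and asymmetric AS-DTW variant.

    ```python
    import numpy as np

    def subsequence_dtw(query, trial):
        """Best match of `query` anywhere inside `trial`: (distance, end index)."""
        m, n = len(query), len(trial)
        dist = np.linalg.norm(query[:, None, :] - trial[None, :, :], axis=2)
        D = np.full((m + 1, n + 1), np.inf)
        D[0, :] = 0.0                      # the match may start anywhere in trial
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                D[i, j] = dist[i - 1, j - 1] + min(D[i - 1, j],     # insertion
                                                   D[i, j - 1],     # deletion
                                                   D[i - 1, j - 1]) # match
        end = int(np.argmin(D[m, 1:])) + 1
        return D[m, end], end

    rng = np.random.default_rng(1)
    trial = rng.random((300, 6))           # e.g. 6-dim tool kinematics over time
    query = trial[120:150]                 # plant a known segment as the query
    print(subsequence_dtw(query, trial))   # distance ~0, ending near index 150
    ```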

  9. Automation and integration of components for generalized semantic markup of electronic medical texts.

    PubMed Central

    Dugan, J. M.; Berrios, D. C.; Liu, X.; Kim, D. K.; Kaizer, H.; Fagan, L. M.

    1999-01-01

    Our group has built an information retrieval system based on a complex semantic markup of medical textbooks. We describe the construction of a set of web-based knowledge-acquisition tools that expedites the collection and maintenance of the concepts required for text markup and the search interface required for information retrieval from the marked text. In the text markup system, domain experts (DEs) identify sections of text that contain one or more elements from a finite set of concepts. End users can then query the text using a predefined set of questions, each of which identifies a subset of complementary concepts. The search process matches that subset of concepts to relevant points in the text. The current process requires that the DE invest significant time to generate the required concepts and questions. We propose a new system--called ACQUIRE (Acquisition of Concepts and Queries in an Integrated Retrieval Environment)--that assists a DE in two essential tasks in the text-markup process. First, it helps her to develop, edit, and maintain the concept model: the set of concepts with which she marks the text. Second, ACQUIRE helps her to develop a query model: the set of specific questions that end users can later use to search the marked text. The DE incorporates concepts from the concept model when she creates the questions in the query model. The major benefit of the ACQUIRE system is a reduction in the time and effort required for the text-markup process. We compared the process of concept- and query-model creation using ACQUIRE to the process used in previous work by rebuilding two existing models that we previously constructed manually. We observed a significant decrease in the time required to build and maintain the concept and query models. PMID:10566457

  10. Answer Sets in a Fuzzy Equilibrium Logic

    NASA Astrophysics Data System (ADS)

    Schockaert, Steven; Janssen, Jeroen; Vermeir, Dirk; de Cock, Martine

    Since its introduction, answer set programming has been generalized in many directions, to cater to the needs of real-world applications. As one of the most general “classical” approaches, answer sets of arbitrary propositional theories can be defined as models in the equilibrium logic of Pearce. Fuzzy answer set programming, on the other hand, extends answer set programming with the capability of modeling continuous systems. In this paper, we combine the expressiveness of both approaches, and define answer sets of arbitrary fuzzy propositional theories as models in a fuzzification of equilibrium logic. We show that the resulting notion of answer set is compatible with existing definitions, when the syntactic restrictions of the corresponding approaches are met. We furthermore locate the complexity of the main reasoning tasks at the second level of the polynomial hierarchy. Finally, as an illustration of its modeling power, we show how fuzzy equilibrium logic can be used to find strong Nash equilibria.

  11. Thematic video indexing to support video database retrieval and query processing

    NASA Astrophysics Data System (ADS)

    Khoja, Shakeel A.; Hall, Wendy

    1999-08-01

    This paper presents a novel video database system, which caters for complex and long videos, such as documentaries, educational videos, etc. As compared to relatively structured format videos like CNN news or commercial advertisements, this database system has the capacity to work with long and unstructured videos.

  12. The Efficacy of Multidimensional Constraint Keys in Database Query Performance

    ERIC Educational Resources Information Center

    Cardwell, Leslie K.

    2012-01-01

    This work is intended to introduce a database design method to resolve the two-dimensional complexities inherent in the relational data model and its resulting performance challenges through abstract multidimensional constructs. A multidimensional constraint is derived and utilized to implement an indexed Multidimensional Key (MK) to abstract a…

  13. Query3d: a new method for high-throughput analysis of functional residues in protein structures.

    PubMed

    Ausiello, Gabriele; Via, Allegra; Helmer-Citterich, Manuela

    2005-12-01

    The identification of local similarities between two protein structures can provide clues of a common function. Many different methods exist for searching for similar subsets of residues in proteins of known structure. However, the lack of functional and structural information on single residues, together with the low level of integration of this information in comparison methods, is a limitation that prevents these methods from being fully exploited in high-throughput analyses. Here we describe Query3d, a program that is both a structural DBMS (Database Management System) and a local comparison method. The method conserves a copy of all the residues of the Protein Data Bank annotated with a variety of functional and structural information. New annotations can be easily added from a variety of methods and known databases. The algorithm makes it possible to create complex queries based on the residues' function and then to compare only subsets of the selected residues. Functional information is also essential to speed up the comparison and the analysis of the results. With Query3d, users can easily obtain statistics on how many and which residues share certain properties in all proteins of known structure. At the same time, the method also finds their structural neighbours in the whole PDB. Programs and data can be accessed through the PdbFun web interface.

  14. Query3d: a new method for high-throughput analysis of functional residues in protein structures

    PubMed Central

    Ausiello, Gabriele; Via, Allegra; Helmer-Citterich, Manuela

    2005-01-01

    Background The identification of local similarities between two protein structures can provide clues of a common function. Many different methods exist for searching for similar subsets of residues in proteins of known structure. However, the lack of functional and structural information on single residues, together with the low level of integration of this information in comparison methods, is a limitation that prevents these methods from being fully exploited in high-throughput analyses. Results Here we describe Query3d, a program that is both a structural DBMS (Database Management System) and a local comparison method. The method conserves a copy of all the residues of the Protein Data Bank annotated with a variety of functional and structural information. New annotations can be easily added from a variety of methods and known databases. The algorithm makes it possible to create complex queries based on the residues' function and then to compare only subsets of the selected residues. Functional information is also essential to speed up the comparison and the analysis of the results. Conclusion With Query3d, users can easily obtain statistics on how many and which residues share certain properties in all proteins of known structure. At the same time, the method also finds their structural neighbours in the whole PDB. Programs and data can be accessed through the PdbFun web interface. PMID:16351754

  15. Using Web Ontology Language to Integrate Heterogeneous Databases in the Neurosciences

    PubMed Central

    Lam, Hugo Y.K.; Marenco, Luis; Shepherd, Gordon M.; Miller, Perry L.; Cheung, Kei-Hoi

    2006-01-01

    Integrative neuroscience involves the integration and analysis of diverse types of neuroscience data involving many different experimental techniques. This data will increasingly be distributed across many heterogeneous databases that are web-accessible. Currently, these databases do not expose their schemas (database structures) and their contents to web applications/agents in a standardized, machine-friendly way. This limits database interoperation. To address this problem, we describe a pilot project that illustrates how neuroscience databases can be expressed using the Web Ontology Language, which is a semantically-rich ontological language, as a common data representation language to facilitate complex cross-database queries. In this pilot project, an existing tool called “D2RQ” was used to translate two neuroscience databases (NeuronDB and CoCoDat) into OWL, and the resulting OWL ontologies were then merged. An OWL-based reasoner (Racer) was then used to provide a sophisticated query language (nRQL) to perform integrated queries across the two databases based on the merged ontology. This pilot project is one step toward exploring the use of semantic web technologies in the neurosciences. PMID:17238384
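
    The core of the pipeline, merging database-derived ontologies into one graph and querying across them, can be sketched with rdflib. File names and vocabulary below are placeholders, and SPARQL stands in for the nRQL/Racer reasoning layer used in the pilot.

    ```python
    # Merge two database-derived OWL files into one graph, then run one
    # integrated query spanning terms contributed by both sources.
    from rdflib import Graph

    merged = Graph()
    merged.parse("neurondb.owl", format="xml")   # hypothetical OWL export, database 1
    merged.parse("cocodat.owl", format="xml")    # hypothetical OWL export, database 2

    results = merged.query("""
        PREFIX ex: <http://example.org/neuro#>
        SELECT ?neuron ?receptor WHERE {
            ?neuron ex:hasCompartment ?c .           # term from database 1
            ?c      ex:expressesReceptor ?receptor . # term from database 2
        }""")
    for neuron, receptor in results:
        print(neuron, receptor)
    ```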

  16. Pharmit: interactive exploration of chemical space.

    PubMed

    Sunseri, Jocelyn; Koes, David Ryan

    2016-07-08

    Pharmit (http://pharmit.csb.pitt.edu) provides an online, interactive environment for the virtual screening of large compound databases using pharmacophores, molecular shape and energy minimization. Users can import, create and edit virtual screening queries in an interactive browser-based interface. Queries are specified in terms of a pharmacophore, a spatial arrangement of the essential features of an interaction, and molecular shape. Search results can be further ranked and filtered using energy minimization. In addition to a number of pre-built databases of popular compound libraries, users may submit their own compound libraries for screening. Pharmit uses state-of-the-art sub-linear algorithms to provide interactive screening of millions of compounds. Queries typically take a few seconds to a few minutes depending on their complexity. This allows users to iteratively refine their search during a single session. The easy access to large chemical datasets provided by Pharmit simplifies and accelerates structure-based drug design. Pharmit is available under a dual BSD/GPL open-source license. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. A parallel data management system for large-scale NASA datasets

    NASA Technical Reports Server (NTRS)

    Srivastava, Jaideep

    1993-01-01

    The past decade has experienced a phenomenal growth in the amount of data and resultant information generated by NASA's operations and research projects. A key application is the reprocessing problem which has been identified to require data management capabilities beyond those available today (PRAT93). The Intelligent Information Fusion (IIF) system (ROEL91) is an ongoing NASA project which has similar requirements. Deriving our understanding of NASA's future data management needs based on the above, this paper describes an approach to using parallel computer systems (processor and I/O architectures) to develop an efficient parallel database management system to address the needs. Specifically, we propose to investigate issues in low-level record organizations and management, complex query processing, and query compilation and scheduling.

  18. East-China Geochemistry Database (ECGD):A New Networking Database for North China Craton

    NASA Astrophysics Data System (ADS)

    Wang, X.; Ma, W.

    2010-12-01

    North China Craton is one of the best natural laboratories for researching questions of Earth dynamics[1]. Much progress has been made in research on this area, producing vast amounts of geochemistry data which are essential for answering many fundamental questions about the age, composition, structure, and evolution of the East China area. But these geochemical data have long been accessible only through the scientific literature and theses, where they are widely dispersed, making it difficult for the broad Geosciences community to find, access and efficiently use the full range of available data[2]. How can the existing geochemical data in the North China Craton area be effectively stored, managed, shared and reused? The East-China Geochemistry Database (ECGD) is a networked geochemical scientific database system that has been designed based on WebGIS and a relational database for the structured storage and retrieval of geochemical data and geological map information. It integrates the functions of data retrieval, spatial visualization and online analysis. ECGD focuses on three areas: 1. Storage and retrieval of geochemical data and geological map information. Based on the characteristics of geochemical data, including how its components are composed and connected, we designed a relational database, built on a geochemical relational data model, to store a variety of geological sample information such as sampling locality, age, sample characteristics, references, major elements, rare earth elements, trace elements and isotope systems. A user-friendly web-based interface is provided for constructing queries. 2. Data viewing. ECGD is committed to online data visualization in different ways, especially dynamic viewing of data on digital maps. Because ECGD integrates WebGIS technology, query results can be mapped on a digital map that supports zooming, panning and point selection. Besides viewing and exporting query results in HTML, TXT or XLS formats, researchers can also generate classified thematic maps from query results according to different parameters. 3. Online data analysis. We designed a number of online geochemical analysis tools, including geochemical diagrams, CIPW norm computation, and so on, which allow researchers to analyze query data without downloading query results. All of these analysis tools are easy to operate, requiring only one or two mouse clicks. In summary, ECGD provides a geochemical platform on which researchers can learn where various data are, view them in a synthetic and dynamic way, and analyze data of interest online. REFERENCES [1] S. Gao, R.L. Rudnick, and W.L. Xu, "Recycling deep cratonic lithosphere and generation of intraplate magmatism in the North China Craton," Earth and Planetary Science Letters, 270, 41-53, 2008. [2] K.A. Lehnert, U. Harms, and E. Ito, "Promises, Achievements, and Challenges of Networking Global Geoinformatics Resources - Experiences of GeosciNET and EarthChem," Geophysical Research Abstracts, Vol. 10, EGU2008-A-05242, 2008.

  19. Efficient view based 3-D object retrieval using Hidden Markov Model

    NASA Astrophysics Data System (ADS)

    Jain, Yogendra Kumar; Singh, Roshan Kumar

    2013-12-01

    Recent research effort has been dedicated to view-based 3-D object retrieval because of the highly discriminative properties of 3-D objects and their multi-view representations. State-of-the-art methods depend heavily on their own camera-array settings for capturing views of a 3-D object and use complex Zernike descriptors and HAC for representative view selection, which limits their practical application and makes retrieval inefficient. Therefore, an efficient and effective algorithm is required for 3-D object retrieval. In order to move toward a general framework for efficient 3-D object retrieval that is independent of camera-array settings and avoids representative view selection, we propose an Efficient View Based 3-D Object Retrieval (EVBOR) method using a Hidden Markov Model (HMM). In this framework, each object is represented by an independent set of views, which means views can be captured from any direction without any camera-array restriction. Views (including query views) are clustered to generate view clusters, which are then used to build the query model with an HMM. In our proposed method, the HMM is used in a twofold manner: in training (HMM estimation) and in retrieval (HMM decoding); a rough sketch follows below. The query model is trained using these view clusters. The proposed approach removes static camera-array settings for view capturing and can be applied to any 3-D object database to retrieve 3-D objects efficiently and effectively. Experimental results demonstrate that the proposed scheme shows better performance than existing methods.
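
    The twofold HMM use, estimation on the query object's views and decoding to score a candidate, can be sketched with the hmmlearn package. The random feature vectors below stand in for the paper's view-cluster descriptors, and this is a rough illustration, not the EVBOR implementation.

    ```python
    import numpy as np
    from hmmlearn import hmm

    rng = np.random.default_rng(0)
    query_views = rng.normal(0.0, 1.0, size=(40, 8))  # 40 views, 8-dim features
    candidate   = rng.normal(0.1, 1.0, size=(40, 8))  # views of a database object

    model = hmm.GaussianHMM(n_components=3, n_iter=50, random_state=0)
    model.fit(query_views)                            # HMM estimation (training)
    score = model.score(candidate)                    # HMM decoding: log-likelihood
    print(score)                                      # higher = better match
    ```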

  20. An evaluation of performance by older persons on a simulated telecommuting task.

    PubMed

    Sharit, Joseph; Czaja, Sara J; Hernandez, Mario; Yang, Yulong; Perdomo, Dolores; Lewis, John E; Lee, Chin Chin; Nair, Sankaran

    2004-11-01

    Telecommuting work represents a strategy for managing the growing number of older people in the workforce. This study involved a simulated customer service telecommuting task that used e-mail to answer customer queries about media-related products and company policies. Participants included 27 "younger" older adults (50-65 years) and 25 "older" older adults (66-80 years). The participants performed the task for two 2-hr sessions a day over 4 consecutive days. Although both age groups showed significant improvement across sessions on many of the performance criteria, in general the improvements were more marked for the older age-group participants. However, the participants from both age groups had difficulty meeting some of the task performance requirements. These results are discussed in terms of training strategies for older workers.

  1. The Knowledge Structure in Amarakośa

    NASA Astrophysics Data System (ADS)

    Nair, Sivaja S.; Kulkarni, Amba

    Amarakośa is the most celebrated and authoritative ancient thesaurus of Sanskrit. It is one of the books which an Indian child learning through Indian traditional educational system memorizes as early as his first year of formal learning. Though it appears as a linear list of words, close inspection of it shows a rich organisation of words expressing various relations a word bears with other words. Thus when a child studies Amarakośa further, the linear list of words unfolds into a knowledge web. In this paper we describe our effort to make the implicit knowledge in Amarakośa explicit. A model for storing such structure is discussed and a web tool is described that answers the queries by reconstructing the links among words from the structured tables dynamically.

  2. CNTRO: A Semantic Web Ontology for Temporal Relation Inferencing in Clinical Narratives.

    PubMed

    Tao, Cui; Wei, Wei-Qi; Solbrig, Harold R; Savova, Guergana; Chute, Christopher G

    2010-11-13

    Using Semantic-Web specifications to represent temporal information in clinical narratives is an important step for temporal reasoning and answering time-oriented queries. Existing temporal models are either not compatible with the powerful reasoning tools developed for the Semantic Web, or designed only for structured clinical data and therefore are not ready to be applied on natural-language-based clinical narrative reports directly. We have developed a Semantic-Web ontology which is called Clinical Narrative Temporal Relation ontology. Using this ontology, temporal information in clinical narratives can be represented as RDF (Resource Description Framework) triples. More temporal information and relations can then be inferred by Semantic-Web based reasoning tools. Experimental results show that this ontology can represent temporal information in real clinical narratives successfully.
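
    A minimal rdflib sketch of the representation step, recording one narrative temporal relation as RDF triples that a Semantic Web reasoner (or here, a SPARQL query) can then exploit, is shown below. The namespace and property names are illustrative placeholders, not the actual CNTRO ontology.

    ```python
    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDF, XSD

    EX = Namespace("http://example.org/cntro-demo#")
    g = Graph()

    # "The patient developed a rash two days after starting amoxicillin."
    g.add((EX.startAmoxicillin, RDF.type, EX.MedicalEvent))
    g.add((EX.rashOnset,        RDF.type, EX.MedicalEvent))
    g.add((EX.rashOnset,        EX.after, EX.startAmoxicillin))
    g.add((EX.rashOnset,        EX.offsetInDays, Literal(2, datatype=XSD.integer)))

    # A time-oriented query: which events occurred after starting amoxicillin?
    q = """SELECT ?e WHERE { ?e <http://example.org/cntro-demo#after>
                                <http://example.org/cntro-demo#startAmoxicillin> }"""
    for (event,) in g.query(q):
        print(event)
    ```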

  3. Answers to Essential Questions about Standards, Assessments, Grading, & Reporting

    ERIC Educational Resources Information Center

    Guskey, Thomas R.; Jung, Lee Ann

    2013-01-01

    How do assessments for learning differ from assessments of learning? What is the purpose of grading? After nearly two decades of immersion in standards-based curriculua and instruction, our nation's educators are often still confounded by the (admittedly complex) landscape of standards, assessment, and reporting. In "Answers to Essential…

  4. What do people search online concerning the "elusive" fibromyalgia? Insights from a qualitative and quantitative analysis of Google Trends.

    PubMed

    Bragazzi, Nicola Luigi; Amital, Howard; Adawi, Mohammad; Brigo, Francesco; Watad, Samaa; Aljadeff, Gali; Amital, Daniela; Watad, Abdulla

    2017-08-01

    Fibromyalgia is a chronic disease characterized by pain, fatigue, and poor sleep quality. Patients, mainly those with chronic diseases, tend to search for health-related material online. Google Trends (GT), an online tracking system of Internet hit-search volumes that recently merged with its sister project Google Insights for Search (Google Inc.), was used to explore Internet activity related to fibromyalgia. Digital interest in fibromyalgia and related topics searched worldwide over the last 13 years is reported. A slight decline in this interest has been observed through the years, with stability in the last 5 years. Fibromyalgia web behavior exhibited a regular, cyclic pattern, even though no seasonality could be detected. Similar findings have been reported for rheumatoid arthritis and depression. However, differently from rheumatoid arthritis and depression, the focus of the fibromyalgia-related queries was more concentrated on drug side effects and the "elusive" nature of fibromyalgia: is it a real or imaginary condition? Does it really exist or is it all in your head? A tremendous amount of information on fibromyalgia and related topics exists online. Still, many queries have been raised and repeated constantly by fibromyalgia patients in the last 13 years. Therefore, physicians should be aware of the common concerns of people or patients regarding fibromyalgia in order to give proper answers and education.

  5. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more.

    PubMed

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-07-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized 'Given X, find all associated Ys' query, where X and Y can be selected from the aforementioned biomedical entities (a toy illustration of this query pattern follows below). An example query might be: 'Find all diseases associated with Bisphenol A'. To find its answers, PolySearch2 searches for associations against comprehensive free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and the Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and database records. PolySearch2 also generates, ranks and annotates associative candidates and presents results with relevancy statistics and highlighted key sentences to facilitate user interpretation. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
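
    The 'Given X, find all associated Ys' pattern can be illustrated with a toy co-occurrence counter: rank candidate Ys by how often they appear in the same sentence as X. PolySearch2 itself uses curated thesauri, multiple corpora, and relevancy statistics; the handful of strings and the naive counting below are purely illustrative (note the counter even scores the negated asthma sentence, which is exactly what real relevancy scoring guards against).

    ```python
    from collections import Counter

    docs = [
        "Bisphenol A exposure has been linked to obesity in several cohorts.",
        "Obesity and diabetes were both associated with bisphenol a levels.",
        "Asthma prevalence showed no relation to bisphenol a in this study.",
    ]
    x = "bisphenol a"
    candidate_ys = ["obesity", "diabetes", "asthma", "hypertension"]

    counts = Counter()
    for sentence in docs:
        s = sentence.lower()
        if x in s:                         # sentence mentions X...
            for y in candidate_ys:
                if y in s:                 # ...and a candidate Y: co-occurrence
                    counts[y] += 1

    for y, n in counts.most_common():      # crude association ranking
        print(y, n)
    ```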

  6. Analysis of Students' Eye Movement in Relation to Contents of Multimedia Lecture

    NASA Astrophysics Data System (ADS)

    Murakami, Masayuki; Kakusho, Koh; Minoh, Michihiko

    In this article, we report our analysis of how students' eye movement is affected by the content of a lecture, in order to use it as a standard for the selection of images for distance learning and WBT. We classified lecture content into nine parts: introduction, presentation, explanation, illustration, assertion, query, reply, question, and response. We analyzed students' eye movement in the multimedia lecture "Japanese Economics", a distance lecture between Kyoto University and UCLA. As a result of the analysis, we found the following characteristics of eye movement for each course process in a practical lecture. Introduction: students gaze at the lecturer at first in order to obtain an advance organizer, and next look at the material. Presentation: they mainly stare at the material and sometimes peer at the lecturer to complement a lack of understanding with information given by the lecturer. Explanation: staring time is longer than in other course process categories, and students stare at the object which they regard as important. Illustration: students stare at the material, which offers the main information source. Assertion: they gaze at the lecturer because of the interaction between lecturer and students. Question-and-answer: generally, students look at the speaker, but in the case of a "query" about the material, they shift their focus between the material and the lecturer quickly and by turns in order to get information from both. Our research suggests a practical guide for the choice of image information.

  7. Associations of gender and age groups on the knowledge and use of drug information resources by American pharmacists.

    PubMed

    Carvajal, Manuel J; Clauson, Kevin A; Gershman, Jennifer; Polen, Hyla H

    2013-04-01

    To explore knowledge and use of drug information resources by pharmacists and identify patterns influenced by gender and age-group classification. A survey questionnaire was mailed nationwide to 1,000 practitioners working in community (n = 500) and hospital (n = 500) settings who answer drug information questions as part of their expected job responsibilities. Responses pertaining to drug information resource use and knowledge of different types of drug-related queries, resource media preferences, and perceived adequacy of resources maintained in the pharmacy were analyzed by gender and age group. The t statistic was used to test for significant differences of means and percentages between genders and between age groups. Descriptive statistics were used to characterize other findings. Gender and age group classification influenced patterns of knowledge and use of drug information resources by pharmacists. They also affected pharmacists' perceptions of the most common types of questions prompting them to consult a drug information reference, as well as the resources consulted. Micromedex, exclusively available in electronic format, was the most commonly consulted resource overall by pharmacists. Lexi-Comp Online was the leading choice by women, preferred over Micromedex, but was not one of the top two resources selected by men. This study successfully identified the influence of gender and age-group classification in assessing drug information resource knowledge and use of general and specific types of drug-related queries.

  8. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more

    PubMed Central

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-01-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized ‘Given X, find all associated Ys’ query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: ‘Find all diseases associated with Bisphenol A’. To find its answers, PolySearch2 searches for associations against comprehensive free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and the Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and database records. PolySearch2 also generates, ranks and annotates associative candidates and presents results with relevancy statistics and highlighted key sentences to facilitate user interpretation. PMID:25925572

  9. Parents are interested in newborn genomic testing during the early postpartum period.

    PubMed

    Waisbren, Susan E; Bäck, Danielle K; Liu, Christina; Kalia, Sarah S; Ringer, Steven A; Holm, Ingrid A; Green, Robert C

    2015-06-01

    We surveyed parents to ascertain interest in newborn genomic testing and determine whether these queries would provoke refusal of conventional state-mandated newborn screening. After a brief genetics orientation, parents rated their interest in receiving genomic testing for their healthy newborn on a 5-point Likert scale and answered questions about demographics and health history. We used logistic regression to explore factors associated with interest in genomic testing and tracked any subsequent rejection of newborn screening. We queried 514 parents within 48 hours after birth while still in hospital (mean age (SD) 32.7 (6.4) years, 65.2% female, 61.2% white, 79.3% married). Parents reported being not at all (6.4%), a little (10.9%), somewhat (36.6%), very (28.0%), or extremely (18.1%) interested in genomic testing for their newborns. None refused state-mandated newborn screening. Married participants and those with health concerns about their infant were less interested in newborn genomic testing (P = 0.012 and P = 0.030, respectively). Degree of interest for mothers and fathers was discordant (at least two categories different) for 24.4% of couples. Interest in newborn genomic testing was high among parents of healthy newborns, and the majority of couples had similar levels of interest. Surveying parents about genomic sequencing did not prompt rejection of newborn screening. Genet Med 17(6), 501-504.

  10. Weak Links: Stabilizers of Complex Systems from Proteins to Social Networks

    NASA Astrophysics Data System (ADS)

    Csermely, Peter

    Why do women stabilize our societies? Why can we enjoy and understand Shakespeare? Why are fruitflies uniform? Why do omnivorous eating habits aid our survival? Why is Mona Lisa's smile beautiful? -- Is there any answer to these questions? This book shows that the statement: "weak links stabilize complex systems" holds the answers to all of the surprising questions above. The author (recipient of several distinguished science communication prizes) uses weak (low affinity, low probability) interactions as a thread to introduce a vast variety of networks from proteins to ecosystems.

  11. The value of Retrospective and Concurrent Think Aloud in formative usability testing of a physician data query tool.

    PubMed

    Peute, Linda W P; de Keizer, Nicolette F; Jaspers, Monique W M

    2015-06-01

    To compare the performance of the Concurrent (CTA) and Retrospective (RTA) Think Aloud methods and to assess their value in a formative usability evaluation of an Intensive Care Registry-physician data query tool designed to support ICU quality improvement processes. Sixteen representative intensive care physicians participated in the usability evaluation study. Subjects were allocated to either the CTA or RTA method by a matched randomized design. Each subject performed six usability-testing tasks of varying complexity in the query tool in a real working context. Methods were compared with regard to the number and type of problems detected. Verbal protocols of CTA and RTA were analyzed in depth to assess differences in verbal output. Standardized measures were applied to assess thoroughness in usability problem detection, weighted per problem severity level, and overall method effectiveness in detecting usability problems relative to the time subjects spent per method. The usability evaluation of the data query tool revealed a total of 43 unique usability problems that the intensive care physicians encountered. CTA detected unique usability problems with regard to graphics/symbols, navigation issues, error messages, and the organization of information on the query tool's screens. RTA detected unique issues concerning system match with subjects' language and applied terminology. The in-depth verbal protocol analysis of CTA provided information on intensive care physicians' query design strategies. Overall, CTA performed significantly better than RTA in detecting usability problems: CTA usability problem detection effectiveness was 0.80 vs. 0.62 (p<0.05), with subjects spending on average 42% less time than with RTA. In addition, CTA was more thorough in detecting usability problems of a moderate (0.85 vs. 0.7) and severe nature (0.71 vs. 0.57). In this study, CTA was more effective in usability-problem detection and clarified intensive care physicians' query design strategies to inform redesign of the query tool. CTA did not, however, make RTA redundant: RTA additionally elucidated unique usability problems and new user requirements. Based on the results of this study, we recommend the use of CTA in formative usability evaluation studies of health information technology. However, we recommend further research on the application of RTA in usability studies with regard to user expertise and experience when focusing on user-profile-customized (re)design. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. Structuring a Supportive Environment for Women in Higher Education

    ERIC Educational Resources Information Center

    Howard-Vital, Michelle R.; Brunson, Deborah A.

    2006-01-01

    After participating on a search committee for an academic leader, the authors conduct an empirical and philosophical analysis of factors that influence the work and productivity of women in higher education. The authors query--how can women achieve more balance in educational institutions so they can pursue careers that reflect the complexity and…

  13. Facilitating Learners' Web-Based Information Problem-Solving by Query Expansion-Based Concept Mapping

    ERIC Educational Resources Information Center

    Huang, Yueh-Min; Liu, Ming-Chi; Chen, Nian-Shing; Kinshuk; Wen, Dunwei

    2014-01-01

    Web-based information problem-solving has been recognised as a critical ability for learners. However, the development of students' abilities in this area often faces several challenges, such as difficulty in building well-organised knowledge structures to support complex problems that require higher-order skills (e.g., system thinking). To…

  14. InterMine Webservices for Phytozome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carlson, Joseph; Hayes, David; Goodstein, David

    2014-01-10

    A data warehousing framework for biological information provides a useful infrastructure for both providers and users of genomic data. For providers, the infrastructure gives a consistent mechanism for extracting raw data. For users, the web services supported by the software allow either simple and common, or complex and unique, queries of the data.
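
    As a sketch of the kind of web-service query described above, the InterMine Python client can be pointed at a mine and used to build a query programmatically. The service URL and the Gene attributes below are assumptions for illustration, not confirmed details of the Phytozome deployment.

      # Minimal InterMine client sketch (pip install intermine).
      from intermine.webservice import Service

      # Assumed PhytoMine endpoint; substitute the real service URL.
      service = Service("https://phytozome.jgi.doe.gov/phytomine/service")
      query = service.new_query("Gene")
      query.add_view("primaryIdentifier", "length")
      query.add_constraint("length", ">", "5000")  # a simple, common query
      for row in query.rows(start=0, size=10):
          print(row["primaryIdentifier"], row["length"])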

  15. EMAP and EMAGE: a framework for understanding spatially organized data.

    PubMed

    Baldock, Richard A; Bard, Jonathan B L; Burger, Albert; Burton, Nicolas; Christiansen, Jeff; Feng, Guanjie; Hill, Bill; Houghton, Derek; Kaufman, Matthew; Rao, Jianguo; Sharpe, James; Ross, Allyson; Stevenson, Peter; Venkataraman, Shanmugasundaram; Waterhouse, Andrew; Yang, Yiya; Davidson, Duncan R

    2003-01-01

    The Edinburgh Mouse Atlas Project (EMAP) is a time-series of mouse-embryo volumetric models. The models provide a context-free spatial framework onto which structural interpretations and experimental data can be mapped. This enables collation, comparison, and query of complex spatial patterns with respect to each other and with respect to known or hypothesized structure. The atlas also includes a time-dependent anatomical ontology and mapping between the ontology and the spatial models in the form of delineated anatomical regions or tissues. The models provide a natural, graphical context for browsing and visualizing complex data. The Edinburgh Mouse Atlas Gene-Expression Database (EMAGE) is one of the first applications of the EMAP framework and provides a spatially mapped gene-expression database with associated tools for data mapping, submission, and query. In this article, we describe the underlying principles of the Atlas and the gene-expression database, and provide a practical introduction to the use of the EMAP and EMAGE tools, including use of new techniques for whole body gene-expression data capture and mapping.

  16. A relational database in neurosurgery.

    PubMed

    Sicurello, F; Marchetti, M R; Cazzaniga, P

    1995-01-01

    This paper describes the automatic procedure for clinical record management in a neurosurgery ward. The automated record allows the storage, querying and effective management of clinical data. This is useful during the patient stay and also for data processing and analysis aimed at clinical research and statistical studies. The clinical record is problem-oriented. It contains a minimum data set regarding every patient and a data set which is defined by a classification nomenclature (using an inner protocol). The main parts of the clinical record are the following tables: PERSONAL DATA: contains the fields relating to personal and admission data of the patient. The compilation of some fields is compulsory because they serve as input for the automated discharge letter. This table is used as an identifier for patient retrieval. ANAMNESIS: composed of five different tables according to the kind of data: familiar anamnesis, physiological anamnesis, past and next pathology anamnesis, and trauma anamnesis. GENERAL OBJECTIVITY: contains the general physical information of a patient. The fields hold default values, which quickens compilation and assures the recording of normal values. NEUROLOGICAL EXAMINATION: contains information about the neurological status of the patient. In this table, too, the fields have default values. COMA: contains standardized data and classifications. The multiple choices are automated and guided and belong to homogeneous classes. SURGICAL OPERATIONS: the information is recorded by first defining the general kind of operation and then the specific kind of operation. INSTRUMENTAL EXAMINATIONS: some examination results are recorded in a free structure, while others (TAC, etc.) follow a codified structure. In order to identify a pathology by means of TAC, it is enough to record three values corresponding to three variables; this classification fully describes many neurosurgical pathologies. DISCHARGE: contains conclusions, therapies, result, and hospital course. Medical language is close to natural language and presents some ambiguities; in order to solve this problem, a classification nomenclature was used for diagnosis definition. DISCHARGE LETTER: the document given to the patient on discharge. It extracts data from the previously described modules and contains standard headings. The information stored in the database is structured (e.g., diagnosis, name, surname, etc.) and access to this data takes place when the user wants to search the database, using queries in which the identifying data of a patient are given as conditions for the search (SELECT age, name WHERE diagnosis="TRAUMA"). The logical operators and relational algebra of the relational DBMS allow more complex queries ((diagnosis="TRAUMA" AND age="19") OR sex="M"). The queries are deterministic, because data management uses a classification nomenclature. Data retrieval takes place through matching, and the DBMS answers the queries directly. Information retrieval speed depends upon the kind of system used; in our case retrieval time is low because few disk accesses are needed even for large databases. In medicine, clinical records can have a hierarchical structure and/or a relational one. Nevertheless, the hierarchical model has a disadvantage: it is not very flexible, because it is linked to a pre-defined structure; the definition of paths is established at the beginning and not during execution. Thus, a better representation of the system at a logical level requires a relational DBMS which exploits the relationships between entities in a vertical and horizontal way. That is why the developers adopted a mixed strategy which exploits the advantages of both models and which is provided by M Technology with SQL language (M/SQL). For the future, it is important to have multimedia technologies at one's disposal, which integrate different kinds of information.
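
    The query style quoted above can be made concrete with standard SQL. The sketch below uses a hypothetical minimal schema (not the ward's actual table design) to make both example queries runnable.

      # Runnable versions of the abstract's example queries, against an
      # invented minimal schema in SQLite; the real system used M/SQL.
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE clinical_record "
                   "(name TEXT, age INTEGER, sex TEXT, diagnosis TEXT)")
      conn.executemany(
          "INSERT INTO clinical_record VALUES (?, ?, ?, ?)",
          [("Rossi", 19, "M", "TRAUMA"), ("Bianchi", 54, "F", "TUMOR")],
      )

      # SELECT age, name WHERE diagnosis = 'TRAUMA'
      print(conn.execute("SELECT age, name FROM clinical_record "
                         "WHERE diagnosis = 'TRAUMA'").fetchall())

      # (diagnosis = 'TRAUMA' AND age = 19) OR sex = 'M'
      print(conn.execute("SELECT name FROM clinical_record WHERE "
                         "(diagnosis = 'TRAUMA' AND age = 19) OR sex = 'M'")
            .fetchall())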

  17. SOCR data dashboard: an integrated big data archive mashing medicare, labor, census and econometric information.

    PubMed

    Husain, Syed S; Kalinin, Alexandr; Truong, Anh; Dinov, Ivo D

    Intuitive formulation of informative and computationally-efficient queries on big and complex datasets presents a number of challenges. As data collection becomes increasingly streamlined and ubiquitous, data exploration, discovery and analytics get considerably harder. Exploratory querying of heterogeneous and multi-source information is both difficult and necessary to advance our knowledge about the world around us. We developed a mechanism to integrate dispersed multi-source data and serve the mashed information via human and machine interfaces in a secure, scalable manner. This process facilitates the exploration of subtle associations between variables, population strata, or clusters of data elements, which may be opaque to standard independent inspection of the individual sources. This new platform includes a device-agnostic tool (Dashboard webapp, http://socr.umich.edu/HTML5/Dashboard/) for graphically querying, navigating and exploring the multivariate associations in complex heterogeneous datasets. The paper illustrates this core functionality and service-oriented infrastructure using healthcare data (e.g., US data from the 2010 Census, Demographic and Economic surveys, Bureau of Labor Statistics, and Center for Medicare Services) as well as Parkinson's disease neuroimaging data. Both the back-end data archive and the front-end dashboard interfaces are continuously expanded to include additional data elements and new ways to customize the human and machine interactions. A client-side data import utility allows for easy and intuitive integration of user-supplied datasets. This completely open-science framework may be used for exploratory analytics, confirmatory analyses, meta-analyses, and education and training purposes in a wide variety of fields.

  18. Muscle Logic: New Knowledge Resource for Anatomy Enables Comprehensive Searches of the Literature on the Feeding Muscles of Mammals

    PubMed Central

    Druzinsky, Robert E.; Balhoff, James P.; Crompton, Alfred W.; Done, James; German, Rebecca Z.; Haendel, Melissa A.; Herrel, Anthony; Herring, Susan W.; Lapp, Hilmar; Mabee, Paula M.; Muller, Hans-Michael; Mungall, Christopher J.; Sternberg, Paul W.; Van Auken, Kimberly; Vinyard, Christopher J.; Williams, Susan H.; Wall, Christine E.

    2016-01-01

    Background: In recent years large bibliographic databases have made much of the published literature of biology available for searches. However, the capabilities of the search engines integrated into these databases for text-based bibliographic searches are limited. To enable searches that deliver the results expected by comparative anatomists, an underlying logical structure known as an ontology is required. Development and Testing of the Ontology: Here we present the Mammalian Feeding Muscle Ontology (MFMO), a multi-species ontology focused on anatomical structures that participate in feeding and other oral/pharyngeal behaviors. A unique feature of the MFMO is that a simple, computable, definition of each muscle, which includes its attachments and innervation, is true across mammals. This construction mirrors the logical foundation of comparative anatomy and permits searches using language familiar to biologists. Further, it provides a template for muscles that will be useful in extending any anatomy ontology. The MFMO is developed to support the Feeding Experiments End-User Database Project (FEED, https://feedexp.org/), a publicly-available, online repository for physiological data collected from in vivo studies of feeding (e.g., mastication, biting, swallowing) in mammals. Currently the MFMO is integrated into FEED and also into two literature-specific implementations of Textpresso, a text-mining system that facilitates powerful searches of a corpus of scientific publications. We evaluate the MFMO by asking questions that test the ability of the ontology to return appropriate answers (competency questions). We compare the results of queries of the MFMO to results from similar searches in PubMed and Google Scholar. Results and Significance: Our tests demonstrate that the MFMO is competent to answer queries formed in the common language of comparative anatomy, but PubMed and Google Scholar are not. Overall, our results show that by incorporating anatomical ontologies into searches, an expanded and anatomically comprehensive set of results can be obtained. The broader scientific and publishing communities should consider taking up the challenge of semantically enabled search capabilities. PMID:26870952

  19. Unleashing spatially distributed ecohydrology modeling using Big Data tools

    NASA Astrophysics Data System (ADS)

    Miles, B.; Idaszak, R.

    2015-12-01

    Physically based spatially distributed ecohydrology models are useful for answering science and management questions related to the hydrology and biogeochemistry of prairie, savanna, forested, as well as urbanized ecosystems. However, these models can produce hundreds of gigabytes of spatial output for a single model run over decadal time scales when run at regional spatial scales and moderate spatial resolutions (~100-km2+ at 30-m spatial resolution) or when run for small watersheds at high spatial resolutions (~1-km2 at 3-m spatial resolution). Numerical data formats such as HDF5 can store arbitrarily large datasets. However even in HPC environments, there are practical limits on the size of single files that can be stored and reliably backed up. Even when such large datasets can be stored, querying and analyzing these data can suffer from poor performance due to memory limitations and I/O bottlenecks, for example on single workstations where memory and bandwidth are limited, or in HPC environments where data are stored separately from computational nodes. The difficulty of storing and analyzing spatial data from ecohydrology models limits our ability to harness these powerful tools. Big Data tools such as distributed databases have the potential to surmount the data storage and analysis challenges inherent to large spatial datasets. Distributed databases solve these problems by storing data close to computational nodes while enabling horizontal scalability and fault tolerance. Here we present the architecture of and preliminary results from PatchDB, a distributed datastore for managing spatial output from the Regional Hydro-Ecological Simulation System (RHESSys). The initial version of PatchDB uses message queueing to asynchronously write RHESSys model output to an Apache Cassandra cluster. Once stored in the cluster, these data can be efficiently queried to quickly produce both spatial visualizations for a particular variable (e.g. maps and animations), as well as point time series of arbitrary variables at arbitrary points in space within a watershed or river basin. By treating ecohydrology modeling as a Big Data problem, we hope to provide a platform for answering transformative science and management questions related to water quantity and quality in a world of non-stationary climate.
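
    The write path described above (model output queued and drained asynchronously into Apache Cassandra) might look like the following sketch. The keyspace, table and column names are invented; the actual PatchDB schema is not described in the record.

      # Asynchronous write path into Cassandra (pip install cassandra-driver).
      import queue
      import threading
      from cassandra.cluster import Cluster

      out_q = queue.Queue()

      def writer():
          # Keyspace "patchdb" and table "patch_output" are hypothetical.
          session = Cluster(["127.0.0.1"]).connect("patchdb")
          insert = session.prepare(
              "INSERT INTO patch_output (patch_id, ts, variable, value) "
              "VALUES (?, ?, ?, ?)")
          while True:
              row = out_q.get()
              if row is None:      # sentinel: shut the writer down
                  break
              session.execute(insert, row)

      t = threading.Thread(target=writer, daemon=True)
      t.start()
      # The model loop only enqueues rows; it never blocks on the database.
      out_q.put((42, 20150101, "streamflow", 0.37))
      out_q.put(None)
      t.join()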

  20. Binary culture of microalgae as an integrated approach for enhanced biomass and metabolites productivity, wastewater treatment, and bioflocculation.

    PubMed

    Rashid, Naim; Park, Won-Kun; Selvaratnam, Thinesh

    2018-03-01

    Ecological studies of microalgae have revealed their potential to co-exist in the natural environment, providing evidence of symbiotic relationships between microalgae and other microorganisms. The symbiotic potential of microalgae carries distinct advantages, providing a venue for their scale-up applications. The deployment of large-scale microalgae applications is limited by technical challenges such as slow growth rates, low metabolite yields, and a high risk of biomass contamination by unwanted bacteria. These challenges can, however, be overcome by exploiting the symbiotic potential of microalgae. In a symbiotic system, photosynthetic microalgae co-exist with bacteria, fungi, and heterotrophic microalgae. In this consortium, they can exchange nutrients and metabolites, transfer genes, and interact with each other through complex metabolic mechanisms. Microalgae in this system, termed a binary culture, are reported to exhibit high growth rates, enhanced bio-flocculation, and biochemical productivity without experiencing contamination. Binary culture also offers interesting applications in other biotechnological processes, including bioremediation, wastewater treatment, and the production of high-value metabolites. The focus of this study is to provide a perspective that enhances the understanding of microalgae binary culture. In this review, the mechanism of binary culture, its potential, and its limitations are briefly discussed. A number of queries have emerged from this study which need to be answered by future research to assess the real potential of binary culture. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Expert system for maintenance management of a boiling water reactor power plant

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hong Shen; Liou, L.W.; Levine, S.

    1992-01-01

    An expert system code has been developed for the maintenance of two boiling water reactor units in Berwick, Pennsylvania, that are operated by the Pennsylvania Power and Light Company (PP and L). The objective of this expert system code, where the knowledge of experienced operators and engineers is captured and implemented, is to support the decisions regarding which components can be safely and reliably removed from service for maintenance. It can also serve as a query-answering facility for checking the plant system status and for training purposes. The operating and maintenance information of a large number of support systems, which must be available for emergencies and/or in the event of an accident, is stored in the database of the code. It identifies the relevant technical specifications and management rules for shutting down any one of the systems or removing a component from service to support maintenance. Because of the complexity and time needed to incorporate a large number of systems and their components, the first phase of the expert system develops a prototype code, which includes only the reactor core isolation coolant system, the high-pressure core injection system, the instrument air system, the service water system, and the plant electrical system. The next phase is scheduled to expand the code to include all other systems. This paper summarizes the prototype code and the design concept of the complete expert system code for maintenance management of all plant systems and components.

  2. A Hybrid Approach to Finding Relevant Social Media Content for Complex Domain Specific Information Needs.

    PubMed

    Cameron, Delroy; Sheth, Amit P; Jaykumar, Nishita; Thirunarayan, Krishnaprasad; Anand, Gaurish; Smith, Gary A

    2014-12-01

    While contemporary semantic search systems offer to improve classical keyword-based search, they are not always adequate for complex domain specific information needs. The domain of prescription drug abuse, for example, requires knowledge of both ontological concepts and "intelligible constructs" not typically modeled in ontologies. These intelligible constructs convey essential information that include notions of intensity, frequency, interval, dosage and sentiments, which could be important to the holistic needs of the information seeker. In this paper, we present a hybrid approach to domain specific information retrieval that integrates ontology-driven query interpretation with synonym-based query expansion and domain specific rules, to facilitate search in social media on prescription drug abuse. Our framework is based on a context-free grammar (CFG) that defines the query language of constructs interpretable by the search system. The grammar provides two levels of semantic interpretation: 1) a top-level CFG that facilitates retrieval of diverse textual patterns, which belong to broad templates and 2) a low-level CFG that enables interpretation of specific expressions belonging to such textual patterns. These low-level expressions occur as concepts from four different categories of data: 1) ontological concepts, 2) concepts in lexicons (such as emotions and sentiments), 3) concepts in lexicons with only partial ontology representation, called lexico-ontology concepts (such as side effects and routes of administration (ROA)), and 4) domain specific expressions (such as date, time, interval, frequency and dosage) derived solely through rules. Our approach is embodied in a novel Semantic Web platform called PREDOSE, which provides search support for complex domain specific information needs in prescription drug abuse epidemiology. When applied to a corpus of over 1 million drug abuse-related web forum posts, our search framework proved effective in retrieving relevant documents when compared with three existing search systems.

  3. A Hybrid Approach to Finding Relevant Social Media Content for Complex Domain Specific Information Needs

    PubMed Central

    Cameron, Delroy; Sheth, Amit P.; Jaykumar, Nishita; Thirunarayan, Krishnaprasad; Anand, Gaurish; Smith, Gary A.

    2015-01-01

    While contemporary semantic search systems offer to improve classical keyword-based search, they are not always adequate for complex domain specific information needs. The domain of prescription drug abuse, for example, requires knowledge of both ontological concepts and “intelligible constructs” not typically modeled in ontologies. These intelligible constructs convey essential information that include notions of intensity, frequency, interval, dosage and sentiments, which could be important to the holistic needs of the information seeker. In this paper, we present a hybrid approach to domain specific information retrieval that integrates ontology-driven query interpretation with synonym-based query expansion and domain specific rules, to facilitate search in social media on prescription drug abuse. Our framework is based on a context-free grammar (CFG) that defines the query language of constructs interpretable by the search system. The grammar provides two levels of semantic interpretation: 1) a top-level CFG that facilitates retrieval of diverse textual patterns, which belong to broad templates and 2) a low-level CFG that enables interpretation of specific expressions belonging to such textual patterns. These low-level expressions occur as concepts from four different categories of data: 1) ontological concepts, 2) concepts in lexicons (such as emotions and sentiments), 3) concepts in lexicons with only partial ontology representation, called lexico-ontology concepts (such as side effects and routes of administration (ROA)), and 4) domain specific expressions (such as date, time, interval, frequency and dosage) derived solely through rules. Our approach is embodied in a novel Semantic Web platform called PREDOSE, which provides search support for complex domain specific information needs in prescription drug abuse epidemiology. When applied to a corpus of over 1 million drug abuse-related web forum posts, our search framework proved effective in retrieving relevant documents when compared with three existing search systems. PMID:25814917
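
    The two-level grammar idea in the PREDOSE records above can be illustrated with a toy context-free grammar: a top-level rule matches a broad textual template, while low-level rules interpret specific expressions (drug, dosage, route of administration). All rule names and the tiny lexicons below are invented for illustration.

      # Toy two-level CFG sketch (pip install lark).
      from lark import Lark

      grammar = r"""
          post: template+
          template: drug "at" dosage   -> drug_dosage
                  | drug "via" roa     -> drug_roa
          drug: "oxycodone" | "buprenorphine"
          dosage: NUMBER UNIT
          roa: "oral" | "insufflation"
          UNIT: "mg" | "ug"
          %import common.NUMBER
          %import common.WS
          %ignore WS
      """

      parser = Lark(grammar, start="post")
      print(parser.parse("oxycodone at 30 mg").pretty())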

  4. Memetic Algorithms, Domain Knowledge, and Financial Investing

    ERIC Educational Resources Information Center

    Du, Jie

    2012-01-01

    While the question of how to use human knowledge to guide evolutionary search is long-recognized, much remains to be done to answer this question adequately. This dissertation aims to further answer this question by exploring the role of domain knowledge in evolutionary computation as applied to real-world, complex problems, such as financial…

  5. Questions and Answers Explaining the New Tax Rules Applicable to Tax-Sheltered Annuities.

    ERIC Educational Resources Information Center

    Gordon, David E.; Spuehler, Donald R.

    1991-01-01

    The Tax Reform Act of 1986 and subsequent legislation have radically altered the rules needed to maintain favorable tax status of tax-sheltered annuity plans for college employees. Application of the new rules is complex. Critical questions facing institutions and organizations are answered, and potential liabilities facing educational employers…

  6. Query-oriented evidence extraction to support evidence-based medicine practice.

    PubMed

    Sarker, Abeed; Mollá, Diego; Paris, Cecile

    2016-02-01

    Evidence-based medicine practice requires medical practitioners to rely on the best available evidence, in addition to their expertise, when making clinical decisions. The medical domain boasts a large amount of published medical research data, indexed in various medical databases such as MEDLINE. As the size of this data grows, practitioners increasingly face the problem of information overload, and past research has established the time-associated obstacles faced by evidence-based medicine practitioners. In this paper, we focus on the problem of automatic text summarisation to help practitioners quickly find query-focused information from relevant documents. We utilise an annotated corpus that is specialised for the task of evidence-based summarisation of text. In contrast to past summarisation approaches, which mostly rely on surface level features to identify salient pieces of texts that form the summaries, our approach focuses on the use of corpus-based statistics, and domain-specific lexical knowledge for the identification of summary contents. We also apply a target-sentence-specific summarisation technique that reduces the problem of underfitting that persists in generic summarisation models. In automatic evaluations run over a large number of annotated summaries, our extractive summarisation technique statistically outperforms various baseline and benchmark summarisation models with a percentile rank of 96.8%. A manual evaluation shows that our extractive summarisation approach is capable of selecting content with high recall and precision, and may thus be used to generate bottom-line answers to practitioners' queries. Our research shows that the incorporation of specialised data and domain-specific knowledge can significantly improve text summarisation performance in the medical domain. Due to the vast amounts of medical text available, and the high growth of this form of data, we suspect that such summarisation techniques will address the time-related obstacles associated with evidence-based medicine. Copyright © 2015 Elsevier Inc. All rights reserved.
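
    As a generic illustration of query-focused extractive summarisation (not the authors' actual statistics, lexicons or target-sentence-specific technique), sentences can be scored by overlap with the query plus a bonus for domain-lexicon terms, then the top scorers extracted:

      # Generic extractive-scoring sketch; the weights, query terms and
      # domain lexicon are stand-ins for the specialised resources used
      # in the paper.
      def summarise(sentences, query_terms, domain_terms, k=2):
          def score(s):
              words = set(s.lower().replace(".", "").split())
              return 2 * len(words & query_terms) + len(words & domain_terms)
          return sorted(sentences, key=score, reverse=True)[:k]

      sents = [
          "Beta blockers reduced mortality in the trial.",
          "The study enrolled 400 patients.",
          "Adverse effects of beta blockers were mild.",
      ]
      print(summarise(sents, {"beta", "blockers"}, {"mortality", "adverse"}))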

  7. Observation Data Model Core Components, its Implementation in the Table Access Protocol Version 1.1

    NASA Astrophysics Data System (ADS)

    Louys, Mireille; Tody, Doug; Dowler, Patrick; Durand, Daniel; Michel, Laurent; Bonnarel, François; Micol, Alberto; IVOA DataModel Working Group

    2017-05-01

    This document defines the core components of the Observation data model that are necessary to perform data discovery when querying data centers for astronomical observations of interest. It presents the use cases to be supported, explains the model and provides guidelines for its implementation as a data access service based on the Table Access Protocol (TAP). It aims to provide a simple model that is easy to understand and implement by data providers who wish to publish their data into the Virtual Observatory. This interface integrates data modeling and data access aspects in a single service and is named ObsTAP. It will be referenced as such in the IVOA registries. In this document, the Observation Data Model Core Components (ObsCoreDM) defines the core components of queryable metadata required for global discovery of observational data. It is meant to allow a single query to be posed to TAP services at multiple sites to perform global data discovery without having to understand the details of the services present at each site. It defines a minimal set of basic metadata and thus allows for a reasonable cost of implementation by data providers. The combination of the ObsCoreDM with TAP is referred to as an ObsTAP service. As with most of the VO Data Models, ObsCoreDM makes use of STC, Utypes, Units and UCDs. The ObsCoreDM can be serialized as a VOTable. ObsCoreDM can make reference to more complete data models such as the Characterisation DM, Spectrum DM or Simple Spectral Line Data Model (SSLDM). ObsCore shares a large set of common concepts with the DataSet Metadata Data Model (Cresitello-Dittmar et al. 2016), which binds together most of the data model concepts from the above models in a comprehensive and more general framework. The current specification, by contrast, provides guidelines for implementing these concepts using the TAP protocol and answering ADQL queries; it is dedicated to global discovery.
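
    An ObsTAP discovery query is an ordinary ADQL query against the ivoa.obscore table. The sketch below uses the pyvo client; the table and column names come from the ObsCore specification, but the endpoint URL is a placeholder.

      # ObsCore discovery via TAP (pip install pyvo).
      import pyvo

      tap = pyvo.dal.TAPService("https://example.org/tap")  # placeholder
      adql = """
          SELECT TOP 10 obs_collection, dataproduct_type, access_url
          FROM ivoa.obscore
          WHERE dataproduct_type = 'image'
            AND CONTAINS(POINT('ICRS', s_ra, s_dec),
                         CIRCLE('ICRS', 202.48, 47.23, 0.5)) = 1
      """
      for rec in tap.search(adql):
          print(rec["obs_collection"], rec["access_url"])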

  8. Computer assisted data analysis in intensive care: the ICDEV project--development of a scientific database system for intensive care (Intensive Care Data Evaluation Project).

    PubMed

    Metnitz, P G; Laback, P; Popow, C; Laback, O; Lenz, K; Hiesmayr, M

    1995-01-01

    Patient Data Management Systems (PDMS) for ICUs collect, present and store clinical data. Various purposes, such as quality control and scientific research, make analysis of this digitally stored data desirable. The aim of the Intensive Care Data Evaluation project (ICDEV) was to provide a database tool for the analysis of data recorded at various ICUs of the University Clinics of Vienna (General Hospital of Vienna), where two different PDMSs are in use: CareVue 9000 (Hewlett Packard, Andover, USA) at two ICUs (one medical and one neonatal) and PICIS Chart+ (PICIS, Paris, France) at one cardiothoracic ICU. CONCEPT AND METHODS: Clinically oriented analysis of the data collected in a PDMS at an ICU was the starting point of the development. After defining the database structure, we established a client-server based database system under Microsoft Windows NT and developed a user-friendly data-querying application using Microsoft Visual C++ and Visual Basic. ICDEV was successfully installed at three different ICUs; adjustments to the different PDMS configurations were done within a few days. The database structure we developed enables a powerful query concept, an 'EXPERT QUESTION COMPILER' that may help answer almost any clinical question. Several program modules facilitate queries at the patient, group and unit level. Results of ICDEV queries are automatically transferred to Microsoft Excel for display (in the form of configurable tables and graphs) and further processing. The ICDEV concept is configurable for adjustment to different intensive care information systems and can be used to support computerized quality control. However, as long as no sufficient artifact-recognition or data-validation software for automatically recorded patient data exists, the reliability of these data and their usage for computer-assisted quality control remain unclear and should be studied further.

  9. Query Auto-Completion Based on Word2vec Semantic Similarity

    NASA Astrophysics Data System (ADS)

    Shao, Taihua; Chen, Honghui; Chen, Wanyu

    2018-04-01

    Query auto-completion (QAC) is the first step of information retrieval: it helps users formulate an entire query after inputting only a few prefix characters. Traditional QAC models ignore the contribution of semantic relevance between queries, even though similar queries usually express very similar search intentions. In this paper, we propose a hybrid model, FS-QAC, based on query semantic similarity as well as query frequency. We choose the word2vec method to measure the semantic similarity between intended queries and pre-submitted queries. By combining both features, our experiments show that the FS-QAC model improves performance when predicting the user's query intention and helping formulate the right query. Our experimental results show that the optimal hybrid model yields a 7.54% improvement in terms of MRR against a state-of-the-art baseline using the public AOL query logs.
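
    A hybrid frequency-plus-similarity score of the kind FS-QAC describes can be sketched with gensim's word2vec. The mixing weight, the tiny query log and the mean-vector query representation are illustrative choices, not the paper's exact formulation.

      # Hybrid QAC ranking sketch (pip install gensim numpy).
      import numpy as np
      from gensim.models import Word2Vec

      log = ["cheap flights to paris", "cheap hotels in paris",
             "cheap flights to rome"]
      model = Word2Vec([q.split() for q in log],
                       vector_size=50, min_count=1, seed=1)

      def vec(q):                       # mean word vector as query vector
          return np.mean([model.wv[w] for w in q.split()], axis=0)

      def cosine(a, b):
          return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

      def fs_score(candidate, context, freq, lam=0.5):
          return lam * freq + (1 - lam) * cosine(vec(candidate), vec(context))

      # Rank two completions given the user's previous query as context.
      for cand, freq in [("cheap flights to paris", 0.6),
                         ("cheap flights to rome", 0.4)]:
          print(cand, round(fs_score(cand, "cheap hotels in paris", freq), 3))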

  10. Declarative Programming with Temporal Constraints, in the Language CG.

    PubMed

    Negreanu, Lorina

    2015-01-01

    Specifying and interpreting temporal constraints are key elements of knowledge representation and reasoning, with applications in temporal databases, agent programming, and ambient intelligence. We present and formally characterize the language CG, which tackles this issue. In CG, users are able to develop time-dependent programs, in a flexible and straightforward manner. Such programs can, in turn, be coupled with evolving environments, thus empowering users to control the environment's evolution. CG relies on a structure for storing temporal information, together with a dedicated query mechanism. Hence, we explore the computational complexity of our query satisfaction problem. We discuss previous implementation attempts of CG and introduce a novel prototype which relies on logic programming. Finally, we address the issue of consistency and correctness of CG program execution, using the Event-B modeling approach.

  11. Using Semantic Web Technologies for Cohort Identification from Electronic Health Records for Clinical Research

    PubMed Central

    Pathak, Jyotishman; Kiefer, Richard C.; Chute, Christopher G.

    2012-01-01

    The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. One of the key requirements to perform GWAS is the identification of subject cohorts with accurate classification of disease phenotypes. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical data stored in electronic health records (EHRs) to accurately identify subjects with specific diseases for inclusion in cohort studies. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR data and enabling federated querying and inferencing via standardized Web protocols for identifying subjects with Diabetes Mellitus. Our study highlights the potential of using Web-scale data federation approaches to execute complex queries. PMID:22779040
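
    The RDF-plus-SPARQL approach can be sketched with rdflib. The EHR vocabulary below is hypothetical, standing in for the standard terminology bindings a real deployment would use.

      # Cohort identification sketch over RDF (pip install rdflib).
      from rdflib import Graph, Namespace, RDF

      EX = Namespace("http://example.org/ehr#")   # invented vocabulary
      g = Graph()
      g.add((EX.patient1, RDF.type, EX.Patient))
      g.add((EX.patient1, EX.hasDiagnosis, EX.DiabetesMellitus))
      g.add((EX.patient2, RDF.type, EX.Patient))

      cohort = g.query("""
          PREFIX ex: <http://example.org/ehr#>
          SELECT ?p
          WHERE { ?p a ex:Patient ; ex:hasDiagnosis ex:DiabetesMellitus . }
      """)
      for row in cohort:
          print(row.p)    # -> http://example.org/ehr#patient1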

  12. On-Demand Associative Cross-Language Information Retrieval

    NASA Astrophysics Data System (ADS)

    Geraldo, André Pinto; Moreira, Viviane P.; Gonçalves, Marcos A.

    This paper proposes the use of algorithms for mining association rules as an approach to Cross-Language Information Retrieval. These algorithms have been widely used to analyse market-basket data. The idea is to map the problem of finding associations between sales items onto the problem of finding term translations over a parallel corpus. The proposal was validated by means of experiments using queries in two distinct languages, Portuguese and Finnish, to retrieve documents in English. The results show that the performance of our proposed approach is comparable to that of the monolingual baseline and of query translation via machine translation, even though those systems employ more complex Natural Language Processing techniques. The combination of machine translation with our approach yielded the best results, even outperforming the monolingual baseline.
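
    The market-basket mapping can be made concrete: each aligned document pair is a transaction whose items are source- and target-language terms, and the confidence of a rule s -> t suggests that t translates s. The toy parallel corpus and threshold below are invented.

      # Association-rule sketch for term translation over a toy corpus.
      from collections import Counter

      pairs = [({"casa", "grande"}, {"big", "house"}),
               ({"casa", "branca"}, {"white", "house"})]

      cooc, occ = Counter(), Counter()
      for src, tgt in pairs:
          for s in src:
              occ[s] += 1
              for t in tgt:
                  cooc[(s, t)] += 1

      for (s, t), n in sorted(cooc.items()):
          confidence = n / occ[s]          # confidence of rule s -> t
          if confidence >= 1.0:            # real use needs support cut-offs too
              print(f"{s} -> {t} (confidence {confidence:.2f})")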

  13. ExplorEnz: the primary source of the IUBMB enzyme list

    PubMed Central

    McDonald, Andrew G.; Boyce, Sinéad; Tipton, Keith F.

    2009-01-01

    ExplorEnz is the MySQL database that is used for the curation and dissemination of the International Union of Biochemistry and Molecular Biology (IUBMB) Enzyme Nomenclature. A simple web-based query interface is provided, along with an advanced search engine for more complex Boolean queries. The WWW front-end is accessible at http://www.enzyme-database.org, from where downloads of the database as SQL and XML are also available. An associated form-based curatorial application has been developed to facilitate the curation of enzyme data as well as the internal and public review processes that occur before an enzyme entry is made official. Suggestions for new enzyme entries, or modifications to existing ones, can be made using the forms provided at http://www.enzyme-database.org/forms.php. PMID:18776214

  14. PropBase Query Layer: a single portal to UK subsurface physical property databases

    NASA Astrophysics Data System (ADS)

    Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham

    2013-04-01

    Until recently, the delivery of geological information to industry and the public was achieved through geological mapping. Pervasively available computers now mean that 3D geological models can deliver realistic representations of the geometric location of geological units, represented as shells or volumes. The next phase of this process is to populate these models with physical property data that describe subsurface heterogeneity and its associated uncertainty. Achieving this requires the capture and serving of physical, hydrological and other property information from diverse sources. The British Geological Survey (BGS) holds large volumes of subsurface property data, derived both from its own research data collection and from other, often commercially derived, data sources. These can be voxelated to incorporate the data into the models and demonstrate property variation within the subsurface geometry. All property data held by BGS have for many years been stored in relational databases to ensure their long-term continuity. These databases have, by necessity, complex structures; each contains positional reference data and model information, as well as metadata such as sample identification information and attributes that define the source and processing. While this is critical to assessing the analyses, it also hugely complicates the understanding of variability of the property under assessment and requires multiple queries to study related datasets, making it difficult to extract physical properties from these databases. The PropBase Query Layer has therefore been created to allow simplified aggregation and extraction of all related data, presenting complex data in simple, mostly denormalized, tables which combine information from multiple databases into a single system. The structure of each relational database is denormalized into a generalised structure, so that each dataset can be viewed in a common format using a simple interface. Data are re-engineered to facilitate easy loading. The query layer structure comprises tables, procedures, functions, triggers, views and materialised views. The structure contains a main table, PRB_DATA, which holds all of the data with the following attribution:
    • a unique identifier
    • the data source
    • the unique identifier from the parent database, for traceability
    • the 3D location
    • the property type
    • the property value
    • the units
    • necessary qualifiers
    • precision information and an audit trail
    Data sources, property types and units are constrained by dictionaries, a key component of the structure, which define what properties and inheritance hierarchies are to be coded and guide what is extracted from the structure and how. Data types served by the Query Layer include site-investigation-derived geotechnical data, hydrogeology datasets, regional geochemistry, geophysical logs, and lithological and borehole metadata. The size and complexity of the datasets, with multiple parent structures, require a technically robust approach to keep the layer synchronised. This is achieved through Oracle procedures written in PL/SQL containing the logic required to carry out the data manipulation (inserts, updates, deletes) that keeps the layer synchronised with the underlying databases, either as regularly scheduled jobs (weekly, monthly, etc.) or invoked on demand. The PropBase Query Layer's implementation has enabled rapid data discovery, visualisation and interpretation of geological data with greater ease, simplifying the parametrisation of 3D model volumes and facilitating the study of intra-unit heterogeneity.
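
    A denormalized table of the shape described for PRB_DATA might look like the following sketch; the column names and types are inferred from the attribute list above and are not the actual BGS schema.

      # Hypothetical PRB_DATA shape in SQLite; the production structure is
      # an Oracle schema kept in sync by PL/SQL procedures, not shown here.
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.execute("""
          CREATE TABLE prb_data (
              id            INTEGER PRIMARY KEY,  -- unique identifier
              data_source   TEXT,                 -- dictionary-constrained
              parent_id     TEXT,                 -- id in the parent database
              x REAL, y REAL, z REAL,             -- 3D location
              property_type TEXT,                 -- dictionary-constrained
              value REAL,
              units TEXT,
              qualifier TEXT,
              precision_info TEXT,
              audit_trail TEXT)""")
      conn.execute(
          "INSERT INTO prb_data VALUES (1, 'geotechnical', 'BH123/7', "
          "432100.0, 381200.0, -12.5, 'porosity', 0.21, 'fraction', "
          "NULL, '+/-0.02', 'loaded 2013-01-15')")
      print(conn.execute(
          "SELECT property_type, value, units FROM prb_data").fetchall())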

  15. EquiX-A Search and Query Language for XML.

    ERIC Educational Resources Information Center

    Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander

    2002-01-01

    Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)

  16. Spatial and symbolic queries for 3D image data

    NASA Astrophysics Data System (ADS)

    Benson, Daniel C.; Zick, Gregory L.

    1992-04-01

    We present a query system for an object-oriented biomedical imaging database containing 3-D anatomical structures and their corresponding 2-D images. The graphical interface facilitates the formation of spatial queries, nonspatial or symbolic queries, and combined spatial/symbolic queries. A query editor is used for the creation and manipulation of 3-D query objects as volumes, surfaces, lines, and points. Symbolic predicates are formulated through a combination of text fields and multiple choice selections. Query results, which may include images, image contents, composite objects, graphics, and alphanumeric data, are displayed in multiple views. Objects returned by the query may be selected directly within the views for further inspection or modification, or for use as query objects in subsequent queries. Our image database query system provides visual feedback and manipulation of spatial query objects, multiple views of volume data, and the ability to combine spatial and symbolic queries. The system allows for incremental enhancement of existing objects and the addition of new objects and spatial relationships. The query system is designed for databases containing symbolic and spatial data. This paper discusses its application to data acquired in biomedical 3-D image reconstruction, but it is applicable to other areas such as CAD/CAM, geographical information systems, and computer vision.

  17. GenoQuery: a new querying module for functional annotation in a genomic warehouse

    PubMed Central

    Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine

    2008-01-01

    Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improve the functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731

  18. BioCarian: search engine for exploratory searches in heterogeneous biological databases.

    PubMed

    Zaki, Nazar; Tennakoon, Chandana

    2017-10-02

    A large number of biological databases are publicly available to scientists on the web, and many private databases are generated in the course of research projects. These databases come in a wide variety of formats. Web standards have evolved in recent times, and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Integration and querying of biological databases can therefore be facilitated by semantic web techniques: heterogeneous databases can be converted into the Resource Description Framework (RDF) and queried using the SPARQL language. Searching for exact matches in these databases is trivial, but exploratory searches need customized solutions, especially when multiple databases are involved; this process is cumbersome and time-consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form, so we first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets: it allows complex queries to be constructed, and has additional features such as ranking of facet values based on several criteria, visually indicating the relevance of a facet value, and presenting the most important facet values when a large number of choices are available. For advanced users, SPARQL queries can be run directly on the databases; using this feature, users can incorporate federated searches of SPARQL endpoints. We used the search engine to perform an exploratory search on previously published viral integration data and were able to deduce the main conclusions of the original publication. BioCarian is accessible via http://www.biocarian.com. We have developed a search engine to explore RDF databases that can be used by both novice and advanced users.
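
    The tabular-to-RDF conversion and the facet counts that drive the interface can both be sketched with rdflib; the gene/virus table and vocabulary below are invented.

      # Tabular-to-RDF conversion plus a facet-style aggregate query
      # (pip install rdflib).
      from rdflib import Graph, Literal, Namespace, RDF

      EX = Namespace("http://example.org/bio#")   # invented vocabulary
      rows = [("gene1", "virus_A"), ("gene2", "virus_A"), ("gene3", "virus_B")]

      g = Graph()
      for gene, virus in rows:                    # one row becomes two triples
          g.add((EX[gene], RDF.type, EX.IntegrationSite))
          g.add((EX[gene], EX.virus, Literal(virus)))

      # Facet values with counts, as a faceted interface would display them.
      facets = g.query("""
          PREFIX ex: <http://example.org/bio#>
          SELECT ?virus (COUNT(?s) AS ?n)
          WHERE { ?s ex:virus ?virus } GROUP BY ?virus
      """)
      for virus, n in facets:
          print(virus, n)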

  19. Searching for rare diseases in PubMed: a blind comparison of Orphanet expert query and query based on terminological knowledge.

    PubMed

    Griffon, N; Schuers, M; Dhombres, F; Merabti, T; Kerdelhué, G; Rollin, L; Darmoni, S J

    2016-08-02

    Despite international initiatives like Orphanet, it remains difficult to find up-to-date information about rare diseases. The aim of this study is to propose an exhaustive set of queries for PubMed based on terminological knowledge and to evaluate it against the queries based on expertise provided by the most frequently used resource in Europe: Orphanet. Four rare-disease terminologies (MeSH, OMIM, HPO and HRDO) were manually mapped to each other, permitting the automatic creation of expanded terminological queries for rare diseases. For 30 rare diseases, 30 citations retrieved by the Orphanet expert query and/or the query based on terminological knowledge were assessed for relevance by two independent reviewers unaware of the query's origin. An adjudication procedure was used to resolve any discrepancy. Precision, relative recall and F-measure were all computed. For each Orphanet rare disease (n = 8982), there was a corresponding terminological query, in contrast with only 2284 queries provided by Orphanet. Only 553 citations were evaluated, owing to queries with zero or only a few hits. There was no significant difference between the Orpha queries and the terminological queries in terms of precision: 0.61 vs 0.52, respectively (p = 0.13). Nevertheless, terminological queries retrieved more citations more often than Orpha queries (0.57 vs. 0.33; p = 0.01). Interestingly, Orpha queries seemed to retrieve older citations than terminological queries (p < 0.0001). The terminological queries proposed in this study are now available for all rare diseases. They may be a useful tool for both precision- and recall-oriented literature search.
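
    An expanded terminological query is, in essence, a disjunction of mapped terms. The sketch below builds one and submits it with Biopython's Entrez client; the synonym list is an invented stand-in for the mapping output, and a real contact email must be set before use.

      # Building an expanded PubMed query from mapped terminology entries
      # (pip install biopython). The synonym mapping is illustrative only.
      from Bio import Entrez

      Entrez.email = "you@example.org"   # placeholder; required by NCBI

      synonyms = ['"Rett syndrome"', '"RTT"']      # invented mapping output
      query = "(" + " OR ".join(synonyms) + ")"

      handle = Entrez.esearch(db="pubmed", term=query, retmax=10)
      result = Entrez.read(handle)
      print(result["Count"], result["IdList"])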

  20. Fast Query-Optimized Kernel-Machine Classification

    NASA Technical Reports Server (NTRS)

    Mazzoni, Dominic; DeCoste, Dennis

    2004-01-01

    A recently developed algorithm performs kernel-machine classification via incremental approximate nearest support vectors. The algorithm implements support-vector machines (SVMs) at speeds 10 to 100 times those attainable by conventional SVM algorithms, and offers potential benefits for classification of images, recognition of speech, recognition of handwriting, and diverse other applications in which patterns must be discerned in large sets of data. SVMs constitute a subset of kernel machines (KMs), which have become popular as models for machine learning and, more specifically, for automated classification of input data on the basis of labeled training data. While similar in many ways to k-nearest-neighbors (k-NN) models and artificial neural networks (ANNs), SVMs tend to be more accurate. Using representations that scale only linearly in the number of training examples, while exploring nonlinear (kernelized) feature spaces that are exponentially larger than the original input dimensionality, KMs elegantly and practically overcome the classic curse of dimensionality. However, the price one must pay for the power of KMs is that query-time complexity scales linearly with the number of training examples, making KMs often orders of magnitude more computationally expensive than ANNs, decision trees, and other popular machine-learning alternatives. The present algorithm treats an SVM classifier as a special form of a k-NN. It is based partly on the empirical observation that one can often achieve the same classification as that of an exact KM by using only a small fraction of the nearest support vectors (SVs) of a query. The exact KM output is a weighted sum over the kernel values between the query and the SVs; in this algorithm, the KM output is approximated with a k-NN classifier whose output is a weighted sum over only the kernel values involving k selected SVs. Before query time, statistics are gathered about how misleading the output of the k-NN model can be, relative to the outputs of the exact KM for a representative set of examples, for each possible k from 1 to the total number of SVs. From these statistics, upper and lower thresholds are derived for each step k. These thresholds identify output levels for which the particular variant of the k-NN model already leans so strongly positively or negatively that a reversal in sign is unlikely, given the weaker SV neighbors still remaining. At query time, the partial output of each query is incrementally updated, stopping as soon as it exceeds the predetermined statistical thresholds of the current step. For an easy query, stopping can occur as early as step k = 1; for more difficult queries, stopping might not occur until nearly all SVs are touched. A key empirical observation is that this approach can tolerate very approximate nearest-neighbor orderings. In experiments, SVs and queries were projected to a subspace comprising the top few principal-component dimensions, and neighbor orderings were computed in that subspace. This ensured that the overhead of the nearest-neighbor computations was insignificant relative to that of the exact KM computation.
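
    The early-stopping loop described above can be sketched directly. The per-step thresholds are assumed to be precomputed from training statistics, and the support vectors are assumed to be pre-ordered by approximate proximity to the query, as the record describes.

      # Early-stopping approximation of a kernel-machine output.
      import numpy as np

      def rbf(a, b, gamma=1.0):
          return np.exp(-gamma * np.sum((a - b) ** 2))

      def classify(query, svs, weights, lo, hi):
          partial = 0.0
          for k, (sv, w) in enumerate(zip(svs, weights)):
              partial += w * rbf(query, sv)
              # Stop once the partial sum leans so strongly one way that
              # the remaining, weaker SVs are unlikely to flip the sign.
              if partial >= hi[k] or partial <= lo[k]:
                  break
          return 1 if partial >= 0 else -1

      svs = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
      weights = [1.0, -0.5, 0.25]                 # illustrative alphas
      lo, hi = [-0.6, -0.8, -np.inf], [0.9, 0.95, np.inf]
      print(classify(np.array([0.1, 0.1]), svs, weights, lo, hi))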

  1. Testing the Usability of Interactive Visualizations for Complex Problem-Solving: Findings Related to Improving Interfaces and Help.

    ERIC Educational Resources Information Center

    Mirel, Barbara

    2001-01-01

    Conducts a scenario-based usability test with 10 data analysts using visual querying (visually analyzing data with interactive graphics). Details a range of difficulties found in visual selection that, at times, gave rise to inaccurate selections, invalid conclusions, and misguided decisions. Argues that support for visual selection must be built…

  2. Augmenting Oracle Text with the UMLS for enhanced searching of free-text medical reports.

    PubMed

    Ding, Jing; Erdal, Selnur; Dhaval, Rakesh; Kamal, Jyoti

    2007-10-11

    The intrinsic complexity of free-text medical reports imposes great challenges for information retrieval systems. We have developed a prototype search engine for retrieving clinical reports that leverages the powerful indexing and querying capabilities of Oracle Text, and the rich biomedical domain knowledge and semantic structures that are captured in the UMLS Metathesaurus.
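
    One plausible shape for such a system is to expand the user's search term with UMLS synonyms before handing it to Oracle Text's CONTAINS operator, as in the Python sketch below. The table and column names and the toy synonym lookup are hypothetical, not the prototype's actual design.

      # Sketch: UMLS-style synonym expansion feeding an Oracle Text query.
      def umls_synonyms(term: str) -> list[str]:
          # A real system would consult the UMLS Metathesaurus; this toy
          # mapping is illustrative only.
          toy = {"heart attack": ["myocardial infarction", "MI", "cardiac infarction"]}
          return toy.get(term.lower(), [])

      def expanded_contains_query(term: str) -> str:
          variants = [term] + umls_synonyms(term)
          # OR the variants inside a single CONTAINS predicate; braces
          # escape multi-word phrases in Oracle Text syntax.
          expansion = " OR ".join(f"{{{v}}}" for v in variants)
          return (
              "SELECT report_id, SCORE(1) FROM clinical_reports "
              f"WHERE CONTAINS(report_text, '{expansion}', 1) > 0 "
              "ORDER BY SCORE(1) DESC"
          )

      print(expanded_contains_query("heart attack"))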

  3. Trails research: where do we go from here?

    Treesearch

    Michael A. Schuett; Patricia Seiser

    2002-01-01

    This paper describes a recent study focusing on trails research needs. This study was supported by American Trails. Using a Delphi technique, 86 trails experts representing a variety of federal, state and local agencies, nonprofits, and trail uses were queried by email on trails research needs. A Delphi technique is a prognostic tool for dealing with complex problems...

  4. Legal Decision-Making by People with Aphasia: Critical Incidents for Speech Pathologists

    ERIC Educational Resources Information Center

    Ferguson, Alison; Duffield, Gemma; Worrall, Linda

    2010-01-01

    Background: The assessment and management of a person with aphasia for whom decision-making capacity is queried represents a highly complex clinical issue. In addition, there are few published guidelines and even fewer published accounts of empirical research to assist. Aims: The research presented in this paper aimed to identify the main issues…

  5. Examining Learning Styles and Perceived Benefits of Analogical Problem Construction on SQL Knowledge Acquisition

    ERIC Educational Resources Information Center

    Mills, Robert J.; Dupin-Bryant, Pamela A.; Johnson, John D.; Beaulieu, Tanya Y.

    2015-01-01

    The demand for Information Systems (IS) graduates with expertise in Structured Query Language (SQL) and database management is vast and projected to increase as "big data" becomes ubiquitous. To prepare students to solve complex problems in a data-driven world, educators must explore instructional strategies to help link prior knowledge…

  6. A method to implement fine-grained access control for personal health records through standard relational database queries.

    PubMed

    Sujansky, Walter V; Faus, Sam A; Stone, Ethan; Brennan, Patricia Flatley

    2010-10-01

    Online personal health records (PHRs) enable patients to access, manage, and share selected parts of their own health information electronically. This capability creates the need for precise access-control mechanisms that restrict the sharing of data to that intended by the patient. The authors describe the design and implementation of an access-control mechanism for PHR repositories that is modeled on the eXtensible Access Control Markup Language (XACML) standard, but intended to reduce the cognitive and computational complexity of XACML. The authors implemented the mechanism entirely in a relational database system using ANSI-standard SQL statements. Based on a set of access-control rules encoded as relational table rows, the mechanism determines via a single SQL query whether a user who accesses patient data from a specific application is authorized to perform a requested operation on a specified data object. Testing of this query on a moderately large database has demonstrated execution times consistently below 100 ms. The authors include the details of the implementation, including algorithms, examples, and a test database, as Supplementary materials. Copyright © 2010 Elsevier Inc. All rights reserved.
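
    The flavor of a single-query authorization check can be sketched as follows, with SQLite standing in for the relational database. The rule schema and wildcard convention are invented for illustration; the authors' actual tables and algorithms are in their Supplementary materials.

      import sqlite3

      db = sqlite3.connect(":memory:")
      db.execute("""CREATE TABLE access_rules (
          grantee TEXT, app TEXT, data_class TEXT, operation TEXT)""")
      db.executemany(
          "INSERT INTO access_rules VALUES (?, ?, ?, ?)",
          [("caregiver", "phr_web", "medications", "read"),
           ("caregiver", "phr_web", "allergies",   "read"),
           ("patient",   "*",       "*",           "*")])  # '*' = any

      def authorized(grantee, app, data_class, operation) -> bool:
          # One SQL query decides the request against the rule rows.
          row = db.execute(
              """SELECT 1 FROM access_rules
                 WHERE grantee = ?
                   AND (app = ? OR app = '*')
                   AND (data_class = ? OR data_class = '*')
                   AND (operation  = ? OR operation  = '*')
                 LIMIT 1""",
              (grantee, app, data_class, operation)).fetchone()
          return row is not None

      print(authorized("caregiver", "phr_web", "medications", "read"))    # True
      print(authorized("caregiver", "phr_web", "mental_health", "read"))  # False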

  7. PRIDE: new developments and new datasets.

    PubMed

    Jones, Philip; Côté, Richard G; Cho, Sang Yun; Klie, Sebastian; Martens, Lennart; Quinn, Antony F; Thorneycroft, David; Hermjakob, Henning

    2008-01-01

    The PRIDE (http://www.ebi.ac.uk/pride) database of protein and peptide identifications was previously described in the NAR Database Special Edition in 2006. Since this publication, the volume of public data in the PRIDE relational database has increased by more than an order of magnitude. Several significant public datasets have been added, including identifications and processed mass spectra generated by the HUPO Brain Proteome Project and the HUPO Liver Proteome Project. The PRIDE software development team has made several significant changes and additions to the user interface and tool set associated with PRIDE. The focus of these changes has been to facilitate the submission process and to improve the mechanisms by which PRIDE can be queried. The PRIDE team has developed a Microsoft Excel workbook that allows the required data to be collated in a series of relatively simple spreadsheets, with automatic generation of PRIDE XML at the end of the process. The ability to query PRIDE has been augmented by the addition of a BioMart interface allowing complex queries to be constructed. Collaboration with groups outside the EBI has been fruitful in extending PRIDE, including an approach to encode iTRAQ quantitative data in PRIDE XML.

  8. Protecting personal data in epidemiological research: DataSHIELD and UK law.

    PubMed

    Wallace, Susan E; Gaye, Amadou; Shoush, Osama; Burton, Paul R

    2014-01-01

    Data from individual collections, such as biobanks and cohort studies, are now being shared in order to create combined datasets which can be queried to ask complex scientific questions. But this sharing must be done with due regard for data protection principles. DataSHIELD is a new technology that queries nonaggregated, individual-level data in situ but returns query results in an anonymous format. This raises questions about the ability of DataSHIELD to adequately protect participant confidentiality. An ethico-legal analysis was conducted that examined each step of the DataSHIELD process from the perspective of UK case law, regulations, and guidance. DataSHIELD reaches agreed UK standards of protection for the sharing of biomedical data. All direct processing of personal data is conducted within the protected environment of the contributing study; participating studies have scientific, ethics, and data access approvals in place prior to the analysis; studies are clear that their consents conform with this use of data; and participants are informed that anonymisation for further disclosure will take place. DataSHIELD can provide a flexible means of interrogating data while protecting the participants' confidentiality in accordance with applicable legislation and guidance. © 2014 S. Karger AG, Basel.
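
    The aggregate-only pattern that DataSHIELD relies on can be sketched in a few lines of Python; the data values are made up, and the real system adds disclosure checks well beyond this minimal illustration.

      # Each study answers a query with non-disclosive summaries; only the
      # summaries, never individual-level rows, are combined centrally.
      study_a = [4.1, 5.2, 3.8, 6.0]   # individual-level data stays in situ
      study_b = [5.5, 4.9, 6.2]

      def local_summary(values):
          # Only the aggregate (sum, count) ever leaves the study.
          return sum(values), len(values)

      def pooled_mean(summaries):
          total = sum(s for s, _ in summaries)
          n = sum(n for _, n in summaries)
          return total / n

      print(pooled_mean([local_summary(study_a), local_summary(study_b)]))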

  9. Visual information mining in remote sensing image archives

    NASA Astrophysics Data System (ADS)

    Pelizzari, Andrea; Descargues, Vincent; Datcu, Mihai P.

    2002-01-01

    The present article focuses on the development of interactive exploratory tools for visually mining the image content in large remote sensing archives. Two aspects are treated: the iconic visualization of the global information in the archive and the progressive visualization of image details. The proposed methods are integrated in the Image Information Mining (I2M) system. The images and image structures in the I2M system are indexed based on a probabilistic approach, and the resulting links are managed by a relational database. Both the intrinsic complexity of the observed images and the diversity of user requests result in a great number of associations in the database. Thus, new tools have been designed to visualize, in iconic representation, the relationships created during a query or information-mining operation: visualization of the query results positioned on the geographical map, a quick-looks gallery, visualization of the measure of goodness of the query, and visualization of the image space for statistical evaluation purposes. Additionally, the I2M system is enhanced with progressive detail visualization in order to allow better access for operator inspection. I2M is a three-tier Java architecture and is optimized for the Internet.

  10. Exploring Diverse Data Sets and Developing New Theories and Ideas With Project Integration Architecture

    NASA Technical Reports Server (NTRS)

    Benyo, Theresa L.; Jones, William H.

    2005-01-01

    The development of new ideas is the essence of scientific research. This is frequently done by developing models of physical processes and comparing model predictions with results from experiments. With models becoming ever more complex and data acquisition systems becoming more powerful, the researcher is burdened with wading through data volumes of many terabytes and beyond. These data often come from multiple, heterogeneous sources, and the methods for searching through them are usually at or near the manual level. In addition, current documentation methods are generally limited to researchers' pen-and-paper-style notebooks. Researchers may want to form constraint-based queries on a body of existing knowledge that is, itself, distributed over many different machines and environments, and from the results of such queries then spawn additional queries, simulations, and data analyses in order to discover new insights into the problem being investigated. Currently, researchers are restricted to working within the boundaries of tools that are inefficient at probing current and legacy data to extend the knowledge of the problem at hand and reveal innovative and efficient solutions. A framework called the Project Integration Architecture is discussed that can address these desired functionalities.

  11. RadSearch: a RIS/PACS integrated query tool

    NASA Astrophysics Data System (ADS)

    Tsao, Sinchai; Documet, Jorge; Moin, Paymann; Wang, Kevin; Liu, Brent J.

    2008-03-01

    Radiology Information Systems (RIS) contain a wealth of information that can be used for research, education, and practice management. However, the sheer amount of information available makes querying specific data difficult and time-consuming. Previous work has shown that a clinical RIS database and its RIS text reports can be extracted, duplicated, and indexed for searches while complying with HIPAA and IRB requirements. This project's intent is to provide a software tool, the RadSearch Toolkit, to allow intelligent indexing and parsing of RIS reports for easy yet powerful searches. In addition, the project aims to seamlessly query and retrieve associated images from the Picture Archiving and Communication System (PACS) in situations where an integrated RIS/PACS is in place - even subselecting individual series, such as in an MRI study. RadSearch's application of simple text parsing techniques to index text-based radiology reports will allow the search engine to quickly return relevant results. This powerful combination will be useful in both private practice and academic settings: administrators can easily obtain complex practice-management information such as referral patterns; researchers can conduct retrospective studies with specific, multiple criteria; and teaching institutions can quickly and effectively create thorough teaching files.
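
    In the spirit of the "simple text parsing techniques" mentioned above, a toy inverted index over report text might look like the following Python sketch. The sample reports are invented, and RadSearch's actual parsing and RIS/PACS integration are considerably more involved.

      import re
      from collections import defaultdict

      reports = {  # invented report snippets for illustration
          101: "MRI brain: no acute infarct. Chronic small vessel change.",
          102: "CT chest: right lower lobe pneumonia.",
          103: "MRI lumbar spine: disc protrusion at L4-L5.",
      }

      index = defaultdict(set)
      for rid, text in reports.items():
          for token in re.findall(r"[a-z0-9]+", text.lower()):
              index[token].add(rid)

      def search(*terms):
          """Return report ids containing every query term (AND semantics)."""
          sets = [index.get(t.lower(), set()) for t in terms]
          return set.intersection(*sets) if sets else set()

      print(search("mri", "brain"))  # {101}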

  12. The creation of the world--according to science.

    PubMed

    Brustein, Ram; Kupferman, Judy

    2012-01-01

    How was the world created? This question has received attention from many perspectives, including religion, culture, philosophy, mysticism, and science. While it may not seem like a query amenable to scientific measurement, it has led scientists to pose fascinating ideas and observations, including the Big Bang, the concept of inflation, the fact that most of the universe is made up of dark matter and dark energy that cannot be perceived, and more. Scientists cannot claim to know the definitive answer, but they can approach the question from a scientific viewpoint. This begins by examining data, which, thanks to new technology, yields more information than has been previously available. Using novel scientific methods and techniques to analyze the data, fresh perspectives concerning the creation of the world have emerged. This process and its main findings will be described.

  13. Hybrid Quantum-Classical Approach to Quantum Optimal Control.

    PubMed

    Li, Jun; Yang, Xiaodong; Peng, Xinhua; Sun, Chang-Pu

    2017-04-14

    A central challenge in quantum computing is to identify more computational problems for which utilization of quantum resources can offer significant speedup. Here, we propose a hybrid quantum-classical scheme to tackle the quantum optimal control problem. We show that the most computationally demanding part of gradient-based algorithms, namely, computing the fitness function and its gradient for a control input, can be accomplished by the process of evolution and measurement on a quantum simulator. By posing queries to and receiving answers from the quantum simulator, classical computing devices update the control parameters until an optimal control solution is found. To demonstrate the quantum-classical scheme in experiment, we use a seven-qubit nuclear magnetic resonance system, on which we have succeeded in optimizing state preparation without involving classical computation of the large Hilbert space evolution.
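
    The query/answer loop itself is classical and easy to sketch. In the Python sketch below, the quantum simulator is replaced by a quadratic surrogate returning a fitness and its gradient; in the experiment these quantities come from evolution and measurement on the seven-qubit NMR system.

      import numpy as np

      rng = np.random.default_rng(0)
      target = rng.normal(size=8)   # stands in for the unknown optimum

      def simulator_query(controls):
          """Stand-in for evolution + measurement on the quantum simulator:
          returns the fitness of a control sequence and its gradient."""
          diff = controls - target
          return -np.sum(diff ** 2), -2.0 * diff

      controls = np.zeros(8)
      for _ in range(200):                      # classical update loop
          fitness, grad = simulator_query(controls)
          controls += 0.05 * grad               # gradient ascent on fitness
      print(round(simulator_query(controls)[0], 6))  # fitness approaches 0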

  14. Medical professionalism in the new millennium: are there intercultural differences? Brief report and commentary.

    PubMed

    Shah, Manasi M; Summerhill, Eleanor M; Manthous, Constantine A

    2009-05-01

    We hypothesized that differences in premedical and medical indoctrination might lead to demonstrable differences in notions of medical professionalism among U.S. medical school graduates (USMG) and international medical graduates (IMG). We used the previously validated Barry Challenges to Professionalism questionnaire to query applicants to our Medicine residency. Two hundred sixty-six of 1,476 applicants responded; 57 were USMG, and 188 were IMG who were non-U.S. citizens. There were no significant differences in responses based on gender or medical school background (comparing USMG vs. IMG). Graduates of U.S. and Canadian schools were more likely than those of Indian schools to answer correctly three of 10 questions. We use the results of this ostensibly "negative" study to comment on the foundations for the hypothesis and the logistic difficulty of studying the question.

  15. A Simple Answer to a Simple Question on Changing Answers

    ERIC Educational Resources Information Center

    Bridgeman, Brent

    2012-01-01

    In an article in the Winter 2011 issue of the "Journal of Educational Measurement", van der Linden, Jeon, and Ferrara suggested that "test takers should trust their initial instincts and retain their initial responses when they have the opportunity to review test items." They presented a complex IRT model that appeared to show that students would…

  16. Domain Specific vs Domain General: Implications for Dynamic Assessment

    ERIC Educational Resources Information Center

    Kaniel, Shlomo

    2010-01-01

    The article responds to the need for evidence-based dynamic assessment. The article is divided into two sections: In Part 1 we examine the scientific answer to the question of how far human mental activities and capabilities are domain general (DG) / domain specific (DS). A highly complex answer emerges from the literature review of domains such…

  17. Strong Numbers and Weak Perceptions: Understanding the "Rodney Dangerfield Economy"

    ERIC Educational Resources Information Center

    Wood, William C.

    2006-01-01

    Ordinary citizens, politicians, and economists alike regularly ask how the economy is doing. With a highly polarized electorate and media to match, it has become harder to get an honest answer. Social educators should understand the complexity of such a question--and the subtlety of the answers--as they approach the topic themselves. The American…

  18. Defining Innovation: Using Soft Systems Methodology to Approach the Complexity of Innovation in Educational Technology

    ERIC Educational Resources Information Center

    Cox, Glenda

    2010-01-01

    This paper explores what educational technologists in one South African Institution consider innovation to be. Ten educational technologists in various faculties across the university were interviewed and asked to define and answer questions about innovation. Their answers were coded and the results of the overlaps in coding have been assimilated…

  19. A Federal Commission Wrestles with Gender Equity in Sports: Easy Answers Elude Panel Charged with Reviewing Title IX.

    ERIC Educational Resources Information Center

    Suggs, Welch

    2003-01-01

    Members of the Secretary's Commission on Opportunities in Athletics (U.S. Secretary of Education) continue working to bring about gender equity in college athletics, but they have reached few answers to the complex questions involved, including those that arise when a college drops sports teams. (SLD)

  20. Integrating a local database into the StarView distributed user interface

    NASA Technical Reports Server (NTRS)

    Silberberg, D. P.

    1992-01-01

    A distributed user interface to the Space Telescope Data Archive and Distribution Service (DADS) known as StarView is being developed. The DADS architecture consists of the data archive as well as a relational database catalog describing the archive. StarView is a client/server system in which the user interface is the front-end client to the DADS catalog and archive servers. Users query the DADS catalog from the StarView interface. Query commands are transmitted via a network and evaluated by the database. The results are returned via the network and are displayed on StarView forms. Based on the results, users decide which data sets to retrieve from the DADS archive. Archive requests are packaged by StarView and sent to DADS, which returns the requested data sets to the users. The advantages of distributed client/server user interfaces over traditional one-machine systems are well known. Since users run software on machines separate from the database, the overall client response time is much faster. Also, since the server is free to process only database requests, the database response time is much faster. Disadvantages inherent in this architecture are the slow overall database access time due to network delays, the lack of a 'get previous row' command, and the fact that refinements of a previously issued query must be submitted to the database server even though the domain of values has already been returned by the previous query. This architecture also does not allow users to cross-correlate DADS catalog data with other catalogs. Clearly, a distributed user interface would be more powerful if it overcame these disadvantages. A local database is being integrated into StarView to overcome them. When a query is made through a StarView form, which is often composed of fields from multiple tables, it is translated to an SQL query and issued to the DADS catalog. At the same time, a local database table is created to contain the resulting rows of the query. The returned rows are displayed on the form as well as inserted into the local database table. Identical results are produced by reissuing the query to either the DADS catalog or the local table. Relational databases do not provide a 'get previous row' function because of the inherent complexity of retrieving previous rows of multiple-table joins. However, since this function is easily implemented on a single table, StarView uses the local table to retrieve the previous row. Also, StarView issues subsequent query refinements to the local table instead of the DADS catalog, eliminating the network transmission overhead. Finally, other catalogs can be imported into the local database for cross-correlation with local tables. Overall, it is believed that this is a more powerful architecture for distributed database user interfaces.
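
    A minimal sketch of the local-caching idea, with SQLite standing in for the local database: rows returned by the remote catalog are mirrored into a local table, and both query refinements and "get previous row" are then served locally. The schema and rows are invented for illustration.

      import sqlite3

      remote_rows = [  # pretend these came back from the DADS catalog query
          ("obs1", "NGC 1068", 2.3), ("obs2", "NGC 4151", 1.1),
          ("obs3", "M 87", 4.0),
      ]

      local = sqlite3.connect(":memory:")
      local.execute("CREATE TABLE results (obs_id TEXT, target TEXT, exp_hrs REAL)")
      local.executemany("INSERT INTO results VALUES (?, ?, ?)", remote_rows)

      # Refinement runs against the local table: no network round trip.
      refined = local.execute(
          "SELECT obs_id, target FROM results WHERE exp_hrs > ? ORDER BY exp_hrs",
          (2.0,)).fetchall()
      print(refined)  # [('obs1', 'NGC 1068'), ('obs3', 'M 87')]

      # "Get previous row" is trivial over an ordered local table.
      rows = local.execute("SELECT obs_id FROM results ORDER BY obs_id").fetchall()
      cursor = 2                 # current position in the result set
      print(rows[cursor - 1])    # the previous row relative to the cursor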
