Sample records for spatial query language

  1. Research on Extension of Sparql Ontology Query Language Considering the Computation of Indoor Spatial Relations

    NASA Astrophysics Data System (ADS)

    Li, C.; Zhu, X.; Guo, W.; Liu, Y.; Huang, H.

    2015-05-01

    A method suitable for indoor complex semantic query considering the computation of indoor spatial relations is provided According to the characteristics of indoor space. This paper designs ontology model describing the space related information of humans, events and Indoor space objects (e.g. Storey and Room) as well as their relations to meet the indoor semantic query. The ontology concepts are used in IndoorSPARQL query language which extends SPARQL syntax for representing and querying indoor space. And four types specific primitives for indoor query, "Adjacent", "Opposite", "Vertical" and "Contain", are defined as query functions in IndoorSPARQL used to support quantitative spatial computations. Also a method is proposed to analysis the query language. Finally this paper adopts this method to realize indoor semantic query on the study area through constructing the ontology model for the study building. The experimental results show that the method proposed in this paper can effectively support complex indoor space semantic query.

  2. Spatial information semantic query based on SPARQL

    NASA Astrophysics Data System (ADS)

    Xiao, Zhifeng; Huang, Lei; Zhai, Xiaofang

    2009-10-01

    How can the efficiency of spatial information inquiries be enhanced in today's fast-growing information age? We are rich in geospatial data but poor in up-to-date geospatial information and knowledge that are ready to be accessed by public users. This paper adopts an approach for querying spatial semantic by building an Web Ontology language(OWL) format ontology and introducing SPARQL Protocol and RDF Query Language(SPARQL) to search spatial semantic relations. It is important to establish spatial semantics that support for effective spatial reasoning for performing semantic query. Compared to earlier keyword-based and information retrieval techniques that rely on syntax, we use semantic approaches in our spatial queries system. Semantic approaches need to be developed by ontology, so we use OWL to describe spatial information extracted by the large-scale map of Wuhan. Spatial information expressed by ontology with formal semantics is available to machines for processing and to people for understanding. The approach is illustrated by introducing a case study for using SPARQL to query geo-spatial ontology instances of Wuhan. The paper shows that making use of SPARQL to search OWL ontology instances can ensure the result's accuracy and applicability. The result also indicates constructing a geo-spatial semantic query system has positive efforts on forming spatial query and retrieval.

  3. A Big Spatial Data Processing Framework Applying to National Geographic Conditions Monitoring

    NASA Astrophysics Data System (ADS)

    Xiao, F.

    2018-04-01

    In this paper, a novel framework for spatial data processing is proposed, which apply to National Geographic Conditions Monitoring project of China. It includes 4 layers: spatial data storage, spatial RDDs, spatial operations, and spatial query language. The spatial data storage layer uses HDFS to store large size of spatial vector/raster data in the distributed cluster. The spatial RDDs are the abstract logical dataset of spatial data types, and can be transferred to the spark cluster to conduct spark transformations and actions. The spatial operations layer is a series of processing on spatial RDDs, such as range query, k nearest neighbor and spatial join. The spatial query language is a user-friendly interface which provide people not familiar with Spark with a comfortable way to operation the spatial operation. Compared with other spatial frameworks, it is highlighted that comprehensive technologies are referred for big spatial data processing. Extensive experiments on real datasets show that the framework achieves better performance than traditional process methods.

  4. Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.

    PubMed

    Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

    2013-11-01

    The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS - a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.

  5. Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce

    PubMed Central

    Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

    2016-01-01

    The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS – a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing. PMID:27617325

  6. Object-Oriented Query Language For Events Detection From Images Sequences

    NASA Astrophysics Data System (ADS)

    Ganea, Ion Eugen

    2015-09-01

    In this paper is presented a method to represent the events extracted from images sequences and the query language used for events detection. Using an object oriented model the spatial and temporal relationships between salient objects and also between events are stored and queried. This works aims to unify the storing and querying phases for video events processing. The object oriented language syntax used for events processing allow the instantiation of the indexes classes in order to improve the accuracy of the query results. The experiments were performed on images sequences provided from sport domain and it shows the reliability and the robustness of the proposed language. To extend the language will be added a specific syntax for constructing the templates for abnormal events and for detection of the incidents as the final goal of the research.

  7. An intelligent user interface for browsing satellite data catalogs

    NASA Technical Reports Server (NTRS)

    Cromp, Robert F.; Crook, Sharon

    1989-01-01

    A large scale domain-independent spatial data management expert system that serves as a front-end to databases containing spatial data is described. This system is unique for two reasons. First, it uses spatial search techniques to generate a list of all the primary keys that fall within a user's spatial constraints prior to invoking the database management system, thus substantially decreasing the amount of time required to answer a user's query. Second, a domain-independent query expert system uses a domain-specific rule base to preprocess the user's English query, effectively mapping a broad class of queries into a smaller subset that can be handled by a commercial natural language processing system. The methods used by the spatial search module and the query expert system are explained, and the system architecture for the spatial data management expert system is described. The system is applied to data from the International Ultraviolet Explorer (IUE) satellite, and results are given.

  8. KBGIS-II: A knowledge-based geographic information system

    NASA Technical Reports Server (NTRS)

    Smith, Terence; Peuquet, Donna; Menon, Sudhakar; Agarwal, Pankaj

    1986-01-01

    The architecture and working of a recently implemented Knowledge-Based Geographic Information System (KBGIS-II), designed to satisfy several general criteria for the GIS, is described. The system has four major functions including query-answering, learning and editing. The main query finds constrained locations for spatial objects that are describable in a predicate-calculus based spatial object language. The main search procedures include a family of constraint-satisfaction procedures that use a spatial object knowledge base to search efficiently for complex spatial objects in large, multilayered spatial data bases. These data bases are represented in quadtree form. The search strategy is designed to reduce the computational cost of search in the average case. The learning capabilities of the system include the addition of new locations of complex spatial objects to the knowledge base as queries are answered, and the ability to learn inductively definitions of new spatial objects from examples. The new definitions are added to the knowledge base by the system. The system is performing all its designated tasks successfully. Future reports will relate performance characteristics of the system.

  9. Astronomical Data Processing Using SciQL, an SQL Based Query Language for Array Data

    NASA Astrophysics Data System (ADS)

    Zhang, Y.; Scheers, B.; Kersten, M.; Ivanova, M.; Nes, N.

    2012-09-01

    SciQL (pronounced as ‘cycle’) is a novel SQL-based array query language for scientific applications with both tables and arrays as first class citizens. SciQL lowers the entrance fee of adopting relational DBMS (RDBMS) in scientific domains, because it includes functionality often only found in mathematics software packages. In this paper, we demonstrate the usefulness of SciQL for astronomical data processing using examples from the Transient Key Project of the LOFAR radio telescope. In particular, how the LOFAR light-curve database of all detected sources can be constructed, by correlating sources across the spatial, frequency, time and polarisation domains.

  10. Geographic Video 3d Data Model And Retrieval

    NASA Astrophysics Data System (ADS)

    Han, Z.; Cui, C.; Kong, Y.; Wu, H.

    2014-04-01

    Geographic video includes both spatial and temporal geographic features acquired through ground-based or non-ground-based cameras. With the popularity of video capture devices such as smartphones, the volume of user-generated geographic video clips has grown significantly and the trend of this growth is quickly accelerating. Such a massive and increasing volume poses a major challenge to efficient video management and query. Most of the today's video management and query techniques are based on signal level content extraction. They are not able to fully utilize the geographic information of the videos. This paper aimed to introduce a geographic video 3D data model based on spatial information. The main idea of the model is to utilize the location, trajectory and azimuth information acquired by sensors such as GPS receivers and 3D electronic compasses in conjunction with video contents. The raw spatial information is synthesized to point, line, polygon and solid according to the camcorder parameters such as focal length and angle of view. With the video segment and video frame, we defined the three categories geometry object using the geometry model of OGC Simple Features Specification for SQL. We can query video through computing the spatial relation between query objects and three categories geometry object such as VFLocation, VSTrajectory, VSFOView and VFFovCone etc. We designed the query methods using the structured query language (SQL) in detail. The experiment indicate that the model is a multiple objective, integration, loosely coupled, flexible and extensible data model for the management of geographic stereo video.

  11. Applying Semantic Web Concepts to Support Net-Centric Warfare Using the Tactical Assessment Markup Language (TAML)

    DTIC Science & Technology

    2006-06-01

    SPARQL SPARQL Protocol and RDF Query Language SQL Structured Query Language SUMO Suggested Upper Merged Ontology SW... Query optimization algorithms are implemented in the Pellet reasoner in order to ensure querying a knowledge base is efficient . These algorithms...memory as a treelike structure in order for the data to be queried . XML Query (XQuery) is the standard language used when querying XML

  12. KBGIS-2: A knowledge-based geographic information system

    NASA Technical Reports Server (NTRS)

    Smith, T.; Peuquet, D.; Menon, S.; Agarwal, P.

    1986-01-01

    The architecture and working of a recently implemented knowledge-based geographic information system (KBGIS-2) that was designed to satisfy several general criteria for the geographic information system are described. The system has four major functions that include query-answering, learning, and editing. The main query finds constrained locations for spatial objects that are describable in a predicate-calculus based spatial objects language. The main search procedures include a family of constraint-satisfaction procedures that use a spatial object knowledge base to search efficiently for complex spatial objects in large, multilayered spatial data bases. These data bases are represented in quadtree form. The search strategy is designed to reduce the computational cost of search in the average case. The learning capabilities of the system include the addition of new locations of complex spatial objects to the knowledge base as queries are answered, and the ability to learn inductively definitions of new spatial objects from examples. The new definitions are added to the knowledge base by the system. The system is currently performing all its designated tasks successfully, although currently implemented on inadequate hardware. Future reports will detail the performance characteristics of the system, and various new extensions are planned in order to enhance the power of KBGIS-2.

  13. On describing human white matter anatomy: the white matter query language.

    PubMed

    Wassermann, Demian; Makris, Nikos; Rathi, Yogesh; Shenton, Martha; Kikinis, Ron; Kubicki, Marek; Westin, Carl-Fredrik

    2013-01-01

    The main contribution of this work is the careful syntactical definition of major white matter tracts in the human brain based on a neuroanatomist's expert knowledge. We present a technique to formally describe white matter tracts and to automatically extract them from diffusion MRI data. The framework is based on a novel query language with a near-to-English textual syntax. This query language allows us to construct a dictionary of anatomical definitions describing white matter tracts. The definitions include adjacent gray and white matter regions, and rules for spatial relations. This enables automated coherent labeling of white matter anatomy across subjects. We use our method to encode anatomical knowledge in human white matter describing 10 association and 8 projection tracts per hemisphere and 7 commissural tracts. The technique is shown to be comparable in accuracy to manual labeling. We present results applying this framework to create a white matter atlas from 77 healthy subjects, and we use this atlas in a proof-of-concept study to detect tract changes specific to schizophrenia.

  14. RDF-GL: A SPARQL-Based Graphical Query Language for RDF

    NASA Astrophysics Data System (ADS)

    Hogenboom, Frederik; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay

    This chapter presents RDF-GL, a graphical query language (GQL) for RDF. The GQL is based on the textual query language SPARQL and mainly focuses on SPARQL SELECT queries. The advantage of a GQL over textual query languages is that complexity is hidden through the use of graphical symbols. RDF-GL is supported by a Java-based editor, SPARQLinG, which is presented as well. The editor does not only allow for RDF-GL query creation, but also converts RDF-GL queries to SPARQL queries and is able to subsequently execute these. Experiments show that using the GQL in combination with the editor makes RDF querying more accessible for end users.

  15. Applying Query Structuring in Cross-language Retrieval.

    ERIC Educational Resources Information Center

    Pirkola, Ari; Puolamaki, Deniz; Jarvelin, Kalervo

    2003-01-01

    Explores ways to apply query structuring in cross-language information retrieval. Tested were: English queries translated into Finnish using an electronic dictionary, and run in a Finnish newspaper databases; effects of compound-based structuring using a proximity operator for translation equivalents of query language compound components; and a…

  16. A Semantic Graph Query Language

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kaplan, I L

    2006-10-16

    Semantic graphs can be used to organize large amounts of information from a number of sources into one unified structure. A semantic query language provides a foundation for extracting information from the semantic graph. The graph query language described here provides a simple, powerful method for querying semantic graphs.

  17. Model-based query language for analyzing clinical processes.

    PubMed

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    Nowadays large databases of clinical process data exist in hospitals. However, these data are rarely used in full scope. In order to perform queries on hospital processes, one must either choose from the predefined queries or develop queries using MS Excel-type software system, which is not always a trivial task. In this paper we propose a new query language for analyzing clinical processes that is easily perceptible also by non-IT professionals. We develop this language based on a process modeling language which is also described in this paper. Prototypes of both languages have already been verified using real examples from hospitals.

  18. End-User Use of Data Base Query Language: Pros and Cons.

    ERIC Educational Resources Information Center

    Nicholes, Walter

    1988-01-01

    Man-machine interface, the concept of a computer "query," a review of database technology, and a description of the use of query languages at Brigham Young University are discussed. The pros and cons of end-user use of database query languages are explored. (Author/MLW)

  19. EquiX-A Search and Query Language for XML.

    ERIC Educational Resources Information Center

    Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander

    2002-01-01

    Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)

  20. A Natural Language Interface Concordant with a Knowledge Base.

    PubMed

    Han, Yong-Jin; Park, Seong-Bae; Park, Se-Young

    2016-01-01

    The discordance between expressions interpretable by a natural language interface (NLI) system and those answerable by a knowledge base is a critical problem in the field of NLIs. In order to solve this discordance problem, this paper proposes a method to translate natural language questions into formal queries that can be generated from a graph-based knowledge base. The proposed method considers a subgraph of a knowledge base as a formal query. Thus, all formal queries corresponding to a concept or a predicate in the knowledge base can be generated prior to query time and all possible natural language expressions corresponding to each formal query can also be collected in advance. A natural language expression has a one-to-one mapping with a formal query. Hence, a natural language question is translated into a formal query by matching the question with the most appropriate natural language expression. If the confidence of this matching is not sufficiently high the proposed method rejects the question and does not answer it. Multipredicate queries are processed by regarding them as a set of collected expressions. The experimental results show that the proposed method thoroughly handles answerable questions from the knowledge base and rejects unanswerable ones effectively.

  1. Design Recommendations for Query Languages

    DTIC Science & Technology

    1980-09-01

    DESIGN RECOMMENDATIONS FOR QUERY LANGUAGES S.L. Ehrenreich Submitted by: Stanley M. Halpin, Acting Chief HUMAN FACTORS TECHNICAL AREA Approved by: Edgar ...respond to que- ries that it recognizes as faulty. Codd (1974) states that in designing a nat- ural query language, attention must be given to dealing...impaired. Codd (1974) also regarded the user’s perception of the data base to be of critical importance in properly designing a query language system

  2. Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.

    PubMed

    Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

    2013-08-01

    Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive.

  3. Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce

    PubMed Central

    Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

    2013-01-01

    Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS – a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive. PMID:24187650

  4. A high-performance spatial database based approach for pathology imaging algorithm evaluation

    PubMed Central

    Wang, Fusheng; Kong, Jun; Gao, Jingjing; Cooper, Lee A.D.; Kurc, Tahsin; Zhou, Zhengwen; Adler, David; Vergara-Niedermayr, Cristobal; Katigbak, Bryan; Brat, Daniel J.; Saltz, Joel H.

    2013-01-01

    Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high performance computation capacity via a parallel data management infrastructure, parallel data loading and spatial indexing optimizations in this infrastructure. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared nothing parallel database architecture, which distributes data homogenously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput. Results: Our work proposes a high performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download. Conclusions: Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data. Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provide a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation. PMID:23599905

  5. The CMS DBS query language

    NASA Astrophysics Data System (ADS)

    Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee

    2010-04-01

    The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to underlying database. We will describe the design of the query system, provide details of the language components and overview of how this component fits into the overall data discovery system architecture.

  6. MorphoSaurus--design and evaluation of an interlingua-based, cross-language document retrieval engine for the medical domain.

    PubMed

    Markó, K; Schulz, S; Hahn, U

    2005-01-01

    We propose an interlingua-based indexing approach to account for the particular challenges that arise in the design and implementation of cross-language document retrieval systems for the medical domain. Documents, as well as queries, are mapped to a language-independent conceptual layer on which retrieval operations are performed. We contrast this approach with the direct translation of German queries to English ones which, subsequently, are matched against English documents. We evaluate both approaches, interlingua-based and direct translation, on a large medical document collection, the OHSUMED corpus. A substantial benefit for interlingua-based document retrieval using German queries on English texts is found, which amounts to 93% of the (monolingual) English baseline. Most state-of-the-art cross-language information retrieval systems translate user queries to the language(s) of the target documents. In contra-distinction to this approach, translating both documents and user queries into a language-independent, concept-like representation format is more beneficial to enhance cross-language retrieval performance.

  7. HodDB: Design and Analysis of a Query Processor for Brick.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fierro, Gabriel; Culler, David

    Brick is a recently proposed metadata schema and ontology for describing building components and the relationships between them. It represents buildings as directed labeled graphs using the RDF data model. Using the SPARQL query language, building-agnostic applications query a Brick graph to discover the set of resources and relationships they require to operate. Latency-sensitive applications, such as user interfaces, demand response and modelpredictive control, require fast queries — conventionally less than 100ms. We benchmark a set of popular open-source and commercial SPARQL databases against three real Brick models using seven application queries and find that none of them meet thismore » performance target. This lack of performance can be attributed to design decisions that optimize for queries over large graphs consisting of billions of triples, but give poor spatial locality and join performance on the small dense graphs typical of Brick. We present the design and evaluation of HodDB, a RDF/SPARQL database for Brick built over a node-based index structure. HodDB performs Brick queries 3-700x faster than leading SPARQL databases and consistently meets the 100ms threshold, enabling the portability of important latency-sensitive building applications.« less

  8. A Priority Fuzzy Logic Extension of the XQuery Language

    NASA Astrophysics Data System (ADS)

    Škrbić, Srdjan; Wettayaprasit, Wiphada; Saeueng, Pannipa

    2011-09-01

    In recent years there have been significant research findings in flexible XML querying techniques using fuzzy set theory. Many types of fuzzy extensions to XML data model and XML query languages have been proposed. In this paper, we introduce priority fuzzy logic extensions to XQuery language. Describing these extensions we introduce a new query language. Moreover, we describe a way to implement an interpreter for this language using an existing XML native database.

  9. Relational Algebra and SQL: Better Together

    ERIC Educational Resources Information Center

    McMaster, Kirby; Sambasivam, Samuel; Hadfield, Steven; Wolthuis, Stuart

    2013-01-01

    In this paper, we describe how database instructors can teach Relational Algebra and Structured Query Language together through programming. Students write query programs consisting of sequences of Relational Algebra operations vs. Structured Query Language SELECT statements. The query programs can then be run interactively, allowing students to…

  10. Natural Language Query System Design for Interactive Information Storage and Retrieval Systems. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Liu, I-Hsiung

    1985-01-01

    The currently developed multi-level language interfaces of information systems are generally designed for experienced users. These interfaces commonly ignore the nature and needs of the largest user group, i.e., casual users. This research identifies the importance of natural language query system research within information storage and retrieval system development; addresses the topics of developing such a query system; and finally, proposes a framework for the development of natural language query systems in order to facilitate the communication between casual users and information storage and retrieval systems.

  11. A natural language interface plug-in for cooperative query answering in biological databases.

    PubMed

    Jamil, Hasan M

    2012-06-11

    One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural language like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schema with an aim to providing cooperative responses to queries tailored to users' interpretations. Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from underlying database schema. The plug-in introduced in this paper is generic and facilitates connecting user selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.

  12. Spatial and symbolic queries for 3D image data

    NASA Astrophysics Data System (ADS)

    Benson, Daniel C.; Zick, Gregory L.

    1992-04-01

    We present a query system for an object-oriented biomedical imaging database containing 3-D anatomical structures and their corresponding 2-D images. The graphical interface facilitates the formation of spatial queries, nonspatial or symbolic queries, and combined spatial/symbolic queries. A query editor is used for the creation and manipulation of 3-D query objects as volumes, surfaces, lines, and points. Symbolic predicates are formulated through a combination of text fields and multiple choice selections. Query results, which may include images, image contents, composite objects, graphics, and alphanumeric data, are displayed in multiple views. Objects returned by the query may be selected directly within the views for further inspection or modification, or for use as query objects in subsequent queries. Our image database query system provides visual feedback and manipulation of spatial query objects, multiple views of volume data, and the ability to combine spatial and symbolic queries. The system allows for incremental enhancement of existing objects and the addition of new objects and spatial relationships. The query system is designed for databases containing symbolic and spatial data. This paper discuses its application to data acquired in biomedical 3- D image reconstruction, but it is applicable to other areas such as CAD/CAM, geographical information systems, and computer vision.

  13. A Visual Interface for Querying Heterogeneous Phylogenetic Databases.

    PubMed

    Jamil, Hasan M

    2017-01-01

    Despite the recent growth in the number of phylogenetic databases, access to these wealth of resources remain largely tool or form-based interface driven. It is our thesis that the flexibility afforded by declarative query languages may offer the opportunity to access these repositories in a better way, and to use such a language to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way, and is used as the storage model for our PhyloBase database for which PhyQL serves as the query language. We have implemented a visual interface for the end users to pose PhyQL queries using visual icons, and drag and drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in PhyQL buffer allows secondary querying on the computed results making it a truly powerful querying architecture.

  14. A Relational Algebra Query Language for Programming Relational Databases

    ERIC Educational Resources Information Center

    McMaster, Kirby; Sambasivam, Samuel; Anderson, Nicole

    2011-01-01

    In this paper, we describe a Relational Algebra Query Language (RAQL) and Relational Algebra Query (RAQ) software product we have developed that allows database instructors to teach relational algebra through programming. Instead of defining query operations using mathematical notation (the approach commonly taken in database textbooks), students…

  15. Knowledge Query Language (KQL)

    DTIC Science & Technology

    2016-02-12

    Lexington Massachusetts This page intentionally left blank. iii EXECUTIVE SUMMARY Currently, queries for data ...retrieval from non-Structured Query Language (NoSQL) data stores are tightly coupled to the specific implementation of the data store implementation...independent of the storage content and format for querying NoSQL or relational data stores. This approach uses address expressions (or A-Expressions

  16. Bin-Hash Indexing: A Parallel Method for Fast Query Processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bethel, Edward W; Gosink, Luke J.; Wu, Kesheng

    2008-06-27

    This paper presents a new parallel indexing data structure for answering queries. The index, called Bin-Hash, offers extremely high levels of concurrency, and is therefore well-suited for the emerging commodity of parallel processors, such as multi-cores, cell processors, and general purpose graphics processing units (GPU). The Bin-Hash approach first bins the base data, and then partitions and separately stores the values in each bin as a perfect spatial hash table. To answer a query, we first determine whether or not a record satisfies the query conditions based on the bin boundaries. For the bins with records that can not bemore » resolved, we examine the spatial hash tables. The procedures for examining the bin numbers and the spatial hash tables offer the maximum possible level of concurrency; all records are able to be evaluated by our procedure independently in parallel. Additionally, our Bin-Hash procedures access much smaller amounts of data than similar parallel methods, such as the projection index. This smaller data footprint is critical for certain parallel processors, like GPUs, where memory resources are limited. To demonstrate the effectiveness of Bin-Hash, we implement it on a GPU using the data-parallel programming language CUDA. The concurrency offered by the Bin-Hash index allows us to fully utilize the GPU's massive parallelism in our work; over 12,000 records can be simultaneously evaluated at any one time. We show that our new query processing method is an order of magnitude faster than current state-of-the-art CPU-based indexing technologies. Additionally, we compare our performance to existing GPU-based projection index strategies.« less

  17. Generating and Executing Complex Natural Language Queries across Linked Data.

    PubMed

    Hamon, Thierry; Mougin, Fleur; Grabar, Natalia

    2015-01-01

    With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, SPARQL language, and interfaces in Natural Language question-answering provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions in SPARQL queries. We use Natural Language Processing tools, semantic resources, and the RDF triples description. The method is designed on 50 questions over 3 biomedical knowledge bases, and evaluated on 27 questions. It achieves 0.78 F-measure on the test set. The method for translating natural language questions into SPARQL queries is implemented as Perl module available at http://search.cpan.org/ thhamon/RDF-NLP-SPARQLQuery.

  18. The white matter query language: a novel approach for describing human white matter anatomy

    PubMed Central

    Makris, Nikos; Rathi, Yogesh; Shenton, Martha; Kikinis, Ron; Kubicki, Marek; Westin, Carl-Fredrik

    2016-01-01

    We have developed a novel method to describe human white matter anatomy using an approach that is both intuitive and simple to use, and which automatically extracts white matter tracts from diffusion MRI volumes. Further, our method simplifies the quantification and statistical analysis of white matter tracts on large diffusion MRI databases. This work reflects the careful syntactical definition of major white matter fiber tracts in the human brain based on a neuroanatomist’s expert knowledge. The framework is based on a novel query language with a near-to-English textual syntax. This query language makes it possible to construct a dictionary of anatomical definitions that describe white matter tracts. The definitions include adjacent gray and white matter regions, and rules for spatial relations. This novel method makes it possible to automatically label white matter anatomy across subjects. After describing this method, we provide an example of its implementation where we encode anatomical knowledge in human white matter for ten association and 15 projection tracts per hemisphere, along with seven commissural tracts. Importantly, this novel method is comparable in accuracy to manual labeling. Finally, we present results applying this method to create a white matter atlas from 77 healthy subjects, and we use this atlas in a small proof-of-concept study to detect changes in association tracts that characterize schizophrenia. PMID:26754839

  19. The white matter query language: a novel approach for describing human white matter anatomy.

    PubMed

    Wassermann, Demian; Makris, Nikos; Rathi, Yogesh; Shenton, Martha; Kikinis, Ron; Kubicki, Marek; Westin, Carl-Fredrik

    2016-12-01

    We have developed a novel method to describe human white matter anatomy using an approach that is both intuitive and simple to use, and which automatically extracts white matter tracts from diffusion MRI volumes. Further, our method simplifies the quantification and statistical analysis of white matter tracts on large diffusion MRI databases. This work reflects the careful syntactical definition of major white matter fiber tracts in the human brain based on a neuroanatomist's expert knowledge. The framework is based on a novel query language with a near-to-English textual syntax. This query language makes it possible to construct a dictionary of anatomical definitions that describe white matter tracts. The definitions include adjacent gray and white matter regions, and rules for spatial relations. This novel method makes it possible to automatically label white matter anatomy across subjects. After describing this method, we provide an example of its implementation where we encode anatomical knowledge in human white matter for ten association and 15 projection tracts per hemisphere, along with seven commissural tracts. Importantly, this novel method is comparable in accuracy to manual labeling. Finally, we present results applying this method to create a white matter atlas from 77 healthy subjects, and we use this atlas in a small proof-of-concept study to detect changes in association tracts that characterize schizophrenia.

  20. Concept-based query language approach to enterprise information systems

    NASA Astrophysics Data System (ADS)

    Niemi, Timo; Junkkari, Marko; Järvelin, Kalervo

    2014-01-01

    In enterprise information systems (EISs) it is necessary to model, integrate and compute very diverse data. In advanced EISs the stored data often are based both on structured (e.g. relational) and semi-structured (e.g. XML) data models. In addition, the ad hoc information needs of end-users may require the manipulation of data-oriented (structural), behavioural and deductive aspects of data. Contemporary languages capable of treating this kind of diversity suit only persons with good programming skills. In this paper we present a concept-oriented query language approach to manipulate this diversity so that the programming skill requirements are considerably reduced. In our query language, the features which need technical knowledge are hidden in application-specific concepts and structures. Therefore, users need not be aware of the underlying technology. Application-specific concepts and structures are represented by the modelling primitives of the extended RDOOM (relational deductive object-oriented modelling) which contains primitives for all crucial real world relationships (is-a relationship, part-of relationship, association), XML documents and views. Our query language also supports intensional and extensional-intensional queries, in addition to conventional extensional queries. In its query formulation, the end-user combines available application-specific concepts and structures through shared variables.

  1. Knowledge Query Language (KQL)

    DTIC Science & Technology

    2016-02-01

    unlimited. This page intentionally left blank. iii EXECUTIVE SUMMARY Currently, queries for data ...retrieval from non-Structured Query Language (NoSQL) data stores are tightly coupled to the specific implementation of the data store implementation, making...of the storage content and format for querying NoSQL or relational data stores. This approach uses address expressions (or A-Expressions) embedded in

  2. a Novel Approach of Indexing and Retrieving Spatial Polygons for Efficient Spatial Region Queries

    NASA Astrophysics Data System (ADS)

    Zhao, J. H.; Wang, X. Z.; Wang, F. Y.; Shen, Z. H.; Zhou, Y. C.; Wang, Y. L.

    2017-10-01

    Spatial region queries are more and more widely used in web-based applications. Mechanisms to provide efficient query processing over geospatial data are essential. However, due to the massive geospatial data volume, heavy geometric computation, and high access concurrency, it is difficult to get response in real time. Spatial indexes are usually used in this situation. In this paper, based on k-d tree, we introduce a distributed KD-Tree (DKD-Tree) suitbable for polygon data, and a two-step query algorithm. The spatial index construction is recursive and iterative, and the query is an in memory process. Both the index and query methods can be processed in parallel, and are implemented based on HDFS, Spark and Redis. Experiments on a large volume of Remote Sensing images metadata have been carried out, and the advantages of our method are investigated by comparing with spatial region queries executed on PostgreSQL and PostGIS. Results show that our approach not only greatly improves the efficiency of spatial region query, but also has good scalability, Moreover, the two-step spatial range query algorithm can also save cluster resources to support a large number of concurrent queries. Therefore, this method is very useful when building large geographic information systems.

  3. SPARQL Query Re-writing Using Partonomy Based Transformation Rules

    NASA Astrophysics Data System (ADS)

    Jain, Prateek; Yeh, Peter Z.; Verma, Kunal; Henson, Cory A.; Sheth, Amit P.

    Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. For querying ontology's containing spatial information, the precise relationships between spatial entities has to be specified in the basic graph pattern of SPARQL query which can result in long and complex queries. We present a novel approach to help users intuitively write SPARQL queries to query spatial data, rather than relying on knowledge of the ontology structure. Our framework re-writes queries, using transformation rules to exploit part-whole relations between geographical entities to address the mismatches between query constraints and knowledge base. Our experiments were performed on completely third party datasets and queries. Evaluations were performed on Geonames dataset using questions from National Geographic Bee serialized into SPARQL and British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in retrieval of results and ease in writing queries.

  4. Spatial Knowledge Infrastructures - Creating Value for Policy Makers and Benefits the Community

    NASA Astrophysics Data System (ADS)

    Arnold, L. M.

    2016-12-01

    The spatial data infrastructure is arguably one of the most significant advancements in the spatial sector. It's been a game changer for governments, providing for the coordination and sharing of spatial data across organisations and the provision of accessible information to the broader community of users. Today however, end-users such as policy-makers require far more from these spatial data infrastructures. They want more than just data; they want the knowledge that can be extracted from data and they don't want to have to download, manipulate and process data in order to get the knowledge they seek. It's time for the spatial sector to reduce its focus on data in spatial data infrastructures and take a more proactive step in emphasising and delivering the knowledge value. Nowadays, decision-makers want to be able to query at will the data to meet their immediate need for knowledge. This is a new value proposal for the decision-making consumer and will require a shift in thinking. This paper presents a model for a Spatial Knowledge Infrastructure and underpinning methods that will realise a new real-time approach to delivering knowledge. The methods embrace the new capabilities afforded through the sematic web, domain and process ontologies and natural query language processing. Semantic Web technologies today have the potential to transform the spatial industry into more than just a distribution channel for data. The Semantic Web RDF (Resource Description Framework) enables meaning to be drawn from data automatically. While pushing data out to end-users will remain a central role for data producers, the power of the semantic web is that end-users have the ability to marshal a broad range of spatial resources via a query to extract knowledge from available data. This can be done without actually having to configure systems specifically for the end-user. All data producers need do is make data accessible in RDF and the spatial analytics does the rest.

  5. SPARQL Assist language-neutral query composer

    PubMed Central

    2012-01-01

    Background SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. Results We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. Conclusions To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources. PMID:22373327

  6. SPARQL assist language-neutral query composer.

    PubMed

    McCarthy, Luke; Vandervalk, Ben; Wilkinson, Mark

    2012-01-25

    SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources.

  7. Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data.

    PubMed

    Aji, Ablimit; Wang, Fusheng; Saltz, Joel H

    2012-11-06

    Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amount of data with fast response, which is faced with two major challenges: the "big data" challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce.

  8. Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data

    PubMed Central

    Aji, Ablimit; Wang, Fusheng; Saltz, Joel H.

    2013-01-01

    Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amount of data with fast response, which is faced with two major challenges: the “big data” challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce. PMID:24501719

  9. Querying Proofs

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen; Lueth, Christoph

    2012-01-01

    We motivate and introduce a query language PrQL designed for inspecting machine representations of proofs. PrQL natively supports hiproofs which express proof structure using hierarchical nested labelled trees. The core language presented in this paper is locally structured (first-order), with queries built using recursion and patterns over proof structure and rule names. We define the syntax and semantics of locally structured queries, demonstrate their power, and sketch some implementation experiments.

  10. Querying Proofs (Work in Progress)

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen; Lueth, Christoph

    2011-01-01

    We motivate and introduce the basis for a query language designed for inspecting electronic representations of proofs. We argue that there is much to learn from large proofs beyond their validity, and that a dedicated query language can provide a principled way of implementing a family of useful operations.

  11. Research on presentation and query service of geo-spatial data based on ontology

    NASA Astrophysics Data System (ADS)

    Li, Hong-wei; Li, Qin-chao; Cai, Chang

    2008-10-01

    The paper analyzed the deficiency on presentation and query of geo-spatial data existed in current GIS, discussed the advantages that ontology possessed in formalization of geo-spatial data and the presentation of semantic granularity, taken land-use classification system as an example to construct domain ontology, and described it by OWL; realized the grade level and category presentation of land-use data benefited from the thoughts of vertical and horizontal navigation; and then discussed query mode of geo-spatial data based on ontology, including data query based on types and grade levels, instances and spatial relation, and synthetic query based on types and instances; these methods enriched query mode of current GIS, and is a useful attempt; point out that the key point of the presentation and query of spatial data based on ontology is to construct domain ontology that can correctly reflect geo-concept and its spatial relation and realize its fine formalization description.

  12. A New Publicly Available Chemical Query Language, CSRML, to support Chemotype Representations for Application to Data-Mining and Modeling

    EPA Science Inventory

    A new XML-based query language, CSRML, has been developed for representing chemical substructures, molecules, reaction rules, and reactions. CSRML queries are capable of integrating additional forms of information beyond the simple substructure (e.g., SMARTS) or reaction transfor...

  13. A Framework for Building and Reasoning with Adaptive and Interoperable PMESII Models

    DTIC Science & Technology

    2007-11-01

    Description Logic SOA Service Oriented Architecture SPARQL Simple Protocol And RDF Query Language SQL Standard Query Language SROM Stability and...another by providing a more expressive ontological structure for one of the models, e.g., semantic networks can be mapped to first- order logical...Pellet is an open-source reasoner that works with OWL-DL. It accepts the SPARQL protocol and RDF query language ( SPARQL ) and provides a Java API to

  14. Ontology-based geospatial data query and integration

    USGS Publications Warehouse

    Zhao, T.; Zhang, C.; Wei, M.; Peng, Z.-R.

    2008-01-01

    Geospatial data sharing is an increasingly important subject as large amount of data is produced by a variety of sources, stored in incompatible formats, and accessible through different GIS applications. Past efforts to enable sharing have produced standardized data format such as GML and data access protocols such as Web Feature Service (WFS). While these standards help enabling client applications to gain access to heterogeneous data stored in different formats from diverse sources, the usability of the access is limited due to the lack of data semantics encoded in the WFS feature types. Past research has used ontology languages to describe the semantics of geospatial data but ontology-based queries cannot be applied directly to legacy data stored in databases or shapefiles, or to feature data in WFS services. This paper presents a method to enable ontology query on spatial data available from WFS services and on data stored in databases. We do not create ontology instances explicitly and thus avoid the problems of data replication. Instead, user queries are rewritten to WFS getFeature requests and SQL queries to database. The method also has the benefits of being able to utilize existing tools of databases, WFS, and GML while enabling query based on ontology semantics. ?? 2008 Springer-Verlag Berlin Heidelberg.

  15. An Experimental Investigation of Complexity in Database Query Formulation Tasks

    ERIC Educational Resources Information Center

    Casterella, Gretchen Irwin; Vijayasarathy, Leo

    2013-01-01

    Information Technology professionals and other knowledge workers rely on their ability to extract data from organizational databases to respond to business questions and support decision making. Structured query language (SQL) is the standard programming language for querying data in relational databases, and SQL skills are in high demand and are…

  16. SIMS: addressing the problem of heterogeneity in databases

    NASA Astrophysics Data System (ADS)

    Arens, Yigal

    1997-02-01

    The heterogeneity of remotely accessible databases -- with respect to contents, query language, semantics, organization, etc. -- presents serious obstacles to convenient querying. The SIMS (single interface to multiple sources) system addresses this global integration problem. It does so by defining a single language for describing the domain about which information is stored in the databases and using this language as the query language. Each database to which SIMS is to provide access is modeled using this language. The model describes a database's contents, organization, and other relevant features. SIMS uses these models, together with a planning system drawing on techniques from artificial intelligence, to decompose a given user's high-level query into a series of queries against the databases and other data manipulation steps. The retrieval plan is constructed so as to minimize data movement over the network and maximize parallelism to increase execution speed. SIMS can recover from network failures during plan execution by obtaining data from alternate sources, when possible. SIMS has been demonstrated in the domains of medical informatics and logistics, using real databases.

  17. Supporting temporal queries on clinical relational databases: the S-WATCH-QL language.

    PubMed Central

    Combi, C.; Missora, L.; Pinciroli, F.

    1996-01-01

    Due to the ubiquitous and special nature of time, specially in clinical datábases there's the need of particular temporal data and operators. In this paper we describe S-WATCH-QL (Structured Watch Query Language), a temporal extension of SQL, the widespread query language based on the relational model. S-WATCH-QL extends the well-known SQL by the addition of: a) temporal data types that allow the storage of information with different levels of granularity; b) historical relations that can store together both instantaneous valid times and intervals; c) some temporal clauses, functions and predicates allowing to define complex temporal queries. PMID:8947722

  18. Finding Relevant Data in a Sea of Languages

    DTIC Science & Technology

    2016-04-26

    full machine-translated text , unbiased word clouds , query-biased word clouds , and query-biased sentence...and information retrieval to automate language processing tasks so that the limited number of linguists available for analyzing text and spoken...the crime (stock market). The Cross-LAnguage Search Engine (CLASE) has already preprocessed the documents, extracting text to identify the language

  19. An Expressive and Efficient Language for XML Information Retrieval.

    ERIC Educational Resources Information Center

    Chinenyanga, Taurai Tapiwa; Kushmerick, Nicholas

    2002-01-01

    Discusses XML and information retrieval and describes a query language, ELIXIR (expressive and efficient language for XML information retrieval), with a textual similarity operator that can be used for similarity joins. Explains the algorithm for answering ELIXIR queries to generate intermediate relational data. (Author/LRW)

  20. DBPQL: A view-oriented query language for the Intel Data Base Processor

    NASA Technical Reports Server (NTRS)

    Fishwick, P. A.

    1983-01-01

    An interactive query language (BDPQL) for the Intel Data Base Processor (DBP) is defined. DBPQL includes a parser generator package which permits the analyst to easily create and manipulate the query statement syntax and semantics. The prototype language, DBPQL, includes trace and performance commands to aid the analyst when implementing new commands and analyzing the execution characteristics of the DBP. The DBPQL grammar file and associated key procedures are included as an appendix to this report.

  1. Time series patterns and language support in DBMS

    NASA Astrophysics Data System (ADS)

    Telnarova, Zdenka

    2017-07-01

    This contribution is focused on pattern type Time Series as a rich in semantics representation of data. Some example of implementation of this pattern type in traditional Data Base Management Systems is briefly presented. There are many approaches how to manipulate with patterns and query patterns. Crucial issue can be seen in systematic approach to pattern management and specific pattern query language which takes into consideration semantics of patterns. Query language SQL-TS for manipulating with patterns is shown on Time Series data.

  2. A New Framework for Textual Information Mining over Parse Trees. CRESST Report 805

    ERIC Educational Resources Information Center

    Mousavi, Hamid; Kerr, Deirdre; Iseli, Markus R.

    2011-01-01

    Textual information mining is a challenging problem that has resulted in the creation of many different rule-based linguistic query languages. However, these languages generally are not optimized for the purpose of text mining. In other words, they usually consider queries as individuals and only return raw results for each query. Moreover they…

  3. Graphical modeling and query language for hospitals.

    PubMed

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    So far there has been little evidence that implementation of the health information technologies (HIT) is leading to health care cost savings. One of the reasons for this lack of impact by the HIT likely lies in the complexity of the business process ownership in the hospitals. The goal of our research is to develop a business model-based method for hospital use which would allow doctors to retrieve directly the ad-hoc information from various hospital databases. We have developed a special domain-specific process modelling language called the MedMod. Formally, we define the MedMod language as a profile on UML Class diagrams, but we also demonstrate it on examples, where we explain the semantics of all its elements informally. Moreover, we have developed the Process Query Language (PQL) that is based on MedMod process definition language. The purpose of PQL is to allow a doctor querying (filtering) runtime data of hospital's processes described using MedMod. The MedMod language tries to overcome deficiencies in existing process modeling languages, allowing to specify the loosely-defined sequence of the steps to be performed in the clinical process. The main advantages of PQL are in two main areas - usability and efficiency. They are: 1) the view on data through "glasses" of familiar process, 2) the simple and easy-to-perceive means of setting filtering conditions require no more expertise than using spreadsheet applications, 3) the dynamic response to each step in construction of the complete query that shortens the learning curve greatly and reduces the error rate, and 4) the selected means of filtering and data retrieving allows to execute queries in O(n) time regarding the size of the dataset. We are about to continue developing this project with three further steps. First, we are planning to develop user-friendly graphical editors for the MedMod process modeling and query languages. The second step is to do evaluation of usability the proposed language and tool involving the physicians from several hospitals in Latvia and working with real data from these hospitals. Our third step is to develop an efficient implementation of the query language.

  4. Foundations of RDF Databases

    NASA Astrophysics Data System (ADS)

    Arenas, Marcelo; Gutierrez, Claudio; Pérez, Jorge

    The goal of this paper is to give an overview of the basics of the theory of RDF databases. We provide a formal definition of RDF that includes the features that distinguish this model from other graph data models. We then move into the fundamental issue of querying RDF data. We start by considering the RDF query language SPARQL, which is a W3C Recommendation since January 2008. We provide an algebraic syntax and a compositional semantics for this language, study the complexity of the evaluation problem for different fragments of SPARQL, and consider the problem of optimizing the evaluation of SPARQL queries, showing that a natural fragment of this language has some good properties in this respect. We furthermore study the expressive power of SPARQL, by comparing it with some well-known query languages such as relational algebra. We conclude by considering the issue of querying RDF data in the presence of RDFS vocabulary. In particular, we present a recently proposed extension of SPARQL with navigational capabilities.

  5. Spatial Relation Predicates in Topographic Feature Semantics

    USGS Publications Warehouse

    Varanka, Dalia E.; Caro, Holly K.

    2013-01-01

    Topographic data are designed and widely used for base maps of diverse applications, yet the power of these information sources largely relies on the interpretive skills of map readers and relational database expert users once the data are in map or geographic information system (GIS) form. Advances in geospatial semantic technology offer data model alternatives for explicating concepts and articulating complex data queries and statements. To understand and enrich the vocabulary of topographic feature properties for semantic technology, English language spatial relation predicates were analyzed in three standard topographic feature glossaries. The analytical approach drew from disciplinary concepts in geography, linguistics, and information science. Five major classes of spatial relation predicates were identified from the analysis; representations for most of these are not widely available. The classes are: part-whole (which are commonly modeled throughout semantic and linked-data networks), geometric, processes, human intention, and spatial prepositions. These are commonly found in the ‘real world’ and support the environmental science basis for digital topographical mapping. The spatial relation concepts are based on sets of relation terms presented in this chapter, though these lists are not prescriptive or exhaustive. The results of this study make explicit the concepts forming a broad set of spatial relation expressions, which in turn form the basis for expanding the range of possible queries for topographical data analysis and mapping.

  6. Recommender System for Learning SQL Using Hints

    ERIC Educational Resources Information Center

    Lavbic, Dejan; Matek, Tadej; Zrnec, Aljaž

    2017-01-01

    Today's software industry requires individuals who are proficient in as many programming languages as possible. Structured query language (SQL), as an adopted standard, is no exception, as it is the most widely used query language to retrieve and manipulate data. However, the process of learning SQL turns out to be challenging. The need for a…

  7. Ontological Approach to Military Knowledge Modeling and Management

    DTIC Science & Technology

    2004-03-01

    federated search mechanism has to reformulate user queries (expressed using the ontology) in the query languages of the different sources (e.g. SQL...ontologies as a common terminology – Unified query to perform federated search • Query processing – Ontology mapping to sources reformulate queries

  8. Experiments with Cross-Language Information Retrieval on a Health Portal for Psychology and Psychotherapy.

    PubMed

    Andrenucci, Andrea

    2016-01-01

    Few studies have been performed within cross-language information retrieval (CLIR) in the field of psychology and psychotherapy. The aim of this paper is to to analyze and assess the quality of available query translation methods for CLIR on a health portal for psychology. A test base of 100 user queries, 50 Multi Word Units (WUs) and 50 Single WUs, was used. Swedish was the source language and English the target language. Query translation methods based on machine translation (MT) and dictionary look-up were utilized in order to submit query translations to two search engines: Google Site Search and Quick Ask. Standard IR evaluation measures and a qualitative analysis were utilized to assess the results. The lexicon extracted with word alignment of the portal's parallel corpus provided better statistical results among dictionary look-ups. Google Translate provided more linguistically correct translations overall and also delivered better retrieval results in MT.

  9. Adding intelligence to scientific data management

    NASA Technical Reports Server (NTRS)

    Campbell, William J.; Short, Nicholas M., Jr.; Treinish, Lloyd A.

    1989-01-01

    NASA plans to solve some of the problems of handling large-scale scientific data bases by turning to artificial intelligence (AI) are discussed. The growth of the information glut and the ways that AI can help alleviate the resulting problems are reviewed. The employment of the Intelligent User Interface prototype, where the user will generate his own natural language query with the assistance of the system, is examined. Spatial data management, scientific data visualization, and data fusion are discussed.

  10. Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments.

    PubMed

    Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

    2015-09-18

    A content-matched (CM) rangemonitoring query overmoving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CMrange monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods.

  11. Query Language for Location-Based Services: A Model Checking Approach

    NASA Astrophysics Data System (ADS)

    Hoareau, Christian; Satoh, Ichiro

    We present a model checking approach to the rationale, implementation, and applications of a query language for location-based services. Such query mechanisms are necessary so that users, objects, and/or services can effectively benefit from the location-awareness of their surrounding environment. The underlying data model is founded on a symbolic model of space organized in a tree structure. Once extended to a semantic model for modal logic, we regard location query processing as a model checking problem, and thus define location queries as hybrid logicbased formulas. Our approach is unique to existing research because it explores the connection between location models and query processing in ubiquitous computing systems, relies on a sound theoretical basis, and provides modal logic-based query mechanisms for expressive searches over a decentralized data structure. A prototype implementation is also presented and will be discussed.

  12. Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records.

    PubMed

    Luo, Yuan; Szolovits, Peter

    2016-01-01

    In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing on narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem into the interval query problem, for which optimal query/update time is in general logarithm. We next perform a tight time complexity analysis on the basic interval tree query algorithm and show its nonoptimality when being applied to a collection of 13 query types from Allen's interval algebra. We then study two closely related state-of-the-art interval query algorithms, proposed query reformulations, and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic time stabbing-max query time complexity and solves the stabbing-interval query tasks on all of Allen's relations in logarithmic time, attaining the theoretic lower bound. Updating time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions.

  13. Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records

    PubMed Central

    Luo, Yuan; Szolovits, Peter

    2016-01-01

    In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing on narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem into the interval query problem, for which optimal query/update time is in general logarithm. We next perform a tight time complexity analysis on the basic interval tree query algorithm and show its nonoptimality when being applied to a collection of 13 query types from Allen’s interval algebra. We then study two closely related state-of-the-art interval query algorithms, proposed query reformulations, and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic time stabbing-max query time complexity and solves the stabbing-interval query tasks on all of Allen’s relations in logarithmic time, attaining the theoretic lower bound. Updating time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions. PMID:27478379

  14. GraQL: A Query Language for High-Performance Attributed Graph Databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chavarría-Miranda, Daniel; Castellana, Vito G.; Morari, Alessandro

    Graph databases have gained increasing interest in the last few years due to the emergence of data sources which are not easily analyzable in traditional relational models or for which a graph data model is the natural representation. In order to understand the design and implementation choices for an attributed graph database backend and query language, we have started to design our infrastructure for attributed graph databases. In this paper, we describe the design considerations of our in-memory attributed graph database system with a particular focus on the data definition and query language components.

  15. Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments

    PubMed Central

    Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

    2015-01-01

    A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods. PMID:26393613

  16. Lost in translation? A multilingual Query Builder improves the quality of PubMed queries: a randomised controlled trial.

    PubMed

    Schuers, Matthieu; Joulakian, Mher; Kerdelhué, Gaetan; Segas, Léa; Grosjean, Julien; Darmoni, Stéfan J; Griffon, Nicolas

    2017-07-03

    MEDLINE is the most widely used medical bibliographic database in the world. Most of its citations are in English and this can be an obstacle for some researchers to access the information the database contains. We created a multilingual query builder to facilitate access to the PubMed subset using a language other than English. The aim of our study was to assess the impact of this multilingual query builder on the quality of PubMed queries for non-native English speaking physicians and medical researchers. A randomised controlled study was conducted among French speaking general practice residents. We designed a multi-lingual query builder to facilitate information retrieval, based on available MeSH translations and providing users with both an interface and a controlled vocabulary in their own language. Participating residents were randomly allocated either the French or the English version of the query builder. They were asked to translate 12 short medical questions into MeSH queries. The main outcome was the quality of the query. Two librarians blind to the arm independently evaluated each query, using a modified published classification that differentiated eight types of errors. Twenty residents used the French version of the query builder and 22 used the English version. 492 queries were analysed. There were significantly more perfect queries in the French group vs. the English group (respectively 37.9% vs. 17.9%; p < 0.01). It took significantly more time for the members of the English group than the members of the French group to build each query, respectively 194 sec vs. 128 sec; p < 0.01. This multi-lingual query builder is an effective tool to improve the quality of PubMed queries in particular for researchers whose first language is not English.

  17. XGI: a graphical interface for XQuery creation.

    PubMed

    Li, Xiang; Gennari, John H; Brinkley, James F

    2007-10-11

    XML has become the default standard for data exchange among heterogeneous data sources, and in January 2007 XQuery (XML Query language) was recommended by the World Wide Web Consortium as the query language for XML. However, XQuery is a complex language that is difficult for non-programmers to learn. We have therefore developed XGI (XQuery Graphical Interface), a visual interface for graphically generating XQuery. In this paper we demonstrate the functionality of XGI through its application to a biomedical XML dataset. We describe the system architecture and the features of XGI in relation to several existing querying systems, we demonstrate the system's usability through a sample query construction, and we discuss a preliminary evaluation of XGI. Finally, we describe some limitations of the system, and our plans for future improvements.

  18. An XML-Based Manipulation and Query Language for Rule-Based Information

    NASA Astrophysics Data System (ADS)

    Mansour, Essam; Höpfner, Hagen

    Rules are utilized to assist in the monitoring process that is required in activities, such as disease management and customer relationship management. These rules are specified according to the application best practices. Most of research efforts emphasize on the specification and execution of these rules. Few research efforts focus on managing these rules as one object that has a management life-cycle. This paper presents our manipulation and query language that is developed to facilitate the maintenance of this object during its life-cycle and to query the information contained in this object. This language is based on an XML-based model. Furthermore, we evaluate the model and language using a prototype system applied to a clinical case study.

  19. Selecting the Best Mobile Information Service with Natural Language User Input

    NASA Astrophysics Data System (ADS)

    Feng, Qiangze; Qi, Hongwei; Fukushima, Toshikazu

    Information services accessed via mobile phones provide information directly relevant to subscribers’ daily lives and are an area of dynamic market growth worldwide. Although many information services are currently offered by mobile operators, many of the existing solutions require a unique gateway for each service, and it is inconvenient for users to have to remember a large number of such gateways. Furthermore, the Short Message Service (SMS) is very popular in China and Chinese users would prefer to access these services in natural language via SMS. This chapter describes a Natural Language Based Service Selection System (NL3S) for use with a large number of mobile information services. The system can accept user queries in natural language and navigate it to the required service. Since it is difficult for existing methods to achieve high accuracy and high coverage and anticipate which other services a user might want to query, the NL3S is developed based on a Multi-service Ontology (MO) and Multi-service Query Language (MQL). The MO and MQL provide semantic and linguistic knowledge, respectively, to facilitate service selection for a user query and to provide adaptive service recommendations. Experiments show that the NL3S can achieve 75-95% accuracies and 85-95% satisfactions for processing various styles of natural language queries. A trial involving navigation of 30 different mobile services shows that the NL3S can provide a viable commercial solution for mobile operators.

  20. Spatial aggregation query in dynamic geosensor networks

    NASA Astrophysics Data System (ADS)

    Yi, Baolin; Feng, Dayang; Xiao, Shisong; Zhao, Erdun

    2007-11-01

    Wireless sensor networks have been widely used for civilian and military applications, such as environmental monitoring and vehicle tracking. In many of these applications, the researches mainly aim at building sensor network based systems to leverage the sensed data to applications. However, the existing works seldom exploited spatial aggregation query considering the dynamic characteristics of sensor networks. In this paper, we investigate how to process spatial aggregation query over dynamic geosensor networks where both the sink node and sensor nodes are mobile and propose several novel improvements on enabling techniques. The mobility of sensors makes the existing routing protocol based on information of fixed framework or the neighborhood infeasible. We present an improved location-based stateless implicit geographic forwarding (IGF) protocol for routing a query toward the area specified by query window, a diameter-based window aggregation query (DWAQ) algorithm for query propagation and data aggregation in the query window, finally considering the location changing of the sink node, we present two schemes to forward the result to the sink node. Simulation results show that the proposed algorithms can improve query latency and query accuracy.

  1. Manchester visual query language

    NASA Astrophysics Data System (ADS)

    Oakley, John P.; Davis, Darryl N.; Shann, Richard T.

    1993-04-01

    We report a database language for visual retrieval which allows queries on image feature information which has been computed and stored along with images. The language is novel in that it provides facilities for dealing with feature data which has actually been obtained from image analysis. Each line in the Manchester Visual Query Language (MVQL) takes a set of objects as input and produces another, usually smaller, set as output. The MVQL constructs are mainly based on proven operators from the field of digital image analysis. An example is the Hough-group operator which takes as input a specification for the objects to be grouped, a specification for the relevant Hough space, and a definition of the voting rule. The output is a ranked list of high scoring bins. The query could be directed towards one particular image or an entire image database, in the latter case the bins in the output list would in general be associated with different images. We have implemented MVQL in two layers. The command interpreter is a Lisp program which maps each MVQL line to a sequence of commands which are used to control a specialized database engine. The latter is a hybrid graph/relational system which provides low-level support for inheritance and schema evolution. In the paper we outline the language and provide examples of useful queries. We also describe our solution to the engineering problems associated with the implementation of MVQL.

  2. A New Publicly Available Chemical Query Language, CSRML ...

    EPA Pesticide Factsheets

    A new XML-based query language, CSRML, has been developed for representing chemical substructures, molecules, reaction rules, and reactions. CSRML queries are capable of integrating additional forms of information beyond the simple substructure (e.g., SMARTS) or reaction transformation (e.g., SMIRKS, reaction SMILES) queries currently in use. Chemotypes, a term used to represent advanced CSRML queries for repeated application can be encoded not only with connectivity and topology, but also with properties of atoms, bonds, electronic systems, or molecules. The CSRML language has been developed in parallel with a public set of chemotypes, i.e., the ToxPrint chemotypes, which are designed to provide excellent coverage of environmental, regulatory and commercial use chemical space, as well as to represent features and frameworks believed to be especially relevant to toxicity concerns. A software application, ChemoTyper, has also been developed and made publicly available to enable chemotype searching and fingerprinting against a target structure set. The public ChemoTyper houses the ToxPrint chemotype CSRML dictionary, as well as reference implementation so that the query specifications may be adopted by other chemical structure knowledge systems. The full specifications of the XML standard used in CSRML-based chemotypes are publicly available to facilitate and encourage the exchange of structural knowledge. Paper details specifications for a new XML-based query lan

  3. Query Expansion and Query Translation as Logical Inference.

    ERIC Educational Resources Information Center

    Nie, Jian-Yun

    2003-01-01

    Examines query expansion during query translation in cross language information retrieval and develops a general framework for inferential information retrieval in two particular contexts: using fuzzy logic and probability theory. Obtains evaluation formulas that are shown to strongly correspond to those used in other information retrieval models.…

  4. A Text Knowledge Base from the AI Handbook.

    ERIC Educational Resources Information Center

    Simmons, Robert F.

    1987-01-01

    Describes a prototype natural language text knowledge system (TKS) that was used to organize 50 pages of a handbook on artificial intelligence as an inferential knowledge base with natural language query and command capabilities. Representation of text, database navigation, query systems, discourse structuring, and future research needs are…

  5. A Query Integrator and Manager for the Query Web

    PubMed Central

    Brinkley, James F.; Detwiler, Landon T.

    2012-01-01

    We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. PMID:22531831

  6. Path querying system on mobile devices

    NASA Astrophysics Data System (ADS)

    Lin, Xing; Wang, Yifei; Tian, Yuan; Wu, Lun

    2006-01-01

    Traditional approaches to path querying problems are not efficient and convenient under most circumstances. A more convenient and reliable approach to this problem has to be found. This paper is devoted to a path querying solution on mobile devices. By using an improved Dijkstra's shortest path algorithm and a natural language translating module, this system can help people find the shortest path between two places through their cell phones or other mobile devices. The chosen path is prompted in text of natural language, as well as a map picture. This system would be useful in solving best path querying problems and have potential to be a profitable business system.

  7. Shuttle-Data-Tape XML Translator

    NASA Technical Reports Server (NTRS)

    Barry, Matthew R.; Osborne, Richard N.

    2005-01-01

    JSDTImport is a computer program for translating native Shuttle Data Tape (SDT) files from American Standard Code for Information Interchange (ASCII) format into databases in other formats. JSDTImport solves the problem of organizing the SDT content, affording flexibility to enable users to choose how to store the information in a database to better support client and server applications. JSDTImport can be dynamically configured by use of a simple Extensible Markup Language (XML) file. JSDTImport uses this XML file to define how each record and field will be parsed, its layout and definition, and how the resulting database will be structured. JSDTImport also includes a client application programming interface (API) layer that provides abstraction for the data-querying process. The API enables a user to specify the search criteria to apply in gathering all the data relevant to a query. The API can be used to organize the SDT content and translate into a native XML database. The XML format is structured into efficient sections, enabling excellent query performance by use of the XPath query language. Optionally, the content can be translated into a Structured Query Language (SQL) database for fast, reliable SQL queries on standard database server computers.

  8. SQL/NF Translator for the Triton Nested Relational Database System

    DTIC Science & Technology

    1990-12-01

    18as., Ohio .. 9~~ ~~ 1 4- AFIT/GCE/ENG/90D-05 SQL/Nk1 TRANSLATOR FOR THE TRITON NESTED RELATIONAL DATABASE SYSTEM THESIS Craig William Schnepf Captain...FOR THE TRITON NESTED RELATIONAL DATABASE SYSTEM THESIS Presented to the Faculty of the School of Engineering of the Air Force Institute of Technnlogy... systems . The SQL/NF query language used for the nested relationil model is an extension of the popular relational model query language SQL. The query

  9. Extending the Query Language of a Data Warehouse for Patient Recruitment.

    PubMed

    Dietrich, Georg; Ertl, Maximilian; Fette, Georg; Kaspar, Mathias; Krebs, Jonathan; Mackenrodt, Daniel; Störk, Stefan; Puppe, Frank

    2017-01-01

    Patient recruitment for clinical trials is a laborious task, as many texts have to be screened. Usually, this work is done manually and takes a lot of time. We have developed a system that automates the screening process. Besides standard keyword queries, the query language supports extraction of numbers, time-spans and negations. In a feasibility study for patient recruitment from a stroke unit with 40 patients, we achieved encouraging extraction rates above 95% for numbers and negations and ca. 86% for time spans.

  10. Cognitive issues in searching images with visual queries

    NASA Astrophysics Data System (ADS)

    Yu, ByungGu; Evens, Martha W.

    1999-01-01

    In this paper, we propose our image indexing technique and visual query processing technique. Our mental images are different from the actual retinal images and many things, such as personal interests, personal experiences, perceptual context, the characteristics of spatial objects, and so on, affect our spatial perception. These private differences are propagated into our mental images and so our visual queries become different from the real images that we want to find. This is a hard problem and few people have tried to work on it. In this paper, we survey the human mental imagery system, the human spatial perception, and discuss several kinds of visual queries. Also, we propose our own approach to visual query interpretation and processing.

  11. An advanced web query interface for biological databases

    PubMed Central

    Latendresse, Mario; Karp, Peter D.

    2010-01-01

    Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715

  12. Regular paths in SparQL: querying the NCI Thesaurus.

    PubMed

    Detwiler, Landon T; Suciu, Dan; Brinkley, James F

    2008-11-06

    OWL, the Web Ontology Language, provides syntax and semantics for representing knowledge for the semantic web. Many of the constructs of OWL have a basis in the field of description logics. While the formal underpinnings of description logics have lead to a highly computable language, it has come at a cognitive cost. OWL ontologies are often unintuitive to readers lacking a strong logic background. In this work we describe GLEEN, a regular path expression library, which extends the RDF query language SparQL to support complex path expressions over OWL and other RDF-based ontologies. We illustrate the utility of GLEEN by showing how it can be used in a query-based approach to defining simpler, more intuitive views of OWL ontologies. In particular we show how relatively simple GLEEN-enhanced SparQL queries can create views of the OWL version of the NCI Thesaurus that match the views generated by the web-based NCI browser.

  13. Social media based NPL system to find and retrieve ARM data: Concept paper

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Devarakonda, Ranjeet; Giansiracusa, Michael T.; Kumar, Jitendra

    Information connectivity and retrieval has a role in our daily lives. The most pervasive source of online information is databases. The amount of data is growing at rapid rate and database technology is improving and having a profound effect. Almost all online applications are storing and retrieving information from databases. One challenge in supplying the public with wider access to informational databases is the need for knowledge of database languages like Structured Query Language (SQL). Although the SQL language has been published in many forms, not everybody is able to write SQL queries. Another challenge is that it may notmore » be practical to make the public aware of the structure of the database. There is a need for novice users to query relational databases using their natural language. To solve this problem, many natural language interfaces to structured databases have been developed. The goal is to provide more intuitive method for generating database queries and delivering responses. Social media makes it possible to interact with a wide section of the population. Through this medium, and with the help of Natural Language Processing (NLP) we can make the data of the Atmospheric Radiation Measurement Data Center (ADC) more accessible to the public. We propose an architecture for using Apache Lucene/Solr [1], OpenML [2,3], and Kafka [4] to generate an automated query/response system with inputs from Twitter5, our Cassandra DB, and our log database. Using the Twitter API and NLP we can give the public the ability to ask questions of our database and get automated responses.« less

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Devarakonda, Ranjeet; Giansiracusa, Michael T.; Kumar, Jitendra

    Information connectivity and retrieval has a role in our daily lives. The most pervasive source of online information is databases. The amount of data is growing at rapid rate and database technology is improving and having a profound effect. Almost all online applications are storing and retrieving information from databases. One challenge in supplying the public with wider access to informational databases is the need for knowledge of database languages like Structured Query Language (SQL). Although the SQL language has been published in many forms, not everybody is able to write SQL queries. Another challenge is that it may notmore » be practical to make the public aware of the structure of the database. There is a need for novice users to query relational databases using their natural language. To solve this problem, many natural language interfaces to structured databases have been developed. The goal is to provide more intuitive method for generating database queries and delivering responses. Social media makes it possible to interact with a wide section of the population. Through this medium, and with the help of Natural Language Processing (NLP) we can make the data of the Atmospheric Radiation Measurement Data Center (ADC) more accessible to the public. We propose an architecture for using Apache Lucene/Solr [1], OpenML [2,3], and Kafka [4] to generate an automated query/response system with inputs from Twitter5, our Cassandra DB, and our log database. Using the Twitter API and NLP we can give the public the ability to ask questions of our database and get automated responses.« less

  15. A natural language query system for Hubble Space Telescope proposal selection

    NASA Technical Reports Server (NTRS)

    Hornick, Thomas; Cohen, William; Miller, Glenn

    1987-01-01

    The proposal selection process for the Hubble Space Telescope is assisted by a robust and easy to use query program (TACOS). The system parses an English subset language sentence regardless of the order of the keyword phases, allowing the user a greater flexibility than a standard command query language. Capabilities for macro and procedure definition are also integrated. The system was designed for flexibility in both use and maintenance. In addition, TACOS can be applied to any knowledge domain that can be expressed in terms of a single reaction. The system was implemented mostly in Common LISP. The TACOS design is described in detail, with particular attention given to the implementation methods of sentence processing.

  16. Unsupervised Spatial Event Detection in Targeted Domains with Applications to Civil Unrest Modeling

    PubMed Central

    Zhao, Liang; Chen, Feng; Dai, Jing; Hua, Ting; Lu, Chang-Tien; Ramakrishnan, Naren

    2014-01-01

    Twitter has become a popular data source as a surrogate for monitoring and detecting events. Targeted domains such as crime, election, and social unrest require the creation of algorithms capable of detecting events pertinent to these domains. Due to the unstructured language, short-length messages, dynamics, and heterogeneity typical of Twitter data streams, it is technically difficult and labor-intensive to develop and maintain supervised learning systems. We present a novel unsupervised approach for detecting spatial events in targeted domains and illustrate this approach using one specific domain, viz. civil unrest modeling. Given a targeted domain, we propose a dynamic query expansion algorithm to iteratively expand domain-related terms, and generate a tweet homogeneous graph. An anomaly identification method is utilized to detect spatial events over this graph by jointly maximizing local modularity and spatial scan statistics. Extensive experiments conducted in 10 Latin American countries demonstrate the effectiveness of the proposed approach. PMID:25350136

  17. Design of an On-Line Query Language for Full Text Patent Search.

    ERIC Educational Resources Information Center

    Glantz, Richard S.

    The design of an English-like query language and an interactive computer environment for searching the full text of the U.S. patent collection are discussed. Special attention is paid to achieving a transparent user interface, to providing extremely broad search capabilities (including nested substitution classes, Kleene star events, and domain…

  18. SP2Bench: A SPARQL Performance Benchmark

    NASA Astrophysics Data System (ADS)

    Schmidt, Michael; Hornung, Thomas; Meier, Michael; Pinkel, Christoph; Lausen, Georg

    A meaningful analysis and comparison of both existing storage schemes for RDF data and evaluation approaches for SPARQL queries necessitates a comprehensive and universal benchmark platform. We present SP2Bench, a publicly available, language-specific performance benchmark for the SPARQL query language. SP2Bench is settled in the DBLP scenario and comprises a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries. The generated documents mirror vital key characteristics and social-world distributions encountered in the original DBLP data set, while the queries implement meaningful requests on top of this data, covering a variety of SPARQL operator constellations and RDF access patterns. In this chapter, we discuss requirements and desiderata for SPARQL benchmarks and present the SP2Bench framework, including its data generator, benchmark queries and performance metrics.

  19. Natural language query system design for interactive information storage and retrieval systems. Presentation visuals. M.S. Thesis Final Report, 1 Jul. 1985 - 31 Dec. 1987

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Liu, I-Hsiung

    1985-01-01

    This Working Paper Series entry represents a collection of presentation visuals associated with the companion report entitled Natural Language Query System Design for Interactive Information Storage and Retrieval Systems, USL/DBMS NASA/RECON Working Paper Series report number DBMS.NASA/RECON-17.

  20. The Effectiveness of Stemming for Natural-Language Access to Slovene Textual Data.

    ERIC Educational Resources Information Center

    Popovic, Mirko; Willett, Peter

    1992-01-01

    Reports on the use of stemming for Slovene language documents and queries in free-text retrieval systems and demonstrates that an appropriate stemming algorithm results in an increase in retrieval effectiveness when compared with nonstemming processing. A comparison is made with stemming of English versions of the same documents and queries. (24…

  1. A 5E Learning Cycle Approach-Based, Multimedia-Supplemented Instructional Unit for Structured Query Language

    ERIC Educational Resources Information Center

    Piyayodilokchai, Hongsiri; Panjaburee, Patcharin; Laosinchai, Parames; Ketpichainarong, Watcharee; Ruenwongsa, Pintip

    2013-01-01

    With the benefit of multimedia and the learning cycle approach in promoting effective active learning, this paper proposed a learning cycle approach-based, multimedia-supplemented instructional unit for Structured Query Language (SQL) for second-year undergraduate students with the aim of enhancing their basic knowledge of SQL and ability to apply…

  2. A Role Calculus for ORM

    NASA Astrophysics Data System (ADS)

    Curland, Matthew; Halpin, Terry; Stirewalt, Kurt

    A conceptual schema of an information system specifies the fact structures of interest as well as related business rules that are either constraints or derivation rules. Constraints restrict the possible or permitted states or state transitions, while derivation rules enable some facts to be derived from others. Graphical languages are commonly used to specify conceptual schemas, but often need to be supplemented by more expressive textual languages to capture additional business rules, as well as conceptual queries that enable conceptual models to be queried directly. This paper describes research to provide a role calculus to underpin textual languages for Object-Role Modeling (ORM), to enable business rules and queries to be formulated in a language intelligible to business users. The role-based nature of this calculus, which exploits the attribute-free nature of ORM, appears to offer significant advantages over other proposed approaches, especially in the area of semantic stability.

  3. Searching for cancer information on the internet: analyzing natural language search queries.

    PubMed

    Bader, Judith L; Theofanos, Mary Frances

    2003-12-11

    Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine result. To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared >or= 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized. Natural-language searching affords users the opportunity to fully express their information needs and can aid users naïve to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience.

  4. Searching for Cancer Information on the Internet: Analyzing Natural Language Search Queries

    PubMed Central

    Theofanos, Mary Frances

    2003-01-01

    Background Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine result. Objective To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. Methods The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Results Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized. Conclusions Natural-language searching affords users the opportunity to fully express their information needs and can aid users naïve to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience. PMID:14713659

  5. MRML: an extensible communication protocol for interoperability and benchmarking of multimedia information retrieval systems

    NASA Astrophysics Data System (ADS)

    Mueller, Wolfgang; Mueller, Henning; Marchand-Maillet, Stephane; Pun, Thierry; Squire, David M.; Pecenovic, Zoran; Giess, Christoph; de Vries, Arjen P.

    2000-10-01

    While in the area of relational databases interoperability is ensured by common communication protocols (e.g. ODBC/JDBC using SQL), Content Based Image Retrieval Systems (CBIRS) and other multimedia retrieval systems are lacking both a common query language and a common communication protocol. Besides its obvious short term convenience, interoperability of systems is crucial for the exchange and analysis of user data. In this paper, we present and describe an extensible XML-based query markup language, called MRML (Multimedia Retrieval markup Language). MRML is primarily designed so as to ensure interoperability between different content-based multimedia retrieval systems. Further, MRML allows researchers to preserve their freedom in extending their system as needed. MRML encapsulates multimedia queries in a way that enable multimedia (MM) query languages, MM content descriptions, MM query engines, and MM user interfaces to grow independently from each other, reaching a maximum of interoperability while ensuring a maximum of freedom for the developer. For benefitting from this, only a few simple design principles have to be respected when extending MRML for one's fprivate needs. The design of extensions withing the MRML framework will be described in detail in the paper. MRML has been implemented and tested for the CBIRS Viper, using the user interface Snake Charmer. Both are part of the GNU project and can be downloaded at our site.

  6. Saying What You're Looking For: Linguistics Meets Video Search.

    PubMed

    Barrett, Daniel Paul; Barbu, Andrei; Siddharth, N; Siskind, Jeffrey Mark

    2016-10-01

    We present an approach to searching large video corpora for clips which depict a natural-language query in the form of a sentence. Compositional semantics is used to encode subtle meaning differences lost in other approaches, such as the difference between two sentences which have identical words but entirely different meaning: The person rode the horse versus The horse rode the person. Given a sentential query and a natural-language parser, we produce a score indicating how well a video clip depicts that sentence for each clip in a corpus and return a ranked list of clips. Two fundamental problems are addressed simultaneously: detecting and tracking objects, and recognizing whether those tracks depict the query. Because both tracking and object detection are unreliable, our approach uses the sentential query to focus the tracker on the relevant participants and ensures that the resulting tracks are described by the sentential query. While most earlier work was limited to single-word queries which correspond to either verbs or nouns, we search for complex queries which contain multiple phrases, such as prepositional phrases, and modifiers, such as adverbs. We demonstrate this approach by searching for 2,627 naturally elicited sentential queries in 10 Hollywood movies.

  7. vSPARQL: A View Definition Language for the Semantic Web

    PubMed Central

    Shaw, Marianne; Detwiler, Landon T.; Noy, Natalya; Brinkley, James; Suciu, Dan

    2010-01-01

    Translational medicine applications would like to leverage the biological and biomedical ontologies, vocabularies, and data sets available on the semantic web. We present a general solution for RDF information set reuse inspired by database views. Our view definition language, vSPARQL, allows applications to specify the exact content that they are interested in and how that content should be restructured or modified. Applications can access relevant content by querying against these view definitions. We evaluate the expressivity of our approach by defining views for practical use cases and comparing our view definition language to existing query languages. PMID:20800106

  8. Concepts and implementations of natural language query systems

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Liu, I-Hsiung

    1984-01-01

    The currently developed user language interfaces of information systems are generally intended for serious users. These interfaces commonly ignore potentially the largest user group, i.e., casual users. This project discusses the concepts and implementations of a natural query language system which satisfy the nature and information needs of casual users by allowing them to communicate with the system in the form of their native (natural) language. In addition, a framework for the development of such an interface is also introduced for the MADAM (Multics Approach to Data Access and Management) system at the University of Southwestern Louisiana.

  9. Complex analyses on clinical information systems using restricted natural language querying to resolve time-event dependencies.

    PubMed

    Safari, Leila; Patrick, Jon D

    2018-06-01

    This paper reports on a generic framework to provide clinicians with the ability to conduct complex analyses on elaborate research topics using cascaded queries to resolve internal time-event dependencies in the research questions, as an extension to the proposed Clinical Data Analytics Language (CliniDAL). A cascaded query model is proposed to resolve internal time-event dependencies in the queries which can have up to five levels of criteria starting with a query to define subjects to be admitted into a study, followed by a query to define the time span of the experiment. Three more cascaded queries can be required to define control groups, control variables and output variables which all together simulate a real scientific experiment. According to the complexity of the research questions, the cascaded query model has the flexibility of merging some lower level queries for simple research questions or adding a nested query to each level to compose more complex queries. Three different scenarios (one of them contains two studies) are described and used for evaluation of the proposed solution. CliniDAL's complex analyses solution enables answering complex queries with time-event dependencies at most in a few hours which manually would take many days. An evaluation of results of the research studies based on the comparison between CliniDAL and SQL solutions reveals high usability and efficiency of CliniDAL's solution. Copyright © 2018 Elsevier Inc. All rights reserved.

  10. An Examination of Natural Language as a Query Formation Tool for Retrieving Information on E-Health from Pub Med.

    ERIC Educational Resources Information Center

    Peterson, Gabriel M.; Su, Kuichun; Ries, James E.; Sievert, Mary Ellen C.

    2002-01-01

    Discussion of Internet use for information searches on health-related topics focuses on a study that examined complexity and variability of natural language in using search terms that express the concept of electronic health (e-health). Highlights include precision of retrieved information; shift in terminology; and queries using the Pub Med…

  11. Sugeno Fuzzy Integral as a Basis for the Interpretation of Flexible Queries Involving Monotonic Aggregates.

    ERIC Educational Resources Information Center

    Bosc, P.; Lietard, L.; Pivert, O.

    2003-01-01

    Considers flexible querying of relational databases. Highlights include SQL languages and basic aggregate operators; Sugeno's fuzzy integral; evaluation examples; and how and under what conditions other aggregate functions could be applied to fuzzy sets in a flexible query. (Author/LRW)

  12. Quality assessment of structure and language elements of written responses given by seven Scandinavian drug information centres.

    PubMed

    Reppe, Linda Amundstuen; Spigset, Olav; Kampmann, Jens Peter; Damkier, Per; Christensen, Hanne Rolighed; Böttiger, Ylva; Schjøtt, Jan

    2017-05-01

    The aim of this study was to identify structure and language elements affecting the quality of responses from Scandinavian drug information centres (DICs). Six different fictitious drug-related queries were sent to each of seven Scandinavian DICs. The centres were blinded for which queries were part of the study. The responses were assessed qualitatively by six clinical pharmacologists (internal experts) and six general practitioners (GPs, external experts). In addition, linguistic aspects of the responses were evaluated by a plain language expert. The quality of responses was generally judged as satisfactory to good. Presenting specific advice and conclusions were considered to improve the quality of the responses. However, small nuances in language formulations could affect the individual judgments of the experts, e.g. on whether or not advice was given. Some experts preferred the use of primary sources to the use of secondary and tertiary sources. Both internal and external experts criticised the use of abbreviations, professional terminology and study findings that was left unexplained. The plain language expert emphasised the importance of defining and explaining pharmacological terms to ensure that enquirers understand the response as intended. In addition, more use of active voice and less compressed text structure would be desirable. This evaluation of responses to DIC queries may give some indications on how to improve written responses on drug-related queries with respect to language and text structure. Giving specific advice and precise conclusions and avoiding too compressed language and non-standard abbreviations may aid to reach this goal.

  13. Agent-Based Computing Integration and Testing

    DTIC Science & Technology

    2006-12-01

    Query Language (DQL). Regrettably, DQL never became a W3C Member Submission itself, but likely had some influence on the SPARQL Protocol And RDF... Query Language ( SPARQL ) subsequently produced by the W3C Data Access Working Group (DAWG) as that working group also contained members from the DAML...Sponsored by Defense Advanced Research Projects Agency DARPA Order No. K536 APPROVED FOR PUBLIC RELEASE

  14. Using Web Ontology Language to Integrate Heterogeneous Databases in the Neurosciences

    PubMed Central

    Lam, Hugo Y.K.; Marenco, Luis; Shepherd, Gordon M.; Miller, Perry L.; Cheung, Kei-Hoi

    2006-01-01

    Integrative neuroscience involves the integration and analysis of diverse types of neuroscience data involving many different experimental techniques. This data will increasingly be distributed across many heterogeneous databases that are web-accessible. Currently, these databases do not expose their schemas (database structures) and their contents to web applications/agents in a standardized, machine-friendly way. This limits database interoperation. To address this problem, we describe a pilot project that illustrates how neuroscience databases can be expressed using the Web Ontology Language, which is a semantically-rich ontological language, as a common data representation language to facilitate complex cross-database queries. In this pilot project, an existing tool called “D2RQ” was used to translate two neuroscience databases (NeuronDB and CoCoDat) into OWL, and the resulting OWL ontologies were then merged. An OWL-based reasoner (Racer) was then used to provide a sophisticated query language (nRQL) to perform integrated queries across the two databases based on the merged ontology. This pilot project is one step toward exploring the use of semantic web technologies in the neurosciences. PMID:17238384

  15. vSPARQL: a view definition language for the semantic web.

    PubMed

    Shaw, Marianne; Detwiler, Landon T; Noy, Natalya; Brinkley, James; Suciu, Dan

    2011-02-01

    Translational medicine applications would like to leverage the biological and biomedical ontologies, vocabularies, and data sets available on the semantic web. We present a general solution for RDF information set reuse inspired by database views. Our view definition language, vSPARQL, allows applications to specify the exact content that they are interested in and how that content should be restructured or modified. Applications can access relevant content by querying against these view definitions. We evaluate the expressivity of our approach by defining views for practical use cases and comparing our view definition language to existing query languages. Copyright © 2010 Elsevier Inc. All rights reserved.

  16. QATT: a Natural Language Interface for QPE. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    White, Douglas Robert-Graham

    1989-01-01

    QATT, a natural language interface developed for the Qualitative Process Engine (QPE) system is presented. The major goal was to evaluate the use of a preexisting natural language understanding system designed to be tailored for query processing in multiple domains of application. The other goal of QATT is to provide a comfortable environment in which to query envisionments in order to gain insight into the qualitative behavior of physical systems. It is shown that the use of the preexisting system made possible the development of a reasonably useful interface in a few months.

  17. Query Expansion Using SNOMED-CT and Weighing Schemes

    DTIC Science & Technology

    2014-11-01

    For this research, we have used SNOMED-CT along with UMLS Methathesaurus as our ontology in medical domain to expand the queries. General Terms...CT along with UMLS Methathesaurus as our ontology in medical domain to expand the queries. 15. SUBJECT TERMS 16. SECURITY CLASSIFICATION OF: 17...University of the Basque country discuss their finding on query expansion using external sources headlined by Unified Medical Language System ( UMLS

  18. Remote sensing and GIS integration: Towards intelligent imagery within a spatial data infrastructure

    NASA Astrophysics Data System (ADS)

    Abdelrahim, Mohamed Mahmoud Hosny

    2001-11-01

    In this research, an "Intelligent Imagery System Prototype" (IISP) was developed. IISP is an integration tool that facilitates the environment for active, direct, and on-the-fly usage of high resolution imagery, internally linked to hidden GIS vector layers, to query the real world phenomena and, consequently, to perform exploratory types of spatial analysis based on a clear/undisturbed image scene. The IISP was designed and implemented using the software components approach to verify the hypothesis that a fully rectified, partially rectified, or even unrectified digital image can be internally linked to a variety of different hidden vector databases/layers covering the end user area of interest, and consequently may be reliably used directly as a base for "on-the-fly" querying of real-world phenomena and for performing exploratory types of spatial analysis. Within IISP, differentially rectified, partially rectified (namely, IKONOS GEOCARTERRA(TM)), and unrectified imagery (namely, scanned aerial photographs and captured video frames) were investigated. The system was designed to handle four types of spatial functions, namely, pointing query, polygon/line-based image query, database query, and buffering. The system was developed using ESRI MapObjects 2.0a as the core spatial component within Visual Basic 6.0. When used to perform the pre-defined spatial queries using different combinations of image and vector data, the IISP provided the same results as those obtained by querying pre-processed vector layers even when the image used was not orthorectified and the vector layers had different parameters. In addition, the real-time pixel location orthorectification technique developed and presented within the IKONOS GEOCARTERRA(TM) case provided a horizontal accuracy (RMSE) of +/- 2.75 metres. This accuracy is very close to the accuracy level obtained when purchasing the orthorectified IKONOS PRECISION products (RMSE of +/- 1.9 metre). The latter cost approximately four times as much as the IKONOS GEOCARTERRA(TM) products. The developed IISP is a step closer towards the direct and active involvement of high-resolution remote sensing imagery in querying the real world and performing exploratory types of spatial analysis. (Abstract shortened by UMI.)

  19. Query Health: standards-based, cross-platform population health surveillance

    PubMed Central

    Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N

    2014-01-01

    Objective Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Materials and methods Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. Results We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. Discussions This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Conclusions Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites. PMID:24699371

  20. Query Health: standards-based, cross-platform population health surveillance.

    PubMed

    Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N

    2014-01-01

    Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  1. The contribution of morphological knowledge to French MeSH mapping for information retrieval.

    PubMed Central

    Zweigenbaum, P.; Darmoni, S. J.; Grabar, N.

    2001-01-01

    MeSH-indexed Internet health directories must provide a mapping from natural language queries to MeSH terms so that both health professionals and the general public can query their contents. We describe here the design of lexical knowledge bases for mapping French expressions to MeSH terms, and the initial evaluation of their contribution to Doc'CISMeF, the search tool of a MeSH-indexed directory of French-language medical Internet resources. The observed trend is in favor of the use of morphological knowledge as a moderate (approximately 5%) but effective factor for improving query to term mapping capabilities. PMID:11825295

  2. SPARK: Adapting Keyword Query to Semantic Search

    NASA Astrophysics Data System (ADS)

    Zhou, Qi; Wang, Chong; Xiong, Miao; Wang, Haofen; Yu, Yong

    Semantic search promises to provide more accurate result than present-day keyword search. However, progress with semantic search has been delayed due to the complexity of its query languages. In this paper, we explore a novel approach of adapting keywords to querying the semantic web: the approach automatically translates keyword queries into formal logic queries so that end users can use familiar keywords to perform semantic search. A prototype system named 'SPARK' has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In the experiment, SPARK achieved an encouraging translation result.

  3. Language model: Extension to solve inconsistency, incompleteness, and short query in cultural heritage collection

    NASA Astrophysics Data System (ADS)

    Tan, Kian Lam; Lim, Chen Kim

    2017-10-01

    With the explosive growth of online information such as email messages, news articles, and scientific literature, many institutions and museums are converting their cultural collections from physical data to digital format. However, this conversion resulted in the issues of inconsistency and incompleteness. Besides, the usage of inaccurate keywords also resulted in short query problem. Most of the time, the inconsistency and incompleteness are caused by the aggregation fault in annotating a document itself while the short query problem is caused by naive user who has prior knowledge and experience in cultural heritage domain. In this paper, we presented an approach to solve the problem of inconsistency, incompleteness and short query by incorporating the Term Similarity Matrix into the Language Model. Our approach is tested on the Cultural Heritage in CLEF (CHiC) collection which consists of short queries and documents. The results show that the proposed approach is effective and has improved the accuracy in retrieval time.

  4. p3d--Python module for structural bioinformatics.

    PubMed

    Fufezan, Christian; Specht, Michael

    2009-08-21

    High-throughput bioinformatic analysis tools are needed to mine the large amount of structural data via knowledge based approaches. The development of such tools requires a robust interface to access the structural data in an easy way. For this the Python scripting language is the optimal choice since its philosophy is to write an understandable source code. p3d is an object oriented Python module that adds a simple yet powerful interface to the Python interpreter to process and analyse three dimensional protein structure files (PDB files). p3d's strength arises from the combination of a) very fast spatial access to the structural data due to the implementation of a binary space partitioning (BSP) tree, b) set theory and c) functions that allow to combine a and b and that use human readable language in the search queries rather than complex computer language. All these factors combined facilitate the rapid development of bioinformatic tools that can perform quick and complex analyses of protein structures. p3d is the perfect tool to quickly develop tools for structural bioinformatics using the Python scripting language.

  5. A Query System Implementation Case Study.

    ERIC Educational Resources Information Center

    Hiser, Judith N.; Neil, M. Elizabeth

    1985-01-01

    The Department of Administrative Programming Services of Clemson University investigated products available in user-friendly retrieval systems. The test of INTELLECT, a natural language query system written by Artifical Intelligence Corporation, is described. (Author/MLW)

  6. A SQL-Database Based Meta-CASE System and its Query Subsystem

    NASA Astrophysics Data System (ADS)

    Eessaar, Erki; Sgirka, Rünno

    Meta-CASE systems simplify the creation of CASE (Computer Aided System Engineering) systems. In this paper, we present a meta-CASE system that provides a web-based user interface and uses an object-relational database system (ORDBMS) as its basis. The use of ORDBMSs allows us to integrate different parts of the system and simplify the creation of meta-CASE and CASE systems. ORDBMSs provide powerful query mechanism. The proposed system allows developers to use queries to evaluate and gradually improve artifacts and calculate values of software measures. We illustrate the use of the systems by using SimpleM modeling language and discuss the use of SQL in the context of queries about artifacts. We have created a prototype of the meta-CASE system by using PostgreSQL™ ORDBMS and PHP scripting language.

  7. Genomes as geography: using GIS technology to build interactive genome feature maps

    PubMed Central

    Dolan, Mary E; Holden, Constance C; Beard, M Kate; Bult, Carol J

    2006-01-01

    Background Many commonly used genome browsers display sequence annotations and related attributes as horizontal data tracks that can be toggled on and off according to user preferences. Most genome browsers use only simple keyword searches and limit the display of detailed annotations to one chromosomal region of the genome at a time. We have employed concepts, methodologies, and tools that were developed for the display of geographic data to develop a Genome Spatial Information System (GenoSIS) for displaying genomes spatially, and interacting with genome annotations and related attribute data. In contrast to the paradigm of horizontally stacked data tracks used by most genome browsers, GenoSIS uses the concept of registered spatial layers composed of spatial objects for integrated display of diverse data. In addition to basic keyword searches, GenoSIS supports complex queries, including spatial queries, and dynamically generates genome maps. Our adaptation of the geographic information system (GIS) model in a genome context supports spatial representation of genome features at multiple scales with a versatile and expressive query capability beyond that supported by existing genome browsers. Results We implemented an interactive genome sequence feature map for the mouse genome in GenoSIS, an application that uses ArcGIS, a commercially available GIS software system. The genome features and their attributes are represented as spatial objects and data layers that can be toggled on and off according to user preferences or displayed selectively in response to user queries. GenoSIS supports the generation of custom genome maps in response to complex queries about genome features based on both their attributes and locations. Our example application of GenoSIS to the mouse genome demonstrates the powerful visualization and query capability of mature GIS technology applied in a novel domain. Conclusion Mapping tools developed specifically for geographic data can be exploited to display, explore and interact with genome data. The approach we describe here is organism independent and is equally useful for linear and circular chromosomes. One of the unique capabilities of GenoSIS compared to existing genome browsers is the capacity to generate genome feature maps dynamically in response to complex attribute and spatial queries. PMID:16984652

  8. The Limitations of Term Co-Occurrence Data for Query Expansion in Document Retrieval Systems.

    ERIC Educational Resources Information Center

    Peat, Helen J.; Willett, Peter

    1991-01-01

    Identifies limitations in the use of term co-occurrence data as a basis for automatic query expansion in natural language document retrieval systems. The use of similarity coefficients to calculate the degree of similarity between pairs of terms is explained, and frequency and discriminatory characteristics for nearest neighbors of query terms are…

  9. Enhancing SAMOS Data Access in DOMS via a Neo4j Property Graph Database.

    NASA Astrophysics Data System (ADS)

    Stallard, A. P.; Smith, S. R.; Elya, J. L.

    2016-12-01

    The Shipboard Automated Meteorological and Oceanographic System (SAMOS) initiative provides routine access to high-quality marine meteorological and near-surface oceanographic observations from research vessels. The Distributed Oceanographic Match-Up Service (DOMS) under development is a centralized service that allows researchers to easily match in situ and satellite oceanographic data from distributed sources to facilitate satellite calibration, validation, and retrieval algorithm development. The service currently uses Apache Solr as a backend search engine on each node in the distributed network. While Solr is a high-performance solution that facilitates creation and maintenance of indexed data, it is limited in the sense that its schema is fixed. The property graph model escapes this limitation by creating relationships between data objects. The authors will present the development of the SAMOS Neo4j property graph database including new search possibilities that take advantage of the property graph model, performance comparisons with Apache Solr, and a vision for graph databases as a storage tool for oceanographic data. The integration of the SAMOS Neo4j graph into DOMS will also be described. Currently, Neo4j contains spatial and temporal records from SAMOS which are modeled into a time tree and r-tree using Graph Aware and Spatial plugin tools for Neo4j. These extensions provide callable Java procedures within CYPHER (Neo4j's query language) that generate in-graph structures. Once generated, these structures can be queried using procedures from these libraries, or directly via CYPHER statements. Neo4j excels at performing relationship and path-based queries, which challenge relational-SQL databases because they require memory intensive joins due to the limitation of their design. Consider a user who wants to find records over several years, but only for specific months. If a traditional database only stores timestamps, this type of query would be complex and likely prohibitively slow. Using the time tree model, one can specify a path from the root to the data which restricts resolutions to certain timeframes (e.g., months). This query can be executed without joins, unions, or other compute-intensive operations, putting Neo4j at a computational advantage to the SQL database alternative.

  10. A data analysis expert system for large established distributed databases

    NASA Technical Reports Server (NTRS)

    Gnacek, Anne-Marie; An, Y. Kim; Ryan, J. Patrick

    1987-01-01

    A design for a natural language database interface system, called the Deductively Augmented NASA Management Decision support System (DANMDS), is presented. The DANMDS system components have been chosen on the basis of the following considerations: maximal employment of the existing NASA IBM-PC computers and supporting software; local structuring and storing of external data via the entity-relationship model; a natural easy-to-use error-free database query language; user ability to alter query language vocabulary and data analysis heuristic; and significant artificial intelligence data analysis heuristic techniques that allow the system to become progressively and automatically more useful.

  11. GELLO: an object-oriented query and expression language for clinical decision support.

    PubMed

    Sordo, Margarita; Ogunyemi, Omolola; Boxwala, Aziz A; Greenes, Robert A

    2003-01-01

    GELLO is a purpose-specific, object-oriented (OO) query and expression language. GELLO is the result of a concerted effort of the Decision Systems Group (DSG) working with the HL7 Clinical Decision Support Technical Committee (CDSTC) to provide the HL7 community with a common format for data encoding and manipulation. GELLO will soon be submitted for ballot to the HL7 CDSTC for consideration as a standard.

  12. Machine Translation-Supported Cross-Language Information Retrieval for a Consumer Health Resource

    PubMed Central

    Rosemblat, Graciela; Gemoets, Darren; Browne, Allen C.; Tse, Tony

    2003-01-01

    The U.S. National Institutes of Health, through its National Library of Medicine, developed ClinicalTrials.gov to provide the public with easy access to information on clinical trials on a wide range of conditions or diseases. Only English language information retrieval is currently supported. Given the growing number of Spanish speakers in the U.S. and their increasing use of the Web, we anticipate a significant increase in Spanish-speaking users. This study compares the effectiveness of two common cross-language information retrieval methods using machine translation, query translation versus document translation, using a subset of genuine user queries from ClinicalTrials.gov. Preliminary results conducted with the ClinicalTrials.gov search engine show that in our environment, query translation is statistically significantly better than document translation. We discuss possible reasons for this result and we conclude with suggestions for future work. PMID:14728236

  13. Extended Maptree: a Representation of Fine-Grained Topology and Spatial Hierarchy of Bim

    NASA Astrophysics Data System (ADS)

    Wu, Y.; Shang, J.; Hu, X.; Zhou, Z.

    2017-09-01

    Spatial queries play significant roles in exchanging Building Information Modeling (BIM) data and integrating BIM with indoor spatial information. However, topological operators implemented for BIM spatial queries are limited to qualitative relations (e.g. touching, intersecting). To overcome this limitation, we propose an extended maptree model to represent the fine-grained topology and spatial hierarchy of indoor spaces. The model is based on a maptree which consists of combinatorial maps and an adjacency tree. Topological relations (e.g., adjacency, incidence, and covering) derived from BIM are represented explicitly and formally by extended maptrees, which can facilitate the spatial queries of BIM. To construct an extended maptree, we first use a solid model represented by vertical extrusion and boundary representation to generate the isolated 3-cells of combinatorial maps. Then, the spatial relationships defined in IFC are used to sew them together. Furthermore, the incremental edges of extended maptrees are labeled as removed 2-cells. Based on this, we can merge adjacent 3-cells according to the spatial hierarchy of IFC.

  14. Comparing the performance of two CBIRS indexing schemes

    NASA Astrophysics Data System (ADS)

    Mueller, Wolfgang; Robbert, Guenter; Henrich, Andreas

    2003-01-01

    Content based image retrieval (CBIR) as it is known today has to deal with a number of challenges. Quickly summarized, the main challenges are firstly, to bridge the semantic gap between high-level concepts and low-level features using feedback, secondly to provide performance under adverse conditions. High-dimensional spaces, as well as a demanding machine learning task make the right way of indexing an important issue. When indexing multimedia data, most groups opt for extraction of high-dimensional feature vectors from the data, followed by dimensionality reduction like PCA (Principal Components Analysis) or LSI (Latent Semantic Indexing). The resulting vectors are indexed using spatial indexing structures such as kd-trees or R-trees, for example. Other projects, such as MARS and Viper propose the adaptation of text indexing techniques, notably the inverted file. Here, the Viper system is the most direct adaptation of text retrieval techniques to quantized vectors. However, while the Viper query engine provides decent performance together with impressive user-feedback behavior, as well as the possibility for easy integration of long-term learning algorithms, and support for potentially infinite feature vectors, there has been no comparison of vector-based methods and inverted-file-based methods under similar conditions. In this publication, we compare a CBIR query engine that uses inverted files (Bothrops, a rewrite of the Viper query engine based on a relational database), and a CBIR query engine based on LSD (Local Split Decision) trees for spatial indexing using the same feature sets. The Benchathlon initiative works on providing a set of images and ground truth for simulating image queries by example and corresponding user feedback. When performing the Benchathlon benchmark on a CBIR system (the System Under Test, SUT), a benchmarking harness connects over internet to the SUT, performing a number of queries using an agreed-upon protocol, the multimedia retrieval markup language (MRML). Using this benchmark one can measure the quality of retrieval, as well as the overall (speed) performance of the benchmarked system. Our Benchmarks will draw on the Benchathlon"s work for documenting the retrieval performance of both inverted file-based and LSD tree based techniques. However in addition to these results, we will present statistics, that can be obtained only inside the system under test. These statistics will include the number of complex mathematical operations, as well as the amount of data that has to be read from disk during operation of a query.

  15. Cyclone: java-based querying and computing with Pathway/Genome databases.

    PubMed

    Le Fèvre, François; Smidtas, Serge; Schächter, Vincent

    2007-05-15

    Cyclone aims at facilitating the use of BioCyc, a collection of Pathway/Genome Databases (PGDBs). Cyclone provides a fully extensible Java Object API to analyze and visualize these data. Cyclone can read and write PGDBs, and can write its own data in the CycloneML format. This format is automatically generated from the BioCyc ontology by Cyclone itself, ensuring continued compatibility. Cyclone objects can also be stored in a relational database CycloneDB. Queries can be written in SQL, and in an intuitive and concise object-oriented query language, Hibernate Query Language (HQL). In addition, Cyclone interfaces easily with Java software including the Eclipse IDE for HQL edition, the Jung API for graph algorithms or Cytoscape for graph visualization. Cyclone is freely available under an open source license at: http://sourceforge.net/projects/nemo-cyclone. For download and installation instructions, tutorials, use cases and examples, see http://nemo-cyclone.sourceforge.net.

  16. Learning for Semantic Parsing and Natural Language Generation Using Statistical Machine Translation Techniques

    DTIC Science & Technology

    2007-08-01

    In this domain, queries typically show a deeply nested structure, which makes the semantic parsing task rather challenging , e.g.: What states border...only 80% of the GEOQUERY queries are semantically tractable, which shows that GEOQUERY is indeed a more challenging domain than ATIS. Note that none...a particularly challenging task, because of the inherent ambiguity of natural languages on both sides. It has inspired a large body of research. In

  17. TEQUEL: The query language of SADDLE

    NASA Technical Reports Server (NTRS)

    Rajan, S. D.

    1984-01-01

    A relational database management system is presented that is tailored for engineering applications. A wide variety of engineering data types are supported and the data definition language (DDL) and data manipulation language (DML) are extended to handle matrices. The system can be used either in the standalone mode or through a FORTRAN or PASCAL application program. The query language is of the relational calculus type and allows the user to store, retrieve, update and delete tuples from relations. The relational operations including union, intersect and differ facilitate creation of temporary relations that can be used for manipulating information in a powerful manner. Sample applications are shown to illustrate the creation of data through a FORTRAN program and data manipulation using the TEQUEL DML.

  18. An Analysis of Application Generators.

    DTIC Science & Technology

    1983-03-01

    query language OUEL in the programming language C, THESEUS [20], which embeds relational operators in the language Euclid. Schmidt [21] reports some...34The Design and Implementation of INGRES," ACM-TODS, Vol. 1. No. 3, 1976,. 33 £ 20. Shopiro,J.E., " THESEUS -A Programming Language for Relational

  19. On the Semantics of SPARQL

    NASA Astrophysics Data System (ADS)

    Arenas, Marcelo; Gutierrez, Claudio; Pérez, Jorge

    The Resource Description Framework (RDF) is the standard data model for representing information about World Wide Web resources. In January 2008, it was released the recommendation of the W3C for querying RDF data, a query language called SPARQL. In this chapter, we give a detailed description of the semantics of this language. We start by focusing on the definition of a formal semantics for the core part of SPARQL, and then move to the definition for the entire language, including all the features in the specification of SPARQL by the W3C such as blank nodes in graph patterns and bag semantics for solutions.

  20. A web-based data-querying tool based on ontology-driven methodology and flowchart-based model.

    PubMed

    Ping, Xiao-Ou; Chung, Yufang; Tseng, Yi-Ju; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei

    2013-10-08

    Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, "degree of liver damage," "degree of liver damage when applying a mutually exclusive setting," and "treatments for liver cancer") was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks.

  1. Generic functional requirements for a NASA general-purpose data base management system

    NASA Technical Reports Server (NTRS)

    Lohman, G. M.

    1981-01-01

    Generic functional requirements for a general-purpose, multi-mission data base management system (DBMS) for application to remotely sensed scientific data bases are detailed. The motivation for utilizing DBMS technology in this environment is explained. The major requirements include: (1) a DBMS for scientific observational data; (2) a multi-mission capability; (3) user-friendly; (4) extensive and integrated information about data; (5) robust languages for defining data structures and formats; (6) scientific data types and structures; (7) flexible physical access mechanisms; (8) ways of representing spatial relationships; (9) a high level nonprocedural interactive query and data manipulation language; (10) data base maintenance utilities; (11) high rate input/output and large data volume storage; and adaptability to a distributed data base and/or data base machine configuration. Detailed functions are specified in a top-down hierarchic fashion. Implementation, performance, and support requirements are also given.

  2. Active Wiki Knowledge Repository

    DTIC Science & Technology

    2012-10-01

    data using SPARQL queries or RESTful web-services; ‘gardening’ tools for examining the semantically tagged content in the wiki; high-level language tool...Tagging & RDF triple-store Fusion and inferences for collaboration Tools for Consuming Data SPARQL queries or RESTful WS Inference & Gardening tools...other stores using AW SPARQL queries and rendering templates; and 4) Interactively share maps and other content using annotation tools to post notes

  3. A novel informatics concept for high-throughput shotgun lipidomics based on the molecular fragmentation query language

    PubMed Central

    2011-01-01

    Shotgun lipidome profiling relies on direct mass spectrometric analysis of total lipid extracts from cells, tissues or organisms and is a powerful tool to elucidate the molecular composition of lipidomes. We present a novel informatics concept of the molecular fragmentation query language implemented within the LipidXplorer open source software kit that supports accurate quantification of individual species of any ionizable lipid class in shotgun spectra acquired on any mass spectrometry platform. PMID:21247462

  4. Spatial Query for Planetary Data

    NASA Technical Reports Server (NTRS)

    Shams, Khawaja S.; Crockett, Thomas M.; Powell, Mark W.; Joswig, Joseph C.; Fox, Jason M.

    2011-01-01

    Science investigators need to quickly and effectively assess past observations of specific locations on a planetary surface. This innovation involves a location-based search technology that was adapted and applied to planetary science data to support a spatial query capability for mission operations software. High-performance location-based searching requires the use of spatial data structures for database organization. Spatial data structures are designed to organize datasets based on their coordinates in a way that is optimized for location-based retrieval. The particular spatial data structure that was adapted for planetary data search is the R+ tree.

  5. Cross-Language Information Retrieval: An Analysis of Errors.

    ERIC Educational Resources Information Center

    Ruiz, Miguel E.; Srinivasan, Padmini

    1998-01-01

    Investigates an automatic method for Cross Language Information Retrieval (CLIR) that utilizes the multilingual Unified Medical Language System (UMLS) Metathesaurus to translate Spanish natural-language queries into English. Results indicate that for Spanish, the UMLS Metathesaurus-based CLIR method is at least equivalent to if not better than…

  6. elevatr: Access Elevation Data from Various APIs | Science ...

    EPA Pesticide Factsheets

    Several web services are available that provide access to elevation data. This package provides access to several of those services and returns elevation data either as a SpatialPointsDataFrame from point elevation services or as a raster object from raster elevation services. Currently, the package supports access to the Mapzen Elevation Service, Mapzen Terrain Service, and the USGS Elevation Point Query Service. The R language for statistical computing is increasingly used for spatial data analysis . This R package, elevatr, is in response to this and provides access to elevation data from various sources directly in R. The impact of `elevatr` is that it will 1) facilitate spatial analysis in R by providing access to foundational dataset for many types of analyses (e.g. hydrology, limnology) 2) open up a new set of users and uses for APIs widely used outside of R, and 3) provide an excellent example federal open source development as promoted by the Federal Source Code Policy (https://sourcecode.cio.gov/).

  7. Visually defining and querying consistent multi-granular clinical temporal abstractions.

    PubMed

    Combi, Carlo; Oliboni, Barbara

    2012-02-01

    The main goal of this work is to propose a framework for the visual specification and query of consistent multi-granular clinical temporal abstractions. We focus on the issue of querying patient clinical information by visually defining and composing temporal abstractions, i.e., high level patterns derived from several time-stamped raw data. In particular, we focus on the visual specification of consistent temporal abstractions with different granularities and on the visual composition of different temporal abstractions for querying clinical databases. Temporal abstractions on clinical data provide a concise and high-level description of temporal raw data, and a suitable way to support decision making. Granularities define partitions on the time line and allow one to represent time and, thus, temporal clinical information at different levels of detail, according to the requirements coming from the represented clinical domain. The visual representation of temporal information has been considered since several years in clinical domains. Proposed visualization techniques must be easy and quick to understand, and could benefit from visual metaphors that do not lead to ambiguous interpretations. Recently, physical metaphors such as strips, springs, weights, and wires have been proposed and evaluated on clinical users for the specification of temporal clinical abstractions. Visual approaches to boolean queries have been considered in the last years and confirmed that the visual support to the specification of complex boolean queries is both an important and difficult research topic. We propose and describe a visual language for the definition of temporal abstractions based on a set of intuitive metaphors (striped wall, plastered wall, brick wall), allowing the clinician to use different granularities. A new algorithm, underlying the visual language, allows the physician to specify only consistent abstractions, i.e., abstractions not containing contradictory conditions on the component abstractions. Moreover, we propose a visual query language where different temporal abstractions can be composed to build complex queries: temporal abstractions are visually connected through the usual logical connectives AND, OR, and NOT. The proposed visual language allows one to simply define temporal abstractions by using intuitive metaphors, and to specify temporal intervals related to abstractions by using different temporal granularities. The physician can interact with the designed and implemented tool by point-and-click selections, and can visually compose queries involving several temporal abstractions. The evaluation of the proposed granularity-related metaphors consisted in two parts: (i) solving 30 interpretation exercises by choosing the correct interpretation of a given screenshot representing a possible scenario, and (ii) solving a complex exercise, by visually specifying through the interface a scenario described only in natural language. The exercises were done by 13 subjects. The percentage of correct answers to the interpretation exercises were slightly different with respect to the considered metaphors (54.4--striped wall, 73.3--plastered wall, 61--brick wall, and 61--no wall), but post hoc statistical analysis on means confirmed that differences were not statistically significant. The result of the user's satisfaction questionnaire related to the evaluation of the proposed granularity-related metaphors ratified that there are no preferences for one of them. The evaluation of the proposed logical notation consisted in two parts: (i) solving five interpretation exercises provided by a screenshot representing a possible scenario and by three different possible interpretations, of which only one was correct, and (ii) solving five exercises, by visually defining through the interface a scenario described only in natural language. Exercises had an increasing difficulty. The evaluation involved a total of 31 subjects. Results related to this evaluation phase confirmed us about the soundness of the proposed solution even in comparison with a well known proposal based on a tabular query form (the only significant difference is that our proposal requires more time for the training phase: 21 min versus 14 min). In this work we have considered the issue of visually composing and querying temporal clinical patient data. In this context we have proposed a visual framework for the specification of consistent temporal abstractions with different granularities and for the visual composition of different temporal abstractions to build (possibly) complex queries on clinical databases. A new algorithm has been proposed to check the consistency of the specified granular abstraction. From the evaluation of the proposed metaphors and interfaces and from the comparison of the visual query language with a well known visual method for boolean queries, the soundness of the overall system has been confirmed; moreover, pros and cons and possible improvements emerged from the comparison of different visual metaphors and solutions. Copyright © 2011 Elsevier B.V. All rights reserved.

  8. Automatic Query Formulations in Information Retrieval.

    ERIC Educational Resources Information Center

    Salton, G.; And Others

    1983-01-01

    Introduces methods designed to reduce role of search intermediaries by generating Boolean search formulations automatically using term frequency considerations from natural language statements provided by system patrons. Experimental results are supplied and methods are described for applying automatic query formulation process in practice.…

  9. On-Demand Associative Cross-Language Information Retrieval

    NASA Astrophysics Data System (ADS)

    Geraldo, André Pinto; Moreira, Viviane P.; Gonçalves, Marcos A.

    This paper proposes the use of algorithms for mining association rules as an approach for Cross-Language Information Retrieval. These algorithms have been widely used to analyse market basket data. The idea is to map the problem of finding associations between sales items to the problem of finding term translations over a parallel corpus. The proposal was validated by means of experiments using queries in two distinct languages: Portuguese and Finnish to retrieve documents in English. The results show that the performance of our proposed approach is comparable to the performance of the monolingual baseline and to query translation via machine translation, even though these systems employ more complex Natural Language Processing techniques. The combination between machine translation and our approach yielded the best results, even outperforming the monolingual baseline.

  10. START: a system for flexible analysis of hundreds of genomic signal tracks in few lines of SQL-like queries.

    PubMed

    Zhu, Xinjie; Zhang, Qiang; Ho, Eric Dun; Yu, Ken Hung-On; Liu, Chris; Huang, Tim H; Cheng, Alfred Sze-Lok; Kao, Ben; Lo, Eric; Yip, Kevin Y

    2017-09-22

    A genomic signal track is a set of genomic intervals associated with values of various types, such as measurements from high-throughput experiments. Analysis of signal tracks requires complex computational methods, which often make the analysts focus too much on the detailed computational steps rather than on their biological questions. Here we propose Signal Track Query Language (STQL) for simple analysis of signal tracks. It is a Structured Query Language (SQL)-like declarative language, which means one only specifies what computations need to be done but not how these computations are to be carried out. STQL provides a rich set of constructs for manipulating genomic intervals and their values. To run STQL queries, we have developed the Signal Track Analytical Research Tool (START, http://yiplab.cse.cuhk.edu.hk/start/ ), a system that includes a Web-based user interface and a back-end execution system. The user interface helps users select data from our database of around 10,000 commonly-used public signal tracks, manage their own tracks, and construct, store and share STQL queries. The back-end system automatically translates STQL queries into optimized low-level programs and runs them on a computer cluster in parallel. We use STQL to perform 14 representative analytical tasks. By repeating these analyses using bedtools, Galaxy and custom Python scripts, we show that the STQL solution is usually the simplest, and the parallel execution achieves significant speed-up with large data files. Finally, we describe how a biologist with minimal formal training in computer programming self-learned STQL to analyze DNA methylation data we produced from 60 pairs of hepatocellular carcinoma (HCC) samples. Overall, STQL and START provide a generic way for analyzing a large number of genomic signal tracks in parallel easily.

  11. VISAGE: Interactive Visual Graph Querying.

    PubMed

    Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

    2016-06-01

    Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete , an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with "wildcard" nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE's ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries.

  12. VISAGE: Interactive Visual Graph Querying

    PubMed Central

    Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

    2017-01-01

    Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with “wildcard” nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE’s ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries. PMID:28553670

  13. Semantic based man-machine interface for real-time communication

    NASA Technical Reports Server (NTRS)

    Ali, M.; Ai, C.-S.

    1988-01-01

    A flight expert system (FLES) was developed to assist pilots in monitoring, diagnosing and recovering from in-flight faults. To provide a communications interface between the flight crew and FLES, a natural language interface (NALI) was implemented. Input to NALI is processed by three processors: (1) the semantics parser; (2) the knowledge retriever; and (3) the response generator. First the semantic parser extracts meaningful words and phrases to generate an internal representation of the query. At this point, the semantic parser has the ability to map different input forms related to the same concept into the same internal representation. Then the knowledge retriever analyzes and stores the context of the query to aid in resolving ellipses and pronoun references. At the end of this process, a sequence of retrievel functions is created as a first step in generating the proper response. Finally, the response generator generates the natural language response to the query. The architecture of NALI was designed to process both temporal and nontemporal queries. The architecture and implementation of NALI are described.

  14. Time-related patient data retrieval for the case studies from the pharmacogenomics research network

    PubMed Central

    Zhu, Qian; Tao, Cui; Ding, Ying; Chute, Christopher G.

    2012-01-01

    There are lots of question-based data elements from the pharmacogenomics research network (PGRN) studies. Many data elements contain temporal information. To semantically represent these elements so that they can be machine processiable is a challenging problem for the following reasons: (1) the designers of these studies usually do not have the knowledge of any computer modeling and query languages, so that the original data elements usually are represented in spreadsheets in human languages; and (2) the time aspects in these data elements can be too complex to be represented faithfully in a machine-understandable way. In this paper, we introduce our efforts on representing these data elements using semantic web technologies. We have developed an ontology, CNTRO, for representing clinical events and their temporal relations in the web ontology language (OWL). Here we use CNTRO to represent the time aspects in the data elements. We have evaluated 720 time-related data elements from PGRN studies. We adapted and extended the knowledge representation requirements for EliXR-TIME to categorize our data elements. A CNTRO-based SPARQL query builder has been developed to customize users’ own SPARQL queries for each knowledge representation requirement. The SPARQL query builder has been evaluated with a simulated EHR triple store to ensure its functionalities. PMID:23076712

  15. Time-related patient data retrieval for the case studies from the pharmacogenomics research network.

    PubMed

    Zhu, Qian; Tao, Cui; Ding, Ying; Chute, Christopher G

    2012-11-01

    There are lots of question-based data elements from the pharmacogenomics research network (PGRN) studies. Many data elements contain temporal information. To semantically represent these elements so that they can be machine processiable is a challenging problem for the following reasons: (1) the designers of these studies usually do not have the knowledge of any computer modeling and query languages, so that the original data elements usually are represented in spreadsheets in human languages; and (2) the time aspects in these data elements can be too complex to be represented faithfully in a machine-understandable way. In this paper, we introduce our efforts on representing these data elements using semantic web technologies. We have developed an ontology, CNTRO, for representing clinical events and their temporal relations in the web ontology language (OWL). Here we use CNTRO to represent the time aspects in the data elements. We have evaluated 720 time-related data elements from PGRN studies. We adapted and extended the knowledge representation requirements for EliXR-TIME to categorize our data elements. A CNTRO-based SPARQL query builder has been developed to customize users' own SPARQL queries for each knowledge representation requirement. The SPARQL query builder has been evaluated with a simulated EHR triple store to ensure its functionalities.

  16. SPARQLGraph: a web-based platform for graphically querying biological Semantic Web databases.

    PubMed

    Schweiger, Dominik; Trajanoski, Zlatko; Pabinger, Stephan

    2014-08-15

    Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way. SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the just recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers. This new graphical way of creating queries for biological Semantic Web databases considerably facilitates usability as it removes the requirement of knowing specific query languages and database structures. The system is freely available at http://sparqlgraph.i-med.ac.at.

  17. Computing health quality measures using Informatics for Integrating Biology and the Bedside.

    PubMed

    Klann, Jeffrey G; Murphy, Shawn N

    2013-04-19

    The Health Quality Measures Format (HQMF) is a Health Level 7 (HL7) standard for expressing computable Clinical Quality Measures (CQMs). Creating tools to process HQMF queries in clinical databases will become increasingly important as the United States moves forward with its Health Information Technology Strategic Plan to Stages 2 and 3 of the Meaningful Use incentive program (MU2 and MU3). Informatics for Integrating Biology and the Bedside (i2b2) is one of the analytical databases used as part of the Office of the National Coordinator (ONC)'s Query Health platform to move toward this goal. Our goal is to integrate i2b2 with the Query Health HQMF architecture, to prepare for other HQMF use-cases (such as MU2 and MU3), and to articulate the functional overlap between i2b2 and HQMF. Therefore, we analyze the structure of HQMF, and then we apply this understanding to HQMF computation on the i2b2 clinical analytical database platform. Specifically, we develop a translator between two query languages, HQMF and i2b2, so that the i2b2 platform can compute HQMF queries. We use the HQMF structure of queries for aggregate reporting, which define clinical data elements and the temporal and logical relationships between them. We use the i2b2 XML format, which allows flexible querying of a complex clinical data repository in an easy-to-understand domain-specific language. The translator can represent nearly any i2b2-XML query as HQMF and execute in i2b2 nearly any HQMF query expressible in i2b2-XML. This translator is part of the freely available reference implementation of the QueryHealth initiative. We analyze limitations of the conversion and find it covers many, but not all, of the complex temporal and logical operators required by quality measures. HQMF is an expressive language for defining quality measures, and it will be important to understand and implement for CQM computation, in both meaningful use and population health. However, its current form might allow complexity that is intractable for current database systems (both in terms of implementation and computation). Our translator, which supports the subset of HQMF currently expressible in i2b2-XML, may represent the beginnings of a practical compromise. It is being pilot-tested in two Query Health demonstration projects, and it can be further expanded to balance computational tractability with the advanced features needed by measure developers.

  18. Computing Health Quality Measures Using Informatics for Integrating Biology and the Bedside

    PubMed Central

    Murphy, Shawn N

    2013-01-01

    Background The Health Quality Measures Format (HQMF) is a Health Level 7 (HL7) standard for expressing computable Clinical Quality Measures (CQMs). Creating tools to process HQMF queries in clinical databases will become increasingly important as the United States moves forward with its Health Information Technology Strategic Plan to Stages 2 and 3 of the Meaningful Use incentive program (MU2 and MU3). Informatics for Integrating Biology and the Bedside (i2b2) is one of the analytical databases used as part of the Office of the National Coordinator (ONC)’s Query Health platform to move toward this goal. Objective Our goal is to integrate i2b2 with the Query Health HQMF architecture, to prepare for other HQMF use-cases (such as MU2 and MU3), and to articulate the functional overlap between i2b2 and HQMF. Therefore, we analyze the structure of HQMF, and then we apply this understanding to HQMF computation on the i2b2 clinical analytical database platform. Specifically, we develop a translator between two query languages, HQMF and i2b2, so that the i2b2 platform can compute HQMF queries. Methods We use the HQMF structure of queries for aggregate reporting, which define clinical data elements and the temporal and logical relationships between them. We use the i2b2 XML format, which allows flexible querying of a complex clinical data repository in an easy-to-understand domain-specific language. Results The translator can represent nearly any i2b2-XML query as HQMF and execute in i2b2 nearly any HQMF query expressible in i2b2-XML. This translator is part of the freely available reference implementation of the QueryHealth initiative. We analyze limitations of the conversion and find it covers many, but not all, of the complex temporal and logical operators required by quality measures. Conclusions HQMF is an expressive language for defining quality measures, and it will be important to understand and implement for CQM computation, in both meaningful use and population health. However, its current form might allow complexity that is intractable for current database systems (both in terms of implementation and computation). Our translator, which supports the subset of HQMF currently expressible in i2b2-XML, may represent the beginnings of a practical compromise. It is being pilot-tested in two Query Health demonstration projects, and it can be further expanded to balance computational tractability with the advanced features needed by measure developers. PMID:23603227

  19. A Web-Based Data-Querying Tool Based on Ontology-Driven Methodology and Flowchart-Based Model

    PubMed Central

    Ping, Xiao-Ou; Chung, Yufang; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei

    2013-01-01

    Background Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. Objective The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. Methods The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. Results In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, “degree of liver damage,” “degree of liver damage when applying a mutually exclusive setting,” and “treatments for liver cancer”) was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. Conclusions The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks. PMID:25600078

  20. Computer systems and methods for the query and visualization of multidimensional databases

    DOEpatents

    Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick

    2006-08-08

    A method and system for producing graphics. A hierarchical structure of a database is determined. A visual table, comprising a plurality of panes, is constructed by providing a specification that is in a language based on the hierarchical structure of the database. In some cases, this language can include fields that are in the database schema. The database is queried to retrieve a set of tuples in accordance with the specification. A subset of the set of tuples is associated with a pane in the plurality of panes.

  1. Computer systems and methods for the query and visualization of multidimensional database

    DOEpatents

    Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick

    2010-05-11

    A method and system for producing graphics. A hierarchical structure of a database is determined. A visual table, comprising a plurality of panes, is constructed by providing a specification that is in a language based on the hierarchical structure of the database. In some cases, this language can include fields that are in the database schema. The database is queried to retrieve a set of tuples in accordance with the specification. A subset of the set of tuples is associated with a pane in the plurality of panes.

  2. A database system to support image algorithm evaluation

    NASA Technical Reports Server (NTRS)

    Lien, Y. E.

    1977-01-01

    The design is given of an interactive image database system IMDB, which allows the user to create, retrieve, store, display, and manipulate images through the facility of a high-level, interactive image query (IQ) language. The query language IQ permits the user to define false color functions, pixel value transformations, overlay functions, zoom functions, and windows. The user manipulates the images through generic functions. The user can direct images to display devices for visual and qualitative analysis. Image histograms and pixel value distributions can also be computed to obtain a quantitative analysis of images.

  3. CytoscapeRPC: a plugin to create, modify and query Cytoscape networks from scripting languages.

    PubMed

    Bot, Jan J; Reinders, Marcel J T

    2011-09-01

    CytoscapeRPC is a plugin for Cytoscape which allows users to create, query and modify Cytoscape networks from any programming language which supports XML-RPC. This enables them to access Cytoscape functionality and visualize their data interactively without leaving the programming environment with which they are familiar. Install through the Cytoscape plugin manager or visit the web page: http://wiki.nbic.nl/index.php/CytoscapeRPC for the user tutorial and download. j.j.bot@tudelft.nl; j.j.bot@tudelft.nl.

  4. The Use of Dynamic Segment Scoring for Language-Independent Question Answering

    DTIC Science & Technology

    2001-01-01

    initial window with one sentence is compared to scores corre- his/PRONOUN brother/ CONSANGUINITY like/SIMILARITY his/PRONOUN call/NOMENCLATURE he/PRONOUN...the query processing mod- ule. Using the differences between index numbers to specify phys- ical distance relationships among query keywords, we can

  5. A Simple Blueprint for Automatic Boolean Query Processing.

    ERIC Educational Resources Information Center

    Salton, G.

    1988-01-01

    Describes a new Boolean retrieval environment in which an extended soft Boolean logic is used to automatically construct queries from original natural language formulations provided by users. Experimental results that compare the retrieval effectiveness of this method to conventional Boolean and vector processing are discussed. (27 references)…

  6. Experiments in Multi-Lingual Information Retrieval.

    ERIC Educational Resources Information Center

    Salton, Gerard

    A comparison was made of the performance in an automatic information retrieval environment of user queries and document abstracts available in natural language form in both English and French. The results obtained indicate that the automatic indexing and retrieval techniques actually used appear equally effective in handling the query and document…

  7. Implementation of relational data base management systems on micro-computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huang, C.L.

    1982-01-01

    This dissertation describes an implementation of a Relational Data Base Management System on a microcomputer. A specific floppy disk based hardward called TERAK is being used, and high level query interface which is similar to a subset of the SEQUEL language is provided. The system contains sub-systems such as I/O, file management, virtual memory management, query system, B-tree management, scanner, command interpreter, expression compiler, garbage collection, linked list manipulation, disk space management, etc. The software has been implemented to fulfill the following goals: (1) it is highly modularized. (2) The system is physically segmented into 16 logically independent, overlayable segments,more » in a way such that a minimal amount of memory is needed at execution time. (3) Virtual memory system is simulated that provides the system with seemingly unlimited memory space. (4) A language translator is applied to recognize user requests in the query language. The code generation of this translator generates compact code for the execution of UPDATE, DELETE, and QUERY commands. (5) A complete set of basic functions needed for on-line data base manipulations is provided through the use of a friendly query interface. (6) To eliminate the dependency on the environment (both software and hardware) as much as possible, so that it would be easy to transplant the system to other computers. (7) To simulate each relation as a sequential file. It is intended to be a highly efficient, single user system suited to be used by small or medium sized organizations for, say, administrative purposes. Experiments show that quite satisfying results have indeed been achieved.« less

  8. KARL: A Knowledge-Assisted Retrieval Language. Presentation visuals. M.S. Thesis Final Report, 1 Jul. 1985 - 31 Dec. 1987

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Triantafyllopoulos, Spiros

    1985-01-01

    A collection of presentation visuals associated with the companion report entitled KARL: A Knowledge-Assisted Retrieval Language, is presented. Information is given on data retrieval, natural language database front ends, generic design objectives, processing capababilities and the query processing cycle.

  9. Getting Answers to Natural Language Questions on the Web.

    ERIC Educational Resources Information Center

    Radev, Dragomir R.; Libner, Kelsey; Fan, Weiguo

    2002-01-01

    Describes a study that investigated the use of natural language questions on Web search engines. Highlights include query languages; differences in search engine syntax; and results of logistic regression and analysis of variance that showed aspects of questions that predicted significantly different performances, including the number of words,…

  10. Army technology development. IBIS query. Software to support the Image Based Information System (IBIS) expansion for mapping, charting and geodesy

    NASA Technical Reports Server (NTRS)

    Friedman, S. Z.; Walker, R. E.; Aitken, R. B.

    1986-01-01

    The Image Based Information System (IBIS) has been under development at the Jet Propulsion Laboratory (JPL) since 1975. It is a collection of more than 90 programs that enable processing of image, graphical, tabular data for spatial analysis. IBIS can be utilized to create comprehensive geographic data bases. From these data, an analyst can study various attributes describing characteristics of a given study area. Even complex combinations of disparate data types can be synthesized to obtain a new perspective on spatial phenomena. In 1984, new query software was developed enabling direct Boolean queries of IBIS data bases through the submission of easily understood expressions. An improved syntax methodology, a data dictionary, and display software simplified the analysts' tasks associated with building, executing, and subsequently displaying the results of a query. The primary purpose of this report is to describe the features and capabilities of the new query software. A secondary purpose of this report is to compare this new query software to the query software developed previously (Friedman, 1982). With respect to this topic, the relative merits and drawbacks of both approaches are covered.

  11. Query-Based Outlier Detection in Heterogeneous Information Networks.

    PubMed

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-03-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user's search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks.

  12. Query-Based Outlier Detection in Heterogeneous Information Networks

    PubMed Central

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-01-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user’s search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks. PMID:27064397

  13. An ontology-driven tool for structured data acquisition using Web forms.

    PubMed

    Gonçalves, Rafael S; Tu, Samson W; Nyulas, Csongor I; Tierney, Michael J; Musen, Mark A

    2017-08-01

    Structured data acquisition is a common task that is widely performed in biomedicine. However, current solutions for this task are far from providing a means to structure data in such a way that it can be automatically employed in decision making (e.g., in our example application domain of clinical functional assessment, for determining eligibility for disability benefits) based on conclusions derived from acquired data (e.g., assessment of impaired motor function). To use data in these settings, we need it structured in a way that can be exploited by automated reasoning systems, for instance, in the Web Ontology Language (OWL); the de facto ontology language for the Web. We tackle the problem of generating Web-based assessment forms from OWL ontologies, and aggregating input gathered through these forms as an ontology of "semantically-enriched" form data that can be queried using an RDF query language, such as SPARQL. We developed an ontology-based structured data acquisition system, which we present through its specific application to the clinical functional assessment domain. We found that data gathered through our system is highly amenable to automatic analysis using queries. We demonstrated how ontologies can be used to help structuring Web-based forms and to semantically enrich the data elements of the acquired structured data. The ontologies associated with the enriched data elements enable automated inferences and provide a rich vocabulary for performing queries.

  14. SPARQLog: SPARQL with Rules and Quantification

    NASA Astrophysics Data System (ADS)

    Bry, François; Furche, Tim; Marnette, Bruno; Ley, Clemens; Linse, Benedikt; Poppe, Olga

    SPARQL has become the gold-standard for RDF query languages. Nevertheless, we believe there is further room for improving RDF query languages. In this chapter, we investigate the addition of rules and quantifier alternation to SPARQL. That extension, called SPARQLog, extends previous RDF query languages by arbitrary quantifier alternation: blank nodes may occur in the scope of all, some, or none of the universal variables of a rule. In addition, SPARQLog is aware of important RDF features such as the distinction between blank nodes, literals and IRIs or the RDFS vocabulary. The semantics of SPARQLog is closed (every answer is an RDF graph), but lifts RDF's restrictions on literal and blank node occurrences for intermediary data. We show how to define a sound and complete operational semantics that can be implemented using existing logic programming techniques. While SPARQLog is Turing complete, we identify a decidable (in fact, polynomial time) fragment SwARQLog ensuring polynomial data-complexity inspired from the notion of super-weak acyclicity in data exchange. Furthermore, we prove that SPARQLog with no universal quantifiers in the scope of existential ones (∀ ∃ fragment) is equivalent to full SPARQLog in presence of graph projection. Thus, the convenience of arbitrary quantifier alternation comes, in fact, for free. These results, though here presented in the context of RDF querying, apply similarly also in the more general setting of data exchange.

  15. Database Reports Over the Internet

    NASA Technical Reports Server (NTRS)

    Smith, Dean Lance

    2002-01-01

    Most of the summer was spent developing software that would permit existing test report forms to be printed over the web on a printer that is supported by Adobe Acrobat Reader. The data is stored in a DBMS (Data Base Management System). The client asks for the information from the database using an HTML (Hyper Text Markup Language) form in a web browser. JavaScript is used with the forms to assist the user and verify the integrity of the entered data. Queries to a database are made in SQL (Sequential Query Language), a widely supported standard for making queries to databases. Java servlets, programs written in the Java programming language running under the control of network server software, interrogate the database and complete a PDF form template kept in a file. The completed report is sent to the browser requesting the report. Some errors are sent to the browser in an HTML web page, others are reported to the server. Access to the databases was restricted since the data are being transported to new DBMS software that will run on new hardware. However, the SQL queries were made to Microsoft Access, a DBMS that is available on most PCs (Personal Computers). Access does support the SQL commands that were used, and a database was created with Access that contained typical data for the report forms. Some of the problems and features are discussed below.

  16. Querying XML Data with SPARQL

    NASA Astrophysics Data System (ADS)

    Bikakis, Nikos; Gioldasis, Nektarios; Tsinaraki, Chrisa; Christodoulakis, Stavros

    SPARQL is today the standard access language for Semantic Web data. In the recent years XML databases have also acquired industrial importance due to the widespread applicability of XML in the Web. In this paper we present a framework that bridges the heterogeneity gap and creates an interoperable environment where SPARQL queries are used to access XML databases. Our approach assumes that fairly generic mappings between ontology constructs and XML Schema constructs have been automatically derived or manually specified. The mappings are used to automatically translate SPARQL queries to semantically equivalent XQuery queries which are used to access the XML databases. We present the algorithms and the implementation of SPARQL2XQuery framework, which is used for answering SPARQL queries over XML databases.

  17. SkyQuery - A Prototype Distributed Query and Cross-Matching Web Service for the Virtual Observatory

    NASA Astrophysics Data System (ADS)

    Thakar, A. R.; Budavari, T.; Malik, T.; Szalay, A. S.; Fekete, G.; Nieto-Santisteban, M.; Haridas, V.; Gray, J.

    2002-12-01

    We have developed a prototype distributed query and cross-matching service for the VO community, called SkyQuery, which is implemented with hierarchichal Web Services. SkyQuery enables astronomers to run combined queries on existing distributed heterogeneous astronomy archives. SkyQuery provides a simple, user-friendly interface to run distributed queries over the federation of registered astronomical archives in the VO. The SkyQuery client connects to the portal Web Service, which farms the query out to the individual archives, which are also Web Services called SkyNodes. The cross-matching algorithm is run recursively on each SkyNode. Each archive is a relational DBMS with a HTM index for fast spatial lookups. The results of the distributed query are returned as an XML DataSet that is automatically rendered by the client. SkyQuery also returns the image cutout corresponding to the query result. SkyQuery finds not only matches between the various catalogs, but also dropouts - objects that exist in some of the catalogs but not in others. This is often as important as finding matches. We demonstrate the utility of SkyQuery with a brown-dwarf search between SDSS and 2MASS, and a search for radio-quiet quasars in SDSS, 2MASS and FIRST. The importance of a service like SkyQuery for the worldwide astronomical community cannot be overstated: data on the same objects in various archives is mapped in different wavelength ranges and looks very different due to different errors, instrument sensitivities and other peculiarities of each archive. Our cross-matching algorithm preforms a fuzzy spatial join across multiple catalogs. This type of cross-matching is currently often done by eye, one object at a time. A static cross-identification table for a set of archives would become obsolete by the time it was built - the exponential growth of astronomical data means that a dynamic cross-identification mechanism like SkyQuery is the only viable option. SkyQuery was funded by a grant from the NASA AISR program.

  18. Vaccine-criticism on the internet: new insights based on French-speaking websites.

    PubMed

    Ward, Jeremy K; Peretti-Watel, Patrick; Larson, Heidi J; Raude, Jocelyn; Verger, Pierre

    2015-02-18

    The internet is playing an increasingly important part in fueling vaccine related controversies and in generating vaccine hesitant behaviors. English language Antivaccination websites have been thoroughly analyzed, however, little is known of the arguments presented in other languages on the internet. This study presents three types of results: (1) Authors apply a time tested content analysis methodology to describe the information diffused by French language vaccine critical websites in comparison with English speaking websites. The contents of French language vaccine critical websites are very similar to those of English language websites except for the relative absence of moral and religious arguments. (2) Authors evaluate the likelihood that internet users will find those websites through vaccine-related queries on a variety of French-language versions of google. Queries on controversial vaccines generated many more vaccine critical websites than queries on vaccination in general. (3) Authors propose a typology of vaccine critical websites. Authors distinguish between (a) websites that criticize all vaccines ("antivaccine" websites) and websites that criticize only some vaccines ("vaccine-selective" websites), and between (b) websites that focus on vaccines ("vaccine-focused" websites) and those for which vaccines were only a secondary topic of interest ("generalist" websites). The differences in stances by groups and websites affect the likelihood that they will be believed and by whom. This study therefore helps understand the different information landscapes that may contribute to the variety of forms of vaccine hesitancy. Public authorities should have better awareness and understanding of these stances to bring appropriate answers to the different controversies about vaccination. Copyright © 2014 Elsevier Ltd. All rights reserved.

  19. Functional Analysis of Language Interactions between Down Syndrome Children and Their Mothers.

    ERIC Educational Resources Information Center

    Hooshyar, Nahid T.

    A 20-minute videotape sample was obtained of the language interactions between 20 Down syndrome children (ages 38 to 107 months) and their mothers during informal playtime. Linguistic utterances of mothers and children were coded according to the following language categories: query, declarative, imperative, performative, feedback, imitation,…

  20. Data management with a landslide inventory of the Franconian Alb (Germany) using a spatial database and GIS tools

    NASA Astrophysics Data System (ADS)

    Bemm, Stefan; Sandmeier, Christine; Wilde, Martina; Jaeger, Daniel; Schwindt, Daniel; Terhorst, Birgit

    2014-05-01

    The area of the Swabian-Franconian cuesta landscape (Southern Germany) is highly prone to landslides. This was apparent in the late spring of 2013, when numerous landslides occurred as a consequence of heavy and long-lasting rainfalls. The specific climatic situation caused numerous damages with serious impact on settlements and infrastructure. Knowledge on spatial distribution of landslides, processes and characteristics are important to evaluate the potential risk that can occur from mass movements in those areas. In the frame of two projects about 400 landslides were mapped and detailed data sets were compiled during years 2011 to 2014 at the Franconian Alb. The studies are related to the project "Slope stability and hazard zones in the northern Bavarian cuesta" (DFG, German Research Foundation) as well as to the LfU (The Bavarian Environment Agency) within the project "Georisks and climate change - hazard indication map Jura". The central goal of the present study is to create a spatial database for landslides. The database should contain all fundamental parameters to characterize the mass movements and should provide the potential for secure data storage and data management, as well as statistical evaluations. The spatial database was created with PostgreSQL, an object-relational database management system and PostGIS, a spatial database extender for PostgreSQL, which provides the possibility to store spatial and geographic objects and to connect to several GIS applications, like GRASS GIS, SAGA GIS, QGIS and GDAL, a geospatial library (Obe et al. 2011). Database access for querying, importing, and exporting spatial and non-spatial data is ensured by using GUI or non-GUI connections. The database allows the use of procedural languages for writing advanced functions in the R, Python or Perl programming languages. It is possible to work directly with the (spatial) data entirety of the database in R. The inventory of the database includes (amongst others), informations on location, landslide types and causes, geomorphological positions, geometries, hazards and damages, as well as assessments related to the activity of landslides. Furthermore, there are stored spatial objects, which represent the components of a landslide, in particular the scarps and the accumulation areas. Besides, waterways, map sheets, contour lines, detailed infrastructure data, digital elevation models, aspect and slope data are included. Examples of spatial queries to the database are intersections of raster and vector data for calculating values for slope gradients or aspects of landslide areas and for creating multiple, overlaying sections for the comparison of slopes, as well as distances to the infrastructure or to the next receiving drainage. Furthermore, getting informations on landslide magnitudes, distribution and clustering, as well as potential correlations concerning geomorphological or geological conditions. The data management concept in this study can be implemented for any academic, public or private use, because it is independent from any obligatory licenses. The created spatial database offers a platform for interdisciplinary research and socio-economic questions, as well as for landslide susceptibility and hazard indication mapping. Obe, R.O., Hsu, L.S. 2011. PostGIS in action. - pp 492, Manning Publications, Stamford

  1. Querying phenotype-genotype relationships on patient datasets using semantic web technology: the example of Cerebrotendinous xanthomatosis.

    PubMed

    Taboada, María; Martínez, Diego; Pilo, Belén; Jiménez-Escrig, Adriano; Robinson, Peter N; Sobrido, María J

    2012-07-31

    Semantic Web technology can considerably catalyze translational genetics and genomics research in medicine, where the interchange of information between basic research and clinical levels becomes crucial. This exchange involves mapping abstract phenotype descriptions from research resources, such as knowledge databases and catalogs, to unstructured datasets produced through experimental methods and clinical practice. This is especially true for the construction of mutation databases. This paper presents a way of harmonizing abstract phenotype descriptions with patient data from clinical practice, and querying this dataset about relationships between phenotypes and genetic variants, at different levels of abstraction. Due to the current availability of ontological and terminological resources that have already reached some consensus in biomedicine, a reuse-based ontology engineering approach was followed. The proposed approach uses the Ontology Web Language (OWL) to represent the phenotype ontology and the patient model, the Semantic Web Rule Language (SWRL) to bridge the gap between phenotype descriptions and clinical data, and the Semantic Query Web Rule Language (SQWRL) to query relevant phenotype-genotype bidirectional relationships. The work tests the use of semantic web technology in the biomedical research domain named cerebrotendinous xanthomatosis (CTX), using a real dataset and ontologies. A framework to query relevant phenotype-genotype bidirectional relationships is provided. Phenotype descriptions and patient data were harmonized by defining 28 Horn-like rules in terms of the OWL concepts. In total, 24 patterns of SWQRL queries were designed following the initial list of competency questions. As the approach is based on OWL, the semantic of the framework adapts the standard logical model of an open world assumption. This work demonstrates how semantic web technologies can be used to support flexible representation and computational inference mechanisms required to query patient datasets at different levels of abstraction. The open world assumption is especially good for describing only partially known phenotype-genotype relationships, in a way that is easily extensible. In future, this type of approach could offer researchers a valuable resource to infer new data from patient data for statistical analysis in translational research. In conclusion, phenotype description formalization and mapping to clinical data are two key elements for interchanging knowledge between basic and clinical research.

  2. Making Temporal Search More Central in Spatial Data Infrastructures

    NASA Astrophysics Data System (ADS)

    Corti, P.; Lewis, B.

    2017-10-01

    A temporally enabled Spatial Data Infrastructure (SDI) is a framework of geospatial data, metadata, users, and tools intended to provide an efficient and flexible way to use spatial information which includes the historical dimension. One of the key software components of an SDI is the catalogue service which is needed to discover, query, and manage the metadata. A search engine is a software system capable of supporting fast and reliable search, which may use any means necessary to get users to the resources they need quickly and efficiently. These techniques may include features such as full text search, natural language processing, weighted results, temporal search based on enrichment, visualization of patterns in distributions of results in time and space using temporal and spatial faceting, and many others. In this paper we will focus on the temporal aspects of search which include temporal enrichment using a time miner - a software engine able to search for date components within a larger block of text, the storage of time ranges in the search engine, handling historical dates, and the use of temporal histograms in the user interface to display the temporal distribution of search results.

  3. An integrated information retrieval and document management system

    NASA Technical Reports Server (NTRS)

    Coles, L. Stephen; Alvarez, J. Fernando; Chen, James; Chen, William; Cheung, Lai-Mei; Clancy, Susan; Wong, Alexis

    1993-01-01

    This paper describes the requirements and prototype development for an intelligent document management and information retrieval system that will be capable of handling millions of pages of text or other data. Technologies for scanning, Optical Character Recognition (OCR), magneto-optical storage, and multiplatform retrieval using a Standard Query Language (SQL) will be discussed. The semantic ambiguity inherent in the English language is somewhat compensated-for through the use of coefficients or weighting factors for partial synonyms. Such coefficients are used both for defining structured query trees for routine queries and for establishing long-term interest profiles that can be used on a regular basis to alert individual users to the presence of relevant documents that may have just arrived from an external source, such as a news wire service. Although this attempt at evidential reasoning is limited in comparison with the latest developments in AI Expert Systems technology, it has the advantage of being commercially available.

  4. Computer-Aided Clinical Trial Recruitment Based on Domain-Specific Language Translation: A Case Study of Retinopathy of Prematurity

    PubMed Central

    2017-01-01

    Reusing the data from healthcare information systems can effectively facilitate clinical trials (CTs). How to select candidate patients eligible for CT recruitment criteria is a central task. Related work either depends on DBA (database administrator) to convert the recruitment criteria to native SQL queries or involves the data mapping between a standard ontology/information model and individual data source schema. This paper proposes an alternative computer-aided CT recruitment paradigm, based on syntax translation between different DSLs (domain-specific languages). In this paradigm, the CT recruitment criteria are first formally represented as production rules. The referenced rule variables are all from the underlying database schema. Then the production rule is translated to an intermediate query-oriented DSL (e.g., LINQ). Finally, the intermediate DSL is directly mapped to native database queries (e.g., SQL) automated by ORM (object-relational mapping). PMID:29065644

  5. The EPMI Malay Basin petroleum geology database: Design philosophy and keys to success

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Low, H.E.; Creaney, S.; Fairchild, L.H.

    1994-07-01

    Esso Production Malaysia Inc. (EPMI) developed and populated a database containing information collected in the areas of basic well data: stratigraphy, lithology, facies; pressure, temperature, column/contacts; geochemistry, shows and stains, migration, fluid properties; maturation; seal; structure. Paradox was used as the database engine and query language, with links to ZYCOR ZMAP+ for mapping and SAS for data analysis. Paradox has a query language that is simple enough for users. The ability to link to good analytical packages was deemed more important than having the capability in the package. Important elements of design philosophy were included: (1) information on data qualitymore » had to be rigorously recorded; (2) raw and interpreted data were kept separate and clearly identified; (3) correlations between rock and chronostratigraphic surfaces were recorded; and (4) queries across technical boundaries had to be seamless.« less

  6. Computer-Aided Clinical Trial Recruitment Based on Domain-Specific Language Translation: A Case Study of Retinopathy of Prematurity.

    PubMed

    Zhang, Yinsheng; Zhang, Guoming; Shang, Qian

    2017-01-01

    Reusing the data from healthcare information systems can effectively facilitate clinical trials (CTs). How to select candidate patients eligible for CT recruitment criteria is a central task. Related work either depends on DBA (database administrator) to convert the recruitment criteria to native SQL queries or involves the data mapping between a standard ontology/information model and individual data source schema. This paper proposes an alternative computer-aided CT recruitment paradigm, based on syntax translation between different DSLs (domain-specific languages). In this paradigm, the CT recruitment criteria are first formally represented as production rules. The referenced rule variables are all from the underlying database schema. Then the production rule is translated to an intermediate query-oriented DSL (e.g., LINQ). Finally, the intermediate DSL is directly mapped to native database queries (e.g., SQL) automated by ORM (object-relational mapping).

  7. a Spatiotemporal Aggregation Query Method Using Multi-Thread Parallel Technique Based on Regional Division

    NASA Astrophysics Data System (ADS)

    Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.

    2015-07-01

    Existing spatiotemporal database supports spatiotemporal aggregation query over massive moving objects datasets. Due to the large amounts of data and single-thread processing method, the query speed cannot meet the application requirements. On the other hand, the query efficiency is more sensitive to spatial variation then temporal variation. In this paper, we proposed a spatiotemporal aggregation query method using multi-thread parallel technique based on regional divison and implemented it on the server. Concretely, we divided the spatiotemporal domain into several spatiotemporal cubes, computed spatiotemporal aggregation on all cubes using the technique of multi-thread parallel processing, and then integrated the query results. By testing and analyzing on the real datasets, this method has improved the query speed significantly.

  8. Medical Image Databases

    PubMed Central

    Tagare, Hemant D.; Jaffe, C. Carl; Duncan, James

    1997-01-01

    Abstract Information contained in medical images differs considerably from that residing in alphanumeric format. The difference can be attributed to four characteristics: (1) the semantics of medical knowledge extractable from images is imprecise; (2) image information contains form and spatial data, which are not expressible in conventional language; (3) a large part of image information is geometric; (4) diagnostic inferences derived from images rest on an incomplete, continuously evolving model of normality. This paper explores the differentiating characteristics of text versus images and their impact on design of a medical image database intended to allow content-based indexing and retrieval. One strategy for implementing medical image databases is presented, which employs object-oriented iconic queries, semantics by association with prototypes, and a generic schema. PMID:9147338

  9. A Fuzzy Query Mechanism for Human Resource Websites

    NASA Astrophysics Data System (ADS)

    Lai, Lien-Fu; Wu, Chao-Chin; Huang, Liang-Tsung; Kuo, Jung-Chih

    Users' preferences often contain imprecision and uncertainty that are difficult for traditional human resource websites to deal with. In this paper, we apply the fuzzy logic theory to develop a fuzzy query mechanism for human resource websites. First, a storing mechanism is proposed to store fuzzy data into conventional database management systems without modifying DBMS models. Second, a fuzzy query language is proposed for users to make fuzzy queries on fuzzy databases. User's fuzzy requirement can be expressed by a fuzzy query which consists of a set of fuzzy conditions. Third, each fuzzy condition associates with a fuzzy importance to differentiate between fuzzy conditions according to their degrees of importance. Fourth, the fuzzy weighted average is utilized to aggregate all fuzzy conditions based on their degrees of importance and degrees of matching. Through the mutual compensation of all fuzzy conditions, the ordering of query results can be obtained according to user's preference.

  10. A Semantic Basis for Proof Queries and Transformations

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen W.; Luth, Christoph

    2013-01-01

    We extend the query language PrQL, designed for inspecting machine representations of proofs, to also allow transformation of proofs. PrQL natively supports hiproofs which express proof structure using hierarchically nested labelled trees, which we claim is a natural way of taming the complexity of huge proofs. Query-driven transformations enable manipulation of this structure, in particular, to transform proofs produced by interactive theorem provers into forms that assist their understanding, or that could be consumed by other tools. In this paper we motivate and define basic transformation operations, using an abstract denotational semantics of hiproofs and queries. This extends our previous semantics for queries based on syntactic tree representations.We define update operations that add and remove sub-proofs, and manipulate the hierarchy to group and ungroup nodes. We show that

  11. Semantic-based surveillance video retrieval.

    PubMed

    Hu, Weiming; Xie, Dan; Fu, Zhouyu; Zeng, Wenrong; Maybank, Steve

    2007-04-01

    Visual surveillance produces large amounts of video data. Effective indexing and retrieval from surveillance video databases are very important. Although there are many ways to represent the content of video clips in current video retrieval algorithms, there still exists a semantic gap between users and retrieval systems. Visual surveillance systems supply a platform for investigating semantic-based video retrieval. In this paper, a semantic-based video retrieval framework for visual surveillance is proposed. A cluster-based tracking algorithm is developed to acquire motion trajectories. The trajectories are then clustered hierarchically using the spatial and temporal information, to learn activity models. A hierarchical structure of semantic indexing and retrieval of object activities, where each individual activity automatically inherits all the semantic descriptions of the activity model to which it belongs, is proposed for accessing video clips and individual objects at the semantic level. The proposed retrieval framework supports various queries including queries by keywords, multiple object queries, and queries by sketch. For multiple object queries, succession and simultaneity restrictions, together with depth and breadth first orders, are considered. For sketch-based queries, a method for matching trajectories drawn by users to spatial trajectories is proposed. The effectiveness and efficiency of our framework are tested in a crowded traffic scene.

  12. [Human body meridian spatial decision support system for clinical treatment and teaching of acupuncture and moxibustion].

    PubMed

    Wu, Dehua

    2016-01-01

    The spatial position and distribution of human body meridian are expressed limitedly in the decision support system (DSS) of acupuncture and moxibustion at present, which leads to the failure to give the effective quantitative analysis on the spatial range and the difficulty for the decision-maker to provide a realistic spatial decision environment. Focusing on the limit spatial expression in DSS of acupuncture and moxibustion, it was proposed that on the basis of the geographic information system, in association of DSS technology, the design idea was developed on the human body meridian spatial DSS. With the 4-layer service-oriented architecture adopted, the data center integrated development platform was taken as the system development environment. The hierarchical organization was done for the spatial data of human body meridian via the directory tree. The structured query language (SQL) server was used to achieve the unified management of spatial data and attribute data. The technologies of architecture, configuration and plug-in development model were integrated to achieve the data inquiry, buffer analysis and program evaluation of the human body meridian spatial DSS. The research results show that the human body meridian spatial DSS could reflect realistically the spatial characteristics of the spatial position and distribution of human body meridian and met the constantly changeable demand of users. It has the powerful spatial analysis function and assists with the scientific decision in clinical treatment and teaching of acupuncture and moxibustion. It is the new attempt to the informatization research of human body meridian.

  13. VisGets: coordinated visualizations for web-based information exploration and discovery.

    PubMed

    Dörk, Marian; Carpendale, Sheelagh; Collins, Christopher; Williamson, Carey

    2008-01-01

    In common Web-based search interfaces, it can be difficult to formulate queries that simultaneously combine temporal, spatial, and topical data filters. We investigate how coordinated visualizations can enhance search and exploration of information on the World Wide Web by easing the formulation of these types of queries. Drawing from visual information seeking and exploratory search, we introduce VisGets--interactive query visualizations of Web-based information that operate with online information within a Web browser. VisGets provide the information seeker with visual overviews of Web resources and offer a way to visually filter the data. Our goal is to facilitate the construction of dynamic search queries that combine filters from more than one data dimension. We present a prototype information exploration system featuring three linked VisGets (temporal, spatial, and topical), and used it to visually explore news items from online RSS feeds.

  14. Developing A Web-based User Interface for Semantic Information Retrieval

    NASA Technical Reports Server (NTRS)

    Berrios, Daniel C.; Keller, Richard M.

    2003-01-01

    While there are now a number of languages and frameworks that enable computer-based systems to search stored data semantically, the optimal design for effective user interfaces for such systems is still uncle ar. Such interfaces should mask unnecessary query detail from users, yet still allow them to build queries of arbitrary complexity without significant restrictions. We developed a user interface supporting s emantic query generation for Semanticorganizer, a tool used by scient ists and engineers at NASA to construct networks of knowledge and dat a. Through this interface users can select node types, node attribute s and node links to build ad-hoc semantic queries for searching the S emanticOrganizer network.

  15. Query by forms: User-oriented relational database retrieving system and its application in analysis of experiment data

    NASA Astrophysics Data System (ADS)

    Skotniczny, Zbigniew

    1989-12-01

    The Query by Forms (QbF) system is a user-oriented interactive tool for querying large relational database with minimal queries difinition cost. The system was worked out under the assumption that user's time and effort for defining needed queries is the most severe bottleneck. The system may be applied in any Rdb/VMS databases system and is recommended for specific information systems of any project where end-user queries cannot be foreseen. The tool is dedicated to specialist of an application domain who have to analyze data maintained in database from any needed point of view, who do not need to know commercial databases languages. The paper presents the system developed as a compromise between its functionality and usability. User-system communication via a menu-driven "tree-like" structure of screen-forms which produces a query difinition and execution is discussed in detail. Output of query results (printed reports and graphics) is also discussed. Finally the paper shows one application of QbF to a HERA-project.

  16. A Magnetic Petrology Database for Satellite Magnetic Anomaly Interpretations

    NASA Astrophysics Data System (ADS)

    Nazarova, K.; Wasilewski, P.; Didenko, A.; Genshaft, Y.; Pashkevich, I.

    2002-05-01

    A Magnetic Petrology Database (MPDB) is now being compiled at NASA/Goddard Space Flight Center in cooperation with Russian and Ukrainian Institutions. The purpose of this database is to provide the geomagnetic community with a comprehensive and user-friendly method of accessing magnetic petrology data via Internet for more realistic interpretation of satellite magnetic anomalies. Magnetic Petrology Data had been accumulated in NASA/Goddard Space Flight Center, United Institute of Physics of the Earth (Russia) and Institute of Geophysics (Ukraine) over several decades and now consists of many thousands of records of data in our archives. The MPDB was, and continues to be in big demand especially since recent launching in near Earth orbit of the mini-constellation of three satellites - Oersted (in 1999), Champ (in 2000), and SAC-C (in 2000) which will provide lithospheric magnetic maps with better spatial and amplitude resolution (about 1 nT). The MPDB is focused on lower crustal and upper mantle rocks and will include data on mantle xenoliths, serpentinized ultramafic rocks, granulites, iron quartzites and rocks from Archean-Proterozoic metamorphic sequences from all around the world. A substantial amount of data is coming from the area of unique Kursk Magnetic Anomaly and Kola Deep Borehole (which recovered 12 km of continental crust). A prototype MPDB can be found on the Geodynamics Branch web server of Goddard Space Flight Center at http://core2.gsfc.nasa.gov/terr_mag/magnpetr.html. The MPDB employs a searchable relational design and consists of 7 interrelated tables. The schema of database is shown at http://core2.gsfc.nasa.gov/terr_mag/doc.html. MySQL database server was utilized to implement MPDB. The SQL (Structured Query Language) is used to query the database. To present the results of queries on WEB and for WEB programming we utilized PHP scripting language and CGI scripts. The prototype MPDB is designed to search database by major satellite magnetic anomaly, tectonic structure, geographical location, rock type, magnetic properties, chemistry and reference, see http://core2.gsfc.nasa.gov/terr_mag/query1.html. The output of database is HTML structured table, text file, and downloadable file. This database will be very useful for studies of lithospheric satellite magnetic anomalies on the Earth and other terrestrial planets.

  17. StarView: The object oriented design of the ST DADS user interface

    NASA Technical Reports Server (NTRS)

    Williams, J. D.; Pollizzi, J. A.

    1992-01-01

    StarView is the user interface being developed for the Hubble Space Telescope Data Archive and Distribution Service (ST DADS). ST DADS is the data archive for HST observations and a relational database catalog describing the archived data. Users will use StarView to query the catalog and select appropriate datasets for study. StarView sends requests for archived datasets to ST DADS which processes the requests and returns the database to the user. StarView is designed to be a powerful and extensible user interface. Unique features include an internal relational database to navigate query results, a form definition language that will work with both CRT and X interfaces, a data definition language that will allow StarView to work with any relational database, and the ability to generate adhoc queries without requiring the user to understand the structure of the ST DADS catalog. Ultimately, StarView will allow the user to refine queries in the local database for improved performance and merge in data from external sources for correlation with other query results. The user will be able to create a query from single or multiple forms, merging the selected attributes into a single query. Arbitrary selection of attributes for querying is supported. The user will be able to select how query results are viewed. A standard form or table-row format may be used. Navigation capabilities are provided to aid the user in viewing query results. Object oriented analysis and design techniques were used in the design of StarView to support the mechanisms and concepts required to implement these features. One such mechanism is the Model-View-Controller (MVC) paradigm. The MVC allows the user to have multiple views of the underlying database, while providing a consistent mechanism for interaction regardless of the view. This approach supports both CRT and X interfaces while providing a common mode of user interaction. Another powerful abstraction is the concept of a Query Model. This concept allows a single query to be built form a single or multiple forms before it is submitted to ST DADS. Supporting this concept is the adhoc query generator which allows the user to select and qualify an indeterminate number attributes from the database. The user does not need any knowledge of how the joins across various tables are to be resolved. The adhoc generator calculates the joins automatically and generates the correct SQL query.

  18. Combining qualitative and quantitative spatial and temporal information in a hierarchical structure: Approximate reasoning for plan execution monitoring

    NASA Technical Reports Server (NTRS)

    Hoebel, Louis J.

    1993-01-01

    The problem of plan generation (PG) and the problem of plan execution monitoring (PEM), including updating, queries, and resource-bounded replanning, have different reasoning and representation requirements. PEM requires the integration of qualitative and quantitative information. PEM is the receiving of data about the world in which a plan or agent is executing. The problem is to quickly determine the relevance of the data, the consistency of the data with respect to the expected effects, and if execution should continue. Only spatial and temporal aspects of the plan are addressed for relevance in this work. Current temporal reasoning systems are deficient in computational aspects or expressiveness. This work presents a hybrid qualitative and quantitative system that is fully expressive in its assertion language while offering certain computational efficiencies. In order to proceed, methods incorporating approximate reasoning using hierarchies, notions of locality, constraint expansion, and absolute parameters need be used and are shown to be useful for the anytime nature of PEM.

  19. Rasdaman for Big Spatial Raster Data

    NASA Astrophysics Data System (ADS)

    Hu, F.; Huang, Q.; Scheele, C. J.; Yang, C. P.; Yu, M.; Liu, K.

    2015-12-01

    Spatial raster data have grown exponentially over the past decade. Recent advancements on data acquisition technology, such as remote sensing, have allowed us to collect massive observation data of various spatial resolution and domain coverage. The volume, velocity, and variety of such spatial data, along with the computational intensive nature of spatial queries, pose grand challenge to the storage technologies for effective big data management. While high performance computing platforms (e.g., cloud computing) can be used to solve the computing-intensive issues in big data analysis, data has to be managed in a way that is suitable for distributed parallel processing. Recently, rasdaman (raster data manager) has emerged as a scalable and cost-effective database solution to store and retrieve massive multi-dimensional arrays, such as sensor, image, and statistics data. Within this paper, the pros and cons of using rasdaman to manage and query spatial raster data will be examined and compared with other common approaches, including file-based systems, relational databases (e.g., PostgreSQL/PostGIS), and NoSQL databases (e.g., MongoDB and Hive). Earth Observing System (EOS) data collected from NASA's Atmospheric Scientific Data Center (ASDC) will be used and stored in these selected database systems, and a set of spatial and non-spatial queries will be designed to benchmark their performance on retrieving large-scale, multi-dimensional arrays of EOS data. Lessons learnt from using rasdaman will be discussed as well.

  20. Optimizability of OGC Standards Implementations - a Case Study

    NASA Astrophysics Data System (ADS)

    Misev, D.; Baumann, P.

    2012-04-01

    Why do we shop at Amazon? Because they have a unique offering that is nowhere else available? Certainly not. Rather, Amazon offers (i) simple, yet effective search; (ii) very simple payment; (iii) extremely rapid delivery. This is how scientific services will be distinguished in future: not for their data holding (there will be manifold choice), but for their service quality. We are facing the transition from data stewardship to service stewardship. One of the OGC standards which particularly enables flexible retrieval is the Web Coverage Processing Service (WCPS). It defines a high-level query language on large, multi-dimensional raster data, such as 1D timeseries, 2D EO imagery, 3D x/y/t image time series and x/y/z geophysical data, 4D x/y/z/t climate and ocean data. We have implemented WCPS based on an Array Database Management System, rasdaman, which is available in open source. In this demonstration, we study WCPS queries on 2D, 3D, and 4D data sets. Particular emphasis is placed on the computational load queries generate in such on-demand processing and filtering. We look at different techniques and their impact on performance, such as adaptive storage partitioning, query rewriting, and just-in-time compilation. Results show that there is significant potential for effective server-side optimization once a query language is sufficiently high-level and declarative.

  1. Integrating unified medical language system and association mining techniques into relevance feedback for biomedical literature search.

    PubMed

    Ji, Yanqing; Ying, Hao; Tran, John; Dews, Peter; Massanari, R Michael

    2016-07-19

    Finding highly relevant articles from biomedical databases is challenging not only because it is often difficult to accurately express a user's underlying intention through keywords but also because a keyword-based query normally returns a long list of hits with many citations being unwanted by the user. This paper proposes a novel biomedical literature search system, called BiomedSearch, which supports complex queries and relevance feedback. The system employed association mining techniques to build a k-profile representing a user's relevance feedback. More specifically, we developed a weighted interest measure and an association mining algorithm to find the strength of association between a query and each concept in the article(s) selected by the user as feedback. The top concepts were utilized to form a k-profile used for the next-round search. BiomedSearch relies on Unified Medical Language System (UMLS) knowledge sources to map text files to standard biomedical concepts. It was designed to support queries with any levels of complexity. A prototype of BiomedSearch software was made and it was preliminarily evaluated using the Genomics data from TREC (Text Retrieval Conference) 2006 Genomics Track. Initial experiment results indicated that BiomedSearch increased the mean average precision (MAP) for a set of queries. With UMLS and association mining techniques, BiomedSearch can effectively utilize users' relevance feedback to improve the performance of biomedical literature search.

  2. Improving Concept-Based Web Image Retrieval by Mixing Semantically Similar Greek Queries

    ERIC Educational Resources Information Center

    Lazarinis, Fotis

    2008-01-01

    Purpose: Image searching is a common activity for web users. Search engines offer image retrieval services based on textual queries. Previous studies have shown that web searching is more demanding when the search is not in English and does not use a Latin-based language. The aim of this paper is to explore the behaviour of the major search…

  3. The Comparison of SQL, QBE, and DFQL as Query Languages for Relational Databases

    DTIC Science & Technology

    1994-03-01

    is: Dname F-mune Laame Headquarter James Borg b. Query 7: RetieMl involving explicit sets Retrieve the Social Security Numbers of employees who worked...i •••,• I• i , i I I • I 10. Ka Dispullahta MABES TNI-AL Cilangkap-Jakarta Timur Indonesia 11. Parunmungan Girsang 3 Jl. Cawang Baru 34-36 Jakarta

  4. Design of a Low-Cost Adaptive Question Answering System for Closed Domain Factoid Queries

    ERIC Educational Resources Information Center

    Toh, Huey Ling

    2010-01-01

    Closed domain question answering (QA) systems achieve precision and recall at the cost of complex language processing techniques to parse the answer corpus. We propose a "query-based" model for indexing answers in a closed domain factoid QA system. Further, we use a phrase term inference method for improving the ranking order of related questions.…

  5. NVST Data Archiving System Based On FastBit NoSQL Database

    NASA Astrophysics Data System (ADS)

    Liu, Ying-bo; Wang, Feng; Ji, Kai-fan; Deng, Hui; Dai, Wei; Liang, Bo

    2014-06-01

    The New Vacuum Solar Telescope (NVST) is a 1-meter vacuum solar telescope that aims to observe the fine structures of active regions on the Sun. The main tasks of the NVST are high resolution imaging and spectral observations, including the measurements of the solar magnetic field. The NVST has been collecting more than 20 million FITS files since it began routine observations in 2012 and produces a maximum observational records of 120 thousand files in a day. Given the large amount of files, the effective archiving and retrieval of files becomes a critical and urgent problem. In this study, we implement a new data archiving system for the NVST based on the Fastbit Not Only Structured Query Language (NoSQL) database. Comparing to the relational database (i.e., MySQL; My Structured Query Language), the Fastbit database manifests distinctive advantages on indexing and querying performance. In a large scale database of 40 million records, the multi-field combined query response time of Fastbit database is about 15 times faster and fully meets the requirements of the NVST. Our study brings a new idea for massive astronomical data archiving and would contribute to the design of data management systems for other astronomical telescopes.

  6. GenoMetric Query Language: a novel approach to large-scale genomic data management.

    PubMed

    Masseroli, Marco; Pinoli, Pietro; Venco, Francesco; Kaitoua, Abdulrahman; Jalili, Vahid; Palluzzi, Fernando; Muller, Heiko; Ceri, Stefano

    2015-06-15

    Improvement of sequencing technologies and data processing pipelines is rapidly providing sequencing data, with associated high-level features, of many individual genomes in multiple biological and clinical conditions. They allow for data-driven genomic, transcriptomic and epigenomic characterizations, but require state-of-the-art 'big data' computing strategies, with abstraction levels beyond available tool capabilities. We propose a high-level, declarative GenoMetric Query Language (GMQL) and a toolkit for its use. GMQL operates downstream of raw data preprocessing pipelines and supports queries over thousands of heterogeneous datasets and samples; as such it is key to genomic 'big data' analysis. GMQL leverages a simple data model that provides both abstractions of genomic region data and associated experimental, biological and clinical metadata and interoperability between many data formats. Based on Hadoop framework and Apache Pig platform, GMQL ensures high scalability, expressivity, flexibility and simplicity of use, as demonstrated by several biological query examples on ENCODE and TCGA datasets. The GMQL toolkit is freely available for non-commercial use at http://www.bioinformatics.deib.polimi.it/GMQL/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  7. CITE NLM: Natural-Language Searching in an Online Catalog.

    ERIC Educational Resources Information Center

    Doszkocs, Tamas E.

    1983-01-01

    The National Library of Medicine's Current Information Transfer in English public access online catalog offers unique subject search capabilities--natural-language query input, automatic medical subject headings display, closest match search strategy, ranked document output, dynamic end user feedback for search refinement. References, description…

  8. Interrogation: General vs. Local.

    ERIC Educational Resources Information Center

    Johnson, Jeannette

    This paper proposes a set of hypotheses on the nature of interrogration as a possible language universal. Examples and phrase structure rules and diagrams are given. Examining Tamazight and English, genetically unrelated languages with almost no contact, the author distinguishes two types of interrogation: (1) general, querying acceptability to…

  9. Dynamic Querying of Mass-Storage RDF Data with Rule-Based Entailment Regimes

    NASA Astrophysics Data System (ADS)

    Ianni, Giovambattista; Krennwallner, Thomas; Martello, Alessandra; Polleres, Axel

    RDF Schema (RDFS) as a lightweight ontology language is gaining popularity and, consequently, tools for scalable RDFS inference and querying are needed. SPARQL has become recently a W3C standard for querying RDF data, but it mostly provides means for querying simple RDF graphs only, whereas querying with respect to RDFS or other entailment regimes is left outside the current specification. In this paper, we show that SPARQL faces certain unwanted ramifications when querying ontologies in conjunction with RDF datasets that comprise multiple named graphs, and we provide an extension for SPARQL that remedies these effects. Moreover, since RDFS inference has a close relationship with logic rules, we generalize our approach to select a custom ruleset for specifying inferences to be taken into account in a SPARQL query. We show that our extensions are technically feasible by providing benchmark results for RDFS querying in our prototype system GiaBATA, which uses Datalog coupled with a persistent Relational Database as a back-end for implementing SPARQL with dynamic rule-based inference. By employing different optimization techniques like magic set rewriting our system remains competitive with state-of-the-art RDFS querying systems.

  10. Clinician-Oriented Access to Data - C.O.A.D.: A Natural Language Interface to a VA DHCP Database

    PubMed Central

    Levy, Christine; Rogers, Elizabeth

    1995-01-01

    Hospitals collect enormous amounts of data related to the on-going care of patients. Unfortunately, a clinicians access to the data is limited by complexities of the database structure and/or programming skills required to access the database. The COAD project attempts to bridge the gap between the clinical user's need for specific information from the database, and the wealth of data residing in the hospital information system. The project design includes a natural language interface to data contained in a VA DHCP database. We have developed a prototype which links natural language software to certain DHCP data elements, including, patient demographics, prescriptions, diagnoses, laboratory data, and provider information. English queries can by typed onto the system, and answers to the questions are returned. Future work includes refinement of natural language/DHCP connections to enable more sophisticated queries, and optimization of the system to reduce response time to user questions.

  11. Spanish for Business: A Journey into Employability

    ERIC Educational Resources Information Center

    Lallana, Amparo; Pastor-González, Victoria

    2016-01-01

    As language lecturers, we believe that we equip our graduates with a range of key skills that give them an edge in the employment market. But, query final year students of a Business and Languages degree on the value of language learning for employability, and they are likely to mention a small number of functional abilities such as CV writing and…

  12. Persistent Identifiers for Improved Accessibility for Linked Data Querying

    NASA Astrophysics Data System (ADS)

    Shepherd, A.; Chandler, C. L.; Arko, R. A.; Fils, D.; Jones, M. B.; Krisnadhi, A.; Mecum, B.

    2016-12-01

    The adoption of linked open data principles within the geosciences has increased the amount of accessible information available on the Web. However, this data is difficult to consume for those who are unfamiliar with Semantic Web technologies such as Web Ontology Language (OWL), Resource Description Framework (RDF) and SPARQL - the RDF query language. Consumers would need to understand the structure of the data and how to efficiently query it. Furthermore, understanding how to query doesn't solve problems of poor precision and recall in search results. For consumers unfamiliar with the data, full-text searches are most accessible, but not ideal as they arrest the advantages of data disambiguation and co-reference resolution efforts. Conversely, URI searches across linked data can deliver improved search results, but knowledge of these exact URIs may remain difficult to obtain. The increased adoption of Persistent Identifiers (PIDs) can lead to improved linked data querying by a wide variety of consumers. Because PIDs resolve to a single entity, they are an excellent data point for disambiguating content. At the same time, PIDs are more accessible and prominent than a single data provider's linked data URI. When present in linked open datasets, PIDs provide balance between the technical and social hurdles of linked data querying as evidenced by the NSF EarthCube GeoLink project. The GeoLink project, funded by NSF's EarthCube initiative, have brought together data repositories include content from field expeditions, laboratory analyses, journal publications, conference presentations, theses/reports, and funding awards that span scientific studies from marine geology to marine ecosystems and biogeochemistry to paleoclimatology.

  13. UMass at TREC 2002: Cross Language and Novelty Tracks

    DTIC Science & Technology

    2002-01-01

    resources – stemmers, dictionaries , machine translation, and an acronym database. We found that proper names were extremely important in this year’s queries...data by manually annotating 48 additional topics. 1. Cross Language Track We submitted one monolingual run and four cross-language runs. For the... monolingual run, the technology was essentially the same as the system we used for TREC 2001. For the cross-language run, we integrated some new

  14. The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval

    DTIC Science & Technology

    2006-01-01

    The Effect of Bilingual Term List Size on Dictionary -Based Cross-Language Information Retrieval Dina Demner-Fushman Department of Computer Science... dictionary -based Cross-Language Information Retrieval (CLIR), in which the goal is to find documents written in one natural language based on queries that...in which the documents are written. In dictionary -based CLIR techniques, the princi- pal source of translation knowledge is a translation lexicon

  15. Guiding Students to Answers: Query Recommendation

    ERIC Educational Resources Information Center

    Yilmazel, Ozgur

    2011-01-01

    This paper reports on a guided navigation system built on the textbook search engine developed at Anadolu University to support distance education students. The search engine uses Turkish Language specific language processing modules to enable searches over course material presented in Open Education Faculty textbooks. We implemented a guided…

  16. Querying phenotype-genotype relationships on patient datasets using semantic web technology: the example of cerebrotendinous xanthomatosis

    PubMed Central

    2012-01-01

    Background Semantic Web technology can considerably catalyze translational genetics and genomics research in medicine, where the interchange of information between basic research and clinical levels becomes crucial. This exchange involves mapping abstract phenotype descriptions from research resources, such as knowledge databases and catalogs, to unstructured datasets produced through experimental methods and clinical practice. This is especially true for the construction of mutation databases. This paper presents a way of harmonizing abstract phenotype descriptions with patient data from clinical practice, and querying this dataset about relationships between phenotypes and genetic variants, at different levels of abstraction. Methods Due to the current availability of ontological and terminological resources that have already reached some consensus in biomedicine, a reuse-based ontology engineering approach was followed. The proposed approach uses the Ontology Web Language (OWL) to represent the phenotype ontology and the patient model, the Semantic Web Rule Language (SWRL) to bridge the gap between phenotype descriptions and clinical data, and the Semantic Query Web Rule Language (SQWRL) to query relevant phenotype-genotype bidirectional relationships. The work tests the use of semantic web technology in the biomedical research domain named cerebrotendinous xanthomatosis (CTX), using a real dataset and ontologies. Results A framework to query relevant phenotype-genotype bidirectional relationships is provided. Phenotype descriptions and patient data were harmonized by defining 28 Horn-like rules in terms of the OWL concepts. In total, 24 patterns of SWQRL queries were designed following the initial list of competency questions. As the approach is based on OWL, the semantic of the framework adapts the standard logical model of an open world assumption. Conclusions This work demonstrates how semantic web technologies can be used to support flexible representation and computational inference mechanisms required to query patient datasets at different levels of abstraction. The open world assumption is especially good for describing only partially known phenotype-genotype relationships, in a way that is easily extensible. In future, this type of approach could offer researchers a valuable resource to infer new data from patient data for statistical analysis in translational research. In conclusion, phenotype description formalization and mapping to clinical data are two key elements for interchanging knowledge between basic and clinical research. PMID:22849591

  17. Constraint-based Data Mining

    NASA Astrophysics Data System (ADS)

    Boulicaut, Jean-Francois; Jeudy, Baptiste

    Knowledge Discovery in Databases (KDD) is a complex interactive process. The promising theoretical framework of inductive databases considers this is essentially a querying process. It is enabled by a query language which can deal either with raw data or patterns which hold in the data. Mining patterns turns to be the so-called inductive query evaluation process for which constraint-based Data Mining techniques have to be designed. An inductive query specifies declaratively the desired constraints and algorithms are used to compute the patterns satisfying the constraints in the data. We survey important results of this active research domain. This chapter emphasizes a real breakthrough for hard problems concerning local pattern mining under various constraints and it points out the current directions of research as well.

  18. Spatiotemporal conceptual platform for querying archaeological information systems

    NASA Astrophysics Data System (ADS)

    Partsinevelos, Panagiotis; Sartzetaki, Mary; Sarris, Apostolos

    2015-04-01

    Spatial and temporal distribution of archaeological sites has been shown to associate with several attributes including marine, water, mineral and food resources, climate conditions, geomorphological features, etc. In this study, archeological settlement attributes are evaluated under various associations in order to provide a specialized query platform in a geographic information system (GIS). Towards this end, a spatial database is designed to include a series of archaeological findings for a secluded geographic area of Crete in Greece. The key categories of the geodatabase include the archaeological type (palace, burial site, village, etc.), temporal information of the habitation/usage period (pre Minoan, Minoan, Byzantine, etc.), and the extracted geographical attributes of the sites (distance to sea, altitude, resources, etc.). Most of the related spatial attributes are extracted with readily available GIS tools. Additionally, a series of conceptual data attributes are estimated, including: Temporal relation of an era to a future one in terms of alteration of the archaeological type, topologic relations of various types and attributes, spatial proximity relations between various types. These complex spatiotemporal relational measures reveal new attributes towards better understanding of site selection for prehistoric and/or historic cultures, yet their potential combinations can become numerous. Therefore, after the quantification of the above mentioned attributes, they are classified as of their importance for archaeological site location modeling. Under this new classification scheme, the user may select a geographic area of interest and extract only the important attributes for a specific archaeological type. These extracted attributes may then be queried against the entire spatial database and provide a location map of possible new archaeological sites. This novel type of querying is robust since the user does not have to type a standard SQL query but graphically select an area of interest. In addition, according to the application at hand, novel spatiotemporal attributes and relations can be supported, towards the understanding of historical settlement patterns.

  19. Identifying QT prolongation from ECG impressions using a general-purpose Natural Language Processor

    PubMed Central

    Denny, Joshua C.; Miller, Randolph A.; Waitman, Lemuel Russell; Arrieta, Mark; Peterson, Joshua F.

    2009-01-01

    Objective Typically detected via electrocardiograms (ECGs), QT interval prolongation is a known risk factor for sudden cardiac death. Since medications can promote or exacerbate the condition, detection of QT interval prolongation is important for clinical decision support. We investigated the accuracy of natural language processing (NLP) for identifying QT prolongation from cardiologist-generated, free-text ECG impressions compared to corrected QT (QTc) thresholds reported by ECG machines. Methods After integrating negation detection to a locally-developed natural language processor, the KnowledgeMap concept identifier, we evaluated NLP-based detection of QT prolongation compared to the calculated QTc on a set of 44,318 ECGs obtained from hospitalized patients. We also created a string query using regular expressions to identify QT prolongation. We calculated sensitivity and specificity of the methods using manual physician review of the cardiologist-generated reports as the gold standard. To investigate causes of “false positive” calculated QTc, we manually reviewed randomly selected ECGs with a long calculated QTc but no mention of QT prolongation. Separately, we validated the performance of the negation detection algorithm on 5,000 manually-categorized ECG phrases for any medical concept (not limited to QT prolongation) prior to developing the NLP query for QT prolongation. Results The NLP query for QT prolongation correctly identified 2,364 of 2,373 ECGs with QT prolongation with a sensitivity of 0.996 and a positive predictive value of 1.000. There were no false positives. The regular expression query had a sensitivity of 0.999 and positive predictive value of 0.982. In contrast, the positive predictive value of common QTc thresholds derived from ECG machines was 0.07–0.25 with corresponding sensitivities of 0.994–0.046. The negation detection algorithm had a recall of 0.973 and precision of 0.982 for 10,490 concepts found within ECG impressions. Conclusions NLP and regular expression queries of cardiologists’ ECG interpretations can more effectively identify QT prolongation than the automated QTc intervals reported by ECG machines. Future clinical decision support could employ NLP queries to detect QTc prolongation and other reported ECG abnormalities. PMID:18938105

  20. Geospatial Data Management Platform for Urban Groundwater

    NASA Astrophysics Data System (ADS)

    Gaitanaru, D.; Priceputu, A.; Gogu, C. R.

    2012-04-01

    Due to the large amount of civil work projects and research studies, large quantities of geo-data are produced for the urban environments. These data are usually redundant as well as they are spread in different institutions or private companies. Time consuming operations like data processing and information harmonisation represents the main reason to systematically avoid the re-use of data. The urban groundwater data shows the same complex situation. The underground structures (subway lines, deep foundations, underground parkings, and others), the urban facility networks (sewer systems, water supply networks, heating conduits, etc), the drainage systems, the surface water works and many others modify continuously. As consequence, their influence on groundwater changes systematically. However, these activities provide a large quantity of data, aquifers modelling and then behaviour prediction can be done using monitored quantitative and qualitative parameters. Due to the rapid evolution of technology in the past few years, transferring large amounts of information through internet has now become a feasible solution for sharing geoscience data. Furthermore, standard platform-independent means to do this have been developed (specific mark-up languages like: GML, GeoSciML, WaterML, GWML, CityML). They allow easily large geospatial databases updating and sharing through internet, even between different companies or between research centres that do not necessarily use the same database structures. For Bucharest City (Romania) an integrated platform for groundwater geospatial data management is developed under the framework of a national research project - "Sedimentary media modeling platform for groundwater management in urban areas" (SIMPA) financed by the National Authority for Scientific Research of Romania. The platform architecture is based on three components: a geospatial database, a desktop application (a complex set of hydrogeological and geological analysis tools) and a front-end geoportal service. The SIMPA platform makes use of mark-up transfer standards to provide a user-friendly application that can be accessed through internet to query, analyse, and visualise geospatial data related to urban groundwater. The platform holds the information within the local groundwater geospatial databases and the user is able to access this data through a geoportal service. The database architecture allows storing accurate and very detailed geological, hydrogeological, and infrastructure information that can be straightforwardly generalized and further upscaled. The geoportal service offers the possibility of querying a dataset from the spatial database. The query is coded in a standard mark-up language, and sent to the server through a standard Hyper Text Transfer Protocol (http) to be processed by the local application. After the validation of the query, the results are sent back to the user to be displayed by the geoportal application. The main advantage of the SIMPA platform is that it offers to the user the possibility to make a primary multi-criteria query, which results in a smaller set of data to be analysed afterwards. This improves both the transfer process parameters and the user's means of creating the desired query.

  1. The semantic web and computer vision: old AI meets new AI

    NASA Astrophysics Data System (ADS)

    Mundy, J. L.; Dong, Y.; Gilliam, A.; Wagner, R.

    2018-04-01

    There has been vast process in linking semantic information across the billions of web pages through the use of ontologies encoded in the Web Ontology Language (OWL) based on the Resource Description Framework (RDF). A prime example is the Wikipedia where the knowledge contained in its more than four million pages is encoded in an ontological database called DBPedia http://wiki.dbpedia.org/. Web-based query tools can retrieve semantic information from DBPedia encoded in interlinked ontologies that can be accessed using natural language. This paper will show how this vast context can be used to automate the process of querying images and other geospatial data in support of report changes in structures and activities. Computer vision algorithms are selected and provided with context based on natural language requests for monitoring and analysis. The resulting reports provide semantically linked observations from images and 3D surface models.

  2. Exploiting salient semantic analysis for information retrieval

    NASA Astrophysics Data System (ADS)

    Luo, Jing; Meng, Bo; Quan, Changqin; Tu, Xinhui

    2016-11-01

    Recently, many Wikipedia-based methods have been proposed to improve the performance of different natural language processing (NLP) tasks, such as semantic relatedness computation, text classification and information retrieval. Among these methods, salient semantic analysis (SSA) has been proven to be an effective way to generate conceptual representation for words or documents. However, its feasibility and effectiveness in information retrieval is mostly unknown. In this paper, we study how to efficiently use SSA to improve the information retrieval performance, and propose a SSA-based retrieval method under the language model framework. First, SSA model is adopted to build conceptual representations for documents and queries. Then, these conceptual representations and the bag-of-words (BOW) representations can be used in combination to estimate the language models of queries and documents. The proposed method is evaluated on several standard text retrieval conference (TREC) collections. Experiment results on standard TREC collections show the proposed models consistently outperform the existing Wikipedia-based retrieval methods.

  3. Querying Safety Cases

    NASA Technical Reports Server (NTRS)

    Denney, Ewen W.; Naylor, Dwight; Pai, Ganesh

    2014-01-01

    Querying a safety case to show how the various stakeholders' concerns about system safety are addressed has been put forth as one of the benefits of argument-based assurance (in a recent study by the Health Foundation, UK, which reviewed the use of safety cases in safety-critical industries). However, neither the literature nor current practice offer much guidance on querying mechanisms appropriate for, or available within, a safety case paradigm. This paper presents a preliminary approach that uses a formal basis for querying safety cases, specifically Goal Structuring Notation (GSN) argument structures. Our approach semantically enriches GSN arguments with domain-specific metadata that the query language leverages, along with its inherent structure, to produce views. We have implemented the approach in our toolset AdvoCATE, and illustrate it by application to a fragment of the safety argument for an Unmanned Aircraft System (UAS) being developed at NASA Ames. We also discuss the potential practical utility of our query mechanism within the context of the existing framework for UAS safety assurance.

  4. Information Network Model Query Processing

    NASA Astrophysics Data System (ADS)

    Song, Xiaopu

    Information Networking Model (INM) [31] is a novel database model for real world objects and relationships management. It naturally and directly supports various kinds of static and dynamic relationships between objects. In INM, objects are networked through various natural and complex relationships. INM Query Language (INM-QL) [30] is designed to explore such information network, retrieve information about schema, instance, their attributes, relationships, and context-dependent information, and process query results in the user specified form. INM database management system has been implemented using Berkeley DB, and it supports INM-QL. This thesis is mainly focused on the implementation of the subsystem that is able to effectively and efficiently process INM-QL. The subsystem provides a lexical and syntactical analyzer of INM-QL, and it is able to choose appropriate evaluation strategies and index mechanism to process queries in INM-QL without the user's intervention. It also uses intermediate result structure to hold intermediate query result and other helping structures to reduce complexity of query processing.

  5. StreamQRE: Modular Specification and Efficient Evaluation of Quantitative Queries over Streaming Data.

    PubMed

    Mamouras, Konstantinos; Raghothaman, Mukund; Alur, Rajeev; Ives, Zachary G; Khanna, Sanjeev

    2017-06-01

    Real-time decision making in emerging IoT applications typically relies on computing quantitative summaries of large data streams in an efficient and incremental manner. To simplify the task of programming the desired logic, we propose StreamQRE, which provides natural and high-level constructs for processing streaming data. Our language has a novel integration of linguistic constructs from two distinct programming paradigms: streaming extensions of relational query languages and quantitative extensions of regular expressions. The former allows the programmer to employ relational constructs to partition the input data by keys and to integrate data streams from different sources, while the latter can be used to exploit the logical hierarchy in the input stream for modular specifications. We first present the core language with a small set of combinators, formal semantics, and a decidable type system. We then show how to express a number of common patterns with illustrative examples. Our compilation algorithm translates the high-level query into a streaming algorithm with precise complexity bounds on per-item processing time and total memory footprint. We also show how to integrate approximation algorithms into our framework. We report on an implementation in Java, and evaluate it with respect to existing high-performance engines for processing streaming data. Our experimental evaluation shows that (1) StreamQRE allows more natural and succinct specification of queries compared to existing frameworks, (2) the throughput of our implementation is higher than comparable systems (for example, two-to-four times greater than RxJava), and (3) the approximation algorithms supported by our implementation can lead to substantial memory savings.

  6. StreamQRE: Modular Specification and Efficient Evaluation of Quantitative Queries over Streaming Data*

    PubMed Central

    Mamouras, Konstantinos; Raghothaman, Mukund; Alur, Rajeev; Ives, Zachary G.; Khanna, Sanjeev

    2017-01-01

    Real-time decision making in emerging IoT applications typically relies on computing quantitative summaries of large data streams in an efficient and incremental manner. To simplify the task of programming the desired logic, we propose StreamQRE, which provides natural and high-level constructs for processing streaming data. Our language has a novel integration of linguistic constructs from two distinct programming paradigms: streaming extensions of relational query languages and quantitative extensions of regular expressions. The former allows the programmer to employ relational constructs to partition the input data by keys and to integrate data streams from different sources, while the latter can be used to exploit the logical hierarchy in the input stream for modular specifications. We first present the core language with a small set of combinators, formal semantics, and a decidable type system. We then show how to express a number of common patterns with illustrative examples. Our compilation algorithm translates the high-level query into a streaming algorithm with precise complexity bounds on per-item processing time and total memory footprint. We also show how to integrate approximation algorithms into our framework. We report on an implementation in Java, and evaluate it with respect to existing high-performance engines for processing streaming data. Our experimental evaluation shows that (1) StreamQRE allows more natural and succinct specification of queries compared to existing frameworks, (2) the throughput of our implementation is higher than comparable systems (for example, two-to-four times greater than RxJava), and (3) the approximation algorithms supported by our implementation can lead to substantial memory savings. PMID:29151821

  7. Python Winding Itself Around Datacubes: How to Access Massive Multi-Dimensional Arrays in a Pythonic Way

    NASA Astrophysics Data System (ADS)

    Merticariu, Vlad; Misev, Dimitar; Baumann, Peter

    2017-04-01

    While python has developed into the lingua franca in Data Science there is often a paradigm break when accessing specialized tools. In particular for one of the core data categories in science and engineering, massive multi-dimensional arrays, out-of-memory solutions typically employ their own, different models. We discuss this situation on the example of the scalable open-source array engine, rasdaman ("raster data manager") which offers access to and processing of Petascale multi-dimensional arrays through an SQL-style array query language, rasql. Such queries are executed in the server on a storage engine utilizing adaptive array partitioning and based on a processing engine implementing a "tile streaming" paradigm to allow processing of arrays massively larger than server RAM. The rasdaman QL has acted as blueprint for forthcoming ISO Array SQL and the Open Geospatial Consortium (OGC) geo analytics language, Web Coverage Processing Service, adopted in 2008. Not surprisingly, rasdaman is OGC and INSPIRE Reference Implementation for their "Big Earth Data" standards suite. Recently, rasdaman has been augmented with a python interface which allows to transparently interact with the database (credits go to Siddharth Shukla's Master Thesis at Jacobs University). Programmers do not need to know the rasdaman query language, as the operators are silently transformed, through lazy evaluation, into queries. Arrays delivered are likewise automatically transformed into their python representation. In the talk, the rasdaman concept will be illustrated with the help of large-scale real-life examples of operational satellite image and weather data services, and sample python code.

  8. Prolog as a Teaching Tool for Relational Database Interrogation.

    ERIC Educational Resources Information Center

    Collier, P. A.; Samson, W. B.

    1982-01-01

    The use of the Prolog programing language is promoted as the language to use by anyone teaching a course in relational databases. A short introduction to Prolog is followed by a series of examples of queries. Several references are noted for anyone wishing to gain a deeper understanding. (MP)

  9. A Graphical Database Interface for Casual, Naive Users.

    ERIC Educational Resources Information Center

    Burgess, Clifford; Swigger, Kathleen

    1986-01-01

    Describes the design of a database interface for infrequent users of computers which consists of a graphical display of a model of a database and a natural language query language. This interface was designed for and tested with physicians at the University of Texas Health Science Center in Dallas. (LRW)

  10. NLPIR: A Theoretical Framework for Applying Natural Language Processing to Information Retrieval.

    ERIC Educational Resources Information Center

    Zhou, Lina; Zhang, Dongsong

    2003-01-01

    Proposes a theoretical framework called NLPIR that integrates natural language processing (NLP) into information retrieval (IR) based on the assumption that there exists representation distance between queries and documents. Discusses problems in traditional keyword-based IR, including relevance, and describes some existing NLP techniques.…

  11. Issues central to a useful image understanding environment

    NASA Astrophysics Data System (ADS)

    Beveridge, J. Ross; Draper, Bruce A.; Hanson, Allen R.; Riseman, Edward M.

    1992-04-01

    A recent DARPA initiative has sparked interested in software environments for computer vision. The goal is a single environment to support both basic research and technology transfer. This paper lays out six fundamental attributes such a system must possess: (1) support for both C and Lisp, (2) extensibility, (3) data sharing, (4) data query facilities tailored to vision, (5) graphics, and (6) code sharing. The first three attributes fundamentally constrain the system design. Support for both C and Lisp demands some form of database or data-store for passing data between languages. Extensibility demands that system support facilities, such as spatial retrieval of data, be readily extended to new user-defined datatypes. Finally, data sharing demands that data saved by one user, including data of a user-defined type, must be readable by another user.

  12. Content-aware network storage system supporting metadata retrieval

    NASA Astrophysics Data System (ADS)

    Liu, Ke; Qin, Leihua; Zhou, Jingli; Nie, Xuejun

    2008-12-01

    Nowadays, content-based network storage has become the hot research spot of academy and corporation[1]. In order to solve the problem of hit rate decline causing by migration and achieve the content-based query, we exploit a new content-aware storage system which supports metadata retrieval to improve the query performance. Firstly, we extend the SCSI command descriptor block to enable system understand those self-defined query requests. Secondly, the extracted metadata is encoded by extensible markup language to improve the universality. Thirdly, according to the demand of information lifecycle management (ILM), we store those data in different storage level and use corresponding query strategy to retrieval them. Fourthly, as the file content identifier plays an important role in locating data and calculating block correlation, we use it to fetch files and sort query results through friendly user interface. Finally, the experiments indicate that the retrieval strategy and sort algorithm have enhanced the retrieval efficiency and precision.

  13. A general natural-language text processor for clinical radiology.

    PubMed Central

    Friedman, C; Alderson, P O; Austin, J H; Cimino, J J; Johnson, S B

    1994-01-01

    OBJECTIVE: Development of a general natural-language processor that identifies clinical information in narrative reports and maps that information into a structured representation containing clinical terms. DESIGN: The natural-language processor provides three phases of processing, all of which are driven by different knowledge sources. The first phase performs the parsing. It identifies the structure of the text through use of a grammar that defines semantic patterns and a target form. The second phase, regularization, standardizes the terms in the initial target structure via a compositional mapping of multi-word phrases. The third phase, encoding, maps the terms to a controlled vocabulary. Radiology is the test domain for the processor and the target structure is a formal model for representing clinical information in that domain. MEASUREMENTS: The impression sections of 230 radiology reports were encoded by the processor. Results of an automated query of the resultant database for the occurrences of four diseases were compared with the analysis of a panel of three physicians to determine recall and precision. RESULTS: Without training specific to the four diseases, recall and precision of the system (combined effect of the processor and query generator) were 70% and 87%. Training of the query component increased recall to 85% without changing precision. PMID:7719797

  14. KARL: A Knowledge-Assisted Retrieval Language. M.S. Thesis Final Report, 1 Jul. 1985 - 31 Dec. 1987

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Triantafyllopoulos, Spiros

    1985-01-01

    Data classification and storage are tasks typically performed by application specialists. In contrast, information users are primarily non-computer specialists who use information in their decision-making and other activities. Interaction efficiency between such users and the computer is often reduced by machine requirements and resulting user reluctance to use the system. This thesis examines the problems associated with information retrieval for non-computer specialist users, and proposes a method for communicating in restricted English that uses knowledge of the entities involved, relationships between entities, and basic English language syntax and semantics to translate the user requests into formal queries. The proposed method includes an intelligent dictionary, syntax and semantic verifiers, and a formal query generator. In addition, the proposed system has a learning capability that can improve portability and performance. With the increasing demand for efficient human-machine communication, the significance of this thesis becomes apparent. As human resources become more valuable, software systems that will assist in improving the human-machine interface will be needed and research addressing new solutions will be of utmost importance. This thesis presents an initial design and implementation as a foundation for further research and development into the emerging field of natural language database query systems.

  15. Meeting medical terminology needs--the Ontology-Enhanced Medical Concept Mapper.

    PubMed

    Leroy, G; Chen, H

    2001-12-01

    This paper describes the development and testing of the Medical Concept Mapper, a tool designed to facilitate access to online medical information sources by providing users with appropriate medical search terms for their personal queries. Our system is valuable for patients whose knowledge of medical vocabularies is inadequate to find the desired information, and for medical experts who search for information outside their field of expertise. The Medical Concept Mapper maps synonyms and semantically related concepts to a user's query. The system is unique because it integrates our natural language processing tool, i.e., the Arizona (AZ) Noun Phraser, with human-created ontologies, the Unified Medical Language System (UMLS) and WordNet, and our computer generated Concept Space, into one system. Our unique contribution results from combining the UMLS Semantic Net with Concept Space in our deep semantic parsing (DSP) algorithm. This algorithm establishes a medical query context based on the UMLS Semantic Net, which allows Concept Space terms to be filtered so as to isolate related terms relevant to the query. We performed two user studies in which Medical Concept Mapper terms were compared against human experts' terms. We conclude that the AZ Noun Phraser is well suited to extract medical phrases from user queries, that WordNet is not well suited to provide strictly medical synonyms, that the UMLS Metathesaurus is well suited to provide medical synonyms, and that Concept Space is well suited to provide related medical terms, especially when these terms are limited by our DSP algorithm.

  16. Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration

    PubMed Central

    Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun

    2017-01-01

    Linked Data (LD) aims to achieve interconnected data by representing entities using Unified Resource Identifiers (URIs), and sharing information using Resource Description Frameworks (RDFs) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. PMID:27733503

  17. PatternQuery: web application for fast detection of biomacromolecular structural patterns in the entire Protein Data Bank.

    PubMed

    Sehnal, David; Pravda, Lukáš; Svobodová Vařeková, Radka; Ionescu, Crina-Maria; Koča, Jaroslav

    2015-07-01

    Well defined biomacromolecular patterns such as binding sites, catalytic sites, specific protein or nucleic acid sequences, etc. precisely modulate many important biological phenomena. We introduce PatternQuery, a web-based application designed for detection and fast extraction of such patterns. The application uses a unique query language with Python-like syntax to define the patterns that will be extracted from datasets provided by the user, or from the entire Protein Data Bank (PDB). Moreover, the database-wide search can be restricted using a variety of criteria, such as PDB ID, resolution, and organism of origin, to provide only relevant data. The extraction generally takes a few seconds for several hundreds of entries, up to approximately one hour for the whole PDB. The detected patterns are made available for download to enable further processing, as well as presented in a clear tabular and graphical form directly in the browser. The unique design of the language and the provided service could pave the way towards novel PDB-wide analyses, which were either difficult or unfeasible in the past. The application is available free of charge at http://ncbr.muni.cz/PatternQuery. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Designing integrated computational biology pipelines visually.

    PubMed

    Jamil, Hasan M

    2013-01-01

    The long-term cost of developing and maintaining a computational pipeline that depends upon data integration and sophisticated workflow logic is too high to even contemplate "what if" or ad hoc type queries. In this paper, we introduce a novel application building interface for computational biology research, called VizBuilder, by leveraging a recent query language called BioFlow for life sciences databases. Using VizBuilder, it is now possible to develop ad hoc complex computational biology applications at throw away costs. The underlying query language supports data integration and workflow construction almost transparently and fully automatically, using a best effort approach. Users express their application by drawing it with VizBuilder icons and connecting them in a meaningful way. Completed applications are compiled and translated as BioFlow queries for execution by the data management system LifeDB, for which VizBuilder serves as a front end. We discuss VizBuilder features and functionalities in the context of a real life application after we briefly introduce BioFlow. The architecture and design principles of VizBuilder are also discussed. Finally, we outline future extensions of VizBuilder. To our knowledge, VizBuilder is a unique system that allows visually designing computational biology pipelines involving distributed and heterogeneous resources in an ad hoc manner.

  19. A knowledge base browser using hypermedia

    NASA Technical Reports Server (NTRS)

    Pocklington, Tony; Wang, Lui

    1990-01-01

    A hypermedia system is being developed to browse CLIPS (C Language Integrated Production System) knowledge bases. This system will be used to help train flight controllers for the Mission Control Center. Browsing this knowledge base will be accomplished either by having navigating through the various collection nodes that have already been defined, or through the query languages.

  20. A Tutorial in Creating Web-Enabled Databases with Inmagic DB/TextWorks through ODBC.

    ERIC Educational Resources Information Center

    Breeding, Marshall

    2000-01-01

    Explains how to create Web-enabled databases. Highlights include Inmagic's DB/Text WebPublisher product called DB/TextWorks; ODBC (Open Database Connectivity) drivers; Perl programming language; HTML coding; Structured Query Language (SQL); Common Gateway Interface (CGI) programming; and examples of HTML pages and Perl scripts. (LRW)

  1. Uptake in Incidental Focus on Form in Meaning-Focused ESL Lessons

    ERIC Educational Resources Information Center

    Loewen, Shawn

    2004-01-01

    Uptake is a term used to describe learners' responses to the provision of feedback after either an erroneous utterance or a query about a linguistic item within the context of meaning-focused language activities. Some researchers argue that uptake may contribute to second language acquisition by facilitating noticing and pushing learners to…

  2. Sense & Meaning: A Second Order Analysis of Language

    ERIC Educational Resources Information Center

    Singh, Amrendra Kumar; Mishra, Nirbhay

    2012-01-01

    What we know through language is whether the way things are or the ways the things are constructed through anthropological tradition and socio cultural shaping. Actually at the very outset, it is not very clear the settling point of this query. However, we can very well understand the point why a critical understanding of…

  3. Student Query Trend Assessment with Semantical Annotation and Artificial Intelligent Multi-Agents

    ERIC Educational Resources Information Center

    Malik, Kaleem Razzaq; Mir, Rizwan Riaz; Farhan, Muhammad; Rafiq, Tariq; Aslam, Muhammad

    2017-01-01

    Research in era of data representation to contribute and improve key data policy involving the assessment of learning, training and English language competency. Students are required to communicate in English with high level impact using language and influence. The electronic technology works to assess students' questions positively enabling…

  4. BioSWR – Semantic Web Services Registry for Bioinformatics

    PubMed Central

    Repchevsky, Dmitry; Gelpi, Josep Ll.

    2014-01-01

    Despite of the variety of available Web services registries specially aimed at Life Sciences, their scope is usually restricted to a limited set of well-defined types of services. While dedicated registries are generally tied to a particular format, general-purpose ones are more adherent to standards and usually rely on Web Service Definition Language (WSDL). Although WSDL is quite flexible to support common Web services types, its lack of semantic expressiveness led to various initiatives to describe Web services via ontology languages. Nevertheless, WSDL 2.0 descriptions gained a standard representation based on Web Ontology Language (OWL). BioSWR is a novel Web services registry that provides standard Resource Description Framework (RDF) based Web services descriptions along with the traditional WSDL based ones. The registry provides Web-based interface for Web services registration, querying and annotation, and is also accessible programmatically via Representational State Transfer (REST) API or using a SPARQL Protocol and RDF Query Language. BioSWR server is located at http://inb.bsc.es/BioSWR/and its code is available at https://sourceforge.net/projects/bioswr/under the LGPL license. PMID:25233118

  5. BioSWR--semantic web services registry for bioinformatics.

    PubMed

    Repchevsky, Dmitry; Gelpi, Josep Ll

    2014-01-01

    Despite of the variety of available Web services registries specially aimed at Life Sciences, their scope is usually restricted to a limited set of well-defined types of services. While dedicated registries are generally tied to a particular format, general-purpose ones are more adherent to standards and usually rely on Web Service Definition Language (WSDL). Although WSDL is quite flexible to support common Web services types, its lack of semantic expressiveness led to various initiatives to describe Web services via ontology languages. Nevertheless, WSDL 2.0 descriptions gained a standard representation based on Web Ontology Language (OWL). BioSWR is a novel Web services registry that provides standard Resource Description Framework (RDF) based Web services descriptions along with the traditional WSDL based ones. The registry provides Web-based interface for Web services registration, querying and annotation, and is also accessible programmatically via Representational State Transfer (REST) API or using a SPARQL Protocol and RDF Query Language. BioSWR server is located at http://inb.bsc.es/BioSWR/and its code is available at https://sourceforge.net/projects/bioswr/under the LGPL license.

  6. A Query Language for Handling Big Observation Data Sets in the Sensor Web

    NASA Astrophysics Data System (ADS)

    Autermann, Christian; Stasch, Christoph; Jirka, Simon; Koppe, Roland

    2017-04-01

    The Sensor Web provides a framework for the standardized Web-based sharing of environmental observations and sensor metadata. While the issue of varying data formats and protocols is addressed by these standards, the fast growing size of observational data is imposing new challenges for the application of these standards. Most solutions for handling big observational datasets currently focus on remote sensing applications, while big in-situ datasets relying on vector features still lack a solid approach. Conventional Sensor Web technologies may not be adequate, as the sheer size of the data transmitted and the amount of metadata accumulated may render traditional OGC Sensor Observation Services (SOS) unusable. Besides novel approaches to store and process observation data in place, e.g. by harnessing big data technologies from mainstream IT, the access layer has to be amended to utilize and integrate these large observational data archives into applications and to enable analysis. For this, an extension to the SOS will be discussed that establishes a query language to dynamically process and filter observations at storage level, similar to the OGC Web Coverage Service (WCS) and it's Web Coverage Processing Service (WCPS) extension. This will enable applications to request e.g. spatial or temporal aggregated data sets in a resolution it is able to display or it requires. The approach will be developed and implemented in cooperation with the The Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research whose catalogue of data compromises marine observations of physical, chemical and biological phenomena from a wide variety of sensors, including mobile (like research vessels, aircrafts or underwater vehicles) and stationary (like buoys or research stations). Observations are made with a high temporal resolution and the resulting time series may span multiple decades.

  7. Modeling and interoperability of heterogeneous genomic big data for integrative processing and querying.

    PubMed

    Masseroli, Marco; Kaitoua, Abdulrahman; Pinoli, Pietro; Ceri, Stefano

    2016-12-01

    While a huge amount of (epi)genomic data of multiple types is becoming available by using Next Generation Sequencing (NGS) technologies, the most important emerging problem is the so-called tertiary analysis, concerned with sense making, e.g., discovering how different (epi)genomic regions and their products interact and cooperate with each other. We propose a paradigm shift in tertiary analysis, based on the use of the Genomic Data Model (GDM), a simple data model which links genomic feature data to their associated experimental, biological and clinical metadata. GDM encompasses all the data formats which have been produced for feature extraction from (epi)genomic datasets. We specifically describe the mapping to GDM of SAM (Sequence Alignment/Map), VCF (Variant Call Format), NARROWPEAK (for called peaks produced by NGS ChIP-seq or DNase-seq methods), and BED (Browser Extensible Data) formats, but GDM supports as well all the formats describing experimental datasets (e.g., including copy number variations, DNA somatic mutations, or gene expressions) and annotations (e.g., regarding transcription start sites, genes, enhancers or CpG islands). We downloaded and integrated samples of all the above-mentioned data types and formats from multiple sources. The GDM is able to homogeneously describe semantically heterogeneous data and makes the ground for providing data interoperability, e.g., achieved through the GenoMetric Query Language (GMQL), a high-level, declarative query language for genomic big data. The combined use of the data model and the query language allows comprehensive processing of multiple heterogeneous data, and supports the development of domain-specific data-driven computations and bio-molecular knowledge discovery. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Towards a light-weight query engine for accessing health sensor data in a fall prevention system.

    PubMed

    Kreiner, Karl; Gossy, Christian; Drobics, Mario

    2014-01-01

    Connecting various sensors in sensor networks has become popular during the last decade. An important aspect next to storing and creating data is information access by domain experts, such as researchers, caretakers and physicians. In this work we present the design and prototypic implementation of a light-weight query engine using natural language processing for accessing health-related sensor data in a fall prevention system.

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Madduri, Kamesh; Wu, Kesheng

    The Resource Description Framework (RDF) is a popular data model for representing linked data sets arising from the web, as well as large scienti c data repositories such as UniProt. RDF data intrinsically represents a labeled and directed multi-graph. SPARQL is a query language for RDF that expresses subgraph pattern- nding queries on this implicit multigraph in a SQL- like syntax. SPARQL queries generate complex intermediate join queries; to compute these joins e ciently, we propose a new strategy based on bitmap indexes. We store the RDF data in column-oriented structures as compressed bitmaps along with two dictionaries. This papermore » makes three new contributions. (i) We present an e cient parallel strategy for parsing the raw RDF data, building dictionaries of unique entities, and creating compressed bitmap indexes of the data. (ii) We utilize the constructed bitmap indexes to e ciently answer SPARQL queries, simplifying the join evaluations. (iii) To quantify the performance impact of using bitmap indexes, we compare our approach to the state-of-the-art triple-store RDF-3X. We nd that our bitmap index-based approach to answering queries is up to an order of magnitude faster for a variety of SPARQL queries, on gigascale RDF data sets.« less

  10. Visual graph query formulation and exploration: a new perspective on information retrieval at the edge

    NASA Astrophysics Data System (ADS)

    Kase, Sue E.; Vanni, Michelle; Knight, Joanne A.; Su, Yu; Yan, Xifeng

    2016-05-01

    Within operational environments decisions must be made quickly based on the information available. Identifying an appropriate knowledge base and accurately formulating a search query are critical tasks for decision-making effectiveness in dynamic situations. The spreading of graph data management tools to access large graph databases is a rapidly emerging research area of potential benefit to the intelligence community. A graph representation provides a natural way of modeling data in a wide variety of domains. Graph structures use nodes, edges, and properties to represent and store data. This research investigates the advantages of information search by graph query initiated by the analyst and interactively refined within the contextual dimensions of the answer space toward a solution. The paper introduces SLQ, a user-friendly graph querying system enabling the visual formulation of schemaless and structureless graph queries. SLQ is demonstrated with an intelligence analyst information search scenario focused on identifying individuals responsible for manufacturing a mosquito-hosted deadly virus. The scenario highlights the interactive construction of graph queries without prior training in complex query languages or graph databases, intuitive navigation through the problem space, and visualization of results in graphical format.

  11. Large Survey Database: A Distributed Framework for Storage and Analysis of Large Datasets

    NASA Astrophysics Data System (ADS)

    Juric, Mario

    2011-01-01

    The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching and querying of large survey catalogs (>10^9 rows, >1 TB). The primary driver behind its development is the analysis of Pan-STARRS PS1 data. It is specifically optimized for fast queries and parallel sweeps of positionally and temporally indexed datasets. It transparently scales to more than >10^2 nodes, and can be made to function in "shared nothing" architectures. An LSD database consists of a set of vertically and horizontally partitioned tables, physically stored as compressed HDF5 files. Vertically, we partition the tables into groups of related columns ('column groups'), storing together logically related data (e.g., astrometry, photometry). Horizontally, the tables are partitioned into partially overlapping ``cells'' by position in space (lon, lat) and time (t). This organization allows for fast lookups based on spatial and temporal coordinates, as well as data and task distribution. The design was inspired by the success of Google BigTable (Chang et al., 2006). Our programming model is a pipelined extension of MapReduce (Dean and Ghemawat, 2004). An SQL-like query language is used to access data. For complex tasks, map-reduce ``kernels'' that operate on query results on a per-cell basis can be written, with the framework taking care of scheduling and execution. The combination leverages users' familiarity with SQL, while offering a fully distributed computing environment. LSD adds little overhead compared to direct Python file I/O. In tests, we sweeped through 1.1 Grows of PanSTARRS+SDSS data (220GB) less than 15 minutes on a dual CPU machine. In a cluster environment, we achieved bandwidths of 17Gbits/sec (I/O limited). Based on current experience, we believe LSD should scale to be useful for analysis and storage of LSST-scale datasets. It can be downloaded from http://mwscience.net/lsd.

  12. Comparative study on the customization of natural language interfaces to databases.

    PubMed

    Pazos R, Rodolfo A; Aguirre L, Marco A; González B, Juan J; Martínez F, José A; Pérez O, Joaquín; Verástegui O, Andrés A

    2016-01-01

    In the last decades the popularity of natural language interfaces to databases (NLIDBs) has increased, because in many cases information obtained from them is used for making important business decisions. Unfortunately, the complexity of their customization by database administrators make them difficult to use. In order for a NLIDB to obtain a high percentage of correctly translated queries, it is necessary that it is correctly customized for the database to be queried. In most cases the performance reported in NLIDB literature is the highest possible; i.e., the performance obtained when the interfaces were customized by the implementers. However, for end users it is more important the performance that the interface can yield when the NLIDB is customized by someone different from the implementers. Unfortunately, there exist very few articles that report NLIDB performance when the NLIDBs are not customized by the implementers. This article presents a semantically-enriched data dictionary (which permits solving many of the problems that occur when translating from natural language to SQL) and an experiment in which two groups of undergraduate students customized our NLIDB and English language frontend (ELF), considered one of the best available commercial NLIDBs. The experimental results show that, when customized by the first group, our NLIDB obtained a 44.69 % of correctly answered queries and ELF 11.83 % for the ATIS database, and when customized by the second group, our NLIDB attained 77.05 % and ELF 13.48 %. The performance attained by our NLIDB, when customized by ourselves was 90 %.

  13. Cognitive search model and a new query paradigm

    NASA Astrophysics Data System (ADS)

    Xu, Zhonghui

    2001-06-01

    This paper proposes a cognitive model in which people begin to search pictures by using semantic content and find a right picture by judging whether its visual content is a proper visualization of the semantics desired. It is essential that human search is not just a process of matching computation on visual feature but rather a process of visualization of the semantic content known. For people to search electronic images in the way as they manually do in the model, we suggest that querying be a semantic-driven process like design. A query-by-design paradigm is prosed in the sense that what you design is what you find. Unlike query-by-example, query-by-design allows users to specify the semantic content through an iterative and incremental interaction process so that a retrieval can start with association and identification of the given semantic content and get refined while further visual cues are available. An experimental image retrieval system, Kuafu, has been under development using the query-by-design paradigm and an iconic language is adopted.

  14. Evaluation methodology for query-based scene understanding systems

    NASA Astrophysics Data System (ADS)

    Huster, Todd P.; Ross, Timothy D.; Culbertson, Jared L.

    2015-05-01

    In this paper, we are proposing a method for the principled evaluation of scene understanding systems in a query-based framework. We can think of a query-based scene understanding system as a generalization of typical sensor exploitation systems where instead of performing a narrowly defined task (e.g., detect, track, classify, etc.), the system can perform general user-defined tasks specified in a query language. Examples of this type of system have been developed as part of DARPA's Mathematics of Sensing, Exploitation, and Execution (MSEE) program. There is a body of literature on the evaluation of typical sensor exploitation systems, but the open-ended nature of the query interface introduces new aspects to the evaluation problem that have not been widely considered before. In this paper, we state the evaluation problem and propose an approach to efficiently learn about the quality of the system under test. We consider the objective of the evaluation to be to build a performance model of the system under test, and we rely on the principles of Bayesian experiment design to help construct and select optimal queries for learning about the parameters of that model.

  15. SPANG: a SPARQL client supporting generation and reuse of queries for distributed RDF databases.

    PubMed

    Chiba, Hirokazu; Uchiyama, Ikuo

    2017-02-08

    Toward improved interoperability of distributed biological databases, an increasing number of datasets have been published in the standardized Resource Description Framework (RDF). Although the powerful SPARQL Protocol and RDF Query Language (SPARQL) provides a basis for exploiting RDF databases, writing SPARQL code is burdensome for users including bioinformaticians. Thus, an easy-to-use interface is necessary. We developed SPANG, a SPARQL client that has unique features for querying RDF datasets. SPANG dynamically generates typical SPARQL queries according to specified arguments. It can also call SPARQL template libraries constructed in a local system or published on the Web. Further, it enables combinatorial execution of multiple queries, each with a distinct target database. These features facilitate easy and effective access to RDF datasets and integrative analysis of distributed data. SPANG helps users to exploit RDF datasets by generation and reuse of SPARQL queries through a simple interface. This client will enhance integrative exploitation of biological RDF datasets distributed across the Web. This software package is freely available at http://purl.org/net/spang .

  16. A new relational database structure and online interface for the HITRAN database

    NASA Astrophysics Data System (ADS)

    Hill, Christian; Gordon, Iouli E.; Rothman, Laurence S.; Tennyson, Jonathan

    2013-11-01

    A new format for the HITRAN database is proposed. By storing the line-transition data in a number of linked tables described by a relational database schema, it is possible to overcome the limitations of the existing format, which have become increasingly apparent over the last few years as new and more varied data are being used by radiative-transfer models. Although the database in the new format can be searched using the well-established Structured Query Language (SQL), a web service, HITRANonline, has been deployed to allow users to make most common queries of the database using a graphical user interface in a web page. The advantages of the relational form of the database to ensuring data integrity and consistency are explored, and the compatibility of the online interface with the emerging standards of the Virtual Atomic and Molecular Data Centre (VAMDC) project is discussed. In particular, the ability to access HITRAN data using a standard query language from other websites, command line tools and from within computer programs is described.

  17. Do-It-Yourself: A Special Library's Approach to Creating Dynamic Web Pages Using Commercial Off-The-Shelf Applications

    NASA Technical Reports Server (NTRS)

    Steeman, Gerald; Connell, Christopher

    2000-01-01

    Many librarians may feel that dynamic Web pages are out of their reach, financially and technically. Yet we are reminded in library and Web design literature that static home pages are a thing of the past. This paper describes how librarians at the Institute for Defense Analyses (IDA) library developed a database-driven, dynamic intranet site using commercial off-the-shelf applications. Administrative issues include surveying a library users group for interest and needs evaluation; outlining metadata elements; and, committing resources from managing time to populate the database and training in Microsoft FrontPage and Web-to-database design. Technical issues covered include Microsoft Access database fundamentals, lessons learned in the Web-to-database process (including setting up Database Source Names (DSNs), redesigning queries to accommodate the Web interface, and understanding Access 97 query language vs. Standard Query Language (SQL)). This paper also offers tips on editing Active Server Pages (ASP) scripting to create desired results. A how-to annotated resource list closes out the paper.

  18. A geospatial database model for the management of remote sensing datasets at multiple spectral, spatial, and temporal scales

    NASA Astrophysics Data System (ADS)

    Ifimov, Gabriela; Pigeau, Grace; Arroyo-Mora, J. Pablo; Soffer, Raymond; Leblanc, George

    2017-10-01

    In this study the development and implementation of a geospatial database model for the management of multiscale datasets encompassing airborne imagery and associated metadata is presented. To develop the multi-source geospatial database we have used a Relational Database Management System (RDBMS) on a Structure Query Language (SQL) server which was then integrated into ArcGIS and implemented as a geodatabase. The acquired datasets were compiled, standardized, and integrated into the RDBMS, where logical associations between different types of information were linked (e.g. location, date, and instrument). Airborne data, at different processing levels (digital numbers through geocorrected reflectance), were implemented in the geospatial database where the datasets are linked spatially and temporally. An example dataset consisting of airborne hyperspectral imagery, collected for inter and intra-annual vegetation characterization and detection of potential hydrocarbon seepage events over pipeline areas, is presented. Our work provides a model for the management of airborne imagery, which is a challenging aspect of data management in remote sensing, especially when large volumes of data are collected.

  19. Conflict and Accommodation in Classroom Codeswitching in Taiwan

    ERIC Educational Resources Information Center

    Tien, Ching-yi

    2009-01-01

    The concept of "English only" as the best teaching-learning method in English as a foreign language classrooms has been promoted in Taiwan over the last decade. During that time, the concept has been queried and debated. Teachers and learners have come to realise that for beginners and slow language learners, the use of codeswitching in…

  20. SGML and Related Standards: New Directions as the Second Decade Begins.

    ERIC Educational Resources Information Center

    Mason, James David

    1997-01-01

    ISO--International Organization for Standards highlights the activities of WG8 (Working Group 8 of ISO) in the alignment of standards for a common tree model and common query languages. Examines the how Document Style Semantics and Specification Language (DSSSL) and HyTime make documents easier to work with and more powerful in their ability to…

  1. BROWSER: An Automatic Indexing On-Line Text Retrieval System. Annual Progress Report.

    ERIC Educational Resources Information Center

    Williams, J. H., Jr.

    The development and testing of the Browsing On-line With Selective Retrieval (BROWSER) text retrieval system allowing a natural language query statement and providing on-line browsing capabilities through an IBM 2260 display terminal is described. The prototype system contains data bases of 25,000 German language patent abstracts, 9,000 English…

  2. Designing a Syntax-Based Retrieval System for Supporting Language Learning

    ERIC Educational Resources Information Center

    Tsao, Nai-Lung; Kuo, Chin-Hwa; Wible, David; Hung, Tsung-Fu

    2009-01-01

    In this paper, we propose a syntax-based text retrieval system for on-line language learning and use a fast regular expression search engine as its main component. Regular expression searches provide more scalable querying and search results than keyword-based searches. However, without a well-designed index scheme, the execution time of regular…

  3. Graph Mining Meets the Semantic Web

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Sangkeun; Sukumar, Sreenivas R; Lim, Seung-Hwan

    The Resource Description Framework (RDF) and SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible schema-free data interchange on the Semantic Web. Today, data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. We address that need through implementation of three popular iterative Graph Mining algorithms (Triangle count, Connected component analysis, and PageRank). We implement these algorithms as SPARQL queries, wrapped within Python scripts. We evaluatemore » the performance of our implementation on 6 real world data sets and show graph mining algorithms (that have a linear-algebra formulation) can indeed be unleashed on data represented as RDF graphs using the SPARQL query interface.« less

  4. Interaction and Communication of Agents in Networks and Language Complexity Estimates

    NASA Technical Reports Server (NTRS)

    Smid, Jan; Obitko, Marek; Fisher, David; Truszkowski, Walt

    2004-01-01

    Knowledge acquisition and sharing are arguably the most critical activities of communicating agents. We report about our on-going project featuring knowledge acquisition and sharing among communicating agents embedded in a network. The applications we target range from hardware robots to virtual entities such as internet agents. Agent experiments can be simulated using a convenient simulation language. We analyzed the complexity of communicating agent simulations using Java and Easel. Scenarios we have studied are listed below. The communication among agents can range from declarative queries to sub-natural language queries. 1) A set of agents monitoring an object are asked to build activity profiles based on exchanging elementary observations; 2) A set of car drivers form a line, where every car is following its predecessor. An unsafe distance cm create a strong wave in the line. Individual agents are asked to incorporate and apply directions how to avoid the wave. 3) A set of micro-vehicles form a grid and are asked to propagate information and concepts to a central server.

  5. Selected Topics from LVCSR Research for Asian Languages at Tokyo Tech

    NASA Astrophysics Data System (ADS)

    Furui, Sadaoki

    This paper presents our recent work in regard to building Large Vocabulary Continuous Speech Recognition (LVCSR) systems for the Thai, Indonesian, and Chinese languages. For Thai, since there is no word boundary in the written form, we have proposed a new method for automatically creating word-like units from a text corpus, and applied topic and speaking style adaptation to the language model to recognize spoken-style utterances. For Indonesian, we have applied proper noun-specific adaptation to acoustic modeling, and rule-based English-to-Indonesian phoneme mapping to solve the problem of large variation in proper noun and English word pronunciation in a spoken-query information retrieval system. In spoken Chinese, long organization names are frequently abbreviated, and abbreviated utterances cannot be recognized if the abbreviations are not included in the dictionary. We have proposed a new method for automatically generating Chinese abbreviations, and by expanding the vocabulary using the generated abbreviations, we have significantly improved the performance of spoken query-based search.

  6. Spatial Language Learning

    ERIC Educational Resources Information Center

    Fu, Zhengling

    2016-01-01

    Spatial language constitutes part of the basic fabric of language. Although languages may have the same number of terms to cover a set of spatial relations, they do not always do so in the same way. Spatial languages differ across languages quite radically, thus providing a real semantic challenge for second language learners. The essay first…

  7. Merged data models for multi-parameterized querying: Spectral data base meets GIS-based map archive

    NASA Astrophysics Data System (ADS)

    Naß, A.; D'Amore, M.; Helbert, J.

    2017-09-01

    Current and upcoming planetary missions deliver a huge amount of different data (remote sensing data, in-situ data, and derived products). Within this contribution present how different data (bases) can be managed and merged, to enable multi-parameterized querying based on the constant spatial context.

  8. Usage and administration manual for a geodatabase compendium of water-resources data-Rio Grande Basin from the Rio Arriba-Sandoval County line, New Mexico, to Presidio, Texas, 1889-2009

    USGS Publications Warehouse

    Burley, Thomas E.

    2011-01-01

    The U.S. Geological Survey, in cooperation with the New Mexico Interstate Stream Commission, developed a geodatabase compendium (hereinafter referred to as the 'geodatabase') of available water-resources data for the reach of the Rio Grande from Rio Arriba-Sandoval County line, New Mexico, to Presidio, Texas. Since 1889, a wealth of water-resources data has been collected in the Rio Grande Basin from Rio Arriba-Sandoval County line, New Mexico, to Presidio, Texas, for a variety of purposes. Collecting agencies, researchers, and organizations have included the U.S. Geological Survey, Bureau of Reclamation, International Boundary and Water Commission, State agencies, irrigation districts, municipal water utilities, universities, and other entities. About 1,750 data records were recently (2010) evaluated to enhance their usability by compiling them into a single geospatial relational database (geodatabase). This report is intended as a user's manual and administration guide for the geodatabase. All data available, including water quality, water level, and discharge data (both instantaneous and daily) from January 1, 1889, through December 17, 2009, were compiled for the study area. A flexible and efficient geodatabase design was used, enhancing the ability of the geodatabase to handle data from diverse sources and helping to ensure sustainability of the geodatabase with long-term maintenance. Geodatabase tables include daily data values, site locations and information, sample event information, and parameters, as well as data sources and collecting agencies. The end products of this effort are a comprehensive water-resources geodatabase that enables the visualization of primary sampling sites for surface discharges, groundwater elevations, and water-quality and associated data for the study area. In addition, repeatable data processing scripts, Structured Query Language queries for loading prepared data sources, and a detailed process for refreshing all data in the compendium have been developed. The geodatabase functionality allows users to explore spatial characteristics of the data, conduct spatial analyses, and pose questions to the geodatabase in the form of queries. Users can also customize and extend the geodatabase, combine it with other databases, or use the geodatabase design for other water-resources applications.

  9. Acquaintance: Language-Independent Document Categorization by N-Grams

    DTIC Science & Technology

    1995-11-01

    the topics. A typical topic (number 32) read “Cual es la importancia de las Naciones Unidas (NU) para Mexico?” To overcome this, the topic...from the query rather than adding anything substantive to it. The rendering of the above query became “ importancia de las Naciones Unidas (NU) para...individual tracks will be discussed below, the same software and basic procedure were used in each track. For the work in TREC-4, a generic, unoptimized

  10. Group Centric Information Sharing Using Hierarchical Models

    DTIC Science & Technology

    2011-01-01

    enable people to create data using RDF, build vocabularies using web ontology language (OWL), write rules and query data stores using SPARQL [8...a strict joined and the document was added with a strict add. In order to represent the fact that an action is allowed (or not), we have created a...greatly improve the system’s readiness to handle any number of access decision queries . a. The pair is tested against the gSIS Join and Add semantics

  11. Declarative Programming with Temporal Constraints, in the Language CG.

    PubMed

    Negreanu, Lorina

    2015-01-01

    Specifying and interpreting temporal constraints are key elements of knowledge representation and reasoning, with applications in temporal databases, agent programming, and ambient intelligence. We present and formally characterize the language CG, which tackles this issue. In CG, users are able to develop time-dependent programs, in a flexible and straightforward manner. Such programs can, in turn, be coupled with evolving environments, thus empowering users to control the environment's evolution. CG relies on a structure for storing temporal information, together with a dedicated query mechanism. Hence, we explore the computational complexity of our query satisfaction problem. We discuss previous implementation attempts of CG and introduce a novel prototype which relies on logic programming. Finally, we address the issue of consistency and correctness of CG program execution, using the Event-B modeling approach.

  12. Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration.

    PubMed

    Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun

    2017-01-04

    Linked Data (LD) aims to achieve interconnected data by representing entities using Unified Resource Identifiers (URIs), and sharing information using Resource Description Frameworks (RDFs) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. In-database processing of a large collection of remote sensing data: applications and implementation

    NASA Astrophysics Data System (ADS)

    Kikhtenko, Vladimir; Mamash, Elena; Chubarov, Dmitri; Voronina, Polina

    2016-04-01

    Large archives of remote sensing data are now available to scientists, yet the need to work with individual satellite scenes or product files constrains studies that span a wide temporal range or spatial extent. The resources (storage capacity, computing power and network bandwidth) required for such studies are often beyond the capabilities of individual geoscientists. This problem has been tackled before in remote sensing research and inspired several information systems. Some of them such as NASA Giovanni [1] and Google Earth Engine have already proved their utility for science. Analysis tasks involving large volumes of numerical data are not unique to Earth Sciences. Recent advances in data science are enabled by the development of in-database processing engines that bring processing closer to storage, use declarative query languages to facilitate parallel scalability and provide high-level abstraction of the whole dataset. We build on the idea of bridging the gap between file archives containing remote sensing data and databases by integrating files into relational database as foreign data sources and performing analytical processing inside the database engine. Thereby higher level query language can efficiently address problems of arbitrary size: from accessing the data associated with a specific pixel or a grid cell to complex aggregation over spatial or temporal extents over a large number of individual data files. This approach was implemented using PostgreSQL for a Siberian regional archive of satellite data products holding hundreds of terabytes of measurements from multiple sensors and missions taken over a decade-long span. While preserving the original storage layout and therefore compatibility with existing applications the in-database processing engine provides a toolkit for provisioning remote sensing data in scientific workflows and applications. The use of SQL - a widely used higher level declarative query language - simplifies interoperability between desktop GIS, web applications and geographic web services and interactive scientific applications (MATLAB, IPython). The system is also automatically ingesting direct readout data from meteorological and research satellites in near-real time with distributed acquisition workflows managed by Taverna workflow engine [2]. The system has demonstrated its utility in performing non-trivial analytic processing such as the computation of the Robust Satellite Technique (RST) indices [3]. It had been useful in different tasks such as studying urban heat islands, analyzing patterns in the distribution of wildfire occurrences, detecting phenomena related to seismic and earthquake activity. Initial experience has highlighted several limitations of the proposed approach yet it has demonstrated ability to facilitate the use of large archives of remote sensing data by geoscientists. 1. J.G. Acker, G. Leptoukh, Online analysis enhances use of NASA Earth science data. EOS Trans. AGU, 2007, 88(2), P. 14-17. 2. D. Hull, K. Wolsfencroft, R. Stevens, C. Goble, M.R. Pocock, P. Li and T. Oinn, Taverna: a tool for building and running workflows of services. Nucleic Acids Research. 2006. V. 34. P. W729-W732. 3. V. Tramutoli, G. Di Bello, N. Pergola, S. Piscitelli, Robust satellite techniques for remote sensing of seismically active areas // Annals of Geophysics. 2001. no. 44(2). P. 295-312.

  14. Managing and Querying Image Annotation and Markup in XML.

    PubMed

    Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

    2010-01-01

    Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid.

  15. Managing and Querying Image Annotation and Markup in XML

    PubMed Central

    Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

    2010-01-01

    Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid. PMID:21218167

  16. Automatic Extraction of Destinations, Origins and Route Parts from Human Generated Route Directions

    NASA Astrophysics Data System (ADS)

    Zhang, Xiao; Mitra, Prasenjit; Klippel, Alexander; Maceachren, Alan

    Researchers from the cognitive and spatial sciences are studying text descriptions of movement patterns in order to examine how humans communicate and understand spatial information. In particular, route directions offer a rich source of information on how cognitive systems conceptualize movement patterns by segmenting them into meaningful parts. Route directions are composed using a plethora of cognitive spatial organization principles: changing levels of granularity, hierarchical organization, incorporation of cognitively and perceptually salient elements, and so forth. Identifying such information in text documents automatically is crucial for enabling machine-understanding of human spatial language. The benefits are: a) creating opportunities for large-scale studies of human linguistic behavior; b) extracting and georeferencing salient entities (landmarks) that are used by human route direction providers; c) developing methods to translate route directions to sketches and maps; and d) enabling queries on large corpora of crawled/analyzed movement data. In this paper, we introduce our approach and implementations that bring us closer to the goal of automatically processing linguistic route directions. We report on research directed at one part of the larger problem, that is, extracting the three most critical parts of route directions and movement patterns in general: origin, destination, and route parts. We use machine-learning based algorithms to extract these parts of routes, including, for example, destination names and types. We prove the effectiveness of our approach in several experiments using hand-tagged corpora.

  17. A Response to Jordan's (2004) "Explanatory Adequacy and Theories of Second Language Acquisition"

    ERIC Educational Resources Information Center

    Gregg, Kevin R.

    2005-01-01

    In a recent paper (Jordan, Geoff Jordan takes issue with some of my claims about second language acquisition (SLA) theory. Specifically, he queries the necessity of a property theory, and he finds my discussion of explanation unsatisfactory. In this brief reply, I try to answer his criticisms. In a brief but interesting paper, Geoff Jordan (2004:…

  18. Attention Development in 10-Month-Old Infants Selected by the WILSTAAR Screen for Pre-Language Difficulties

    ERIC Educational Resources Information Center

    St. James-Roberts, Ian; Alston, Enid

    2006-01-01

    Background: WILSTAAR comprises a programme for identifying and treating 8-10-month-old infants who are at risk of language and cognitive difficulties. It has been adopted by health trusts, and included in Sure Start intervention schemes, throughout the UK. This study addresses one of the main queries raised by critics of the programme, by…

  19. English-Chinese Cross-Language IR Using Bilingual Dictionaries

    DTIC Science & Technology

    2006-01-01

    specialized dictionaries together contain about two million entries [6]. 4 Monolingual Experiment The Chinese documents and the Chinese translations of... monolingual performance. The main performance-limiting factor is the limited coverage of the dictionary used in query translation. Some of the key con...English-Chinese Cross-Language IR using Bilingual Dictionaries Aitao Chen , Hailing Jiang , and Fredric Gey School of Information Management

  20. Reactome graph database: Efficient access to complex pathway data

    PubMed Central

    Korninger, Florian; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D’Eustachio, Peter

    2018-01-01

    Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types. PMID:29377902

  1. An approach for heterogeneous and loosely coupled geospatial data distributed computing

    NASA Astrophysics Data System (ADS)

    Chen, Bin; Huang, Fengru; Fang, Yu; Huang, Zhou; Lin, Hui

    2010-07-01

    Most GIS (Geographic Information System) applications tend to have heterogeneous and autonomous geospatial information resources, and the availability of these local resources is unpredictable and dynamic under a distributed computing environment. In order to make use of these local resources together to solve larger geospatial information processing problems that are related to an overall situation, in this paper, with the support of peer-to-peer computing technologies, we propose a geospatial data distributed computing mechanism that involves loosely coupled geospatial resource directories and a term named as Equivalent Distributed Program of global geospatial queries to solve geospatial distributed computing problems under heterogeneous GIS environments. First, a geospatial query process schema for distributed computing as well as a method for equivalent transformation from a global geospatial query to distributed local queries at SQL (Structured Query Language) level to solve the coordinating problem among heterogeneous resources are presented. Second, peer-to-peer technologies are used to maintain a loosely coupled network environment that consists of autonomous geospatial information resources, thus to achieve decentralized and consistent synchronization among global geospatial resource directories, and to carry out distributed transaction management of local queries. Finally, based on the developed prototype system, example applications of simple and complex geospatial data distributed queries are presented to illustrate the procedure of global geospatial information processing.

  2. Reactome graph database: Efficient access to complex pathway data.

    PubMed

    Fabregat, Antonio; Korninger, Florian; Viteri, Guilherme; Sidiropoulos, Konstantinos; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning

    2018-01-01

    Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.

  3. Optimizing Interactive Development of Data-Intensive Applications

    PubMed Central

    Interlandi, Matteo; Tetali, Sai Deep; Gulzar, Muhammad Ali; Noor, Joseph; Condie, Tyson; Kim, Miryung; Millstein, Todd

    2017-01-01

    Modern Data-Intensive Scalable Computing (DISC) systems are designed to process data through batch jobs that execute programs (e.g., queries) compiled from a high-level language. These programs are often developed interactively by posing ad-hoc queries over the base data until a desired result is generated. We observe that there can be significant overlap in the structure of these queries used to derive the final program. Yet, each successive execution of a slightly modified query is performed anew, which can significantly increase the development cycle. Vega is an Apache Spark framework that we have implemented for optimizing a series of similar Spark programs, likely originating from a development or exploratory data analysis session. Spark developers (e.g., data scientists) can leverage Vega to significantly reduce the amount of time it takes to re-execute a modified Spark program, reducing the overall time to market for their Big Data applications. PMID:28405637

  4. Using a data base management system for modelling SSME test history data

    NASA Technical Reports Server (NTRS)

    Abernethy, K.

    1985-01-01

    The usefulness of a data base management system (DBMS) for modelling historical test data for the complete series of static test firings for the Space Shuttle Main Engine (SSME) was assessed. From an analysis of user data base query requirements, it became clear that a relational DMBS which included a relationally complete query language would permit a model satisfying the query requirements. Representative models and sample queries are discussed. A list of environment-particular evaluation criteria for the desired DBMS was constructed; these criteria include requirements in the areas of user-interface complexity, program independence, flexibility, modifiability, and output capability. The evaluation process included the construction of several prototype data bases for user assessement. The systems studied, representing the three major DBMS conceptual models, were: MIRADS, a hierarchical system; DMS-1100, a CODASYL-based network system; ORACLE, a relational system; and DATATRIEVE, a relational-type system.

  5. Code query by example

    NASA Astrophysics Data System (ADS)

    Vaucouleur, Sebastien

    2011-02-01

    We introduce code query by example for customisation of evolvable software products in general and of enterprise resource planning systems (ERPs) in particular. The concept is based on an initial empirical study on practices around ERP systems. We motivate our design choices based on those empirical results, and we show how the proposed solution helps with respect to the infamous upgrade problem: the conflict between the need for customisation and the need for upgrade of ERP systems. We further show how code query by example can be used as a form of lightweight static analysis, to detect automatically potential defects in large software products. Code query by example as a form of lightweight static analysis is particularly interesting in the context of ERP systems: it is often the case that programmers working in this field are not computer science specialists but more of domain experts. Hence, they require a simple language to express custom rules.

  6. A data model and database for high-resolution pathology analytical image informatics.

    PubMed

    Wang, Fusheng; Kong, Jun; Cooper, Lee; Pan, Tony; Kurc, Tahsin; Chen, Wenjin; Sharma, Ashish; Niedermayr, Cristobal; Oh, Tae W; Brat, Daniel; Farris, Alton B; Foran, David J; Saltz, Joel

    2011-01-01

    The systematic analysis of imaged pathology specimens often results in a vast amount of morphological information at both the cellular and sub-cellular scales. While microscopy scanners and computerized analysis are capable of capturing and analyzing data rapidly, microscopy image data remain underutilized in research and clinical settings. One major obstacle which tends to reduce wider adoption of these new technologies throughout the clinical and scientific communities is the challenge of managing, querying, and integrating the vast amounts of data resulting from the analysis of large digital pathology datasets. This paper presents a data model, which addresses these challenges, and demonstrates its implementation in a relational database system. This paper describes a data model, referred to as Pathology Analytic Imaging Standards (PAIS), and a database implementation, which are designed to support the data management and query requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines on whole-slide images and tissue microarrays (TMAs). (1) Development of a data model capable of efficiently representing and storing virtual slide related image, annotation, markup, and feature information. (2) Development of a database, based on the data model, capable of supporting queries for data retrieval based on analysis and image metadata, queries for comparison of results from different analyses, and spatial queries on segmented regions, features, and classified objects. The work described in this paper is motivated by the challenges associated with characterization of micro-scale features for comparative and correlative analyses involving whole-slides tissue images and TMAs. Technologies for digitizing tissues have advanced significantly in the past decade. Slide scanners are capable of producing high-magnification, high-resolution images from whole slides and TMAs within several minutes. Hence, it is becoming increasingly feasible for basic, clinical, and translational research studies to produce thousands of whole-slide images. Systematic analysis of these large datasets requires efficient data management support for representing and indexing results from hundreds of interrelated analyses generating very large volumes of quantifications such as shape and texture and of classifications of the quantified features. We have designed a data model and a database to address the data management requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines. The data model represents virtual slide related image, annotation, markup and feature information. The database supports a wide range of metadata and spatial queries on images, annotations, markups, and features. We currently have three databases running on a Dell PowerEdge T410 server with CentOS 5.5 Linux operating system. The database server is IBM DB2 Enterprise Edition 9.7.2. The set of databases consists of 1) a TMA database containing image analysis results from 4740 cases of breast cancer, with 641 MB storage size; 2) an algorithm validation database, which stores markups and annotations from two segmentation algorithms and two parameter sets on 18 selected slides, with 66 GB storage size; and 3) an in silico brain tumor study database comprising results from 307 TCGA slides, with 365 GB storage size. The latter two databases also contain human-generated annotations and markups for regions and nuclei. Modeling and managing pathology image analysis results in a database provide immediate benefits on the value and usability of data in a research study. The database provides powerful query capabilities, which are otherwise difficult or cumbersome to support by other approaches such as programming languages. Standardized, semantic annotated data representation and interfaces also make it possible to more efficiently share image data and analysis results.

  7. Integrating Syntax, Semantics, and Discourse DARPA Natural Language Understanding Program. Volume 3. Papers

    DTIC Science & Technology

    1989-09-30

    parses, in a second experiment. This procedure used PUNDIT’s Selection Pattern Query and Response ( SPQR ) component JLang19881. We first used SPQR in...messages pattern. SPQR continues the analysis of the ISR. from each domain, and the resulting output is and the parsing of the sentence is allowed to...UNISYS P. 0. Box 517, Paoli, PA 19301 ABSTRACT knowledge. This paper presents SPQR (Selectional Pat- One obvious benefit of acquiring domain- tern Queries

  8. The Design and Implementation of the Ariel Active Database Rule System

    DTIC Science & Technology

    1991-10-01

    but only as a main-memory prototype. The POSTGRES rule system (PRS) [SHP88, SRH90] and the Starburst rule system (SRS) [WCL91, HCL+90] have been...query language of POSTGRES for specifying data definition commands, queries and updates [SRH90]. POSTQUEL commands retrieve, append, delete, and replace...placed on an arbitrary attribute (e.g., one without an index) ( POSTGRES rule system [SHP88, SHP89, SR1I90], HiPAC [C+891, DIPS [SLR89], Alert [SPAM91

  9. Towards ontology-driven navigation of the lipid bibliosphere

    PubMed Central

    Baker, Christopher JO; Kanagasabai, Rajaraman; Ang, Wee Tiong; Veeramani, Anitha; Low, Hong-Sang; Wenk, Markus R

    2008-01-01

    Background The indexing of scientific literature and content is a relevant and contemporary requirement within life science information systems. Navigating information available in legacy formats continues to be a challenge both in enterprise and academic domains. The emergence of semantic web technologies and their fusion with artificial intelligence techniques has provided a new toolkit with which to address these data integration challenges. In the emerging field of lipidomics such navigation challenges are barriers to the translation of scientific results into actionable knowledge, critical to the treatment of diseases such as Alzheimer's syndrome, Mycobacterium infections and cancer. Results We present a literature-driven workflow involving document delivery and natural language processing steps generating tagged sentences containing lipid, protein and disease names, which are instantiated to custom designed lipid ontology. We describe the design challenges in capturing lipid nomenclature, the mandate of the ontology and its role as query model in the navigation of the lipid bibliosphere. We illustrate the extent of the description logic-based A-box query capability provided by the instantiated ontology using a graphical query composer to query sentences describing lipid-protein and lipid-disease correlations. Conclusion As scientists accept the need to readjust the manner in which we search for information and derive knowledge we illustrate a system that can constrain the literature explosion and knowledge navigation problems. Specifically we have focussed on solving this challenge for lipidomics researchers who have to deal with the lack of standardized vocabulary, differing classification schemes, and a wide array of synonyms before being able to derive scientific insights. The use of the OWL-DL variant of the Web Ontology Language (OWL) and description logic reasoning is pivotal in this regard, providing the lipid scientist with advanced query access to the results of text mining algorithms instantiated into the ontology. The visual query paradigm assists in the adoption of this technology. PMID:18315858

  10. Towards ontology-driven navigation of the lipid bibliosphere.

    PubMed

    Baker, Christopher Jo; Kanagasabai, Rajaraman; Ang, Wee Tiong; Veeramani, Anitha; Low, Hong-Sang; Wenk, Markus R

    2008-01-01

    The indexing of scientific literature and content is a relevant and contemporary requirement within life science information systems. Navigating information available in legacy formats continues to be a challenge both in enterprise and academic domains. The emergence of semantic web technologies and their fusion with artificial intelligence techniques has provided a new toolkit with which to address these data integration challenges. In the emerging field of lipidomics such navigation challenges are barriers to the translation of scientific results into actionable knowledge, critical to the treatment of diseases such as Alzheimer's syndrome, Mycobacterium infections and cancer. We present a literature-driven workflow involving document delivery and natural language processing steps generating tagged sentences containing lipid, protein and disease names, which are instantiated to custom designed lipid ontology. We describe the design challenges in capturing lipid nomenclature, the mandate of the ontology and its role as query model in the navigation of the lipid bibliosphere. We illustrate the extent of the description logic-based A-box query capability provided by the instantiated ontology using a graphical query composer to query sentences describing lipid-protein and lipid-disease correlations. As scientists accept the need to readjust the manner in which we search for information and derive knowledge we illustrate a system that can constrain the literature explosion and knowledge navigation problems. Specifically we have focussed on solving this challenge for lipidomics researchers who have to deal with the lack of standardized vocabulary, differing classification schemes, and a wide array of synonyms before being able to derive scientific insights. The use of the OWL-DL variant of the Web Ontology Language (OWL) and description logic reasoning is pivotal in this regard, providing the lipid scientist with advanced query access to the results of text mining algorithms instantiated into the ontology. The visual query paradigm assists in the adoption of this technology.

  11. Addressing the Challenges of Multi-Domain Data Integration with the SemantEco Framework

    NASA Astrophysics Data System (ADS)

    Patton, E. W.; Seyed, P.; McGuinness, D. L.

    2013-12-01

    Data integration across multiple domains will continue to be a challenge with the proliferation of big data in the sciences. Data origination issues and how data are manipulated are critical to enable scientists to understand and consume disparate datasets as research becomes more multidisciplinary. We present the SemantEco framework as an exemplar for designing an integrative portal for data discovery, exploration, and interpretation that uses best practice W3C Recommendations. We use the Resource Description Framework (RDF) with extensible ontologies described in the Web Ontology Language (OWL) to provide graph-based data representation. Furthermore, SemantEco ingests data via the software package csv2rdf4lod, which generates data provenance using the W3C provenance recommendation (PROV). Our presentation will discuss benefits and challenges of semantic integration, their effect on runtime performance, and how the SemantEco framework assisted in identifying performance issues and improved query performance across multiple domains by an order of magnitude. SemantEco benefits from a semantic approach that provides an 'open world', which allows data to incrementally change just as it does in the real world. SemantEco modules may load new ontologies and data using the W3C's SPARQL Protocol and RDF Query Language via HTTP. Modules may also provide user interface elements for applications and query capabilities to support new use cases. Modules can associate with domains, which are first-class objects in SemantEco. This enables SemantEco to perform integration and reasoning both within and across domains on module-provided data. The SemantEco framework has been used to construct a web portal for environmental and ecological data. The portal includes water and air quality data from the U.S. Geological Survey (USGS) and Environmental Protection Agency (EPA) and species observation counts for birds and fish from the Avian Knowledge Network and the Santa Barbara Long Term Ecological Research, respectively. We provide regulation ontologies using OWL2 datatype facets to detect out-of-range measurements for environmental standards set by the EPA, i.a. Users adjust queries using module-defined facets and a map presents the resulting measurement sites. Custom icons identify sites that violate regulations, making them easy to locate. Selecting a site gives the option of charting spatially proximate data from different domains over time. Our portal currently provides 1.6 billion triples of scientific data in RDF. We segment data by ZIP code and reasoning over 2157 measurements with our EPA regulation ontology that contains 131 regulations takes 2.5 seconds on a 2.4 GHz Intel Core 2 Quad with 8 GB of RAM. SemantEco's modular design and reasoning capabilities make it an exemplar for building multidisciplinary data integration tools that provide data access to scientists and the general population alike. Its provenance tracking provides accountability and its reasoning services can assist users in interpreting data. Future work includes support for geographical queries using the Open Geospatial Consortium's GeoSPARQL standard.

  12. A search engine to access PubMed monolingual subsets: proof of concept and evaluation in French.

    PubMed

    Griffon, Nicolas; Schuers, Matthieu; Soualmia, Lina Fatima; Grosjean, Julien; Kerdelhué, Gaétan; Kergourlay, Ivan; Dahamna, Badisse; Darmoni, Stéfan Jacques

    2014-12-01

    PubMed contains numerous articles in languages other than English. However, existing solutions to access these articles in the language in which they were written remain unconvincing. The aim of this study was to propose a practical search engine, called Multilingual PubMed, which will permit access to a PubMed subset in 1 language and to evaluate the precision and coverage for the French version (Multilingual PubMed-French). To create this tool, translations of MeSH were enriched (eg, adding synonyms and translations in French) and integrated into a terminology portal. PubMed subsets in several European languages were also added to our database using a dedicated parser. The response time for the generic semantic search engine was evaluated for simple queries. BabelMeSH, Multilingual PubMed-French, and 3 different PubMed strategies were compared by searching for literature in French. Precision and coverage were measured for 20 randomly selected queries. The results were evaluated as relevant to title and abstract, the evaluator being blind to search strategy. More than 650,000 PubMed citations in French were integrated into the Multilingual PubMed-French information system. The response times were all below the threshold defined for usability (2 seconds). Two search strategies (Multilingual PubMed-French and 1 PubMed strategy) showed high precision (0.93 and 0.97, respectively), but coverage was 4 times higher for Multilingual PubMed-French. It is now possible to freely access biomedical literature using a practical search tool in French. This tool will be of particular interest for health professionals and other end users who do not read or query sufficiently in English. The information system is theoretically well suited to expand the approach to other European languages, such as German, Spanish, Norwegian, and Portuguese.

  13. A Search Engine to Access PubMed Monolingual Subsets: Proof of Concept and Evaluation in French

    PubMed Central

    Schuers, Matthieu; Soualmia, Lina Fatima; Grosjean, Julien; Kerdelhué, Gaétan; Kergourlay, Ivan; Dahamna, Badisse; Darmoni, Stéfan Jacques

    2014-01-01

    Background PubMed contains numerous articles in languages other than English. However, existing solutions to access these articles in the language in which they were written remain unconvincing. Objective The aim of this study was to propose a practical search engine, called Multilingual PubMed, which will permit access to a PubMed subset in 1 language and to evaluate the precision and coverage for the French version (Multilingual PubMed-French). Methods To create this tool, translations of MeSH were enriched (eg, adding synonyms and translations in French) and integrated into a terminology portal. PubMed subsets in several European languages were also added to our database using a dedicated parser. The response time for the generic semantic search engine was evaluated for simple queries. BabelMeSH, Multilingual PubMed-French, and 3 different PubMed strategies were compared by searching for literature in French. Precision and coverage were measured for 20 randomly selected queries. The results were evaluated as relevant to title and abstract, the evaluator being blind to search strategy. Results More than 650,000 PubMed citations in French were integrated into the Multilingual PubMed-French information system. The response times were all below the threshold defined for usability (2 seconds). Two search strategies (Multilingual PubMed-French and 1 PubMed strategy) showed high precision (0.93 and 0.97, respectively), but coverage was 4 times higher for Multilingual PubMed-French. Conclusions It is now possible to freely access biomedical literature using a practical search tool in French. This tool will be of particular interest for health professionals and other end users who do not read or query sufficiently in English. The information system is theoretically well suited to expand the approach to other European languages, such as German, Spanish, Norwegian, and Portuguese. PMID:25448528

  14. ODG: Omics database generator - a tool for generating, querying, and analyzing multi-omics comparative databases to facilitate biological understanding.

    PubMed

    Guhlin, Joseph; Silverstein, Kevin A T; Zhou, Peng; Tiffin, Peter; Young, Nevin D

    2017-08-10

    Rapid generation of omics data in recent years have resulted in vast amounts of disconnected datasets without systemic integration and knowledge building, while individual groups have made customized, annotated datasets available on the web with few ways to link them to in-lab datasets. With so many research groups generating their own data, the ability to relate it to the larger genomic and comparative genomic context is becoming increasingly crucial to make full use of the data. The Omics Database Generator (ODG) allows users to create customized databases that utilize published genomics data integrated with experimental data which can be queried using a flexible graph database. When provided with omics and experimental data, ODG will create a comparative, multi-dimensional graph database. ODG can import definitions and annotations from other sources such as InterProScan, the Gene Ontology, ENZYME, UniPathway, and others. This annotation data can be especially useful for studying new or understudied species for which transcripts have only been predicted, and rapidly give additional layers of annotation to predicted genes. In better studied species, ODG can perform syntenic annotation translations or rapidly identify characteristics of a set of genes or nucleotide locations, such as hits from an association study. ODG provides a web-based user-interface for configuring the data import and for querying the database. Queries can also be run from the command-line and the database can be queried directly through programming language hooks available for most languages. ODG supports most common genomic formats as well as generic, easy to use tab-separated value format for user-provided annotations. ODG is a user-friendly database generation and query tool that adapts to the supplied data to produce a comparative genomic database or multi-layered annotation database. ODG provides rapid comparative genomic annotation and is therefore particularly useful for non-model or understudied species. For species for which more data are available, ODG can be used to conduct complex multi-omics, pattern-matching queries.

  15. Measuring Up: Implementing a Dental Quality Measure in the Electronic Health Record Context

    PubMed Central

    Bhardwaj, Aarti; Ramoni, Rachel; Kalenderian, Elsbeth; Neumann, Ana; Hebballi, Nutan B; White, Joel M; McClellan, Lyle; Walji, Muhammad F

    2015-01-01

    Background Quality improvement requires quality measures that are validly implementable. In this work, we assessed the feasibility and performance of an automated electronic Meaningful Use dental clinical quality measure (percentage of children who received fluoride varnish). Methods We defined how to implement the automated measure queries in a dental electronic health record (EHR). Within records identified through automated query, we manually reviewed a subsample to assess the performance of the query. Results The automated query found 71.0% of patients to have had fluoride varnish compared to 77.6% found using the manual chart review. The automated quality measure performance was 90.5% sensitivity, 90.8% specificity, 96.9% positive predictive value, and 75.2% negative predictive value. Conclusions Our findings support the feasibility of automated dental quality measure queries in the context of sufficient structured data. Information noted only in the free text rather than in structured data would require natural language processing approaches to effectively query. Practical Implications To participate in self-directed quality improvement, dental clinicians must embrace the accountability era. Commitment to quality will require enhanced documentation in order to support near-term automated calculation of quality measures. PMID:26562736

  16. Executing Complexity-Increasing Queries in Relational (MySQL) and NoSQL (MongoDB and EXist) Size-Growing ISO/EN 13606 Standardized EHR Databases

    PubMed Central

    Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario

    2018-01-01

    This research shows a protocol to assess the computational complexity of querying relational and non-relational (NoSQL (not only Structured Query Language)) standardized electronic health record (EHR) medical information database systems (DBMS). It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed a linear behavior in the NoSQL cases. In the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate to maintain standardized medical information systems due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results of improved relational systems such as archetype relational mapping (ARM) with the same data. However, the interpolation of doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems to be solved. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, or edition and visualization, or situations where the aim is not only to query medical information, but also to restore the EHR in exactly its original form. PMID:29608174

  17. Executing Complexity-Increasing Queries in Relational (MySQL) and NoSQL (MongoDB and EXist) Size-Growing ISO/EN 13606 Standardized EHR Databases.

    PubMed

    Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario

    2018-03-19

    This research shows a protocol to assess the computational complexity of querying relational and non-relational (NoSQL (not only Structured Query Language)) standardized electronic health record (EHR) medical information database systems (DBMS). It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed a linear behavior in the NoSQL cases. In the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate to maintain standardized medical information systems due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results of improved relational systems such as archetype relational mapping (ARM) with the same data. However, the interpolation of doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems to be solved. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, or edition and visualization, or situations where the aim is not only to query medical information, but also to restore the EHR in exactly its original form.

  18. Processing SPARQL queries with regular expressions in RDF databases

    PubMed Central

    2011-01-01

    Background As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users’ requests for extracting information from the RDF data as well as the lack of users’ knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. Results In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns. PMID:21489225

  19. Processing SPARQL queries with regular expressions in RDF databases.

    PubMed

    Lee, Jinsoo; Pham, Minh-Duc; Lee, Jihwan; Han, Wook-Shin; Cho, Hune; Yu, Hwanjo; Lee, Jeong-Hoon

    2011-03-29

    As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users' requests for extracting information from the RDF data as well as the lack of users' knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.

  20. Web-based Visualization and Query of semantically segmented multiresolution 3D Models in the Field of Cultural Heritage

    NASA Astrophysics Data System (ADS)

    Auer, M.; Agugiaro, G.; Billen, N.; Loos, L.; Zipf, A.

    2014-05-01

    Many important Cultural Heritage sites have been studied over long periods of time by different means of technical equipment, methods and intentions by different researchers. This has led to huge amounts of heterogeneous "traditional" datasets and formats. The rising popularity of 3D models in the field of Cultural Heritage in recent years has brought additional data formats and makes it even more necessary to find solutions to manage, publish and study these data in an integrated way. The MayaArch3D project aims to realize such an integrative approach by establishing a web-based research platform bringing spatial and non-spatial databases together and providing visualization and analysis tools. Especially the 3D components of the platform use hierarchical segmentation concepts to structure the data and to perform queries on semantic entities. This paper presents a database schema to organize not only segmented models but also different Levels-of-Details and other representations of the same entity. It is further implemented in a spatial database which allows the storing of georeferenced 3D data. This enables organization and queries by semantic, geometric and spatial properties. As service for the delivery of the segmented models a standardization candidate of the OpenGeospatialConsortium (OGC), the Web3DService (W3DS) has been extended to cope with the new database schema and deliver a web friendly format for WebGL rendering. Finally a generic user interface is presented which uses the segments as navigation metaphor to browse and query the semantic segmentation levels and retrieve information from an external database of the German Archaeological Institute (DAI).

  1. EarthServer: Use of Rasdaman as a data store for use in visualisation of complex EO data

    NASA Astrophysics Data System (ADS)

    Clements, Oliver; Walker, Peter; Grant, Mike

    2013-04-01

    The European Commission FP7 project EarthServer is establishing open access and ad-hoc analytics on extreme-size Earth Science data, based on and extending cutting-edge Array Database technology. EarthServer is built around the Rasdaman Raster Data Manager which extends standard relational database systems with the ability to store and retrieve multi-dimensional raster data of unlimited size through an SQL style query language. Rasdaman facilitates visualisation of data by providing several Open Geospatial Consortium (OGC) standard interfaces through its web services wrapper, Petascope. These include the well established standards, Web Coverage Service (WCS) and Web Map Service (WMS) as well as the emerging standard, Web Coverage Processing Service (WCPS). The WCPS standard allows the running of ad-hoc queries on the data stored within Rasdaman, creating an infrastructure where users are not restricted by bandwidth when manipulating or querying huge datasets. Here we will show that the use of EarthServer technologies and infrastructure allows access and visualisation of massive scale data through a web client with only marginal bandwidth use as opposed to the current mechanism of copying huge amounts of data to create visualisations locally. For example if a user wanted to generate a plot of global average chlorophyll for a complete decade time series they would only have to download the result instead of Terabytes of data. Firstly we will present a brief overview of the capabilities of Rasdaman and the WCPS query language to introduce the ways in which it is used in a visualisation tool chain. We will show that there are several ways in which WCPS can be utilised to create both standard and novel web based visualisations. An example of a standard visualisation is the production of traditional 2d plots, allowing users the ability to plot data products easily. However, the query language allows the creation of novel/custom products, which can then immediately be plotted with the same system. For more complex multi-spectral data, WCPS allows the user to explore novel combinations of bands in standard band-ratio algorithms through a web browser with dynamic updating of the resultant image. To visualise very large datasets Rasdaman has the capability to dynamically scale a dataset or query result so that it can be appraised quickly for use in later unscaled queries. All of these techniques are accessible through a web based GIS interface increasing the number of potential users of the system. Lastly we will show the advances in dynamic web based 3D visualisations being explored within the EarthServer project. By utilising the emerging declarative 3D web standard X3DOM as a tool to visualise the results of WCPS queries we introduce several possible benefits, including quick appraisal of data for outliers or anomalous data points and visualisation of the uncertainty of data alongside the actual data values.

  2. Cooperative answers in database systems

    NASA Technical Reports Server (NTRS)

    Gaasterland, Terry; Godfrey, Parke; Minker, Jack; Novik, Lev

    1993-01-01

    A major concern of researchers who seek to improve human-computer communication involves how to move beyond literal interpretations of queries to a level of responsiveness that takes the user's misconceptions, expectations, desires, and interests into consideration. At Maryland, we are investigating how to better meet a user's needs within the framework of the cooperative answering system of Gal and Minker. We have been exploring how to use semantic information about the database to formulate coherent and informative answers. The work has two main thrusts: (1) the construction of a logic formula which embodies the content of a cooperative answer; and (2) the presentation of the logic formula to the user in a natural language form. The information that is available in a deductive database system for building cooperative answers includes integrity constraints, user constraints, the search tree for answers to the query, and false presuppositions that are present in the query. The basic cooperative answering theory of Gal and Minker forms the foundation of a cooperative answering system that integrates the new construction and presentation methods. This paper provides an overview of the cooperative answering strategies used in the CARMIN cooperative answering system, an ongoing research effort at Maryland. Section 2 gives some useful background definitions. Section 3 describes techniques for collecting cooperative logical formulae. Section 4 discusses which natural language generation techniques are useful for presenting the logic formula in natural language text. Section 5 presents a diagram of the system.

  3. ClassLess: A Comprehensive Database of Young Stellar Objects

    NASA Astrophysics Data System (ADS)

    Hillenbrand, Lynne; Baliber, Nairn

    2015-01-01

    We have designed and constructed a database housing published measurements of Young Stellar Objects (YSOs) within ~1 kpc of the Sun. ClassLess, so called because it includes YSOs in all stages of evolution, is a relational database in which user interaction is conducted via HTML web browsers, queries are performed in scientific language, and all data are linked to the sources of publication. Each star is associated with a cluster (or clusters), and both spatially resolved and unresolved measurements are stored, allowing proper use of data from multiple star systems. With this fully searchable tool, myriad ground- and space-based instruments and surveys across wavelength regimes can be exploited. In addition to primary measurements, the database self consistently calculates and serves higher level data products such as extinction, luminosity, and mass. As a result, searches for young stars with specific physical characteristics can be completed with just a few mouse clicks.

  4. Research on sudden environmental pollution public service platform construction based on WebGIS

    NASA Astrophysics Data System (ADS)

    Bi, T. P.; Gao, D. Y.; Zhong, X. Y.

    2016-08-01

    In order to actualize the social sharing and service of the emergency-response information for sudden pollution accidents, the public can share the risk source information service, dangerous goods control technology service and so on, The SQL Server and ArcSDE software are used to establish a spatial database to restore all kinds of information including risk sources, hazardous chemicals and handling methods in case of accidents. Combined with Chinese atmospheric environmental assessment standards, the SCREEN3 atmospheric dispersion model and one-dimensional liquid diffusion model are established to realize the query of related information and the display of the diffusion effect under B/S structure. Based on the WebGIS technology, C#.Net language is used to develop the sudden environmental pollution public service platform. As a result, the public service platform can make risk assessments and provide the best emergency processing services.

  5. A Polyglot Approach to Bioinformatics Data Integration: A Phylogenetic Analysis of HIV-1

    PubMed Central

    Reisman, Steven; Hatzopoulos, Thomas; Läufer, Konstantin; Thiruvathukal, George K.; Putonti, Catherine

    2016-01-01

    As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 sequences. Phylogenetic analyses were conducted for >6,000 HIV-1 sequences revealing spatial and temporal factors influence the evolution of the individual genes uniquely. Nevertheless, signatures of origin can be extrapolated even despite increased globalization. The approach developed here can easily be customized for any species of interest. PMID:26819543

  6. Location Prediction Based on Transition Probability Matrices Constructing from Sequential Rules for Spatial-Temporal K-Anonymity Dataset

    PubMed Central

    Liu, Zhao; Zhu, Yunhong; Wu, Chenxue

    2016-01-01

    Spatial-temporal k-anonymity has become a mainstream approach among techniques for protection of users’ privacy in location-based services (LBS) applications, and has been applied to several variants such as LBS snapshot queries and continuous queries. Analyzing large-scale spatial-temporal anonymity sets may benefit several LBS applications. In this paper, we propose two location prediction methods based on transition probability matrices constructing from sequential rules for spatial-temporal k-anonymity dataset. First, we define single-step sequential rules mined from sequential spatial-temporal k-anonymity datasets generated from continuous LBS queries for multiple users. We then construct transition probability matrices from mined single-step sequential rules, and normalize the transition probabilities in the transition matrices. Next, we regard a mobility model for an LBS requester as a stationary stochastic process and compute the n-step transition probability matrices by raising the normalized transition probability matrices to the power n. Furthermore, we propose two location prediction methods: rough prediction and accurate prediction. The former achieves the probabilities of arriving at target locations along simple paths those include only current locations, target locations and transition steps. By iteratively combining the probabilities for simple paths with n steps and the probabilities for detailed paths with n-1 steps, the latter method calculates transition probabilities for detailed paths with n steps from current locations to target locations. Finally, we conduct extensive experiments, and correctness and flexibility of our proposed algorithm have been verified. PMID:27508502

  7. Approximate Algorithms for Computing Spatial Distance Histograms with Accuracy Guarantees

    PubMed Central

    Grupcev, Vladimir; Yuan, Yongke; Tu, Yi-Cheng; Huang, Jin; Chen, Shaoping; Pandit, Sagar; Weng, Michael

    2014-01-01

    Particle simulation has become an important research tool in many scientific and engineering fields. Data generated by such simulations impose great challenges to database storage and query processing. One of the queries against particle simulation data, the spatial distance histogram (SDH) query, is the building block of many high-level analytics, and requires quadratic time to compute using a straightforward algorithm. Previous work has developed efficient algorithms that compute exact SDHs. While beating the naive solution, such algorithms are still not practical in processing SDH queries against large-scale simulation data. In this paper, we take a different path to tackle this problem by focusing on approximate algorithms with provable error bounds. We first present a solution derived from the aforementioned exact SDH algorithm, and this solution has running time that is unrelated to the system size N. We also develop a mathematical model to analyze the mechanism that leads to errors in the basic approximate algorithm. Our model provides insights on how the algorithm can be improved to achieve higher accuracy and efficiency. Such insights give rise to a new approximate algorithm with improved time/accuracy tradeoff. Experimental results confirm our analysis. PMID:24693210

  8. Multidimensional indexing structure for use with linear optimization queries

    NASA Technical Reports Server (NTRS)

    Bergman, Lawrence David (Inventor); Castelli, Vittorio (Inventor); Chang, Yuan-Chi (Inventor); Li, Chung-Sheng (Inventor); Smith, John Richard (Inventor)

    2002-01-01

    Linear optimization queries, which usually arise in various decision support and resource planning applications, are queries that retrieve top N data records (where N is an integer greater than zero) which satisfy a specific optimization criterion. The optimization criterion is to either maximize or minimize a linear equation. The coefficients of the linear equation are given at query time. Methods and apparatus are disclosed for constructing, maintaining and utilizing a multidimensional indexing structure of database records to improve the execution speed of linear optimization queries. Database records with numerical attributes are organized into a number of layers and each layer represents a geometric structure called convex hull. Such linear optimization queries are processed by searching from the outer-most layer of this multi-layer indexing structure inwards. At least one record per layer will satisfy the query criterion and the number of layers needed to be searched depends on the spatial distribution of records, the query-issued linear coefficients, and N, the number of records to be returned. When N is small compared to the total size of the database, answering the query typically requires searching only a small fraction of all relevant records, resulting in a tremendous speedup as compared to linearly scanning the entire dataset.

  9. Privacy-Preserving Location-Based Query Using Location Indexes and Parallel Searching in Distributed Networks

    PubMed Central

    Liu, Lei; Zhao, Jing

    2014-01-01

    An efficient location-based query algorithm of protecting the privacy of the user in the distributed networks is given. This algorithm utilizes the location indexes of the users and multiple parallel threads to search and select quickly all the candidate anonymous sets with more users and their location information with more uniform distribution to accelerate the execution of the temporal-spatial anonymous operations, and it allows the users to configure their custom-made privacy-preserving location query requests. The simulated experiment results show that the proposed algorithm can offer simultaneously the location query services for more users and improve the performance of the anonymous server and satisfy the anonymous location requests of the users. PMID:24790579

  10. Privacy-preserving location-based query using location indexes and parallel searching in distributed networks.

    PubMed

    Zhong, Cheng; Liu, Lei; Zhao, Jing

    2014-01-01

    An efficient location-based query algorithm of protecting the privacy of the user in the distributed networks is given. This algorithm utilizes the location indexes of the users and multiple parallel threads to search and select quickly all the candidate anonymous sets with more users and their location information with more uniform distribution to accelerate the execution of the temporal-spatial anonymous operations, and it allows the users to configure their custom-made privacy-preserving location query requests. The simulated experiment results show that the proposed algorithm can offer simultaneously the location query services for more users and improve the performance of the anonymous server and satisfy the anonymous location requests of the users.

  11. Toward An Unstructured Mesh Database

    NASA Astrophysics Data System (ADS)

    Rezaei Mahdiraji, Alireza; Baumann, Peter Peter

    2014-05-01

    Unstructured meshes are used in several application domains such as earth sciences (e.g., seismology), medicine, oceanography, cli- mate modeling, GIS as approximate representations of physical objects. Meshes subdivide a domain into smaller geometric elements (called cells) which are glued together by incidence relationships. The subdivision of a domain allows computational manipulation of complicated physical structures. For instance, seismologists model earthquakes using elastic wave propagation solvers on hexahedral meshes. The hexahedral con- tains several hundred millions of grid points and millions of hexahedral cells. Each vertex node in the hexahedrals stores a multitude of data fields. To run simulation on such meshes, one needs to iterate over all the cells, iterate over incident cells to a given cell, retrieve coordinates of cells, assign data values to cells, etc. Although meshes are used in many application domains, to the best of our knowledge there is no database vendor that support unstructured mesh features. Currently, the main tool for querying and manipulating unstructured meshes are mesh libraries, e.g., CGAL and GRAL. Mesh li- braries are dedicated libraries which includes mesh algorithms and can be run on mesh representations. The libraries do not scale with dataset size, do not have declarative query language, and need deep C++ knowledge for query implementations. Furthermore, due to high coupling between the implementations and input file structure, the implementations are less reusable and costly to maintain. A dedicated mesh database offers the following advantages: 1) declarative querying, 2) ease of maintenance, 3) hiding mesh storage structure from applications, and 4) transparent query optimization. To design a mesh database, the first challenge is to define a suitable generic data model for unstructured meshes. We proposed ImG-Complexes data model as a generic topological mesh data model which extends incidence graph model to multi-incidence relationships. We instrument ImG model with sets of optional and application-specific constraints which can be used to check validity of meshes for a specific class of object such as manifold, pseudo-manifold, and simplicial manifold. We conducted experiments to measure the performance of the graph database solution in processing mesh queries and compare it with GrAL mesh library and PostgreSQL database on synthetic and real mesh datasets. The experiments show that each system perform well on specific types of mesh queries, e.g., graph databases perform well on global path-intensive queries. In the future, we investigate database operations for the ImG model and design a mesh query language.

  12. EmptyHeaded: A Relational Engine for Graph Processing

    PubMed Central

    Aberger, Christopher R.; Tu, Susan; Olukotun, Kunle; Ré, Christopher

    2016-01-01

    There are two types of high-performance graph processing engines: low- and high-level engines. Low-level engines (Galois, PowerGraph, Snap) provide optimized data structures and computation models but require users to write low-level imperative code, hence ensuring that efficiency is the burden of the user. In high-level engines, users write in query languages like datalog (SociaLite) or SQL (Grail). High-level engines are easier to use but are orders of magnitude slower than the low-level graph engines. We present EmptyHeaded, a high-level engine that supports a rich datalog-like query language and achieves performance comparable to that of low-level engines. At the core of EmptyHeaded’s design is a new class of join algorithms that satisfy strong theoretical guarantees but have thus far not achieved performance comparable to that of specialized graph processing engines. To achieve high performance, EmptyHeaded introduces a new join engine architecture, including a novel query optimizer and data layouts that leverage single-instruction multiple data (SIMD) parallelism. With this architecture, EmptyHeaded outperforms high-level approaches by up to three orders of magnitude on graph pattern queries, PageRank, and Single-Source Shortest Paths (SSSP) and is an order of magnitude faster than many low-level baselines. We validate that EmptyHeaded competes with the best-of-breed low-level engine (Galois), achieving comparable performance on PageRank and at most 3× worse performance on SSSP. PMID:28077912

  13. Parents' Spatial Language Mediates a Sex Difference in Preschoolers' Spatial-Language Use.

    PubMed

    Pruden, Shannon M; Levine, Susan C

    2017-11-01

    Do boys produce more terms than girls to describe the spatial world-that is, dimensional adjectives (e.g., big, little, tall, short), shape terms (e.g., circle, square), and words describing spatial features and properties (e.g., bent, curvy, edge)? If a sex difference in children's spatial-language use exists, is it related to the spatial language that parents use when interacting with children? We longitudinally tracked the development of spatial-language production in children between the ages of 14 and 46 months in a diverse sample of 58 parent-child dyads interacting in their homes. Boys produced and heard more of these three categories of spatial words, which we call "what" spatial types (i.e., unique "what" spatial words), but not more of all other word types, than girls. Mediation analysis revealed that sex differences in children's spatial talk at 34 to 46 months of age were fully mediated by parents' earlier spatial-language use, when children were 14 to 26 months old, time points at which there was no sex difference in children's spatial-language use.

  14. Spatial language facilitates spatial cognition: Evidence from children who lack language input

    PubMed Central

    Gentner, Dedre; Özyürek, Asli; Gürcanli, Özge; Goldin-Meadow, Susan

    2013-01-01

    Does spatial language influence how people think about space? To address this question, we observed children who did not know a conventional language, and tested their performance on nonlinguistic spatial tasks. We studied deaf children living in Istanbul whose hearing losses prevented them from acquiring speech and whose hearing parents had not exposed them to sign. Lacking a conventional language, the children used gestures, called homesigns, to communicate. In Study 1, we asked whether homesigners used gesture to convey spatial relations, and found that they did not. In Study 2, we tested a new group of homesigners on a spatial mapping task, and found that they performed significantly worse than hearing Turkish children who were matched to the deaf children on another cognitive task. The absence of spatial language thus went hand-in-hand with poor performance on the nonlinguistic spatial task, pointing to the importance of spatial language in thinking about space. PMID:23542409

  15. Spatial coding-based approach for partitioning big spatial data in Hadoop

    NASA Astrophysics Data System (ADS)

    Yao, Xiaochuang; Mokbel, Mohamed F.; Alarabi, Louai; Eldawy, Ahmed; Yang, Jianyu; Yun, Wenju; Li, Lin; Ye, Sijing; Zhu, Dehai

    2017-09-01

    Spatial data partitioning (SDP) plays a powerful role in distributed storage and parallel computing for spatial data. However, due to skew distribution of spatial data and varying volume of spatial vector objects, it leads to a significant challenge to ensure both optimal performance of spatial operation and data balance in the cluster. To tackle this problem, we proposed a spatial coding-based approach for partitioning big spatial data in Hadoop. This approach, firstly, compressed the whole big spatial data based on spatial coding matrix to create a sensing information set (SIS), including spatial code, size, count and other information. SIS was then employed to build spatial partitioning matrix, which was used to spilt all spatial objects into different partitions in the cluster finally. Based on our approach, the neighbouring spatial objects can be partitioned into the same block. At the same time, it also can minimize the data skew in Hadoop distributed file system (HDFS). The presented approach with a case study in this paper is compared against random sampling based partitioning, with three measurement standards, namely, the spatial index quality, data skew in HDFS, and range query performance. The experimental results show that our method based on spatial coding technique can improve the query performance of big spatial data, as well as the data balance in HDFS. We implemented and deployed this approach in Hadoop, and it is also able to support efficiently any other distributed big spatial data systems.

  16. XML at the ADC: Steps to a Next Generation Data Archive

    NASA Astrophysics Data System (ADS)

    Shaya, E.; Blackwell, J.; Gass, J.; Oliversen, N.; Schneider, G.; Thomas, B.; Cheung, C.; White, R. A.

    1999-05-01

    The eXtensible Markup Language (XML) is a document markup language that allows users to specify their own tags, to create hierarchical structures to qualify their data, and to support automatic checking of documents for structural validity. It is being intensively supported by nearly every major corporate software developer. Under the funds of a NASA AISRP proposal, the Astronomical Data Center (ADC, http://adc.gsfc.nasa.gov) is developing an infrastructure for importation, enhancement, and distribution of data and metadata using XML as the document markup language. We discuss the preliminary Document Type Definition (DTD, at http://adc.gsfc.nasa.gov/xml) which specifies the elements and their attributes in our metadata documents. This attempts to define both the metadata of an astronomical catalog and the `header' information of an astronomical table. In addition, we give an overview of the planned flow of data through automated pipelines from authors and journal presses into our XML archive and retrieval through the web via the XML-QL Query Language and eXtensible Style Language (XSL) scripts. When completed, the catalogs and journal tables at the ADC will be tightly hyperlinked to enhance data discovery. In addition one will be able to search on fragmentary information. For instance, one could query for a table by entering that the second author is so-and-so or that the third author is at such-and-such institution.

  17. Natural language information retrieval in digital libraries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Strzalkowski, T.; Perez-Carballo, J.; Marinescu, M.

    In this paper we report on some recent developments in joint NYU and GE natural language information retrieval system. The main characteristic of this system is the use of advanced natural language processing to enhance the effectiveness of term-based document retrieval. The system is designed around a traditional statistical backbone consisting of the indexer module, which builds inverted index files from pre-processed documents, and a retrieval engine which searches and ranks the documents in response to user queries. Natural language processing is used to (1) preprocess the documents in order to extract content-carrying terms, (2) discover inter-term dependencies and buildmore » a conceptual hierarchy specific to the database domain, and (3) process user`s natural language requests into effective search queries. This system has been used in NIST-sponsored Text Retrieval Conferences (TREC), where we worked with approximately 3.3 GBytes of text articles including material from the Wall Street Journal, the Associated Press newswire, the Federal Register, Ziff Communications`s Computer Library, Department of Energy abstracts, U.S. Patents and the San Jose Mercury News, totaling more than 500 million words of English. The system have been designed to facilitate its scalability to deal with ever increasing amounts of data. In particular, a randomized index-splitting mechanism has been installed which allows the system to create a number of smaller indexes that can be independently and efficiently searched.« less

  18. Query2Question: Translating Visualization Interaction into Natural Language.

    PubMed

    Nafari, Maryam; Weaver, Chris

    2015-06-01

    Richly interactive visualization tools are increasingly popular for data exploration and analysis in a wide variety of domains. Existing systems and techniques for recording provenance of interaction focus either on comprehensive automated recording of low-level interaction events or on idiosyncratic manual transcription of high-level analysis activities. In this paper, we present the architecture and translation design of a query-to-question (Q2Q) system that automatically records user interactions and presents them semantically using natural language (written English). Q2Q takes advantage of domain knowledge and uses natural language generation (NLG) techniques to translate and transcribe a progression of interactive visualization states into a visual log of styled text that complements and effectively extends the functionality of visualization tools. We present Q2Q as a means to support a cross-examination process in which questions rather than interactions are the focus of analytic reasoning and action. We describe the architecture and implementation of the Q2Q system, discuss key design factors and variations that effect question generation, and present several visualizations that incorporate Q2Q for analysis in a variety of knowledge domains.

  19. Internet Distribution of Spacecraft Telemetry Data

    NASA Technical Reports Server (NTRS)

    Specht, Ted; Noble, David

    2006-01-01

    Remote Access Multi-mission Processing and Analysis Ground Environment (RAMPAGE) is a Java-language server computer program that enables near-real-time display of spacecraft telemetry data on any authorized client computer that has access to the Internet and is equipped with Web-browser software. In addition to providing a variety of displays of the latest available telemetry data, RAMPAGE can deliver notification of an alarm by electronic mail. Subscribers can then use RAMPAGE displays to determine the state of the spacecraft and formulate a response to the alarm, if necessary. A user can query spacecraft mission data in either binary or comma-separated-value format by use of a Web form or a Practical Extraction and Reporting Language (PERL) script to automate the query process. RAMPAGE runs on Linux and Solaris server computers in the Ground Data System (GDS) of NASA's Jet Propulsion Laboratory and includes components designed specifically to make it compatible with legacy GDS software. The client/server architecture of RAMPAGE and the use of the Java programming language make it possible to utilize a variety of competitive server and client computers, thereby also helping to minimize costs.

  20. The crustal dynamics intelligent user interface anthology

    NASA Technical Reports Server (NTRS)

    Short, Nicholas M., Jr.; Campbell, William J.; Roelofs, Larry H.; Wattawa, Scott L.

    1987-01-01

    The National Space Science Data Center (NSSDC) has initiated an Intelligent Data Management (IDM) research effort which has, as one of its components, the development of an Intelligent User Interface (IUI). The intent of the IUI is to develop a friendly and intelligent user interface service based on expert systems and natural language processing technologies. The purpose of such a service is to support the large number of potential scientific and engineering users that have need of space and land-related research and technical data, but have little or no experience in query languages or understanding of the information content or architecture of the databases of interest. This document presents the design concepts, development approach and evaluation of the performance of a prototype IUI system for the Crustal Dynamics Project Database, which was developed using a microcomputer-based expert system tool (M. 1), the natural language query processor THEMIS, and the graphics software system GSS. The IUI design is based on a multiple view representation of a database from both the user and database perspective, with intelligent processes to translate between the views.

  1. Design, Development and Utilization Perspectives on Database Management Systems

    ERIC Educational Resources Information Center

    Shneiderman, Ben

    1977-01-01

    This paper reviews the historical development of integrated data base management systems and examines competing approaches. Topics include management and utilization, implementation and design, query languages, security, integrity, privacy and concurrency. (Author/KP)

  2. Use of Statistical Estimators as Virtual Observatory Search ParametersEnabling Access to Solar and Planetary Resources through the Virtual Observatory

    NASA Astrophysics Data System (ADS)

    Merka, J.; Dolan, C. F.

    2015-12-01

    Finding and retrieving space physics data is often a complicated taskeven for publicly available data sets: Thousands of relativelysmall and many large data sets are stored in various formats and, inthe better case, accompanied by at least some documentation. VirtualHeliospheric and Magnetospheric Observatories (VHO and VMO) help researches by creating a single point of uniformdiscovery, access, and use of heliospheric (VHO) and magnetospheric(VMO) data.The VMO and VHO functionality relies on metadata expressed using theSPASE data model. This data model is developed by the SPASE WorkingGroup which is currently the only international group supporting globaldata management for Solar and Space Physics. The two Virtual Observatories(VxOs) have initiated and lead a development of a SPASE-related standardnamed SPASE Query Language for provided a standard way of submittingqueries and receiving results.The VMO and VHO use SPASE and SPASEQL for searches based on various criteria such as, for example, spatial location, time of observation, measurement type, parameter values, etc. The parameter values are represented by their statisticalestimators calculated typically over 10-minute intervals: mean, median, standard deviation, minimum, and maximum. The use of statistical estimatorsenables science driven data queries that simplify and shorten the effort tofind where and/or how often the sought phenomenon is observed, as we will present.

  3. Bilateral parietal contributions to spatial language.

    PubMed

    Conder, Julie; Fridriksson, Julius; Baylis, Gordon C; Smith, Cameron M; Boiteau, Timothy W; Almor, Amit

    2017-01-01

    It is commonly held that language is largely lateralized to the left hemisphere in most individuals, whereas spatial processing is associated with right hemisphere regions. In recent years, a number of neuroimaging studies have yielded conflicting results regarding the role of language and spatial processing areas in processing language about space (e.g., Carpenter, Just, Keller, Eddy, & Thulborn, 1999; Damasio et al., 2001). In the present study, we used sparse scanning event-related functional magnetic resonance imaging (fMRI) to investigate the neural correlates of spatial language, that is; language used to communicate the spatial relationship of one object to another. During scanning, participants listened to sentences about object relationships that were either spatial or non-spatial in nature (color or size relationships). Sentences describing spatial relationships elicited more activation in the superior parietal lobule and precuneus bilaterally in comparison to sentences describing size or color relationships. Activation of the precuneus suggests that spatial sentences elicit spatial-mental imagery, while the activation of the SPL suggests sentences containing spatial language involve integration of two distinct sets of information - linguistic and spatial. Copyright © 2016 Elsevier Inc. All rights reserved.

  4. Encouraging Spatial Talk: Using Children's Museums to Bolster Spatial Reasoning

    ERIC Educational Resources Information Center

    Polinsky, Naomi; Perez, Jasmin; Grehl, Mora; McCrink, Koleen

    2017-01-01

    Longitudinal spatial language intervention studies have shown that greater exposure to spatial language improves children's performance on spatial tasks. Can short naturalistic, spatial language interactions also evoke improved spatial performance? In this study, parents were asked to interact with their child at a block wall exhibit in a…

  5. Exploration as a mediator of the relation between the attainment of motor milestones and the development of spatial cognition and spatial language.

    PubMed

    Oudgenoeg-Paz, Ora; Leseman, Paul P M; Volman, M Chiel J M

    2015-09-01

    The embodied-cognition approach views cognition and language as grounded in daily sensorimotor child-environment interactions. Therefore, the attainment of motor milestones is expected to play a role in cognitive-linguistic development. Early attainment of unsupported sitting and independent walking indeed predict better spatial cognition and language at later ages. However, evidence linking these milestones with the development of spatial language and evidence regarding factors that might mediate this relation are scarce. The current study examined whether exploration of spatial-relational object properties (e.g., the possibility of containing or stacking) and exploration of the space through self-locomotion mediate the effect of, respectively, age of sitting and age of walking on spatial cognition and spatial language. Thus, we hypothesized that an earlier age of sitting and walking predicts, respectively, higher levels of spatial-relational object exploration and exploration through self-locomotion, which in turn, predict better spatial cognition and spatial language at later ages. Fifty-nine Dutch children took part in a longitudinal study. A combination of tests, observations, and parental reports was used to measure motor development, exploratory behavior (age 20 months), spatial memory (age 24 months), spatial processing (age 32 months), and spatial language (age 36 months). Results show that attainment of sitting predicted spatial memory and spatial language, but spatial-relational object exploration did not mediate these effects. Attainment of independent walking predicted spatial processing and spatial language, and exploration through self-locomotion (partially) mediated these relations. These findings extend previous work and provide partial support for the hypotheses about the mediating role of exploration. (c) 2015 APA, all rights reserved).

  6. TOMML: A Rule Language for Structured Data

    NASA Astrophysics Data System (ADS)

    Cirstea, Horatiu; Moreau, Pierre-Etienne; Reilles, Antoine

    We present the TOM language that extends JAVA with the purpose of providing high level constructs inspired by the rewriting community. TOM bridges thus the gap between a general purpose language and high level specifications based on rewriting. This approach was motivated by the promotion of rule based techniques and their integration in large scale applications. Powerful matching capabilities along with a rich strategy language are among TOM's strong features that make it easy to use and competitive with respect to other rule based languages. TOM is thus a natural choice for querying and transforming structured data and in particular XML documents [1]. We present here its main XML oriented features and illustrate its use on several examples.

  7. Conceptual Modeling via Logic Programming

    DTIC Science & Technology

    1990-01-01

    Define User Interface and Query Language L i1W= Ltl k.l 4. Define Procedures for Specifying Output S . Select Logic Programming Language 6. Develop ...baseline s change model. sessions and baselines. It was changed 6. Develop Methodology for C 31 Users. considerably with the advent of the window This...Model Development : Implica- for Conceptual Modeling Via Logic tions for Communications of a Cognitive Programming. Marina del Rey, Calif.: Analysis of

  8. Usability Evaluation of NLP-PIER: A Clinical Document Search Engine for Researchers.

    PubMed

    Hultman, Gretchen; McEwan, Reed; Pakhomov, Serguei; Lindemann, Elizabeth; Skube, Steven; Melton, Genevieve B

    2017-01-01

    NLP-PIER (Natural Language Processing - Patient Information Extraction for Research) is a self-service platform with a search engine for clinical researchers to perform natural language processing (NLP) queries using clinical notes. We conducted user-centered testing of NLP-PIER's usability to inform future design decisions. Quantitative and qualitative data were analyzed. Our findings will be used to improve the usability of NLP-PIER.

  9. Catalogue of HI PArameters (CHIPA)

    NASA Astrophysics Data System (ADS)

    Saponara, J.; Benaglia, P.; Koribalski, B.; Andruchow, I.

    2015-08-01

    The catalogue of HI parameters of galaxies HI (CHIPA) is the natural continuation of the compilation by M.C. Martin in 1998. CHIPA provides the most important parameters of nearby galaxies derived from observations of the neutral Hydrogen line. The catalogue contains information of 1400 galaxies across the sky and different morphological types. Parameters like the optical diameter of the galaxy, the blue magnitude, the distance, morphological type, HI extension are listed among others. Maps of the HI distribution, velocity and velocity dispersion can also be display for some cases. The main objective of this catalogue is to facilitate the bibliographic queries, through searching in a database accessible from the internet that will be available in 2015 (the website is under construction). The database was built using the open source `` mysql (SQL, Structured Query Language, management system relational database) '', while the website was built with ''HTML (Hypertext Markup Language)'' and ''PHP (Hypertext Preprocessor)''.

  10. Usability Evaluation of an Unstructured Clinical Document Query Tool for Researchers.

    PubMed

    Hultman, Gretchen; McEwan, Reed; Pakhomov, Serguei; Lindemann, Elizabeth; Skube, Steven; Melton, Genevieve B

    2018-01-01

    Natural Language Processing - Patient Information Extraction for Researchers (NLP-PIER) was developed for clinical researchers for self-service Natural Language Processing (NLP) queries with clinical notes. This study was to conduct a user-centered analysis with clinical researchers to gain insight into NLP-PIER's usability and to gain an understanding of the needs of clinical researchers when using an application for searching clinical notes. Clinical researcher participants (n=11) completed tasks using the system's two existing search interfaces and completed a set of surveys and an exit interview. Quantitative data including time on task, task completion rate, and survey responses were collected. Interviews were analyzed qualitatively. Survey scores, time on task and task completion proportions varied widely. Qualitative analysis indicated that participants found the system to be useful and usable in specific projects. This study identified several usability challenges and our findings will guide the improvement of NLP-PIER 's interfaces.

  11. Intelligent search in Big Data

    NASA Astrophysics Data System (ADS)

    Birialtsev, E.; Bukharaev, N.; Gusenkov, A.

    2017-10-01

    An approach to data integration, aimed on the ontology-based intelligent search in Big Data, is considered in the case when information objects are represented in the form of relational databases (RDB), structurally marked by their schemes. The source of information for constructing an ontology and, later on, the organization of the search are texts in natural language, treated as semi-structured data. For the RDBs, these are comments on the names of tables and their attributes. Formal definition of RDBs integration model in terms of ontologies is given. Within framework of the model universal RDB representation ontology, oil production subject domain ontology and linguistic thesaurus of subject domain language are built. Technique of automatic SQL queries generation for subject domain specialists is proposed. On the base of it, information system for TATNEFT oil-producing company RDBs was implemented. Exploitation of the system showed good relevance with majority of queries.

  12. Resource Needs and Pedagogical Value of Web Mapping for Spatial Thinking

    ERIC Educational Resources Information Center

    Manson, Steven; Shannon, Jerry; Eria, Sami; Kne, Len; Dyke, Kevin; Nelson, Sara; Batra, Lalit; Bonsal, Dudley; Kernik, Melinda; Immich, Jennifer; Matson, Laura

    2014-01-01

    Web mapping involves publishing and using maps via the Internet, and can range from presenting static maps to offering dynamic data querying and spatial analysis. Web mapping is seen as a promising way to support development of spatial thinking in the classroom but there are unanswered questions about how this promise plays out in reality. This…

  13. Deductive Coordination of Multiple Geospatial Knowledge Sources

    NASA Astrophysics Data System (ADS)

    Waldinger, R.; Reddy, M.; Culy, C.; Hobbs, J.; Jarvis, P.; Dungan, J. L.

    2002-12-01

    Deductive inference is applied to choreograph the cooperation of multiple knowledge sources to respond to geospatial queries. When no one source can provide an answer, the response may be deduced from pieces of the answer provided by many sources. Examples of sources include (1) The Alexandria Digital Library Gazetteer, a repository that gives the locations for almost six million place names, (2) The Cia World Factbook, an online almanac with basic information about more than 200 countries. (3) The SRI TerraVision 3D Terrain Visualization System, which displays a flight-simulator-like interactive display of geographic data held in a database, (4) The NASA GDACC WebGIS client for searching satellite and other geographic data available through OpenGIS Consortium (OGC) Web Map Servers, and (5) The Northern Arizona University Latitude/Longitude Distance Calculator. Queries are phrased in English and are translated into logical theorems by the Gemini Natural Language Parser. The theorems are proved by SNARK, a first-order-logic theorem prover, in the context of an axiomatic geospatial theory. The theory embodies a representational scheme that takes into account the fact that the same place may have many names, and the same name may refer to many places. SNARK has built-in procedures (RCC8 and the Allen calculus, respectively) for reasoning about spatial and temporal concepts. External knowledge sources may be consulted by SNARK as the proof is in progress, so that most knowledge need not be stored axiomatically. The Open Agent Architecture (OAA) facilitates communication between sources that may be implemented on different machines in different computer languages. An answer to the query, in the form of text or an image, is extracted from the proof. Currently, three-dimensional images are displayed by TerraVision but other displays are possible. The combined system is called Geo-Logica. Some example queries that can be handled by Geo-Logica include: (1) show the petrified forests in Oregon north of Portland, (2) show the lake in Argentina with the highest elevation, and (3) Show the IGPB land cover classification, derived using MODIS, of Montana for July, 2000. Use of a theorem prover allows sources to cooperate even if they adapt different notational conventions and representation schemes and have never been designed to work together. New sources can be added without reprogramming the system, by providing axioms that advertise their capabilities. Future directions include entering into a dialogue with the user to clarify ambiguities, elaborate on previous questions, or provide new information necessary to answer the question. In addition, of particular interest is to deal with temporally varying data, with answers displayed as animated images.

  14. Development of a web-based video management and application processing system

    NASA Astrophysics Data System (ADS)

    Chan, Shermann S.; Wu, Yi; Li, Qing; Zhuang, Yueting

    2001-07-01

    How to facilitate efficient video manipulation and access in a web-based environment is becoming a popular trend for video applications. In this paper, we present a web-oriented video management and application processing system, based on our previous work on multimedia database and content-based retrieval. In particular, we extend the VideoMAP architecture with specific web-oriented mechanisms, which include: (1) Concurrency control facilities for the editing of video data among different types of users, such as Video Administrator, Video Producer, Video Editor, and Video Query Client; different users are assigned various priority levels for different operations on the database. (2) Versatile video retrieval mechanism which employs a hybrid approach by integrating a query-based (database) mechanism with content- based retrieval (CBR) functions; its specific language (CAROL/ST with CBR) supports spatio-temporal semantics of video objects, and also offers an improved mechanism to describe visual content of videos by content-based analysis method. (3) Query profiling database which records the `histories' of various clients' query activities; such profiles can be used to provide the default query template when a similar query is encountered by the same kind of users. An experimental prototype system is being developed based on the existing VideoMAP prototype system, using Java and VC++ on the PC platform.

  15. A geo-spatial data management system for potentially active volcanoes—GEOWARN project

    NASA Astrophysics Data System (ADS)

    Gogu, Radu C.; Dietrich, Volker J.; Jenny, Bernhard; Schwandner, Florian M.; Hurni, Lorenz

    2006-02-01

    Integrated studies of active volcanic systems for the purpose of long-term monitoring and forecast and short-term eruption prediction require large numbers of data-sets from various disciplines. A modern database concept has been developed for managing and analyzing multi-disciplinary volcanological data-sets. The GEOWARN project (choosing the "Kos-Yali-Nisyros-Tilos volcanic field, Greece" and the "Campi Flegrei, Italy" as test sites) is oriented toward potentially active volcanoes situated in regions of high geodynamic unrest. This article describes the volcanological database of the spatial and temporal data acquired within the GEOWARN project. As a first step, a spatial database embedded in a Geographic Information System (GIS) environment was created. Digital data of different spatial resolution, and time-series data collected at different intervals or periods, were unified in a common, four-dimensional representation of space and time. The database scheme comprises various information layers containing geographic data (e.g. seafloor and land digital elevation model, satellite imagery, anthropogenic structures, land-use), geophysical data (e.g. from active and passive seismicity, gravity, tomography, SAR interferometry, thermal imagery, differential GPS), geological data (e.g. lithology, structural geology, oceanography), and geochemical data (e.g. from hydrothermal fluid chemistry and diffuse degassing features). As a second step based on the presented database, spatial data analysis has been performed using custom-programmed interfaces that execute query scripts resulting in a graphical visualization of data. These query tools were designed and compiled following scenarios of known "behavior" patterns of dormant volcanoes and first candidate signs of potential unrest. The spatial database and query approach is intended to facilitate scientific research on volcanic processes and phenomena, and volcanic surveillance.

  16. Application based on ArcObject inquiry and Google maps demonstration to real estate database

    NASA Astrophysics Data System (ADS)

    Hwang, JinTsong

    2007-06-01

    Real estate industry in Taiwan has been flourishing in recent years. To acquire various and abundant information of real estate for sale is the same goal for the consumers and the brokerages. Therefore, before looking at the property, it is important to get all pertinent information possible. Not only this beneficial for the real estate agent as they can provide the sellers with the most information, thereby solidifying the interest of the buyer, but may also save time and the cost of manpower were something out of place. Most of the brokerage sites are aware of utilizes Internet as form of media for publicity however; the contents are limited to specific property itself and the functions of query are mostly just provided searching by condition. This paper proposes a query interface on website which gives function of zone query by spatial analysis for non-GIS users, developing a user-friendly interface with ArcObject in VB6, and query by condition. The inquiry results can show on the web page which is embedded functions of Google Maps and the UrMap API on it. In addition, the demonstration of inquiry results will give the multimedia present way which includes hyperlink to Google Earth with surrounding of the property, the Virtual Reality scene of house, panorama of interior of building and so on. Therefore, the website provides extra spatial solution for query and demonstration abundant information of real estate in two-dimensional and three-dimensional types of view.

  17. Modeling and query the uncertainty of network constrained moving objects based on RFID data

    NASA Astrophysics Data System (ADS)

    Han, Liang; Xie, Kunqing; Ma, Xiujun; Song, Guojie

    2007-06-01

    The management of network constrained moving objects is more and more practical, especially in intelligent transportation system. In the past, the location information of moving objects on network is collected by GPS, which cost high and has the problem of frequent update and privacy. The RFID (Radio Frequency IDentification) devices are used more and more widely to collect the location information. They are cheaper and have less update. And they interfere in the privacy less. They detect the id of the object and the time when moving object passed by the node of the network. They don't detect the objects' exact movement in side the edge, which lead to a problem of uncertainty. How to modeling and query the uncertainty of the network constrained moving objects based on RFID data becomes a research issue. In this paper, a model is proposed to describe the uncertainty of network constrained moving objects. A two level index is presented to provide efficient access to the network and the data of movement. The processing of imprecise time-slice query and spatio-temporal range query are studied in this paper. The processing includes four steps: spatial filter, spatial refinement, temporal filter and probability calculation. Finally, some experiments are done based on the simulated data. In the experiments the performance of the index is studied. The precision and recall of the result set are defined. And how the query arguments affect the precision and recall of the result set is also discussed.

  18. Modal Logics with Counting

    NASA Astrophysics Data System (ADS)

    Areces, Carlos; Hoffmann, Guillaume; Denis, Alexandre

    We present a modal language that includes explicit operators to count the number of elements that a model might include in the extension of a formula, and we discuss how this logic has been previously investigated under different guises. We show that the language is related to graded modalities and to hybrid logics. We illustrate a possible application of the language to the treatment of plural objects and queries in natural language. We investigate the expressive power of this logic via bisimulations, discuss the complexity of its satisfiability problem, define a new reasoning task that retrieves the cardinality bound of the extension of a given input formula, and provide an algorithm to solve it.

  19. The relation between working memory and language comprehension in signers and speakers.

    PubMed

    Emmorey, Karen; Giezen, Marcel R; Petrich, Jennifer A F; Spurgeon, Erin; O'Grady Farnady, Lucinda

    2017-06-01

    This study investigated the relation between linguistic and spatial working memory (WM) resources and language comprehension for signed compared to spoken language. Sign languages are both linguistic and visual-spatial, and therefore provide a unique window on modality-specific versus modality-independent contributions of WM resources to language processing. Deaf users of American Sign Language (ASL), hearing monolingual English speakers, and hearing ASL-English bilinguals completed several spatial and linguistic serial recall tasks. Additionally, their comprehension of spatial and non-spatial information in ASL and spoken English narratives was assessed. Results from the linguistic serial recall tasks revealed that the often reported advantage for speakers on linguistic short-term memory tasks does not extend to complex WM tasks with a serial recall component. For English, linguistic WM predicted retention of non-spatial information, and both linguistic and spatial WM predicted retention of spatial information. For ASL, spatial WM predicted retention of spatial (but not non-spatial) information, and linguistic WM did not predict retention of either spatial or non-spatial information. Overall, our findings argue against strong assumptions of independent domain-specific subsystems for the storage and processing of linguistic and spatial information and furthermore suggest a less important role for serial encoding in signed than spoken language comprehension. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. Heterogeneous database integration in biomedicine.

    PubMed

    Sujansky, W

    2001-08-01

    The rapid expansion of biomedical knowledge, reduction in computing costs, and spread of internet access have created an ocean of electronic data. The decentralized nature of our scientific community and healthcare system, however, has resulted in a patchwork of diverse, or heterogeneous, database implementations, making access to and aggregation of data across databases very difficult. The database heterogeneity problem applies equally to clinical data describing individual patients and biological data characterizing our genome. Specifically, databases are highly heterogeneous with respect to the data models they employ, the data schemas they specify, the query languages they support, and the terminologies they recognize. Heterogeneous database systems attempt to unify disparate databases by providing uniform conceptual schemas that resolve representational heterogeneities, and by providing querying capabilities that aggregate and integrate distributed data. Research in this area has applied a variety of database and knowledge-based techniques, including semantic data modeling, ontology definition, query translation, query optimization, and terminology mapping. Existing systems have addressed heterogeneous database integration in the realms of molecular biology, hospital information systems, and application portability.

  1. Children's Spatial Thinking: Does Talk about the Spatial World Matter?

    ERIC Educational Resources Information Center

    Pruden, Shannon M.; Levine, Susan C.; Huttenlocher, Janellen

    2011-01-01

    In this paper we examine the relations between parent spatial language input, children's own production of spatial language, and children's later spatial abilities. Using a longitudinal study design, we coded the use of spatial language (i.e. words describing the spatial features and properties of objects; e.g. big, tall, circle, curvy, edge) from…

  2. EMAP and EMAGE: a framework for understanding spatially organized data.

    PubMed

    Baldock, Richard A; Bard, Jonathan B L; Burger, Albert; Burton, Nicolas; Christiansen, Jeff; Feng, Guanjie; Hill, Bill; Houghton, Derek; Kaufman, Matthew; Rao, Jianguo; Sharpe, James; Ross, Allyson; Stevenson, Peter; Venkataraman, Shanmugasundaram; Waterhouse, Andrew; Yang, Yiya; Davidson, Duncan R

    2003-01-01

    The Edinburgh MouseAtlas Project (EMAP) is a time-series of mouse-embryo volumetric models. The models provide a context-free spatial framework onto which structural interpretations and experimental data can be mapped. This enables collation, comparison, and query of complex spatial patterns with respect to each other and with respect to known or hypothesized structure. The atlas also includes a time-dependent anatomical ontology and mapping between the ontology and the spatial models in the form of delineated anatomical regions or tissues. The models provide a natural, graphical context for browsing and visualizing complex data. The Edinburgh Mouse Atlas Gene-Expression Database (EMAGE) is one of the first applications of the EMAP framework and provides a spatially mapped gene-expression database with associated tools for data mapping, submission, and query. In this article, we describe the underlying principles of the Atlas and the gene-expression database, and provide a practical introduction to the use of the EMAP and EMAGE tools, including use of new techniques for whole body gene-expression data capture and mapping.

  3. A distributed query execution engine of big attributed graphs.

    PubMed

    Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif

    2016-01-01

    A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the graph vertices and edges need to be associated with descriptive attributes. Such type of graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations including reachability, pattern matching and shortest path where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation in addition to the performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine of G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes while the graph data are maintained in a relational store which is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries which are pushed to the underlying relational stores while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and the scalability of DG-SPARQL on querying massive attributed graph datasets in addition to its ability to outperform the performance of Apache Giraph, a popular distributed graph processing system, by orders of magnitudes.

  4. Relational Algebra in Spatial Decision Support Systems Ontologies.

    PubMed

    Diomidous, Marianna; Chardalias, Kostis; Koutonias, Panagiotis; Magnita, Adrianna; Andrianopoulos, Charalampos; Zimeras, Stelios; Mechili, Enkeleint Aggelos

    2017-01-01

    Decision Support Systems (DSS) is a powerful tool, for facilitates researchers to choose the correct decision based on their final results. Especially in medical cases where doctors could use these systems, to overcome the problem with the clinical misunderstanding. Based on these systems, queries must be constructed based on the particular questions that doctors must answer. In this work, combination between questions and queries would be presented via relational algebra.

  5. An Automated Approach to Reasoning Under Multiple Perspectives

    NASA Technical Reports Server (NTRS)

    deBessonet, Cary

    2004-01-01

    This is the final report with emphasis on research during the last term. The context for the research has been the development of an automated reasoning technology for use in SMS (symbolic Manipulation System), a system used to build and query knowledge bases (KBs) using a special knowledge representation language SL (Symbolic Language). SMS interpreters assertive SL input and enters the results as components of its universe. The system operates in two basic models: 1) constructive mode (for building KBs); and 2) query/search mode (for querying KBs). Query satisfaction consists of matching query components with KB components. The system allows "penumbral matches," that is, matches that do not exactly meet the specifications of the query, but which are deemed relevant for the conversational context. If the user wants to know whether SMS has information that holds, say, for "any chow," the scope of relevancy might be set so that the system would respond based on a finding that it has information that holds for "most dogs," although this is not exactly what was called for by the query. The response would be qualified accordingly, as would normally be the case in ordinary human conversation. The general goal of the research was to develop an approach by which assertive content could be interpreted from multiple perspectives so that reasoning operations could be successfully conducted over the results. The interpretation of an SL statement such as, "{person believes [captain (asserted (perhaps)) (astronaut saw (comet (bright)))]}," which in English would amount to asserting something to the effect that, "Some person believes that a captain perhaps asserted that an astronaut saw a bright comet," would require the recognition of multiple perspectives, including some that are: a) epistemically-based (focusing on "believes"); b) assertion-based (focusing on "asserted"); c) perception-based (focusing on "saw"); d) adjectivally-based (focusing on "bight"); and e) modally-based (focusing on "perhaps"). Any conclusion reached under a line of reasoning that employs such an assertion or its associated implications should somehow reflect the employed perspectives. The investigators made significant progress in developing an approach that would enable a system to conduct reasoning operations over assertions of this kind while maintaining consistency in its knowledge bases. Significant accomplishments were made in the areas of: 1) integration and inferencing; 2) generation of perspectives, including wholistic ad composite views; and 3) consistency maintenance.

  6. Spatial-temporal analysis of building surface temperatures in Hung Hom

    NASA Astrophysics Data System (ADS)

    Zeng, Ying; Shen, Yueqian

    2015-12-01

    This thesis presents a study on spatial-temporal analysis of building surface temperatures in Hung Hom. Observations were collected from Aug 2013 to Oct 2013 at a 30-min interval, using iButton sensors (N=20) covering twelve locations in Hung Hom. And thermal images were captured in PolyU from 05 Aug 2013 to 06 Aug 2013. A linear regression model of iButton and thermal records is established to calibrate temperature data. A 3D modeling system is developed based on Visual Studio 2010 development platform, using ArcEngine10.0 component, Microsoft Access 2010 database and C# programming language. The system realizes processing data, spatial analysis, compound query and 3D face temperature rendering and so on. After statistical analyses, building face azimuths are found to have a statistically significant relationship with sun azimuths at peak time. And seasonal building temperature changing also corresponds to the sun angle and sun azimuth variations. Building materials are found to have a significant effect on building surface temperatures. Buildings with lower albedo materials tend to have higher temperatures and larger thermal conductivity material have significant diurnal variations. For the geographical locations, the peripheral faces of campus have higher temperatures than the inner faces during day time and buildings located at the southeast are cooler than the western. Furthermore, human activity is found to have a strong relationship with building surface temperatures through weekday and weekend comparison.

  7. Standard Port-Visit Cost Forecasting Model for U.S. Navy Husbanding Contracts

    DTIC Science & Technology

    2009-12-01

    Protocol (HTTP) server.35 2. MySQL . An open-source database.36 3. PHP . A common scripting language used for Web development.37 E. IMPLEMENTATION OF...Inc. (2009). MySQL Community Server (Version 5.1) [Software]. Available from http://dev.mysql.com/downloads/ 37 The PHP Group (2009). PHP (Version...Logistics Services MySQL My Structured Query Language NAVSUP Navy Supply Systems Command NC Non-Contract Items NPS Naval Postgraduate

  8. Managing Objects in a Relational Framework

    DTIC Science & Technology

    1989-01-01

    Database Week, San Jose CA, May.1983, pp.107-113. [Stonebraker 85] Stonebraker,M. and Rowe,L.: "The Design of POSTGRES " Tech.Report UC Berkeley, Nov...latter is equivalent to the definition of an attribute in a POSTGRES relation using the generic Quel facility. Recently, recursive query languages have...utilize rewrite rules. OSQL [Lynl 88] provides a language for associative access. 2. The POSTGRES model [Sto 86] allows Quel and C-procedures as the

  9. 41. DISCOVERY, SEARCH, AND COMMUNICATION OF TEXTUAL KNOWLEDGE RESOURCES IN DISTRIBUTED SYSTEMS a. Discovering and Utilizing Knowledge Sources for Metasearch Knowledge Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zamora, Antonio

    Advanced Natural Language Processing Tools for Web Information Retrieval, Content Analysis, and Synthesis. The goal of this SBIR was to implement and evaluate several advanced Natural Language Processing (NLP) tools and techniques to enhance the precision and relevance of search results by analyzing and augmenting search queries and by helping to organize the search output obtained from heterogeneous databases and web pages containing textual information of interest to DOE and the scientific-technical user communities in general. The SBIR investigated 1) the incorporation of spelling checkers in search applications, 2) identification of significant phrases and concepts using a combination of linguisticmore » and statistical techniques, and 3) enhancement of the query interface and search retrieval results through the use of semantic resources, such as thesauri. A search program with a flexible query interface was developed to search reference databases with the objective of enhancing search results from web queries or queries of specialized search systems such as DOE's Information Bridge. The DOE ETDE/INIS Joint Thesaurus was processed to create a searchable database. Term frequencies and term co-occurrences were used to enhance the web information retrieval by providing algorithmically-derived objective criteria to organize relevant documents into clusters containing significant terms. A thesaurus provides an authoritative overview and classification of a field of knowledge. By organizing the results of a search using the thesaurus terminology, the output is more meaningful than when the results are just organized based on the terms that co-occur in the retrieved documents, some of which may not be significant. An attempt was made to take advantage of the hierarchy provided by broader and narrower terms, as well as other field-specific information in the thesauri. The search program uses linguistic morphological routines to find relevant entries regardless of whether terms are stored in singular or plural form. Implementation of additional inflectional morphology processes for verbs can enhance retrieval further, but this has to be balanced by the possibility of broadening the results too much. In addition to the DOE energy thesaurus, other sources of specialized organized knowledge such as the Medical Subject Headings (MeSH), the Unified Medical Language System (UMLS), and Wikipedia were investigated. The supporting role of the NLP thesaurus search program was enhanced by incorporating spelling aid and a part-of-speech tagger to cope with misspellings in the queries and to determine the grammatical roles of the query words and identify nouns for special processing. To improve precision, multiple modes of searching were implemented including Boolean operators, and field-specific searches. Programs to convert a thesaurus or reference file into searchable support files can be deployed easily, and the resulting files are immediately searchable to produce relevance-ranked results with builtin spelling aid, morphological processing, and advanced search logic. Demonstration systems were built for several databases, including the DOE energy thesaurus.« less

  10. Advanced SPARQL querying in small molecule databases.

    PubMed

    Galgonek, Jakub; Hurt, Tomáš; Michlíková, Vendula; Onderka, Petr; Schwarz, Jan; Vondrášek, Jiří

    2016-01-01

    In recent years, the Resource Description Framework (RDF) and the SPARQL query language have become more widely used in the area of cheminformatics and bioinformatics databases. These technologies allow better interoperability of various data sources and powerful searching facilities. However, we identified several deficiencies that make usage of such RDF databases restrictive or challenging for common users. We extended a SPARQL engine to be able to use special procedures inside SPARQL queries. This allows the user to work with data that cannot be simply precomputed and thus cannot be directly stored in the database. We designed an algorithm that checks a query against data ontology to identify possible user errors. This greatly improves query debugging. We also introduced an approach to visualize retrieved data in a user-friendly way, based on templates describing visualizations of resource classes. To integrate all of our approaches, we developed a simple web application. Our system was implemented successfully, and we demonstrated its usability on the ChEBI database transformed into RDF form. To demonstrate procedure call functions, we employed compound similarity searching based on OrChem. The application is publicly available at https://bioinfo.uochb.cas.cz/projects/chemRDF.

  11. Spatial Language Facilitates Spatial Cognition: Evidence from Children Who Lack Language Input

    ERIC Educational Resources Information Center

    Gentner, Dedre; Ozyurek, Asli; Gurcanli, Ozge; Goldin-Meadow, Susan

    2013-01-01

    Does spatial language influence how people think about space? To address this question, we observed children who did not know a conventional language, and tested their performance on nonlinguistic spatial tasks. We studied deaf children living in Istanbul whose hearing losses prevented them from acquiring speech and whose hearing parents had not…

  12. Browsing and Visualization of Linked Environmental Data

    NASA Astrophysics Data System (ADS)

    Nikolaou, Charalampos; Kyzirakos, Kostis; Bereta, Konstantina; Dogani, Kallirroi; Koubarakis, Manolis

    2014-05-01

    Linked environmental data has started to appear on the Web as environmental researchers make use of technologies such as ontologies, RDF, and SPARQL. Many of these datasets have an important geospatial and temporal dimension. The same is true also for the Web of data that is being rapidly populated not only with geospatial information, but also with temporal information. As the real-world entities represented in linked geospatial datasets evolve over time, the datasets themselves get updated and both the spatial and the temporal dimension of data become significant for users. For example, in the Earth Observation and Environment domains, data is constantly produced by satellite sensors and is associated with metadata containing, among others, temporal attributes, such as the time that an image was acquired. In addition, the acquisitions are considered to be valid for specific periods of time, for example until they get updated by new acquisitions. Satellite acquisitions might be utilized in applications such as the CORINE Land Cover programme operated by the European Environment Agency that makes available as a cartographic product the land cover of European areas. Periodically CORINE publishes the changes in the land cover of these areas in the form of changesets. Tools for exploiting the abundance of geospatial information have also started to emerge. However, these tools are designed for browsing a single data source, while in addition they cannot represent the temporal dimension. This is for two reasons: a) the lack of an implementation of a data model and a query language with temporal features covering the various semantics associated with the representation of time (e.g., valid and user-defined), and b) the lack of a standard temporal extension of RDF that would allow practitioners to utilize when publishing RDF data. Recently, we presented the temporal features of the data model stRDF, the query language stSPARQL, and their implementation in the geospatial RDF store Strabon (http://www.strabon.di.uoa.gr/) which, apart from querying geospatial information, can also be used to query both the valid time of a triple and user-defined time. With the aim of filling the aforementioned gaps and going beyond data exploration to map creation and sharing, we have designed and developed SexTant (http://sextant.di.uoa.gr/). SexTant can be used to produce thematic maps by layering spatiotemporal information which exists in a number of data sources ranging from standard SPARQL endpoints, to SPARQL endpoints following the standard GeoSPARQL defined by the Open Geospatial Consortium (OGC) for the modelling and querying of geospatial information, and other well-adopted geospatial file formats, such as KML and GeoJSON. In this work, we pick some real use cases from the environment domain to showcase the usefulness of SexTant to the environmental studies of a domain expert by presenting its browsing and visualization capabilities using a number of environmental datasets that we have published as linked data and also other geospatial data sources publicly available on the Web, such as KML files.

  13. Indexing and retrieving point and region objects

    NASA Astrophysics Data System (ADS)

    Ibrahim, Azzam T.; Fotouhi, Farshad A.

    1996-03-01

    R-tree and its variants are examples of spatial data structures for paged-secondary memory. To process a query, these structures require multiple path traversals. In this paper, we present a new image access method, SB+-tree which requires a single path traversal to process a query. Also, SB+-tree will allow commercial databases an access method for spatial objects without a major change, since most commercial databases already support B+-tree as an access method for text data. The SB+-tree can be used for zero and non-zero size data objects. Non-zero size objects are approximated by their minimum bounding rectangles (MBRs). The number of SB+-trees generated is dependent upon the number of dimensions of the approximation of the object. The structure supports efficient spatial operations such as regions-overlap, distance and direction. In this paper, we experimentally and analytically demonstrate the superiority of SB+-tree over R-tree.

  14. Archetype-based data warehouse environment to enable the reuse of electronic health record data.

    PubMed

    Marco-Ruiz, Luis; Moner, David; Maldonado, José A; Kolstrup, Nils; Bellika, Johan G

    2015-09-01

    The reuse of data captured during health care delivery is essential to satisfy the demands of clinical research and clinical decision support systems. A main barrier for the reuse is the existence of legacy formats of data and the high granularity of it when stored in an electronic health record (EHR) system. Thus, we need mechanisms to standardize, aggregate, and query data concealed in the EHRs, to allow their reuse whenever they are needed. To create a data warehouse infrastructure using archetype-based technologies, standards and query languages to enable the interoperability needed for data reuse. The work presented makes use of best of breed archetype-based data transformation and storage technologies to create a workflow for the modeling, extraction, transformation and load of EHR proprietary data into standardized data repositories. We converted legacy data and performed patient-centered aggregations via archetype-based transformations. Later, specific purpose aggregations were performed at a query level for particular use cases. Laboratory test results of a population of 230,000 patients belonging to Troms and Finnmark counties in Norway requested between January 2013 and November 2014 have been standardized. Test records normalization has been performed by defining transformation and aggregation functions between the laboratory records and an archetype. These mappings were used to automatically generate open EHR compliant data. These data were loaded into an archetype-based data warehouse. Once loaded, we defined indicators linked to the data in the warehouse to monitor test activity of Salmonella and Pertussis using the archetype query language. Archetype-based standards and technologies can be used to create a data warehouse environment that enables data from EHR systems to be reused in clinical research and decision support systems. With this approach, existing EHR data becomes available in a standardized and interoperable format, thus opening a world of possibilities toward semantic or concept-based reuse, query and communication of clinical data. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  15. Modeling Spatial Relationships within a Fuzzy Framework.

    ERIC Educational Resources Information Center

    Petry, Frederick E.; Cobb, Maria A.

    1998-01-01

    Presents a model for representing and storing binary topological and directional relationships between 2-dimensional objects that is used to provide a basis for fuzzy querying capabilities. A data structure called an abstract spatial graph (ASG) is defined for the binary relationships that maintains all necessary information regarding topology and…

  16. The EarthServer Geology Service: web coverage services for geosciences

    NASA Astrophysics Data System (ADS)

    Laxton, John; Sen, Marcus; Passmore, James

    2014-05-01

    The EarthServer FP7 project is implementing web coverage services using the OGC WCS and WCPS standards for a range of earth science domains: cryospheric; atmospheric; oceanographic; planetary; and geological. BGS is providing the geological service (http://earthserver.bgs.ac.uk/). Geoscience has used remote sensed data from satellites and planes for some considerable time, but other areas of geosciences are less familiar with the use of coverage data. This is rapidly changing with the development of new sensor networks and the move from geological maps to geological spatial models. The BGS geology service is designed initially to address two coverage data use cases and three levels of data access restriction. Databases of remote sensed data are typically very large and commonly held offline, making it time-consuming for users to assess and then download data. The service is designed to allow the spatial selection, editing and display of Landsat and aerial photographic imagery, including band selection and contrast stretching. This enables users to rapidly view data, assess is usefulness for their purposes, and then enhance and download it if it is suitable. At present the service contains six band Landsat 7 (Blue, Green, Red, NIR 1, NIR 2, MIR) and three band false colour aerial photography (NIR, green, blue), totalling around 1Tb. Increasingly 3D spatial models are being produced in place of traditional geological maps. Models make explicit spatial information implicit on maps and thus are seen as a better way of delivering geosciences information to non-geoscientists. However web delivery of models, including the provision of suitable visualisation clients, has proved more challenging than delivering maps. The EarthServer geology service is delivering 35 surfaces as coverages, comprising the modelled superficial deposits of the Glasgow area. These can be viewed using a 3D web client developed in the EarthServer project by Fraunhofer. As well as remote sensed imagery and 3D models, the geology service is also delivering DTM coverages which can be viewed in the 3D client in conjunction with both imagery and models. The service is accessible through a web GUI which allows the imagery to be viewed against a range of background maps and DTMs, and in the 3D client; spatial selection to be carried out graphically; the results of image enhancement to be displayed; and selected data to be downloaded. The GUI also provides access to the Glasgow model in the 3D client, as well as tutorial material. In the final year of the project it is intended to increase the volume of data to 20Tb and enhance the WCPS processing, including depth and thickness querying of 3D models. We have also investigated the use of GeoSciML, developed to describe and interchange the information on geological maps, to describe model surface coverages. EarthServer is developing a combined WCPS and xQuery query language, and we will investigate applying this to the GeoSciML described surfaces to answer questions such as 'find all units with a predominant sand lithology within 25m of the surface'.

  17. Framing Electronic Medical Records as Polylingual Documents in Query Expansion

    PubMed Central

    Huang, Edward W; Wang, Sheng; Lee, Doris Jung-Lin; Zhang, Runshun; Liu, Baoyan; Zhou, Xuezhong; Zhai, ChengXiang

    2017-01-01

    We present a study of electronic medical record (EMR) retrieval that emulates situations in which a doctor treats a new patient. Given a query consisting of a new patient’s symptoms, the retrieval system returns the set of most relevant records of previously treated patients. However, due to semantic, functional, and treatment synonyms in medical terminology, queries are often incomplete and thus require enhancement. In this paper, we present a topic model that frames symptoms and treatments as separate languages. Our experimental results show that this method improves retrieval performance over several baselines with statistical significance. These baselines include methods used in prior studies as well as state-of-the-art embedding techniques. Finally, we show that our proposed topic model discovers all three types of synonyms to improve medical record retrieval. PMID:29854161

  18. Variation in spatial language and cognition: exploring visuo-spatial thinking and speaking cross-linguistically.

    PubMed

    Soroli, Efstathia

    2012-08-01

    Languages differ strikingly in how they encode spatial information. This variability is realized with spatial semantic elements mapped across languages in very different ways onto lexical/syntactic structures. For example, satellite-framed languages (e.g., English) express MANNER: in the verb and PATH: in satellites, while verb-framed languages (e.g., French) lexicalize PATH: in the verb, leaving MANNER: implicit or peripheral. Some languages are harder to classify into these categories, rather presenting equipollently framed systems, such as Chinese (serial-verb constructions) or Greek (parallel verb- and satellite-framed structures in equally frequent contexts). Such properties seem to have implications not only on the formulation/articulation levels, but also on the conceptualization level, thereby reviving questions concerning the language-thought interface. The present study investigates the relative impact of language-independent and language-specific factors on spatial representations across three typologically different languages (English-French-Greek) combining a variety of complementary tasks (production, non-verbal, and verbal categorization). The findings show that typological properties of languages can have an impact on both linguistic and non-linguistic organization of spatial information, open new perspectives for the investigation of conceptualization, and contribute more generally to the debate concerning the universal and language-specific dimensions of cognition.

  19. An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring.

    PubMed

    Alirezaie, Marjan; Kiselev, Andrey; Längkvist, Martin; Klügl, Franziska; Loutfi, Amy

    2017-11-05

    This paper presents a framework in which satellite images are classified and augmented with additional semantic information to enable queries about what can be found on the map at a particular location, but also about paths that can be taken. This is achieved by a reasoning framework based on qualitative spatial reasoning that is able to find answers to high level queries that may vary on the current situation. This framework called SemCityMap, provides the full pipeline from enriching the raw image data with rudimentary labels to the integration of a knowledge representation and reasoning methods to user interfaces for high level querying. To illustrate the utility of SemCityMap in a disaster scenario, we use an urban environment-central Stockholm-in combination with a flood simulation. We show that the system provides useful answers to high-level queries also with respect to the current flood status. Examples of such queries concern path planning for vehicles or retrieval of safe regions such as "find all regions close to schools and far from the flooded area". The particular advantage of our approach lies in the fact that ontological information and reasoning is explicitly integrated so that queries can be formulated in a natural way using concepts on appropriate level of abstraction, including additional constraints.

  20. An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring

    PubMed Central

    Alirezaie, Marjan; Klügl, Franziska; Loutfi, Amy

    2017-01-01

    This paper presents a framework in which satellite images are classified and augmented with additional semantic information to enable queries about what can be found on the map at a particular location, but also about paths that can be taken. This is achieved by a reasoning framework based on qualitative spatial reasoning that is able to find answers to high level queries that may vary on the current situation. This framework called SemCityMap, provides the full pipeline from enriching the raw image data with rudimentary labels to the integration of a knowledge representation and reasoning methods to user interfaces for high level querying. To illustrate the utility of SemCityMap in a disaster scenario, we use an urban environment—central Stockholm—in combination with a flood simulation. We show that the system provides useful answers to high-level queries also with respect to the current flood status. Examples of such queries concern path planning for vehicles or retrieval of safe regions such as “find all regions close to schools and far from the flooded area”. The particular advantage of our approach lies in the fact that ontological information and reasoning is explicitly integrated so that queries can be formulated in a natural way using concepts on appropriate level of abstraction, including additional constraints. PMID:29113073

  1. Behavioral Issues in the Use of Interactive Systems

    DTIC Science & Technology

    1976-12-14

    communication. American Psychologist, 1971, 26, 949-961. Codd , E. F . Seven steps to rendezvous with the casual user. IBM Research Report, RI 1333. 1974. Conrad...Approved for public releasel distribution unlimited. F LL(I j i’ This ~Research wavs spotdi pr yteEnierons~ao _ 1Repr o seionrin who Zur ichati emte...natural language ( Codd , 1974). Behavioral work has shown that non-programmers could learn to use a laboratory query language in about 3 hours (Thomas

  2. Inferring the Why in Images

    DTIC Science & Technology

    2014-01-01

    model. We combinatorially replaced tokens with words from our vocabulary to score the relationships be- tween concepts. The second-order queries (not...is the action, y3 is an object, and y4 is the scene. Language Potentials: We captialize on state-of-the-art natural language models to score the rela...model estimated on billions of web-pages [4, 10] to form each L(·). Scoring Function: Given the image x, we score a possible labeling configuration y of

  3. The implementation of POSTGRES

    NASA Technical Reports Server (NTRS)

    Stonebraker, Michael; Rowe, Lawrence A.; Hirohama, Michael

    1990-01-01

    The design and implementation decisions made for the three-dimensional data manager POSTGRES are discussed. Attention is restricted to the DBMS backend functions. The POSTGRES data model and query language, the rules system, the storage system, the POSTGRES implementation, and the current status and performance are discussed.

  4. Text Information Extraction System (TIES) | Informatics Technology for Cancer Research (ITCR)

    Cancer.gov

    TIES is a service based software system for acquiring, deidentifying, and processing clinical text reports using natural language processing, and also for querying, sharing and using this data to foster tissue and image based research, within and between institutions.

  5. Remote file inquiry (RFI) system

    NASA Technical Reports Server (NTRS)

    1975-01-01

    System interrogates and maintains user-definable data files from remote terminals, using English-like, free-form query language easily learned by persons not proficient in computer programming. System operates in asynchronous mode, allowing any number of inquiries within limitation of available core to be active concurrently.

  6. Knowledge-Based Information Retrieval.

    ERIC Educational Resources Information Center

    Ford, Nigel

    1991-01-01

    Discussion of information retrieval focuses on theoretical and empirical advances in knowledge-based information retrieval. Topics discussed include the use of natural language for queries; the use of expert systems; intelligent tutoring systems; user modeling; the need for evaluation of system effectiveness; and examples of systems, including…

  7. Image databases: Problems and perspectives

    NASA Technical Reports Server (NTRS)

    Gudivada, V. Naidu

    1989-01-01

    With the increasing number of computer graphics, image processing, and pattern recognition applications, economical storage, efficient representation and manipulation, and powerful and flexible query languages for retrieval of image data are of paramount importance. These and related issues pertinent to image data bases are examined.

  8. An SQL query generator for CLIPS

    NASA Technical Reports Server (NTRS)

    Snyder, James; Chirica, Laurian

    1990-01-01

    As expert systems become more widely used, their access to large amounts of external information becomes increasingly important. This information exists in several forms such as statistical, tabular data, knowledge gained by experts and large databases of information maintained by companies. Because many expert systems, including CLIPS, do not provide access to this external information, much of the usefulness of expert systems is left untapped. The scope of this paper is to describe a database extension for the CLIPS expert system shell. The current industry standard database language is SQL. Due to SQL standardization, large amounts of information stored on various computers, potentially at different locations, will be more easily accessible. Expert systems should be able to directly access these existing databases rather than requiring information to be re-entered into the expert system environment. The ORACLE relational database management system (RDBMS) was used to provide a database connection within the CLIPS environment. To facilitate relational database access a query generation system was developed as a CLIPS user function. The queries are entered in a CLlPS-like syntax and are passed to the query generator, which constructs and submits for execution, an SQL query to the ORACLE RDBMS. The query results are asserted as CLIPS facts. The query generator was developed primarily for use within the ICADS project (Intelligent Computer Aided Design System) currently being developed by the CAD Research Unit in the California Polytechnic State University (Cal Poly). In ICADS, there are several parallel or distributed expert systems accessing a common knowledge base of facts. Expert system has a narrow domain of interest and therefore needs only certain portions of the information. The query generator provides a common method of accessing this information and allows the expert system to specify what data is needed without specifying how to retrieve it.

  9. SAFOD Brittle Microstructure and Mechanics Knowledge Base (BM2KB)

    NASA Astrophysics Data System (ADS)

    Babaie, Hassan A.; Broda Cindi, M.; Hadizadeh, Jafar; Kumar, Anuj

    2013-07-01

    Scientific drilling near Parkfield, California has established the San Andreas Fault Observatory at Depth (SAFOD), which provides the solid earth community with short range geophysical and fault zone material data. The BM2KB ontology was developed in order to formalize the knowledge about brittle microstructures in the fault rocks sampled from the SAFOD cores. A knowledge base, instantiated from this domain ontology, stores and presents the observed microstructural and analytical data with respect to implications for brittle deformation and mechanics of faulting. These data can be searched on the knowledge base‧s Web interface by selecting a set of terms (classes, properties) from different drop-down lists that are dynamically populated from the ontology. In addition to this general search, a query can also be conducted to view data contributed by a specific investigator. A search by sample is done using the EarthScope SAFOD Core Viewer that allows a user to locate samples on high resolution images of core sections belonging to different runs and holes. The class hierarchy of the BM2KB ontology was initially designed using the Unified Modeling Language (UML), which was used as a visual guide to develop the ontology in OWL applying the Protégé ontology editor. Various Semantic Web technologies such as the RDF, RDFS, and OWL ontology languages, SPARQL query language, and Pellet reasoning engine, were used to develop the ontology. An interactive Web application interface was developed through Jena, a java based framework, with AJAX technology, jsp pages, and java servlets, and deployed via an Apache tomcat server. The interface allows the registered user to submit data related to their research on a sample of the SAFOD core. The submitted data, after initial review by the knowledge base administrator, are added to the extensible knowledge base and become available in subsequent queries to all types of users. The interface facilitates inference capabilities in the ontology, supports SPARQL queries, allows for modifications based on successive discoveries, and provides an accessible knowledge base on the Web.

  10. On-demand information retrieval in sensor networks with localised query and energy-balanced data collection.

    PubMed

    Teng, Rui; Zhang, Bing

    2011-01-01

    On-demand information retrieval enables users to query and collect up-to-date sensing information from sensor nodes. Since high energy efficiency is required in a sensor network, it is desirable to disseminate query messages with small traffic overhead and to collect sensing data with low energy consumption. However, on-demand query messages are generally forwarded to sensor nodes in network-wide broadcasts, which create large traffic overhead. In addition, since on-demand information retrieval may introduce intermittent and spatial data collections, the construction and maintenance of conventional aggregation structures such as clusters and chains will be at high cost. In this paper, we propose an on-demand information retrieval approach that exploits the name resolution of data queries according to the attribute and location of each sensor node. The proposed approach localises each query dissemination and enable localised data collection with maximised aggregation. To illustrate the effectiveness of the proposed approach, an analytical model that describes the criteria of sink proxy selection is provided. The evaluation results reveal that the proposed scheme significantly reduces energy consumption and improves the balance of energy consumption among sensor nodes by alleviating heavy traffic near the sink.

  11. Innovations in individual feature history management - The significance of feature-based temporal model

    USGS Publications Warehouse

    Choi, J.; Seong, J.C.; Kim, B.; Usery, E.L.

    2008-01-01

    A feature relies on three dimensions (space, theme, and time) for its representation. Even though spatiotemporal models have been proposed, they have principally focused on the spatial changes of a feature. In this paper, a feature-based temporal model is proposed to represent the changes of both space and theme independently. The proposed model modifies the ISO's temporal schema and adds new explicit temporal relationship structure that stores temporal topological relationship with the ISO's temporal primitives of a feature in order to keep track feature history. The explicit temporal relationship can enhance query performance on feature history by removing topological comparison during query process. Further, a prototype system has been developed to test a proposed feature-based temporal model by querying land parcel history in Athens, Georgia. The result of temporal query on individual feature history shows the efficiency of the explicit temporal relationship structure. ?? Springer Science+Business Media, LLC 2007.

  12. Object orientation affects spatial language comprehension.

    PubMed

    Burigo, Michele; Sacchi, Simona

    2013-01-01

    Typical spatial descriptions, such as "The car is in front of the house," describe the position of a located object (LO; e.g., the car) in space relative to a reference object (RO) whose location is known (e.g., the house). The orientation of the RO affects spatial language comprehension via the reference frame selection process. However, the effects of the LO's orientation on spatial language have not received great attention. This study explores whether the pure geometric information of the LO (e.g., its orientation) affects spatial language comprehension using placing and production tasks. Our results suggest that the orientation of the LO influences spatial language comprehension even in the absence of functional relationships. Copyright © 2013 Cognitive Science Society, Inc.

  13. Petaminer: Using ROOT for efficient data storage in MySQL database

    NASA Astrophysics Data System (ADS)

    Cranshaw, J.; Malon, D.; Vaniachine, A.; Fine, V.; Lauret, J.; Hamill, P.

    2010-04-01

    High Energy and Nuclear Physics (HENP) experiments store Petabytes of event data and Terabytes of calibration data in ROOT files. The Petaminer project is developing a custom MySQL storage engine to enable the MySQL query processor to directly access experimental data stored in ROOT files. Our project is addressing the problem of efficient navigation to PetaBytes of HENP experimental data described with event-level TAG metadata, which is required by data intensive physics communities such as the LHC and RHIC experiments. Physicists need to be able to compose a metadata query and rapidly retrieve the set of matching events, where improved efficiency will facilitate the discovery process by permitting rapid iterations of data evaluation and retrieval. Our custom MySQL storage engine enables the MySQL query processor to directly access TAG data stored in ROOT TTrees. As ROOT TTrees are column-oriented, reading them directly provides improved performance over traditional row-oriented TAG databases. Leveraging the flexible and powerful SQL query language to access data stored in ROOT TTrees, the Petaminer approach enables rich MySQL index-building capabilities for further performance optimization.

  14. Automatic query formulations in information retrieval.

    PubMed

    Salton, G; Buckley, C; Fox, E A

    1983-07-01

    Modern information retrieval systems are designed to supply relevant information in response to requests received from the user population. In most retrieval environments the search requests consist of keywords, or index terms, interrelated by appropriate Boolean operators. Since it is difficult for untrained users to generate effective Boolean search requests, trained search intermediaries are normally used to translate original statements of user need into useful Boolean search formulations. Methods are introduced in this study which reduce the role of the search intermediaries by making it possible to generate Boolean search formulations completely automatically from natural language statements provided by the system patrons. Frequency considerations are used automatically to generate appropriate term combinations as well as Boolean connectives relating the terms. Methods are covered to produce automatic query formulations both in a standard Boolean logic system, as well as in an extended Boolean system in which the strict interpretation of the connectives is relaxed. Experimental results are supplied to evaluate the effectiveness of the automatic query formulation process, and methods are described for applying the automatic query formulation process in practice.

  15. Using discordance to improve classification in narrative clinical databases: an application to community-acquired pneumonia.

    PubMed

    Hripcsak, George; Knirsch, Charles; Zhou, Li; Wilcox, Adam; Melton, Genevieve B

    2007-03-01

    Data mining in electronic medical records may facilitate clinical research, but much of the structured data may be miscoded, incomplete, or non-specific. The exploitation of narrative data using natural language processing may help, although nesting, varying granularity, and repetition remain challenges. In a study of community-acquired pneumonia using electronic records, these issues led to poor classification. Limiting queries to accurate, complete records led to vastly reduced, possibly biased samples. We exploited knowledge latent in the electronic records to improve classification. A similarity metric was used to cluster cases. We defined discordance as the degree to which cases within a cluster give different answers for some query that addresses a classification task of interest. Cases with higher discordance are more likely to be incorrectly classified, and can be reviewed manually to adjust the classification, improve the query, or estimate the likely accuracy of the query. In a study of pneumonia--in which the ICD9-CM coding was found to be very poor--the discordance measure was statistically significantly correlated with classification correctness (.45; 95% CI .15-.62).

  16. Enabling online studies of conceptual relationships between medical terms: developing an efficient web platform.

    PubMed

    Albin, Aaron; Ji, Xiaonan; Borlawsky, Tara B; Ye, Zhan; Lin, Simon; Payne, Philip Ro; Huang, Kun; Xiang, Yang

    2014-10-07

    The Unified Medical Language System (UMLS) contains many important ontologies in which terms are connected by semantic relations. For many studies on the relationships between biomedical concepts, the use of transitively associated information from ontologies and the UMLS has been shown to be effective. Although there are a few tools and methods available for extracting transitive relationships from the UMLS, they usually have major restrictions on the length of transitive relations or on the number of data sources. Our goal was to design an efficient online platform that enables efficient studies on the conceptual relationships between any medical terms. To overcome the restrictions of available methods and to facilitate studies on the conceptual relationships between medical terms, we developed a Web platform, onGrid, that supports efficient transitive queries and conceptual relationship studies using the UMLS. This framework uses the latest technique in converting natural language queries into UMLS concepts, performs efficient transitive queries, and visualizes the result paths. It also dynamically builds a relationship matrix for two sets of input biomedical terms. We are thus able to perform effective studies on conceptual relationships between medical terms based on their relationship matrix. The advantage of onGrid is that it can be applied to study any two sets of biomedical concept relations and the relations within one set of biomedical concepts. We use onGrid to study the disease-disease relationships in the Online Mendelian Inheritance in Man (OMIM). By crossvalidating our results with an external database, the Comparative Toxicogenomics Database (CTD), we demonstrated that onGrid is effective for the study of conceptual relationships between medical terms. onGrid is an efficient tool for querying the UMLS for transitive relations, studying the relationship between medical terms, and generating hypotheses.

  17. HBVPathDB: a database of HBV infection-related molecular interaction network.

    PubMed

    Zhang, Yi; Bo, Xiao-Chen; Yang, Jing; Wang, Sheng-Qi

    2005-03-21

    To describe molecules or genes interaction between hepatitis B viruses (HBV) and host, for understanding how virus' and host's genes and molecules are networked to form a biological system and for perceiving mechanism of HBV infection. The knowledge of HBV infection-related reactions was organized into various kinds of pathways with carefully drawn graphs in HBVPathDB. Pathway information is stored with relational database management system (DBMS), which is currently the most efficient way to manage large amounts of data and query is implemented with powerful Structured Query Language (SQL). The search engine is written using Personal Home Page (PHP) with SQL embedded and web retrieval interface is developed for searching with Hypertext Markup Language (HTML). We present the first version of HBVPathDB, which is a HBV infection-related molecular interaction network database composed of 306 pathways with 1 050 molecules involved. With carefully drawn graphs, pathway information stored in HBVPathDB can be browsed in an intuitive way. We develop an easy-to-use interface for flexible accesses to the details of database. Convenient software is implemented to query and browse the pathway information of HBVPathDB. Four search page layout options-category search, gene search, description search, unitized search-are supported by the search engine of the database. The database is freely available at http://www.bio-inf.net/HBVPathDB/HBV/. The conventional perspective HBVPathDB have already contained a considerable amount of pathway information with HBV infection related, which is suitable for in-depth analysis of molecular interaction network of virus and host. HBVPathDB integrates pathway data-sets with convenient software for query, browsing, visualization, that provides users more opportunity to identify regulatory key molecules as potential drug targets and to explore the possible mechanism of HBV infection based on gene expression datasets.

  18. A Survey in Indexing and Searching XML Documents.

    ERIC Educational Resources Information Center

    Luk, Robert W. P.; Leong, H. V.; Dillon, Tharam S.; Chan, Alvin T. S.; Croft, W. Bruce; Allan, James

    2002-01-01

    Discussion of XML focuses on indexing techniques for XML documents, grouping them into flat-file, semistructured, and structured indexing paradigms. Highlights include searching techniques, including full text search and multistage search; search result presentations; database and information retrieval system integration; XML query languages; and…

  19. Constructing a Graph Database for Semantic Literature-Based Discovery.

    PubMed

    Hristovski, Dimitar; Kastrin, Andrej; Dinevski, Dejan; Rindflesch, Thomas C

    2015-01-01

    Literature-based discovery (LBD) generates discoveries, or hypotheses, by combining what is already known in the literature. Potential discoveries have the form of relations between biomedical concepts; for example, a drug may be determined to treat a disease other than the one for which it was intended. LBD views the knowledge in a domain as a network; a set of concepts along with the relations between them. As a starting point, we used SemMedDB, a database of semantic relations between biomedical concepts extracted with SemRep from Medline. SemMedDB is distributed as a MySQL relational database, which has some problems when dealing with network data. We transformed and uploaded SemMedDB into the Neo4j graph database, and implemented the basic LBD discovery algorithms with the Cypher query language. We conclude that storing the data needed for semantic LBD is more natural in a graph database. Also, implementing LBD discovery algorithms is conceptually simpler with a graph query language when compared with standard SQL.

  20. Influenza-like illness surveillance on Twitter through automated learning of naïve language.

    PubMed

    Gesualdo, Francesco; Stilo, Giovanni; Agricola, Eleonora; Gonfiantini, Michaela V; Pandolfi, Elisabetta; Velardi, Paola; Tozzi, Alberto E

    2013-01-01

    Twitter has the potential to be a timely and cost-effective source of data for syndromic surveillance. When speaking of an illness, Twitter users often report a combination of symptoms, rather than a suspected or final diagnosis, using naïve, everyday language. We developed a minimally trained algorithm that exploits the abundance of health-related web pages to identify all jargon expressions related to a specific technical term. We then translated an influenza case definition into a Boolean query, each symptom being described by a technical term and all related jargon expressions, as identified by the algorithm. Subsequently, we monitored all tweets that reported a combination of symptoms satisfying the case definition query. In order to geolocalize messages, we defined 3 localization strategies based on codes associated with each tweet. We found a high correlation coefficient between the trend of our influenza-positive tweets and ILI trends identified by US traditional surveillance systems.

  1. Influenza-Like Illness Surveillance on Twitter through Automated Learning of Naïve Language

    PubMed Central

    Gesualdo, Francesco; Stilo, Giovanni; Agricola, Eleonora; Gonfiantini, Michaela V.; Pandolfi, Elisabetta; Velardi, Paola; Tozzi, Alberto E.

    2013-01-01

    Twitter has the potential to be a timely and cost-effective source of data for syndromic surveillance. When speaking of an illness, Twitter users often report a combination of symptoms, rather than a suspected or final diagnosis, using naïve, everyday language. We developed a minimally trained algorithm that exploits the abundance of health-related web pages to identify all jargon expressions related to a specific technical term. We then translated an influenza case definition into a Boolean query, each symptom being described by a technical term and all related jargon expressions, as identified by the algorithm. Subsequently, we monitored all tweets that reported a combination of symptoms satisfying the case definition query. In order to geolocalize messages, we defined 3 localization strategies based on codes associated with each tweet. We found a high correlation coefficient between the trend of our influenza-positive tweets and ILI trends identified by US traditional surveillance systems. PMID:24324799

  2. High level language-based robotic control system

    NASA Technical Reports Server (NTRS)

    Rodriguez, Guillermo (Inventor); Kruetz, Kenneth K. (Inventor); Jain, Abhinandan (Inventor)

    1994-01-01

    This invention is a robot control system based on a high level language implementing a spatial operator algebra. There are two high level languages included within the system. At the highest level, applications programs can be written in a robot-oriented applications language including broad operators such as MOVE and GRASP. The robot-oriented applications language statements are translated into statements in the spatial operator algebra language. Programming can also take place using the spatial operator algebra language. The statements in the spatial operator algebra language from either source are then translated into machine language statements for execution by a digital control computer. The system also includes the capability of executing the control code sequences in a simulation mode before actual execution to assure proper action at execution time. The robot's environment is checked as part of the process and dynamic reconfiguration is also possible. The languages and system allow the programming and control of multiple arms and the use of inward/outward spatial recursions in which every computational step can be related to a transformation from one point in the mechanical robot to another point to name two major advantages.

  3. High level language-based robotic control system

    NASA Technical Reports Server (NTRS)

    Rodriguez, Guillermo (Inventor); Kreutz, Kenneth K. (Inventor); Jain, Abhinandan (Inventor)

    1996-01-01

    This invention is a robot control system based on a high level language implementing a spatial operator algebra. There are two high level languages included within the system. At the highest level, applications programs can be written in a robot-oriented applications language including broad operators such as MOVE and GRASP. The robot-oriented applications language statements are translated into statements in the spatial operator algebra language. Programming can also take place using the spatial operator algebra language. The statements in the spatial operator algebra language from either source are then translated into machine language statements for execution by a digital control computer. The system also includes the capability of executing the control code sequences in a simulation mode before actual execution to assure proper action at execution time. The robot's environment is checked as part of the process and dynamic reconfiguration is also possible. The languages and system allow the programming and control of multiple arms and the use of inward/outward spatial recursions in which every computational step can be related to a transformation from one point in the mechanical robot to another point to name two major advantages.

  4. Agile Datacube Analytics (not just) for the Earth Sciences

    NASA Astrophysics Data System (ADS)

    Misev, Dimitar; Merticariu, Vlad; Baumann, Peter

    2017-04-01

    Metadata are considered small, smart, and queryable; data, on the other hand, are known as big, clumsy, hard to analyze. Consequently, gridded data - such as images, image timeseries, and climate datacubes - are managed separately from the metadata, and with different, restricted retrieval capabilities. One reason for this silo approach is that databases, while good at tables, XML hierarchies, RDF graphs, etc., traditionally do not support multi-dimensional arrays well. This gap is being closed by Array Databases which extend the SQL paradigm of "any query, anytime" to NoSQL arrays. They introduce semantically rich modelling combined with declarative, high-level query languages on n-D arrays. On Server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. This way, they offer new vistas in flexibility, scalability, performance, and data integration. In this respect, the forthcoming ISO SQL extension MDA ("Multi-dimensional Arrays") will be a game changer in Big Data Analytics. We introduce concepts and opportunities through the example of rasdaman ("raster data manager") which in fact has pioneered the field of Array Databases and forms the blueprint for ISO SQL/MDA and further Big Data standards, such as OGC WCPS for querying spatio-temporal Earth datacubes. With operational installations exceeding 140 TB queries have been split across more than one thousand cloud nodes, using CPUs as well as GPUs. Installations can easily be mashed up securely, enabling large-scale location-transparent query processing in federations. Federation queries have been demonstrated live at EGU 2016 spanning Europe and Australia in the context of the intercontinental EarthServer initiative, visualized through NASA WorldWind.

  5. Agile Datacube Analytics (not just) for the Earth Sciences

    NASA Astrophysics Data System (ADS)

    Baumann, P.

    2016-12-01

    Metadata are considered small, smart, and queryable; data, on the other hand, are known as big, clumsy, hard to analyze. Consequently, gridded data - such as images, image timeseries, and climate datacubes - are managed separately from the metadata, and with different, restricted retrieval capabilities. One reason for this silo approach is that databases, while good at tables, XML hierarchies, RDF graphs, etc., traditionally do not support multi-dimensional arrays well.This gap is being closed by Array Databases which extend the SQL paradigm of "any query, anytime" to NoSQL arrays. They introduce semantically rich modelling combined with declarative, high-level query languages on n-D arrays. On Server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. This way, they offer new vistas in flexibility, scalability, performance, and data integration. In this respect, the forthcoming ISO SQL extension MDA ("Multi-dimensional Arrays") will be a game changer in Big Data Analytics.We introduce concepts and opportunities through the example of rasdaman ("raster data manager") which in fact has pioneered the field of Array Databases and forms the blueprint for ISO SQL/MDA and further Big Data standards, such as OGC WCPS for querying spatio-temporal Earth datacubes. With operational installations exceeding 140 TB queries have been split across more than one thousand cloud nodes, using CPUs as well as GPUs. Installations can easily be mashed up securely, enabling large-scale location-transparent query processing in federations. Federation queries have been demonstrated live at EGU 2016 spanning Europe and Australia in the context of the intercontinental EarthServer initiative, visualized through NASA WorldWind.

  6. NELS 2.0 - A general system for enterprise wide information management

    NASA Technical Reports Server (NTRS)

    Smith, Stephanie L.

    1993-01-01

    NELS, the NASA Electronic Library System, is an information management tool for creating distributed repositories of documents, drawings, and code for use and reuse by the aerospace community. The NELS retrieval engine can load metadata and source files of full text objects, perform natural language queries to retrieve ranked objects, and create links to connect user interfaces. For flexibility, the NELS architecture has layered interfaces between the application program and the stored library information. The session manager provides the interface functions for development of NELS applications. The data manager is an interface between session manager and the structured data system. The center of the structured data system is the Wide Area Information Server. This system architecture provides access to information across heterogeneous platforms in a distributed environment. There are presently three user interfaces that connect to the NELS engine; an X-Windows interface, and ASCII interface and the Spatial Data Management System. This paper describes the design and operation of NELS as an information management tool and repository.

  7. Intelligent communication assistant for databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jakobson, G.; Shaked, V.; Rowley, S.

    1983-01-01

    An intelligent communication assistant for databases, called FRED (front end for databases) is explored. FRED is designed to facilitate access to database systems by users of varying levels of experience. FRED is a second generation of natural language front-ends for databases and intends to solve two critical interface problems existing between end-users and databases: connectivity and communication problems. The authors report their experiences in developing software for natural language query processing, dialog control, and knowledge representation, as well as the direction of future work. 10 references.

  8. Semantic Web Services with Web Ontology Language (OWL-S) - Specification of Agent-Services for DARPA Agent Markup Language (DAML)

    DTIC Science & Technology

    2006-08-01

    effective for describing taxonomic categories and properties of things, the structures found in SWRL and SPARQL are better suited to describing conditions...up the query processing time, which may occur many times and furthermore it is time critical. In order to maintain information about the...that time spent during this phase does not depend linearly on the number of concepts present in the data structure , but in the order of log of concepts

  9. AiGERM: A logic programming front end for GERM

    NASA Technical Reports Server (NTRS)

    Hashim, Safaa H.

    1990-01-01

    AiGerm (Artificially Intelligent Graphical Entity Relation Modeler) is a relational data base query and programming language front end for MCC (Mission Control Center)/STP's (Space Test Program) Germ (Graphical Entity Relational Modeling) system. It is intended as an add-on component of the Germ system to be used for navigating very large networks of information. It can also function as an expert system shell for prototyping knowledge-based systems. AiGerm provides an interface between the programming language and Germ.

  10. EAGLE: 'EAGLE'Is an' Algorithmic Graph Library for Exploration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2015-01-16

    The Resource Description Framework (RDF) and SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible schema-free data interchange on the Semantic Web. Today data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. Today there is no tools to conduct "graph mining" on RDF standard data sets. We address that need through implementation of popular iterative Graph Mining algorithms (Triangle count, Connected component analysis, degree distribution,more » diversity degree, PageRank, etc.). We implement these algorithms as SPARQL queries, wrapped within Python scripts and call our software tool as EAGLE. In RDF style, EAGLE stands for "EAGLE 'Is an' algorithmic graph library for exploration. EAGLE is like 'MATLAB' for 'Linked Data.'« less

  11. Biotea: semantics for Pubmed Central.

    PubMed

    Garcia, Alexander; Lopez, Federico; Garcia, Leyla; Giraldo, Olga; Bucheli, Victor; Dumontier, Michel

    2018-01-01

    A significant portion of biomedical literature is represented in a manner that makes it difficult for consumers to find or aggregate content through a computational query. One approach to facilitate reuse of the scientific literature is to structure this information as linked data using standardized web technologies. In this paper we present the second version of Biotea, a semantic, linked data version of the open-access subset of PubMed Central that has been enhanced with specialized annotation pipelines that uses existing infrastructure from the National Center for Biomedical Ontology. We expose our models, services, software and datasets. Our infrastructure enables manual and semi-automatic annotation, resulting data are represented as RDF-based linked data and can be readily queried using the SPARQL query language. We illustrate the utility of our system with several use cases. Our datasets, methods and techniques are available at http://biotea.github.io.

  12. Bottom-Up Evaluation of Twig Join Pattern Queries in XML Document Databases

    NASA Astrophysics Data System (ADS)

    Chen, Yangjun

    Since the extensible markup language XML emerged as a new standard for information representation and exchange on the Internet, the problem of storing, indexing, and querying XML documents has been among the major issues of database research. In this paper, we study the twig pattern matching and discuss a new algorithm for processing ordered twig pattern queries. The time complexity of the algorithmis bounded by O(|D|·|Q| + |T|·leaf Q ) and its space overhead is by O(leaf T ·leaf Q ), where T stands for a document tree, Q for a twig pattern and D is a largest data stream associated with a node q of Q, which contains the database nodes that match the node predicate at q. leaf T (leaf Q ) represents the number of the leaf nodes of T (resp. Q). In addition, the algorithm can be adapted to an indexing environment with XB-trees being used.

  13. Development of a replicated database of DHCP data for evaluation of drug use.

    PubMed Central

    Graber, S E; Seneker, J A; Stahl, A A; Franklin, K O; Neel, T E; Miller, R A

    1996-01-01

    This case report describes development and testing of a method to extract clinical information stored in the Veterans Affairs (VA) Decentralized Hospital Computer System (DHCP) for the purpose of analyzing data about groups of patients. The authors used a microcomputer-based, structured query language (SQL)-compatible, relational database system to replicate a subset of the Nashville VA Hospital's DHCP patient database. This replicated database contained the complete current Nashville DHCP prescription, provider, patient, and drug data sets, and a subset of the laboratory data. A pilot project employed this replicated database to answer questions that might arise in drug-use evaluation, such as identification of cases of polypharmacy, suboptimal drug regimens, and inadequate laboratory monitoring of drug therapy. These database queries included as candidates for review all prescriptions for all outpatients. The queries demonstrated that specific drug-use events could be identified for any time interval represented in the replicated database. PMID:8653451

  14. Development of a replicated database of DHCP data for evaluation of drug use.

    PubMed

    Graber, S E; Seneker, J A; Stahl, A A; Franklin, K O; Neel, T E; Miller, R A

    1996-01-01

    This case report describes development and testing of a method to extract clinical information stored in the Veterans Affairs (VA) Decentralized Hospital Computer System (DHCP) for the purpose of analyzing data about groups of patients. The authors used a microcomputer-based, structured query language (SQL)-compatible, relational database system to replicate a subset of the Nashville VA Hospital's DHCP patient database. This replicated database contained the complete current Nashville DHCP prescription, provider, patient, and drug data sets, and a subset of the laboratory data. A pilot project employed this replicated database to answer questions that might arise in drug-use evaluation, such as identification of cases of polypharmacy, suboptimal drug regimens, and inadequate laboratory monitoring of drug therapy. These database queries included as candidates for review all prescriptions for all outpatients. The queries demonstrated that specific drug-use events could be identified for any time interval represented in the replicated database.

  15. Referential shift in Nicaraguan Sign Language: a transition from lexical to spatial devices

    PubMed Central

    Kocab, Annemarie; Pyers, Jennie; Senghas, Ann

    2015-01-01

    Even the simplest narratives combine multiple strands of information, integrating different characters and their actions by expressing multiple perspectives of events. We examined the emergence of referential shift devices, which indicate changes among these perspectives, in Nicaraguan Sign Language (NSL). Sign languages, like spoken languages, mark referential shift grammatically with a shift in deictic perspective. In addition, sign languages can mark the shift with a point or a movement of the body to a specified spatial location in the three-dimensional space in front of the signer, capitalizing on the spatial affordances of the manual modality. We asked whether the use of space to mark referential shift emerges early in a new sign language by comparing the first two age cohorts of deaf signers of NSL. Eight first-cohort signers and 10 second-cohort signers watched video vignettes and described them in NSL. Narratives were coded for lexical (use of words) and spatial (use of signing space) devices. Although the cohorts did not differ significantly in the number of perspectives represented, second-cohort signers used referential shift devices to explicitly mark a shift in perspective in more of their narratives. Furthermore, while there was no significant difference between cohorts in the use of non-spatial, lexical devices, there was a difference in spatial devices, with second-cohort signers using them in significantly more of their narratives. This suggests that spatial devices have only recently increased as systematic markers of referential shift. Spatial referential shift devices may have emerged more slowly because they depend on the establishment of fundamental spatial conventions in the language. While the modality of sign languages can ultimately engender the syntactic use of three-dimensional space, we propose that a language must first develop systematic spatial distinctions before harnessing space for grammatical functions. PMID:25713541

  16. Referential shift in Nicaraguan Sign Language: a transition from lexical to spatial devices.

    PubMed

    Kocab, Annemarie; Pyers, Jennie; Senghas, Ann

    2014-01-01

    Even the simplest narratives combine multiple strands of information, integrating different characters and their actions by expressing multiple perspectives of events. We examined the emergence of referential shift devices, which indicate changes among these perspectives, in Nicaraguan Sign Language (NSL). Sign languages, like spoken languages, mark referential shift grammatically with a shift in deictic perspective. In addition, sign languages can mark the shift with a point or a movement of the body to a specified spatial location in the three-dimensional space in front of the signer, capitalizing on the spatial affordances of the manual modality. We asked whether the use of space to mark referential shift emerges early in a new sign language by comparing the first two age cohorts of deaf signers of NSL. Eight first-cohort signers and 10 second-cohort signers watched video vignettes and described them in NSL. Narratives were coded for lexical (use of words) and spatial (use of signing space) devices. Although the cohorts did not differ significantly in the number of perspectives represented, second-cohort signers used referential shift devices to explicitly mark a shift in perspective in more of their narratives. Furthermore, while there was no significant difference between cohorts in the use of non-spatial, lexical devices, there was a difference in spatial devices, with second-cohort signers using them in significantly more of their narratives. This suggests that spatial devices have only recently increased as systematic markers of referential shift. Spatial referential shift devices may have emerged more slowly because they depend on the establishment of fundamental spatial conventions in the language. While the modality of sign languages can ultimately engender the syntactic use of three-dimensional space, we propose that a language must first develop systematic spatial distinctions before harnessing space for grammatical functions.

  17. Children's attention to task-relevant information accounts for relations between language and spatial cognition.

    PubMed

    Miller, Hilary E; Simmering, Vanessa R

    2018-08-01

    Children's spatial language reliably predicts their spatial skills, but the nature of this relation is a source of debate. This investigation examined whether the mechanisms accounting for such relations are specific to language use or reflect a domain-general mechanism of selective attention. Experiment 1 examined whether 4-year-olds' spatial skills were predicted by their selective attention or their adaptive language use. Children completed (a) an attention task assessing attention to task-relevant color, size, and location cues; (b) a description task assessing adaptive language use to describe scenes varying in color, size, and location; and (c) three spatial tasks. There was correspondence between the cue types that children attended to and produced across description and attention tasks. Adaptive language use was predicted by both children's attention and task-related language production, suggesting that selective attention underlies skills in using language adaptively. After controlling for age, gender, receptive vocabulary, and adaptive language use, spatial skills were predicted by children's selective attention. The attention score predicted variance in spatial performance previously accounted for by adaptive language use. Experiment 2 followed up on the attention task (Experiment 2a) and description task (Experiment 2b) from Experiment 1 to assess whether performance in the tasks related to selective attention or task-specific demands. Performance in Experiments 2a and 2b paralleled that in Experiment 1, suggesting that the effects in Experiment 1 reflected children's selective attention skills. These findings show that selective attention is a central factor supporting spatial skill development that could account for many effects previously attributed to children's language use. Copyright © 2018 Elsevier Inc. All rights reserved.

  18. Component Models for Semantic Web Languages

    NASA Astrophysics Data System (ADS)

    Henriksson, Jakob; Aßmann, Uwe

    Intelligent applications and agents on the Semantic Web typically need to be specified with, or interact with specifications written in, many different kinds of formal languages. Such languages include ontology languages, data and metadata query languages, as well as transformation languages. As learnt from years of experience in development of complex software systems, languages need to support some form of component-based development. Components enable higher software quality, better understanding and reusability of already developed artifacts. Any component approach contains an underlying component model, a description detailing what valid components are and how components can interact. With the multitude of languages developed for the Semantic Web, what are their underlying component models? Do we need to develop one for each language, or is a more general and reusable approach achievable? We present a language-driven component model specification approach. This means that a component model can be (automatically) generated from a given base language (actually, its specification, e.g. its grammar). As a consequence, we can provide components for different languages and simplify the development of software artifacts used on the Semantic Web.

  19. Cluster-Based Query Expansion Using Language Modeling for Biomedical Literature Retrieval

    ERIC Educational Resources Information Center

    Xu, Xuheng

    2011-01-01

    The tremendously huge volume of biomedical literature, scientists' specific information needs, long terms of multiples words, and fundamental problems of synonym and polysemy have been challenging issues facing the biomedical information retrieval community researchers. Search engines have significantly improved the efficiency and effectiveness of…

  20. A Probabilistic Approach to Crosslingual Information Retrieval

    DTIC Science & Technology

    2001-06-01

    language expansion step can be performed before the translation process. Implemented as a call to the INQUERY function get_modified_query with one of the...database consists of American English while the dictionary is British English. Therefore, e.g. the Spanish word basura is translated to rubbish and

  1. Survey of Event Processing

    DTIC Science & Technology

    2007-12-01

    1 A Brief History of Event Processing... history of event processing. The Applications section defines several application domains and use cases for event processing technology. Event...subscription” and “subscription language” will be used where some will often use “(continuous) query” or “query language.” A Brief History of

  2. Design of multi-language trading system of ethnic characteristic agricultural products based on android

    NASA Astrophysics Data System (ADS)

    Huanqin, Wu; Yasheng, Jin; Yugang, Dai

    2017-06-01

    Under the current situation where Internet technology develops rapidly, mobile E-commerce technology has brought great convenience to our life. Now, the graphical user interface (GUI) of most E-commerce platforms only supports Chinese. Thus, the development of Android client of E-commerce that supports ethnic languages owns a great prospect. The principle that combines front end design and database technology is adopted in this paper to construct the Android client system of E-commerce platforms that supports ethnic languages, which realizes the displaying, browsing, querying, searching, trading and other functions of ethnic characteristic agricultural products on android platforms.

  3. Achieving Sub-Second Search in the CMR

    NASA Astrophysics Data System (ADS)

    Gilman, J.; Baynes, K.; Pilone, D.; Mitchell, A. E.; Murphy, K. J.

    2014-12-01

    The Common Metadata Repository (CMR) is the next generation Earth Science Metadata catalog for NASA's Earth Observing data. It joins together the holdings from the EOS Clearing House (ECHO) and the Global Change Master Directory (GCMD), creating a unified, authoritative source for EOSDIS metadata. The CMR allows ingest in many different formats while providing consistent search behavior and retrieval in any supported format. Performance is a critical component of the CMR, ensuring improved data discovery and client interactivity. The CMR delivers sub-second search performance for any of the common query conditions (including spatial) across hundreds of millions of metadata granules. It also allows the addition of new metadata concepts such as visualizations, parameter metadata, and documentation. The CMR's goals presented many challenges. This talk will describe the CMR architecture, design, and innovations that were made to achieve its goals. This includes: * Architectural features like immutability and backpressure. * Data management techniques such as caching and parallel loading that give big performance gains. * Open Source and COTS tools like Elasticsearch search engine. * Adoption of Clojure, a functional programming language for the Java Virtual Machine. * Development of a custom spatial search plugin for Elasticsearch and why it was necessary. * Introduction of a unified model for metadata that maps every supported metadata format to a consistent domain model.

  4. Perspective taking in language: integrating the spatial and action domains.

    PubMed

    Beveridge, Madeleine E L; Pickering, Martin J

    2013-09-17

    Language is an inherently social behavior. In this paper, we bring together two research areas that typically occupy distinct sections of the literature: perspective taking in spatial language (whether people represent a scene from their own or a different spatial perspective), and perspective taking in action language (the extent to which they simulate an action as though they were performing that action). First, we note that vocabulary is used inconsistently across the spatial and action domains, and propose a more transparent vocabulary that will allow researchers to integrate action- and spatial-perspective taking. Second, we note that embodied theories of language comprehension often make the narrow assumption that understanding action descriptions involves adopting the perspective of an agent carrying out that action. We argue that comprehenders can adopt embodied action-perspectives other than that of the agent, including those of the patient or an observer. Third, we review evidence showing that perspective taking in spatial language is a flexible process. We argue that the flexibility of spatial-perspective taking provides a means for conversation partners engaged in dialogue to maximize similarity between their situation models. These situation models can then be used as the basis for action language simulations, in which language users adopt a particular action-perspective.

  5. Perspective taking in language: integrating the spatial and action domains

    PubMed Central

    Beveridge, Madeleine E. L.; Pickering, Martin J.

    2013-01-01

    Language is an inherently social behavior. In this paper, we bring together two research areas that typically occupy distinct sections of the literature: perspective taking in spatial language (whether people represent a scene from their own or a different spatial perspective), and perspective taking in action language (the extent to which they simulate an action as though they were performing that action). First, we note that vocabulary is used inconsistently across the spatial and action domains, and propose a more transparent vocabulary that will allow researchers to integrate action- and spatial-perspective taking. Second, we note that embodied theories of language comprehension often make the narrow assumption that understanding action descriptions involves adopting the perspective of an agent carrying out that action. We argue that comprehenders can adopt embodied action-perspectives other than that of the agent, including those of the patient or an observer. Third, we review evidence showing that perspective taking in spatial language is a flexible process. We argue that the flexibility of spatial-perspective taking provides a means for conversation partners engaged in dialogue to maximize similarity between their situation models. These situation models can then be used as the basis for action language simulations, in which language users adopt a particular action-perspective. PMID:24062676

  6. A searching and reporting system for relational databases using a graph-based metadata representation.

    PubMed

    Hewitt, Robin; Gobbi, Alberto; Lee, Man-Ling

    2005-01-01

    Relational databases are the current standard for storing and retrieving data in the pharmaceutical and biotech industries. However, retrieving data from a relational database requires specialized knowledge of the database schema and of the SQL query language. At Anadys, we have developed an easy-to-use system for searching and reporting data in a relational database to support our drug discovery project teams. This system is fast and flexible and allows users to access all data without having to write SQL queries. This paper presents the hierarchical, graph-based metadata representation and SQL-construction methods that, together, are the basis of this system's capabilities.

  7. Health consumer-oriented information retrieval.

    PubMed

    Claveau, Vincent; Hamon, Thierry; Le Maguer, Sébastien; Grabar, Natalia

    2015-01-01

    While patients can freely access their Electronic Health Records or online health information, they may not be able to correctly understand the content of these documents. One of the challenges is related to the difference between expert and non-expert languages. We propose to investigate this issue within the Information Retrieval field. The patient queries have to be associated with the corresponding expert documents, that provide trustworthy information. Our approach relies on a state-of-the-art IR system called Indri and on semantic resources. Different query expansion strategies are explored. Our system shows up to 0.6740 P@10, up to 0.7610 R@10, and up to 0.6793 NDCG@10.

  8. Measuring up: Implementing a dental quality measure in the electronic health record context.

    PubMed

    Bhardwaj, Aarti; Ramoni, Rachel; Kalenderian, Elsbeth; Neumann, Ana; Hebballi, Nutan B; White, Joel M; McClellan, Lyle; Walji, Muhammad F

    2016-01-01

    Quality improvement requires using quality measures that can be implemented in a valid manner. Using guidelines set forth by the Meaningful Use portion of the Health Information Technology for Economic and Clinical Health Act, the authors assessed the feasibility and performance of an automated electronic Meaningful Use dental clinical quality measure to determine the percentage of children who received fluoride varnish. The authors defined how to implement the automated measure queries in a dental electronic health record. Within records identified through automated query, the authors manually reviewed a subsample to assess the performance of the query. The automated query results revealed that 71.0% of patients had fluoride varnish compared with the manual chart review results that indicated 77.6% of patients had fluoride varnish. The automated quality measure performance results indicated 90.5% sensitivity, 90.8% specificity, 96.9% positive predictive value, and 75.2% negative predictive value. The authors' findings support the feasibility of using automated dental quality measure queries in the context of sufficient structured data. Information noted only in free text rather than in structured data would require using natural language processing approaches to effectively query electronic health records. To participate in self-directed quality improvement, dental clinicians must embrace the accountability era. Commitment to quality will require enhanced documentation to support near-term automated calculation of quality measures. Copyright © 2016 American Dental Association. Published by Elsevier Inc. All rights reserved.

  9. Providing R-Tree Support for Mongodb

    NASA Astrophysics Data System (ADS)

    Xiang, Longgang; Shao, Xiaotian; Wang, Dehao

    2016-06-01

    Supporting large amounts of spatial data is a significant characteristic of modern databases. However, unlike some mature relational databases, such as Oracle and PostgreSQL, most of current burgeoning NoSQL databases are not well designed for storing geospatial data, which is becoming increasingly important in various fields. In this paper, we propose a novel method to provide R-tree index, as well as corresponding spatial range query and nearest neighbour query functions, for MongoDB, one of the most prevalent NoSQL databases. First, after in-depth analysis of MongoDB's features, we devise an efficient tabular document structure which flattens R-tree index into MongoDB collections. Further, relevant mechanisms of R-tree operations are issued, and then we discuss in detail how to integrate R-tree into MongoDB. Finally, we present the experimental results which show that our proposed method out-performs the built-in spatial index of MongoDB. Our research will greatly facilitate big data management issues with MongoDB in a variety of geospatial information applications.

  10. Understanding Language, Hearing Status, and Visual-Spatial Skills

    PubMed Central

    Marschark, Marc; Spencer, Linda J.; Durkin, Andreana; Borgna, Georgianna; Convertino, Carol; Machmer, Elizabeth; Kronenberger, William G.; Trani, Alexandra

    2015-01-01

    It is frequently assumed that deaf individuals have superior visual-spatial abilities relative to hearing peers and thus, in educational settings, they are often considered visual learners. There is some empirical evidence to support the former assumption, although it is inconsistent, and apparently none to support the latter. Three experiments examined visual-spatial and related cognitive abilities among deaf individuals who varied in their preferred language modality and use of cochlear implants (CIs) and hearing individuals who varied in their sign language skills. Sign language and spoken language assessments accompanied tasks involving visual-spatial processing, working memory, nonverbal logical reasoning, and executive function. Results were consistent with other recent studies indicating no generalized visual-spatial advantage for deaf individuals and suggested that their performance in that domain may be linked to the strength of their preferred language skills regardless of modality. Hearing individuals performed more strongly than deaf individuals on several visual-spatial and self-reported executive functioning measures, regardless of sign language skills or use of CIs. Findings are inconsistent with assumptions that deaf individuals are visual learners or are superior to hearing individuals across a broad range of visual-spatial tasks. Further, performance of deaf and hearing individuals on the same visual-spatial tasks was associated with differing cognitive abilities, suggesting that different cognitive processes may be involved in visual-spatial processing in these groups. PMID:26141071

  11. EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hasan, S. M. Shamimul; Fox, Edward A.; Bisset, Keith

    Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models as well as the data produced by these models presents unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy to use decision support and analytical environments. Asmore » a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK—a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks—aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated. A query bank is developed—the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using SPARQL Protocol and RDF Query Language (SPARQL) over the EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline: from model construction to simulation output. As a result, we show that the performance of benchmark queries varies significantly with respect to the choice of hardware underlying the database and resource description framework (RDF) engine.« less

  12. EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases

    DOE PAGES

    Hasan, S. M. Shamimul; Fox, Edward A.; Bisset, Keith; ...

    2017-11-06

    Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models as well as the data produced by these models presents unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy to use decision support and analytical environments. Asmore » a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK—a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks—aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated. A query bank is developed—the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using SPARQL Protocol and RDF Query Language (SPARQL) over the EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline: from model construction to simulation output. As a result, we show that the performance of benchmark queries varies significantly with respect to the choice of hardware underlying the database and resource description framework (RDF) engine.« less

  13. Spatial congruence in language and species richness but not threat in the world's top linguistic hotspot.

    PubMed

    Turvey, Samuel T; Pettorelli, Nathalie

    2014-12-07

    Languages share key evolutionary properties with biological species, and global-level spatial congruence in richness and threat is documented between languages and several taxonomic groups. However, there is little understanding of the functional connection between diversification or extinction in languages and species, or the relationship between linguistic and species richness across different spatial scales. New Guinea is the world's most linguistically rich region and contains extremely high biological diversity. We demonstrate significant positive relationships between language and mammal richness in New Guinea across multiple spatial scales, revealing a likely functional relationship over scales at which infra-island diversification may occur. However, correlations are driven by spatial congruence between low levels of language and species richness. Regional biocultural richness may have showed closer congruence before New Guinea's linguistic landscape was altered by Holocene demographic events. In contrast to global studies, we demonstrate a significant negative correlation across New Guinea between areas with high levels of threatened languages and threatened mammals, indicating that landscape-scale threats differ between these groups. Spatial resource prioritization to conserve biodiversity may not benefit threatened languages, and conservation policy must adopt a multi-faceted approach to protect biocultural diversity as a whole.

  14. An Efficient Method for the Retrieval of Objects by Topological Relations in Spatial Database Systems.

    ERIC Educational Resources Information Center

    Lin, P. L.; Tan, W. H.

    2003-01-01

    Presents a new method to improve the performance of query processing in a spatial database. Experiments demonstrated that performance of database systems can be improved because both the number of objects accessed and number of objects requiring detailed inspection are much less than those in the previous approach. (AEF)

  15. High-performance analysis of filtered semantic graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Buluc, Aydin; Fox, Armando; Gilbert, John R.

    2012-01-01

    High performance is a crucial consideration when executing a complex analytic query on a massive semantic graph. In a semantic graph, vertices and edges carry "attributes" of various types. Analytic queries on semantic graphs typically depend on the values of these attributes; thus, the computation must either view the graph through a filter that passes only those individual vertices and edges of interest, or else must first materialize a subgraph or subgraphs consisting of only the vertices and edges of interest. The filtered approach is superior due to its generality, ease of use, and memory efficiency, but may carry amore » performance cost. In the Knowledge Discovery Toolbox (KDT), a Python library for parallel graph computations, the user writes filters in a high-level language, but those filters result in relatively low performance due to the bottleneck of having to call into the Python interpreter for each edge. In this work, we use the Selective Embedded JIT Specialization (SEJITS) approach to automatically translate filters defined by programmers into a lower-level efficiency language, bypassing the upcall into Python. We evaluate our approach by comparing it with the high-performance C++ /MPI Combinatorial BLAS engine, and show that the productivity gained by using a high-level filtering language comes without sacrificing performance.« less

  16. Automated Assistance in the Formulation of Search Statements for Bibliographic Databases.

    ERIC Educational Resources Information Center

    Oakes, Michael P.; Taylor, Malcolm J.

    1998-01-01

    Reports on the design of an automated query system to help pharmacologists access the Derwent Drug File (DDF). Topics include knowledge types; knowledge representation; role of the search intermediary; vocabulary selection, thesaurus, and user input in natural language; browsing; evaluation methods; and search statement generation for the World…

  17. The Effect of Relational Database Technology on Administrative Computing at Carnegie Mellon University.

    ERIC Educational Resources Information Center

    Golden, Cynthia; Eisenberger, Dorit

    1990-01-01

    Carnegie Mellon University's decision to standardize its administrative system development efforts on relational database technology and structured query language is discussed and its impact is examined in one of its larger, more widely used applications, the university information system. Advantages, new responsibilities, and challenges of the…

  18. A Gene Ontology Tutorial in Python.

    PubMed

    Vesztrocy, Alex Warwick; Dessimoz, Christophe

    2017-01-01

    This chapter is a tutorial on using Gene Ontology resources in the Python programming language. This entails querying the Gene Ontology graph, retrieving Gene Ontology annotations, performing gene enrichment analyses, and computing basic semantic similarity between GO terms. An interactive version of the tutorial, including solutions, is available at http://gohandbook.org .

  19. ADVICE--Educational System for Teaching Database Courses

    ERIC Educational Resources Information Center

    Cvetanovic, M.; Radivojevic, Z.; Blagojevic, V.; Bojovic, M.

    2011-01-01

    This paper presents a Web-based educational system, ADVICE, that helps students to bridge the gap between database management system (DBMS) theory and practice. The usage of ADVICE is presented through a set of laboratory exercises developed to teach students conceptual and logical modeling, SQL, formal query languages, and normalization. While…

  20. A Prototype of an Intelligent System for Information Retrieval: IOTA.

    ERIC Educational Resources Information Center

    Chiaramella, Y.; Defude, B.

    1987-01-01

    Discusses expert systems and their value as components of information retrieval systems related to semantic inference, and describes IOTA, a model of an intelligent information retrieval system which emphasizes natural language query processing. Experimental results are discussed and current and future developments are highlighted. (Author/LRW)

  1. E = Mc(super 2) for the Chemist: When is Mass Conserved?

    ERIC Educational Resources Information Center

    Treptow. Richard S.

    2005-01-01

    An equation derived by Albert Einstein in 1905 that expresses a relationship between mass and energy, formulated as E = mc(super 2) is discussed with reference to the extent mass is conserved. This query can be used to challenge students and develop their language and critical thinking skills.

  2. A spatio-temporal index for aerial full waveform laser scanning data

    NASA Astrophysics Data System (ADS)

    Laefer, Debra F.; Vo, Anh-Vu; Bertolotto, Michela

    2018-04-01

    Aerial laser scanning is increasingly available in the full waveform version of the raw signal, which can provide greater insight into and control over the data and, thus, richer information about the scanned scenes. However, when compared to conventional discrete point storage, preserving raw waveforms leads to vastly larger and more complex data volumes. To begin addressing these challenges, this paper introduces a novel bi-level approach for storing and indexing full waveform (FWF) laser scanning data in a relational database environment, while considering both the spatial and the temporal dimensions of that data. In the storage scheme's upper level, the full waveform datasets are partitioned into spatial and temporal coherent groups that are indexed by a two-dimensional R∗-tree. To further accelerate intra-block data retrieval, at the lower level a three-dimensional local octree is created for each pulse block. The local octrees are implemented in-memory and can be efficiently written to a database for reuse. The indexing solution enables scalable and efficient three-dimensional (3D) spatial and spatio-temporal queries on the actual pulse data - functionalities not available in other systems. The proposed FWF laser scanning data solution is capable of managing multiple FWF datasets derived from large flight missions. The flight structure is embedded into the data storage model and can be used for querying predicates. Such functionality is important to FWF data exploration since aircraft locations and orientations are frequently required for FWF data analyses. Empirical tests on real datasets of up to 1 billion pulses from Dublin, Ireland prove the almost perfect scalability of the system. The use of the local 3D octree in the indexing structure accelerated pulse clipping by 1.2-3.5 times for non-axis-aligned (NAA) polyhedron shaped clipping windows, while axis-aligned (AA) polyhedron clipping was better served using only the top indexing layer. The distinct behaviours of the hybrid indexing for AA and NAA clipping windows are attributable to the different proportion of the local-index-related overheads with respect to the total querying costs. When temporal constraints were added, generally the number of costly spatial checks were reduced, thereby shortening the querying times.

  3. The (Spatial) Memory Game: Testing the Relationship Between Spatial Language, Object Knowledge, and Spatial Cognition.

    PubMed

    Gudde, Harmen B; Griffiths, Debra; Coventry, Kenny R

    2018-02-19

    The memory game paradigm is a behavioral procedure to explore the relationship between language, spatial memory, and object knowledge. Using two different versions of the paradigm, spatial language use and memory for object location are tested under different, experimentally manipulated conditions. This allows us to tease apart proposed models explaining the influence of object knowledge on spatial language (e.g., spatial demonstratives), and spatial memory, as well as understanding the parameters that affect demonstrative choice and spatial memory more broadly. Key to the development of the method was the need to collect data on language use (e.g., spatial demonstratives: "this/that") and spatial memory data under strictly controlled conditions, while retaining a degree of ecological validity. The language version (section 3.1) of the memory game tests how conditions affect language use. Participants refer verbally to objects placed at different locations (e.g., using spatial demonstratives: "this/that red circle"). Different parameters can be experimentally manipulated: the distance from the participant, the position of a conspecific, and for example whether the participant owns, knows, or sees the object while referring to it. The same parameters can be manipulated in the memory version of the memory game (section 3.2). This version tests the effects of the different conditions on object-location memory. Following object placement, participants get 10 seconds to memorize the object's location. After the object and location cues are removed, participants verbally direct the experimenter to move a stick to indicate where the object was. The difference between the memorized and the actual location shows the direction and strength of the memory error, allowing comparisons between the influences of the respective parameters.

  4. Advanced Query and Data Mining Capabilities for MaROS

    NASA Technical Reports Server (NTRS)

    Wang, Paul; Wallick, Michael N.; Allard, Daniel A.; Gladden, Roy E.; Hy, Franklin H.

    2013-01-01

    The Mars Relay Operational Service (MaROS) comprises a number of tools to coordinate, plan, and visualize various aspects of the Mars Relay network. These levels include a Web-based user interface, a back-end "ReSTlet" built in Java, and databases that store the data as it is received from the network. As part of MaROS, the innovators have developed and implemented a feature set that operates on several levels of the software architecture. This new feature is an advanced querying capability through either the Web-based user interface, or through a back-end REST interface to access all of the data gathered from the network. This software is not meant to replace the REST interface, but to augment and expand the range of available data. The current REST interface provides specific data that is used by the MaROS Web application to display and visualize the information; however, the returned information from the REST interface has typically been pre-processed to return only a subset of the entire information within the repository, particularly only the information that is of interest to the GUI (graphical user interface). The new, advanced query and data mining capabilities allow users to retrieve the raw data and/or to perform their own data processing. The query language used to access the repository is a restricted subset of the structured query language (SQL) that can be built safely from the Web user interface, or entered as freeform SQL by a user. The results are returned in a CSV (Comma Separated Values) format for easy exporting to third party tools and applications that can be used for data mining or user-defined visualization and interpretation. This is the first time that a service is capable of providing access to all cross-project relay data from a single Web resource. Because MaROS contains the data for a variety of missions from the Mars network, which span both NASA and ESA, the software also establishes an access control list (ACL) on each data record in the database repository to enforce user access permissions through a multilayered approach.

  5. Open Data, Jupyter Notebooks and Geospatial Data Standards Combined - Opening up large volumes of marine and climate data to other communities

    NASA Astrophysics Data System (ADS)

    Clements, O.; Siemen, S.; Wagemann, J.

    2017-12-01

    The EU-funded Earthserver-2 project aims to offer on-demand access to large volumes of environmental data (Earth Observation, Marine, Climate data and Planetary data) via the interface standard Web Coverage Service defined by the Open Geospatial Consortium. Providing access to data via OGC web services (e.g. WCS and WMS) has the potential to open up services to a wider audience, especially to users outside the respective communities. Especially WCS 2.0 with its processing extension Web Coverage Processing Service (WCPS) is highly beneficial to make large volumes accessible to non-expert communities. Users do not have to deal with custom community data formats, such as GRIB for the meteorological community, but can directly access the data in a format they are more familiar with, such as NetCDF, JSON or CSV. Data requests can further directly be integrated into custom processing routines and users are not required to download Gigabytes of data anymore. WCS supports trim (reduction of data extent) and slice (reduction of data dimension) operations on multi-dimensional data, providing users a very flexible on-demand access to the data. WCPS allows the user to craft queries to run on the data using a text-based query language, similar to SQL. These queries can be very powerful, e.g. condensing a three-dimensional data cube into its two-dimensional mean. However, the more processing-intensive the more complex the query. As part of the EarthServer-2 project, we developed a python library that helps users to generate complex WCPS queries with Python, a programming language they are more familiar with. The interactive presentation aims to give practical examples how users can benefit from two specific WCS services from the Marine and Climate community. Use-cases from the two communities will show different approaches to take advantage of a Web Coverage (Processing) Service. The entire content is available with Jupyter Notebooks, as they prove to be a highly beneficial tool to generate reproducible workflows for environmental data analysis.

  6. Understanding Language, Hearing Status, and Visual-Spatial Skills.

    PubMed

    Marschark, Marc; Spencer, Linda J; Durkin, Andreana; Borgna, Georgianna; Convertino, Carol; Machmer, Elizabeth; Kronenberger, William G; Trani, Alexandra

    2015-10-01

    It is frequently assumed that deaf individuals have superior visual-spatial abilities relative to hearing peers and thus, in educational settings, they are often considered visual learners. There is some empirical evidence to support the former assumption, although it is inconsistent, and apparently none to support the latter. Three experiments examined visual-spatial and related cognitive abilities among deaf individuals who varied in their preferred language modality and use of cochlear implants (CIs) and hearing individuals who varied in their sign language skills. Sign language and spoken language assessments accompanied tasks involving visual-spatial processing, working memory, nonverbal logical reasoning, and executive function. Results were consistent with other recent studies indicating no generalized visual-spatial advantage for deaf individuals and suggested that their performance in that domain may be linked to the strength of their preferred language skills regardless of modality. Hearing individuals performed more strongly than deaf individuals on several visual-spatial and self-reported executive functioning measures, regardless of sign language skills or use of CIs. Findings are inconsistent with assumptions that deaf individuals are visual learners or are superior to hearing individuals across a broad range of visual-spatial tasks. Further, performance of deaf and hearing individuals on the same visual-spatial tasks was associated with differing cognitive abilities, suggesting that different cognitive processes may be involved in visual-spatial processing in these groups. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  7. A Lightweight I/O Scheme to Facilitate Spatial and Temporal Queries of Scientific Data Analytics

    NASA Technical Reports Server (NTRS)

    Tian, Yuan; Liu, Zhuo; Klasky, Scott; Wang, Bin; Abbasi, Hasan; Zhou, Shujia; Podhorszki, Norbert; Clune, Tom; Logan, Jeremy; Yu, Weikuan

    2013-01-01

    In the era of petascale computing, more scientific applications are being deployed on leadership scale computing platforms to enhance the scientific productivity. Many I/O techniques have been designed to address the growing I/O bottleneck on large-scale systems by handling massive scientific data in a holistic manner. While such techniques have been leveraged in a wide range of applications, they have not been shown as adequate for many mission critical applications, particularly in data post-processing stage. One of the examples is that some scientific applications generate datasets composed of a vast amount of small data elements that are organized along many spatial and temporal dimensions but require sophisticated data analytics on one or more dimensions. Including such dimensional knowledge into data organization can be beneficial to the efficiency of data post-processing, which is often missing from exiting I/O techniques. In this study, we propose a novel I/O scheme named STAR (Spatial and Temporal AggRegation) to enable high performance data queries for scientific analytics. STAR is able to dive into the massive data, identify the spatial and temporal relationships among data variables, and accordingly organize them into an optimized multi-dimensional data structure before storing to the storage. This technique not only facilitates the common access patterns of data analytics, but also further reduces the application turnaround time. In particular, STAR is able to enable efficient data queries along the time dimension, a practice common in scientific analytics but not yet supported by existing I/O techniques. In our case study with a critical climate modeling application GEOS-5, the experimental results on Jaguar supercomputer demonstrate an improvement up to 73 times for the read performance compared to the original I/O method.

  8. A Story of a Crashed Plane in US-Mexican border

    NASA Astrophysics Data System (ADS)

    Bermudez, Luis; Hobona, Gobe; Vretanos, Peter; Peterson, Perry

    2013-04-01

    A plane has crashed on the US-Mexican border. The search and rescue command center planner needs to find information about the crash site, a mountain, nearby mountains for the establishment of a communications tower, as well as ranches for setting up a local incident center. Events like this one occur all over the world and exchanging information seamlessly is key to save lives and prevent further disasters. This abstract describes an interoperability testbed that applied this scenario using technologies based on Open Geospatial Consortium (OGC) standards. The OGC, which has about 500 members, serves as a global forum for the collaboration of developers and users of spatial data products and services, and to advance the development of international standards for geospatial interoperability. The OGC Interoperability Program conducts international interoperability testbeds, such as the OGC Web Services Phase 9 (OWS-9), that encourages rapid development, testing, validation, demonstration and adoption of open, consensus based standards and best practices. The Cross-Community Interoperability (CCI) thread in OWS-9 advanced the Web Feature Service for Gazetteers (WFS-G) by providing a Single Point of Entry Global Gazetteer (SPEGG), where a user can submit a single query and access global geographic names data across multiple Federal names databases. Currently users must make two queries with differing input parameters against two separate databases to obtain authoritative cross border geographic names data. The gazetteers in this scenario included: GNIS and GNS. GNIS or Geographic Names Information System is managed by USGS. It was first developed in 1964 and contains information about domestic and Antarctic names. GNS or GeoNET Names Server provides the Geographic Names Data Base (GNDB) and it is managed by National Geospatial Intelligence Agency (NGA). GNS has been in service since 1994, and serves names for areas outside the United States and its dependent areas, as well as names for undersea features. The following challenges were advanced: Cascaded WFS-G servers (allowing to query multiple WFSs with a "parent" WFS), implemented query names filters (e.g. fuzzy search, text search), implemented dealing with multilingualism and diacritics, implemented advanced spatial constraints (e.g. search by radial search and nearest neighbor) and semantically mediated feature types (e.g. mountain vs. hill). To enable semantic mediation, a series of semantic mappings were defined between the NGA GNS, USGS GNIS and the Alexandria Digital Library (ADL) Gazetteer. The mappings were encoded in the Web Ontology Language (OWL) to enable them to be used by semantic web technologies. The semantic mappings were then published for ingestion into a semantic mediator that used the mappings to associate location types from one gazetteer with location types in another. The semantic mediator was then able to transform requests on the fly, providing a single point of entry WFS-G to multiple gazetteers. The presentation will provide a live presentation of the work performed, highlight main developments, and discuss future development.

  9. Earth-Base: A Free And Open Source, RESTful Earth Sciences Platform

    NASA Astrophysics Data System (ADS)

    Kishor, P.; Heim, N. A.; Peters, S. E.; McClennen, M.

    2012-12-01

    This presentation describes the motivation, concept, and architecture behind Earth-Base, a web-based, RESTful data-management, analysis and visualization platform for earth sciences data. Traditionally web applications have been built directly accessing data from a database using a scripting language. While such applications are great at bring results to a wide audience, they are limited in scope to the imagination and capabilities of the application developer. Earth-Base decouples the data store from the web application by introducing an intermediate "data application" tier. The data application's job is to query the data store using self-documented, RESTful URIs, and send the results back formatted as JavaScript Object Notation (JSON). Decoupling the data store from the application allows virtually limitless flexibility in developing applications, both web-based for human consumption or programmatic for machine consumption. It also allows outside developers to use the data in their own applications, potentially creating applications that the original data creator and app developer may not have even thought of. Standardized specifications for URI-based querying and JSON-formatted results make querying and developing applications easy. URI-based querying also allows utilizing distributed datasets easily. Companion mechanisms for querying data snapshots aka time-travel, usage tracking and license management, and verification of semantic equivalence of data are also described. The latter promotes the "What You Expect Is What You Get" (WYEIWYG) principle that can aid in data citation and verification.

  10. Virtual Observatory Interfaces to the Chandra Data Archive

    NASA Astrophysics Data System (ADS)

    Tibbetts, M.; Harbo, P.; Van Stone, D.; Zografou, P.

    2014-05-01

    The Chandra Data Archive (CDA) plays a central role in the operation of the Chandra X-ray Center (CXC) by providing access to Chandra data. Proprietary interfaces have been the backbone of the CDA throughout the Chandra mission. While these interfaces continue to provide the depth and breadth of mission specific access Chandra users expect, the CXC has been adding Virtual Observatory (VO) interfaces to the Chandra proposal catalog and observation catalog. VO interfaces provide standards-based access to Chandra data through simple positional queries or more complex queries using the Astronomical Data Query Language. Recent development at the CDA has generalized our existing VO services to create a suite of services that can be configured to provide VO interfaces to any dataset. This approach uses a thin web service layer for the individual VO interfaces, a middle-tier query component which is shared among the VO interfaces for parsing, scheduling, and executing queries, and existing web services for file and data access. The CXC VO services provide Simple Cone Search (SCS), Simple Image Access (SIA), and Table Access Protocol (TAP) implementations for both the Chandra proposal and observation catalogs within the existing archive architecture. Our work with the Chandra proposal and observation catalogs, as well as additional datasets beyond the CDA, illustrates how we can provide configurable VO services to extend core archive functionality.

  11. Semantator: semantic annotator for converting biomedical text to linked data.

    PubMed

    Tao, Cui; Song, Dezhao; Sharma, Deepak; Chute, Christopher G

    2013-10-01

    More than 80% of biomedical data is embedded in plain text. The unstructured nature of these text-based documents makes it challenging to easily browse and query the data of interest in them. One approach to facilitate browsing and querying biomedical text is to convert the plain text to a linked web of data, i.e., converting data originally in free text to structured formats with defined meta-level semantics. In this paper, we introduce Semantator (Semantic Annotator), a semantic-web-based environment for annotating data of interest in biomedical documents, browsing and querying the annotated data, and interactively refining annotation results if needed. Through Semantator, information of interest can be either annotated manually or semi-automatically using plug-in information extraction tools. The annotated results will be stored in RDF and can be queried using the SPARQL query language. In addition, semantic reasoners can be directly applied to the annotated data for consistency checking and knowledge inference. Semantator has been released online and was used by the biomedical ontology community who provided positive feedbacks. Our evaluation results indicated that (1) Semantator can perform the annotation functionalities as designed; (2) Semantator can be adopted in real applications in clinical and transactional research; and (3) the annotated results using Semantator can be easily used in Semantic-web-based reasoning tools for further inference. Copyright © 2013 Elsevier Inc. All rights reserved.

  12. A spatial data handling system for retrieval of images by unrestricted regions of user interest

    NASA Technical Reports Server (NTRS)

    Dorfman, Erik; Cromp, Robert F.

    1992-01-01

    The Intelligent Data Management (IDM) project at NASA/Goddard Space Flight Center has prototyped an Intelligent Information Fusion System (IIFS), which automatically ingests metadata from remote sensor observations into a large catalog which is directly queryable by end-users. The greatest challenge in the implementation of this catalog was supporting spatially-driven searches, where the user has a possible complex region of interest and wishes to recover those images that overlap all or simply a part of that region. A spatial data management system is described, which is capable of storing and retrieving records of image data regardless of their source. This system was designed and implemented as part of the IIFS catalog. A new data structure, called a hypercylinder, is central to the design. The hypercylinder is specifically tailored for data distributed over the surface of a sphere, such as satellite observations of the Earth or space. Operations on the hypercylinder are regulated by two expert systems. The first governs the ingest of new metadata records, and maintains the efficiency of the data structure as it grows. The second translates, plans, and executes users' spatial queries, performing incremental optimization as partial query results are returned.

  13. Ontology for Transforming Geo-Spatial Data for Discovery and Integration of Scientific Data

    NASA Astrophysics Data System (ADS)

    Nguyen, L.; Chee, T.; Minnis, P.

    2013-12-01

    Discovery and access to geo-spatial scientific data across heterogeneous repositories and multi-discipline datasets can present challenges for scientist. We propose to build a workflow for transforming geo-spatial datasets into semantic environment by using relationships to describe the resource using OWL Web Ontology, RDF, and a proposed geo-spatial vocabulary. We will present methods for transforming traditional scientific dataset, use of a semantic repository, and querying using SPARQL to integrate and access datasets. This unique repository will enable discovery of scientific data by geospatial bound or other criteria.

  14. Introduction to SQL. Ch. 1

    NASA Technical Reports Server (NTRS)

    McGlynn, T.; Santisteban, M.

    2007-01-01

    This chapter provides a very brief introduction to the Structured Query Language (SQL) for getting information from relational databases. We make no pretense that this is a complete or comprehensive discussion of SQL. There are many aspects of the language the will be completely ignored in the presentation. The goal here is to provide enough background so that users understand the basic concepts involved in building and using relational databases. We also go through the steps involved in building a particular astronomical database used in some of the other presentations in this volume.

  15. Translation lexicon acquisition from bilingual dictionaries

    NASA Astrophysics Data System (ADS)

    Doermann, David S.; Ma, Huanfeng; Karagol-Ayan, Burcu; Oard, Douglas W.

    2001-12-01

    Bilingual dictionaries hold great potential as a source of lexical resources for training automated systems for optical character recognition, machine translation and cross-language information retrieval. In this work we describe a system for extracting term lexicons from printed copies of bilingual dictionaries. We describe our approach to page and definition segmentation and entry parsing. We have used the approach to parse a number of dictionaries and demonstrate the results for retrieval using a French-English Dictionary to generate a translation lexicon and a corpus of English queries applied to French documents to evaluation cross-language IR.

  16. Heterogeneous distributed query processing: The DAVID system

    NASA Technical Reports Server (NTRS)

    Jacobs, Barry E.

    1985-01-01

    The objective of the Distributed Access View Integrated Database (DAVID) project is the development of an easy to use computer system with which NASA scientists, engineers and administrators can uniformly access distributed heterogeneous databases. Basically, DAVID will be a database management system that sits alongside already existing database and file management systems. Its function is to enable users to access the data in other languages and file systems without having to learn the data manipulation languages. Given here is an outline of a talk on the DAVID project and several charts.

  17. What Is Spatio-Temporal Data Warehousing?

    NASA Astrophysics Data System (ADS)

    Vaisman, Alejandro; Zimányi, Esteban

    In the last years, extending OLAP (On-Line Analytical Processing) systems with spatial and temporal features has attracted the attention of the GIS (Geographic Information Systems) and database communities. However, there is no a commonly agreed definition of what is a spatio-temporal data warehouse and what functionality such a data warehouse should support. Further, the solutions proposed in the literature vary considerably in the kind of data that can be represented as well as the kind of queries that can be expressed. In this paper we present a conceptual framework for defining spatio-temporal data warehouses using an extensible data type system. We also define a taxonomy of different classes of queries of increasing expressive power, and show how to express such queries using an extension of the tuple relational calculus with aggregated functions.

  18. HC StratoMineR: A Web-Based Tool for the Rapid Analysis of High-Content Datasets.

    PubMed

    Omta, Wienand A; van Heesbeen, Roy G; Pagliero, Romina J; van der Velden, Lieke M; Lelieveld, Daphne; Nellen, Mehdi; Kramer, Maik; Yeong, Marley; Saeidi, Amir M; Medema, Rene H; Spruit, Marco; Brinkkemper, Sjaak; Klumperman, Judith; Egan, David A

    2016-10-01

    High-content screening (HCS) can generate large multidimensional datasets and when aligned with the appropriate data mining tools, it can yield valuable insights into the mechanism of action of bioactive molecules. However, easy-to-use data mining tools are not widely available, with the result that these datasets are frequently underutilized. Here, we present HC StratoMineR, a web-based tool for high-content data analysis. It is a decision-supportive platform that guides even non-expert users through a high-content data analysis workflow. HC StratoMineR is built by using My Structured Query Language for storage and querying, PHP: Hypertext Preprocessor as the main programming language, and jQuery for additional user interface functionality. R is used for statistical calculations, logic and data visualizations. Furthermore, C++ and graphical processor unit power is diffusely embedded in R by using the rcpp and rpud libraries for operations that are computationally highly intensive. We show that we can use HC StratoMineR for the analysis of multivariate data from a high-content siRNA knock-down screen and a small-molecule screen. It can be used to rapidly filter out undesirable data; to select relevant data; and to perform quality control, data reduction, data exploration, morphological hit picking, and data clustering. Our results demonstrate that HC StratoMineR can be used to functionally categorize HCS hits and, thus, provide valuable information for hit prioritization.

  19. Spatial Data Services for Interdisciplinary Applications from the NASA Socioeconomic Data and Applications Center

    NASA Astrophysics Data System (ADS)

    Chen, R. S.; MacManus, K.; Vinay, S.; Yetman, G.

    2016-12-01

    The Socioeconomic Data and Applications Center (SEDAC), one of 12 Distributed Active Archive Centers (DAACs) in the NASA Earth Observing System Data and Information System (EOSDIS), has developed a variety of operational spatial data services aimed at providing online access, visualization, and analytic functions for geospatial socioeconomic and environmental data. These services include: open web services that implement Open Geospatial Consortium (OGC) specifications such as Web Map Service (WMS), Web Feature Service (WFS), and Web Coverage Service (WCS); spatial query services that support Web Processing Service (WPS) and Representation State Transfer (REST); and web map clients and a mobile app that utilize SEDAC and other open web services. These services may be accessed from a variety of external map clients and visualization tools such as NASA's WorldView, NOAA's Climate Explorer, and ArcGIS Online. More than 200 data layers related to population, settlements, infrastructure, agriculture, environmental pollution, land use, health, hazards, climate change and other aspects of sustainable development are available through WMS, WFS, and/or WCS. Version 2 of the SEDAC Population Estimation Service (PES) supports spatial queries through WPS and REST in the form of a user-defined polygon or circle. The PES returns an estimate of the population residing in the defined area for a specific year (2000, 2005, 2010, 2015, or 2020) based on SEDAC's Gridded Population of the World version 4 (GPWv4) dataset, together with measures of accuracy. The SEDAC Hazards Mapper and the recently released HazPop iOS mobile app enable users to easily submit spatial queries to the PES and see the results. SEDAC has developed an operational virtualized backend infrastructure to manage these services and support their continual improvement as standards change, new data and services become available, and user needs evolve. An ongoing challenge is to improve the reliability and performance of the infrastructure, in conjunction with external services, to meet both research and operational needs.

  20. Spatial Language, Visual Attention, and Perceptual Simulation

    ERIC Educational Resources Information Center

    Coventry, Kenny R.; Lynott, Dermot; Cangelosi, Angelo; Monrouxe, Lynn; Joyce, Dan; Richardson, Daniel C.

    2010-01-01

    Spatial language descriptions, such as "The bottle is over the glass", direct the attention of the hearer to particular aspects of the visual world. This paper asks how they do so, and what brain mechanisms underlie this process. In two experiments employing behavioural and eye tracking methodologies we examined the effects of spatial language on…

  1. A database de-identification framework to enable direct queries on medical data for secondary use.

    PubMed

    Erdal, B S; Liu, J; Ding, J; Chen, J; Marsh, C B; Kamal, J; Clymer, B D

    2012-01-01

    To qualify the use of patient clinical records as non-human-subject for research purpose, electronic medical record data must be de-identified so there is minimum risk to protected health information exposure. This study demonstrated a robust framework for structured data de-identification that can be applied to any relational data source that needs to be de-identified. Using a real world clinical data warehouse, a pilot implementation of limited subject areas were used to demonstrate and evaluate this new de-identification process. Query results and performances are compared between source and target system to validate data accuracy and usability. The combination of hashing, pseudonyms, and session dependent randomizer provides a rigorous de-identification framework to guard against 1) source identifier exposure; 2) internal data analyst manually linking to source identifiers; and 3) identifier cross-link among different researchers or multiple query sessions by the same researcher. In addition, a query rejection option is provided to refuse queries resulting in less than preset numbers of subjects and total records to prevent users from accidental subject identification due to low volume of data. This framework does not prevent subject re-identification based on prior knowledge and sequence of events. Also, it does not deal with medical free text de-identification, although text de-identification using natural language processing can be included due its modular design. We demonstrated a framework resulting in HIPAA Compliant databases that can be directly queried by researchers. This technique can be augmented to facilitate inter-institutional research data sharing through existing middleware such as caGrid.

  2. Aggregating Queries Against Large Inventories of Remotely Accessible Data

    NASA Astrophysics Data System (ADS)

    Gallagher, J. H. R.; Fulker, D. W.

    2016-12-01

    Those seeking to discover data for a specific purpose often encounter search results that are so large as to be useless without computing assistance. This situation arises, with increasing frequency, in part because repositories contain ever greater numbers of granules, and their granularities may well be poorly aligned or even orthogonal to the data-selection needs of the user. This presentation describes a recently developed service for simultaneously querying large lists of OPeNDAP-accessible granules to extract specified data. The specifications include a richly expressive set of data-selection criteria—applicable to content as well as metadata—and the service has been tested successfully against lists naming hundreds of thousands of granules. Querying such numbers of local files (i.e., granules) on a desktop or laptop computer is practical (by using a scripting language, e.g.), but this practicality is diminished when the data are remote and thus best accessed through a Web-services interface. In these cases, which are increasingly common, scripted queries can take many hours because of inherent network latencies. Furthermore, communication dropouts can add fragility to such scripts, yielding gaps in the acquired results. In contrast, OPeNDAP's new aggregated-query services enable data discovery in the context of very large inventory sizes. These capabilities have been developed for use with OPeNDAP's Hyrax server, which is an open-source realization of DAP (for "Data Access Protocol," a specification widely used in NASA, NOAA and other data-intensive contexts). These aggregated-query services exhibit good response times (on the order of seconds, not hours) even for inventories that list hundreds of thousands of source granules.

  3. Challenges facing the development of the Arabic chatbot

    NASA Astrophysics Data System (ADS)

    AlHagbani, Eman Saad; Khan, Muhammad Badruddin

    2016-07-01

    The future information systems are expected to be more intelligent and will take human queries in natural language as input and answer them promptly. To develop a chatbot or a computer program that can chat with humans in realistic manner to extent that human get impressions that he/she is talking with other human is a challenging task. To make such chatbots, different technologies will work together ranging from artificial intelligence to development of semantic resources. Sophisticated chatbots are developed to perform conversation in number of languages. Arabic chatbots can be helpful in automating many operations and serve people who only know Arabic language. However, the technology for Arabic language is still in its infancy stage due to some challenges surrounding the Arabic language. This paper offers an overview of the chatbot application and the several obstacles and challenges that need to be resolved to develop an effective Arabic chatbot.

  4. UCLA at TREC 2014 Clinical Decision Support Track: Exploring Language Models, Query Expansion, and Boosting

    DTIC Science & Technology

    2014-11-01

    for 6 months. Median performance for this topic was relatively low, despite being an easy diagnosis of hypothyroidism for a medical expert. However... hypothyroidism was ranked 3rd in the retrieval results. Without boosting, the highest-ranked article on hypothyroidism was ranked 16th. In contrast, this

  5. "Garbage" In, "Refuse and Refuse Disposal" Out: Making the Most of the Subject Authority File in OPAC.

    ERIC Educational Resources Information Center

    Horn, Marguerite E.

    2002-01-01

    Discusses the difference in subject access in OPACs (online public access catalogs) between subject searching (authority, alphabetic, or controlled vocabulary) versus keyword searching (uncontrolled, free text, natural language vocabulary). Compares a query on the term "garbage" in two online catalogs and discusses results. (Author/LRW)

  6. Alaska High School Seniors Survey Report, 1979-80.

    ERIC Educational Resources Information Center

    Alaska State Commission on Postsecondary Education, Juneau.

    Public and private high school seniors from Alaska were surveyed in an effort to document the pattern of postsecondary education outside the state and to understand the underlying motivations of the "brain drain." For 1979-1980, 3,295 seniors responded (57 percent) to queries on their sex, race, primary home language, family income,…

  7. Adaptation of machine translation for multilingual information retrieval in the medical domain.

    PubMed

    Pecina, Pavel; Dušek, Ondřej; Goeuriot, Lorraine; Hajič, Jan; Hlaváčová, Jaroslava; Jones, Gareth J F; Kelly, Liadh; Leveling, Johannes; Mareček, David; Novák, Michal; Popel, Martin; Rosa, Rudolf; Tamchyna, Aleš; Urešová, Zdeňka

    2014-07-01

    We investigate machine translation (MT) of user search queries in the context of cross-lingual information retrieval (IR) in the medical domain. The main focus is on techniques to adapt MT to increase translation quality; however, we also explore MT adaptation to improve effectiveness of cross-lingual IR. Our MT system is Moses, a state-of-the-art phrase-based statistical machine translation system. The IR system is based on the BM25 retrieval model implemented in the Lucene search engine. The MT techniques employed in this work include in-domain training and tuning, intelligent training data selection, optimization of phrase table configuration, compound splitting, and exploiting synonyms as translation variants. The IR methods include morphological normalization and using multiple translation variants for query expansion. The experiments are performed and thoroughly evaluated on three language pairs: Czech-English, German-English, and French-English. MT quality is evaluated on data sets created within the Khresmoi project and IR effectiveness is tested on the CLEF eHealth 2013 data sets. The search query translation results achieved in our experiments are outstanding - our systems outperform not only our strong baselines, but also Google Translate and Microsoft Bing Translator in direct comparison carried out on all the language pairs. The baseline BLEU scores increased from 26.59 to 41.45 for Czech-English, from 23.03 to 40.82 for German-English, and from 32.67 to 40.82 for French-English. This is a 55% improvement on average. In terms of the IR performance on this particular test collection, a significant improvement over the baseline is achieved only for French-English. For Czech-English and German-English, the increased MT quality does not lead to better IR results. Most of the MT techniques employed in our experiments improve MT of medical search queries. Especially the intelligent training data selection proves to be very successful for domain adaptation of MT. Certain improvements are also obtained from German compound splitting on the source language side. Translation quality, however, does not appear to correlate with the IR performance - better translation does not necessarily yield better retrieval. We discuss in detail the contribution of the individual techniques and state-of-the-art features and provide future research directions. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. A Semantic Parsing Method for Mapping Clinical Questions to Logical Forms

    PubMed Central

    Roberts, Kirk; Patra, Braja Gopal

    2017-01-01

    This paper presents a method for converting natural language questions about structured data in the electronic health record (EHR) into logical forms. The logical forms can then subsequently be converted to EHR-dependent structured queries. The natural language processing task, known as semantic parsing, has the potential to convert questions to logical forms with extremely high precision, resulting in a system that is usable and trusted by clinicians for real-time use in clinical settings. We propose a hybrid semantic parsing method, combining rule-based methods with a machine learning-based classifier. The overall semantic parsing precision on a set of 212 questions is 95.6%. The parser’s rules furthermore allow it to “know what it does not know”, enabling the system to indicate when unknown terms prevent it from understanding the question’s full logical structure. When combined with a module for converting a logical form into an EHR-dependent query, this high-precision approach allows for a question answering system to provide a user with a single, verifiably correct answer. PMID:29854217

  9. Design and implementation of an open source indexing solution for a large set of radiological reports and images.

    PubMed

    Voet, T; Devolder, P; Pynoo, B; Vercruysse, J; Duyck, P

    2007-11-01

    This paper hopes to share the insights we experienced during designing, building, and running an indexing solution for a large set of radiological reports and images in a production environment for more than 3 years. Several technical challenges were encountered and solved in the course of this project. One hundred four million words in 1.8 million radiological reports from 1989 to the present were indexed and became instantaneously searchable in a user-friendly fashion; the median query duration is only 31 ms. Currently, our highly tuned index holds 332,088 unique words in four languages. The indexing system is feature-rich and language-independent and allows for making complex queries. For research and training purposes it certainly is a valuable and convenient addition to our radiology informatics toolbox. Extended use of open-source technology dramatically reduced both implementation time and cost. All software we developed related to the indexing project has been made available to the open-source community covered by an unrestricted Berkeley Software Distribution-style license.

  10. Data-driven decision support for radiologists: re-using the National Lung Screening Trial dataset for pulmonary nodule management.

    PubMed

    Morrison, James J; Hostetter, Jason; Wang, Kenneth; Siegel, Eliot L

    2015-02-01

    Real-time mining of large research trial datasets enables development of case-based clinical decision support tools. Several applicable research datasets exist including the National Lung Screening Trial (NLST), a dataset unparalleled in size and scope for studying population-based lung cancer screening. Using these data, a clinical decision support tool was developed which matches patient demographics and lung nodule characteristics to a cohort of similar patients. The NLST dataset was converted into Structured Query Language (SQL) tables hosted on a web server, and a web-based JavaScript application was developed which performs real-time queries. JavaScript is used for both the server-side and client-side language, allowing for rapid development of a robust client interface and server-side data layer. Real-time data mining of user-specified patient cohorts achieved a rapid return of cohort cancer statistics and lung nodule distribution information. This system demonstrates the potential of individualized real-time data mining using large high-quality clinical trial datasets to drive evidence-based clinical decision-making.

  11. A Novel Visual Interface to Foster Innovation in Mechanical Engineering and Protect from Patent Infringement

    NASA Astrophysics Data System (ADS)

    Sorce, Salvatore; Malizia, Alessio; Jiang, Pingfei; Atherton, Mark; Harrison, David

    2018-04-01

    One of the main time and money consuming tasks in the design of industrial devices and parts is the checking of possible patent infringements. Indeed, the great number of documents to be mined and the wide variety of technical language used to describe inventions are reasons why considerable amounts of time may be needed. On the other hand, the early detection of a possible patent conflict, in addition to reducing the risk of legal disputes, could stimulate a designers’ creativity to overcome similarities in overlapping patents. For this reason, there are a lot of existing patent analysis systems, each with its own features and access modes. We have designed a visual interface providing an intuitive access to such systems, freeing the designers from the specific knowledge of querying languages and providing them with visual clues. We tested the interface on a framework aimed at representing mechanical engineering patents; the framework is based on a semantic database and provides patent conflict analysis for early-stage designs. The interface supports a visual query composition to obtain a list of potentially overlapping designs.

  12. Olelo: a web application for intuitive exploration of biomedical literature

    PubMed Central

    Niedermeier, Julian; Jankrift, Marcel; Tietböhl, Sören; Stachewicz, Toni; Folkerts, Hendrik; Uflacker, Matthias; Neves, Mariana

    2017-01-01

    Abstract Researchers usually query the large biomedical literature in PubMed via keywords, logical operators and filters, none of which is very intuitive. Question answering systems are an alternative to keyword searches. They allow questions in natural language as input and results reflect the given type of question, such as short answers and summaries. Few of those systems are available online but they experience drawbacks in terms of long response times and they support a limited amount of question and result types. Additionally, user interfaces are usually restricted to only displaying the retrieved information. For our Olelo web application, we combined biomedical literature and terminologies in a fast in-memory database to enable real-time responses to researchers’ queries. Further, we extended the built-in natural language processing features of the database with question answering and summarization procedures. Combined with a new explorative approach of document filtering and a clean user interface, Olelo enables a fast and intelligent search through the ever-growing biomedical literature. Olelo is available at http://www.hpi.de/plattner/olelo. PMID:28472397

  13. PIML: the Pathogen Information Markup Language.

    PubMed

    He, Yongqun; Vines, Richard R; Wattam, Alice R; Abramochkin, Georgiy V; Dickerman, Allan W; Eckart, J Dana; Sobral, Bruno W S

    2005-01-01

    A vast amount of information about human, animal and plant pathogens has been acquired, stored and displayed in varied formats through different resources, both electronically and otherwise. However, there is no community standard format for organizing this information or agreement on machine-readable format(s) for data exchange, thereby hampering interoperation efforts across information systems harboring such infectious disease data. The Pathogen Information Markup Language (PIML) is a free, open, XML-based format for representing pathogen information. XSLT-based visual presentations of valid PIML documents were developed and can be accessed through the PathInfo website or as part of the interoperable web services federation known as ToolBus/PathPort. Currently, detailed PIML documents are available for 21 pathogens deemed of high priority with regard to public health and national biological defense. A dynamic query system allows simple queries as well as comparisons among these pathogens. Continuing efforts are being taken to include other groups' supporting PIML and to develop more PIML documents. All the PIML-related information is accessible from http://www.vbi.vt.edu/pathport/pathinfo/

  14. Cyberinfrastructure for the digital brain: spatial standards for integrating rodent brain atlases

    PubMed Central

    Zaslavsky, Ilya; Baldock, Richard A.; Boline, Jyl

    2014-01-01

    Biomedical research entails capture and analysis of massive data volumes and new discoveries arise from data-integration and mining. This is only possible if data can be mapped onto a common framework such as the genome for genomic data. In neuroscience, the framework is intrinsically spatial and based on a number of paper atlases. This cannot meet today's data-intensive analysis and integration challenges. A scalable and extensible software infrastructure that is standards based but open for novel data and resources, is required for integrating information such as signal distributions, gene-expression, neuronal connectivity, electrophysiology, anatomy, and developmental processes. Therefore, the International Neuroinformatics Coordinating Facility (INCF) initiated the development of a spatial framework for neuroscience data integration with an associated Digital Atlasing Infrastructure (DAI). A prototype implementation of this infrastructure for the rodent brain is reported here. The infrastructure is based on a collection of reference spaces to which data is mapped at the required resolution, such as the Waxholm Space (WHS), a 3D reconstruction of the brain generated using high-resolution, multi-channel microMRI. The core standards of the digital atlasing service-oriented infrastructure include Waxholm Markup Language (WaxML): XML schema expressing a uniform information model for key elements such as coordinate systems, transformations, points of interest (POI)s, labels, and annotations; and Atlas Web Services: interfaces for querying and updating atlas data. The services return WaxML-encoded documents with information about capabilities, spatial reference systems (SRSs) and structures, and execute coordinate transformations and POI-based requests. Key elements of INCF-DAI cyberinfrastructure have been prototyped for both mouse and rat brain atlas sources, including the Allen Mouse Brain Atlas, UCSD Cell-Centered Database, and Edinburgh Mouse Atlas Project. PMID:25309417

  15. Cyberinfrastructure for the digital brain: spatial standards for integrating rodent brain atlases.

    PubMed

    Zaslavsky, Ilya; Baldock, Richard A; Boline, Jyl

    2014-01-01

    Biomedical research entails capture and analysis of massive data volumes and new discoveries arise from data-integration and mining. This is only possible if data can be mapped onto a common framework such as the genome for genomic data. In neuroscience, the framework is intrinsically spatial and based on a number of paper atlases. This cannot meet today's data-intensive analysis and integration challenges. A scalable and extensible software infrastructure that is standards based but open for novel data and resources, is required for integrating information such as signal distributions, gene-expression, neuronal connectivity, electrophysiology, anatomy, and developmental processes. Therefore, the International Neuroinformatics Coordinating Facility (INCF) initiated the development of a spatial framework for neuroscience data integration with an associated Digital Atlasing Infrastructure (DAI). A prototype implementation of this infrastructure for the rodent brain is reported here. The infrastructure is based on a collection of reference spaces to which data is mapped at the required resolution, such as the Waxholm Space (WHS), a 3D reconstruction of the brain generated using high-resolution, multi-channel microMRI. The core standards of the digital atlasing service-oriented infrastructure include Waxholm Markup Language (WaxML): XML schema expressing a uniform information model for key elements such as coordinate systems, transformations, points of interest (POI)s, labels, and annotations; and Atlas Web Services: interfaces for querying and updating atlas data. The services return WaxML-encoded documents with information about capabilities, spatial reference systems (SRSs) and structures, and execute coordinate transformations and POI-based requests. Key elements of INCF-DAI cyberinfrastructure have been prototyped for both mouse and rat brain atlas sources, including the Allen Mouse Brain Atlas, UCSD Cell-Centered Database, and Edinburgh Mouse Atlas Project.

  16. Plasticity of Human Spatial Cognition: Spatial Language and Cognition Covary across Cultures

    ERIC Educational Resources Information Center

    Haun, Daniel B. M.; Rapold, Christian J.; Janzen, Gabriele; Levinson, Stephen C.

    2011-01-01

    The present paper explores cross-cultural variation in spatial cognition by comparing spatial reconstruction tasks by Dutch and Namibian elementary school children. These two communities differ in the way they predominantly express spatial relations in language. Four experiments investigate cognitive strategy preferences across different levels of…

  17. Datacube Services in Action, Using Open Source and Open Standards

    NASA Astrophysics Data System (ADS)

    Baumann, P.; Misev, D.

    2016-12-01

    Array Databases comprise novel, promising technology for massive spatio-temporal datacubes, extending the SQL paradigm of "any query, anytime" to n-D arrays. On server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. The rasdaman ("raster data manager") system, which has pioneered Array Databases, is available in open source on www.rasdaman.org. Its declarative query language extends SQL with array operators which are optimized and parallelized on server side. The rasdaman engine, which is part of OSGeo Live, is mature and in operational use databases individually holding dozens of Terabytes. Further, the rasdaman concepts have strongly impacted international Big Data standards in the field, including the forthcoming MDA ("Multi-Dimensional Array") extension to ISO SQL, the OGC Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS) standards, and the forthcoming INSPIRE WCS/WCPS; in both OGC and INSPIRE, OGC is WCS Core Reference Implementation. In our talk we present concepts, architecture, operational services, and standardization impact of open-source rasdaman, as well as experiences made.

  18. Ontology-based vector space model and fuzzy query expansion to retrieve knowledge on medical computational problem solutions.

    PubMed

    Bratsas, Charalampos; Koutkias, Vassilis; Kaimakamis, Evangelos; Bamidis, Panagiotis; Maglaveras, Nicos

    2007-01-01

    Medical Computational Problem (MCP) solving is related to medical problems and their computerized algorithmic solutions. In this paper, an extension of an ontology-based model to fuzzy logic is presented, as a means to enhance the information retrieval (IR) procedure in semantic management of MCPs. We present herein the methodology followed for the fuzzy expansion of the ontology model, the fuzzy query expansion procedure, as well as an appropriate ontology-based Vector Space Model (VSM) that was constructed for efficient mapping of user-defined MCP search criteria and MCP acquired knowledge. The relevant fuzzy thesaurus is constructed by calculating the simultaneous occurrences of terms and the term-to-term similarities derived from the ontology that utilizes UMLS (Unified Medical Language System) concepts by using Concept Unique Identifiers (CUI), synonyms, semantic types, and broader-narrower relationships for fuzzy query expansion. The current approach constitutes a sophisticated advance for effective, semantics-based MCP-related IR.

  19. Query optimization for graph analytics on linked data using SPARQL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hong, Seokyong; Lee, Sangkeun; Lim, Seung -Hwan

    2015-07-01

    Triplestores that support query languages such as SPARQL are emerging as the preferred and scalable solution to represent data and meta-data as massive heterogeneous graphs using Semantic Web standards. With increasing adoption, the desire to conduct graph-theoretic mining and exploratory analysis has also increased. Addressing that desire, this paper presents a solution that is the marriage of Graph Theory and the Semantic Web. We present software that can analyze Linked Data using graph operations such as counting triangles, finding eccentricity, testing connectedness, and computing PageRank directly on triple stores via the SPARQL interface. We describe the process of optimizing performancemore » of the SPARQL-based implementation of such popular graph algorithms by reducing the space-overhead, simplifying iterative complexity and removing redundant computations by understanding query plans. Our optimized approach shows significant performance gains on triplestores hosted on stand-alone workstations as well as hardware-optimized scalable supercomputers such as the Cray XMT.« less

  20. Implied motion language can influence visual spatial memory.

    PubMed

    Vinson, David W; Engelen, Jan; Zwaan, Rolf A; Matlock, Teenie; Dale, Rick

    2017-07-01

    How do language and vision interact? Specifically, what impact can language have on visual processing, especially related to spatial memory? What are typically considered errors in visual processing, such as remembering the location of an object to be farther along its motion trajectory than it actually is, can be explained as perceptual achievements that are driven by our ability to anticipate future events. In two experiments, we tested whether the prior presentation of motion language influences visual spatial memory in ways that afford greater perceptual prediction. Experiment 1 showed that motion language influenced judgments for the spatial memory of an object beyond the known effects of implied motion present in the image itself. Experiment 2 replicated this finding. Our findings support a theory of perception as prediction.

  1. Federated ontology-based queries over cancer data

    PubMed Central

    2012-01-01

    Background Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. In particular, this is valid in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast amounts of data, however, coupled with the use of different terms - or semantic heterogeneity - in each discipline makes the retrieval and integration of information difficult. Results Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality is given at the structural metadata level, without capitalising on the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures. A graphical user interface has been developed, supporting ontology-based queries over caGrid data sources. An extensive evaluation of the query reformulation technique is included. Conclusions To support personalised medicine in oncology, it is crucial to retrieve and integrate molecular, pathology, radiology and clinical data in an efficient manner. The semantic heterogeneity of the data makes this a challenging task. Ontologies provide a formal framework to support querying and integration. This paper provides an ontology-based solution for querying distributed databases over service-oriented, model-driven infrastructures. PMID:22373043

  2. Translingual Literacy, Language Difference, and Matters of Agency

    ERIC Educational Resources Information Center

    Lu, Min-Zhan; Horner, Bruce

    2013-01-01

    We argue that composition scholarship's defenses of language differences in student writing reinforce dominant ideology's spatial framework conceiving language difference as deviation from a norm of sameness. We argue instead for adopting a temporal-spatial framework defining difference as the norm of utterances, and defining languages,…

  3. I must have missed that: Alpha-band oscillations track attention to spoken language.

    PubMed

    Boudewyn, M A; Carter, C S

    2018-05-26

    Attention is critical to the construction of mental representations of language context during comprehension. We investigated the consequences of momentary lapses in attention during listening comprehension on neural activity and behavior. Participants listened to two full-length stories while EEG was recorded, and afterwards completed multiple choice comprehension questions. Listening was periodically interrupted by attention probes, in which participants were asked whether their attention immediately preceding the probe's appearance was focused on the story. The results showed that (1) participants spent a substantial amount of time off-task, endorsing attention lapses on over 30% of probes; (2) for probes on which an attention lapse was endorsed, later accuracy on comprehension questions querying pre-probe information was decreased; (3) the pre-probe period just before the endorsement of an attention lapse was characterized by a greater percentage of above-threshold oscillations in the alpha-band (8-12 Hz) compared to just prior to the endorsement of on-task or split-attention listening; and (4) when participants made "I have no idea" responses to comprehension questions, their EEG record revealed a greater percentage of above-threshold alpha oscillations during the original presentation of the information queried by the comprehension questions, compared to correct responses or incorrect guesses. These results connect changes in neural activity in the alpha band to episodes of mind-wandering during listening comprehension, and in turn to decreased comprehension accuracy. This demonstrates how alpha can be used to track attentional engagement during language comprehension, and illustrates the dependence of successful language comprehension on attention. Copyright © 2018 Elsevier Ltd. All rights reserved.

  4. Language-concordant automated telephone queries to assess medication adherence in a diverse population: a cross-sectional analysis of convergent validity with pharmacy claims.

    PubMed

    Ratanawongsa, Neda; Quan, Judy; Handley, Margaret A; Sarkar, Urmimala; Schillinger, Dean

    2018-04-06

    Clinicians have difficulty accurately assessing medication non-adherence within chronic disease care settings. Health information technology (HIT) could offer novel tools to assess medication adherence in diverse populations outside of usual health care settings. In a multilingual urban safety net population, we examined the validity of assessing adherence using automated telephone self-management (ATSM) queries, when compared with non-adherence using continuous medication gap (CMG) on pharmacy claims. We hypothesized that patients reporting greater days of missed pills to ATSM queries would have higher rates of non-adherence as measured by CMG, and that ATSM adherence assessments would perform as well as structured interview assessments. As part of an ATSM-facilitated diabetes self-management program, low-income health plan members typed numeric responses to rotating weekly ATSM queries: "In the last 7 days, how many days did you MISS taking your …" diabetes, blood pressure, or cholesterol pill. Research assistants asked similar questions in computer-assisted structured telephone interviews. We measured continuous medication gap (CMG) by claims over 12 preceding months. To evaluate convergent validity, we compared rates of optimal adherence (CMG ≤ 20%) across respondents reporting 0, 1, and ≥ 2 missed pill days on ATSM and on structured interview. Among 210 participants, 46% had limited health literacy, 57% spoke Cantonese, and 19% Spanish. ATSM respondents reported ≥1 missed day for diabetes (33%), blood pressure (19%), and cholesterol (36%) pills. Interview respondents reported ≥1 missed day for diabetes (28%), blood pressure (21%), and cholesterol (26%) pills. Optimal adherence rates by CMG were lower among ATSM respondents reporting more missed days for blood pressure (p = 0.02) and cholesterol (p < 0.01); by interview, differences were significant for cholesterol (p = 0.01). Language-concordant ATSM demonstrated modest potential for assessing adherence. Studies should evaluate HIT assessments of medication beliefs and concerns in diverse populations. NCT00683020 , registered May 21, 2008.

  5. Biomedical information retrieval across languages.

    PubMed

    Daumke, Philipp; Markü, Kornél; Poprat, Michael; Schulz, Stefan; Klar, Rüdiger

    2007-06-01

    This work presents a new dictionary-based approach to biomedical cross-language information retrieval (CLIR) that addresses many of the general and domain-specific challenges in current CLIR research. Our method is based on a multilingual lexicon that was generated partly manually and partly automatically, and currently covers six European languages. It contains morphologically meaningful word fragments, termed subwords. Using subwords instead of entire words significantly reduces the number of lexical entries necessary to sufficiently cover a specific language and domain. Mediation between queries and documents is based on these subwords as well as on lists of word-n-grams that are generated from large monolingual corpora and constitute possible translation units. The translations are then sent to a standard Internet search engine. This process makes our approach an effective tool for searching the biomedical content of the World Wide Web in different languages. We evaluate this approach using the OHSUMED corpus, a large medical document collection, within a cross-language retrieval setting.

  6. Generating Concise Rules for Human Motion Retrieval

    NASA Astrophysics Data System (ADS)

    Mukai, Tomohiko; Wakisaka, Ken-Ichi; Kuriyama, Shigeru

    This paper proposes a method for retrieving human motion data with concise retrieval rules based on the spatio-temporal features of motion appearance. Our method first converts motion clip into a form of clausal language that represents geometrical relations between body parts and their temporal relationship. A retrieval rule is then learned from the set of manually classified examples using inductive logic programming (ILP). ILP automatically discovers the essential rule in the same clausal form with a user-defined hypothesis-testing procedure. All motions are indexed using this clausal language, and the desired clips are retrieved by subsequence matching using the rule. Such rule-based retrieval offers reasonable performance and the rule can be intuitively edited in the same language form. Consequently, our method enables efficient and flexible search from a large dataset with simple query language.

  7. The lexeme hypotheses: Their use to generate highly grammatical and completely computerized medical records.

    PubMed

    Macfarlane, Donald

    2016-07-01

    Medical records often contain free text created by harried clinicians. Free text often contains errors which make it an unsuitable target for computerized data extraction. The cost of healthcare can be reduced by creating medical records that are fully computerized at their inception. We examine hypotheses that enable us to construct such records. We regard the text of the medical record as being an ordered collection of meaningful fragments. The intellectual content (or "lexeme") of each text fragment in the record is considered separately from the language that used to express it. We further consider that each lexeme exists as a combination of a lexeme query (defining the issue being addressed) and a lexeme response to that query. The medical record can then be perceived as a stream of these responses. The responses can be expressed in any style or language, including computer code. Examining medical records in this light gives rise to a number of observations and hypotheses. The physical location and nature of the medical episode (which we term "context") determines the general layout of the record. The order that lexeme-queries are addressed in within the record is highly consistent ("coherence"). Issues are only addressed if they are logically called-for by the context or by a previously-selected lexeme response ("predicance"), and only to a needed depth of detail ("level"). We hypothesize that all of the lexeme queries required to write any clinical notes can be stored in a large database ("lexicon") in coherence order, wherein each lexeme query is associated with its own collection of lexeme responses. We hypothesize that the issue a note-writer will need to address next is identifiable purely by using the rules of coherence, level and predicance. We have tested these hypotheses with a computer program which repeatedly offers the user a menu of lexeme responses with associated text. On selection, the program issues the text fragment, and its corresponding computer code, to output files. The program then uses coherence, predicance and level to navigate to the next appropriate lexeme query for presentation to the user. The net result is that the user creates a grammatically correct and completely computerized note at the time of its inception. The value of this approach and its practical implementation to create medical records are discussed. In our work so far, the hypotheses appear not to be false, but further testing is needed using a larger lexicon to establish their robustness in actual clinical practice. Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. The Voronoi spatio-temporal data structure

    NASA Astrophysics Data System (ADS)

    Mioc, Darka

    2002-04-01

    Current GIS models cannot integrate the temporal dimension of spatial data easily. Indeed, current GISs do not support incremental (local) addition and deletion of spatial objects, and they can not support the temporal evolution of spatial data. Spatio-temporal facilities would be very useful in many GIS applications: harvesting and forest planning, cadastre, urban and regional planning, and emergency planning. The spatio-temporal model that can overcome these problems is based on a topological model---the Voronoi data structure. Voronoi diagrams are irregular tessellations of space, that adapt to spatial objects and therefore they are a synthesis of raster and vector spatial data models. The main advantage of the Voronoi data structure is its local and sequential map updates, which allows us to automatically record each event and performed map updates within the system. These map updates are executed through map construction commands that are composed of atomic actions (geometric algorithms for addition, deletion, and motion of spatial objects) on the dynamic Voronoi data structure. The formalization of map commands led to the development of a spatial language comprising a set of atomic operations or constructs on spatial primitives (points and lines), powerful enough to define the complex operations. This resulted in a new formal model for spatio-temporal change representation, where each update is uniquely characterized by the numbers of newly created and inactivated Voronoi regions. This is used for the extension of the model towards the hierarchical Voronoi data structure. In this model, spatio-temporal changes induced by map updates are preserved in a hierarchical data structure that combines events and corresponding changes in topology. This hierarchical Voronoi data structure has an implicit time ordering of events visible through changes in topology, and it is equivalent to an event structure that can support temporal data without precise temporal information. This formal model of spatio-temporal change representation is currently applied to retroactive map updates and visualization of map evolution. It offers new possibilities in the domains of temporal GIS, transaction processing, spatio-temporal queries, spatio-temporal analysis, map animation and map visualization.

  9. A Querying Method over RDF-ized Health Level Seven v2.5 Messages Using Life Science Knowledge Resources.

    PubMed

    Kawazoe, Yoshimasa; Imai, Takeshi; Ohe, Kazuhiko

    2016-04-05

    Health level seven version 2.5 (HL7 v2.5) is a widespread messaging standard for information exchange between clinical information systems. By applying Semantic Web technologies for handling HL7 v2.5 messages, it is possible to integrate large-scale clinical data with life science knowledge resources. Showing feasibility of a querying method over large-scale resource description framework (RDF)-ized HL7 v2.5 messages using publicly available drug databases. We developed a method to convert HL7 v2.5 messages into the RDF. We also converted five kinds of drug databases into RDF and provided explicit links between the corresponding items among them. With those linked drug data, we then developed a method for query expansion to search the clinical data using semantic information on drug classes along with four types of temporal patterns. For evaluation purpose, medication orders and laboratory test results for a 3-year period at the University of Tokyo Hospital were used, and the query execution times were measured. Approximately 650 million RDF triples for medication orders and 790 million RDF triples for laboratory test results were converted. Taking three types of query in use cases for detecting adverse events of drugs as an example, we confirmed these queries were represented in SPARQL Protocol and RDF Query Language (SPARQL) using our methods and comparison with conventional query expressions were performed. The measurement results confirm that the query time is feasible and increases logarithmically or linearly with the amount of data and without diverging. The proposed methods enabled query expressions that separate knowledge resources and clinical data, thereby suggesting the feasibility for improving the usability of clinical data by enhancing the knowledge resources. We also demonstrate that when HL7 v2.5 messages are automatically converted into RDF, searches are still possible through SPARQL without modifying the structure. As such, the proposed method benefits not only our hospitals, but also numerous hospitals that handle HL7 v2.5 messages. Our approach highlights a potential of large-scale data federation techniques to retrieve clinical information, which could be applied as applications of clinical intelligence to improve clinical practices, such as adverse drug event monitoring and cohort selection for a clinical study as well as discovering new knowledge from clinical information.

  10. Modelling the spatial distribution of Fasciola hepatica in bovines using decision tree, logistic regression and GIS query approaches for Brazil.

    PubMed

    Bennema, S C; Molento, M B; Scholte, R G; Carvalho, O S; Pritsch, I

    2017-11-01

    Fascioliasis is a condition caused by the trematode Fasciola hepatica. In this paper, the spatial distribution of F. hepatica in bovines in Brazil was modelled using a decision tree approach and a logistic regression, combined with a geographic information system (GIS) query. In the decision tree and the logistic model, isothermality had the strongest influence on disease prevalence. Also, the 50-year average precipitation in the warmest quarter of the year was included as a risk factor, having a negative influence on the parasite prevalence. The risk maps developed using both techniques, showed a predicted higher prevalence mainly in the South of Brazil. The prediction performance seemed to be high, but both techniques failed to reach a high accuracy in predicting the medium and high prevalence classes to the entire country. The GIS query map, based on the range of isothermality, minimum temperature of coldest month, precipitation of warmest quarter of the year, altitude and the average dailyland surface temperature, showed a possibility of presence of F. hepatica in a very large area. The risk maps produced using these methods can be used to focus activities of animal and public health programmes, even on non-evaluated F. hepatica areas.

  11. PlanetServer: Innovative approaches for the online analysis of hyperspectral satellite data from Mars

    NASA Astrophysics Data System (ADS)

    Oosthoek, J. H. P.; Flahaut, J.; Rossi, A. P.; Baumann, P.; Misev, D.; Campalani, P.; Unnithan, V.

    2014-06-01

    PlanetServer is a WebGIS system, currently under development, enabling the online analysis of Compact Reconnaissance Imaging Spectrometer (CRISM) hyperspectral data from Mars. It is part of the EarthServer project which builds infrastructure for online access and analysis of huge Earth Science datasets. Core functionality consists of the rasdaman Array Database Management System (DBMS) for storage, and the Open Geospatial Consortium (OGC) Web Coverage Processing Service (WCPS) for data querying. Various WCPS queries have been designed to access spatial and spectral subsets of the CRISM data. The client WebGIS, consisting mainly of the OpenLayers javascript library, uses these queries to enable online spatial and spectral analysis. Currently the PlanetServer demonstration consists of two CRISM Full Resolution Target (FRT) observations, surrounding the NASA Curiosity rover landing site. A detailed analysis of one of these observations is performed in the Case Study section. The current PlanetServer functionality is described step by step, and is tested by focusing on detecting mineralogical evidence described in earlier Gale crater studies. Both the PlanetServer methodology and its possible use for mineralogical studies will be further discussed. Future work includes batch ingestion of CRISM data and further development of the WebGIS and analysis tools.

  12. Spatial Language and the Embedded Listener Model in Parents’ Input to Children

    PubMed Central

    Ferrara, Katrina; Silva, Malena; Wilson, Colin; Landau, Barbara

    2015-01-01

    Language is a collaborative act: in order to communicate successfully, speakers must generate utterances that are not only semantically valid, but also sensitive to the knowledge state of the listener. Such sensitivity could reflect use of an “embedded listener model,” where speakers choose utterances on the basis of an internal model of the listeners’ conceptual and linguistic knowledge. In this paper, we ask whether parents’ spatial descriptions incorporate an embedded listener model that reflects their children’s understanding of spatial relations and spatial terms. Adults described the positions of targets in spatial arrays to their children or to the adult experimenter. Arrays were designed so that targets could not be identified unless spatial relationships within the array were encoded and described. Parents of 3–4 year-old children encoded relationships in ways that were well-matched to their children’s level of spatial language. These encodings differed from those of the same relationships in speech to the adult experimenter (Experiment 1). By contrast, parents of individuals with severe spatial impairments (Williams syndrome) did not show clear evidence of sensitivity to their children’s level of spatial language (Experiment 2). The results provide evidence for an embedded listener model in the domain of spatial language, and indicate conditions under which the ability to model listener knowledge may be more challenging. PMID:26717804

  13. Spatial Language and the Embedded Listener Model in Parents' Input to Children.

    PubMed

    Ferrara, Katrina; Silva, Malena; Wilson, Colin; Landau, Barbara

    2016-11-01

    Language is a collaborative act: To communicate successfully, speakers must generate utterances that are not only semantically valid but also sensitive to the knowledge state of the listener. Such sensitivity could reflect the use of an "embedded listener model," where speakers choose utterances on the basis of an internal model of the listener's conceptual and linguistic knowledge. In this study, we ask whether parents' spatial descriptions incorporate an embedded listener model that reflects their children's understanding of spatial relations and spatial terms. Adults described the positions of targets in spatial arrays to their children or to the adult experimenter. Arrays were designed so that targets could not be identified unless spatial relationships within the array were encoded and described. Parents of 3-4-year-old children encoded relationships in ways that were well-matched to their children's level of spatial language. These encodings differed from those of the same relationships in speech to the adult experimenter (Experiment 1). In contrast, parents of individuals with severe spatial impairments (Williams syndrome) did not show clear evidence of sensitivity to their children's level of spatial language (Experiment 2). The results provide evidence for an embedded listener model in the domain of spatial language and indicate conditions under which the ability to model listener knowledge may be more challenging. Copyright © 2015 Cognitive Science Society, Inc.

  14. Framework for Evaluating Loop Invariant Detection Games in Relation to Automated Dynamic Invariant Detectors

    DTIC Science & Technology

    2015-09-01

    Detectability ...............................................................................................37 Figure 20. Excel VBA Codes for Checker...National Vulnerability Database OS Operating System SQL Structured Query Language VC Verification Condition VBA Visual Basic for Applications...checks each of these assertions for detectability by Daikon. The checker is an Excel Visual Basic for Applications ( VBA ) script that checks the

  15. Query Enhancement with Topic Detection and Disambiguation for Robust Retrieval

    ERIC Educational Resources Information Center

    Zhang, Hui

    2013-01-01

    With the rapid increase in the amount of available information, people nowadays rely heavily on information retrieval (IR) systems such as web search engine to fulfill their information needs. However, due to the lack of domain knowledge and the limitation of natural language such as synonyms and polysemes, many system users cannot formulate their…

  16. Quantifying Precision and Availability of Location Memory in Everyday Pictures and Some Implications for Picture Database Design

    ERIC Educational Resources Information Center

    Lansdale, Mark W.; Oliff, Lynda; Baguley, Thom S.

    2005-01-01

    The authors investigated whether memory for object locations in pictures could be exploited to address known difficulties of designing query languages for picture databases. M. W. Lansdale's (1998) model of location memory was adapted to 4 experiments observing memory for everyday pictures. These experiments showed that location memory is…

  17. Examining Learning Styles and Perceived Benefits of Analogical Problem Construction on SQL Knowledge Acquisition

    ERIC Educational Resources Information Center

    Mills, Robert J.; Dupin-Bryant, Pamela A.; Johnson, John D.; Beaulieu, Tanya Y.

    2015-01-01

    The demand for Information Systems (IS) graduates with expertise in Structured Query Language (SQL) and database management is vast and projected to increase as "big data" becomes ubiquitous. To prepare students to solve complex problems in a data-driven world, educators must explore instructional strategies to help link prior knowledge…

  18. Improving integrative searching of systems chemical biology data using semantic annotation.

    PubMed

    Chen, Bin; Ding, Ying; Wild, David J

    2012-03-08

    Systems chemical biology and chemogenomics are considered critical, integrative disciplines in modern biomedical research, but require data mining of large, integrated, heterogeneous datasets from chemistry and biology. We previously developed an RDF-based resource called Chem2Bio2RDF that enabled querying of such data using the SPARQL query language. Whilst this work has proved useful in its own right as one of the first major resources in these disciplines, its utility could be greatly improved by the application of an ontology for annotation of the nodes and edges in the RDF graph, enabling a much richer range of semantic queries to be issued. We developed a generalized chemogenomics and systems chemical biology OWL ontology called Chem2Bio2OWL that describes the semantics of chemical compounds, drugs, protein targets, pathways, genes, diseases and side-effects, and the relationships between them. The ontology also includes data provenance. We used it to annotate our Chem2Bio2RDF dataset, making it a rich semantic resource. Through a series of scientific case studies we demonstrate how this (i) simplifies the process of building SPARQL queries, (ii) enables useful new kinds of queries on the data and (iii) makes possible intelligent reasoning and semantic graph mining in chemogenomics and systems chemical biology. Chem2Bio2OWL is available at http://chem2bio2rdf.org/owl. The document is available at http://chem2bio2owl.wikispaces.com.

  19. Querying archetype-based EHRs by search ontology-based XPath engineering.

    PubMed

    Kropf, Stefan; Uciteli, Alexandr; Schierle, Katrin; Krücken, Peter; Denecke, Kerstin; Herre, Heinrich

    2018-05-11

    Legacy data and new structured data can be stored in a standardized format as XML-based EHRs on XML databases. Querying documents on these databases is crucial for answering research questions. Instead of using free text searches, that lead to false positive results, the precision can be increased by constraining the search to certain parts of documents. A search ontology-based specification of queries on XML documents defines search concepts and relates them to parts in the XML document structure. Such query specification method is practically introduced and evaluated by applying concrete research questions formulated in natural language on a data collection for information retrieval purposes. The search is performed by search ontology-based XPath engineering that reuses ontologies and XML-related W3C standards. The key result is that the specification of research questions can be supported by the usage of search ontology-based XPath engineering. A deeper recognition of entities and a semantic understanding of the content is necessary for a further improvement of precision and recall. Key limitation is that the application of the introduced process requires skills in ontology and software development. In future, the time consuming ontology development could be overcome by implementing a new clinical role: the clinical ontologist. The introduced Search Ontology XML extension connects Search Terms to certain parts in XML documents and enables an ontology-based definition of queries. Search ontology-based XPath engineering can support research question answering by the specification of complex XPath expressions without deep syntax knowledge about XPaths.

  20. Exposing the cancer genome atlas as a SPARQL endpoint

    PubMed Central

    Deus, Helena F.; Veiga, Diogo F.; Freire, Pablo R.; Weinstein, John N.; Mills, Gordon B.; Almeida, Jonas S.

    2011-01-01

    The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to characterize several types of cancer. Datasets from biomedical domains such as TCGA present a particularly challenging task for those interested in dynamically aggregating its results because the data sources are typically both heterogeneous and distributed. The Linked Data best practices offer a solution to integrate and discover data with those characteristics, namely through exposure of data as Web services supporting SPARQL, the Resource Description Framework query language. Most SPARQL endpoints, however, cannot easily be queried by data experts. Furthermore, exposing experimental data as SPARQL endpoints remains a challenging task because, in most cases, data must first be converted to Resource Description Framework triples. In line with those requirements, we have developed an infrastructure to expose clinical, demographic and molecular data elements generated by TCGA as a SPARQL endpoint by assigning elements to entities of the Simple Sloppy Semantic Database (S3DB) management model. All components of the infrastructure are available as independent Representational State Transfer (REST) Web services to encourage reusability, and a simple interface was developed to automatically assemble SPARQL queries by navigating a representation of the TCGA domain. A key feature of the proposed solution that greatly facilitates assembly of SPARQL queries is the distinction between the TCGA domain descriptors and data elements. Furthermore, the use of the S3DB management model as a mediator enables queries to both public and protected data without the need for prior submission to a single data source. PMID:20851208

  1. ClassLess: A Comprehensive Database of Young Stellar Objects

    NASA Astrophysics Data System (ADS)

    Hillenbrand, Lynne A.; baliber, nairn

    2015-08-01

    We have designed and constructed a database intended to house catalog and literature-published measurements of Young Stellar Objects (YSOs) within ~1 kpc of the Sun. ClassLess, so called because it includes YSOs in all stages of evolution, is a relational database in which user interaction is conducted via HTML web browsers, queries are performed in scientific language, and all data are linked to the sources of publication. Each star is associated with a cluster (or clusters), and both spatially resolved and unresolved measurements are stored, allowing proper use of data from multiple star systems. With this fully searchable tool, myriad ground- and space-based instruments and surveys across wavelength regimes can be exploited. In addition to primary measurements, the database self consistently calculates and serves higher level data products such as extinction, luminosity, and mass. As a result, searches for young stars with specific physical characteristics can be completed with just a few mouse clicks. We are in the database population phase now, and are eager to engage with interested experts worldwide on local galactic star formation and young stellar populations.

  2. An Intelligent Polar Cyberinfrastrucuture to Support Spatiotemporal Decision Making

    NASA Astrophysics Data System (ADS)

    Song, M.; Li, W.; Zhou, X.

    2014-12-01

    In the era of big data, polar sciences have already faced an urgent demand of utilizing intelligent approaches to support precise and effective spatiotemporal decision-making. Service-oriented cyberinfrastructure has advantages of seamlessly integrating distributed computing resources, and aggregating a variety of geospatial data derived from Earth observation network. This paper focuses on building a smart service-oriented cyberinfrastructure to support intelligent question answering related to polar datasets. The innovation of this polar cyberinfrastructure includes: (1) a problem-solving environment that parses geospatial question in natural language, builds geoprocessing rules, composites atomic processing services and executes the entire workflow; (2) a self-adaptive spatiotemporal filter that is capable of refining query constraints through semantic analysis; (3) a dynamic visualization strategy to support results animation and statistics in multiple spatial reference systems; and (4) a user-friendly online portal to support collaborative decision-making. By means of this polar cyberinfrastructure, we intend to facilitate integration of distributed and heterogeneous Arctic datasets and comprehensive analysis of multiple environmental elements (e.g. snow, ice, permafrost) to provide a better understanding of the environmental variation in circumpolar regions.

  3. Don’t Like RDF Reification? Making Statements about Statements Using Singleton Property

    PubMed Central

    Nguyen, Vinh; Bodenreider, Olivier; Sheth, Amit

    2015-01-01

    Statements about RDF statements, or meta triples, provide additional information about individual triples, such as the source, the occurring time or place, or the certainty. Integrating such meta triples into semantic knowledge bases would enable the querying and reasoning mechanisms to be aware of provenance, time, location, or certainty of triples. However, an efficient RDF representation for such meta knowledge of triples remains challenging. The existing standard reification approach allows such meta knowledge of RDF triples to be expressed using RDF by two steps. The first step is representing the triple by a Statement instance which has subject, predicate, and object indicated separately in three different triples. The second step is creating assertions about that instance as if it is a statement. While reification is simple and intuitive, this approach does not have formal semantics and is not commonly used in practice as described in the RDF Primer. In this paper, we propose a novel approach called Singleton Property for representing statements about statements and provide a formal semantics for it. We explain how this singleton property approach fits well with the existing syntax and formal semantics of RDF, and the syntax of SPARQL query language. We also demonstrate the use of singleton property in the representation and querying of meta knowledge in two examples of Semantic Web knowledge bases: YAGO2 and BKR. Our experiments on the BKR show that the singleton property approach gives a decent performance in terms of number of triples, query length and query execution time compared to existing approaches. This approach, which is also simple and intuitive, can be easily adopted for representing and querying statements about statements in other knowledge bases. PMID:25750938

  4. Federated Space-Time Query for Earth Science Data Using OpenSearch Conventions

    NASA Astrophysics Data System (ADS)

    Lynnes, C.; Beaumont, B.; Duerr, R. E.; Hua, H.

    2009-12-01

    The past decade has seen a burgeoning of remote sensing and Earth science data providers, as evidenced in the growth of the Earth Science Information Partner (ESIP) federation. At the same time, the need to combine diverse data sets to enable understanding of the Earth as a system has also grown. While the expansion of data providers is in general a boon to such studies, the diversity presents a challenge to finding useful data for a given study. Locating all the data files with aerosol information for a particular volcanic eruption, for example, may involve learning and using several different search tools to execute the requisite space-time queries. To address this issue, the ESIP federation is developing a federated space-time query framework, based on the OpenSearch convention (www.opensearch.org), with Geo and Time extensions. In this framework, data providers publish OpenSearch Description Documents that describe in a machine-readable form how to execute queries against the provider. The novelty of OpenSearch is that the space-time query interface becomes both machine callable and easy enough to integrate into the web browser's search box. This flexibility, together with a simple REST (HTTP-get) interface, should allow a variety of data providers to participate in the federated search framework, from large institutional data centers to individual scientists. The simple interface enables trivial querying of multiple data sources and participation in recursive-like federated searches--all using the same common OpenSearch interface. This simplicity also makes the construction of clients easy, as does existing OpenSearch client libraries in a variety of languages. Moreover, a number of clients and aggregation services already exist and OpenSearch is already supported by a number of web browsers such as Firefox and Internet Explorer.

  5. An RDF/OWL knowledge base for query answering and decision support in clinical pharmacogenetics.

    PubMed

    Samwald, Matthias; Freimuth, Robert; Luciano, Joanne S; Lin, Simon; Powers, Robert L; Marshall, M Scott; Adlassnig, Klaus-Peter; Dumontier, Michel; Boyce, Richard D

    2013-01-01

    Genetic testing for personalizing pharmacotherapy is bound to become an important part of clinical routine. To address associated issues with data management and quality, we are creating a semantic knowledge base for clinical pharmacogenetics. The knowledge base is made up of three components: an expressive ontology formalized in the Web Ontology Language (OWL 2 DL), a Resource Description Framework (RDF) model for capturing detailed results of manual annotation of pharmacogenomic information in drug product labels, and an RDF conversion of relevant biomedical datasets. Our work goes beyond the state of the art in that it makes both automated reasoning as well as query answering as simple as possible, and the reasoning capabilities go beyond the capabilities of previously described ontologies.

  6. Engineering the ATLAS TAG Browser

    NASA Astrophysics Data System (ADS)

    Zhang, Qizhi; ATLAS Collaboration

    2011-12-01

    ELSSI is a web-based event metadata (TAG) browser and event-level selection service for ATLAS. In this paper, we describe some of the challenges encountered in the process of developing ELSSI, and the software engineering strategies adopted to address those challenges. Approaches to management of access to data, browsing, data rendering, query building, query validation, execution, connection management, and communication with auxiliary services are discussed. We also describe strategies for dealing with data that may vary over time, such as run-dependent trigger decision decoding. Along with examples, we illustrate how programming techniques in multiple languages (PHP, JAVASCRIPT, XML, AJAX, and PL/SQL) have been blended to achieve the required results. Finally, we evaluate features of the ELSSI service in terms of functionality, scalability, and performance.

  7. Astronomical Data Integration Beyond the Virtual Observatory

    NASA Astrophysics Data System (ADS)

    Lemson, G.; Laurino, O.

    2015-09-01

    "Data integration" generally refers to the process of combining data from different source data bases into a unified view. Much work has been devoted in this area by the International Virtual Observatory Alliance (IVOA), allowing users to discover and access databases through standard protocols. However, different archives present their data through their own schemas and users must still select, filter, and combine data for each archive individually. An important reason for this is that the creation of common data models that satisfy all sub-disciplines is fraught with difficulties. Furthermore it requires a substantial amount of work for data providers to present their data according to some standard representation. We will argue that existing standards allow us to build a data integration framework that works around these problems. The particular framework requires the implementation of the IVOA Table Access Protocol (TAP) only. It uses the newly developed VO data modelling language (VO-DML) specification, which allows one to define extensible object-oriented data models using a subset of UML concepts through a simple XML serialization language. A rich mapping language allows one to describe how instances of VO-DML data models are represented by the TAP service, bridging the possible mismatch between a local archive's schema and some agreed-upon representation of the astronomical domain. In this so called local-as-view approach to data integration, “mediators" use the mapping prescriptions to translate queries phrased in terms of the common schema to the underlying TAP service. This mapping language has a graphical representation, which we expose through a web based graphical “drag-and-drop-and-connect" interface. This service allows any user to map the holdings of any TAP service to the data model(s) of choice. The mappings are defined and stored outside of the data sources themselves, which allows the interface to be used in a kind of crowd-sourcing effort to annotate any remote database of interest. This reduces the burden of publishing one's data and allows a great flexibility in the definition of the views through which particular communities might wish to access remote archives. At the same time, the framework easies the user's effort to select, filter, and combine data from many different archives, so as to build knowledge bases for their analysis. We will present the framework and demonstrate a prototype implementation. We will discuss ideas for producing the missing elements, in particular the query language and the implementation of mediator tools to translate object queries to ADQL

  8. A new reference implementation of the PSICQUIC web service.

    PubMed

    del-Toro, Noemi; Dumousseau, Marine; Orchard, Sandra; Jimenez, Rafael C; Galeota, Eugenia; Launay, Guillaume; Goll, Johannes; Breuer, Karin; Ono, Keiichiro; Salwinski, Lukasz; Hermjakob, Henning

    2013-07-01

    The Proteomics Standard Initiative Common QUery InterfaCe (PSICQUIC) specification was created by the Human Proteome Organization Proteomics Standards Initiative (HUPO-PSI) to enable computational access to molecular-interaction data resources by means of a standard Web Service and query language. Currently providing >150 million binary interaction evidences from 28 servers globally, the PSICQUIC interface allows the concurrent search of multiple molecular-interaction information resources using a single query. Here, we present an extension of the PSICQUIC specification (version 1.3), which has been released to be compliant with the enhanced standards in molecular interactions. The new release also includes a new reference implementation of the PSICQUIC server available to the data providers. It offers augmented web service capabilities and improves the user experience. PSICQUIC has been running for almost 5 years, with a user base growing from only 4 data providers to 28 (April 2013) allowing access to 151 310 109 binary interactions. The power of this web service is shown in PSICQUIC View web application, an example of how to simultaneously query, browse and download results from the different PSICQUIC servers. This application is free and open to all users with no login requirement (http://www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml).

  9. Spatial language and converseness.

    PubMed

    Burigo, Michele; Coventry, Kenny R; Cangelosi, Angelo; Lynott, Dermot

    2016-12-01

    Typical spatial language sentences consist of describing the location of an object (the located object) in relation to another object (the reference object) as in "The book is above the vase". While it has been suggested that the properties of the located object (the book) are not translated into language because they are irrelevant when exchanging location information, it has been shown that the orientation of the located object affects the production and comprehension of spatial descriptions. In line with the claim that spatial language apprehension involves inferences about relations that hold between objects it has been suggested that during spatial language apprehension people use the orientation of the located object to evaluate whether the logical property of converseness (e.g., if "the book is above the vase" is true, then also "the vase is below the book" must be true) holds across the objects' spatial relation. In three experiments using sentence acceptability rating tasks we tested this hypothesis and demonstrated that when converseness is violated people's acceptability ratings of a scene's description are reduced indicating that people do take into account geometric properties of the located object and use it to infer logical spatial relations.

  10. Architecture and prototypical implementation of a semantic querying system for big Earth observation image bases

    PubMed Central

    Tiede, Dirk; Baraldi, Andrea; Sudmanns, Martin; Belgiu, Mariana; Lang, Stefan

    2017-01-01

    ABSTRACT Spatiotemporal analytics of multi-source Earth observation (EO) big data is a pre-condition for semantic content-based image retrieval (SCBIR). As a proof of concept, an innovative EO semantic querying (EO-SQ) subsystem was designed and prototypically implemented in series with an EO image understanding (EO-IU) subsystem. The EO-IU subsystem is automatically generating ESA Level 2 products (scene classification map, up to basic land cover units) from optical satellite data. The EO-SQ subsystem comprises a graphical user interface (GUI) and an array database embedded in a client server model. In the array database, all EO images are stored as a space-time data cube together with their Level 2 products generated by the EO-IU subsystem. The GUI allows users to (a) develop a conceptual world model based on a graphically supported query pipeline as a combination of spatial and temporal operators and/or standard algorithms and (b) create, save and share within the client-server architecture complex semantic queries/decision rules, suitable for SCBIR and/or spatiotemporal EO image analytics, consistent with the conceptual world model. PMID:29098143

  11. Content-based retrieval of historical Ottoman documents stored as textual images.

    PubMed

    Saykol, Ediz; Sinop, Ali Kemal; Güdükbay, Ugur; Ulusoy, Ozgür; Cetin, A Enis

    2004-03-01

    There is an accelerating demand to access the visual content of documents stored in historical and cultural archives. Availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. In this paper, a framework for content-based retrieval of historical documents in the Ottoman Empire archives is presented. The documents are stored as textual images, which are compressed by constructing a library of symbols occurring in a document, and the symbols in the original image are then replaced with pointers into the codebook to obtain a compressed representation of the image. The features in wavelet and spatial domain based on angular and distance span of shapes are used to extract the symbols. In order to make content-based retrieval in historical archives, a query is specified as a rectangular region in an input image and the same symbol-extraction process is applied to the query region. The queries are processed on the codebook of documents and the query images are identified in the resulting documents using the pointers in textual images. The querying process does not require decompression of images. The new content-based retrieval framework is also applicable to many other document archives using different scripts.

  12. Random and Directed Walk-Based Top-k Queries in Wireless Sensor Networks

    PubMed Central

    Fu, Jun-Song; Liu, Yun

    2015-01-01

    In wireless sensor networks, filter-based top-k query approaches are the state-of-the-art solutions and have been extensively researched in the literature, however, they are very sensitive to the network parameters, including the size of the network, dynamics of the sensors’ readings and declines in the overall range of all the readings. In this work, a random walk-based top-k query approach called RWTQ and a directed walk-based top-k query approach called DWTQ are proposed. At the beginning of a top-k query, one or several tokens are sent to the specific node(s) in the network by the base station. Then, each token walks in the network independently to record and process the readings in a random or directed way. A strategy of choosing the “right” way in DWTQ is carefully designed for the token(s) to arrive at the high-value regions as soon as possible. When designing the walking strategy for DWTQ, the spatial correlations of the readings are also considered. Theoretical analysis and simulation results indicate that RWTQ and DWTQ both are very robust against these parameters discussed previously. In addition, DWTQ outperforms TAG, FILA and EXTOK in transmission cost, energy consumption and network lifetime. PMID:26016914

  13. Distributed XQuery-Based Integration and Visualization of Multimodality Brain Mapping Data

    PubMed Central

    Detwiler, Landon T.; Suciu, Dan; Franklin, Joshua D.; Moore, Eider B.; Poliakov, Andrew V.; Lee, Eunjung S.; Corina, David P.; Ojemann, George A.; Brinkley, James F.

    2008-01-01

    This paper addresses the need for relatively small groups of collaborating investigators to integrate distributed and heterogeneous data about the brain. Although various national efforts facilitate large-scale data sharing, these approaches are generally too “heavyweight” for individual or small groups of investigators, with the result that most data sharing among collaborators continues to be ad hoc. Our approach to this problem is to create a “lightweight” distributed query architecture, in which data sources are accessible via web services that accept arbitrary query languages but return XML results. A Distributed XQuery Processor (DXQP) accepts distributed XQueries in which subqueries are shipped to the remote data sources to be executed, with the resulting XML integrated by DXQP. A web-based application called DXBrain accesses DXQP, allowing a user to create, save and execute distributed XQueries, and to view the results in various formats including a 3-D brain visualization. Example results are presented using distributed brain mapping data sources obtained in studies of language organization in the brain, but any other XML source could be included. The advantage of this approach is that it is very easy to add and query a new source, the tradeoff being that the user needs to understand XQuery and the schemata of the underlying sources. For small numbers of known sources this burden is not onerous for a knowledgeable user, leading to the conclusion that the system helps to fill the gap between ad hoc local methods and large scale but complex national data sharing efforts. PMID:19198662

  14. Language Preferences on Websites and in Google Searches for Human Health and Food Information

    PubMed Central

    Singh, Punam Mony; Wight, Carly A; Sercinoglu, Olcan; Wilson, David C; Boytsov, Artem

    2007-01-01

    Background While it is known that the majority of pages on the World Wide Web are in English, little is known about the preferred language of users searching for health information online. Objectives (1) To help global and domestic publishers, for example health and food agencies, to determine the need for translation of online information from English into local languages. (2) To help these agencies determine which language(s) they should select when publishing information online in target nations and for target subpopulations within nations. Methods To estimate the percentage of Web publishers that translate their health and food websites, we measured the frequency at which domain names retrieved by Google overlap for language translations of the same health-related search term. To quantify language choice of searchers from different countries, Google provided estimates of the rate at which its search engine was queried in six languages relative to English for the terms “avian flu,” “tuberculosis,” “schizophrenia,” and “maize” (corn) from January 2004 to April 2006. The estimate was based on a 20% sample of all Google queries from 227 nations. Results We estimate that 80%-90% of health- and food-related institutions do not translate their websites into multiple languages, even when the information concerns pandemic disease such as avian influenza. Although Internet users are often well-educated, there was a strong preference for searching for health and food information in the local language, rather than English. For “avian flu,” we found that only 1% of searches in non-English-speaking nations were in English, whereas for “tuberculosis” or “schizophrenia,” about 4%-40% of searches in non-English countries employed English. A subset of searches for health information presumably originating from immigrants occurred in their native tongue, not the language of the adopted country. However, Spanish-language online searches for “avian flu,” “schizophrenia,” and “maize/corn” in the United States occurred at only <1% of the English search rate, although the US online Hispanic population constitutes 12% of the total US online population. Sub-Saharan Africa and Bangladesh searches for health information occurred in unexpected languages, perhaps reflecting the presence of aid workers and the global migration of Internet users, respectively. In Latin America, indigenous-language search terms were often used rather than Spanish. Conclusions (1) Based on the strong preference for searching the Internet for health information in the local language, indigenous language, or immigrant language of origin, global and domestic health and food agencies should continue their efforts to translate their institutional websites into more languages. (2) We have provided linguistic online search pattern data to help health and food agencies better select languages for targeted website publishing. PMID:17613488

  15. What Does Children's Spatial Language Reveal about Spatial Concepts? Evidence from the Use of Containment Expressions

    ERIC Educational Resources Information Center

    Johanson, Megan; Papafragou, Anna

    2014-01-01

    Children's overextensions of spatial language are often taken to reveal spatial biases. However, it is unclear whether extension patterns should be attributed to children's overly general spatial concepts or to a narrower notion of conceptual similarity allowing metaphor-like extensions. We describe a previously unnoticed extension of…

  16. EarthServer: a Summary of Achievements in Technology, Services, and Standards

    NASA Astrophysics Data System (ADS)

    Baumann, Peter

    2015-04-01

    Big Data in the Earth sciences, the Tera- to Exabyte archives, mostly are made up from coverage data, according to ISO and OGC defined as the digital representation of some space-time varying phenomenon. Common examples include 1-D sensor timeseries, 2-D remote sensing imagery, 3D x/y/t image timese ries and x/y/z geology data, and 4-D x/y/z/t atmosphere and ocean data. Analytics on such data requires on-demand processing of sometimes significant complexity, such as getting the Fourier transform of satellite images. As network bandwidth limits prohibit transfer of such Big Data it is indispensable to devise protocols allowing clients to task flexible and fast processing on the server. The transatlantic EarthServer initiative, running from 2011 through 2014, has united 11 partners to establish Big Earth Data Analytics. A key ingredient has been flexibility for users to ask whatever they want, not impeded and complicated by system internals. The EarthServer answer to this is to use high-level, standards-based query languages which unify data and metadata search in a simple, yet powerful way. A second key ingredient is scalability. Without any doubt, scalability ultimately can only be achieved through parallelization. In the past, parallelizing cod e has been done at compile time and usually with manual intervention. The EarthServer approach is to perform a samentic-based dynamic distribution of queries fragments based on networks optimization and further criteria. The EarthServer platform is comprised by rasdaman, the pioneer and leading Array DBMS built for any-size multi-dimensional raster data being extended with support for irregular grids and general meshes; in-situ retrieval (evaluation of database queries on existing archive structures, avoiding data import and, hence, duplication); the aforementioned distributed query processing. Additionally, Web clients for multi-dimensional data visualization are being established. Client/server interfaces are strictly based on OGC and W3C standards, in particular the Web Coverage Processing Service (WCPS) which defines a high-level coverage query language. Reviewers have attested EarthServer that "With no doubt the project has been shaping the Big Earth Data landscape through the standardization activities within OGC, ISO and beyond". We present the project approach, its outcomes and impact on standardization and Big Data technology, and vistas for the future.

  17. ProteoLens: a visual analytic tool for multi-scale database-driven biological network data mining.

    PubMed

    Huan, Tianxiao; Sivachenko, Andrey Y; Harrison, Scott H; Chen, Jake Y

    2008-08-12

    New systems biology studies require researchers to understand how interplay among myriads of biomolecular entities is orchestrated in order to achieve high-level cellular and physiological functions. Many software tools have been developed in the past decade to help researchers visually navigate large networks of biomolecular interactions with built-in template-based query capabilities. To further advance researchers' ability to interrogate global physiological states of cells through multi-scale visual network explorations, new visualization software tools still need to be developed to empower the analysis. A robust visual data analysis platform driven by database management systems to perform bi-directional data processing-to-visualizations with declarative querying capabilities is needed. We developed ProteoLens as a JAVA-based visual analytic software tool for creating, annotating and exploring multi-scale biological networks. It supports direct database connectivity to either Oracle or PostgreSQL database tables/views, on which SQL statements using both Data Definition Languages (DDL) and Data Manipulation languages (DML) may be specified. The robust query languages embedded directly within the visualization software help users to bring their network data into a visualization context for annotation and exploration. ProteoLens supports graph/network represented data in standard Graph Modeling Language (GML) formats, and this enables interoperation with a wide range of other visual layout tools. The architectural design of ProteoLens enables the de-coupling of complex network data visualization tasks into two distinct phases: 1) creating network data association rules, which are mapping rules between network node IDs or edge IDs and data attributes such as functional annotations, expression levels, scores, synonyms, descriptions etc; 2) applying network data association rules to build the network and perform the visual annotation of graph nodes and edges according to associated data values. We demonstrated the advantages of these new capabilities through three biological network visualization case studies: human disease association network, drug-target interaction network and protein-peptide mapping network. The architectural design of ProteoLens makes it suitable for bioinformatics expert data analysts who are experienced with relational database management to perform large-scale integrated network visual explorations. ProteoLens is a promising visual analytic platform that will facilitate knowledge discoveries in future network and systems biology studies.

  18. Atypical hemispheric dominance for attention: functional MRI topography.

    PubMed

    Flöel, Agnes; Jansen, Andreas; Deppe, Michael; Kanowski, Martin; Konrad, Carsten; Sommer, Jens; Knecht, Stefan

    2005-09-01

    The right hemisphere is predominantly involved in tasks associated with spatial attention. However, left hemispheric dominance for spatial attention can be found in healthy individuals, and both spatial attention and language can be lateralized to the same hemisphere. Little is known about the underlying regional distribution of neural activation in these 'atypical' individuals. Previously a large number of healthy subjects were screened for hemispheric dominance of visuospatial attention and language, using functional Doppler ultrasonography. From this group, subjects were chosen who were 'atypical' for hemispheric dominance of visuospatial attention and language, and their pattern of brain activation was studied with functional magnetic resonance imaging during a task probing spatial attention. Right-handed subjects with the 'typical' pattern of brain organization served as control subjects. It was found that subjects with an inverted lateralization of language and spatial attention (language right, attention left) recruited left-hemispheric areas in the attention task, homotopic to those recruited by control subjects in the right hemisphere. Subjects with lateralization of both language and attention to the right hemisphere activated an attentional network in the right hemisphere that was comparable to control subjects. The present findings suggest that not the hemispheric side, but the intrahemispheric pattern of activation is the distinct feature for the neural processes underlying language and attention.

  19. Spatial and Facial Processing in the Signed Discourse of Two Groups of Deaf Signers with Clinical Language Impairment

    ERIC Educational Resources Information Center

    Penn, Claire; Commerford, Ann; Ogilvy, Dale

    2007-01-01

    The linguistic and cognitive profiles of five deaf adults with a sign language disorder were compared with those of matched deaf controls. The test involved a battery of sign language tests, a signed narrative discourse task and a neuropsychological test protocol administered in sign language. Spatial syntax and facial processing were examined in…

  20. A novel visualization model for web search results.

    PubMed

    Nguyen, Tien N; Zhang, Jin

    2006-01-01

    This paper presents an interactive visualization system, named WebSearchViz, for visualizing the Web search results and acilitating users' navigation and exploration. The metaphor in our model is the solar system with its planets and asteroids revolving around the sun. Location, color, movement, and spatial distance of objects in the visual space are used to represent the semantic relationships between a query and relevant Web pages. Especially, the movement of objects and their speeds add a new dimension to the visual space, illustrating the degree of relevance among a query and Web search results in the context of users' subjects of interest. By interacting with the visual space, users are able to observe the semantic relevance between a query and a resulting Web page with respect to their subjects of interest, context information, or concern. Users' subjects of interest can be dynamically changed, redefined, added, or deleted from the visual space.

Top