spatial join queries: Topics by Science.gov

Sample records for spatial join queries

Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data.

PubMed

Aji, Ablimit; Wang, Fusheng; Saltz, Joel H

2012-11-06

Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amount of data with fast response, which is faced with two major challenges: the "big data" challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce.
Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data

PubMed Central

Aji, Ablimit; Wang, Fusheng; Saltz, Joel H.

2013-01-01

Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amount of data with fast response, which is faced with two major challenges: the “big data” challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce. PMID:24501719
A Big Spatial Data Processing Framework Applying to National Geographic Conditions Monitoring

NASA Astrophysics Data System (ADS)

Xiao, F.

2018-04-01

In this paper, a novel framework for spatial data processing is proposed, which apply to National Geographic Conditions Monitoring project of China. It includes 4 layers: spatial data storage, spatial RDDs, spatial operations, and spatial query language. The spatial data storage layer uses HDFS to store large size of spatial vector/raster data in the distributed cluster. The spatial RDDs are the abstract logical dataset of spatial data types, and can be transferred to the spark cluster to conduct spark transformations and actions. The spatial operations layer is a series of processing on spatial RDDs, such as range query, k nearest neighbor and spatial join. The spatial query language is a user-friendly interface which provide people not familiar with Spark with a comfortable way to operation the spatial operation. Compared with other spatial frameworks, it is highlighted that comprehensive technologies are referred for big spatial data processing. Extensive experiments on real datasets show that the framework achieves better performance than traditional process methods.
Ad-Hoc Queries over Document Collections - A Case Study

NASA Astrophysics Data System (ADS)

Löser, Alexander; Lutter, Steffen; Düssel, Patrick; Markl, Volker

We discuss the novel problem of supporting analytical business intelligence queries over web-based textual content, e.g., BI-style reports based on 100.000's of documents from an ad-hoc web search result. Neither conventional search engines nor conventional Business Intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. "Google Squared" or our system GOOLAP.info, are examples of these kinds of systems. They execute information extraction methods over one or several document collections at query time and integrate extracted records into a common view or tabular structure. Frequent extraction and object resolution failures cause incomplete records which could not be joined into a record answering the query. Our focus is the identification of join-reordering heuristics maximizing the size of complete records answering a structured query. With respect to given costs for document extraction we propose two novel join-operations: The multi-way CJ-operator joins records from multiple relationships extracted from a single document. The two-way join-operator DJ ensures data density by removing incomplete records from results. In a preliminary case study we observe that our join-reordering heuristics positively impact result size, record density and lower execution costs.
IJA: an efficient algorithm for query processing in sensor networks.

PubMed

Lee, Hyun Chang; Lee, Young Jae; Lim, Ji Hyang; Kim, Dong Hwa

2011-01-01

One of main features in sensor networks is the function that processes real time state information after gathering needed data from many domains. The component technologies consisting of each node called a sensor node that are including physical sensors, processors, actuators and power have advanced significantly over the last decade. Thanks to the advanced technology, over time sensor networks have been adopted in an all-round industry sensing physical phenomenon. However, sensor nodes in sensor networks are considerably constrained because with their energy and memory resources they have a very limited ability to process any information compared to conventional computer systems. Thus query processing over the nodes should be constrained because of their limitations. Due to the problems, the join operations in sensor networks are typically processed in a distributed manner over a set of nodes and have been studied. By way of example while simple queries, such as select and aggregate queries, in sensor networks have been addressed in the literature, the processing of join queries in sensor networks remains to be investigated. Therefore, in this paper, we propose and describe an Incremental Join Algorithm (IJA) in Sensor Networks to reduce the overhead caused by moving a join pair to the final join node or to minimize the communication cost that is the main consumer of the battery when processing the distributed queries in sensor networks environments. At the same time, the simulation result shows that the proposed IJA algorithm significantly reduces the number of bytes to be moved to join nodes compared to the popular synopsis join algorithm.
IJA: An Efficient Algorithm for Query Processing in Sensor Networks

PubMed Central

Lee, Hyun Chang; Lee, Young Jae; Lim, Ji Hyang; Kim, Dong Hwa

2011-01-01

One of main features in sensor networks is the function that processes real time state information after gathering needed data from many domains. The component technologies consisting of each node called a sensor node that are including physical sensors, processors, actuators and power have advanced significantly over the last decade. Thanks to the advanced technology, over time sensor networks have been adopted in an all-round industry sensing physical phenomenon. However, sensor nodes in sensor networks are considerably constrained because with their energy and memory resources they have a very limited ability to process any information compared to conventional computer systems. Thus query processing over the nodes should be constrained because of their limitations. Due to the problems, the join operations in sensor networks are typically processed in a distributed manner over a set of nodes and have been studied. By way of example while simple queries, such as select and aggregate queries, in sensor networks have been addressed in the literature, the processing of join queries in sensor networks remains to be investigated. Therefore, in this paper, we propose and describe an Incremental Join Algorithm (IJA) in Sensor Networks to reduce the overhead caused by moving a join pair to the final join node or to minimize the communication cost that is the main consumer of the battery when processing the distributed queries in sensor networks environments. At the same time, the simulation result shows that the proposed IJA algorithm significantly reduces the number of bytes to be moved to join nodes compared to the popular synopsis join algorithm. PMID:22319375
HodDB: Design and Analysis of a Query Processor for Brick.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fierro, Gabriel; Culler, David

Brick is a recently proposed metadata schema and ontology for describing building components and the relationships between them. It represents buildings as directed labeled graphs using the RDF data model. Using the SPARQL query language, building-agnostic applications query a Brick graph to discover the set of resources and relationships they require to operate. Latency-sensitive applications, such as user interfaces, demand response and modelpredictive control, require fast queries — conventionally less than 100ms. We benchmark a set of popular open-source and commercial SPARQL databases against three real Brick models using seven application queries and find that none of them meet thismore » performance target. This lack of performance can be attributed to design decisions that optimize for queries over large graphs consisting of billions of triples, but give poor spatial locality and join performance on the small dense graphs typical of Brick. We present the design and evaluation of HodDB, a RDF/SPARQL database for Brick built over a node-based index structure. HodDB performs Brick queries 3-700x faster than leading SPARQL databases and consistently meets the 100ms threshold, enabling the portability of important latency-sensitive building applications.« less
SkyQuery - A Prototype Distributed Query and Cross-Matching Web Service for the Virtual Observatory

NASA Astrophysics Data System (ADS)

Thakar, A. R.; Budavari, T.; Malik, T.; Szalay, A. S.; Fekete, G.; Nieto-Santisteban, M.; Haridas, V.; Gray, J.

2002-12-01

We have developed a prototype distributed query and cross-matching service for the VO community, called SkyQuery, which is implemented with hierarchichal Web Services. SkyQuery enables astronomers to run combined queries on existing distributed heterogeneous astronomy archives. SkyQuery provides a simple, user-friendly interface to run distributed queries over the federation of registered astronomical archives in the VO. The SkyQuery client connects to the portal Web Service, which farms the query out to the individual archives, which are also Web Services called SkyNodes. The cross-matching algorithm is run recursively on each SkyNode. Each archive is a relational DBMS with a HTM index for fast spatial lookups. The results of the distributed query are returned as an XML DataSet that is automatically rendered by the client. SkyQuery also returns the image cutout corresponding to the query result. SkyQuery finds not only matches between the various catalogs, but also dropouts - objects that exist in some of the catalogs but not in others. This is often as important as finding matches. We demonstrate the utility of SkyQuery with a brown-dwarf search between SDSS and 2MASS, and a search for radio-quiet quasars in SDSS, 2MASS and FIRST. The importance of a service like SkyQuery for the worldwide astronomical community cannot be overstated: data on the same objects in various archives is mapped in different wavelength ranges and looks very different due to different errors, instrument sensitivities and other peculiarities of each archive. Our cross-matching algorithm preforms a fuzzy spatial join across multiple catalogs. This type of cross-matching is currently often done by eye, one object at a time. A static cross-identification table for a set of archives would become obsolete by the time it was built - the exponential growth of astronomical data means that a dynamic cross-identification mechanism like SkyQuery is the only viable option. SkyQuery was funded by a grant from the NASA AISR program.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Madduri, Kamesh; Wu, Kesheng

The Resource Description Framework (RDF) is a popular data model for representing linked data sets arising from the web, as well as large scienti c data repositories such as UniProt. RDF data intrinsically represents a labeled and directed multi-graph. SPARQL is a query language for RDF that expresses subgraph pattern- nding queries on this implicit multigraph in a SQL- like syntax. SPARQL queries generate complex intermediate join queries; to compute these joins e ciently, we propose a new strategy based on bitmap indexes. We store the RDF data in column-oriented structures as compressed bitmaps along with two dictionaries. This papermore » makes three new contributions. (i) We present an e cient parallel strategy for parsing the raw RDF data, building dictionaries of unique entities, and creating compressed bitmap indexes of the data. (ii) We utilize the constructed bitmap indexes to e ciently answer SPARQL queries, simplifying the join evaluations. (iii) To quantify the performance impact of using bitmap indexes, we compare our approach to the state-of-the-art triple-store RDF-3X. We nd that our bitmap index-based approach to answering queries is up to an order of magnitude faster for a variety of SPARQL queries, on gigascale RDF data sets.« less
Enhancing SAMOS Data Access in DOMS via a Neo4j Property Graph Database.

NASA Astrophysics Data System (ADS)

Stallard, A. P.; Smith, S. R.; Elya, J. L.

2016-12-01

The Shipboard Automated Meteorological and Oceanographic System (SAMOS) initiative provides routine access to high-quality marine meteorological and near-surface oceanographic observations from research vessels. The Distributed Oceanographic Match-Up Service (DOMS) under development is a centralized service that allows researchers to easily match in situ and satellite oceanographic data from distributed sources to facilitate satellite calibration, validation, and retrieval algorithm development. The service currently uses Apache Solr as a backend search engine on each node in the distributed network. While Solr is a high-performance solution that facilitates creation and maintenance of indexed data, it is limited in the sense that its schema is fixed. The property graph model escapes this limitation by creating relationships between data objects. The authors will present the development of the SAMOS Neo4j property graph database including new search possibilities that take advantage of the property graph model, performance comparisons with Apache Solr, and a vision for graph databases as a storage tool for oceanographic data. The integration of the SAMOS Neo4j graph into DOMS will also be described. Currently, Neo4j contains spatial and temporal records from SAMOS which are modeled into a time tree and r-tree using Graph Aware and Spatial plugin tools for Neo4j. These extensions provide callable Java procedures within CYPHER (Neo4j's query language) that generate in-graph structures. Once generated, these structures can be queried using procedures from these libraries, or directly via CYPHER statements. Neo4j excels at performing relationship and path-based queries, which challenge relational-SQL databases because they require memory intensive joins due to the limitation of their design. Consider a user who wants to find records over several years, but only for specific months. If a traditional database only stores timestamps, this type of query would be complex and likely prohibitively slow. Using the time tree model, one can specify a path from the root to the data which restricts resolutions to certain timeframes (e.g., months). This query can be executed without joins, unions, or other compute-intensive operations, putting Neo4j at a computational advantage to the SQL database alternative.
A high-performance spatial database based approach for pathology imaging algorithm evaluation

PubMed Central

Wang, Fusheng; Kong, Jun; Gao, Jingjing; Cooper, Lee A.D.; Kurc, Tahsin; Zhou, Zhengwen; Adler, David; Vergara-Niedermayr, Cristobal; Katigbak, Bryan; Brat, Daniel J.; Saltz, Joel H.

2013-01-01

Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high performance computation capacity via a parallel data management infrastructure, parallel data loading and spatial indexing optimizations in this infrastructure. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared nothing parallel database architecture, which distributes data homogenously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput. Results: Our work proposes a high performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download. Conclusions: Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data. Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provide a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation. PMID:23599905
Parallel multi-join query optimization algorithm for distributed sensor network in the internet of things

NASA Astrophysics Data System (ADS)

Zheng, Yan

2015-03-01

Internet of things (IoT), focusing on providing users with information exchange and intelligent control, attracts a lot of attention of researchers from all over the world since the beginning of this century. IoT is consisted of large scale of sensor nodes and data processing units, and the most important features of IoT can be illustrated as energy confinement, efficient communication and high redundancy. With the sensor nodes increment, the communication efficiency and the available communication band width become bottle necks. Many research work is based on the instance which the number of joins is less. However, it is not proper to the increasing multi-join query in whole internet of things. To improve the communication efficiency between parallel units in the distributed sensor network, this paper proposed parallel query optimization algorithm based on distribution attributes cost graph. The storage information relations and the network communication cost are considered in this algorithm, and an optimized information changing rule is established. The experimental result shows that the algorithm has good performance, and it would effectively use the resource of each node in the distributed sensor network. Therefore, executive efficiency of multi-join query between different nodes could be improved.
In-Network Processing of an Iceberg Join Query in Wireless Sensor Networks Based on 2-Way Fragment Semijoins

PubMed Central

Kang, Hyunchul

2015-01-01

We investigate the in-network processing of an iceberg join query in wireless sensor networks (WSNs). An iceberg join is a special type of join where only those joined tuples whose cardinality exceeds a certain threshold (called iceberg threshold) are qualified for the result. Processing such a join involves the value matching for the join predicate as well as the checking of the cardinality constraint for the iceberg threshold. In the previous scheme, the value matching is carried out as the main task for filtering non-joinable tuples while the iceberg threshold is treated as an additional constraint. We take an alternative approach, meeting the cardinality constraint first and matching values next. In this approach, with a logical fragmentation of the join operand relations on the aggregate counts of the joining attribute values, the optimal sequence of 2-way fragment semijoins is generated, where each fragment semijoin employs a Bloom filter as a synopsis of the joining attribute values. This sequence filters non-joinable tuples in an energy-efficient way in WSNs. Through implementation and a set of detailed experiments, we show that our alternative approach considerably outperforms the previous one. PMID:25774710
An advanced web query interface for biological databases

PubMed Central

Latendresse, Mario; Karp, Peter D.

2010-01-01

Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715
Group Centric Information Sharing Using Hierarchical Models

DTIC Science & Technology

2011-01-01

enable people to create data using RDF, build vocabularies using web ontology language (OWL), write rules and query data stores using SPARQL [8...a strict joined and the document was added with a strict add. In order to represent the fact that an action is allowed (or not), we have created a...greatly improve the system’s readiness to handle any number of access decision queries . a. The pair is tested against the gSIS Join and Add semantics
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.

PubMed

Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

2013-08-01

Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive.
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce

PubMed Central

Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

2013-01-01

Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS – a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive. PMID:24187650
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.

PubMed

Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

2013-11-01

The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS - a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.
System for Performing Single Query Searches of Heterogeneous and Dispersed Databases

NASA Technical Reports Server (NTRS)

Maluf, David A. (Inventor); Okimura, Takeshi (Inventor); Gurram, Mohana M. (Inventor); Tran, Vu Hoang (Inventor); Knight, Christopher D. (Inventor); Trinh, Anh Ngoc (Inventor)

2017-01-01

The present invention is a distributed computer system of heterogeneous databases joined in an information grid and configured with an Application Programming Interface hardware which includes a search engine component for performing user-structured queries on multiple heterogeneous databases in real time. This invention reduces overhead associated with the impedance mismatch that commonly occurs in heterogeneous database queries.
Spatial and symbolic queries for 3D image data

NASA Astrophysics Data System (ADS)

Benson, Daniel C.; Zick, Gregory L.

1992-04-01

We present a query system for an object-oriented biomedical imaging database containing 3-D anatomical structures and their corresponding 2-D images. The graphical interface facilitates the formation of spatial queries, nonspatial or symbolic queries, and combined spatial/symbolic queries. A query editor is used for the creation and manipulation of 3-D query objects as volumes, surfaces, lines, and points. Symbolic predicates are formulated through a combination of text fields and multiple choice selections. Query results, which may include images, image contents, composite objects, graphics, and alphanumeric data, are displayed in multiple views. Objects returned by the query may be selected directly within the views for further inspection or modification, or for use as query objects in subsequent queries. Our image database query system provides visual feedback and manipulation of spatial query objects, multiple views of volume data, and the ability to combine spatial and symbolic queries. The system allows for incremental enhancement of existing objects and the addition of new objects and spatial relationships. The query system is designed for databases containing symbolic and spatial data. This paper discuses its application to data acquired in biomedical 3- D image reconstruction, but it is applicable to other areas such as CAD/CAM, geographical information systems, and computer vision.

Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce

PubMed Central

Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

2016-01-01

The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS – a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing. PMID:27617325
An Expressive and Efficient Language for XML Information Retrieval.

ERIC Educational Resources Information Center

Chinenyanga, Taurai Tapiwa; Kushmerick, Nicholas

2002-01-01

Discusses XML and information retrieval and describes a query language, ELIXIR (expressive and efficient language for XML information retrieval), with a textual similarity operator that can be used for similarity joins. Explains the algorithm for answering ELIXIR queries to generate intermediate relational data. (Author/LRW)
Spatial information semantic query based on SPARQL

NASA Astrophysics Data System (ADS)

Xiao, Zhifeng; Huang, Lei; Zhai, Xiaofang

2009-10-01

How can the efficiency of spatial information inquiries be enhanced in today's fast-growing information age? We are rich in geospatial data but poor in up-to-date geospatial information and knowledge that are ready to be accessed by public users. This paper adopts an approach for querying spatial semantic by building an Web Ontology language(OWL) format ontology and introducing SPARQL Protocol and RDF Query Language(SPARQL) to search spatial semantic relations. It is important to establish spatial semantics that support for effective spatial reasoning for performing semantic query. Compared to earlier keyword-based and information retrieval techniques that rely on syntax, we use semantic approaches in our spatial queries system. Semantic approaches need to be developed by ontology, so we use OWL to describe spatial information extracted by the large-scale map of Wuhan. Spatial information expressed by ontology with formal semantics is available to machines for processing and to people for understanding. The approach is illustrated by introducing a case study for using SPARQL to query geo-spatial ontology instances of Wuhan. The paper shows that making use of SPARQL to search OWL ontology instances can ensure the result's accuracy and applicability. The result also indicates constructing a geo-spatial semantic query system has positive efforts on forming spatial query and retrieval.
a Novel Approach of Indexing and Retrieving Spatial Polygons for Efficient Spatial Region Queries

NASA Astrophysics Data System (ADS)

Zhao, J. H.; Wang, X. Z.; Wang, F. Y.; Shen, Z. H.; Zhou, Y. C.; Wang, Y. L.

2017-10-01

Spatial region queries are more and more widely used in web-based applications. Mechanisms to provide efficient query processing over geospatial data are essential. However, due to the massive geospatial data volume, heavy geometric computation, and high access concurrency, it is difficult to get response in real time. Spatial indexes are usually used in this situation. In this paper, based on k-d tree, we introduce a distributed KD-Tree (DKD-Tree) suitbable for polygon data, and a two-step query algorithm. The spatial index construction is recursive and iterative, and the query is an in memory process. Both the index and query methods can be processed in parallel, and are implemented based on HDFS, Spark and Redis. Experiments on a large volume of Remote Sensing images metadata have been carried out, and the advantages of our method are investigated by comparing with spatial region queries executed on PostgreSQL and PostGIS. Results show that our approach not only greatly improves the efficiency of spatial region query, but also has good scalability, Moreover, the two-step spatial range query algorithm can also save cluster resources to support a large number of concurrent queries. Therefore, this method is very useful when building large geographic information systems.
SPARQL Query Re-writing Using Partonomy Based Transformation Rules

NASA Astrophysics Data System (ADS)

Jain, Prateek; Yeh, Peter Z.; Verma, Kunal; Henson, Cory A.; Sheth, Amit P.

Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. For querying ontology's containing spatial information, the precise relationships between spatial entities has to be specified in the basic graph pattern of SPARQL query which can result in long and complex queries. We present a novel approach to help users intuitively write SPARQL queries to query spatial data, rather than relying on knowledge of the ontology structure. Our framework re-writes queries, using transformation rules to exploit part-whole relations between geographical entities to address the mismatches between query constraints and knowledge base. Our experiments were performed on completely third party datasets and queries. Evaluations were performed on Geonames dataset using questions from National Geographic Bee serialized into SPARQL and British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in retrieval of results and ease in writing queries.
In-network processing of joins in wireless sensor networks.

PubMed

Kang, Hyunchul

2013-03-11

The join or correlated filtering of sensor readings is one of the fundamental query operations in wireless sensor networks (WSNs). Although the join in centralized or distributed databases is a well-researched problem, join processing in WSNs has quite different characteristics and is much more difficult to perform due to the lack of statistics on sensor readings and the resource constraints of sensor nodes. Since data transmission is orders of magnitude more costly than processing at a sensor node, in-network processing of joins is essential. In this paper, the state-of-the-art techniques for join implementation in WSNs are surveyed. The requirements and challenges, join types, and components of join implementation are described. The open issues for further research are identified.
In-Network Processing of Joins in Wireless Sensor Networks

PubMed Central

Kang, Hyunchul

2013-01-01

The join or correlated filtering of sensor readings is one of the fundamental query operations in wireless sensor networks (WSNs). Although the join in centralized or distributed databases is a well-researched problem, join processing in WSNs has quite different characteristics and is much more difficult to perform due to the lack of statistics on sensor readings and the resource constraints of sensor nodes. Since data transmission is orders of magnitude more costly than processing at a sensor node, in-network processing of joins is essential. In this paper, the state-of-the-art techniques for join implementation in WSNs are surveyed. The requirements and challenges, join types, and components of join implementation are described. The open issues for further research are identified. PMID:23478603
Research on presentation and query service of geo-spatial data based on ontology

NASA Astrophysics Data System (ADS)

Li, Hong-wei; Li, Qin-chao; Cai, Chang

2008-10-01

The paper analyzed the deficiency on presentation and query of geo-spatial data existed in current GIS, discussed the advantages that ontology possessed in formalization of geo-spatial data and the presentation of semantic granularity, taken land-use classification system as an example to construct domain ontology, and described it by OWL; realized the grade level and category presentation of land-use data benefited from the thoughts of vertical and horizontal navigation; and then discussed query mode of geo-spatial data based on ontology, including data query based on types and grade levels, instances and spatial relation, and synthetic query based on types and instances; these methods enriched query mode of current GIS, and is a useful attempt; point out that the key point of the presentation and query of spatial data based on ontology is to construct domain ontology that can correctly reflect geo-concept and its spatial relation and realize its fine formalization description.
RCQ-GA: RDF Chain Query Optimization Using Genetic Algorithms

NASA Astrophysics Data System (ADS)

Hogenboom, Alexander; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay

The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools. Fast query engines are needed for efficient querying of large amounts of data, usually represented using RDF. We focus on optimizing a special class of SPARQL queries, the so-called RDF chain queries. For this purpose, we devise a genetic algorithm called RCQ-GA that determines the order in which joins need to be performed for an efficient evaluation of RDF chain queries. The approach is benchmarked against a two-phase optimization algorithm, previously proposed in literature. The more complex a query is, the more RCQ-GA outperforms the benchmark in solution quality, execution time needed, and consistency of solution quality. When the algorithms are constrained by a time limit, the overall performance of RCQ-GA compared to the benchmark further improves.
The Selection and Placement Method of Materialized Views on Big Data Platform of Equipment Condition Assessment

NASA Astrophysics Data System (ADS)

Ma, Yan; Yao, Jinxia; Gu, Chao; Chen, Yufeng; Yang, Yi; Zou, Lida

2017-05-01

With the formation of electric big data environment, more and more big data analyses emerge. In the complicated data analysis on equipment condition assessment, there exist many join operations, which are time-consuming. In order to save time, the approach of materialized view is usually used. It places part of common and critical join results on external storage and avoids the frequent join operation. In the paper we propose the methods of selecting and placing materialized views to reduce the query time of electric transmission and transformation equipment, and make the profits of service providers maximal. In selection method we design a computation way for the value of non-leaf node based on MVPP structure chart. In placement method we use relevance weights to place the selected materialized views, which help reduce the network transmission time. Our experiments show that the proposed selection and placement methods have a high throughput and good optimization ability of query time for electric transmission and transformation equipment.
Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments.

PubMed

Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

2015-09-18

A content-matched (CM) rangemonitoring query overmoving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CMrange monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods.
The CMS DBS query language

NASA Astrophysics Data System (ADS)

Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee

2010-04-01

The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to underlying database. We will describe the design of the query system, provide details of the language components and overview of how this component fits into the overall data discovery system architecture.
An intelligent user interface for browsing satellite data catalogs

NASA Technical Reports Server (NTRS)

Cromp, Robert F.; Crook, Sharon

1989-01-01

A large scale domain-independent spatial data management expert system that serves as a front-end to databases containing spatial data is described. This system is unique for two reasons. First, it uses spatial search techniques to generate a list of all the primary keys that fall within a user's spatial constraints prior to invoking the database management system, thus substantially decreasing the amount of time required to answer a user's query. Second, a domain-independent query expert system uses a domain-specific rule base to preprocess the user's English query, effectively mapping a broad class of queries into a smaller subset that can be handled by a commercial natural language processing system. The methods used by the spatial search module and the query expert system are explained, and the system architecture for the spatial data management expert system is described. The system is applied to data from the International Ultraviolet Explorer (IUE) satellite, and results are given.
Research on Extension of Sparql Ontology Query Language Considering the Computation of Indoor Spatial Relations

NASA Astrophysics Data System (ADS)

Li, C.; Zhu, X.; Guo, W.; Liu, Y.; Huang, H.

2015-05-01

A method suitable for indoor complex semantic query considering the computation of indoor spatial relations is provided According to the characteristics of indoor space. This paper designs ontology model describing the space related information of humans, events and Indoor space objects (e.g. Storey and Room) as well as their relations to meet the indoor semantic query. The ontology concepts are used in IndoorSPARQL query language which extends SPARQL syntax for representing and querying indoor space. And four types specific primitives for indoor query, "Adjacent", "Opposite", "Vertical" and "Contain", are defined as query functions in IndoorSPARQL used to support quantitative spatial computations. Also a method is proposed to analysis the query language. Finally this paper adopts this method to realize indoor semantic query on the study area through constructing the ontology model for the study building. The experimental results show that the method proposed in this paper can effectively support complex indoor space semantic query.
Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments

PubMed Central

Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

2015-01-01

A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods. PMID:26393613
Efficient hemodynamic event detection utilizing relational databases and wavelet analysis

NASA Technical Reports Server (NTRS)

Saeed, M.; Mark, R. G.

2001-01-01

Development of a temporal query framework for time-oriented medical databases has hitherto been a challenging problem. We describe a novel method for the detection of hemodynamic events in multiparameter trends utilizing wavelet coefficients in a MySQL relational database. Storage of the wavelet coefficients allowed for a compact representation of the trends, and provided robust descriptors for the dynamics of the parameter time series. A data model was developed to allow for simplified queries along several dimensions and time scales. Of particular importance, the data model and wavelet framework allowed for queries to be processed with minimal table-join operations. A web-based search engine was developed to allow for user-defined queries. Typical queries required between 0.01 and 0.02 seconds, with at least two orders of magnitude improvement in speed over conventional queries. This powerful and innovative structure will facilitate research on large-scale time-oriented medical databases.
Spatial aggregation query in dynamic geosensor networks

NASA Astrophysics Data System (ADS)

Yi, Baolin; Feng, Dayang; Xiao, Shisong; Zhao, Erdun

2007-11-01

Wireless sensor networks have been widely used for civilian and military applications, such as environmental monitoring and vehicle tracking. In many of these applications, the researches mainly aim at building sensor network based systems to leverage the sensed data to applications. However, the existing works seldom exploited spatial aggregation query considering the dynamic characteristics of sensor networks. In this paper, we investigate how to process spatial aggregation query over dynamic geosensor networks where both the sink node and sensor nodes are mobile and propose several novel improvements on enabling techniques. The mobility of sensors makes the existing routing protocol based on information of fixed framework or the neighborhood infeasible. We present an improved location-based stateless implicit geographic forwarding (IGF) protocol for routing a query toward the area specified by query window, a diameter-based window aggregation query (DWAQ) algorithm for query propagation and data aggregation in the query window, finally considering the location changing of the sink node, we present two schemes to forward the result to the sink node. Simulation results show that the proposed algorithms can improve query latency and query accuracy.
Analyzing Enron Data: Bitmap Indexing Outperforms MySQL Queries bySeveral Orders of Magnitude

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stockinger, Kurt; Rotem, Doron; Shoshani, Arie

2006-01-28

FastBit is an efficient, compressed bitmap indexing technology that was developed in our group. In this report we evaluate the performance of MySQL and FastBit for analyzing the email traffic of the Enron dataset. The first finding shows that materializing the join results of several tables significantly improves the query performance. The second finding shows that FastBit outperforms MySQL by several orders of magnitude.
StarView: The object oriented design of the ST DADS user interface

NASA Technical Reports Server (NTRS)

Williams, J. D.; Pollizzi, J. A.

1992-01-01

StarView is the user interface being developed for the Hubble Space Telescope Data Archive and Distribution Service (ST DADS). ST DADS is the data archive for HST observations and a relational database catalog describing the archived data. Users will use StarView to query the catalog and select appropriate datasets for study. StarView sends requests for archived datasets to ST DADS which processes the requests and returns the database to the user. StarView is designed to be a powerful and extensible user interface. Unique features include an internal relational database to navigate query results, a form definition language that will work with both CRT and X interfaces, a data definition language that will allow StarView to work with any relational database, and the ability to generate adhoc queries without requiring the user to understand the structure of the ST DADS catalog. Ultimately, StarView will allow the user to refine queries in the local database for improved performance and merge in data from external sources for correlation with other query results. The user will be able to create a query from single or multiple forms, merging the selected attributes into a single query. Arbitrary selection of attributes for querying is supported. The user will be able to select how query results are viewed. A standard form or table-row format may be used. Navigation capabilities are provided to aid the user in viewing query results. Object oriented analysis and design techniques were used in the design of StarView to support the mechanisms and concepts required to implement these features. One such mechanism is the Model-View-Controller (MVC) paradigm. The MVC allows the user to have multiple views of the underlying database, while providing a consistent mechanism for interaction regardless of the view. This approach supports both CRT and X interfaces while providing a common mode of user interaction. Another powerful abstraction is the concept of a Query Model. This concept allows a single query to be built form a single or multiple forms before it is submitted to ST DADS. Supporting this concept is the adhoc query generator which allows the user to select and qualify an indeterminate number attributes from the database. The user does not need any knowledge of how the joins across various tables are to be resolved. The adhoc generator calculates the joins automatically and generates the correct SQL query.
Cognitive issues in searching images with visual queries

NASA Astrophysics Data System (ADS)

Yu, ByungGu; Evens, Martha W.

1999-01-01

In this paper, we propose our image indexing technique and visual query processing technique. Our mental images are different from the actual retinal images and many things, such as personal interests, personal experiences, perceptual context, the characteristics of spatial objects, and so on, affect our spatial perception. These private differences are propagated into our mental images and so our visual queries become different from the real images that we want to find. This is a hard problem and few people have tried to work on it. In this paper, we survey the human mental imagery system, the human spatial perception, and discuss several kinds of visual queries. Also, we propose our own approach to visual query interpretation and processing.

EmptyHeaded: A Relational Engine for Graph Processing

PubMed Central

Aberger, Christopher R.; Tu, Susan; Olukotun, Kunle; Ré, Christopher

2016-01-01

There are two types of high-performance graph processing engines: low- and high-level engines. Low-level engines (Galois, PowerGraph, Snap) provide optimized data structures and computation models but require users to write low-level imperative code, hence ensuring that efficiency is the burden of the user. In high-level engines, users write in query languages like datalog (SociaLite) or SQL (Grail). High-level engines are easier to use but are orders of magnitude slower than the low-level graph engines. We present EmptyHeaded, a high-level engine that supports a rich datalog-like query language and achieves performance comparable to that of low-level engines. At the core of EmptyHeaded’s design is a new class of join algorithms that satisfy strong theoretical guarantees but have thus far not achieved performance comparable to that of specialized graph processing engines. To achieve high performance, EmptyHeaded introduces a new join engine architecture, including a novel query optimizer and data layouts that leverage single-instruction multiple data (SIMD) parallelism. With this architecture, EmptyHeaded outperforms high-level approaches by up to three orders of magnitude on graph pattern queries, PageRank, and Single-Source Shortest Paths (SSSP) and is an order of magnitude faster than many low-level baselines. We validate that EmptyHeaded competes with the best-of-breed low-level engine (Galois), achieving comparable performance on PageRank and at most 3× worse performance on SSSP. PMID:28077912
Use of Graph Database for the Integration of Heterogeneous Biological Data.

PubMed

Yoon, Byoung-Ha; Kim, Seon-Kyu; Kim, Seon-Young

2017-03-01

Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
Use of Graph Database for the Integration of Heterogeneous Biological Data

PubMed Central

Yoon, Byoung-Ha; Kim, Seon-Kyu

2017-01-01

Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data. PMID:28416946
Recursive Joins to Query Data Hierarchies in Microsoft Access

ERIC Educational Resources Information Center

Dadashzadeh, Mohammad

2007-01-01

Organizational charts (departments, sub-departments, sub-sub-departments, and so on), project work breakdown structures (tasks, subtasks, work packages, etc.), discussion forums (posting, response, response to response, etc.), family trees (parent, child, grandchild, etc.), manufacturing bill-of-material, product classifications, and document…
Remote sensing and GIS integration: Towards intelligent imagery within a spatial data infrastructure

NASA Astrophysics Data System (ADS)

Abdelrahim, Mohamed Mahmoud Hosny

2001-11-01

In this research, an "Intelligent Imagery System Prototype" (IISP) was developed. IISP is an integration tool that facilitates the environment for active, direct, and on-the-fly usage of high resolution imagery, internally linked to hidden GIS vector layers, to query the real world phenomena and, consequently, to perform exploratory types of spatial analysis based on a clear/undisturbed image scene. The IISP was designed and implemented using the software components approach to verify the hypothesis that a fully rectified, partially rectified, or even unrectified digital image can be internally linked to a variety of different hidden vector databases/layers covering the end user area of interest, and consequently may be reliably used directly as a base for "on-the-fly" querying of real-world phenomena and for performing exploratory types of spatial analysis. Within IISP, differentially rectified, partially rectified (namely, IKONOS GEOCARTERRA(TM)), and unrectified imagery (namely, scanned aerial photographs and captured video frames) were investigated. The system was designed to handle four types of spatial functions, namely, pointing query, polygon/line-based image query, database query, and buffering. The system was developed using ESRI MapObjects 2.0a as the core spatial component within Visual Basic 6.0. When used to perform the pre-defined spatial queries using different combinations of image and vector data, the IISP provided the same results as those obtained by querying pre-processed vector layers even when the image used was not orthorectified and the vector layers had different parameters. In addition, the real-time pixel location orthorectification technique developed and presented within the IKONOS GEOCARTERRA(TM) case provided a horizontal accuracy (RMSE) of +/- 2.75 metres. This accuracy is very close to the accuracy level obtained when purchasing the orthorectified IKONOS PRECISION products (RMSE of +/- 1.9 metre). The latter cost approximately four times as much as the IKONOS GEOCARTERRA(TM) products. The developed IISP is a step closer towards the direct and active involvement of high-resolution remote sensing imagery in querying the real world and performing exploratory types of spatial analysis. (Abstract shortened by UMI.)
Genomes as geography: using GIS technology to build interactive genome feature maps

PubMed Central

Dolan, Mary E; Holden, Constance C; Beard, M Kate; Bult, Carol J

2006-01-01

Background Many commonly used genome browsers display sequence annotations and related attributes as horizontal data tracks that can be toggled on and off according to user preferences. Most genome browsers use only simple keyword searches and limit the display of detailed annotations to one chromosomal region of the genome at a time. We have employed concepts, methodologies, and tools that were developed for the display of geographic data to develop a Genome Spatial Information System (GenoSIS) for displaying genomes spatially, and interacting with genome annotations and related attribute data. In contrast to the paradigm of horizontally stacked data tracks used by most genome browsers, GenoSIS uses the concept of registered spatial layers composed of spatial objects for integrated display of diverse data. In addition to basic keyword searches, GenoSIS supports complex queries, including spatial queries, and dynamically generates genome maps. Our adaptation of the geographic information system (GIS) model in a genome context supports spatial representation of genome features at multiple scales with a versatile and expressive query capability beyond that supported by existing genome browsers. Results We implemented an interactive genome sequence feature map for the mouse genome in GenoSIS, an application that uses ArcGIS, a commercially available GIS software system. The genome features and their attributes are represented as spatial objects and data layers that can be toggled on and off according to user preferences or displayed selectively in response to user queries. GenoSIS supports the generation of custom genome maps in response to complex queries about genome features based on both their attributes and locations. Our example application of GenoSIS to the mouse genome demonstrates the powerful visualization and query capability of mature GIS technology applied in a novel domain. Conclusion Mapping tools developed specifically for geographic data can be exploited to display, explore and interact with genome data. The approach we describe here is organism independent and is equally useful for linear and circular chromosomes. One of the unique capabilities of GenoSIS compared to existing genome browsers is the capacity to generate genome feature maps dynamically in response to complex attribute and spatial queries. PMID:16984652
miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST

PubMed Central

Kim, You Jung; Boyd, Andrew; Athey, Brian D.; Patel, Jignesh M.

2005-01-01

A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Exis-ting tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the number of sequences in the query set is large. In this paper, we present a new algorithm, called miBLAST, that evaluates such batch workloads efficiently. At the core, miBLAST employs a q-gram filtering and an index join for efficiently detecting similarity between the query sequences and database sequences. This set-oriented technique, which indexes both the query and the database sets, results in substantial performance improvements over existing methods. Our results show that miBLAST is significantly faster than BLAST in many cases. For example, miBLAST aligned 247 965 oligonucleotide sequences in the Affymetrix probe set against the Human UniGene in 1.26 days, compared with 27.27 days with BLAST (an improvement by a factor of 22). The relative performance of miBLAST increases for larger word sizes; however, it decreases for longer queries. miBLAST employs the familiar BLAST statistical model and output format, guaranteeing the same accuracy as BLAST and facilitating a seamless transition for existing BLAST users. PMID:16061938
Extended Maptree: a Representation of Fine-Grained Topology and Spatial Hierarchy of Bim

NASA Astrophysics Data System (ADS)

Wu, Y.; Shang, J.; Hu, X.; Zhou, Z.

2017-09-01

Spatial queries play significant roles in exchanging Building Information Modeling (BIM) data and integrating BIM with indoor spatial information. However, topological operators implemented for BIM spatial queries are limited to qualitative relations (e.g. touching, intersecting). To overcome this limitation, we propose an extended maptree model to represent the fine-grained topology and spatial hierarchy of indoor spaces. The model is based on a maptree which consists of combinatorial maps and an adjacency tree. Topological relations (e.g., adjacency, incidence, and covering) derived from BIM are represented explicitly and formally by extended maptrees, which can facilitate the spatial queries of BIM. To construct an extended maptree, we first use a solid model represented by vertical extrusion and boundary representation to generate the isolated 3-cells of combinatorial maps. Then, the spatial relationships defined in IFC are used to sew them together. Furthermore, the incremental edges of extended maptrees are labeled as removed 2-cells. Based on this, we can merge adjacent 3-cells according to the spatial hierarchy of IFC.
Spatial Query for Planetary Data

NASA Technical Reports Server (NTRS)

Shams, Khawaja S.; Crockett, Thomas M.; Powell, Mark W.; Joswig, Joseph C.; Fox, Jason M.

2011-01-01

Science investigators need to quickly and effectively assess past observations of specific locations on a planetary surface. This innovation involves a location-based search technology that was adapted and applied to planetary science data to support a spatial query capability for mission operations software. High-performance location-based searching requires the use of spatial data structures for database organization. Spatial data structures are designed to organize datasets based on their coordinates in a way that is optimized for location-based retrieval. The particular spatial data structure that was adapted for planetary data search is the R+ tree.
Geographic Video 3d Data Model And Retrieval

NASA Astrophysics Data System (ADS)

Han, Z.; Cui, C.; Kong, Y.; Wu, H.

2014-04-01

Geographic video includes both spatial and temporal geographic features acquired through ground-based or non-ground-based cameras. With the popularity of video capture devices such as smartphones, the volume of user-generated geographic video clips has grown significantly and the trend of this growth is quickly accelerating. Such a massive and increasing volume poses a major challenge to efficient video management and query. Most of the today's video management and query techniques are based on signal level content extraction. They are not able to fully utilize the geographic information of the videos. This paper aimed to introduce a geographic video 3D data model based on spatial information. The main idea of the model is to utilize the location, trajectory and azimuth information acquired by sensors such as GPS receivers and 3D electronic compasses in conjunction with video contents. The raw spatial information is synthesized to point, line, polygon and solid according to the camcorder parameters such as focal length and angle of view. With the video segment and video frame, we defined the three categories geometry object using the geometry model of OGC Simple Features Specification for SQL. We can query video through computing the spatial relation between query objects and three categories geometry object such as VFLocation, VSTrajectory, VSFOView and VFFovCone etc. We designed the query methods using the structured query language (SQL) in detail. The experiment indicate that the model is a multiple objective, integration, loosely coupled, flexible and extensible data model for the management of geographic stereo video.
KBGIS-II: A knowledge-based geographic information system

NASA Technical Reports Server (NTRS)

Smith, Terence; Peuquet, Donna; Menon, Sudhakar; Agarwal, Pankaj

1986-01-01

The architecture and working of a recently implemented Knowledge-Based Geographic Information System (KBGIS-II), designed to satisfy several general criteria for the GIS, is described. The system has four major functions including query-answering, learning and editing. The main query finds constrained locations for spatial objects that are describable in a predicate-calculus based spatial object language. The main search procedures include a family of constraint-satisfaction procedures that use a spatial object knowledge base to search efficiently for complex spatial objects in large, multilayered spatial data bases. These data bases are represented in quadtree form. The search strategy is designed to reduce the computational cost of search in the average case. The learning capabilities of the system include the addition of new locations of complex spatial objects to the knowledge base as queries are answered, and the ability to learn inductively definitions of new spatial objects from examples. The new definitions are added to the knowledge base by the system. The system is performing all its designated tasks successfully. Future reports will relate performance characteristics of the system.
Bottom-Up Evaluation of Twig Join Pattern Queries in XML Document Databases

NASA Astrophysics Data System (ADS)

Chen, Yangjun

Since the extensible markup language XML emerged as a new standard for information representation and exchange on the Internet, the problem of storing, indexing, and querying XML documents has been among the major issues of database research. In this paper, we study the twig pattern matching and discuss a new algorithm for processing ordered twig pattern queries. The time complexity of the algorithmis bounded by O(|D|·|Q| + |T|·leaf Q ) and its space overhead is by O(leaf T ·leaf Q ), where T stands for a document tree, Q for a twig pattern and D is a largest data stream associated with a node q of Q, which contains the database nodes that match the node predicate at q. leaf T (leaf Q ) represents the number of the leaf nodes of T (resp. Q). In addition, the algorithm can be adapted to an indexing environment with XB-trees being used.
Army technology development. IBIS query. Software to support the Image Based Information System (IBIS) expansion for mapping, charting and geodesy

NASA Technical Reports Server (NTRS)

Friedman, S. Z.; Walker, R. E.; Aitken, R. B.

1986-01-01

The Image Based Information System (IBIS) has been under development at the Jet Propulsion Laboratory (JPL) since 1975. It is a collection of more than 90 programs that enable processing of image, graphical, tabular data for spatial analysis. IBIS can be utilized to create comprehensive geographic data bases. From these data, an analyst can study various attributes describing characteristics of a given study area. Even complex combinations of disparate data types can be synthesized to obtain a new perspective on spatial phenomena. In 1984, new query software was developed enabling direct Boolean queries of IBIS data bases through the submission of easily understood expressions. An improved syntax methodology, a data dictionary, and display software simplified the analysts' tasks associated with building, executing, and subsequently displaying the results of a query. The primary purpose of this report is to describe the features and capabilities of the new query software. A secondary purpose of this report is to compare this new query software to the query software developed previously (Friedman, 1982). With respect to this topic, the relative merits and drawbacks of both approaches are covered.
Benchmarking distributed data warehouse solutions for storing genomic variant information

PubMed Central

Wiewiórka, Marek S.; Wysakowicz, Dawid P.; Okoniewski, Michał J.

2017-01-01

Abstract Genomic-based personalized medicine encompasses storing, analysing and interpreting genomic variants as its central issues. At a time when thousands of patientss sequenced exomes and genomes are becoming available, there is a growing need for efficient database storage and querying. The answer could be the application of modern distributed storage systems and query engines. However, the application of large genomic variant databases to this problem has not been sufficiently far explored so far in the literature. To investigate the effectiveness of modern columnar storage [column-oriented Database Management System (DBMS)] and query engines, we have developed a prototypic genomic variant data warehouse, populated with large generated content of genomic variants and phenotypic data. Next, we have benchmarked performance of a number of combinations of distributed storages and query engines on a set of SQL queries that address biological questions essential for both research and medical applications. In addition, a non-distributed, analytical database (MonetDB) has been used as a baseline. Comparison of query execution times confirms that distributed data warehousing solutions outperform classic relational DBMSs. Moreover, pre-aggregation and further denormalization of data, which reduce the number of distributed join operations, significantly improve query performance by several orders of magnitude. Most of distributed back-ends offer a good performance for complex analytical queries, while the Optimized Row Columnar (ORC) format paired with Presto and Parquet with Spark 2 query engines provide, on average, the lowest execution times. Apache Kudu on the other hand, is the only solution that guarantees a sub-second performance for simple genome range queries returning a small subset of data, where low-latency response is expected, while still offering decent performance for running analytical queries. In summary, research and clinical applications that require the storage and analysis of variants from thousands of samples can benefit from the scalability and performance of distributed data warehouse solutions. Database URL: https://github.com/ZSI-Bio/variantsdwh PMID:29220442
Achieving Sub-Second Search in the CMR

NASA Astrophysics Data System (ADS)

Gilman, J.; Baynes, K.; Pilone, D.; Mitchell, A. E.; Murphy, K. J.

2014-12-01

The Common Metadata Repository (CMR) is the next generation Earth Science Metadata catalog for NASA's Earth Observing data. It joins together the holdings from the EOS Clearing House (ECHO) and the Global Change Master Directory (GCMD), creating a unified, authoritative source for EOSDIS metadata. The CMR allows ingest in many different formats while providing consistent search behavior and retrieval in any supported format. Performance is a critical component of the CMR, ensuring improved data discovery and client interactivity. The CMR delivers sub-second search performance for any of the common query conditions (including spatial) across hundreds of millions of metadata granules. It also allows the addition of new metadata concepts such as visualizations, parameter metadata, and documentation. The CMR's goals presented many challenges. This talk will describe the CMR architecture, design, and innovations that were made to achieve its goals. This includes: * Architectural features like immutability and backpressure. * Data management techniques such as caching and parallel loading that give big performance gains. * Open Source and COTS tools like Elasticsearch search engine. * Adoption of Clojure, a functional programming language for the Java Virtual Machine. * Development of a custom spatial search plugin for Elasticsearch and why it was necessary. * Introduction of a unified model for metadata that maps every supported metadata format to a consistent domain model.
a Spatiotemporal Aggregation Query Method Using Multi-Thread Parallel Technique Based on Regional Division

NASA Astrophysics Data System (ADS)

Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.

2015-07-01

Existing spatiotemporal database supports spatiotemporal aggregation query over massive moving objects datasets. Due to the large amounts of data and single-thread processing method, the query speed cannot meet the application requirements. On the other hand, the query efficiency is more sensitive to spatial variation then temporal variation. In this paper, we proposed a spatiotemporal aggregation query method using multi-thread parallel technique based on regional divison and implemented it on the server. Concretely, we divided the spatiotemporal domain into several spatiotemporal cubes, computed spatiotemporal aggregation on all cubes using the technique of multi-thread parallel processing, and then integrated the query results. By testing and analyzing on the real datasets, this method has improved the query speed significantly.
Semantic-based surveillance video retrieval.

PubMed

Hu, Weiming; Xie, Dan; Fu, Zhouyu; Zeng, Wenrong; Maybank, Steve

2007-04-01

Visual surveillance produces large amounts of video data. Effective indexing and retrieval from surveillance video databases are very important. Although there are many ways to represent the content of video clips in current video retrieval algorithms, there still exists a semantic gap between users and retrieval systems. Visual surveillance systems supply a platform for investigating semantic-based video retrieval. In this paper, a semantic-based video retrieval framework for visual surveillance is proposed. A cluster-based tracking algorithm is developed to acquire motion trajectories. The trajectories are then clustered hierarchically using the spatial and temporal information, to learn activity models. A hierarchical structure of semantic indexing and retrieval of object activities, where each individual activity automatically inherits all the semantic descriptions of the activity model to which it belongs, is proposed for accessing video clips and individual objects at the semantic level. The proposed retrieval framework supports various queries including queries by keywords, multiple object queries, and queries by sketch. For multiple object queries, succession and simultaneity restrictions, together with depth and breadth first orders, are considered. For sketch-based queries, a method for matching trajectories drawn by users to spatial trajectories is proposed. The effectiveness and efficiency of our framework are tested in a crowded traffic scene.
Linked data and provenance in biological data webs.

PubMed

Zhao, Jun; Miles, Alistair; Klyne, Graham; Shotton, David

2009-03-01

The Web is now being used as a platform for publishing and linking life science data. The Web's linking architecture can be exploited to join heterogeneous data from multiple sources. However, as data are frequently being updated in a decentralized environment, provenance information becomes critical to providing reliable and trustworthy services to scientists. This article presents design patterns for representing and querying provenance information relating to mapping links between heterogeneous data from sources in the domain of functional genomics. We illustrate the use of named resource description framework (RDF) graphs at different levels of granularity to make provenance assertions about linked data, and demonstrate that these assertions are sufficient to support requirements including data currency, integrity, evidential support and historical queries.
KBGIS-2: A knowledge-based geographic information system

NASA Technical Reports Server (NTRS)

Smith, T.; Peuquet, D.; Menon, S.; Agarwal, P.

1986-01-01

The architecture and working of a recently implemented knowledge-based geographic information system (KBGIS-2) that was designed to satisfy several general criteria for the geographic information system are described. The system has four major functions that include query-answering, learning, and editing. The main query finds constrained locations for spatial objects that are describable in a predicate-calculus based spatial objects language. The main search procedures include a family of constraint-satisfaction procedures that use a spatial object knowledge base to search efficiently for complex spatial objects in large, multilayered spatial data bases. These data bases are represented in quadtree form. The search strategy is designed to reduce the computational cost of search in the average case. The learning capabilities of the system include the addition of new locations of complex spatial objects to the knowledge base as queries are answered, and the ability to learn inductively definitions of new spatial objects from examples. The new definitions are added to the knowledge base by the system. The system is currently performing all its designated tasks successfully, although currently implemented on inadequate hardware. Future reports will detail the performance characteristics of the system, and various new extensions are planned in order to enhance the power of KBGIS-2.
VisGets: coordinated visualizations for web-based information exploration and discovery.

PubMed

Dörk, Marian; Carpendale, Sheelagh; Collins, Christopher; Williamson, Carey

2008-01-01

In common Web-based search interfaces, it can be difficult to formulate queries that simultaneously combine temporal, spatial, and topical data filters. We investigate how coordinated visualizations can enhance search and exploration of information on the World Wide Web by easing the formulation of these types of queries. Drawing from visual information seeking and exploratory search, we introduce VisGets--interactive query visualizations of Web-based information that operate with online information within a Web browser. VisGets provide the information seeker with visual overviews of Web resources and offer a way to visually filter the data. Our goal is to facilitate the construction of dynamic search queries that combine filters from more than one data dimension. We present a prototype information exploration system featuring three linked VisGets (temporal, spatial, and topical), and used it to visually explore news items from online RSS feeds.

Rasdaman for Big Spatial Raster Data

NASA Astrophysics Data System (ADS)

Hu, F.; Huang, Q.; Scheele, C. J.; Yang, C. P.; Yu, M.; Liu, K.

2015-12-01

Spatial raster data have grown exponentially over the past decade. Recent advancements on data acquisition technology, such as remote sensing, have allowed us to collect massive observation data of various spatial resolution and domain coverage. The volume, velocity, and variety of such spatial data, along with the computational intensive nature of spatial queries, pose grand challenge to the storage technologies for effective big data management. While high performance computing platforms (e.g., cloud computing) can be used to solve the computing-intensive issues in big data analysis, data has to be managed in a way that is suitable for distributed parallel processing. Recently, rasdaman (raster data manager) has emerged as a scalable and cost-effective database solution to store and retrieve massive multi-dimensional arrays, such as sensor, image, and statistics data. Within this paper, the pros and cons of using rasdaman to manage and query spatial raster data will be examined and compared with other common approaches, including file-based systems, relational databases (e.g., PostgreSQL/PostGIS), and NoSQL databases (e.g., MongoDB and Hive). Earth Observing System (EOS) data collected from NASA's Atmospheric Scientific Data Center (ASDC) will be used and stored in these selected database systems, and a set of spatial and non-spatial queries will be designed to benchmark their performance on retrieving large-scale, multi-dimensional arrays of EOS data. Lessons learnt from using rasdaman will be discussed as well.
Spatiotemporal conceptual platform for querying archaeological information systems

NASA Astrophysics Data System (ADS)

Partsinevelos, Panagiotis; Sartzetaki, Mary; Sarris, Apostolos

2015-04-01

Spatial and temporal distribution of archaeological sites has been shown to associate with several attributes including marine, water, mineral and food resources, climate conditions, geomorphological features, etc. In this study, archeological settlement attributes are evaluated under various associations in order to provide a specialized query platform in a geographic information system (GIS). Towards this end, a spatial database is designed to include a series of archaeological findings for a secluded geographic area of Crete in Greece. The key categories of the geodatabase include the archaeological type (palace, burial site, village, etc.), temporal information of the habitation/usage period (pre Minoan, Minoan, Byzantine, etc.), and the extracted geographical attributes of the sites (distance to sea, altitude, resources, etc.). Most of the related spatial attributes are extracted with readily available GIS tools. Additionally, a series of conceptual data attributes are estimated, including: Temporal relation of an era to a future one in terms of alteration of the archaeological type, topologic relations of various types and attributes, spatial proximity relations between various types. These complex spatiotemporal relational measures reveal new attributes towards better understanding of site selection for prehistoric and/or historic cultures, yet their potential combinations can become numerous. Therefore, after the quantification of the above mentioned attributes, they are classified as of their importance for archaeological site location modeling. Under this new classification scheme, the user may select a geographic area of interest and extract only the important attributes for a specific archaeological type. These extracted attributes may then be queried against the entire spatial database and provide a location map of possible new archaeological sites. This novel type of querying is robust since the user does not have to type a standard SQL query but graphically select an area of interest. In addition, according to the application at hand, novel spatiotemporal attributes and relations can be supported, towards the understanding of historical settlement patterns.
Accelerating semantic graph databases on commodity clusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morari, Alessandro; Castellana, Vito G.; Haglin, David J.

We are developing a full software system for accelerating semantic graph databases on commodity cluster that scales to hundreds of nodes while maintaining constant query throughput. Our framework comprises a SPARQL to C++ compiler, a library of parallel graph methods and a custom multithreaded runtime layer, which provides a Partitioned Global Address Space (PGAS) programming model with fork/join parallelism and automatic load balancing over a commodity clusters. We present preliminary results for the compiler and for the runtime.
Federated ontology-based queries over cancer data

PubMed Central

2012-01-01

Background Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. In particular, this is valid in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast amounts of data, however, coupled with the use of different terms - or semantic heterogeneity - in each discipline makes the retrieval and integration of information difficult. Results Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality is given at the structural metadata level, without capitalising on the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures. A graphical user interface has been developed, supporting ontology-based queries over caGrid data sources. An extensive evaluation of the query reformulation technique is included. Conclusions To support personalised medicine in oncology, it is crucial to retrieve and integrate molecular, pathology, radiology and clinical data in an efficient manner. The semantic heterogeneity of the data makes this a challenging task. Ontologies provide a formal framework to support querying and integration. This paper provides an ontology-based solution for querying distributed databases over service-oriented, model-driven infrastructures. PMID:22373043
Merged data models for multi-parameterized querying: Spectral data base meets GIS-based map archive

NASA Astrophysics Data System (ADS)

Naß, A.; D'Amore, M.; Helbert, J.

2017-09-01

Current and upcoming planetary missions deliver a huge amount of different data (remote sensing data, in-situ data, and derived products). Within this contribution present how different data (bases) can be managed and merged, to enable multi-parameterized querying based on the constant spatial context.
An XML-Based Knowledge Management System of Port Information for U.S. Coast Guard Cutters

DTIC Science & Technology

2003-03-01

using DTDs was not chosen. XML Schema performs many of the same functions as SQL type schemas, but differ by the unique structure of XML documents...to access data from content files within the developed system. XPath is not equivalent to SQL . While XPath is very powerful at reaching into an XML...document and finding nodes or node sets, it is not a complete query language. For operations like joins, unions, intersections, etc., SQL is far
Abstracting data warehousing issues in scientific research.

PubMed

Tews, Cody; Bracio, Boris R

2002-01-01

This paper presents the design and implementation of the Idaho Biomedical Data Management System (IBDMS). This system preprocesses biomedical data from the IMPROVE (Improving Control of Patient Status in Critical Care) library via an Open Database Connectivity (ODBC) connection. The ODBC connection allows for local and remote simulations to access filtered, joined, and sorted data using the Structured Query Language (SQL). The tool is capable of providing an overview of available data in addition to user defined data subset for verification of models of the human respiratory system.
Bin-Hash Indexing: A Parallel Method for Fast Query Processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bethel, Edward W; Gosink, Luke J.; Wu, Kesheng

2008-06-27

This paper presents a new parallel indexing data structure for answering queries. The index, called Bin-Hash, offers extremely high levels of concurrency, and is therefore well-suited for the emerging commodity of parallel processors, such as multi-cores, cell processors, and general purpose graphics processing units (GPU). The Bin-Hash approach first bins the base data, and then partitions and separately stores the values in each bin as a perfect spatial hash table. To answer a query, we first determine whether or not a record satisfies the query conditions based on the bin boundaries. For the bins with records that can not bemore » resolved, we examine the spatial hash tables. The procedures for examining the bin numbers and the spatial hash tables offer the maximum possible level of concurrency; all records are able to be evaluated by our procedure independently in parallel. Additionally, our Bin-Hash procedures access much smaller amounts of data than similar parallel methods, such as the projection index. This smaller data footprint is critical for certain parallel processors, like GPUs, where memory resources are limited. To demonstrate the effectiveness of Bin-Hash, we implement it on a GPU using the data-parallel programming language CUDA. The concurrency offered by the Bin-Hash index allows us to fully utilize the GPU's massive parallelism in our work; over 12,000 records can be simultaneously evaluated at any one time. We show that our new query processing method is an order of magnitude faster than current state-of-the-art CPU-based indexing technologies. Additionally, we compare our performance to existing GPU-based projection index strategies.« less
Spatial Knowledge Infrastructures - Creating Value for Policy Makers and Benefits the Community

NASA Astrophysics Data System (ADS)

Arnold, L. M.

2016-12-01

The spatial data infrastructure is arguably one of the most significant advancements in the spatial sector. It's been a game changer for governments, providing for the coordination and sharing of spatial data across organisations and the provision of accessible information to the broader community of users. Today however, end-users such as policy-makers require far more from these spatial data infrastructures. They want more than just data; they want the knowledge that can be extracted from data and they don't want to have to download, manipulate and process data in order to get the knowledge they seek. It's time for the spatial sector to reduce its focus on data in spatial data infrastructures and take a more proactive step in emphasising and delivering the knowledge value. Nowadays, decision-makers want to be able to query at will the data to meet their immediate need for knowledge. This is a new value proposal for the decision-making consumer and will require a shift in thinking. This paper presents a model for a Spatial Knowledge Infrastructure and underpinning methods that will realise a new real-time approach to delivering knowledge. The methods embrace the new capabilities afforded through the sematic web, domain and process ontologies and natural query language processing. Semantic Web technologies today have the potential to transform the spatial industry into more than just a distribution channel for data. The Semantic Web RDF (Resource Description Framework) enables meaning to be drawn from data automatically. While pushing data out to end-users will remain a central role for data producers, the power of the semantic web is that end-users have the ability to marshal a broad range of spatial resources via a query to extract knowledge from available data. This can be done without actually having to configure systems specifically for the end-user. All data producers need do is make data accessible in RDF and the spatial analytics does the rest.
Object-Oriented Query Language For Events Detection From Images Sequences

NASA Astrophysics Data System (ADS)

Ganea, Ion Eugen

2015-09-01

In this paper is presented a method to represent the events extracted from images sequences and the query language used for events detection. Using an object oriented model the spatial and temporal relationships between salient objects and also between events are stored and queried. This works aims to unify the storing and querying phases for video events processing. The object oriented language syntax used for events processing allow the instantiation of the indexes classes in order to improve the accuracy of the query results. The experiments were performed on images sequences provided from sport domain and it shows the reliability and the robustness of the proposed language. To extend the language will be added a specific syntax for constructing the templates for abnormal events and for detection of the incidents as the final goal of the research.
Web-based Visualization and Query of semantically segmented multiresolution 3D Models in the Field of Cultural Heritage

NASA Astrophysics Data System (ADS)

Auer, M.; Agugiaro, G.; Billen, N.; Loos, L.; Zipf, A.

2014-05-01

Many important Cultural Heritage sites have been studied over long periods of time by different means of technical equipment, methods and intentions by different researchers. This has led to huge amounts of heterogeneous "traditional" datasets and formats. The rising popularity of 3D models in the field of Cultural Heritage in recent years has brought additional data formats and makes it even more necessary to find solutions to manage, publish and study these data in an integrated way. The MayaArch3D project aims to realize such an integrative approach by establishing a web-based research platform bringing spatial and non-spatial databases together and providing visualization and analysis tools. Especially the 3D components of the platform use hierarchical segmentation concepts to structure the data and to perform queries on semantic entities. This paper presents a database schema to organize not only segmented models but also different Levels-of-Details and other representations of the same entity. It is further implemented in a spatial database which allows the storing of georeferenced 3D data. This enables organization and queries by semantic, geometric and spatial properties. As service for the delivery of the segmented models a standardization candidate of the OpenGeospatialConsortium (OGC), the Web3DService (W3DS) has been extended to cope with the new database schema and deliver a web friendly format for WebGL rendering. Finally a generic user interface is presented which uses the segments as navigation metaphor to browse and query the semantic segmentation levels and retrieve information from an external database of the German Archaeological Institute (DAI).
GIS Methodic and New Database for Magmatic Rocks. Application for Atlantic Oceanic Magmatism.

NASA Astrophysics Data System (ADS)

Asavin, A. M.

2001-12-01

There are several geochemical Databases in INTERNET available now. There one of the main peculiarities of stored geochemical information is geographical coordinates of each samples in those Databases. As rule the software of this Database use spatial information only for users interface search procedures. In the other side, GIS-software (Geographical Information System software),for example ARC/INFO software which using for creation and analyzing special geological, geochemical and geophysical e-map, have been deeply involved with geographical coordinates for of samples. We join peculiarities GIS systems and relational geochemical Database from special software. Our geochemical information system created in Vernadsky Geological State Museum and institute of Geochemistry and Analytical Chemistry from Moscow. Now we tested system with data of geochemistry oceanic rock from Atlantic and Pacific oceans, about 10000 chemical analysis. GIS information content consist from e-map covers Wold Globes. Parts of these maps are Atlantic ocean covers gravica map (with grid 2''), oceanic bottom hot stream, altimeteric maps, seismic activity, tectonic map and geological map. Combination of this information content makes possible created new geochemical maps and combination of spatial analysis and numerical geochemical modeling of volcanic process in ocean segment. Now we tested information system on thick client technology. Interface between GIS system Arc/View and Database resides in special multiply SQL-queries sequence. The result of the above gueries were simple DBF-file with geographical coordinates. This file act at the instant of creation geochemical and other special e-map from oceanic region. We used more complex method for geophysical data. From ARC\\View we created grid cover for polygon spatial geophysical information.
Location Prediction Based on Transition Probability Matrices Constructing from Sequential Rules for Spatial-Temporal K-Anonymity Dataset

PubMed Central

Liu, Zhao; Zhu, Yunhong; Wu, Chenxue

2016-01-01

Spatial-temporal k-anonymity has become a mainstream approach among techniques for protection of users’ privacy in location-based services (LBS) applications, and has been applied to several variants such as LBS snapshot queries and continuous queries. Analyzing large-scale spatial-temporal anonymity sets may benefit several LBS applications. In this paper, we propose two location prediction methods based on transition probability matrices constructing from sequential rules for spatial-temporal k-anonymity dataset. First, we define single-step sequential rules mined from sequential spatial-temporal k-anonymity datasets generated from continuous LBS queries for multiple users. We then construct transition probability matrices from mined single-step sequential rules, and normalize the transition probabilities in the transition matrices. Next, we regard a mobility model for an LBS requester as a stationary stochastic process and compute the n-step transition probability matrices by raising the normalized transition probability matrices to the power n. Furthermore, we propose two location prediction methods: rough prediction and accurate prediction. The former achieves the probabilities of arriving at target locations along simple paths those include only current locations, target locations and transition steps. By iteratively combining the probabilities for simple paths with n steps and the probabilities for detailed paths with n-1 steps, the latter method calculates transition probabilities for detailed paths with n steps from current locations to target locations. Finally, we conduct extensive experiments, and correctness and flexibility of our proposed algorithm have been verified. PMID:27508502
Approximate Algorithms for Computing Spatial Distance Histograms with Accuracy Guarantees

PubMed Central

Grupcev, Vladimir; Yuan, Yongke; Tu, Yi-Cheng; Huang, Jin; Chen, Shaoping; Pandit, Sagar; Weng, Michael

2014-01-01

Particle simulation has become an important research tool in many scientific and engineering fields. Data generated by such simulations impose great challenges to database storage and query processing. One of the queries against particle simulation data, the spatial distance histogram (SDH) query, is the building block of many high-level analytics, and requires quadratic time to compute using a straightforward algorithm. Previous work has developed efficient algorithms that compute exact SDHs. While beating the naive solution, such algorithms are still not practical in processing SDH queries against large-scale simulation data. In this paper, we take a different path to tackle this problem by focusing on approximate algorithms with provable error bounds. We first present a solution derived from the aforementioned exact SDH algorithm, and this solution has running time that is unrelated to the system size N. We also develop a mathematical model to analyze the mechanism that leads to errors in the basic approximate algorithm. Our model provides insights on how the algorithm can be improved to achieve higher accuracy and efficiency. Such insights give rise to a new approximate algorithm with improved time/accuracy tradeoff. Experimental results confirm our analysis. PMID:24693210
Multidimensional indexing structure for use with linear optimization queries

NASA Technical Reports Server (NTRS)

Bergman, Lawrence David (Inventor); Castelli, Vittorio (Inventor); Chang, Yuan-Chi (Inventor); Li, Chung-Sheng (Inventor); Smith, John Richard (Inventor)

2002-01-01

Linear optimization queries, which usually arise in various decision support and resource planning applications, are queries that retrieve top N data records (where N is an integer greater than zero) which satisfy a specific optimization criterion. The optimization criterion is to either maximize or minimize a linear equation. The coefficients of the linear equation are given at query time. Methods and apparatus are disclosed for constructing, maintaining and utilizing a multidimensional indexing structure of database records to improve the execution speed of linear optimization queries. Database records with numerical attributes are organized into a number of layers and each layer represents a geometric structure called convex hull. Such linear optimization queries are processed by searching from the outer-most layer of this multi-layer indexing structure inwards. At least one record per layer will satisfy the query criterion and the number of layers needed to be searched depends on the spatial distribution of records, the query-issued linear coefficients, and N, the number of records to be returned. When N is small compared to the total size of the database, answering the query typically requires searching only a small fraction of all relevant records, resulting in a tremendous speedup as compared to linearly scanning the entire dataset.
Privacy-Preserving Location-Based Query Using Location Indexes and Parallel Searching in Distributed Networks

PubMed Central

Liu, Lei; Zhao, Jing

2014-01-01

An efficient location-based query algorithm of protecting the privacy of the user in the distributed networks is given. This algorithm utilizes the location indexes of the users and multiple parallel threads to search and select quickly all the candidate anonymous sets with more users and their location information with more uniform distribution to accelerate the execution of the temporal-spatial anonymous operations, and it allows the users to configure their custom-made privacy-preserving location query requests. The simulated experiment results show that the proposed algorithm can offer simultaneously the location query services for more users and improve the performance of the anonymous server and satisfy the anonymous location requests of the users. PMID:24790579
Privacy-preserving location-based query using location indexes and parallel searching in distributed networks.

PubMed

Zhong, Cheng; Liu, Lei; Zhao, Jing

2014-01-01

An efficient location-based query algorithm of protecting the privacy of the user in the distributed networks is given. This algorithm utilizes the location indexes of the users and multiple parallel threads to search and select quickly all the candidate anonymous sets with more users and their location information with more uniform distribution to accelerate the execution of the temporal-spatial anonymous operations, and it allows the users to configure their custom-made privacy-preserving location query requests. The simulated experiment results show that the proposed algorithm can offer simultaneously the location query services for more users and improve the performance of the anonymous server and satisfy the anonymous location requests of the users.
Do accountable care organizations (ACOs) help or hinder primary care physicians' ability to deliver high-quality care?

PubMed

Berenson, Robert A; Burton, Rachel A; McGrath, Megan

2016-09-01

Many view advanced primary care models such as the patient-centered medical home as foundational for accountable care organizations (ACOs), but it remains unclear how these two delivery reforms are complementary and how they may produce conflict. The objective of this study was to identify how joining an ACO could help or hinder a primary care practice's efforts to deliver high-quality care. This qualitative study involved interviews with a purposive sample of 32 early adopters of advanced primary care and/or ACO models, drawn from across the U.S. and conducted in mid-2014. Interview notes were coded using qualitative data analysis software, permitting topic-specific queries which were then summarized. Respondents perceived many potential benefits of joining an ACO, including care coordination staff, data analytics, and improved communication with other providers. However, respondents were also concerned about added "bureaucratic" requirements, referral restrictions, and a potential inability to recoup investments in practice improvements. Interviewees generally thought joining an ACO could complement a practice's efforts to deliver high-quality care, yet noted some concerns that could undermine these synergies. Both the advantages and disadvantages of joining an ACO seemed exacerbated for small practices, since they are most likely to benefit from additional resources yet are most likely to chafe under added bureaucratic requirements. Our identification of the potential pros and cons of joining an ACO may help providers identify areas to examine when weighing whether to enter into such an arrangement, and may help ACOs identify potential areas for improvement. Copyright © 2016 Elsevier Inc. All rights reserved.
Spatial coding-based approach for partitioning big spatial data in Hadoop

NASA Astrophysics Data System (ADS)

Yao, Xiaochuang; Mokbel, Mohamed F.; Alarabi, Louai; Eldawy, Ahmed; Yang, Jianyu; Yun, Wenju; Li, Lin; Ye, Sijing; Zhu, Dehai

2017-09-01

Spatial data partitioning (SDP) plays a powerful role in distributed storage and parallel computing for spatial data. However, due to skew distribution of spatial data and varying volume of spatial vector objects, it leads to a significant challenge to ensure both optimal performance of spatial operation and data balance in the cluster. To tackle this problem, we proposed a spatial coding-based approach for partitioning big spatial data in Hadoop. This approach, firstly, compressed the whole big spatial data based on spatial coding matrix to create a sensing information set (SIS), including spatial code, size, count and other information. SIS was then employed to build spatial partitioning matrix, which was used to spilt all spatial objects into different partitions in the cluster finally. Based on our approach, the neighbouring spatial objects can be partitioned into the same block. At the same time, it also can minimize the data skew in Hadoop distributed file system (HDFS). The presented approach with a case study in this paper is compared against random sampling based partitioning, with three measurement standards, namely, the spatial index quality, data skew in HDFS, and range query performance. The experimental results show that our method based on spatial coding technique can improve the query performance of big spatial data, as well as the data balance in HDFS. We implemented and deployed this approach in Hadoop, and it is also able to support efficiently any other distributed big spatial data systems.
Resource Needs and Pedagogical Value of Web Mapping for Spatial Thinking

ERIC Educational Resources Information Center

Manson, Steven; Shannon, Jerry; Eria, Sami; Kne, Len; Dyke, Kevin; Nelson, Sara; Batra, Lalit; Bonsal, Dudley; Kernik, Melinda; Immich, Jennifer; Matson, Laura

2014-01-01

Web mapping involves publishing and using maps via the Internet, and can range from presenting static maps to offering dynamic data querying and spatial analysis. Web mapping is seen as a promising way to support development of spatial thinking in the classroom but there are unanswered questions about how this promise plays out in reality. This…

A geo-spatial data management system for potentially active volcanoes—GEOWARN project

NASA Astrophysics Data System (ADS)

Gogu, Radu C.; Dietrich, Volker J.; Jenny, Bernhard; Schwandner, Florian M.; Hurni, Lorenz

2006-02-01

Integrated studies of active volcanic systems for the purpose of long-term monitoring and forecast and short-term eruption prediction require large numbers of data-sets from various disciplines. A modern database concept has been developed for managing and analyzing multi-disciplinary volcanological data-sets. The GEOWARN project (choosing the "Kos-Yali-Nisyros-Tilos volcanic field, Greece" and the "Campi Flegrei, Italy" as test sites) is oriented toward potentially active volcanoes situated in regions of high geodynamic unrest. This article describes the volcanological database of the spatial and temporal data acquired within the GEOWARN project. As a first step, a spatial database embedded in a Geographic Information System (GIS) environment was created. Digital data of different spatial resolution, and time-series data collected at different intervals or periods, were unified in a common, four-dimensional representation of space and time. The database scheme comprises various information layers containing geographic data (e.g. seafloor and land digital elevation model, satellite imagery, anthropogenic structures, land-use), geophysical data (e.g. from active and passive seismicity, gravity, tomography, SAR interferometry, thermal imagery, differential GPS), geological data (e.g. lithology, structural geology, oceanography), and geochemical data (e.g. from hydrothermal fluid chemistry and diffuse degassing features). As a second step based on the presented database, spatial data analysis has been performed using custom-programmed interfaces that execute query scripts resulting in a graphical visualization of data. These query tools were designed and compiled following scenarios of known "behavior" patterns of dormant volcanoes and first candidate signs of potential unrest. The spatial database and query approach is intended to facilitate scientific research on volcanic processes and phenomena, and volcanic surveillance.
Application based on ArcObject inquiry and Google maps demonstration to real estate database

NASA Astrophysics Data System (ADS)

Hwang, JinTsong

2007-06-01

Real estate industry in Taiwan has been flourishing in recent years. To acquire various and abundant information of real estate for sale is the same goal for the consumers and the brokerages. Therefore, before looking at the property, it is important to get all pertinent information possible. Not only this beneficial for the real estate agent as they can provide the sellers with the most information, thereby solidifying the interest of the buyer, but may also save time and the cost of manpower were something out of place. Most of the brokerage sites are aware of utilizes Internet as form of media for publicity however; the contents are limited to specific property itself and the functions of query are mostly just provided searching by condition. This paper proposes a query interface on website which gives function of zone query by spatial analysis for non-GIS users, developing a user-friendly interface with ArcObject in VB6, and query by condition. The inquiry results can show on the web page which is embedded functions of Google Maps and the UrMap API on it. In addition, the demonstration of inquiry results will give the multimedia present way which includes hyperlink to Google Earth with surrounding of the property, the Virtual Reality scene of house, panorama of interior of building and so on. Therefore, the website provides extra spatial solution for query and demonstration abundant information of real estate in two-dimensional and three-dimensional types of view.
Modeling and query the uncertainty of network constrained moving objects based on RFID data

NASA Astrophysics Data System (ADS)

Han, Liang; Xie, Kunqing; Ma, Xiujun; Song, Guojie

2007-06-01

The management of network constrained moving objects is more and more practical, especially in intelligent transportation system. In the past, the location information of moving objects on network is collected by GPS, which cost high and has the problem of frequent update and privacy. The RFID (Radio Frequency IDentification) devices are used more and more widely to collect the location information. They are cheaper and have less update. And they interfere in the privacy less. They detect the id of the object and the time when moving object passed by the node of the network. They don't detect the objects' exact movement in side the edge, which lead to a problem of uncertainty. How to modeling and query the uncertainty of the network constrained moving objects based on RFID data becomes a research issue. In this paper, a model is proposed to describe the uncertainty of network constrained moving objects. A two level index is presented to provide efficient access to the network and the data of movement. The processing of imprecise time-slice query and spatio-temporal range query are studied in this paper. The processing includes four steps: spatial filter, spatial refinement, temporal filter and probability calculation. Finally, some experiments are done based on the simulated data. In the experiments the performance of the index is studied. The precision and recall of the result set are defined. And how the query arguments affect the precision and recall of the result set is also discussed.
SAFE: SPARQL Federation over RDF Data Cubes with Access Control.

PubMed

Khan, Yasar; Saleem, Muhammad; Mehdi, Muntazir; Hogan, Aidan; Mehmood, Qaiser; Rebholz-Schuhmann, Dietrich; Sahay, Ratnesh

2017-02-01

Several query federation engines have been proposed for accessing public Linked Open Data sources. However, in many domains, resources are sensitive and access to these resources is tightly controlled by stakeholders; consequently, privacy is a major concern when federating queries over such datasets. In the Healthcare and Life Sciences (HCLS) domain real-world datasets contain sensitive statistical information: strict ownership is granted to individuals working in hospitals, research labs, clinical trial organisers, etc. Therefore, the legal and ethical concerns on (i) preserving the anonymity of patients (or clinical subjects); and (ii) respecting data ownership through access control; are key challenges faced by the data analytics community working within the HCLS domain. Likewise statistical data play a key role in the domain, where the RDF Data Cube Vocabulary has been proposed as a standard format to enable the exchange of such data. However, to the best of our knowledge, no existing approach has looked to optimise federated queries over such statistical data. We present SAFE: a query federation engine that enables policy-aware access to sensitive statistical datasets represented as RDF data cubes. SAFE is designed specifically to query statistical RDF data cubes in a distributed setting, where access control is coupled with source selection, user profiles and their access rights. SAFE proposes a join-aware source selection method that avoids wasteful requests to irrelevant and unauthorised data sources. In order to preserve anonymity and enforce stricter access control, SAFE's indexing system does not hold any data instances-it stores only predicates and endpoints. The resulting data summary has a significantly lower index generation time and size compared to existing engines, which allows for faster updates when sources change. We validate the performance of the system with experiments over real-world datasets provided by three clinical organisations as well as legacy linked datasets. We show that SAFE enables granular graph-level access control over distributed clinical RDF data cubes and efficiently reduces the source selection and overall query execution time when compared with general-purpose SPARQL query federation engines in the targeted setting.
EMAP and EMAGE: a framework for understanding spatially organized data.

PubMed

Baldock, Richard A; Bard, Jonathan B L; Burger, Albert; Burton, Nicolas; Christiansen, Jeff; Feng, Guanjie; Hill, Bill; Houghton, Derek; Kaufman, Matthew; Rao, Jianguo; Sharpe, James; Ross, Allyson; Stevenson, Peter; Venkataraman, Shanmugasundaram; Waterhouse, Andrew; Yang, Yiya; Davidson, Duncan R

2003-01-01

The Edinburgh MouseAtlas Project (EMAP) is a time-series of mouse-embryo volumetric models. The models provide a context-free spatial framework onto which structural interpretations and experimental data can be mapped. This enables collation, comparison, and query of complex spatial patterns with respect to each other and with respect to known or hypothesized structure. The atlas also includes a time-dependent anatomical ontology and mapping between the ontology and the spatial models in the form of delineated anatomical regions or tissues. The models provide a natural, graphical context for browsing and visualizing complex data. The Edinburgh Mouse Atlas Gene-Expression Database (EMAGE) is one of the first applications of the EMAP framework and provides a spatially mapped gene-expression database with associated tools for data mapping, submission, and query. In this article, we describe the underlying principles of the Atlas and the gene-expression database, and provide a practical introduction to the use of the EMAP and EMAGE tools, including use of new techniques for whole body gene-expression data capture and mapping.
Relational Algebra in Spatial Decision Support Systems Ontologies.

PubMed

Diomidous, Marianna; Chardalias, Kostis; Koutonias, Panagiotis; Magnita, Adrianna; Andrianopoulos, Charalampos; Zimeras, Stelios; Mechili, Enkeleint Aggelos

2017-01-01

Decision Support Systems (DSS) is a powerful tool, for facilitates researchers to choose the correct decision based on their final results. Especially in medical cases where doctors could use these systems, to overcome the problem with the clinical misunderstanding. Based on these systems, queries must be constructed based on the particular questions that doctors must answer. In this work, combination between questions and queries would be presented via relational algebra.
Composing Data Parallel Code for a SPARQL Graph Engine

DOE Office of Scientific and Technical Information (OSTI.GOV)

Castellana, Vito G.; Tumeo, Antonino; Villa, Oreste

Big data analytics process large amount of data to extract knowledge from them. Semantic databases are big data applications that adopt the Resource Description Framework (RDF) to structure metadata through a graph-based representation. The graph based representation provides several benefits, such as the possibility to perform in memory processing with large amounts of parallelism. SPARQL is a language used to perform queries on RDF-structured data through graph matching. In this paper we present a tool that automatically translates SPARQL queries to parallel graph crawling and graph matching operations. The tool also supports complex SPARQL constructs, which requires more than basicmore » graph matching for their implementation. The tool generates parallel code annotated with OpenMP pragmas for x86 Shared-memory Multiprocessors (SMPs). With respect to commercial database systems such as Virtuoso, our approach reduces memory occupation due to join operations and provides higher performance. We show the scaling of the automatically generated graph-matching code on a 48-core SMP.« less
Indexing and retrieving point and region objects

NASA Astrophysics Data System (ADS)

Ibrahim, Azzam T.; Fotouhi, Farshad A.

1996-03-01

R-tree and its variants are examples of spatial data structures for paged-secondary memory. To process a query, these structures require multiple path traversals. In this paper, we present a new image access method, SB+-tree which requires a single path traversal to process a query. Also, SB+-tree will allow commercial databases an access method for spatial objects without a major change, since most commercial databases already support B+-tree as an access method for text data. The SB+-tree can be used for zero and non-zero size data objects. Non-zero size objects are approximated by their minimum bounding rectangles (MBRs). The number of SB+-trees generated is dependent upon the number of dimensions of the approximation of the object. The structure supports efficient spatial operations such as regions-overlap, distance and direction. In this paper, we experimentally and analytically demonstrate the superiority of SB+-tree over R-tree.
Modeling Spatial Relationships within a Fuzzy Framework.

ERIC Educational Resources Information Center

Petry, Frederick E.; Cobb, Maria A.

1998-01-01

Presents a model for representing and storing binary topological and directional relationships between 2-dimensional objects that is used to provide a basis for fuzzy querying capabilities. A data structure called an abstract spatial graph (ASG) is defined for the binary relationships that maintains all necessary information regarding topology and…
An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring.

PubMed

Alirezaie, Marjan; Kiselev, Andrey; Längkvist, Martin; Klügl, Franziska; Loutfi, Amy

2017-11-05

This paper presents a framework in which satellite images are classified and augmented with additional semantic information to enable queries about what can be found on the map at a particular location, but also about paths that can be taken. This is achieved by a reasoning framework based on qualitative spatial reasoning that is able to find answers to high level queries that may vary on the current situation. This framework called SemCityMap, provides the full pipeline from enriching the raw image data with rudimentary labels to the integration of a knowledge representation and reasoning methods to user interfaces for high level querying. To illustrate the utility of SemCityMap in a disaster scenario, we use an urban environment-central Stockholm-in combination with a flood simulation. We show that the system provides useful answers to high-level queries also with respect to the current flood status. Examples of such queries concern path planning for vehicles or retrieval of safe regions such as "find all regions close to schools and far from the flooded area". The particular advantage of our approach lies in the fact that ontological information and reasoning is explicitly integrated so that queries can be formulated in a natural way using concepts on appropriate level of abstraction, including additional constraints.
An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring

PubMed Central

Alirezaie, Marjan; Klügl, Franziska; Loutfi, Amy

2017-01-01

This paper presents a framework in which satellite images are classified and augmented with additional semantic information to enable queries about what can be found on the map at a particular location, but also about paths that can be taken. This is achieved by a reasoning framework based on qualitative spatial reasoning that is able to find answers to high level queries that may vary on the current situation. This framework called SemCityMap, provides the full pipeline from enriching the raw image data with rudimentary labels to the integration of a knowledge representation and reasoning methods to user interfaces for high level querying. To illustrate the utility of SemCityMap in a disaster scenario, we use an urban environment—central Stockholm—in combination with a flood simulation. We show that the system provides useful answers to high-level queries also with respect to the current flood status. Examples of such queries concern path planning for vehicles or retrieval of safe regions such as “find all regions close to schools and far from the flooded area”. The particular advantage of our approach lies in the fact that ontological information and reasoning is explicitly integrated so that queries can be formulated in a natural way using concepts on appropriate level of abstraction, including additional constraints. PMID:29113073
Geospatial Data Stream Processing in Python Using FOSS4G Components

NASA Astrophysics Data System (ADS)

McFerren, G.; van Zyl, T.

2016-06-01

One viewpoint of current and future IT systems holds that there is an increase in the scale and velocity at which data are acquired and analysed from heterogeneous, dynamic sources. In the earth observation and geoinformatics domains, this process is driven by the increase in number and types of devices that report location and the proliferation of assorted sensors, from satellite constellations to oceanic buoy arrays. Much of these data will be encountered as self-contained messages on data streams - continuous, infinite flows of data. Spatial analytics over data streams concerns the search for spatial and spatio-temporal relationships within and amongst data "on the move". In spatial databases, queries can assess a store of data to unpack spatial relationships; this is not the case on streams, where spatial relationships need to be established with the incomplete data available. Methods for spatially-based indexing, filtering, joining and transforming of streaming data need to be established and implemented in software components. This article describes the usage patterns and performance metrics of a number of well known FOSS4G Python software libraries within the data stream processing paradigm. In particular, we consider the RTree library for spatial indexing, the Shapely library for geometric processing and transformation and the PyProj library for projection and geodesic calculations over streams of geospatial data. We introduce a message oriented Python-based geospatial data streaming framework called Swordfish, which provides data stream processing primitives, functions, transports and a common data model for describing messages, based on the Open Geospatial Consortium Observations and Measurements (O&M) and Unidata Common Data Model (CDM) standards. We illustrate how the geospatial software components are integrated with the Swordfish framework. Furthermore, we describe the tight temporal constraints under which geospatial functionality can be invoked when processing high velocity, potentially infinite geospatial data streams. The article discusses the performance of these libraries under simulated streaming loads (size, complexity and volume of messages) and how they can be deployed and utilised with Swordfish under real load scenarios, illustrated by a set of Vessel Automatic Identification System (AIS) use cases. We conclude that the described software libraries are able to perform adequately under geospatial data stream processing scenarios - many real application use cases will be handled sufficiently by the software.
On-demand information retrieval in sensor networks with localised query and energy-balanced data collection.

PubMed

Teng, Rui; Zhang, Bing

2011-01-01

On-demand information retrieval enables users to query and collect up-to-date sensing information from sensor nodes. Since high energy efficiency is required in a sensor network, it is desirable to disseminate query messages with small traffic overhead and to collect sensing data with low energy consumption. However, on-demand query messages are generally forwarded to sensor nodes in network-wide broadcasts, which create large traffic overhead. In addition, since on-demand information retrieval may introduce intermittent and spatial data collections, the construction and maintenance of conventional aggregation structures such as clusters and chains will be at high cost. In this paper, we propose an on-demand information retrieval approach that exploits the name resolution of data queries according to the attribute and location of each sensor node. The proposed approach localises each query dissemination and enable localised data collection with maximised aggregation. To illustrate the effectiveness of the proposed approach, an analytical model that describes the criteria of sink proxy selection is provided. The evaluation results reveal that the proposed scheme significantly reduces energy consumption and improves the balance of energy consumption among sensor nodes by alleviating heavy traffic near the sink.
Innovations in individual feature history management - The significance of feature-based temporal model

USGS Publications Warehouse

Choi, J.; Seong, J.C.; Kim, B.; Usery, E.L.

2008-01-01

A feature relies on three dimensions (space, theme, and time) for its representation. Even though spatiotemporal models have been proposed, they have principally focused on the spatial changes of a feature. In this paper, a feature-based temporal model is proposed to represent the changes of both space and theme independently. The proposed model modifies the ISO's temporal schema and adds new explicit temporal relationship structure that stores temporal topological relationship with the ISO's temporal primitives of a feature in order to keep track feature history. The explicit temporal relationship can enhance query performance on feature history by removing topological comparison during query process. Further, a prototype system has been developed to test a proposed feature-based temporal model by querying land parcel history in Athens, Georgia. The result of temporal query on individual feature history shows the efficiency of the explicit temporal relationship structure. ?? Springer Science+Business Media, LLC 2007.
Modular design, application architecture, and usage of a self-service model for enterprise data delivery: the Duke Enterprise Data Unified Content Explorer (DEDUCE).

PubMed

Horvath, Monica M; Rusincovitch, Shelley A; Brinson, Stephanie; Shang, Howard C; Evans, Steve; Ferranti, Jeffrey M

2014-12-01

Data generated in the care of patients are widely used to support clinical research and quality improvement, which has hastened the development of self-service query tools. User interface design for such tools, execution of query activity, and underlying application architecture have not been widely reported, and existing tools reflect a wide heterogeneity of methods and technical frameworks. We describe the design, application architecture, and use of a self-service model for enterprise data delivery within Duke Medicine. Our query platform, the Duke Enterprise Data Unified Content Explorer (DEDUCE), supports enhanced data exploration, cohort identification, and data extraction from our enterprise data warehouse (EDW) using a series of modular environments that interact with a central keystone module, Cohort Manager (CM). A data-driven application architecture is implemented through three components: an application data dictionary, the concept of "smart dimensions", and dynamically-generated user interfaces. DEDUCE CM allows flexible hierarchies of EDW queries within a grid-like workspace. A cohort "join" functionality allows switching between filters based on criteria occurring within or across patient encounters. To date, 674 users have been trained and activated in DEDUCE, and logon activity shows a steady increase, with variability between months. A comparison of filter conditions and export criteria shows that these activities have different patterns of usage across subject areas. Organizations with sophisticated EDWs may find that users benefit from development of advanced query functionality, complimentary to the user interfaces and infrastructure used in other well-published models. Driven by its EDW context, the DEDUCE application architecture was also designed to be responsive to source data and to allow modification through alterations in metadata rather than programming, allowing an agile response to source system changes. Copyright © 2014 Elsevier Inc. All rights reserved.
MV-OPES: Multivalued-Order Preserving Encryption Scheme: A Novel Scheme for Encrypting Integer Value to Many Different Values

NASA Astrophysics Data System (ADS)

Kadhem, Hasan; Amagasa, Toshiyuki; Kitagawa, Hiroyuki

Encryption can provide strong security for sensitive data against inside and outside attacks. This is especially true in the “Database as Service” model, where confidentiality and privacy are important issues for the client. In fact, existing encryption approaches are vulnerable to a statistical attack because each value is encrypted to another fixed value. This paper presents a novel database encryption scheme called MV-OPES (Multivalued — Order Preserving Encryption Scheme), which allows privacy-preserving queries over encrypted databases with an improved security level. Our idea is to encrypt a value to different multiple values to prevent statistical attacks. At the same time, MV-OPES preserves the order of the integer values to allow comparison operations to be directly applied on encrypted data. Using calculated distance (range), we propose a novel method that allows a join query between relations based on inequality over encrypted values. We also present techniques to offload query execution load to a database server as much as possible, thereby making a better use of server resources in a database outsourcing environment. Our scheme can easily be integrated with current database systems as it is designed to work with existing indexing structures. It is robust against statistical attack and the estimation of true values. MV-OPES experiments show that security for sensitive data can be achieved with reasonable overhead, establishing the practicability of the scheme.
The EarthServer Federation: State, Role, and Contribution to GEOSS

NASA Astrophysics Data System (ADS)

Merticariu, Vlad; Baumann, Peter

2016-04-01

The intercontinental EarthServer initiative has established a European datacube platform with proven scalability: known databases exceed 100 TB, and single queries have been split across more than 1,000 cloud nodes. Its service interface being rigorously based on the OGC "Big Geo Data" standards, Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS), a series of clients can dock into the services, ranging from open-source OpenLayers and QGIS over open-source NASA WorldWind to proprietary ESRI ArcGIS. Datacube fusion in a "mix and match" style is supported by the platform technolgy, the rasdaman Array Database System, which transparently federates queries so that users simply approach any node of the federation to access any data item, internally optimized for minimal data transfer. Notably, rasdaman is part of GEOSS GCI. NASA is contributing its Web WorldWind virtual globe for user-friendly data extraction, navigation, and analysis. Integrated datacube / metadata queries are contributed by CITE. Current federation members include ESA (managed by MEEO sr.l.), Plymouth Marine Laboratory (PML), the European Centre for Medium-Range Weather Forecast (ECMWF), Australia's National Computational Infrastructure, and Jacobs University (adding in Planetary Science). Further data centers have expressed interest in joining. We present the EarthServer approach, discuss its underlying technology, and illustrate the contribution this datacube platform can make to GEOSS.
Providing R-Tree Support for Mongodb

NASA Astrophysics Data System (ADS)

Xiang, Longgang; Shao, Xiaotian; Wang, Dehao

2016-06-01

Supporting large amounts of spatial data is a significant characteristic of modern databases. However, unlike some mature relational databases, such as Oracle and PostgreSQL, most of current burgeoning NoSQL databases are not well designed for storing geospatial data, which is becoming increasingly important in various fields. In this paper, we propose a novel method to provide R-tree index, as well as corresponding spatial range query and nearest neighbour query functions, for MongoDB, one of the most prevalent NoSQL databases. First, after in-depth analysis of MongoDB's features, we devise an efficient tabular document structure which flattens R-tree index into MongoDB collections. Further, relevant mechanisms of R-tree operations are issued, and then we discuss in detail how to integrate R-tree into MongoDB. Finally, we present the experimental results which show that our proposed method out-performs the built-in spatial index of MongoDB. Our research will greatly facilitate big data management issues with MongoDB in a variety of geospatial information applications.
An Efficient Method for the Retrieval of Objects by Topological Relations in Spatial Database Systems.

ERIC Educational Resources Information Center

Lin, P. L.; Tan, W. H.

2003-01-01

Presents a new method to improve the performance of query processing in a spatial database. Experiments demonstrated that performance of database systems can be improved because both the number of objects accessed and number of objects requiring detailed inspection are much less than those in the previous approach. (AEF)
Ontology-based geospatial data query and integration

USGS Publications Warehouse

Zhao, T.; Zhang, C.; Wei, M.; Peng, Z.-R.

2008-01-01

Geospatial data sharing is an increasingly important subject as large amount of data is produced by a variety of sources, stored in incompatible formats, and accessible through different GIS applications. Past efforts to enable sharing have produced standardized data format such as GML and data access protocols such as Web Feature Service (WFS). While these standards help enabling client applications to gain access to heterogeneous data stored in different formats from diverse sources, the usability of the access is limited due to the lack of data semantics encoded in the WFS feature types. Past research has used ontology languages to describe the semantics of geospatial data but ontology-based queries cannot be applied directly to legacy data stored in databases or shapefiles, or to feature data in WFS services. This paper presents a method to enable ontology query on spatial data available from WFS services and on data stored in databases. We do not create ontology instances explicitly and thus avoid the problems of data replication. Instead, user queries are rewritten to WFS getFeature requests and SQL queries to database. The method also has the benefits of being able to utilize existing tools of databases, WFS, and GML while enabling query based on ontology semantics. ?? 2008 Springer-Verlag Berlin Heidelberg.

A spatio-temporal index for aerial full waveform laser scanning data

NASA Astrophysics Data System (ADS)

Laefer, Debra F.; Vo, Anh-Vu; Bertolotto, Michela

2018-04-01

Aerial laser scanning is increasingly available in the full waveform version of the raw signal, which can provide greater insight into and control over the data and, thus, richer information about the scanned scenes. However, when compared to conventional discrete point storage, preserving raw waveforms leads to vastly larger and more complex data volumes. To begin addressing these challenges, this paper introduces a novel bi-level approach for storing and indexing full waveform (FWF) laser scanning data in a relational database environment, while considering both the spatial and the temporal dimensions of that data. In the storage scheme's upper level, the full waveform datasets are partitioned into spatial and temporal coherent groups that are indexed by a two-dimensional R∗-tree. To further accelerate intra-block data retrieval, at the lower level a three-dimensional local octree is created for each pulse block. The local octrees are implemented in-memory and can be efficiently written to a database for reuse. The indexing solution enables scalable and efficient three-dimensional (3D) spatial and spatio-temporal queries on the actual pulse data - functionalities not available in other systems. The proposed FWF laser scanning data solution is capable of managing multiple FWF datasets derived from large flight missions. The flight structure is embedded into the data storage model and can be used for querying predicates. Such functionality is important to FWF data exploration since aircraft locations and orientations are frequently required for FWF data analyses. Empirical tests on real datasets of up to 1 billion pulses from Dublin, Ireland prove the almost perfect scalability of the system. The use of the local 3D octree in the indexing structure accelerated pulse clipping by 1.2-3.5 times for non-axis-aligned (NAA) polyhedron shaped clipping windows, while axis-aligned (AA) polyhedron clipping was better served using only the top indexing layer. The distinct behaviours of the hybrid indexing for AA and NAA clipping windows are attributable to the different proportion of the local-index-related overheads with respect to the total querying costs. When temporal constraints were added, generally the number of costly spatial checks were reduced, thereby shortening the querying times.
A Lightweight I/O Scheme to Facilitate Spatial and Temporal Queries of Scientific Data Analytics

NASA Technical Reports Server (NTRS)

Tian, Yuan; Liu, Zhuo; Klasky, Scott; Wang, Bin; Abbasi, Hasan; Zhou, Shujia; Podhorszki, Norbert; Clune, Tom; Logan, Jeremy; Yu, Weikuan

2013-01-01

In the era of petascale computing, more scientific applications are being deployed on leadership scale computing platforms to enhance the scientific productivity. Many I/O techniques have been designed to address the growing I/O bottleneck on large-scale systems by handling massive scientific data in a holistic manner. While such techniques have been leveraged in a wide range of applications, they have not been shown as adequate for many mission critical applications, particularly in data post-processing stage. One of the examples is that some scientific applications generate datasets composed of a vast amount of small data elements that are organized along many spatial and temporal dimensions but require sophisticated data analytics on one or more dimensions. Including such dimensional knowledge into data organization can be beneficial to the efficiency of data post-processing, which is often missing from exiting I/O techniques. In this study, we propose a novel I/O scheme named STAR (Spatial and Temporal AggRegation) to enable high performance data queries for scientific analytics. STAR is able to dive into the massive data, identify the spatial and temporal relationships among data variables, and accordingly organize them into an optimized multi-dimensional data structure before storing to the storage. This technique not only facilitates the common access patterns of data analytics, but also further reduces the application turnaround time. In particular, STAR is able to enable efficient data queries along the time dimension, a practice common in scientific analytics but not yet supported by existing I/O techniques. In our case study with a critical climate modeling application GEOS-5, the experimental results on Jaguar supercomputer demonstrate an improvement up to 73 times for the read performance compared to the original I/O method.
Spatial Relation Predicates in Topographic Feature Semantics

USGS Publications Warehouse

Varanka, Dalia E.; Caro, Holly K.

2013-01-01

Topographic data are designed and widely used for base maps of diverse applications, yet the power of these information sources largely relies on the interpretive skills of map readers and relational database expert users once the data are in map or geographic information system (GIS) form. Advances in geospatial semantic technology offer data model alternatives for explicating concepts and articulating complex data queries and statements. To understand and enrich the vocabulary of topographic feature properties for semantic technology, English language spatial relation predicates were analyzed in three standard topographic feature glossaries. The analytical approach drew from disciplinary concepts in geography, linguistics, and information science. Five major classes of spatial relation predicates were identified from the analysis; representations for most of these are not widely available. The classes are: part-whole (which are commonly modeled throughout semantic and linked-data networks), geometric, processes, human intention, and spatial prepositions. These are commonly found in the ‘real world’ and support the environmental science basis for digital topographical mapping. The spatial relation concepts are based on sets of relation terms presented in this chapter, though these lists are not prescriptive or exhaustive. The results of this study make explicit the concepts forming a broad set of spatial relation expressions, which in turn form the basis for expanding the range of possible queries for topographical data analysis and mapping.
A spatial data handling system for retrieval of images by unrestricted regions of user interest

NASA Technical Reports Server (NTRS)

Dorfman, Erik; Cromp, Robert F.

1992-01-01

The Intelligent Data Management (IDM) project at NASA/Goddard Space Flight Center has prototyped an Intelligent Information Fusion System (IIFS), which automatically ingests metadata from remote sensor observations into a large catalog which is directly queryable by end-users. The greatest challenge in the implementation of this catalog was supporting spatially-driven searches, where the user has a possible complex region of interest and wishes to recover those images that overlap all or simply a part of that region. A spatial data management system is described, which is capable of storing and retrieving records of image data regardless of their source. This system was designed and implemented as part of the IIFS catalog. A new data structure, called a hypercylinder, is central to the design. The hypercylinder is specifically tailored for data distributed over the surface of a sphere, such as satellite observations of the Earth or space. Operations on the hypercylinder are regulated by two expert systems. The first governs the ingest of new metadata records, and maintains the efficiency of the data structure as it grows. The second translates, plans, and executes users' spatial queries, performing incremental optimization as partial query results are returned.
Ontology for Transforming Geo-Spatial Data for Discovery and Integration of Scientific Data

NASA Astrophysics Data System (ADS)

Nguyen, L.; Chee, T.; Minnis, P.

2013-12-01

Discovery and access to geo-spatial scientific data across heterogeneous repositories and multi-discipline datasets can present challenges for scientist. We propose to build a workflow for transforming geo-spatial datasets into semantic environment by using relationships to describe the resource using OWL Web Ontology, RDF, and a proposed geo-spatial vocabulary. We will present methods for transforming traditional scientific dataset, use of a semantic repository, and querying using SPARQL to integrate and access datasets. This unique repository will enable discovery of scientific data by geospatial bound or other criteria.
What Is Spatio-Temporal Data Warehousing?

NASA Astrophysics Data System (ADS)

Vaisman, Alejandro; Zimányi, Esteban

In the last years, extending OLAP (On-Line Analytical Processing) systems with spatial and temporal features has attracted the attention of the GIS (Geographic Information Systems) and database communities. However, there is no a commonly agreed definition of what is a spatio-temporal data warehouse and what functionality such a data warehouse should support. Further, the solutions proposed in the literature vary considerably in the kind of data that can be represented as well as the kind of queries that can be expressed. In this paper we present a conceptual framework for defining spatio-temporal data warehouses using an extensible data type system. We also define a taxonomy of different classes of queries of increasing expressive power, and show how to express such queries using an extension of the tuple relational calculus with aggregated functions.
Spatial Data Services for Interdisciplinary Applications from the NASA Socioeconomic Data and Applications Center

NASA Astrophysics Data System (ADS)

Chen, R. S.; MacManus, K.; Vinay, S.; Yetman, G.

2016-12-01

The Socioeconomic Data and Applications Center (SEDAC), one of 12 Distributed Active Archive Centers (DAACs) in the NASA Earth Observing System Data and Information System (EOSDIS), has developed a variety of operational spatial data services aimed at providing online access, visualization, and analytic functions for geospatial socioeconomic and environmental data. These services include: open web services that implement Open Geospatial Consortium (OGC) specifications such as Web Map Service (WMS), Web Feature Service (WFS), and Web Coverage Service (WCS); spatial query services that support Web Processing Service (WPS) and Representation State Transfer (REST); and web map clients and a mobile app that utilize SEDAC and other open web services. These services may be accessed from a variety of external map clients and visualization tools such as NASA's WorldView, NOAA's Climate Explorer, and ArcGIS Online. More than 200 data layers related to population, settlements, infrastructure, agriculture, environmental pollution, land use, health, hazards, climate change and other aspects of sustainable development are available through WMS, WFS, and/or WCS. Version 2 of the SEDAC Population Estimation Service (PES) supports spatial queries through WPS and REST in the form of a user-defined polygon or circle. The PES returns an estimate of the population residing in the defined area for a specific year (2000, 2005, 2010, 2015, or 2020) based on SEDAC's Gridded Population of the World version 4 (GPWv4) dataset, together with measures of accuracy. The SEDAC Hazards Mapper and the recently released HazPop iOS mobile app enable users to easily submit spatial queries to the PES and see the results. SEDAC has developed an operational virtualized backend infrastructure to manage these services and support their continual improvement as standards change, new data and services become available, and user needs evolve. An ongoing challenge is to improve the reliability and performance of the infrastructure, in conjunction with external services, to meet both research and operational needs.
Integrating a local database into the StarView distributed user interface

NASA Technical Reports Server (NTRS)

Silberberg, D. P.

1992-01-01

A distributed user interface to the Space Telescope Data Archive and Distribution Service (DADS) known as StarView is being developed. The DADS architecture consists of the data archive as well as a relational database catalog describing the archive. StarView is a client/server system in which the user interface is the front-end client to the DADS catalog and archive servers. Users query the DADS catalog from the StarView interface. Query commands are transmitted via a network and evaluated by the database. The results are returned via the network and are displayed on StarView forms. Based on the results, users decide which data sets to retrieve from the DADS archive. Archive requests are packaged by StarView and sent to DADS, which returns the requested data sets to the users. The advantages of distributed client/server user interfaces over traditional one-machine systems are well known. Since users run software on machines separate from the database, the overall client response time is much faster. Also, since the server is free to process only database requests, the database response time is much faster. Disadvantages inherent in this architecture are slow overall database access time due to the network delays, lack of a 'get previous row' command, and that refinements of a previously issued query must be submitted to the database server, even though the domain of values have already been returned by the previous query. This architecture also does not allow users to cross correlate DADS catalog data with other catalogs. Clearly, a distributed user interface would be more powerful if it overcame these disadvantages. A local database is being integrated into StarView to overcome these disadvantages. When a query is made through a StarView form, which is often composed of fields from multiple tables, it is translated to an SQL query and issued to the DADS catalog. At the same time, a local database table is created to contain the resulting rows of the query. The returned rows are displayed on the form as well as inserted into the local database table. Identical results are produced by reissuing the query to either the DADS catalog or to the local table. Relational databases do not provide a 'get previous row' function because of the inherent complexity of retrieving previous rows of multiple-table joins. However, since this function is easily implemented on a single table, StarView uses the local table to retrieve the previous row. Also, StarView issues subsequent query refinements to the local table instead of the DADS catalog, eliminating the network transmission overhead. Finally, other catalogs can be imported into the local database for cross correlation with local tables. Overall, it is believe that this is a more powerful architecture for distributed, database user interfaces.
Modelling the spatial distribution of Fasciola hepatica in bovines using decision tree, logistic regression and GIS query approaches for Brazil.

PubMed

Bennema, S C; Molento, M B; Scholte, R G; Carvalho, O S; Pritsch, I

2017-11-01

Fascioliasis is a condition caused by the trematode Fasciola hepatica. In this paper, the spatial distribution of F. hepatica in bovines in Brazil was modelled using a decision tree approach and a logistic regression, combined with a geographic information system (GIS) query. In the decision tree and the logistic model, isothermality had the strongest influence on disease prevalence. Also, the 50-year average precipitation in the warmest quarter of the year was included as a risk factor, having a negative influence on the parasite prevalence. The risk maps developed using both techniques, showed a predicted higher prevalence mainly in the South of Brazil. The prediction performance seemed to be high, but both techniques failed to reach a high accuracy in predicting the medium and high prevalence classes to the entire country. The GIS query map, based on the range of isothermality, minimum temperature of coldest month, precipitation of warmest quarter of the year, altitude and the average dailyland surface temperature, showed a possibility of presence of F. hepatica in a very large area. The risk maps produced using these methods can be used to focus activities of animal and public health programmes, even on non-evaluated F. hepatica areas.
PlanetServer: Innovative approaches for the online analysis of hyperspectral satellite data from Mars

NASA Astrophysics Data System (ADS)

Oosthoek, J. H. P.; Flahaut, J.; Rossi, A. P.; Baumann, P.; Misev, D.; Campalani, P.; Unnithan, V.

2014-06-01

PlanetServer is a WebGIS system, currently under development, enabling the online analysis of Compact Reconnaissance Imaging Spectrometer (CRISM) hyperspectral data from Mars. It is part of the EarthServer project which builds infrastructure for online access and analysis of huge Earth Science datasets. Core functionality consists of the rasdaman Array Database Management System (DBMS) for storage, and the Open Geospatial Consortium (OGC) Web Coverage Processing Service (WCPS) for data querying. Various WCPS queries have been designed to access spatial and spectral subsets of the CRISM data. The client WebGIS, consisting mainly of the OpenLayers javascript library, uses these queries to enable online spatial and spectral analysis. Currently the PlanetServer demonstration consists of two CRISM Full Resolution Target (FRT) observations, surrounding the NASA Curiosity rover landing site. A detailed analysis of one of these observations is performed in the Case Study section. The current PlanetServer functionality is described step by step, and is tested by focusing on detecting mineralogical evidence described in earlier Gale crater studies. Both the PlanetServer methodology and its possible use for mineralogical studies will be further discussed. Future work includes batch ingestion of CRISM data and further development of the WebGIS and analysis tools.
TES/Aura L2 Ozone (O3) Limb V6 (TL2O3LS)

Atmospheric Science Data Center

2018-03-01

TES/Aura L2 Ozone (O3) Limb (TL2O3LS) News: TES News Join ... Project Title: TES Discipline: Tropospheric Composition Version: V6 Level: L2 Platform: TES/Aura L2 Ozone Spatial Coverage: 27 x 23 km Limb Spatial ...
TES/Aura L2 Ozone (O3) Limb V6 (TL2O3L)

Atmospheric Science Data Center

2018-03-01

TES/Aura L2 Ozone (O3) Limb (TL2O3L) News: TES News Join TES ... Project Title: TES Discipline: Tropospheric Composition Version: V6 Level: L2 Platform: TES/Aura L2 Ozone Spatial Coverage: 27 x 23 km Limb Spatial ...
Kathleen Krah | NREL

Science.gov Websites

tool validation. Previous to joining NREL, her Masters thesis focused on a spatial analysis of strategies. Her undergraduate thesis focused on techno-economic and logistical cost modeling of offshore wind
Architecture and prototypical implementation of a semantic querying system for big Earth observation image bases

PubMed Central

Tiede, Dirk; Baraldi, Andrea; Sudmanns, Martin; Belgiu, Mariana; Lang, Stefan

2017-01-01

ABSTRACT Spatiotemporal analytics of multi-source Earth observation (EO) big data is a pre-condition for semantic content-based image retrieval (SCBIR). As a proof of concept, an innovative EO semantic querying (EO-SQ) subsystem was designed and prototypically implemented in series with an EO image understanding (EO-IU) subsystem. The EO-IU subsystem is automatically generating ESA Level 2 products (scene classification map, up to basic land cover units) from optical satellite data. The EO-SQ subsystem comprises a graphical user interface (GUI) and an array database embedded in a client server model. In the array database, all EO images are stored as a space-time data cube together with their Level 2 products generated by the EO-IU subsystem. The GUI allows users to (a) develop a conceptual world model based on a graphically supported query pipeline as a combination of spatial and temporal operators and/or standard algorithms and (b) create, save and share within the client-server architecture complex semantic queries/decision rules, suitable for SCBIR and/or spatiotemporal EO image analytics, consistent with the conceptual world model. PMID:29098143
Content-based retrieval of historical Ottoman documents stored as textual images.

PubMed

Saykol, Ediz; Sinop, Ali Kemal; Güdükbay, Ugur; Ulusoy, Ozgür; Cetin, A Enis

2004-03-01

There is an accelerating demand to access the visual content of documents stored in historical and cultural archives. Availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. In this paper, a framework for content-based retrieval of historical documents in the Ottoman Empire archives is presented. The documents are stored as textual images, which are compressed by constructing a library of symbols occurring in a document, and the symbols in the original image are then replaced with pointers into the codebook to obtain a compressed representation of the image. The features in wavelet and spatial domain based on angular and distance span of shapes are used to extract the symbols. In order to make content-based retrieval in historical archives, a query is specified as a rectangular region in an input image and the same symbol-extraction process is applied to the query region. The queries are processed on the codebook of documents and the query images are identified in the resulting documents using the pointers in textual images. The querying process does not require decompression of images. The new content-based retrieval framework is also applicable to many other document archives using different scripts.
Random and Directed Walk-Based Top-k Queries in Wireless Sensor Networks

PubMed Central

Fu, Jun-Song; Liu, Yun

2015-01-01

In wireless sensor networks, filter-based top-k query approaches are the state-of-the-art solutions and have been extensively researched in the literature, however, they are very sensitive to the network parameters, including the size of the network, dynamics of the sensors’ readings and declines in the overall range of all the readings. In this work, a random walk-based top-k query approach called RWTQ and a directed walk-based top-k query approach called DWTQ are proposed. At the beginning of a top-k query, one or several tokens are sent to the specific node(s) in the network by the base station. Then, each token walks in the network independently to record and process the readings in a random or directed way. A strategy of choosing the “right” way in DWTQ is carefully designed for the token(s) to arrive at the high-value regions as soon as possible. When designing the walking strategy for DWTQ, the spatial correlations of the readings are also considered. Theoretical analysis and simulation results indicate that RWTQ and DWTQ both are very robust against these parameters discussed previously. In addition, DWTQ outperforms TAG, FILA and EXTOK in transmission cost, energy consumption and network lifetime. PMID:26016914
Flexible ordering of antibody class switch and V(D)J joining during B-cell ontogeny

PubMed Central

Kumar, Satyendra; Wuerffel, Robert; Achour, Ikbel; Lajoie, Bryan; Sen, Ranjan; Dekker, Job; Feeney, Ann J.; Kenter, Amy L.

2013-01-01

V(D)J joining is mediated by RAG recombinase during early B-lymphocyte development in the bone marrow (BM). Activation-induced deaminase initiates isotype switching in mature B cells of secondary lymphoid structures. Previous studies questioned the strict ontological partitioning of these processes. We show that pro-B cells undergo robust switching to a subset of immunoglobulin H (IgH) isotypes. Chromatin studies reveal that in pro-B cells, the spatial organization of the Igh locus may restrict switching to this subset of isotypes. We demonstrate that in the BM, V(D)J joining and switching are interchangeably inducible, providing an explanation for the hyper-IgE phenotype of Omenn syndrome. PMID:24240234
Joining up health and planning: how Joint Strategic Needs Assessment (JSNA) can inform health and wellbeing strategies and spatial planning.

PubMed

Tomlinson, Paul; Hewitt, Stephen; Blackshaw, Neil

2013-09-01

There has been a welcome joining up of the rhetoric around health, the environment and land use or spatial planning in both the English public health white paper and the National Planning Policy Framework. However, this paper highlights a real concern that this is not being followed through into practical guidance needed by local authorities (LAs), health bodies and developers about how to deliver this at the local level. The role of Joint Strategic Needs Assessments (JSNAs) and Health and Wellbeing Strategies (HWSs) have the potential to provide a strong basis for integrated local policies for health improvement, to address the wider determinants of health and to reduce inequities. However, the draft JSNA guidance from the Department of Health falls short of providing a robust, comprehensive and practical guide to meeting these very significant challenges. The paper identifies some examples of good practice. It recommends that action should be taken to raise the standards of all JSNAs to meet the new challenges and that HWSs should be aligned spatially and temporally with local plans and other LA strategies. HWSs should also identify spatially targeted interventions that can be delivered through spatial planning or transport planning. Steps need to be taken to ensure that district councils are brought into the process.
A novel visualization model for web search results.

PubMed

Nguyen, Tien N; Zhang, Jin

2006-01-01

This paper presents an interactive visualization system, named WebSearchViz, for visualizing the Web search results and acilitating users' navigation and exploration. The metaphor in our model is the solar system with its planets and asteroids revolving around the sun. Location, color, movement, and spatial distance of objects in the visual space are used to represent the semantic relationships between a query and relevant Web pages. Especially, the movement of objects and their speeds add a new dimension to the visual space, illustrating the degree of relevance among a query and Web search results in the context of users' subjects of interest. By interacting with the visual space, users are able to observe the semantic relevance between a query and a resulting Web page with respect to their subjects of interest, context information, or concern. Users' subjects of interest can be dynamically changed, redefined, added, or deleted from the visual space.
Clinic expert information extraction based on domain model and block importance model.

PubMed

Zhang, Yuanpeng; Wang, Li; Qian, Danmin; Geng, Xingyun; Yao, Dengfu; Dong, Jiancheng

2015-11-01

To extract expert clinic information from the Deep Web, there are two challenges to face. The first one is to make a judgment on forms. A novel method based on a domain model, which is a tree structure constructed by the attributes of query interfaces is proposed. With this model, query interfaces can be classified to a domain and filled in with domain keywords. Another challenge is to extract information from response Web pages indexed by query interfaces. To filter the noisy information on a Web page, a block importance model is proposed, both content and spatial features are taken into account in this model. The experimental results indicate that the domain model yields a precision 4.89% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method. Copyright © 2015 Elsevier Ltd. All rights reserved.

A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

PubMed

Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong

2012-01-01

Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.
Comparing the performance of two CBIRS indexing schemes

NASA Astrophysics Data System (ADS)

Mueller, Wolfgang; Robbert, Guenter; Henrich, Andreas

2003-01-01

Content based image retrieval (CBIR) as it is known today has to deal with a number of challenges. Quickly summarized, the main challenges are firstly, to bridge the semantic gap between high-level concepts and low-level features using feedback, secondly to provide performance under adverse conditions. High-dimensional spaces, as well as a demanding machine learning task make the right way of indexing an important issue. When indexing multimedia data, most groups opt for extraction of high-dimensional feature vectors from the data, followed by dimensionality reduction like PCA (Principal Components Analysis) or LSI (Latent Semantic Indexing). The resulting vectors are indexed using spatial indexing structures such as kd-trees or R-trees, for example. Other projects, such as MARS and Viper propose the adaptation of text indexing techniques, notably the inverted file. Here, the Viper system is the most direct adaptation of text retrieval techniques to quantized vectors. However, while the Viper query engine provides decent performance together with impressive user-feedback behavior, as well as the possibility for easy integration of long-term learning algorithms, and support for potentially infinite feature vectors, there has been no comparison of vector-based methods and inverted-file-based methods under similar conditions. In this publication, we compare a CBIR query engine that uses inverted files (Bothrops, a rewrite of the Viper query engine based on a relational database), and a CBIR query engine based on LSD (Local Split Decision) trees for spatial indexing using the same feature sets. The Benchathlon initiative works on providing a set of images and ground truth for simulating image queries by example and corresponding user feedback. When performing the Benchathlon benchmark on a CBIR system (the System Under Test, SUT), a benchmarking harness connects over internet to the SUT, performing a number of queries using an agreed-upon protocol, the multimedia retrieval markup language (MRML). Using this benchmark one can measure the quality of retrieval, as well as the overall (speed) performance of the benchmarked system. Our Benchmarks will draw on the Benchathlon"s work for documenting the retrieval performance of both inverted file-based and LSD tree based techniques. However in addition to these results, we will present statistics, that can be obtained only inside the system under test. These statistics will include the number of complex mathematical operations, as well as the amount of data that has to be read from disk during operation of a query.
Data Container Study for Handling array-based data using Hive, Spark, MongoDB, SciDB and Rasdaman

NASA Astrophysics Data System (ADS)

Xu, M.; Hu, F.; Yang, J.; Yu, M.; Yang, C. P.

2017-12-01

Geoscience communities have come up with various big data storage solutions, such as Rasdaman and Hive, to address the grand challenges for massive Earth observation data management and processing. To examine the readiness of current solutions in supporting big Earth observation, we propose to investigate and compare four popular data container solutions, including Rasdaman, Hive, Spark, SciDB and MongoDB. Using different types of spatial and non-spatial queries, datasets stored in common scientific data formats (e.g., NetCDF and HDF), and two applications (i.e. dust storm simulation data mining and MERRA data analytics), we systematically compare and evaluate the feature and performance of these four data containers in terms of data discover and access. The computing resources (e.g. CPU, memory, hard drive, network) consumed while performing various queries and operations are monitored and recorded for the performance evaluation. The initial results show that 1) the popular data container clusters are able to handle large volume of data, but their performances vary in different situations. Meanwhile, there is a trade-off between data preprocessing, disk saving, query-time saving, and resource consuming. 2) ClimateSpark, MongoDB and SciDB perform the best among all the containers in all the queries tests, and Hive performs the worst. 3) These studied data containers can be applied on other array-based datasets, such as high resolution remote sensing data and model simulation data. 4) Rasdaman clustering configuration is more complex than the others. A comprehensive report will detail the experimental results, and compare their pros and cons regarding system performance, ease of use, accessibility, scalability, compatibility, and flexibility.
The Grid File: A Data Structure Designed to Support Proximity Queries on Spatial Objects.

DTIC Science & Technology

1983-06-01

dimensional space. The technique to be presented for storing spatial objects works for any choice of parameters by which * simple objects can be represented...However, depending on characteristics of the data to be processed , some choices of parameters are better than others. Let us discuss some...considerations that may determine the choice of parameters. 1) istinction between lmaerba peuwuers ad extensiem prwuneert For some clasm of simple objects It
TOWARD AN ACCURATE ANALYSIS OF RANGE QUERIES ON SPATIAL DATA. (R825195)

EPA Science Inventory

The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Concl...
Research on spatio-temporal database techniques for spatial information service

NASA Astrophysics Data System (ADS)

Zhao, Rong; Wang, Liang; Li, Yuxiang; Fan, Rongshuang; Liu, Ping; Li, Qingyuan

2007-06-01

Geographic data should be described by spatial, temporal and attribute components, but the spatio-temporal queries are difficult to be answered within current GIS. This paper describes research into the development and application of spatio-temporal data management system based upon GeoWindows GIS software platform which was developed by Chinese Academy of Surveying and Mapping (CASM). Faced the current and practical requirements of spatial information application, and based on existing GIS platform, one kind of spatio-temporal data model which integrates vector and grid data together was established firstly. Secondly, we solved out the key technique of building temporal data topology, successfully developed a suit of spatio-temporal database management system adopting object-oriented methods. The system provides the temporal data collection, data storage, data management and data display and query functions. Finally, as a case study, we explored the application of spatio-temporal data management system with the administrative region data of multi-history periods of China as the basic data. With all the efforts above, the GIS capacity of management and manipulation in aspect of time and attribute of GIS has been enhanced, and technical reference has been provided for the further development of temporal geographic information system (TGIS).
Archival Research Capabilities of the WFIRST Data Set

NASA Astrophysics Data System (ADS)

Szalay, Alexander

WFIRST's unique combination of a large (~0.3 deg2) field of view and HST-like angular resolution and sensitivity in the near infrared will produce spectacular new insights into the origins of stars, galaxies, and structure in the cosmos. We propose a WFIRST Archive Science Investigation Team (SIT-F) to define an archival, query, and analysis system that will enable scientific discovery in all relevant areas of astrophysics and maximize the overall scientific yield of the mission. Guest investigators (GIs), guest observers (GOs), the WFIRST SIT's, WFIRST Science Center(s), and astronomers using data from other surveys will all benefit from the extensive, easy, fast and reliable use of the WFIRST archives. We propose to develop the science requirements for the archive and work to understand its interactions with other elements of the WFIRST mission. To accomplish this, we will conduct case studies to derive performance requirements for the WFIRST archives. These will clarify what is needed for GIs to make important scientific discoveries across a broad range of astrophysics. While other SITs will primarily address the science capabilities of the WFIRST instruments, we will look ahead to the science enabling capabilities of the WFIRST archives. We will demonstrate how the archive can be optimized to take advantage of the extraordinary science capabilities of the WFIRST instruments as well as major space and ground observatories to maximize the science return of the mission. We will use the "20 queries" methodology, formulated by Jim Gray, to cover the most important science analysis patterns and use these to establish the performance required of the WFIRST archive. The case studies will be centered on studying galaxy evolution as a function of cosmic time, environment and intrinsic properties. The analyses will require massive angular and spatial cross correlations between key galaxy properties to search for new fundamental scaling relations that may only become apparent when exploring a database of 108 galaxies with multiband photometry and grism spectroscopy. The case studies will require (i) the creation of a unified WFIRST object catalog consisting of data cross-matched to external catalogs, (ii) an easy-to-access, scalable database, utilizing the latest data discovery and querying techniques, (iii) in situ analyses of large and/or complex data, (iv) identification of links to supporting data and enabling queries spanning WFIRST and other databases, (v) combining simulations with modeling software. To accomplish these objectives, we will prototype a system capable of executing complex user-defined scripts including database access to a shared computational facility with tools for joining WFIRST to other surveys, also enabling comparisons to physical models. Our organizational plan divides the work into several general areas where our team members have specific expertise: (a) apply the 20 queries methodology to derive performance and functionality requirements, (b) develop a practical interactive server-side query system, built on our SDSS experience, (c) apply advanced cross-matching techniques, (d) create mock WFIRST imaging and grism data, (e) develop high level cross correlation tools, (e) optimize scripting systems using high-level languages (iPython), (f) perform close integration of cosmological simulations with observational data, (g) apply advanced machine learning techniques. Our efforts will be coordinated with the WFIRST Science Center (WSC), the other SITs, and the broader community in a manner consistent with direction and review of the Project Office. We will publish our results as milestones are reached, and issue progress reports on a regular basis. We will represent SIT-F at all relevant meetings including meetings of the other SITs (SITs A-E), and participate in "Big Data" conferences to interact with others in the field and learn new techniques that might be applicable to WFIRST.
Nosql for Storage and Retrieval of Large LIDAR Data Collections

NASA Astrophysics Data System (ADS)

Boehm, J.; Liu, K.

2015-08-01

Developments in LiDAR technology over the past decades have made LiDAR to become a mature and widely accepted source of geospatial information. This in turn has led to an enormous growth in data volume. The central idea for a file-centric storage of LiDAR point clouds is the observation that large collections of LiDAR data are typically delivered as large collections of files, rather than single files of terabyte size. This split of the dataset, commonly referred to as tiling, was usually done to accommodate a specific processing pipeline. It makes therefore sense to preserve this split. A document oriented NoSQL database can easily emulate this data partitioning, by representing each tile (file) in a separate document. The document stores the metadata of the tile. The actual files are stored in a distributed file system emulated by the NoSQL database. We demonstrate the use of MongoDB a highly scalable document oriented NoSQL database for storing large LiDAR files. MongoDB like any NoSQL database allows for queries on the attributes of the document. As a specialty MongoDB also allows spatial queries. Hence we can perform spatial queries on the bounding boxes of the LiDAR tiles. Inserting and retrieving files on a cloud-based database is compared to native file system and cloud storage transfer speed.
Astronomical Data Processing Using SciQL, an SQL Based Query Language for Array Data

NASA Astrophysics Data System (ADS)

Zhang, Y.; Scheers, B.; Kersten, M.; Ivanova, M.; Nes, N.

2012-09-01

SciQL (pronounced as ‘cycle’) is a novel SQL-based array query language for scientific applications with both tables and arrays as first class citizens. SciQL lowers the entrance fee of adopting relational DBMS (RDBMS) in scientific domains, because it includes functionality often only found in mathematics software packages. In this paper, we demonstrate the usefulness of SciQL for astronomical data processing using examples from the Transient Key Project of the LOFAR radio telescope. In particular, how the LOFAR light-curve database of all detected sources can be constructed, by correlating sources across the spatial, frequency, time and polarisation domains.
Ibmdbpy-spatial : An Open-source implementation of in-database geospatial analytics in Python

NASA Astrophysics Data System (ADS)

Roy, Avipsa; Fouché, Edouard; Rodriguez Morales, Rafael; Moehler, Gregor

2017-04-01

As the amount of spatial data acquired from several geodetic sources has grown over the years and as data infrastructure has become more powerful, the need for adoption of in-database analytic technology within geosciences has grown rapidly. In-database analytics on spatial data stored in a traditional enterprise data warehouse enables much faster retrieval and analysis for making better predictions about risks and opportunities, identifying trends and spot anomalies. Although there are a number of open-source spatial analysis libraries like geopandas and shapely available today, most of them have been restricted to manipulation and analysis of geometric objects with a dependency on GEOS and similar libraries. We present an open-source software package, written in Python, to fill the gap between spatial analysis and in-database analytics. Ibmdbpy-spatial provides a geospatial extension to the ibmdbpy package, implemented in 2015. It provides an interface for spatial data manipulation and access to in-database algorithms in IBM dashDB, a data warehouse platform with a spatial extender that runs as a service on IBM's cloud platform called Bluemix. Working in-database reduces the network overload, as the complete data need not be replicated into the user's local system altogether and only a subset of the entire dataset can be fetched into memory in a single instance. Ibmdbpy-spatial accelerates Python analytics by seamlessly pushing operations written in Python into the underlying database for execution using the dashDB spatial extender, thereby benefiting from in-database performance-enhancing features, such as columnar storage and parallel processing. The package is currently supported on Python versions from 2.7 up to 3.4. The basic architecture of the package consists of three main components - 1) a connection to the dashDB represented by the instance IdaDataBase, which uses a middleware API namely - pypyodbc or jaydebeapi to establish the database connection via ODBC or JDBC respectively, 2) an instance to represent the spatial data stored in the database as a dataframe in Python, called the IdaGeoDataFrame, with a specific geometry attribute which recognises a planar geometry column in dashDB and 3) Python wrappers for spatial functions like within, distance, area, buffer} and more which dashDB currently supports to make the querying process from Python much simpler for the users. The spatial functions translate well-known geopandas-like syntax into SQL queries utilising the database connection to perform spatial operations in-database and can operate on single geometries as well two different geometries from different IdaGeoDataFrames. The in-database queries strictly follow the standards of OpenGIS Implementation Specification for Geographic information - Simple feature access for SQL. The results of the operations obtained can thereby be accessed dynamically via interactive Jupyter notebooks from any system which supports Python, without any additional dependencies and can also be combined with other open source libraries such as matplotlib and folium in-built within Jupyter notebooks for visualization purposes. We built a use case to analyse crime hotspots in New York city to validate our implementation and visualized the results as a choropleth map for each borough.
A Study of the Efficiency of Spatial Indexing Methods Applied to Large Astronomical Databases

NASA Astrophysics Data System (ADS)

Donaldson, Tom; Berriman, G. Bruce; Good, John; Shiao, Bernie

2018-01-01

Spatial indexing of astronomical databases generally uses quadrature methods, which partition the sky into cells used to create an index (usually a B-tree) written as database column. We report the results of a study to compare the performance of two common indexing methods, HTM and HEALPix, on Solaris and Windows database servers installed with a PostgreSQL database, and a Windows Server installed with MS SQL Server. The indexing was applied to the 2MASS All-Sky Catalog and to the Hubble Source catalog. On each server, the study compared indexing performance by submitting 1 million queries at each index level with random sky positions and random cone search radius, which was computed on a logarithmic scale between 1 arcsec and 1 degree, and measuring the time to complete the query and write the output. These simulated queries, intended to model realistic use patterns, were run in a uniform way on many combinations of indexing method and indexing level. The query times in all simulations are strongly I/O-bound and are linear with number of records returned for large numbers of sources. There are, however, considerable differences between simulations, which reveal that hardware I/O throughput is a more important factor in managing the performance of a DBMS than the choice of indexing scheme. The choice of index itself is relatively unimportant: for comparable index levels, the performance is consistent within the scatter of the timings. At small index levels (large cells; e.g. level 4; cell size 3.7 deg), there is large scatter in the timings because of wide variations in the number of sources found in the cells. At larger index levels, performance improves and scatter decreases, but the improvement at level 8 (14 min) and higher is masked to some extent in the timing scatter caused by the range of query sizes. At very high levels (20; 0.0004 arsec), the granularity of the cells becomes so high that a large number of extraneous empty cells begin to degrade performance. Thus, for the use patterns studied here the database performance is not critically dependent on the exact choices of index or level.
Geologic map of the Patagonia Mountains, Santa Cruz County, Arizona

USGS Publications Warehouse

Graybeal, Frederick T.; Moyer, Lorre A.; Vikre, Peter; Dunlap, Pamela; Wallis, John C.

2015-01-01

Several spatial databases provide data for the geologic map of the Patagonia Mountains in Arizona. The data can be viewed and queried in ArcGIS 10, a geographic information system; a geologic map is also available in PDF format. All products are available online only.
Multi-Mission Geographic Information System for Science Operations: A Test Case Using MSL Data

NASA Astrophysics Data System (ADS)

Calef, F. J.; Abarca, H. E.; Soliman, T.; Abercrombie, S. P.; Powell, M. W.

2017-06-01

The Multi-Mission Geographic Information System (MMGIS) is a NASA AMMOS project in its second year of development, built to display and query science products in a spatial context. We present our progress building this tool using MSL in situ data.
Unleashing spatially distributed ecohydrology modeling using Big Data tools

NASA Astrophysics Data System (ADS)

Miles, B.; Idaszak, R.

2015-12-01

Physically based spatially distributed ecohydrology models are useful for answering science and management questions related to the hydrology and biogeochemistry of prairie, savanna, forested, as well as urbanized ecosystems. However, these models can produce hundreds of gigabytes of spatial output for a single model run over decadal time scales when run at regional spatial scales and moderate spatial resolutions (~100-km2+ at 30-m spatial resolution) or when run for small watersheds at high spatial resolutions (~1-km2 at 3-m spatial resolution). Numerical data formats such as HDF5 can store arbitrarily large datasets. However even in HPC environments, there are practical limits on the size of single files that can be stored and reliably backed up. Even when such large datasets can be stored, querying and analyzing these data can suffer from poor performance due to memory limitations and I/O bottlenecks, for example on single workstations where memory and bandwidth are limited, or in HPC environments where data are stored separately from computational nodes. The difficulty of storing and analyzing spatial data from ecohydrology models limits our ability to harness these powerful tools. Big Data tools such as distributed databases have the potential to surmount the data storage and analysis challenges inherent to large spatial datasets. Distributed databases solve these problems by storing data close to computational nodes while enabling horizontal scalability and fault tolerance. Here we present the architecture of and preliminary results from PatchDB, a distributed datastore for managing spatial output from the Regional Hydro-Ecological Simulation System (RHESSys). The initial version of PatchDB uses message queueing to asynchronously write RHESSys model output to an Apache Cassandra cluster. Once stored in the cluster, these data can be efficiently queried to quickly produce both spatial visualizations for a particular variable (e.g. maps and animations), as well as point time series of arbitrary variables at arbitrary points in space within a watershed or river basin. By treating ecohydrology modeling as a Big Data problem, we hope to provide a platform for answering transformative science and management questions related to water quantity and quality in a world of non-stationary climate.
GeoNetwork powered GI-cat: a geoportal hybrid solution

NASA Astrophysics Data System (ADS)

Baldini, Alessio; Boldrini, Enrico; Santoro, Mattia; Mazzetti, Paolo

2010-05-01

To the aim of setting up a Spatial Data Infrastructures (SDI) the creation of a system for the metadata management and discovery plays a fundamental role. An effective solution is the use of a geoportal (e.g. FAO/ESA geoportal), that has the important benefit of being accessible from a web browser. With this work we present a solution based integrating two of the available frameworks: GeoNetwork and GI-cat. GeoNetwork is an opensource software designed to improve accessibility of a wide variety of data together with the associated ancillary information (metadata), at different scale and from multidisciplinary sources; data are organized and documented in a standard and consistent way. GeoNetwork implements both the Portal and Catalog components of a Spatial Data Infrastructure (SDI) defined in the OGC Reference Architecture. It provides tools for managing and publishing metadata on spatial data and related services. GeoNetwork allows harvesting of various types of web data sources e.g. OGC Web Services (e.g. CSW, WCS, WMS). GI-cat is a distributed catalog based on a service-oriented framework of modular components and can be customized and tailored to support different deployment scenarios. It can federate a multiplicity of catalogs services, as well as inventory and access services in order to discover and access heterogeneous ESS resources. The federated resources are exposed by GI-cat through several standard catalog interfaces (e.g. OGC CSW AP ISO, OpenSearch, etc.) and by the GI-cat extended interface. Specific components implement mediation services for interfacing heterogeneous service providers, each of which exposes a specific standard specification; such components are called Accessors. These mediating components solve providers data modelmultiplicity by mapping them onto the GI-cat internal data model which implements the ISO 19115 Core profile. Accessors also implement the query protocol mapping; first they translate the query requests expressed according to the interface protocols exposed by GI-cat into the multiple query dialects spoken by the resource service providers. Currently, a number of well-accepted catalog and inventory services are supported, including several OGC Web Services, THREDDS Data Server, SeaDataNet Common Data Index, GBIF and OpenSearch engines. A GeoNetwork powered GI-cat has been developed in order to exploit the best of the two frameworks. The new system uses a modified version of GeoNetwork web interface in order to add the capability of querying also the specified GI-cat catalog and not only the GeoNetwork internal database. The resulting system consists in a geoportal in which GI-cat plays the role of the search engine. This new system allows to distribute the query on the different types of data sources linked to a GI-cat. The metadata results of the query are then visualized by the Geonetwork web interface. This configuration was experimented in the framework of GIIDA, a project of the Italian National Research Council (CNR) focused on data accessibility and interoperability. A second advantage of this solution is achieved setting up a GeoNetwork catalog amongst the accessors of the GI-cat instance. Such a configuration will allow in turn GI-cat to run the query against the internal GeoNetwork database. This allows to have both the harvesting and the metadata editor functionalities provided by GeoNetwork and the distributed search functionality of GI-cat available in a consistent way through the same web interface.
Mof-Tree: A Spatial Access Method To Manipulate Multiple Overlapping Features.

ERIC Educational Resources Information Center

Manolopoulos, Yannis; Nardelli, Enrico; Papadopoulos, Apostolos; Proietti, Guido

1997-01-01

Investigates the manipulation of large sets of two-dimensional data representing multiple overlapping features, and presents a new access method, the MOF-tree. Analyzes storage requirements and time with respect to window query operations involving multiple features. Examines both the pointer-based and pointerless MOF-tree representations.…
Providing interoperability of eHealth communities through peer-to-peer networks.

PubMed

Kilic, Ozgur; Dogac, Asuman; Eichelberg, Marco

2010-05-01

Providing an interoperability infrastructure for Electronic Healthcare Records (EHRs) is on the agenda of many national and regional eHealth initiatives. Two important integration profiles have been specified for this purpose, namely, the "Integrating the Healthcare Enterprise (IHE) Cross-enterprise Document Sharing (XDS)" and the "IHE Cross Community Access (XCA)." IHE XDS describes how to share EHRs in a community of healthcare enterprises and IHE XCA describes how EHRs are shared across communities. However, the current version of the IHE XCA integration profile does not address some of the important challenges of cross-community exchange environments. The first challenge is scalability. If every community that joins the network needs to connect to every other community, i.e., a pure peer-to-peer network, this solution will not scale. Furthermore, each community may use a different coding vocabulary for the same metadata attribute, in which case, the target community cannot interpret the query involving such an attribute. Yet another important challenge is that each community may (and typically will) have a different patient identifier domain. Querying for the patient identifiers in the target community using patient demographic data may create patient privacy concerns. In this paper, we address each of these challenges and show how they can be handled effectively in a superpeer-based peer-to-peer architecture.
Graph Databases for Large-Scale Healthcare Systems: A Framework for Efficient Data Management and Data Services

DOE Office of Scientific and Technical Information (OSTI.GOV)

Park, Yubin; Shankar, Mallikarjun; Park, Byung H.

Designing a database system for both efficient data management and data services has been one of the enduring challenges in the healthcare domain. In many healthcare systems, data services and data management are often viewed as two orthogonal tasks; data services refer to retrieval and analytic queries such as search, joins, statistical data extraction, and simple data mining algorithms, while data management refers to building error-tolerant and non-redundant database systems. The gap between service and management has resulted in rigid database systems and schemas that do not support effective analytics. We compose a rich graph structure from an abstracted healthcaremore » RDBMS to illustrate how we can fill this gap in practice. We show how a healthcare graph can be automatically constructed from a normalized relational database using the proposed 3NF Equivalent Graph (3EG) transformation.We discuss a set of real world graph queries such as finding self-referrals, shared providers, and collaborative filtering, and evaluate their performance over a relational database and its 3EG-transformed graph. Experimental results show that the graph representation serves as multiple de-normalized tables, thus reducing complexity in a database and enhancing data accessibility of users. Based on this finding, we propose an ensemble framework of databases for healthcare applications.« less
Can internet search queries be used for dengue fever surveillance in China?

PubMed

Guo, Pi; Wang, Li; Zhang, Yanhong; Luo, Ganfeng; Zhang, Yanting; Deng, Changyu; Zhang, Qin; Zhang, Qingying

2017-10-01

China experienced an unprecedented outbreak of dengue fever in 2014, and the number of cases reached the highest level over the past 25 years. Traditional sentinel surveillance systems of dengue fever in China have an obvious drawback that the average delay from receipt to dissemination of dengue case data is roughly 1-2 weeks. In order to exploit internet search queries to timely monitor dengue fever, we analyzed data of dengue incidence and Baidu search query from 31 provinces in mainland China during the period of January 2011 to December 2014. We found that there was a strong correlation between changes in people's online health-seeking behavior and dengue fever incidence. Our study represents the first attempt demonstrating a strong temporal and spatial correlation between internet search trends and dengue epidemics nationwide in China. The findings will help the government to strengthen the capacity of traditional surveillance systems for dengue fever. Copyright © 2017 The Author(s). Published by Elsevier Ltd.. All rights reserved.
AQUAdexIM: highly efficient in-memory indexing and querying of astronomy time series images

NASA Astrophysics Data System (ADS)

Hong, Zhi; Yu, Ce; Wang, Jie; Xiao, Jian; Cui, Chenzhou; Sun, Jizhou

2016-12-01

Astronomy has always been, and will continue to be, a data-based science, and astronomers nowadays are faced with increasingly massive datasets, one key problem of which is to efficiently retrieve the desired cup of data from the ocean. AQUAdexIM, an innovative spatial indexing and querying method, performs highly efficient on-the-fly queries under users' request to search for Time Series Images from existing observation data on the server side and only return the desired FITS images to users, so users no longer need to download entire datasets to their local machines, which will only become more and more impractical as the data size keeps increasing. Moreover, AQUAdexIM manages to keep a very low storage space overhead and its specially designed in-memory index structure enables it to search for Time Series Images of a given area of the sky 10 times faster than using Redis, a state-of-the-art in-memory database.

Data Container Study for Handling Array-based Data Using Rasdaman, Hive, Spark, and MongoDB

NASA Astrophysics Data System (ADS)

Xu, M.; Hu, F.; Yu, M.; Scheele, C.; Liu, K.; Huang, Q.; Yang, C. P.; Little, M. M.

2016-12-01

Geoscience communities have come up with various big data storage solutions, such as Rasdaman and Hive, to address the grand challenges for massive Earth observation data management and processing. To examine the readiness of current solutions in supporting big Earth observation, we propose to investigate and compare four popular data container solutions, including Rasdaman, Hive, Spark, and MongoDB. Using different types of spatial and non-spatial queries, datasets stored in common scientific data formats (e.g., NetCDF and HDF), and two applications (i.e. dust storm simulation data mining and MERRA data analytics), we systematically compare and evaluate the feature and performance of these four data containers in terms of data discover and access. The computing resources (e.g. CPU, memory, hard drive, network) consumed while performing various queries and operations are monitored and recorded for the performance evaluation. The initial results show that 1) Rasdaman has the best performance for queries on statistical and operational functions, and supports NetCDF data format better than HDF; 2) Rasdaman clustering configuration is more complex than the others; 3) Hive performs better on single pixel extraction from multiple images; and 4) Except for the single pixel extractions, Spark performs better than Hive and its performance is close to Rasdaman. A comprehensive report will detail the experimental results, and compare their pros and cons regarding system performance, ease of use, accessibility, scalability, compatibility, and flexibility.
Finding the Patient's Voice Using Big Data: Analysis of Users' Health-Related Concerns in the ChaCha Question-and-Answer Service (2009-2012).

PubMed

Priest, Chad; Knopf, Amelia; Groves, Doyle; Carpenter, Janet S; Furrey, Christopher; Krishnan, Anand; Miller, Wendy R; Otte, Julie L; Palakal, Mathew; Wiehe, Sarah; Wilson, Jeffrey

2016-03-09

The development of effective health care and public health interventions requires a comprehensive understanding of the perceptions, concerns, and stated needs of health care consumers and the public at large. Big datasets from social media and question-and-answer services provide insight into the public's health concerns and priorities without the financial, temporal, and spatial encumbrances of more traditional community-engagement methods and may prove a useful starting point for public-engagement health research (infodemiology). The objective of our study was to describe user characteristics and health-related queries of the ChaCha question-and-answer platform, and discuss how these data may be used to better understand the perceptions, concerns, and stated needs of health care consumers and the public at large. We conducted a retrospective automated textual analysis of anonymous user-generated queries submitted to ChaCha between January 2009 and November 2012. A total of 2.004 billion queries were read, of which 3.50% (70,083,796/2,004,243,249) were missing 1 or more data fields, leaving 1.934 billion complete lines of data for these analyses. Males and females submitted roughly equal numbers of health queries, but content differed by sex. Questions from females predominantly focused on pregnancy, menstruation, and vaginal health. Questions from males predominantly focused on body image, drug use, and sexuality. Adolescents aged 12-19 years submitted more queries than any other age group. Their queries were largely centered on sexual and reproductive health, and pregnancy in particular. The private nature of the ChaCha service provided a perfect environment for maximum frankness among users, especially among adolescents posing sensitive health questions. Adolescents' sexual health queries reveal knowledge gaps with serious, lifelong consequences. The nature of questions to the service provides opportunities for rapid understanding of health concerns and may lead to development of more effective tailored interventions.
A fast and robust bulk-loading algorithm for indexing very large digital elevation datasets II. Experimental results

NASA Astrophysics Data System (ADS)

Rodríguez, Félix R.; Barrena, Manuel

2011-07-01

The spatial indexing of eventually all the available topographic information of Earth is a highly valuable tool for different geoscientific application domains. The Shuttle Radar Topography Mission (SRTM) collected and made available to the public one of the world's largest digital elevation models (DEMs). With the aim of providing on easier and faster access to these data by improving their further analysis and processing, we have indexed the SRTM DEM by means of a spatial index based on the kd-tree data structure, called the Q-tree. This paper is the second in a two-part series that includes a thorough performance analysis to validate the bulk-load algorithm efficiency of the Q-tree. We investigate performance measuring elapsed time in different contexts, analyzing disk space usage, testing response time with typical queries, and validating the final index structure balance. In addition, the paper includes performance comparisons with Oracle 11g that helps to understand the real cost of our proposal. Our tests prove that the proposed algorithm outperforms Oracle 11g using around a 9% of the elapsed time, taking six times less storage with more than 96% of page utilization, and getting faster response times to spatial queries issued on 4.5 million points. In addition to this, the behavior of the spatial index has been successfully tested on both an open GIS (VT Builder) and a visualizer tool derived from the previous one.
Spatial Processes in Linear Ordering

ERIC Educational Resources Information Center

von Hecker, Ulrich; Klauer, Karl Christoph; Wolf, Lukas; Fazilat-Pour, Masoud

2016-01-01

Memory performance in linear order reasoning tasks (A > B, B > C, C > D, etc.) shows quicker, and more accurate responses to queries on wider (AD) than narrower (AB) pairs on a hypothetical linear mental model (A -- B -- C -- D). While indicative of an analogue representation, research so far did not provide positive evidence for spatial…
Computing quality scores and uncertainty for approximate pattern matching in geospatial semantic graphs

DOE PAGES

Stracuzzi, David John; Brost, Randolph C.; Phillips, Cynthia A.; ...

2015-09-26

Geospatial semantic graphs provide a robust foundation for representing and analyzing remote sensor data. In particular, they support a variety of pattern search operations that capture the spatial and temporal relationships among the objects and events in the data. However, in the presence of large data corpora, even a carefully constructed search query may return a large number of unintended matches. This work considers the problem of calculating a quality score for each match to the query, given that the underlying data are uncertain. As a result, we present a preliminary evaluation of three methods for determining both match qualitymore » scores and associated uncertainty bounds, illustrated in the context of an example based on overhead imagery data.« less
What are we ‘tweeting’ about obesity? Mapping tweets with Topic Modeling and Geographic Information System

PubMed Central

Ghosh, Debarchana (Debs); Guha, Rajarshi

2014-01-01

Public health related tweets are difficult to identify in large conversational datasets like Twitter.com. Even more challenging is the visualization and analyses of the spatial patterns encoded in tweets. This study has the following objectives: How can topic modeling be used to identify relevant public health topics such as obesity on Twitter.com? What are the common obesity related themes? What is the spatial pattern of the themes? What are the research challenges of using large conversational datasets from social networking sites? Obesity is chosen as a test theme to demonstrate the effectiveness of topic modeling using Latent Dirichlet Allocation (LDA) and spatial analysis using Geographic Information System (GIS). The dataset is constructed from tweets (originating from the United States) extracted from Twitter.com on obesity-related queries. Examples of such queries are ‘food deserts’, ‘fast food’, and ‘childhood obesity’. The tweets are also georeferenced and time stamped. Three cohesive and meaningful themes such as ‘childhood obesity and schools’, ‘obesity prevention’, and ‘obesity and food habits’ are extracted from the LDA model. The GIS analysis of the extracted themes show distinct spatial pattern between rural and urban areas, northern and southern states, and between coasts and inland states. Further, relating the themes with ancillary datasets such as US census and locations of fast food restaurants based upon the location of the tweets in a GIS environment opened new avenues for spatial analyses and mapping. Therefore the techniques used in this study provide a possible toolset for computational social scientists in general and health researchers in specific to better understand health problems from large conversational datasets. PMID:25126022
What are we 'tweeting' about obesity? Mapping tweets with Topic Modeling and Geographic Information System.

PubMed

Ghosh, Debarchana Debs; Guha, Rajarshi

2013-01-01

Public health related tweets are difficult to identify in large conversational datasets like Twitter.com. Even more challenging is the visualization and analyses of the spatial patterns encoded in tweets. This study has the following objectives: How can topic modeling be used to identify relevant public health topics such as obesity on Twitter.com? What are the common obesity related themes? What is the spatial pattern of the themes? What are the research challenges of using large conversational datasets from social networking sites? Obesity is chosen as a test theme to demonstrate the effectiveness of topic modeling using Latent Dirichlet Allocation (LDA) and spatial analysis using Geographic Information System (GIS). The dataset is constructed from tweets (originating from the United States) extracted from Twitter.com on obesity-related queries. Examples of such queries are 'food deserts', 'fast food', and 'childhood obesity'. The tweets are also georeferenced and time stamped. Three cohesive and meaningful themes such as 'childhood obesity and schools', 'obesity prevention', and 'obesity and food habits' are extracted from the LDA model. The GIS analysis of the extracted themes show distinct spatial pattern between rural and urban areas, northern and southern states, and between coasts and inland states. Further, relating the themes with ancillary datasets such as US census and locations of fast food restaurants based upon the location of the tweets in a GIS environment opened new avenues for spatial analyses and mapping. Therefore the techniques used in this study provide a possible toolset for computational social scientists in general and health researchers in specific to better understand health problems from large conversational datasets.
Identifying spatially similar gene expression patterns in early stage fruit fly embryo images: binary feature versus invariant moment digital representations

PubMed Central

Gurunathan, Rajalakshmi; Van Emden, Bernard; Panchanathan, Sethuraman; Kumar, Sudhir

2004-01-01

Background Modern developmental biology relies heavily on the analysis of embryonic gene expression patterns. Investigators manually inspect hundreds or thousands of expression patterns to identify those that are spatially similar and to ultimately infer potential gene interactions. However, the rapid accumulation of gene expression pattern data over the last two decades, facilitated by high-throughput techniques, has produced a need for the development of efficient approaches for direct comparison of images, rather than their textual descriptions, to identify spatially similar expression patterns. Results The effectiveness of the Binary Feature Vector (BFV) and Invariant Moment Vector (IMV) based digital representations of the gene expression patterns in finding biologically meaningful patterns was compared for a small (226 images) and a large (1819 images) dataset. For each dataset, an ordered list of images, with respect to a query image, was generated to identify overlapping and similar gene expression patterns, in a manner comparable to what a developmental biologist might do. The results showed that the BFV representation consistently outperforms the IMV representation in finding biologically meaningful matches when spatial overlap of the gene expression pattern and the genes involved are considered. Furthermore, we explored the value of conducting image-content based searches in a dataset where individual expression components (or domains) of multi-domain expression patterns were also included separately. We found that this technique improves performance of both IMV and BFV based searches. Conclusions We conclude that the BFV representation consistently produces a more extensive and better list of biologically useful patterns than the IMV representation. The high quality of results obtained scales well as the search database becomes larger, which encourages efforts to build automated image query and retrieval systems for spatial gene expression patterns. PMID:15603586
geophylobuilder 1.0: an arcgis extension for creating 'geophylogenies'.

PubMed

Kidd, David M; Liu, Xianhua

2008-01-01

Evolution is inherently a spatiotemporal process; however, despite this, phylogenetic and geographical data and models remain largely isolated from one another. Geographical information systems provide a ready-made spatial modelling, analysis and dissemination environment within which phylogenetic models can be explicitly linked with their associated spatial data and subsequently integrated with other georeferenced data sets describing the biotic and abiotic environment. geophylobuilder 1.0 is an extension for the arcgis geographical information system that builds a 'geophylogenetic' data model from a phylogenetic tree and associated geographical data. Geophylogenetic database objects can subsequently be queried, spatially analysed and visualized in both 2D and 3D within a geographical information systems. © 2007 The Authors.
Mercury Toolset for Spatiotemporal Metadata

NASA Technical Reports Server (NTRS)

Wilson, Bruce E.; Palanisamy, Giri; Devarakonda, Ranjeet; Rhyne, B. Timothy; Lindsley, Chris; Green, James

2010-01-01

Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily) harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.
Mercury Toolset for Spatiotemporal Metadata

NASA Astrophysics Data System (ADS)

Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce; Rhyne, B. Timothy; Lindsley, Chris

2010-06-01

Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily)harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.
Route Generation for a Synthetic Character (BOT) Using a Partial or Incomplete Knowledge Route Generation Algorithm in UT2004 Virtual Environment

NASA Technical Reports Server (NTRS)

Hanold, Gregg T.; Hanold, David T.

2010-01-01

This paper presents a new Route Generation Algorithm that accurately and realistically represents human route planning and navigation for Military Operations in Urban Terrain (MOUT). The accuracy of this algorithm in representing human behavior is measured using the Unreal Tournament(Trademark) 2004 (UT2004) Game Engine to provide the simulation environment in which the differences between the routes taken by the human player and those of a Synthetic Agent (BOT) executing the A-star algorithm and the new Route Generation Algorithm can be compared. The new Route Generation Algorithm computes the BOT route based on partial or incomplete knowledge received from the UT2004 game engine during game play. To allow BOT navigation to occur continuously throughout the game play with incomplete knowledge of the terrain, a spatial network model of the UT2004 MOUT terrain is captured and stored in an Oracle 11 9 Spatial Data Object (SOO). The SOO allows a partial data query to be executed to generate continuous route updates based on the terrain knowledge, and stored dynamic BOT, Player and environmental parameters returned by the query. The partial data query permits the dynamic adjustment of the planned routes by the Route Generation Algorithm based on the current state of the environment during a simulation. The dynamic nature of this algorithm more accurately allows the BOT to mimic the routes taken by the human executing under the same conditions thereby improving the realism of the BOT in a MOUT simulation environment.
DownscaleConcept 2.3 User Manual. Downscaled, Spatially Distributed Soil Moisture Calculator

DTIC Science & Technology

2011-01-01

be first presented with the dataset 28 results to your query. From this page, check the box next to the ASTER GDEM dataset and press the "List...information for verification. No charge will be associated with GDEM data archives. 14. Select "Submit Order Now!" to process your order. 15. Wait for
Spatial cyberinfrastructures, ontologies, and the humanities.

PubMed

Sieber, Renee E; Wellen, Christopher C; Jin, Yuan

2011-04-05

We report on research into building a cyberinfrastructure for Chinese biographical and geographic data. Our cyberinfrastructure contains (i) the McGill-Harvard-Yenching Library Ming Qing Women's Writings database (MQWW), the only online database on historical Chinese women's writings, (ii) the China Biographical Database, the authority for Chinese historical people, and (iii) the China Historical Geographical Information System, one of the first historical geographic information systems. Key to this integration is that linked databases retain separate identities as bases of knowledge, while they possess sufficient semantic interoperability to allow for multidatabase concepts and to support cross-database queries on an ad hoc basis. Computational ontologies create underlying semantics for database access. This paper focuses on the spatial component in a humanities cyberinfrastructure, which includes issues of conflicting data, heterogeneous data models, disambiguation, and geographic scale. First, we describe the methodology for integrating the databases. Then we detail the system architecture, which includes a tier of ontologies and schema. We describe the user interface and applications that allow for cross-database queries. For instance, users should be able to analyze the data, examine hypotheses on spatial and temporal relationships, and generate historical maps with datasets from MQWW for research, teaching, and publication on Chinese women writers, their familial relations, publishing venues, and the literary and social communities. Last, we discuss the social side of cyberinfrastructure development, as people are considered to be as critical as the technical components for its success.
On describing human white matter anatomy: the white matter query language.

PubMed

Wassermann, Demian; Makris, Nikos; Rathi, Yogesh; Shenton, Martha; Kikinis, Ron; Kubicki, Marek; Westin, Carl-Fredrik

2013-01-01

The main contribution of this work is the careful syntactical definition of major white matter tracts in the human brain based on a neuroanatomist's expert knowledge. We present a technique to formally describe white matter tracts and to automatically extract them from diffusion MRI data. The framework is based on a novel query language with a near-to-English textual syntax. This query language allows us to construct a dictionary of anatomical definitions describing white matter tracts. The definitions include adjacent gray and white matter regions, and rules for spatial relations. This enables automated coherent labeling of white matter anatomy across subjects. We use our method to encode anatomical knowledge in human white matter describing 10 association and 8 projection tracts per hemisphere and 7 commissural tracts. The technique is shown to be comparable in accuracy to manual labeling. We present results applying this framework to create a white matter atlas from 77 healthy subjects, and we use this atlas in a proof-of-concept study to detect tract changes specific to schizophrenia.
Combining semantic technologies with a content-based image retrieval system - Preliminary considerations

NASA Astrophysics Data System (ADS)

Chmiel, P.; Ganzha, M.; Jaworska, T.; Paprzycki, M.

2017-10-01

Nowadays, as a part of systematic growth of volume, and variety, of information that can be found on the Internet, we observe also dramatic increase in sizes of available image collections. There are many ways to help users browsing / selecting images of interest. One of popular approaches are Content-Based Image Retrieval (CBIR) systems, which allow users to search for images that match their interests, expressed in the form of images (query by example). However, we believe that image search and retrieval could take advantage of semantic technologies. We have decided to test this hypothesis. Specifically, on the basis of knowledge captured in the CBIR, we have developed a domain ontology of residential real estate (detached houses, in particular). This allows us to semantically represent each image (and its constitutive architectural elements) represented within the CBIR. The proposed ontology was extended to capture not only the elements resulting from image segmentation, but also "spatial relations" between them. As a result, a new approach to querying the image database (semantic querying) has materialized, thus extending capabilities of the developed system.
Pharmit: interactive exploration of chemical space.

PubMed

Sunseri, Jocelyn; Koes, David Ryan

2016-07-08

Pharmit (http://pharmit.csb.pitt.edu) provides an online, interactive environment for the virtual screening of large compound databases using pharmacophores, molecular shape and energy minimization. Users can import, create and edit virtual screening queries in an interactive browser-based interface. Queries are specified in terms of a pharmacophore, a spatial arrangement of the essential features of an interaction, and molecular shape. Search results can be further ranked and filtered using energy minimization. In addition to a number of pre-built databases of popular compound libraries, users may submit their own compound libraries for screening. Pharmit uses state-of-the-art sub-linear algorithms to provide interactive screening of millions of compounds. Queries typically take a few seconds to a few minutes depending on their complexity. This allows users to iteratively refine their search during a single session. The easy access to large chemical datasets provided by Pharmit simplifies and accelerates structure-based drug design. Pharmit is available under a dual BSD/GPL open-source license. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Geospatial data sharing, online spatial analysis and processing of Indian Biodiversity data in Internet GIS domain - A case study for raster based online geo-processing

NASA Astrophysics Data System (ADS)

Karnatak, H.; Pandey, K.; Oberai, K.; Roy, A.; Joshi, D.; Singh, H.; Raju, P. L. N.; Krishna Murthy, Y. V. N.

2014-11-01

National Biodiversity Characterization at Landscape Level, a project jointly sponsored by Department of Biotechnology and Department of Space, was implemented to identify and map the potential biodiversity rich areas in India. This project has generated spatial information at three levels viz. Satellite based primary information (Vegetation Type map, spatial locations of road & village, Fire occurrence); geospatially derived or modelled information (Disturbance Index, Fragmentation, Biological Richness) and geospatially referenced field samples plots. The study provides information of high disturbance and high biological richness areas suggesting future management strategies and formulating action plans. The study has generated for the first time baseline database in India which will be a valuable input towards climate change study in the Indian Subcontinent. The spatial data generated during the study is organized as central data repository in Geo-RDBMS environment using PostgreSQL and POSTGIS. The raster and vector data is published as OGC WMS and WFS standard for development of web base geoinformation system using Service Oriented Architecture (SOA). The WMS and WFS based system allows geo-visualization, online query and map outputs generation based on user request and response. This is a typical mashup architecture based geo-information system which allows access to remote web services like ISRO Bhuvan, Openstreet map, Google map etc., with overlay on Biodiversity data for effective study on Bio-resources. The spatial queries and analysis with vector data is achieved through SQL queries on POSTGIS and WFS-T operations. But the most important challenge is to develop a system for online raster based geo-spatial analysis and processing based on user defined Area of Interest (AOI) for large raster data sets. The map data of this study contains approximately 20 GB of size for each data layer which are five in number. An attempt has been to develop system using python, PostGIS and PHP for raster data analysis over the web for Biodiversity conservation and prioritization. The developed system takes inputs from users as WKT, Openlayer based Polygon geometry and Shape file upload as AOI to perform raster based operation using Python and GDAL/OGR. The intermediate products are stored in temporary files and tables which generate XML outputs for web representation. The raster operations like clip-zip-ship, class wise area statistics, single to multi-layer operations, diagrammatic representation and other geo-statistical analysis are performed. This is indigenous geospatial data processing engine developed using Open system architecture for spatial analysis of Biodiversity data sets in Internet GIS environment. The performance of this applications in multi-user environment like Internet domain is another challenging task which is addressed by fine tuning the source code, server hardening, spatial indexing and running the process in load balance mode. The developed system is hosted in Internet domain (http://bis.iirs.gov.in) for user access.
Titanbrowse: a new paradigm for access, visualization and analysis of hyperspectral imaging

NASA Astrophysics Data System (ADS)

Penteado, Paulo F.

2016-10-01

Currently there are archives and tools to explore remote sensing imaging, but these lack some functionality needed for hyperspectral imagers: 1) Querying and serving only whole datacubes is not enough, since in each cube there is typically a large variation in observation geometry over the spatial pixels. Thus, often the most useful unit for selecting observations of interest is not a whole cube but rather a single spectrum. 2) Pixel-specific geometric data included in the standard pipelines is calculated at only one point per pixel. Particularly for selections of pixels from many different cubes, or observations near the limb, it is necessary to know the actual extent of each pixel. 3) Database queries need not only metadata, but also by the spectral data. For instance, one query might look for atypical values of some band, or atypical relations between bands, denoting spectral features (such as ratios or differences between bands). 4) There is the need to evaluate arbitrary, dynamically-defined, complex functions of the data (beyond just simple arithmetic operations), both for selection in the queries, and for visualization, to interactively tune the queries to the observations of interest. 5) Making the most useful query for some analysis often requires interactive visualization integrated with data selection and processing, because the user needs to explore how different functions of the data vary over the observations without having to download data and import it into visualization software. 6) Complementary to interactive use, an API allowing programmatic access to the system is needed for systematic data analyses. 7) Direct access to calibrated and georeferenced data, without the need to download data and software and learn to process it.We present titanbrowse, a database, exploration and visualization system for Cassini VIMS observations of Titan, designed to fullfill the aforementioned needs. While it originallly ran on data in the user's computer, we are now developing an online version, so that users do not need to download software and data. The server, which we maintain, processes the queries and communicates the results to the client the user runs. http://ppenteado.net/titanbrowse.
The Semantic Retrieval of Spatial Data Service Based on Ontology in SIG

NASA Astrophysics Data System (ADS)

Sun, S.; Liu, D.; Li, G.; Yu, W.

2011-08-01

The research of SIG (Spatial Information Grid) mainly solves the problem of how to connect different computing resources, so that users can use all the resources in the Grid transparently and seamlessly. In SIG, spatial data service is described in some kinds of specifications, which use different meta-information of each kind of services. This kind of standardization cannot resolve the problem of semantic heterogeneity, which may limit user to obtain the required resources. This paper tries to solve two kinds of semantic heterogeneities (name heterogeneity and structure heterogeneity) in spatial data service retrieval based on ontology, and also, based on the hierarchical subsumption relationship among concept in ontology, the query words can be extended and more resource can be matched and found for user. These applications of ontology in spatial data resource retrieval can help to improve the capability of keyword matching, and find more related resources.

Array Processing in the Cloud: the rasdaman Approach

NASA Astrophysics Data System (ADS)

Merticariu, Vlad; Dumitru, Alex

2015-04-01

The multi-dimensional array data model is gaining more and more attention when dealing with Big Data challenges in a variety of domains such as climate simulations, geographic information systems, medical imaging or astronomical observations. Solutions provided by classical Big Data tools such as Key-Value Stores and MapReduce, as well as traditional relational databases, proved to be limited in domains associated with multi-dimensional data. This problem has been addressed by the field of array databases, in which systems provide database services for raster data, without imposing limitations on the number of dimensions that a dataset can have. Examples of datasets commonly handled by array databases include 1-dimensional sensor data, 2-D satellite imagery, 3-D x/y/t image time series as well as x/y/z geophysical voxel data, and 4-D x/y/z/t weather data. And this can grow as large as simulations of the whole universe when it comes to astrophysics. rasdaman is a well established array database, which implements many optimizations for dealing with large data volumes and operation complexity. Among those, the latest one is intra-query parallelization support: a network of machines collaborate for answering a single array database query, by dividing it into independent sub-queries sent to different servers. This enables massive processing speed-ups, which promise solutions to research challenges on multi-Petabyte data cubes. There are several correlated factors which influence the speedup that intra-query parallelisation brings: the number of servers, the capabilities of each server, the quality of the network, the availability of the data to the server that needs it in order to compute the result and many more. In the effort of adapting the engine to cloud processing patterns, two main components have been identified: one that handles communication and gathers information about the arrays sitting on every server, and a processing unit responsible with dividing work among available nodes and executing operations on local data. The federation daemon collects and stores statistics from the other network nodes and provides real time updates about local changes. Information exchanged includes available datasets, CPU load and memory usage per host. The processing component is represented by the rasdaman server. Using information from the federation daemon it breaks queries into subqueries to be executed on peer nodes, ships them, and assembles the intermediate results. Thus, we define a rasdaman network node as a pair of a federation daemon and a rasdaman server. Any node can receive a query and will subsequently act as this query's dispatcher, so all peers are at the same level and there is no single point of failure. Should a node become inaccessible then the peers will recognize this and will not any longer consider this peer for distribution. Conversely, a peer at any time can join the network. To assess the feasibility of our approach, we deployed a rasdaman network in the Amazon Elastic Cloud environment on 1001 nodes, and observed that this feature can greatly increase the performance and scalability of the system, offering a large throughput of processed data.
Mining Co-Location Patterns with Clustering Items from Spatial Data Sets

NASA Astrophysics Data System (ADS)

Zhou, G.; Li, Q.; Deng, G.; Yue, T.; Zhou, X.

2018-05-01

The explosive growth of spatial data and widespread use of spatial databases emphasize the need for the spatial data mining. Co-location patterns discovery is an important branch in spatial data mining. Spatial co-locations represent the subsets of features which are frequently located together in geographic space. However, the appearance of a spatial feature C is often not determined by a single spatial feature A or B but by the two spatial features A and B, that is to say where A and B appear together, C often appears. We note that this co-location pattern is different from the traditional co-location pattern. Thus, this paper presents a new concept called clustering terms, and this co-location pattern is called co-location patterns with clustering items. And the traditional algorithm cannot mine this co-location pattern, so we introduce the related concept in detail and propose a novel algorithm. This algorithm is extended by join-based approach proposed by Huang. Finally, we evaluate the performance of this algorithm.
Gazetteer Brokering through Semantic Mediation

NASA Astrophysics Data System (ADS)

Hobona, G.; Bermudez, L. E.; Brackin, R.

2013-12-01

A gazetteer is a geographical directory containing some information regarding places. It provides names, location and other attributes for places which may include points of interest (e.g. buildings, oilfields and boreholes), and other features. These features can be published via web services conforming to the Gazetteer Application Profile of the Web Feature Service (WFS) standard of the Open Geospatial Consortium (OGC). Against the backdrop of advances in geophysical surveys, there has been a significant increase in the amount of data referenced to locations. Gazetteers services have played a significant role in facilitating access to such data, including through provision of specialized queries such as text, spatial and fuzzy search. Recent developments in the OGC have led to advances in gazetteers such as support for multilingualism, diacritics, and querying via advanced spatial constraints (e.g. search by radial search and nearest neighbor). A challenge remaining however, is that gazetteers produced by different organizations have typically been modeled differently. Inconsistencies from gazetteers produced by different organizations may include naming the same feature in a different way, naming the attributes differently, locating the feature in a different location, and providing fewer or more attributes than the other services. The Gazetteer application profile of the WFS is a starting point to address such inconsistencies by providing a standardized interface based on rules specified in ISO 19112, the international standard for spatial referencing by geographic identifiers. The profile, however, does not provide rules to deal with semantic inconsistencies. The USGS and NGA commissioned research into the potential for a Single Point of Entry Global Gazetteer (SPEGG). The research was conducted by the Cross Community Interoperability thread of the OGC testbed, referenced OWS-9. The testbed prototyped approaches for brokering gazetteers through use of semantic web technologies, including ontologies and a semantic mediator. The semantically-enhanced SPEGG allowed a client to submit a single query (e.g. ';hills') and to retrieve data from two separate gazetteers with different vocabularies (e.g. where one refers to ';summits' another refers to ';hills'). Supporting the SPEGG was a SPARQL server that held the ontologies and processed queries on them. Earth Science surveys and forecast always have a place on Earth. Being able to share the information about a place and solve inconsistencies about that place from different sources will enable geoscientists to better do their research. In the advent of mobile geo computing and location based services (LBS), brokering gazetteers will provide geoscientists with access to gazetteer services rich with information and functionality beyond that offered by current generic gazetteers.
A Prototype Digital Library for 3D Collections: Tools To Capture, Model, Analyze, and Query Complex 3D Data.

ERIC Educational Resources Information Center

Rowe, Jeremy; Razdan, Anshuman

The Partnership for Research in Spatial Modeling (PRISM) project at Arizona State University (ASU) developed modeling and analytic tools to respond to the limitations of two-dimensional (2D) data representations perceived by affiliated discipline scientists, and to take advantage of the enhanced capabilities of three-dimensional (3D) data that…
Comparison of Programs Used for FIA Inventory Information Dissemination and Spatial Representation

Treesearch

Roger C. Lowe; Chris J. Cieszewski

2005-01-01

Six online applications developed for the interactive display of Forest Inventory and Analysis (FIA) data in which FIA database information and query results can be viewed as or selected from interactive geographic maps are compared. The programs evaluated are the U.S. Department of Agriculture Forest Service?s online systems; a SAS server-based mapping system...
Image-Based Airborne LiDAR Point Cloud Encoding for 3d Building Model Retrieval

NASA Astrophysics Data System (ADS)

Chen, Yi-Chen; Lin, Chao-Hung

2016-06-01

With the development of Web 2.0 and cyber city modeling, an increasing number of 3D models have been available on web-based model-sharing platforms with many applications such as navigation, urban planning, and virtual reality. Based on the concept of data reuse, a 3D model retrieval system is proposed to retrieve building models similar to a user-specified query. The basic idea behind this system is to reuse these existing 3D building models instead of reconstruction from point clouds. To efficiently retrieve models, the models in databases are compactly encoded by using a shape descriptor generally. However, most of the geometric descriptors in related works are applied to polygonal models. In this study, the input query of the model retrieval system is a point cloud acquired by Light Detection and Ranging (LiDAR) systems because of the efficient scene scanning and spatial information collection. Using Point clouds with sparse, noisy, and incomplete sampling as input queries is more difficult than that by using 3D models. Because that the building roof is more informative than other parts in the airborne LiDAR point cloud, an image-based approach is proposed to encode both point clouds from input queries and 3D models in databases. The main goal of data encoding is that the models in the database and input point clouds can be consistently encoded. Firstly, top-view depth images of buildings are generated to represent the geometry surface of a building roof. Secondly, geometric features are extracted from depth images based on height, edge and plane of building. Finally, descriptors can be extracted by spatial histograms and used in 3D model retrieval system. For data retrieval, the models are retrieved by matching the encoding coefficients of point clouds and building models. In experiments, a database including about 900,000 3D models collected from the Internet is used for evaluation of data retrieval. The results of the proposed method show a clear superiority over related methods.
Landscape development and mule deer habitat in Central Oregon

Treesearch

James A. Duncan; Theresa Burcsu

2012-01-01

This research explored the ecological consequences of rural residential development and different management regimes on a tract of former industrial timberland in central Oregon known as the Bull Springs. Forage quality and habitat suitability models for mule deer (Odocoileus hemionus) winter range were joined to the outputs of a spatially explicit...
Pharmacodynamic Assay Panel for Monitoring Phospho-Signaling Networks | Office of Cancer Clinical Proteomics Research

Cancer.gov

The DNA damage response (DDR) is a highly regulated signal transduction network that orchestrates the temporal and spatial organization of protein complexes required to repair (or tolerate) DNA damage (e.g., nucleotide excision repair, base excision repair, homologous recombination, non-homologous end joining, post-replication repair).
Spatial Analysis of Market Economy Innovations in the Former Soviet Union: The Case of Commodity Exchanges

DTIC Science & Technology

1992-10-06

current cooperative movement in the former Soviet Uni’n can be traced to a law adopted in November, 1986 which permitted family members residing...the labor joined the cooperative movement in 1988. Uzbekistan, the largest of the Central Asian republics, claims only one exchange in Tashkent
RIMS: An Integrated Mapping and Analysis System with Applications to Earth Sciences and Hydrology

NASA Astrophysics Data System (ADS)

Proussevitch, A. A.; Glidden, S.; Shiklomanov, A. I.; Lammers, R. B.

2011-12-01

A web-based information and computational system for analysis of spatially distributed Earth system, climate, and hydrologic data have been developed. The System allows visualization, data exploration, querying, manipulation and arbitrary calculations with any loaded gridded or vector polygon dataset. The system's acronym, RIMS, stands for its core functionality as a Rapid Integrated Mapping System. The system can be deployed for a Global scale projects as well as for regional hydrology and climatology studies. In particular, the Water Systems Analysis Group of the University of New Hampshire developed the global and regional (Northern Eurasia, pan-Arctic) versions of the system with different map projections and specific data. The system has demonstrated its potential for applications in other fields of Earth sciences and education. The key Web server/client components of the framework include (a) a visualization engine built on Open Source libraries (GDAL, PROJ.4, etc.) that are utilized in a MapServer; (b) multi-level data querying tools built on XML server-client communication protocols that allow downloading map data on-the-fly to a client web browser; and (c) data manipulation and grid cell level calculation tools that mimic desktop GIS software functionality via a web interface. Server side data management of the system is designed around a simple database of dataset metadata facilitating mounting of new data to the system and maintaining existing data in an easy manner. RIMS contains "built-in" river network data that allows for query of upstream areas on-demand which can be used for spatial data aggregation and analysis of sub-basin areas. RIMS is an ongoing effort and currently being used to serve a number of websites hosting a suite of hydrologic, environmental and other GIS data.
Spatial and spatiotemporal pattern analysis of coconut lethal yellowing in Mozambique.

PubMed

Bonnot, F; de Franqueville, H; Lourenço, E

2010-04-01

Coconut lethal yellowing (LY) is caused by a phytoplasma and is a major threat for coconut production throughout its growing area. Incidence of LY was monitored visually on every coconut tree in six fields in Mozambique for 34 months. Disease progress curves were plotted and average monthly disease incidence was estimated. Spatial patterns of disease incidence were analyzed at six assessment times. Aggregation was tested by the coefficient of spatial autocorrelation of the beta-binomial distribution of diseased trees in quadrats. The binary power law was used as an assessment of overdispersion across the six fields. Spatial autocorrelation between symptomatic trees was measured by the BB join count statistic based on the number of pairs of diseased trees separated by a specific distance and orientation, and tested using permutation methods. Aggregation of symptomatic trees was detected in every field in both cumulative and new cases. Spatiotemporal patterns were analyzed with two methods. The proximity of symptomatic trees at two assessment times was investigated using the spatiotemporal BB join count statistic based on the number of pairs of trees separated by a specific distance and orientation and exhibiting the first symptoms of LY at the two times. The semivariogram of times of appearance of LY was calculated to characterize how the lag between times of appearance of LY was related to the distance between symptomatic trees. Both statistics were tested using permutation methods. A tendency for new cases to appear in the proximity of previously diseased trees and a spatially structured pattern of times of appearance of LY within clusters of diseased trees were detected, suggesting secondary spread of the disease.
Ballistic-Failure Mechanisms in Gas Metal Arc Welds of Mil A46100 Armor-Grade Steel: A Computational Investigation

NASA Astrophysics Data System (ADS)

Grujicic, M.; Snipes, J. S.; Galgalikar, R.; Ramaswami, S.; Yavari, R.; Yen, C.-F.; Cheeseman, B. A.

2014-09-01

In our recent work, a multi-physics computational model for the conventional gas metal arc welding (GMAW) joining process was introduced. The model is of a modular type and comprises five modules, each designed to handle a specific aspect of the GMAW process, i.e.: (i) electro-dynamics of the welding-gun; (ii) radiation-/convection-controlled heat transfer from the electric-arc to the workpiece and mass transfer from the filler-metal consumable electrode to the weld; (iii) prediction of the temporal evolution and the spatial distribution of thermal and mechanical fields within the weld region during the GMAW joining process; (iv) the resulting temporal evolution and spatial distribution of the material microstructure throughout the weld region; and (v) spatial distribution of the as-welded material mechanical properties. In the present work, the GMAW process model has been upgraded with respect to its predictive capabilities regarding the spatial distribution of the mechanical properties controlling the ballistic-limit (i.e., penetration-resistance) of the weld. The model is upgraded through the introduction of the sixth module in the present work in recognition of the fact that in thick steel GMAW weldments, the overall ballistic performance of the armor may become controlled by the (often inferior) ballistic limits of its weld (fusion and heat-affected) zones. To demonstrate the utility of the upgraded GMAW process model, it is next applied to the case of butt-welding of a prototypical high-hardness armor-grade martensitic steel, MIL A46100. The model predictions concerning the spatial distribution of the material microstructure and ballistic-limit-controlling mechanical properties within the MIL A46100 butt-weld are found to be consistent with prior observations and general expectations.
Optimization of Gas Metal Arc Welding (GMAW) Process for Maximum Ballistic Limit in MIL A46100 Steel Welded All-Metal Armor

NASA Astrophysics Data System (ADS)

Grujicic, M.; Ramaswami, S.; Snipes, J. S.; Yavari, R.; Yen, C.-F.; Cheeseman, B. A.

2015-01-01

Our recently developed multi-physics computational model for the conventional gas metal arc welding (GMAW) joining process has been upgraded with respect to its predictive capabilities regarding the process optimization for the attainment of maximum ballistic limit within the weld. The original model consists of six modules, each dedicated to handling a specific aspect of the GMAW process, i.e., (a) electro-dynamics of the welding gun; (b) radiation-/convection-controlled heat transfer from the electric arc to the workpiece and mass transfer from the filler metal consumable electrode to the weld; (c) prediction of the temporal evolution and the spatial distribution of thermal and mechanical fields within the weld region during the GMAW joining process; (d) the resulting temporal evolution and spatial distribution of the material microstructure throughout the weld region; (e) spatial distribution of the as-welded material mechanical properties; and (f) spatial distribution of the material ballistic limit. In the present work, the model is upgraded through the introduction of the seventh module in recognition of the fact that identification of the optimum GMAW process parameters relative to the attainment of the maximum ballistic limit within the weld region entails the use of advanced optimization and statistical sensitivity analysis methods and tools. The upgraded GMAW process model is next applied to the case of butt welding of MIL A46100 (a prototypical high-hardness armor-grade martensitic steel) workpieces using filler metal electrodes made of the same material. The predictions of the upgraded GMAW process model pertaining to the spatial distribution of the material microstructure and ballistic limit-controlling mechanical properties within the MIL A46100 butt weld are found to be consistent with general expectations and prior observations.
Spatial cyberinfrastructures, ontologies, and the humanities

PubMed Central

Sieber, Renee E.; Wellen, Christopher C.; Jin, Yuan

2011-01-01

We report on research into building a cyberinfrastructure for Chinese biographical and geographic data. Our cyberinfrastructure contains (i) the McGill-Harvard-Yenching Library Ming Qing Women's Writings database (MQWW), the only online database on historical Chinese women's writings, (ii) the China Biographical Database, the authority for Chinese historical people, and (iii) the China Historical Geographical Information System, one of the first historical geographic information systems. Key to this integration is that linked databases retain separate identities as bases of knowledge, while they possess sufficient semantic interoperability to allow for multidatabase concepts and to support cross-database queries on an ad hoc basis. Computational ontologies create underlying semantics for database access. This paper focuses on the spatial component in a humanities cyberinfrastructure, which includes issues of conflicting data, heterogeneous data models, disambiguation, and geographic scale. First, we describe the methodology for integrating the databases. Then we detail the system architecture, which includes a tier of ontologies and schema. We describe the user interface and applications that allow for cross-database queries. For instance, users should be able to analyze the data, examine hypotheses on spatial and temporal relationships, and generate historical maps with datasets from MQWW for research, teaching, and publication on Chinese women writers, their familial relations, publishing venues, and the literary and social communities. Last, we discuss the social side of cyberinfrastructure development, as people are considered to be as critical as the technical components for its success. PMID:21444819
The research and implementation of coalfield spontaneous combustion of carbon emission WebGIS based on Silverlight and ArcGIS server

NASA Astrophysics Data System (ADS)

Zhu, Z.; Bi, J.; Wang, X.; Zhu, W.

2014-02-01

As an important sub-topic of the natural process of carbon emission data public information platform construction, coalfield spontaneous combustion of carbon emission WebGIS system has become an important study object. In connection with data features of coalfield spontaneous combustion carbon emissions (i.e. a wide range of data, which is rich and complex) and the geospatial characteristics, data is divided into attribute data and spatial data. Based on full analysis of the data, completed the detailed design of the Oracle database and stored on the Oracle database. Through Silverlight rich client technology and the expansion of WCF services, achieved the attribute data of web dynamic query, retrieval, statistical, analysis and other functions. For spatial data, we take advantage of ArcGIS Server and Silverlight-based API to invoke GIS server background published map services, GP services, Image services and other services, implemented coalfield spontaneous combustion of remote sensing image data and web map data display, data analysis, thematic map production. The study found that the Silverlight technology, based on rich client and object-oriented framework for WCF service, can efficiently constructed a WebGIS system. And then, combined with ArcGIS Silverlight API to achieve interactive query attribute data and spatial data of coalfield spontaneous emmission, can greatly improve the performance of WebGIS system. At the same time, it provided a strong guarantee for the construction of public information on China's carbon emission data.
A Services-Oriented Architecture for Water Observations Data

NASA Astrophysics Data System (ADS)

Maidment, D. R.; Zaslavsky, I.; Valentine, D.; Tarboton, D. G.; Whitenack, T.; Whiteaker, T.; Hooper, R.; Kirschtel, D.

2009-04-01

Water observations data are time series of measurements made at point locations of water level, flow, and quality and corresponding data for climatic observations at point locations such as gaged precipitation and weather variables. A services-oriented architecture has been built for such information for the United States that has three components: hydrologic information servers, hydrologic information clients, and a centralized metadata cataloging system. These are connected using web services for observations data and metadata defined by an XML-based language called WaterML. A Hydrologic Information Server can be built by storing observations data in a relational database schema in the CUAHSI Observations Data Model, in which case, web services access to the data and metadata is automatically provided by query functions for WaterML that are wrapped around the relational database within a web server. A Hydrologic Information Server can also be constructed by custom-programming an interface to an existing water agency web site so that responds to the same queries by producing data in WaterML as do the CUAHSI Observations Data Model based servers. A Hydrologic Information Client is one which can interpret and ingest WaterML metadata and data. We have two client applications for Excel and ArcGIS and have shown how WaterML web services can be ingested into programming environments such as Matlab and Visual Basic. HIS Central, maintained at the San Diego Supercomputer Center is a repository of observational metadata for WaterML web services which presently indexes 342 million data measured at 1.75 million locations. This is the largest catalog water observational data for the United States presently in existence. As more observation networks join what we term "CUAHSI Water Data Federation", and the system accommodates a growing number of sites, measured parameters, applications, and users, rapid and reliable access to large heterogeneous hydrologic data repositories becomes critical. The CUAHSI HIS solution to the scalability and heterogeneity challenges has several components. Structural differences across the data repositories are addressed by building a standard services foundation for the exchange of hydrologic data, as derived from a common information model for observational data measured at stationary points and its implementation as a relational schema (ODM) and an XML schema (WaterML). Semantic heterogeneity is managed by mapping water quantity, water quality, and other parameters collected by government agencies and academic projects to a common ontology. The WaterML-compliant web services are indexed in a community services registry called HIS Central (hiscentral.cuahsi.org). Once a web service is registered in HIS Central, its metadata (site and variable characteristics, period of record for each variable at each site, etc.) is harvested and appended to the central catalog. The catalog is further updated as the service publisher associates the variables in the published service with ontology concepts. After this, the newly published service becomes available for spatial and semantics-based queries from online and desktop client applications developed by the project. Hydrologic system server software is now deployed at more than a dozen locations in the United States and Australia. To provide rapid access to data summaries, in particular for several nation-wide data repositories including EPA STORET, USGS NWIS, and USDA SNOTEL, we convert the observation data catalogs and databases with harvested data values into special representations that support high-performance analysis and visualization. The construction of OLAP (Online Analytical Processing) cubes, often called data cubes, is an approach to organizing and querying large multi-dimensional data collections. We have applied the OLAP techniques, as implemented in Microsoft SQL Server 2005/2008, to the analysis of the catalogs from several agencies. OLAP analysis results reflect geography and history of observation data availability from USGS NWIS, EPA STORET, and USDA SNOTEL repositories, and spatial and temporal dynamics of the available measurements for several key nutrient-related parameters. Our experience developing the CUAHSI HIS cyberinfrastructure demonstrated that efficient integration of hydrologic observations from multiple government and academic sources requires a range of technical approaches focused on managing different components of data heterogeneity and system scalability. While this submission addresses technical aspects of developing a national-scale information system for hydrologic observations, the challenges of explicating shared semantics of hydrologic observations and building a community of HIS users and developers remain critical in constructing a nation-wide federation of water data services.
Geologic database for digital geology of California, Nevada, and Utah: an application of the North American Data Model

USGS Publications Warehouse

Bedford, David R.; Ludington, Steve; Nutt, Constance M.; Stone, Paul A.; Miller, David M.; Miller, Robert J.; Wagner, David L.; Saucedo, George J.

2003-01-01

The USGS is creating an integrated national database for digital state geologic maps that includes stratigraphic, age, and lithologic information. The majority of the conterminous 48 states have digital geologic base maps available, often at scales of 1:500,000. This product is a prototype, and is intended to demonstrate the types of derivative maps that will be possible with the national integrated database. This database permits the creation of a number of types of maps via simple or sophisticated queries, maps that may be useful in a number of areas, including mineral-resource assessment, environmental assessment, and regional tectonic evolution. This database is distributed with three main parts: a Microsoft Access 2000 database containing geologic map attribute data, an Arc/Info (Environmental Systems Research Institute, Redlands, California) Export format file containing points representing designation of stratigraphic regions for the Geologic Map of Utah, and an ArcView 3.2 (Environmental Systems Research Institute, Redlands, California) project containing scripts and dialogs for performing a series of generalization and mineral resource queries. IMPORTANT NOTE: Spatial data for the respective stage geologic maps is not distributed with this report. The digital state geologic maps for the states involved in this report are separate products, and two of them are produced by individual state agencies, which may be legally and/or financially responsible for this data. However, the spatial datasets for maps discussed in this report are available to the public. Questions regarding the distribution, sale, and use of individual state geologic maps should be sent to the respective state agency. We do provide suggestions for obtaining and formatting the spatial data to make it compatible with data in this report. See section ‘Obtaining and Formatting Spatial Data’ in the PDF version of the report.
Benchmarking Using Basic DBMS Operations

NASA Astrophysics Data System (ADS)

Crolotte, Alain; Ghazal, Ahmad

The TPC-H benchmark proved to be successful in the decision support area. Many commercial database vendors and their related hardware vendors used these benchmarks to show the superiority and competitive edge of their products. However, over time, the TPC-H became less representative of industry trends as vendors keep tuning their database to this benchmark-specific workload. In this paper, we present XMarq, a simple benchmark framework that can be used to compare various software/hardware combinations. Our benchmark model is currently composed of 25 queries that measure the performance of basic operations such as scans, aggregations, joins and index access. This benchmark model is based on the TPC-H data model due to its maturity and well-understood data generation capability. We also propose metrics to evaluate single-system performance and compare two systems. Finally we illustrate the effectiveness of this model by showing experimental results comparing two systems under different conditions.
Application of real-time database to LAMOST control system

NASA Astrophysics Data System (ADS)

Xu, Lingzhe; Xu, Xinqi

2004-09-01

The QNX based real time database is one of main features for Large sky Area Multi-Object fiber Spectroscopic Telescope's (LAMOST) control system, which serves as a storage and platform for data flow, recording and updating timely various status of moving components in the telescope structure as well as environmental parameters around it. The database joins harmonically in the administration of the Telescope Control System (TCS). The paper presents methodology and technique tips in designing the EMPRESS database GUI software package, such as the dynamic creation of control widgets, dynamic query and share memory. The seamless connection between EMPRESS and the graphical development tool of QNX"s Photon Application Builder (PhAB) has been realized, and so have the Windows look and feel yet under Unix-like operating system. In particular, the real time feature of the database is analyzed that satisfies the needs of the control system.
New Capabilities in the Astrophysics Multispectral Archive Search Engine

NASA Astrophysics Data System (ADS)

Cheung, C. Y.; Kelley, S.; Roussopoulos, N.

The Astrophysics Multispectral Archive Search Engine (AMASE) uses object-oriented database techniques to provide a uniform multi-mission and multi-spectral interface to search for data in the distributed archives. We describe our experience of porting AMASE from Illustra object-relational DBMS to the Informix Universal Data Server. New capabilities and utilities have been developed, including a spatial datablade that supports Nearest Neighbor queries.

The PANTHER User Experience

DOE Office of Scientific and Technical Information (OSTI.GOV)

Coram, Jamie L.; Morrow, James D.; Perkins, David Nikolaus

2015-09-01

This document describes the PANTHER R&D Application, a proof-of-concept user interface application developed under the PANTHER Grand Challenge LDRD. The purpose of the application is to explore interaction models for graph analytics, drive algorithmic improvements from an end-user point of view, and support demonstration of PANTHER technologies to potential customers. The R&D Application implements a graph-centric interaction model that exposes analysts to the algorithms contained within the GeoGraphy graph analytics library. Users define geospatial-temporal semantic graph queries by constructing search templates based on nodes, edges, and the constraints among them. Users then analyze the results of the queries using bothmore » geo-spatial and temporal visualizations. Development of this application has made user experience an explicit driver for project and algorithmic level decisions that will affect how analysts one day make use of PANTHER technologies.« less
Spatial modeling of households' knowledge about arsenic pollution in Bangladesh.

PubMed

Sarker, M Mizanur Rahman

2012-04-01

Arsenic in drinking water is an important public health issue in Bangladesh, which is affected by households' knowledge about arsenic threats from their drinking water. In this study, spatial statistical models were used to investigate the determinants and spatial dependence of households' knowledge about arsenic risk. The binary join matrix/binary contiguity matrix and inverse distance spatial weight matrix techniques are used to capture spatial dependence in the data. This analysis extends the spatial model by allowing spatial dependence to vary across divisions and regions. A positive spatial correlation was found in households' knowledge across neighboring districts at district, divisional and regional levels, but the strength of this spatial correlation varies considerably by spatial weight. Literacy rate, daily wage rate of agricultural labor, arsenic status, and percentage of red mark tube well usage in districts were found to contribute positively and significantly to households' knowledge. These findings have policy implications both at regional and national levels in mitigating the present arsenic crisis and to ensure arsenic-free water in Bangladesh. Copyright © 2012 Elsevier Ltd. All rights reserved.
Semantics Enabled Queries in EuroGEOSS: a Discovery Augmentation Approach

NASA Astrophysics Data System (ADS)

Santoro, M.; Mazzetti, P.; Fugazza, C.; Nativi, S.; Craglia, M.

2010-12-01

One of the main challenges in Earth Science Informatics is to build interoperability frameworks which allow users to discover, evaluate, and use information from different scientific domains. This needs to address multidisciplinary interoperability challenges concerning both technological and scientific aspects. From the technological point of view, it is necessary to provide a set of special interoperability arrangement in order to develop flexible frameworks that allow a variety of loosely-coupled services to interact with each other. From a scientific point of view, it is necessary to document clearly the theoretical and methodological assumptions underpinning applications in different scientific domains, and develop cross-domain ontologies to facilitate interdisciplinary dialogue and understanding. In this presentation we discuss a brokering approach that extends the traditional Service Oriented Architecture (SOA) adopted by most Spatial Data Infrastructures (SDIs) to provide the necessary special interoperability arrangements. In the EC-funded EuroGEOSS (A European approach to GEOSS) project, we distinguish among three possible functional brokering components: discovery, access and semantics brokers. This presentation focuses on the semantics broker, the Discovery Augmentation Component (DAC), which was specifically developed to address the three thematic areas covered by the EuroGEOSS project: biodiversity, forestry and drought. The EuroGEOSS DAC federates both semantics (e.g. SKOS repositories) and ISO-compliant geospatial catalog services. The DAC can be queried using common geospatial constraints (i.e. what, where, when, etc.). Two different augmented discovery styles are supported: a) automatic query expansion; b) user assisted query expansion. In the first case, the main discovery steps are: i. the query keywords (the what constraint) are “expanded” with related concepts/terms retrieved from the set of federated semantic services. A default expansion regards the multilinguality relationship; ii. The resulting queries are submitted to the federated catalog services; iii. The DAC performs a “smart” aggregation of the queries results and provides them back to the client. In the second case, the main discovery steps are: i. the user browses the federated semantic repositories and selects the concepts/terms-of-interest; ii. The DAC creates the set of geospatial queries based on the selected concepts/terms and submits them to the federated catalog services; iii. The DAC performs a “smart” aggregation of the queries results and provides them back to the client. A Graphical User Interface (GUI) was also developed for testing and interacting with the DAC. The entire brokering framework is deployed in the context of EuroGEOSS infrastructure and it is used in a couple of GEOSS AIP-3 use scenarios: the “e-Habitat Use Scenario” for the Biodiversity and Climate Change topic, and the “Comprehensive Drought Index Use Scenario” for Water/Drought topic
Improving 3d Spatial Queries Search: Newfangled Technique of Space Filling Curves in 3d City Modeling

NASA Astrophysics Data System (ADS)

Uznir, U.; Anton, F.; Suhaibah, A.; Rahman, A. A.; Mioc, D.

2013-09-01

The advantages of three dimensional (3D) city models can be seen in various applications including photogrammetry, urban and regional planning, computer games, etc.. They expand the visualization and analysis capabilities of Geographic Information Systems on cities, and they can be developed using web standards. However, these 3D city models consume much more storage compared to two dimensional (2D) spatial data. They involve extra geometrical and topological information together with semantic data. Without a proper spatial data clustering method and its corresponding spatial data access method, retrieving portions of and especially searching these 3D city models, will not be done optimally. Even though current developments are based on an open data model allotted by the Open Geospatial Consortium (OGC) called CityGML, its XML-based structure makes it challenging to cluster the 3D urban objects. In this research, we propose an opponent data constellation technique of space-filling curves (3D Hilbert curves) for 3D city model data representation. Unlike previous methods, that try to project 3D or n-dimensional data down to 2D or 3D using Principal Component Analysis (PCA) or Hilbert mappings, in this research, we extend the Hilbert space-filling curve to one higher dimension for 3D city model data implementations. The query performance was tested using a CityGML dataset of 1,000 building blocks and the results are presented in this paper. The advantages of implementing space-filling curves in 3D city modeling will improve data retrieval time by means of optimized 3D adjacency, nearest neighbor information and 3D indexing. The Hilbert mapping, which maps a subinterval of the [0, 1] interval to the corresponding portion of the d-dimensional Hilbert's curve, preserves the Lebesgue measure and is Lipschitz continuous. Depending on the applications, several alternatives are possible in order to cluster spatial data together in the third dimension compared to its clustering in 2D.
Unsupervised Spatial Event Detection in Targeted Domains with Applications to Civil Unrest Modeling

PubMed Central

Zhao, Liang; Chen, Feng; Dai, Jing; Hua, Ting; Lu, Chang-Tien; Ramakrishnan, Naren

2014-01-01

Twitter has become a popular data source as a surrogate for monitoring and detecting events. Targeted domains such as crime, election, and social unrest require the creation of algorithms capable of detecting events pertinent to these domains. Due to the unstructured language, short-length messages, dynamics, and heterogeneity typical of Twitter data streams, it is technically difficult and labor-intensive to develop and maintain supervised learning systems. We present a novel unsupervised approach for detecting spatial events in targeted domains and illustrate this approach using one specific domain, viz. civil unrest modeling. Given a targeted domain, we propose a dynamic query expansion algorithm to iteratively expand domain-related terms, and generate a tweet homogeneous graph. An anomaly identification method is utilized to detect spatial events over this graph by jointly maximizing local modularity and spatial scan statistics. Extensive experiments conducted in 10 Latin American countries demonstrate the effectiveness of the proposed approach. PMID:25350136
TES/Aura L2 Ozone (O3) Nadir V6 (TL2O3NS)

Atmospheric Science Data Center

2018-01-22

TES/Aura L2 Ozone (O3) Nadir (TL2O3NS) News: TES News Join ... Project Title: TES Discipline: Tropospheric Composition Version: V6 Level: L2 Platform: TES/Aura L2 Ozone Spatial Coverage: 5.3 x 8.5 km nadir ...
TES/Aura L2 Ozone (O3) Nadir V6 (TL2O3N)

Atmospheric Science Data Center

2018-01-18

TES/Aura L2 Ozone (O3) Nadir (TL2O3N) News: TES News Join ... Project Title: TES Discipline: Tropospheric Composition Version: V6 Level: L2 Platform: TES/Aura L2 Ozone Spatial Coverage: 5.3 x 8.5 km nadir ...
Using Search Engine Data as a Tool to Predict Syphilis.

PubMed

Young, Sean D; Torrone, Elizabeth A; Urata, John; Aral, Sevgi O

2018-07-01

Researchers have suggested that social media and online search data might be used to monitor and predict syphilis and other sexually transmitted diseases. Because people at risk for syphilis might seek sexual health and risk-related information on the internet, we investigated associations between internet state-level search query data (e.g., Google Trends) and reported weekly syphilis cases. We obtained weekly counts of reported primary and secondary syphilis for 50 states from 2012 to 2014 from the US Centers for Disease Control and Prevention. We collected weekly internet search query data regarding 25 risk-related keywords from 2012 to 2014 for 50 states using Google Trends. We joined 155 weeks of Google Trends data with 1-week lag to weekly syphilis data for a total of 7750 data points. Using the least absolute shrinkage and selection operator, we trained three linear mixed models on the first 10 weeks of each year. We validated models for 2012 and 2014 for the following 52 weeks and the 2014 model for the following 42 weeks. The models, consisting of different sets of keyword predictors for each year, accurately predicted 144 weeks of primary and secondary syphilis counts for each state, with an overall average R of 0.9 and overall average root mean squared error of 4.9. We used Google Trends search data from the prior week to predict cases of syphilis in the following weeks for each state. Further research could explore how search data could be integrated into public health monitoring systems.
Development of Intelligent Unmanned Systems

DTIC Science & Technology

2011-05-01

statistical analysis on the terrain map. The data points are stored in the corresponding cells of the Traversability Grid as a linked list of 3-D Cartesian...allowed for multiple configurations of specified data to be as flexible as possible. For example, when an object is being created the knowledge store ...library was also used for querying and storing spatial data . It provided many geometric abstractions necessary such as overlap and intersects
Image subregion querying using color correlograms

DOEpatents

Huang, Jing; Kumar, Shanmugasundaram Ravi; Mitra, Mandar; Zhu, Wei-Jing

2002-01-01

A color correlogram (10) is a representation expressing the spatial correlation of color and distance between pixels in a stored image. The color correlogram (10) may be used to distinguish objects in an image as well as between images in a plurality of images. By intersecting a color correlogram of an image object with correlograms of images to be searched, those images which contain the objects are identified by the intersection correlogram.
PRIVACYGRID: Supporting Anonymous Location Queries in Mobile Environments

DTIC Science & Technology

2007-01-01

continued price reduction of location tracking de- vices, location - based services (LBSs) are widely recognized as an important feature of the future computing... location - based services can operate completely anonymously, such as “when I pass a gas station, alert me with the unit price of the gas”. Others can...Anonymous Usage of Location - Based Services Through Spatial and Tempo- ral Cloaking. In Proceedings of the International Con- ference on Mobile
Mining and Querying Multimedia Data

DTIC Science & Technology

2011-09-29

able to capture more subtle spatial variations such as repetitiveness. Local feature descriptors such as SIFT [74] and SURF [12] have also been widely...empirically set to s = 90%, r = 50%, K = 20, where small variations lead to little perturbation of the output. The pseudo-code of the algorithm is...by constructing a three-layer graph based on clustering outputs, and executing a slight variation of random walk with restart algorithm. It provided
A data model and database for high-resolution pathology analytical image informatics.

PubMed

Wang, Fusheng; Kong, Jun; Cooper, Lee; Pan, Tony; Kurc, Tahsin; Chen, Wenjin; Sharma, Ashish; Niedermayr, Cristobal; Oh, Tae W; Brat, Daniel; Farris, Alton B; Foran, David J; Saltz, Joel

2011-01-01

The systematic analysis of imaged pathology specimens often results in a vast amount of morphological information at both the cellular and sub-cellular scales. While microscopy scanners and computerized analysis are capable of capturing and analyzing data rapidly, microscopy image data remain underutilized in research and clinical settings. One major obstacle which tends to reduce wider adoption of these new technologies throughout the clinical and scientific communities is the challenge of managing, querying, and integrating the vast amounts of data resulting from the analysis of large digital pathology datasets. This paper presents a data model, which addresses these challenges, and demonstrates its implementation in a relational database system. This paper describes a data model, referred to as Pathology Analytic Imaging Standards (PAIS), and a database implementation, which are designed to support the data management and query requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines on whole-slide images and tissue microarrays (TMAs). (1) Development of a data model capable of efficiently representing and storing virtual slide related image, annotation, markup, and feature information. (2) Development of a database, based on the data model, capable of supporting queries for data retrieval based on analysis and image metadata, queries for comparison of results from different analyses, and spatial queries on segmented regions, features, and classified objects. The work described in this paper is motivated by the challenges associated with characterization of micro-scale features for comparative and correlative analyses involving whole-slides tissue images and TMAs. Technologies for digitizing tissues have advanced significantly in the past decade. Slide scanners are capable of producing high-magnification, high-resolution images from whole slides and TMAs within several minutes. Hence, it is becoming increasingly feasible for basic, clinical, and translational research studies to produce thousands of whole-slide images. Systematic analysis of these large datasets requires efficient data management support for representing and indexing results from hundreds of interrelated analyses generating very large volumes of quantifications such as shape and texture and of classifications of the quantified features. We have designed a data model and a database to address the data management requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines. The data model represents virtual slide related image, annotation, markup and feature information. The database supports a wide range of metadata and spatial queries on images, annotations, markups, and features. We currently have three databases running on a Dell PowerEdge T410 server with CentOS 5.5 Linux operating system. The database server is IBM DB2 Enterprise Edition 9.7.2. The set of databases consists of 1) a TMA database containing image analysis results from 4740 cases of breast cancer, with 641 MB storage size; 2) an algorithm validation database, which stores markups and annotations from two segmentation algorithms and two parameter sets on 18 selected slides, with 66 GB storage size; and 3) an in silico brain tumor study database comprising results from 307 TCGA slides, with 365 GB storage size. The latter two databases also contain human-generated annotations and markups for regions and nuclei. Modeling and managing pathology image analysis results in a database provide immediate benefits on the value and usability of data in a research study. The database provides powerful query capabilities, which are otherwise difficult or cumbersome to support by other approaches such as programming languages. Standardized, semantic annotated data representation and interfaces also make it possible to more efficiently share image data and analysis results.
Research and development of web oriented remote sensing image publication system based on Servlet technique

NASA Astrophysics Data System (ADS)

Juanle, Wang; Shuang, Li; Yunqiang, Zhu

2005-10-01

According to the requirements of China National Scientific Data Sharing Program (NSDSP), the research and development of web oriented RS Image Publication System (RSIPS) is based on Java Servlet technique. The designing of RSIPS framework is composed of 3 tiers, which is Presentation Tier, Application Service Tier and Data Resource Tier. Presentation Tier provides user interface for data query, review and download. For the convenience of users, visual spatial query interface is included. Served as a middle tier, Application Service Tier controls all actions between users and databases. Data Resources Tier stores RS images in file and relationship databases. RSIPS is developed with cross platform programming based on Java Servlet tools, which is one of advanced techniques in J2EE architecture. RSIPS's prototype has been developed and applied in the geosciences clearinghouse practice which is among the experiment units of NSDSP in China.
Design and development of linked data from the National Map

USGS Publications Warehouse

Usery, E. Lynn; Varanka, Dalia E.

2012-01-01

The development of linked data on the World-Wide Web provides the opportunity for the U.S. Geological Survey (USGS) to supply its extensive volumes of geospatial data, information, and knowledge in a machine interpretable form and reach users and applications that heretofore have been unavailable. To pilot a process to take advantage of this opportunity, the USGS is developing an ontology for The National Map and converting selected data from nine research test areas to a Semantic Web format to support machine processing and linked data access. In a case study, the USGS has developed initial methods for legacy vector and raster formatted geometry, attributes, and spatial relationships to be accessed in a linked data environment maintaining the capability to generate graphic or image output from semantic queries. The description of an initial USGS approach to developing ontology, linked data, and initial query capability from The National Map databases is presented.
A 30-meter spatial database for the nation's forests

Treesearch

Raymond L. Czaplewski

2002-01-01

The FIA vision for remote sensing originated in 1992 with the Blue Ribbon Panel on FIA, and it has since evolved into an ambitious performance target for 2003. FIA is joining a consortium of Federal agencies to map the Nation's land cover. FIA field data will help produce a seamless, standardized, national geospatial database for forests at the scale of 30-m...
Study of Automatic Image Rectification and Registration of Scanned Historical Aerial Photographs

NASA Astrophysics Data System (ADS)

Chen, H. R.; Tseng, Y. H.

2016-06-01

Historical aerial photographs directly provide good evidences of past times. The Research Center for Humanities and Social Sciences (RCHSS) of Taiwan Academia Sinica has collected and scanned numerous historical maps and aerial images of Taiwan and China. Some maps or images have been geo-referenced manually, but most of historical aerial images have not been registered since there are no GPS or IMU data for orientation assisting in the past. In our research, we developed an automatic process of matching historical aerial images by SIFT (Scale Invariant Feature Transform) for handling the great quantity of images by computer vision. SIFT is one of the most popular method of image feature extracting and matching. This algorithm extracts extreme values in scale space into invariant image features, which are robust to changing in rotation scale, noise, and illumination. We also use RANSAC (Random sample consensus) to remove outliers, and obtain good conjugated points between photographs. Finally, we manually add control points for registration through least square adjustment based on collinear equation. In the future, we can use image feature points of more photographs to build control image database. Every new image will be treated as query image. If feature points of query image match the features in database, it means that the query image probably is overlapped with control images.With the updating of database, more and more query image can be matched and aligned automatically. Other research about multi-time period environmental changes can be investigated with those geo-referenced temporal spatial data.
Evaluation of Potential LSST Spatial Indexing Strategies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nikolaev, S; Abdulla, G; Matzke, R

2006-10-13

The LSST requirement for producing alerts in near real-time, and the fact that generating an alert depends on knowing the history of light variations for a given sky position, both imply that the clustering information for all detections is available at any time during the survey. Therefore, any data structure describing clustering of detections in LSST needs to be continuously updated, even as new detections are arriving from the pipeline. We call this use case ''incremental clustering'', to reflect this continuous updating of clustering information. This document describes the evaluation results for several potential LSST incremental clustering strategies, using: (1)more » Neighbors table and zone optimization to store spatial clusters (a.k.a. Jim Grey's, or SDSS algorithm); (2) MySQL built-in R-tree implementation; (3) an external spatial index library which supports a query interface.« less
A geodata warehouse: Using denormalisation techniques as a tool for delivering spatially enabled integrated geological information to geologists

NASA Astrophysics Data System (ADS)

Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham

2016-11-01

New requirements to understand geological properties in three dimensions have led to the development of PropBase, a data structure and delivery tools to deliver this. At the BGS, relational database management systems (RDBMS) has facilitated effective data management using normalised subject-based database designs with business rules in a centralised, vocabulary controlled, architecture. These have delivered effective data storage in a secure environment. However, isolated subject-oriented designs prevented efficient cross-domain querying of datasets. Additionally, the tools provided often did not enable effective data discovery as they struggled to resolve the complex underlying normalised structures providing poor data access speeds. Users developed bespoke access tools to structures they did not fully understand sometimes delivering them incorrect results. Therefore, BGS has developed PropBase, a generic denormalised data structure within an RDBMS to store property data, to facilitate rapid and standardised data discovery and access, incorporating 2D and 3D physical and chemical property data, with associated metadata. This includes scripts to populate and synchronise the layer with its data sources through structured input and transcription standards. A core component of the architecture includes, an optimised query object, to deliver geoscience information from a structure equivalent to a data warehouse. This enables optimised query performance to deliver data in multiple standardised formats using a web discovery tool. Semantic interoperability is enforced through vocabularies combined from all data sources facilitating searching of related terms. PropBase holds 28.1 million spatially enabled property data points from 10 source databases incorporating over 50 property data types with a vocabulary set that includes 557 property terms. By enabling property data searches across multiple databases PropBase has facilitated new scientific research, previously considered impractical. PropBase is easily extended to incorporate 4D data (time series) and is providing a baseline for new "big data" monitoring projects.
Paradise: A Parallel Information System for EOSDIS

NASA Technical Reports Server (NTRS)

DeWitt, David

1996-01-01

The Paradise project was begun-in 1993 in order to explore the application of the parallel and object-oriented database system technology developed as a part of the Gamma, Exodus. and Shore projects to the design and development of a scaleable, geo-spatial database system for storing both massive spatial and satellite image data sets. Paradise is based on an object-relational data model. In addition to the standard attribute types such as integers, floats, strings and time, Paradise also provides a set of and multimedia data types, designed to facilitate the storage and querying of complex spatial and multimedia data sets. An individual tuple can contain any combination of this rich set of data types. For example, in the EOSDIS context, a tuple might mix terrain and map data for an area along with the latest satellite weather photo of the area. The use of a geo-spatial metaphor simplifies the task of fusing disparate forms of data from multiple data sources including text, image, map, and video data sets.

Effective 3-D surface modeling for geographic information systems

NASA Astrophysics Data System (ADS)

Yüksek, K.; Alparslan, M.; Mendi, E.

2013-11-01

In this work, we propose a dynamic, flexible and interactive urban digital terrain platform (DTP) with spatial data and query processing capabilities of Geographic Information Systems (GIS), multimedia database functionality and graphical modeling infrastructure. A new data element, called Geo-Node, which stores image, spatial data and 3-D CAD objects is developed using an efficient data structure. The system effectively handles data transfer of Geo-Nodes between main memory and secondary storage with an optimized Directional Replacement Policy (DRP) based buffer management scheme. Polyhedron structures are used in Digital Surface Modeling (DSM) and smoothing process is performed by interpolation. The experimental results show that our framework achieves high performance and works effectively with urban scenes independent from the amount of spatial data and image size. The proposed platform may contribute to the development of various applications such as Web GIS systems based on 3-D graphics standards (e.g. X3-D and VRML) and services which integrate multi-dimensional spatial information and satellite/aerial imagery.
Effective 3-D surface modeling for geographic information systems

NASA Astrophysics Data System (ADS)

Yüksek, K.; Alparslan, M.; Mendi, E.

2016-01-01

In this work, we propose a dynamic, flexible and interactive urban digital terrain platform with spatial data and query processing capabilities of geographic information systems, multimedia database functionality and graphical modeling infrastructure. A new data element, called Geo-Node, which stores image, spatial data and 3-D CAD objects is developed using an efficient data structure. The system effectively handles data transfer of Geo-Nodes between main memory and secondary storage with an optimized directional replacement policy (DRP) based buffer management scheme. Polyhedron structures are used in digital surface modeling and smoothing process is performed by interpolation. The experimental results show that our framework achieves high performance and works effectively with urban scenes independent from the amount of spatial data and image size. The proposed platform may contribute to the development of various applications such as Web GIS systems based on 3-D graphics standards (e.g., X3-D and VRML) and services which integrate multi-dimensional spatial information and satellite/aerial imagery.
Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB.

PubMed

Xu, Qifang; Dunbrack, Roland L

2012-11-01

Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM-HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly.
Cold plasma welding of polyaniline nanofibers with enhanced electrical and mechanical properties.

PubMed

Ye, Dong; Yu, Yao; Liu, Lin; Lu, Xinpei; Wu, Yue

2015-12-11

Joining conducting polymer (CP) nanofibers into an interconnected porous network can result in good mechanical and electrical contacts between nanofibers that can be beneficial for the high performance of CP-based devices. Here, we demonstrate the cold welding of polyaniline (PAni) nanofiber loose ends with cold plasma. The room-temperature and atmospheric-pressure helium micro-plasma jet launches highly charged ion bullets at a PAni nanofiber target with high precision and the highly charged ion bullet selectively induces field emission at the sharp nanofiber loose ends. This technique joins nanofiber tips without altering the morphology of the film and protonation thus leading to significantly enhanced electrical and mechanical properties. In addition, this technique has high spatial resolution and is able to selectively weld and dope regions of nanofiber film with promising novel device applications.
Cold plasma welding of polyaniline nanofibers with enhanced electrical and mechanical properties

NASA Astrophysics Data System (ADS)

Ye, Dong; Yu, Yao; Liu, Lin; Lu, Xinpei; Wu, Yue

2015-12-01

Joining conducting polymer (CP) nanofibers into an interconnected porous network can result in good mechanical and electrical contacts between nanofibers that can be beneficial for the high performance of CP-based devices. Here, we demonstrate the cold welding of polyaniline (PAni) nanofiber loose ends with cold plasma. The room-temperature and atmospheric-pressure helium micro-plasma jet launches highly charged ion bullets at a PAni nanofiber target with high precision and the highly charged ion bullet selectively induces field emission at the sharp nanofiber loose ends. This technique joins nanofiber tips without altering the morphology of the film and protonation thus leading to significantly enhanced electrical and mechanical properties. In addition, this technique has high spatial resolution and is able to selectively weld and dope regions of nanofiber film with promising novel device applications.
Methods and apparatus for microwave tissue welding for wound closure

NASA Technical Reports Server (NTRS)

Ngo, Phong H. (Inventor); Dusl, John R. (Inventor); Arndt, G. Dickey (Inventor); Phan, Chau T. (Inventor); Byerly, Diane L. (Inventor); Sognier, Marguerite A. (Inventor); Carl, James R. (Inventor)

2013-01-01

Methods and apparatus for joining biological tissue together are provided. In at least one specific embodiment, a method for joining biological tissue together can include applying a biological solder on a wound. A barrier layer can be disposed on the biological solder. An antenna can be located in proximate spatial relationship to the barrier layer. An impedance of the antenna can be matched to an impedance of the wound. Microwaves from a signal generator can be transmitted through the antenna to weld two or more biological tissue pieces of the wound together. A power of the microwaves can be adjusted by a control circuit disposed between the antenna and the signal generator. The heating profile within the tissue may be adjusted and controlled by the placement of metallic microspheres in or around the wound.
Receptor signaling clusters in the immune synapse(in eng)

DOE PAGES

Dustin, Michael L.; Groves, Jay T.

2012-02-23

Signaling processes between various immune cells involve large-scale spatial reorganization of receptors and signaling molecules within the cell-cell junction. These structures, now collectively referred to as immune synapses, interleave physical and mechanical processes with the cascades of chemical reactions that constitute signal transduction systems. Molecular level clustering, spatial exclusion, and long-range directed transport are all emerging as key regulatory mechanisms. The study of these processes is drawing researchers from physical sciences to join the effort and represents a rapidly growing branch of biophysical chemistry. Furthermore, recent advances in physical and quantitative analyses of signaling within the immune synapses are reviewedmore » here.« less
Receptor signaling clusters in the immune synapse (in eng)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dustin, Michael L.; Groves, Jay T.

2012-02-23

Signaling processes between various immune cells involve large-scale spatial reorganization of receptors and signaling molecules within the cell-cell junction. These structures, now collectively referred to as immune synapses, interleave physical and mechanical processes with the cascades of chemical reactions that constitute signal transduction systems. Molecular level clustering, spatial exclusion, and long-range directed transport are all emerging as key regulatory mechanisms. The study of these processes is drawing researchers from physical sciences to join the effort and represents a rapidly growing branch of biophysical chemistry. Furthermore, recent advances in physical and quantitative analyses of signaling within the immune synapses are reviewedmore » here.« less
Data Rods: High Speed, Time-Series Analysis of Massive Cryospheric Data Sets Using Object-Oriented Database Methods

NASA Astrophysics Data System (ADS)

Liang, Y.; Gallaher, D. W.; Grant, G.; Lv, Q.

2011-12-01

Change over time, is the central driver of climate change detection. The goal is to diagnose the underlying causes, and make projections into the future. In an effort to optimize this process we have developed the Data Rod model, an object-oriented approach that provides the ability to query grid cell changes and their relationships to neighboring grid cells through time. The time series data is organized in time-centric structures called "data rods." A single data rod can be pictured as the multi-spectral data history at one grid cell: a vertical column of data through time. This resolves the long-standing problem of managing time-series data and opens new possibilities for temporal data analysis. This structure enables rapid time- centric analysis at any grid cell across multiple sensors and satellite platforms. Collections of data rods can be spatially and temporally filtered, statistically analyzed, and aggregated for use with pattern matching algorithms. Likewise, individual image pixels can be extracted to generate multi-spectral imagery at any spatial and temporal location. The Data Rods project has created a series of prototype databases to store and analyze massive datasets containing multi-modality remote sensing data. Using object-oriented technology, this method overcomes the operational limitations of traditional relational databases. To demonstrate the speed and efficiency of time-centric analysis using the Data Rods model, we have developed a sea ice detection algorithm. This application determines the concentration of sea ice in a small spatial region across a long temporal window. If performed using traditional analytical techniques, this task would typically require extensive data downloads and spatial filtering. Using Data Rods databases, the exact spatio-temporal data set is immediately available No extraneous data is downloaded, and all selected data querying occurs transparently on the server side. Moreover, fundamental statistical calculations such as running averages are easily implemented against the time-centric columns of data.
Impact of Short-Range Forces on Defect Production from High-Energy Collisions

DOE PAGES

Stoller, R. E.; Tamm, A.; Béland, L. K.; ...

2016-04-25

Primary radiation damage formation in solid materials typically involves collisions between atoms that have up to a few hundred keV of kinetic energy. The distance between two colliding atoms can approach 0.05 nm during these collisions. At such small atomic separations, force fields fitted to equilibrium properties tend to significantly underestimate the potential energy of the colliding dimer. To enable molecular dynamics simulations of high-energy collisions, it is common practice to use a screened Coulomb force field to describe the interactions and to smoothly join this to the equilibrium force field at a suitable interatomic spacing. But, there is nomore » accepted standard method for choosing the parameters used in the joining process, and our results prove that defect production is sensitive to how the force fields are linked. A new procedure is presented that involves the use of ab initio calculations to determine the magnitude and spatial dependence of the pair interactions at intermediate distances, along with systematic criteria for choosing the joining parameters. Results are presented for the case of nickel, which demonstrate the use and validity of the procedure.« less
Adding intelligence to scientific data management

NASA Technical Reports Server (NTRS)

Campbell, William J.; Short, Nicholas M., Jr.; Treinish, Lloyd A.

1989-01-01

NASA plans to solve some of the problems of handling large-scale scientific data bases by turning to artificial intelligence (AI) are discussed. The growth of the information glut and the ways that AI can help alleviate the resulting problems are reviewed. The employment of the Intelligent User Interface prototype, where the user will generate his own natural language query with the assistance of the system, is examined. Spatial data management, scientific data visualization, and data fusion are discussed.
Asking better questions: How presentation formats influence information search.

PubMed

Wu, Charley M; Meder, Björn; Filimon, Flavia; Nelson, Jonathan D

2017-08-01

While the influence of presentation formats have been widely studied in Bayesian reasoning tasks, we present the first systematic investigation of how presentation formats influence information search decisions. Four experiments were conducted across different probabilistic environments, where subjects (N = 2,858) chose between 2 possible search queries, each with binary probabilistic outcomes, with the goal of maximizing classification accuracy. We studied 14 different numerical and visual formats for presenting information about the search environment, constructed across 6 design features that have been prominently related to improvements in Bayesian reasoning accuracy (natural frequencies, posteriors, complement, spatial extent, countability, and part-to-whole information). The posterior variants of the icon array and bar graph formats led to the highest proportion of correct responses, and were substantially better than the standard probability format. Results suggest that presenting information in terms of posterior probabilities and visualizing natural frequencies using spatial extent (a perceptual feature) were especially helpful in guiding search decisions, although environments with a mixture of probabilistic and certain outcomes were challenging across all formats. Subjects who made more accurate probability judgments did not perform better on the search task, suggesting that simple decision heuristics may be used to make search decisions without explicitly applying Bayesian inference to compute probabilities. We propose a new take-the-difference (TTD) heuristic that identifies the accuracy-maximizing query without explicit computation of posterior probabilities. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
A Rule-Based Spatial Reasoning Approach for OpenStreetMap Data Quality Enrichment; Case Study of Routing and Navigation

PubMed Central

2017-01-01

Finding relevant geospatial information is increasingly critical because of the growing volume of geospatial data available within the emerging “Big Data” era. Users are expecting that the availability of massive datasets will create more opportunities to uncover hidden information and answer more complex queries. This is especially the case with routing and navigation services where the ability to retrieve points of interest and landmarks make the routing service personalized, precise, and relevant. In this paper, we propose a new geospatial information approach that enables the retrieval of implicit information, i.e., geospatial entities that do not exist explicitly in the available source. We present an information broker that uses a rule-based spatial reasoning algorithm to detect topological relations. The information broker is embedded into a framework where annotations and mappings between OpenStreetMap data attributes and external resources, such as taxonomies, support the enrichment of queries to improve the ability of the system to retrieve information. Our method is tested with two case studies that leads to enriching the completeness of OpenStreetMap data with footway crossing points-of-interests as well as building entrances for routing and navigation purposes. It is concluded that the proposed approach can uncover implicit entities and contribute to extract required information from the existing datasets. PMID:29088125
A Columnar Storage Strategy with Spatiotemporal Index for Big Climate Data

NASA Astrophysics Data System (ADS)

Hu, F.; Bowen, M. K.; Li, Z.; Schnase, J. L.; Duffy, D.; Lee, T. J.; Yang, C. P.

2015-12-01

Large collections of observational, reanalysis, and climate model output data may grow to as large as a 100 PB in the coming years, so climate dataset is in the Big Data domain, and various distributed computing frameworks have been utilized to address the challenges by big climate data analysis. However, due to the binary data format (NetCDF, HDF) with high spatial and temporal dimensions, the computing frameworks in Apache Hadoop ecosystem are not originally suited for big climate data. In order to make the computing frameworks in Hadoop ecosystem directly support big climate data, we propose a columnar storage format with spatiotemporal index to store climate data, which will support any project in the Apache Hadoop ecosystem (e.g. MapReduce, Spark, Hive, Impala). With this approach, the climate data will be transferred into binary Parquet data format, a columnar storage format, and spatial and temporal index will be built and attached into the end of Parquet files to enable real-time data query. Then such climate data in Parquet data format could be available to any computing frameworks in Hadoop ecosystem. The proposed approach is evaluated using the NASA Modern-Era Retrospective Analysis for Research and Applications (MERRA) climate reanalysis dataset. Experimental results show that this approach could efficiently overcome the gap between the big climate data and the distributed computing frameworks, and the spatiotemporal index could significantly accelerate data querying and processing.
DCMS: A data analytics and management system for molecular simulation.

PubMed

Kumar, Anand; Grupcev, Vladimir; Berrada, Meryem; Fogarty, Joseph C; Tu, Yi-Cheng; Zhu, Xingquan; Pandit, Sagar A; Xia, Yuni

Molecular Simulation (MS) is a powerful tool for studying physical/chemical features of large systems and has seen applications in many scientific and engineering domains. During the simulation process, the experiments generate a very large number of atoms and intend to observe their spatial and temporal relationships for scientific analysis. The sheer data volumes and their intensive interactions impose significant challenges for data accessing, managing, and analysis. To date, existing MS software systems fall short on storage and handling of MS data, mainly because of the missing of a platform to support applications that involve intensive data access and analytical process. In this paper, we present the database-centric molecular simulation (DCMS) system our team developed in the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface ( i.e. , SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is to handle the analytical queries that are often compute-intensive. For that, we developed novel indexing and query processing strategies (including algorithms running on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS. Index structures are generated to store analysis results that may be interesting to other users, so that the results are readily available without duplicating the analysis. We have developed a prototype of DCMS based on the PostgreSQL system and experiments using real MS data and workload show that DCMS significantly outperforms existing MS software systems. We also used it as a platform to test other data management issues such as security and compression.
A case study in adaptable and reusable infrastructure at the Keck Observatory Archive: VO interfaces, moving targets, and more

NASA Astrophysics Data System (ADS)

Berriman, G. Bruce; Cohen, Richard W.; Colson, Andrew; Gelino, Christopher R.; Good, John C.; Kong, Mihseh; Laity, Anastasia C.; Mader, Jeffrey A.; Swain, Melanie A.; Tran, Hien D.; Wang, Shin-Ywan

2016-08-01

The Keck Observatory Archive (KOA) (https://koa.ipac.caltech.edu) curates all observations acquired at the W. M. Keck Observatory (WMKO) since it began operations in 1994, including data from eight active instruments and two decommissioned instruments. The archive is a collaboration between WMKO and the NASA Exoplanet Science Institute (NExScI). Since its inception in 2004, the science information system used at KOA has adopted an architectural approach that emphasizes software re-use and adaptability. This paper describes how KOA is currently leveraging and extending open source software components to develop new services and to support delivery of a complete set of instrument metadata, which will enable more sophisticated and extensive queries than currently possible. In August 2015, KOA deployed a program interface to discover public data from all instruments equipped with an imaging mode. The interface complies with version 2 of the Simple Imaging Access Protocol (SIAP), under development by the International Virtual Observatory Alliance (IVOA), which defines a standard mechanism for discovering images through spatial queries. The heart of the KOA service is an R-tree-based, database-indexing mechanism prototyped by the Virtual Astronomical Observatory (VAO) and further developed by the Montage Image Mosaic project, designed to provide fast access to large imaging data sets as a first step in creating wide-area image mosaics (such as mosaics of subsets of the 4.7 million images of the SDSS DR9 release). The KOA service uses the results of the spatial R-tree search to create an SQLite data database for further relational filtering. The service uses a JSON configuration file to describe the association between instrument parameters and the service query parameters, and to make it applicable beyond the Keck instruments. The images generated at the Keck telescope usually do not encode the image footprints as WCS fields in the FITS file headers. Because SIAP searches are spatial, much of the effort in developing the program interface involved processing the instrument and telescope parameters to understand how accurately we can derive the WCS information for each instrument. This knowledge is now being fed back into the KOA databases as part of a program to include complete metadata information for all imaging observations. The R-tree program was itself extended to support temporal (in addition to spatial) indexing, in response to requests from the planetary science community for a search engine to discover observations of Solar System objects. With this 3D-indexing scheme, the service performs very fast time and spatial matches between the target ephemerides, obtained from the JPL SPICE service. Our experiments indicate these matches can be more than 100 times faster than when separating temporal and spatial searches. Images of the tracks of the moving targets, overlaid with the image footprints, are computed with a new command-line visualization tool, mViewer, released with the Montage distribution. The service is currently in test and will be released in late summer 2016.
Geoscience information integration and visualization research of Shandong Province, China based on ArcGIS engine

NASA Astrophysics Data System (ADS)

Xu, Mingzhu; Gao, Zhiqiang; Ning, Jicai

2014-10-01

To improve the access efficiency of geoscience data, efficient data model and storage solutions should be used. Geoscience data is usually classified by format or coordinate system in existing storage solutions. When data is large, it is not conducive to search the geographic features. In this study, a geographical information integration system of Shandong province, China was developed based on the technology of ArcGIS Engine, .NET, and SQL Server. It uses Geodatabase spatial data model and ArcSDE to organize and store spatial and attribute data and establishes geoscience database of Shangdong. Seven function modules were designed: map browse, database and subject management, layer control, map query, spatial analysis and map symbolization. The system's characteristics of can be browsed and managed by geoscience subjects make the system convenient for geographic researchers and decision-making departments to use the data.
Study of Earthquake Disaster Prediction System of Langfang city Based on GIS

NASA Astrophysics Data System (ADS)

Huang, Meng; Zhang, Dian; Li, Pan; Zhang, YunHui; Zhang, RuoFei

2017-07-01

In this paper, according to the status of China’s need to improve the ability of earthquake disaster prevention, this paper puts forward the implementation plan of earthquake disaster prediction system of Langfang city based on GIS. Based on the GIS spatial database, coordinate transformation technology, GIS spatial analysis technology and PHP development technology, the seismic damage factor algorithm is used to predict the damage of the city under different intensity earthquake disaster conditions. The earthquake disaster prediction system of Langfang city is based on the B / S system architecture. Degree and spatial distribution and two-dimensional visualization display, comprehensive query analysis and efficient auxiliary decision-making function to determine the weak earthquake in the city and rapid warning. The system has realized the transformation of the city’s earthquake disaster reduction work from static planning to dynamic management, and improved the city’s earthquake and disaster prevention capability.
A biomedical information system for retrieval and manipulation of NHANES data.

PubMed

Mukherjee, Sukrit; Martins, David; Norris, Keith C; Jenders, Robert A

2013-01-01

The retrieval and manipulation of data from large public databases like the U.S. National Health and Nutrition Examination Survey (NHANES) may require sophisticated statistical software and significant expertise that may be unavailable in the university setting. In response, we have developed the Data Retrieval And Manipulation System (DReAMS), an automated information system to handle all processes of data extraction and cleaning and then joining different subsets to produce analysis-ready output. The system is a browser-based data warehouse application in which the input data from flat files or operational systems are aggregated in a structured way so that the desired data can be read, recoded, queried and extracted efficiently. The current pilot implementation of the system provides access to a limited amount of NHANES database. We plan to increase the amount of data available through the system in the near future and to extend the techniques to other large databases from CDU archive with a current holding of about 53 databases.
galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.

PubMed

Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M

2004-06-12

The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes the reference database to be of extensive sequence sampling. This is often not the case, restraining the correctness of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and consequently sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are found at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se

Intraoperative virtual brain counseling

NASA Astrophysics Data System (ADS)

Jiang, Zhaowei; Grosky, William I.; Zamorano, Lucia J.; Muzik, Otto; Diaz, Fernando

1997-06-01

Our objective is to offer online real-tim e intelligent guidance to the neurosurgeon. Different from traditional image-guidance technologies that offer intra-operative visualization of medical images or atlas images, virtual brain counseling goes one step further. It can distinguish related brain structures and provide information about them intra-operatively. Virtual brain counseling is the foundation for surgical planing optimization and on-line surgical reference. It can provide a warning system that alerts the neurosurgeon if the chosen trajectory will pass through eloquent brain areas. In order to fulfill this objective, tracking techniques are involved for intra- operativity. Most importantly, a 3D virtual brian environment, different from traditional 3D digitized atlases, is an object-oriented model of the brain that stores information about different brain structures together with their elated information. An object-oriented hierarchical hyper-voxel space (HHVS) is introduced to integrate anatomical and functional structures. Spatial queries based on position of interest, line segment of interest, and volume of interest are introduced in this paper. The virtual brain environment is integrated with existing surgical pre-planning and intra-operative tracking systems to provide information for planning optimization and on-line surgical guidance. The neurosurgeon is alerted automatically if the planned treatment affects any critical structures. Architectures such as HHVS and algorithms, such as spatial querying, normalizing, and warping are presented in the paper. A prototype has shown that the virtual brain is intuitive in its hierarchical 3D appearance. It also showed that HHVS, as the key structure for virtual brain counseling, efficiently integrates multi-scale brain structures based on their spatial relationships.This is a promising development for optimization of treatment plans and online surgical intelligent guidance.
Facilitator and Participant Use of Facebook in a Community-Based Intervention for Parents: The InFANT Extend Program.

PubMed

Downing, Katherine L; Campbell, Karen J; van der Pligt, Paige; Hesketh, Kylie D

2017-12-01

Social networking sites such as Facebook afford new opportunities for behavior-change interventions. Although often used as a recruitment tool, few studies have reported the use of Facebook as an intervention component to facilitate communication between researchers and participants. The aim of this study was to examine facilitator and participant use of a Facebook component of a community-based intervention for parents. First-time parent groups participating in the intervention arm of the extended Infant Feeding, Activity and Nutrition Trial (InFANT Extend) Program were invited to join their own private Facebook group. Facilitators mediated the Facebook groups, using them to share resources with parents, arrange group sessions, and respond to parent queries. Parents completed process evaluation questionnaires reporting on the usefulness of the Facebook groups. A total of 150 parents (from 27 first-time parent groups) joined their private Facebook group. There were a mean of 36.9 (standard deviation 11.1) posts/group, with the majority being facilitator posts. Facilitator administration posts (e.g., arranging upcoming group sessions) had the highest average comments (4.0), followed by participant health/behavior questions (3.5). The majority of participants reported that they enjoyed being a part of their Facebook group; however, the frequency of logging on to their groups' page declined over the 36 months of the trial, as did their perceived usefulness of the group. Facebook appears to be a useful administrative tool in this context. Parents enjoyed being part of their Facebook group, but their reported use of and engagement with Facebook declined over time.
A prototype feature system for feature retrieval using relationships

USGS Publications Warehouse

Choi, J.; Usery, E.L.

2009-01-01

Using a feature data model, geographic phenomena can be represented effectively by integrating space, theme, and time. This paper extends and implements a feature data model that supports query and visualization of geographic features using their non-spatial and temporal relationships. A prototype feature-oriented geographic information system (FOGIS) is then developed and storage of features named Feature Database is designed. Buildings from the U.S. Marine Corps Base, Camp Lejeune, North Carolina and subways in Chicago, Illinois are used to test the developed system. The results of the applications show the strength of the feature data model and the developed system 'FOGIS' when they utilize non-spatial and temporal relationships in order to retrieve and visualize individual features.
The white matter query language: a novel approach for describing human white matter anatomy

PubMed Central

Makris, Nikos; Rathi, Yogesh; Shenton, Martha; Kikinis, Ron; Kubicki, Marek; Westin, Carl-Fredrik

2016-01-01

We have developed a novel method to describe human white matter anatomy using an approach that is both intuitive and simple to use, and which automatically extracts white matter tracts from diffusion MRI volumes. Further, our method simplifies the quantification and statistical analysis of white matter tracts on large diffusion MRI databases. This work reflects the careful syntactical definition of major white matter fiber tracts in the human brain based on a neuroanatomist’s expert knowledge. The framework is based on a novel query language with a near-to-English textual syntax. This query language makes it possible to construct a dictionary of anatomical definitions that describe white matter tracts. The definitions include adjacent gray and white matter regions, and rules for spatial relations. This novel method makes it possible to automatically label white matter anatomy across subjects. After describing this method, we provide an example of its implementation where we encode anatomical knowledge in human white matter for ten association and 15 projection tracts per hemisphere, along with seven commissural tracts. Importantly, this novel method is comparable in accuracy to manual labeling. Finally, we present results applying this method to create a white matter atlas from 77 healthy subjects, and we use this atlas in a small proof-of-concept study to detect changes in association tracts that characterize schizophrenia. PMID:26754839
The white matter query language: a novel approach for describing human white matter anatomy.

PubMed

Wassermann, Demian; Makris, Nikos; Rathi, Yogesh; Shenton, Martha; Kikinis, Ron; Kubicki, Marek; Westin, Carl-Fredrik

2016-12-01

We have developed a novel method to describe human white matter anatomy using an approach that is both intuitive and simple to use, and which automatically extracts white matter tracts from diffusion MRI volumes. Further, our method simplifies the quantification and statistical analysis of white matter tracts on large diffusion MRI databases. This work reflects the careful syntactical definition of major white matter fiber tracts in the human brain based on a neuroanatomist's expert knowledge. The framework is based on a novel query language with a near-to-English textual syntax. This query language makes it possible to construct a dictionary of anatomical definitions that describe white matter tracts. The definitions include adjacent gray and white matter regions, and rules for spatial relations. This novel method makes it possible to automatically label white matter anatomy across subjects. After describing this method, we provide an example of its implementation where we encode anatomical knowledge in human white matter for ten association and 15 projection tracts per hemisphere, along with seven commissural tracts. Importantly, this novel method is comparable in accuracy to manual labeling. Finally, we present results applying this method to create a white matter atlas from 77 healthy subjects, and we use this atlas in a small proof-of-concept study to detect changes in association tracts that characterize schizophrenia.
Nanocubes for real-time exploration of spatiotemporal datasets.

PubMed

Lins, Lauro; Klosowski, James T; Scheidegger, Carlos

2013-12-01

Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop's main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.
A Fine-Grained and Privacy-Preserving Query Scheme for Fog Computing-Enhanced Location-Based Service

PubMed Central

Yin, Fan; Tang, Xiaohu

2017-01-01

Location-based services (LBS), as one of the most popular location-awareness applications, has been further developed to achieve low-latency with the assistance of fog computing. However, privacy issues remain a research challenge in the context of fog computing. Therefore, in this paper, we present a fine-grained and privacy-preserving query scheme for fog computing-enhanced location-based services, hereafter referred to as FGPQ. In particular, mobile users can obtain the fine-grained searching result satisfying not only the given spatial range but also the searching content. Detailed privacy analysis shows that our proposed scheme indeed achieves the privacy preservation for the LBS provider and mobile users. In addition, extensive performance analyses and experiments demonstrate that the FGPQ scheme can significantly reduce computational and communication overheads and ensure the low-latency, which outperforms existing state-of-the art schemes. Hence, our proposed scheme is more suitable for real-time LBS searching. PMID:28696395
A Fine-Grained and Privacy-Preserving Query Scheme for Fog Computing-Enhanced Location-Based Service.

PubMed

Yang, Xue; Yin, Fan; Tang, Xiaohu

2017-07-11

Location-based services (LBS), as one of the most popular location-awareness applications, has been further developed to achieve low-latency with the assistance of fog computing. However, privacy issues remain a research challenge in the context of fog computing. Therefore, in this paper, we present a fine-grained and privacy-preserving query scheme for fog computing-enhanced location-based services, hereafter referred to as FGPQ. In particular, mobile users can obtain the fine-grained searching result satisfying not only the given spatial range but also the searching content. Detailed privacy analysis shows that our proposed scheme indeed achieves the privacy preservation for the LBS provider and mobile users. In addition, extensive performance analyses and experiments demonstrate that the FGPQ scheme can significantly reduce computational and communication overheads and ensure the low-latency, which outperforms existing state-of-the art schemes. Hence, our proposed scheme is more suitable for real-time LBS searching.
Age-related differences in the use of spatial and categorical relationships in a visuo-spatial working memory task.

PubMed

Dai, Ruizhi; Thomas, Ayanna K; Taylor, Holly A

2018-01-30

Research examining object identity and location processing in visuo-spatial working memory (VSWM) has yielded inconsistent results on whether age differences exist in VSWM. The present study investigated whether these inconsistencies may stem from age-related differences in VSWM sub-processes, and whether processing of component VSWM information can be facilitated. In two experiments, younger and older adults studied 5 × 5 grids containing five objects in separate locations. In a continuous recognition paradigm, participants were tested on memory for object identity, location, or identity and location information combined. Spatial and categorical relationships were manipulated within grids to provide trial-level facilitation. In Experiment 1, randomizing trial types (location, identity, combination) assured that participants could not predict the information that would be queried. In Experiment 2, blocking trials by type encouraged strategic processing. Thus, we manipulated the nature of the task through object categorical relationship and spatial organization, and trial blocking. Our findings support age-related declines in VSWM. Additionally, grid organizations (categorical and spatial relationships), and trial blocking differentially affected younger and older adults. Younger adults used spatial organizations more effectively whereas older adults demonstrated an association bias. Our finding also suggests that older adults may be less efficient than younger adults in strategically engaging information processing.
Entanglement negativity after a local quantum quench in conformal field theories

NASA Astrophysics Data System (ADS)

Wen, Xueda; Chang, Po-Yao; Ryu, Shinsei

2015-08-01

We study the time evolution of the entanglement negativity after a local quantum quench in (1 + 1)-dimensional conformal field theories (CFTs), which we introduce by suddenly joining two initially decoupled CFTs at their end points. We calculate the negativity evolution for both adjacent intervals and disjoint intervals explicitly. For two adjacent intervals, the entanglement negativity grows logarithmically in time right after the quench. After developing a plateau-like feature, the entanglement negativity drops to the ground-state value. For the case of two spatially separated intervals, a light-cone behavior is observed in the negativity evolution; in addition, a long-range entanglement, which is independent of the distance between two intervals, can be created. Our results agree with the heuristic picture that quasiparticles, which carry entanglement, are emitted from the joining point and propagate freely through the system. Our analytical results are confirmed by numerical calculations based on a critical harmonic chain.
Using STOQS and stoqstoolbox for in situ Measurement Data Access in Matlab

NASA Astrophysics Data System (ADS)

López-Castejón, F.; Schlining, B.; McCann, M. P.

2012-12-01

This poster presents the stoqstoolbox, an extension to Matlab that simplifies the loading of in situ measurement data directly from STOQS databases. STOQS (Spatial Temporal Oceanographic Query System) is a geospatial database tool designed to provide efficient access to data following the CF-NetCDF Discrete Samples Geometries convention. Data are loaded from CF-NetCDF files into a STOQS database where indexes are created on depth, spatial coordinates and other parameters, e.g. platform type. STOQS provides consistent, simple and efficient methods to query for data. For example, we can request all measurements with a standard_name of sea_water_temperature between two times and from between two depths. Data access is simpler because the data are retrieved by parameter irrespective of platform or mission file names. Access is more efficient because data are retrieved via the index on depth and only the requested data are retrieved from the database and transferred into the Matlab workspace. Applications in the stoqstoolbox query the STOQS database via an HTTP REST application programming interface; they follow the Data Access Object pattern, enabling highly customizable query construction. Data are loaded into Matlab structures that clearly indicate latitude, longitude, depth, measurement data value, and platform name. The stoqstoolbox is designed to be used in concert with other tools, such as nctoolbox, which can load data from any OPeNDAP data source. With these two toolboxes a user can easily work with in situ and other gridded data, such as from numerical models and remote sensing platforms. In order to show the capability of stoqstoolbox we will show an example of model validation using data collected during the May-June 2012 field experiment conducted by the Monterey Bay Aquarium Research Institute (MBARI) in Monterey Bay, California. The data are available from the STOQS server at http://odss.mbari.org/canon/stoqs_may2012/query/. Over 14 million data points of 18 parameters from 6 platforms measured over a 3-week period are available on this server. The model used for comparison is the Regional Ocean Modeling System developed by Jet Propulsion Laboratory for the Monterey Bay. The model output are loaded into Matlab using nctoolbox from the JPL server at http://ourocean.jpl.nasa.gov:8080/thredds/dodsC/MBNowcast. Model validation with in situ measurements can be difficult because of different file formats and because data may be spread across individual data systems for each platform. With stoqstoolbox the researcher must know only the URL of the STOQS server and the OPeNDAP URL of the model output. With selected depth and time constraints a user's Matlab program searches for all in situ measurements available for the same time, depth and variable of the model. STOQS and stoqstoolbox are open source software projects supported by MBARI and the David and Lucile Packard foundation. For more information please see http://code.google.com/p/stoqs.
Scanning proton microprobe applied to analysis of individual aerosol particles from Amazon Basin

NASA Astrophysics Data System (ADS)

Gerab, Fábio; Artaxo, Paulo; Swietlicki, Erik; Pallon, Jan

1998-03-01

The development of the Scanning Proton Microprobe (SPM) offers a new possibility for individual aerosol particle studies. The SPM joins Particle Induced X-ray Emission (PIXE) elemental analysis qualities with micrometric spatial resolution. In this work the Lund University SPM facility was used for elemental characterization of individual aerosol particles emitted to the atmosphere in the Brazilian Amazon Basin, during gold mining activities by the so-called "gold shops".
Apparatus for melt growth of crystalline semiconductor sheets

DOEpatents

Ciszek, Theodore F.; Hurd, Jeffery L.

1986-01-01

An economical method is presented for forming thin sheets of crystalline silicon suitable for use in a photovoltaic conversion cell by solidification from the liquid phase. Two spatially separated, generally coplanar filaments wettable by liquid silicon and joined together at the end by a bridge member are immersed in a silicon melt and then slowly withdrawn from the melt so that a silicon crystal is grown between the edge of the bridge and the filaments.
Query Auto-Completion Based on Word2vec Semantic Similarity

NASA Astrophysics Data System (ADS)

Shao, Taihua; Chen, Honghui; Chen, Wanyu

2018-04-01

Query auto-completion (QAC) is the first step of information retrieval, which helps users formulate the entire query after inputting only a few prefixes. Regarding the models of QAC, the traditional method ignores the contribution from the semantic relevance between queries. However, similar queries always express extremely similar search intention. In this paper, we propose a hybrid model FS-QAC based on query semantic similarity as well as the query frequency. We choose word2vec method to measure the semantic similarity between intended queries and pre-submitted queries. By combining both features, our experiments show that FS-QAC model improves the performance when predicting the user’s query intention and helping formulate the right query. Our experimental results show that the optimal hybrid model contributes to a 7.54% improvement in terms of MRR against a state-of-the-art baseline using the public AOL query logs.
EquiX-A Search and Query Language for XML.

ERIC Educational Resources Information Center

Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander

2002-01-01

Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)
[Human body meridian spatial decision support system for clinical treatment and teaching of acupuncture and moxibustion].

PubMed

Wu, Dehua

2016-01-01

The spatial position and distribution of human body meridian are expressed limitedly in the decision support system (DSS) of acupuncture and moxibustion at present, which leads to the failure to give the effective quantitative analysis on the spatial range and the difficulty for the decision-maker to provide a realistic spatial decision environment. Focusing on the limit spatial expression in DSS of acupuncture and moxibustion, it was proposed that on the basis of the geographic information system, in association of DSS technology, the design idea was developed on the human body meridian spatial DSS. With the 4-layer service-oriented architecture adopted, the data center integrated development platform was taken as the system development environment. The hierarchical organization was done for the spatial data of human body meridian via the directory tree. The structured query language (SQL) server was used to achieve the unified management of spatial data and attribute data. The technologies of architecture, configuration and plug-in development model were integrated to achieve the data inquiry, buffer analysis and program evaluation of the human body meridian spatial DSS. The research results show that the human body meridian spatial DSS could reflect realistically the spatial characteristics of the spatial position and distribution of human body meridian and met the constantly changeable demand of users. It has the powerful spatial analysis function and assists with the scientific decision in clinical treatment and teaching of acupuncture and moxibustion. It is the new attempt to the informatization research of human body meridian.
GenoQuery: a new querying module for functional annotation in a genomic warehouse

PubMed Central

Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine

2008-01-01

Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improving functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731
Easy-to-use phylogenetic analysis system for hepatitis B virus infection.

PubMed

Sugiyama, Masaya; Inui, Ayano; Shin-I, Tadasu; Komatsu, Haruki; Mukaide, Motokazu; Masaki, Naohiko; Murata, Kazumoto; Ito, Kiyoaki; Nakanishi, Makoto; Fujisawa, Tomoo; Mizokami, Masashi

2011-10-01

The molecular phylogenetic analysis has been broadly applied to clinical and virological study. However, the appropriate settings and application of calculation parameters are difficult for non-specialists of molecular genetics. In the present study, the phylogenetic analysis tool was developed for the easy determination of genotypes and transmission route. A total of 23 patients of 10 families infected with hepatitis B virus (HBV) were enrolled and expected to undergo intrafamilial transmission. The extracted HBV DNA were amplified and sequenced in a region of the S gene. The software to automatically classify query sequence was constructed and installed on the Hepatitis Virus Database (HVDB). Reference sequences were retrieved from HVDB, which contained major genotypes from A to H. Multiple-alignments using CLUSTAL W were performed before the genetic distance matrix was calculated with the six-parameter method. The phylogenetic tree was output by the neighbor-joining method. User interface using WWW-browser was also developed for intuitive control. This system was named as the easy-to-use phylogenetic analysis system (E-PAS). Twenty-three sera of 10 families were analyzed to evaluate E-PAS. The queries obtained from nine families were genotype C and were located in one cluster per family. However, one patient of a family was classified into the cluster different from her family, suggesting that E-PAS detected the sample distinct from that of her family on the transmission route. The E-PAS to output phylogenetic tree was developed since requisite material was sequence data only. E-PAS could expand to determine HBV genotypes as well as transmission routes. © 2011 The Japan Society of Hepatology.
Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB

PubMed Central

Dunbrack, Roland L.

2012-01-01

Motivation: Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. Results: We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM–HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. Availability: The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly. Contact: Roland.Dunbracks@fccc.edu PMID:22942020
Spatial aspects of building and population exposure data and their implications for global earthquake exposure modeling

USGS Publications Warehouse

Dell’Acqua, F.; Gamba, P.; Jaiswal, K.

2012-01-01

This paper discusses spatial aspects of the global exposure dataset and mapping needs for earthquake risk assessment. We discuss this in the context of development of a Global Exposure Database for the Global Earthquake Model (GED4GEM), which requires compilation of a multi-scale inventory of assets at risk, for example, buildings, populations, and economic exposure. After defining the relevant spatial and geographic scales of interest, different procedures are proposed to disaggregate coarse-resolution data, to map them, and if necessary to infer missing data by using proxies. We discuss the advantages and limitations of these methodologies and detail the potentials of utilizing remote-sensing data. The latter is used especially to homogenize an existing coarser dataset and, where possible, replace it with detailed information extracted from remote sensing using the built-up indicators for different environments. Present research shows that the spatial aspects of earthquake risk computation are tightly connected with the availability of datasets of the resolution necessary for producing sufficiently detailed exposure. The global exposure database designed by the GED4GEM project is able to manage datasets and queries of multiple spatial scales.

EMAGE mouse embryo spatial gene expression database: 2010 update

PubMed Central

Richardson, Lorna; Venkataraman, Shanmugasundaram; Stevenson, Peter; Yang, Yiya; Burton, Nicholas; Rao, Jianguo; Fisher, Malcolm; Baldock, Richard A.; Davidson, Duncan R.; Christiansen, Jeffrey H.

2010-01-01

EMAGE (http://www.emouseatlas.org/emage) is a freely available online database of in situ gene expression patterns in the developing mouse embryo. Gene expression domains from raw images are extracted and integrated spatially into a set of standard 3D virtual mouse embryos at different stages of development, which allows data interrogation by spatial methods. An anatomy ontology is also used to describe sites of expression, which allows data to be queried using text-based methods. Here, we describe recent enhancements to EMAGE including: the release of a completely re-designed website, which offers integration of many different search functions in HTML web pages, improved user feedback and the ability to find similar expression patterns at the click of a button; back-end refactoring from an object oriented to relational architecture, allowing associated SQL access; and the provision of further access by standard formatted URLs and a Java API. We have also increased data coverage by sourcing from a greater selection of journals and developed automated methods for spatial data annotation that are being applied to spatially incorporate the genome-wide (∼19 000 gene) ‘EURExpress’ dataset into EMAGE. PMID:19767607
SPARK: Adapting Keyword Query to Semantic Search

NASA Astrophysics Data System (ADS)

Zhou, Qi; Wang, Chong; Xiong, Miao; Wang, Haofen; Yu, Yong

Semantic search promises to provide more accurate result than present-day keyword search. However, progress with semantic search has been delayed due to the complexity of its query languages. In this paper, we explore a novel approach of adapting keywords to querying the semantic web: the approach automatically translates keyword queries into formal logic queries so that end users can use familiar keywords to perform semantic search. A prototype system named 'SPARK' has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In the experiment, SPARK achieved an encouraging translation result.
Searching for rare diseases in PubMed: a blind comparison of Orphanet expert query and query based on terminological knowledge.

PubMed

Griffon, N; Schuers, M; Dhombres, F; Merabti, T; Kerdelhué, G; Rollin, L; Darmoni, S J

2016-08-02

Despite international initiatives like Orphanet, it remains difficult to find up-to-date information about rare diseases. The aim of this study is to propose an exhaustive set of queries for PubMed based on terminological knowledge and to evaluate it versus the queries based on expertise provided by the most frequently used resource in Europe: Orphanet. Four rare disease terminologies (MeSH, OMIM, HPO and HRDO) were manually mapped to each other permitting the automatic creation of expended terminological queries for rare diseases. For 30 rare diseases, 30 citations retrieved by Orphanet expert query and/or query based on terminological knowledge were assessed for relevance by two independent reviewers unaware of the query's origin. An adjudication procedure was used to resolve any discrepancy. Precision, relative recall and F-measure were all computed. For each Orphanet rare disease (n = 8982), there was a corresponding terminological query, in contrast with only 2284 queries provided by Orphanet. Only 553 citations were evaluated due to queries with 0 or only a few hits. There were no significant differences between the Orpha query and terminological query in terms of precision, respectively 0.61 vs 0.52 (p = 0.13). Nevertheless, terminological queries retrieved more citations more often than Orpha queries (0.57 vs. 0.33; p = 0.01). Interestingly, Orpha queries seemed to retrieve older citations than terminological queries (p < 0.0001). The terminological queries proposed in this study are now currently available for all rare diseases. They may be a useful tool for both precision or recall oriented literature search.
Asymmetric coding of categorical spatial relations in both language and vision.

PubMed

Roth, J C; Franconeri, S L

2012-01-01

Describing certain types of spatial relationships between a pair of objects requires that the objects are assigned different "roles" in the relation, e.g., "A is above B" is different than "B is above A." This asymmetric representation places one object in the "target" or "figure" role and the other in the "reference" or "ground" role. Here we provide evidence that this asymmetry may be present not just in spatial language, but also in perceptual representations. More specifically, we describe a model of visual spatial relationship judgment where the designation of the target object within such a spatial relationship is guided by the location of the "spotlight" of attention. To demonstrate the existence of this perceptual asymmetry, we cued attention to one object within a pair by briefly previewing it, and showed that participants were faster to verify the depicted relation when that object was the linguistic target. Experiment 1 demonstrated this effect for left-right relations, and Experiment 2 for above-below relations. These results join several other types of demonstrations in suggesting that perceptual representations of some spatial relations may be asymmetrically coded, and further suggest that the location of selective attention may serve as the mechanism that guides this asymmetry.
Spatial Resolution Requirements for Traffic-Related Air Pollutant Exposure Evaluations

PubMed Central

Batterman, Stuart; Chambliss, Sarah; Isakov, Vlad

2014-01-01

Vehicle emissions represent one of the most important air pollution sources in most urban areas, and elevated concentrations of pollutants found near major roads have been associated with many adverse health impacts. To understand these impacts, exposure estimates should reflect the spatial and temporal patterns observed for traffic-related air pollutants. This paper evaluates the spatial resolution and zonal systems required to estimate accurately intraurban and near-road exposures of traffic-related air pollutants. The analyses use the detailed information assembled for a large (800 km2) area centered on Detroit, Michigan, USA. Concentrations of nitrogen oxides (NOx) due to vehicle emissions were estimated using hourly traffic volumes and speeds on 9,700 links representing all but minor roads in the city, the MOVES2010 emission model, the RLINE dispersion model, local meteorological data, a temporal resolution of 1 hr, and spatial resolution as low as 10 m. Model estimates were joined with the corresponding shape files to estimate residential exposures for 700,000 individuals at property parcel, census block, census tract, and ZIP code levels. We evaluate joining methods, the spatial resolution needed to meet specific error criteria, and the extent of exposure misclassification. To portray traffic-related air pollutant exposure, raster or inverse distance-weighted interpolations are superior to nearest neighbor approaches, and interpolations between receptors and points of interest should not exceed about 40 m near major roads, and 100 m at larger distances. For census tracts and ZIP codes, average exposures are overestimated since few individuals live very near major roads, the range of concentrations is compressed, most exposures are misclassified, and high concentrations near roads are entirely omitted. While smaller zones improve performance considerably, even block-level data can misclassify many individuals. To estimate exposures and impacts of traffic-related pollutants accurately, data should be geocoded or estimated at the most-resolved spatial level; census tract and larger zones have little if any ability to represent intraurban variation in traffic-related air pollutant concentrations. These results are based on one of the most comprehensive intraurban modeling studies in the literature and results are robust. Recommendations address the value of dispersion models to portray spatial and temporal variation of air pollutants in epidemiology and other studies; techniques to improve accuracy and reduce the computational burden in urban scale modeling; the necessary spatial resolution for health surveillance, demographic, and pollution data; and the consequences of low resolution data in terms of exposure misclassification. PMID:25132794
Applying Semantic Web Concepts to Support Net-Centric Warfare Using the Tactical Assessment Markup Language (TAML)

DTIC Science & Technology

2006-06-01

SPARQL SPARQL Protocol and RDF Query Language SQL Structured Query Language SUMO Suggested Upper Merged Ontology SW... Query optimization algorithms are implemented in the Pellet reasoner in order to ensure querying a knowledge base is efficient . These algorithms...memory as a treelike structure in order for the data to be queried . XML Query (XQuery) is the standard language used when querying XML
Implementation of Quantum Private Queries Using Nuclear Magnetic Resonance

NASA Astrophysics Data System (ADS)

Wang, Chuan; Hao, Liang; Zhao, Lian-Jie

2011-08-01

We present a modified protocol for the realization of a quantum private query process on a classical database. Using one-qubit query and CNOT operation, the query process can be realized in a two-mode database. In the query process, the data privacy is preserved as the sender would not reveal any information about the database besides her query information, and the database provider cannot retain any information about the query. We implement the quantum private query protocol in a nuclear magnetic resonance system. The density matrix of the memory registers are constructed.
Semantic Document Library: A Virtual Research Environment for Documents, Data and Workflows Sharing

NASA Astrophysics Data System (ADS)

Kotwani, K.; Liu, Y.; Myers, J.; Futrelle, J.

2008-12-01

The Semantic Document Library (SDL) was driven by use cases from the environmental observatory communities and is designed to provide conventional document repository features of uploading, downloading, editing and versioning of documents as well as value adding features of tagging, querying, sharing, annotating, ranking, provenance, social networking and geo-spatial mapping services. It allows users to organize a catalogue of watershed observation data, model output, workflows, as well publications and documents related to the same watershed study through the tagging capability. Users can tag all relevant materials using the same watershed name and find all of them easily later using this tag. The underpinning semantic content repository can store materials from other cyberenvironments such as workflow or simulation tools and SDL provides an effective interface to query and organize materials from various sources. Advanced features of the SDL allow users to visualize the provenance of the materials such as the source and how the output data is derived. Other novel features include visualizing all geo-referenced materials on a geospatial map. SDL as a component of a cyberenvironment portal (the NCSA Cybercollaboratory) has goal of efficient management of information and relationships between published artifacts (Validated models, vetted data, workflows, annotations, best practices, reviews and papers) produced from raw research artifacts (data, notes, plans etc.) through agents (people, sensors etc.). Tremendous scientific potential of artifacts is achieved through mechanisms of sharing, reuse and collaboration - empowering scientists to spread their knowledge and protocols and to benefit from the knowledge of others. SDL successfully implements web 2.0 technologies and design patterns along with semantic content management approach that enables use of multiple ontologies and dynamic evolution (e.g. folksonomies) of terminology. Scientific documents involved with many interconnected entities (artifacts or agents) are represented as RDF triples using semantic content repository middleware Tupelo in one or many data/metadata RDF stores. Queries to the RDF enables discovery of relations among data, process and people, digging out valuable aspects, making recommendations to users, such as what tools are typically used to answer certain kinds of questions or with certain types of dataset. This innovative concept brings out coherent information about entities from four different perspectives of the social context (Who-human relations and interactions), the casual context (Why - provenance and history), the geo-spatial context (Where - location or spatially referenced information) and the conceptual context (What - domain specific relations, ontologies etc.).
The EarthServer Geology Service: web coverage services for geosciences

NASA Astrophysics Data System (ADS)

Laxton, John; Sen, Marcus; Passmore, James

2014-05-01

The EarthServer FP7 project is implementing web coverage services using the OGC WCS and WCPS standards for a range of earth science domains: cryospheric; atmospheric; oceanographic; planetary; and geological. BGS is providing the geological service (http://earthserver.bgs.ac.uk/). Geoscience has used remote sensed data from satellites and planes for some considerable time, but other areas of geosciences are less familiar with the use of coverage data. This is rapidly changing with the development of new sensor networks and the move from geological maps to geological spatial models. The BGS geology service is designed initially to address two coverage data use cases and three levels of data access restriction. Databases of remote sensed data are typically very large and commonly held offline, making it time-consuming for users to assess and then download data. The service is designed to allow the spatial selection, editing and display of Landsat and aerial photographic imagery, including band selection and contrast stretching. This enables users to rapidly view data, assess is usefulness for their purposes, and then enhance and download it if it is suitable. At present the service contains six band Landsat 7 (Blue, Green, Red, NIR 1, NIR 2, MIR) and three band false colour aerial photography (NIR, green, blue), totalling around 1Tb. Increasingly 3D spatial models are being produced in place of traditional geological maps. Models make explicit spatial information implicit on maps and thus are seen as a better way of delivering geosciences information to non-geoscientists. However web delivery of models, including the provision of suitable visualisation clients, has proved more challenging than delivering maps. The EarthServer geology service is delivering 35 surfaces as coverages, comprising the modelled superficial deposits of the Glasgow area. These can be viewed using a 3D web client developed in the EarthServer project by Fraunhofer. As well as remote sensed imagery and 3D models, the geology service is also delivering DTM coverages which can be viewed in the 3D client in conjunction with both imagery and models. The service is accessible through a web GUI which allows the imagery to be viewed against a range of background maps and DTMs, and in the 3D client; spatial selection to be carried out graphically; the results of image enhancement to be displayed; and selected data to be downloaded. The GUI also provides access to the Glasgow model in the 3D client, as well as tutorial material. In the final year of the project it is intended to increase the volume of data to 20Tb and enhance the WCPS processing, including depth and thickness querying of 3D models. We have also investigated the use of GeoSciML, developed to describe and interchange the information on geological maps, to describe model surface coverages. EarthServer is developing a combined WCPS and xQuery query language, and we will investigate applying this to the GeoSciML described surfaces to answer questions such as 'find all units with a predominant sand lithology within 25m of the surface'.
A study of medical and health queries to web search engines.

PubMed

Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk

2004-03-01

This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i). comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii). comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii). medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i). a small percentage of web queries are medical or health related, (ii). the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii). over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggests some implications for the use of the general web search engines when seeking medical/health information.
Monitoring Moving Queries inside a Safe Region

PubMed Central

Al-Khalidi, Haidar; Taniar, David; Alamri, Sultan

2014-01-01

With mobile moving range queries, there is a need to recalculate the relevant surrounding objects of interest whenever the query moves. Therefore, monitoring the moving query is very costly. The safe region is one method that has been proposed to minimise the communication and computation cost of continuously monitoring a moving range query. Inside the safe region the set of objects of interest to the query do not change; thus there is no need to update the query while it is inside its safe region. However, when the query leaves its safe region the mobile device has to reevaluate the query, necessitating communication with the server. Knowing when and where the mobile device will leave a safe region is widely known as a difficult problem. To solve this problem, we propose a novel method to monitor the position of the query over time using a linear function based on the direction of the query obtained by periodic monitoring of its position. Periodic monitoring ensures that the query is aware of its location all the time. This method reduces the costs associated with communications in client-server architecture. Computational results show that our method is successful in handling moving query patterns. PMID:24696652
RDF-GL: A SPARQL-Based Graphical Query Language for RDF

NASA Astrophysics Data System (ADS)

Hogenboom, Frederik; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay

This chapter presents RDF-GL, a graphical query language (GQL) for RDF. The GQL is based on the textual query language SPARQL and mainly focuses on SPARQL SELECT queries. The advantage of a GQL over textual query languages is that complexity is hidden through the use of graphical symbols. RDF-GL is supported by a Java-based editor, SPARQLinG, which is presented as well. The editor does not only allow for RDF-GL query creation, but also converts RDF-GL queries to SPARQL queries and is able to subsequently execute these. Experiments show that using the GQL in combination with the editor makes RDF querying more accessible for end users.
Method and apparatus for melt growth of crystalline semiconductor sheets

DOEpatents

Ciszek, T.F.; Hurd, J.L.

1981-02-25

An economical method is presented for forming thin sheets of crystalline silicon suitable for use in a photovoltaic conversion cell by solidification from the liquid phase. Two spatially separated, generally coplanar filaments wettable by liquid silicon and joined together at the end by a bridge member are immersed in a silicon melt and then slowly withdrawn from the melt so that a silicon crystal is grown between the edge of the bridge and the filaments.
Usage and administration manual for a geodatabase compendium of water-resources data-Rio Grande Basin from the Rio Arriba-Sandoval County line, New Mexico, to Presidio, Texas, 1889-2009

USGS Publications Warehouse

Burley, Thomas E.

2011-01-01

The U.S. Geological Survey, in cooperation with the New Mexico Interstate Stream Commission, developed a geodatabase compendium (hereinafter referred to as the 'geodatabase') of available water-resources data for the reach of the Rio Grande from Rio Arriba-Sandoval County line, New Mexico, to Presidio, Texas. Since 1889, a wealth of water-resources data has been collected in the Rio Grande Basin from Rio Arriba-Sandoval County line, New Mexico, to Presidio, Texas, for a variety of purposes. Collecting agencies, researchers, and organizations have included the U.S. Geological Survey, Bureau of Reclamation, International Boundary and Water Commission, State agencies, irrigation districts, municipal water utilities, universities, and other entities. About 1,750 data records were recently (2010) evaluated to enhance their usability by compiling them into a single geospatial relational database (geodatabase). This report is intended as a user's manual and administration guide for the geodatabase. All data available, including water quality, water level, and discharge data (both instantaneous and daily) from January 1, 1889, through December 17, 2009, were compiled for the study area. A flexible and efficient geodatabase design was used, enhancing the ability of the geodatabase to handle data from diverse sources and helping to ensure sustainability of the geodatabase with long-term maintenance. Geodatabase tables include daily data values, site locations and information, sample event information, and parameters, as well as data sources and collecting agencies. The end products of this effort are a comprehensive water-resources geodatabase that enables the visualization of primary sampling sites for surface discharges, groundwater elevations, and water-quality and associated data for the study area. In addition, repeatable data processing scripts, Structured Query Language queries for loading prepared data sources, and a detailed process for refreshing all data in the compendium have been developed. The geodatabase functionality allows users to explore spatial characteristics of the data, conduct spatial analyses, and pose questions to the geodatabase in the form of queries. Users can also customize and extend the geodatabase, combine it with other databases, or use the geodatabase design for other water-resources applications.
elevatr: Access Elevation Data from Various APIs | Science ...

EPA Pesticide Factsheets

Several web services are available that provide access to elevation data. This package provides access to several of those services and returns elevation data either as a SpatialPointsDataFrame from point elevation services or as a raster object from raster elevation services. Currently, the package supports access to the Mapzen Elevation Service, Mapzen Terrain Service, and the USGS Elevation Point Query Service. The R language for statistical computing is increasingly used for spatial data analysis . This R package, elevatr, is in response to this and provides access to elevation data from various sources directly in R. The impact of `elevatr` is that it will 1) facilitate spatial analysis in R by providing access to foundational dataset for many types of analyses (e.g. hydrology, limnology) 2) open up a new set of users and uses for APIs widely used outside of R, and 3) provide an excellent example federal open source development as promoted by the Federal Source Code Policy (https://sourcecode.cio.gov/).
Cumulative query method for influenza surveillance using search engine data.

PubMed

Seo, Dong-Woo; Jo, Min-Woo; Sohn, Chang Hwan; Shin, Soo-Yong; Lee, JaeHo; Yu, Maengsoo; Kim, Won Young; Lim, Kyoung Soo; Lee, Sang-Il

2014-12-16

Internet search queries have become an important data source in syndromic surveillance system. However, there is currently no syndromic surveillance system using Internet search query data in South Korea. The objective of this study was to examine correlations between our cumulative query method and national influenza surveillance data. Our study was based on the local search engine, Daum (approximately 25% market share), and influenza-like illness (ILI) data from the Korea Centers for Disease Control and Prevention. A quota sampling survey was conducted with 200 participants to obtain popular queries. We divided the study period into two sets: Set 1 (the 2009/10 epidemiological year for development set 1 and 2010/11 for validation set 1) and Set 2 (2010/11 for development Set 2 and 2011/12 for validation Set 2). Pearson's correlation coefficients were calculated between the Daum data and the ILI data for the development set. We selected the combined queries for which the correlation coefficients were .7 or higher and listed them in descending order. Then, we created a cumulative query method n representing the number of cumulative combined queries in descending order of the correlation coefficient. In validation set 1, 13 cumulative query methods were applied, and 8 had higher correlation coefficients (min=.916, max=.943) than that of the highest single combined query. Further, 11 of 13 cumulative query methods had an r value of ≥.7, but 4 of 13 combined queries had an r value of ≥.7. In validation set 2, 8 of 15 cumulative query methods showed higher correlation coefficients (min=.975, max=.987) than that of the highest single combined query. All 15 cumulative query methods had an r value of ≥.7, but 6 of 15 combined queries had an r value of ≥.7. Cumulative query method showed relatively higher correlation with national influenza surveillance data than combined queries in the development and validation set.
Data management with a landslide inventory of the Franconian Alb (Germany) using a spatial database and GIS tools

NASA Astrophysics Data System (ADS)

Bemm, Stefan; Sandmeier, Christine; Wilde, Martina; Jaeger, Daniel; Schwindt, Daniel; Terhorst, Birgit

2014-05-01

The area of the Swabian-Franconian cuesta landscape (Southern Germany) is highly prone to landslides. This was apparent in the late spring of 2013, when numerous landslides occurred as a consequence of heavy and long-lasting rainfalls. The specific climatic situation caused numerous damages with serious impact on settlements and infrastructure. Knowledge on spatial distribution of landslides, processes and characteristics are important to evaluate the potential risk that can occur from mass movements in those areas. In the frame of two projects about 400 landslides were mapped and detailed data sets were compiled during years 2011 to 2014 at the Franconian Alb. The studies are related to the project "Slope stability and hazard zones in the northern Bavarian cuesta" (DFG, German Research Foundation) as well as to the LfU (The Bavarian Environment Agency) within the project "Georisks and climate change - hazard indication map Jura". The central goal of the present study is to create a spatial database for landslides. The database should contain all fundamental parameters to characterize the mass movements and should provide the potential for secure data storage and data management, as well as statistical evaluations. The spatial database was created with PostgreSQL, an object-relational database management system and PostGIS, a spatial database extender for PostgreSQL, which provides the possibility to store spatial and geographic objects and to connect to several GIS applications, like GRASS GIS, SAGA GIS, QGIS and GDAL, a geospatial library (Obe et al. 2011). Database access for querying, importing, and exporting spatial and non-spatial data is ensured by using GUI or non-GUI connections. The database allows the use of procedural languages for writing advanced functions in the R, Python or Perl programming languages. It is possible to work directly with the (spatial) data entirety of the database in R. The inventory of the database includes (amongst others), informations on location, landslide types and causes, geomorphological positions, geometries, hazards and damages, as well as assessments related to the activity of landslides. Furthermore, there are stored spatial objects, which represent the components of a landslide, in particular the scarps and the accumulation areas. Besides, waterways, map sheets, contour lines, detailed infrastructure data, digital elevation models, aspect and slope data are included. Examples of spatial queries to the database are intersections of raster and vector data for calculating values for slope gradients or aspects of landslide areas and for creating multiple, overlaying sections for the comparison of slopes, as well as distances to the infrastructure or to the next receiving drainage. Furthermore, getting informations on landslide magnitudes, distribution and clustering, as well as potential correlations concerning geomorphological or geological conditions. The data management concept in this study can be implemented for any academic, public or private use, because it is independent from any obligatory licenses. The created spatial database offers a platform for interdisciplinary research and socio-economic questions, as well as for landslide susceptibility and hazard indication mapping. Obe, R.O., Hsu, L.S. 2011. PostGIS in action. - pp 492, Manning Publications, Stamford
A Query Integrator and Manager for the Query Web

PubMed Central

Brinkley, James F.; Detwiler, Landon T.

2012-01-01

We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. PMID:22531831
Integrated Web-Based Immersive Exploration of the Coordinated Canyon Experiment Data using Open Source STOQS Software

NASA Astrophysics Data System (ADS)

McCann, M. P.; Gwiazda, R.; O'Reilly, T. C.; Maier, K. L.; Lundsten, E. M.; Parsons, D. R.; Paull, C. K.

2017-12-01

The Coordinated Canyon Experiment (CCE) in Monterey Submarine Canyon has produced a wealth of oceanographic measurements whose analysis will improve understanding of turbidity current processes. Exploration of this data set, consisting of over 60 parameters from 15 platforms, is facilitated by using the open source Spatial Temporal Oceanographic Query System (STOQS) software (https://github.com/stoqs/stoqs). The Monterey Bay Aquarium Research Institute (MBARI) originally developed STOQS to help manage and visualize upper water column oceanographic measurements, but the generality of its data model permits effective use for any kind of spatial/temporal measurement data. STOQS consists of a PostgreSQL database and server-side Python/Django software; the client-side is jQuery JavaScript supporting AJAX requests to update a single page web application. The User Interface (UI) is optimized to provide a quick overview of data in spatial and temporal dimensions, as well as in parameter, platform, and data value space. A user may zoom into any feature of interest and select it, initiating a filter operation that updates the UI with an overview of all the data in the new filtered selection. When details are desired, radio buttons and checkboxes are selected to generate a number of different types of visualizations. These include color-filled temporal section and line plots, parameter-parameter plots, 2D map plots, and interactive 3D spatial visualizations. The Extensible 3D (X3D) standard and X3DOM JavaScript library provide the technology for presenting animated 3D data directly within the web browser. Most of the oceanographic measurements from the CCE (e.g. mooring mounted ADCP and CTD data) are easily visualized using established methods. However, unified integration and multiparameter display of several concurrently deployed sensors across a network of platforms is a challenge we hope to solve. Moreover, STOQS also allows display of data from a new instrument - the Benthic Event Detector (BED). The BED records 50Hz samples of orientation and acceleration when it moves. These data are converted to the CF-NetCDF format and then loaded into a STOQS database. Using the Spatial-3D view a user may interact with a virtual playback of BED motions, giving new insight into submarine canyon sediment density flows.
Large Survey Database: A Distributed Framework for Storage and Analysis of Large Datasets

NASA Astrophysics Data System (ADS)

Juric, Mario

2011-01-01

The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching and querying of large survey catalogs (>10^9 rows, >1 TB). The primary driver behind its development is the analysis of Pan-STARRS PS1 data. It is specifically optimized for fast queries and parallel sweeps of positionally and temporally indexed datasets. It transparently scales to more than >10^2 nodes, and can be made to function in "shared nothing" architectures. An LSD database consists of a set of vertically and horizontally partitioned tables, physically stored as compressed HDF5 files. Vertically, we partition the tables into groups of related columns ('column groups'), storing together logically related data (e.g., astrometry, photometry). Horizontally, the tables are partitioned into partially overlapping ``cells'' by position in space (lon, lat) and time (t). This organization allows for fast lookups based on spatial and temporal coordinates, as well as data and task distribution. The design was inspired by the success of Google BigTable (Chang et al., 2006). Our programming model is a pipelined extension of MapReduce (Dean and Ghemawat, 2004). An SQL-like query language is used to access data. For complex tasks, map-reduce ``kernels'' that operate on query results on a per-cell basis can be written, with the framework taking care of scheduling and execution. The combination leverages users' familiarity with SQL, while offering a fully distributed computing environment. LSD adds little overhead compared to direct Python file I/O. In tests, we sweeped through 1.1 Grows of PanSTARRS+SDSS data (220GB) less than 15 minutes on a dual CPU machine. In a cluster environment, we achieved bandwidths of 17Gbits/sec (I/O limited). Based on current experience, we believe LSD should scale to be useful for analysis and storage of LSST-scale datasets. It can be downloaded from http://mwscience.net/lsd.

Using Generalized Annotated Programs to Solve Social Network Diffusion Optimization Problems

DTIC Science & Technology

2013-01-01

as follows: —Let kall be the k value for the SNDOP-ALL query and for each SNDOP query i, let ki be the k for that query. For each query i, set ki... kall − 1. —Number each element of vi ∈ V such that gI(vi) and V C(vi) are true. For the ith SNDOP query, let vi be the corresponding element of V —Let...vertices of S. PROOF. We set up |V | SNDOP-queries as follows: —Let kall be the k value for the SNDOP-ALL query and and for each SNDOP-query i, let ki be
A web-based data-querying tool based on ontology-driven methodology and flowchart-based model.

PubMed

Ping, Xiao-Ou; Chung, Yufang; Tseng, Yi-Ju; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei

2013-10-08

Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, "degree of liver damage," "degree of liver damage when applying a mutually exclusive setting," and "treatments for liver cancer") was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks.
Comparative Analysis of Online Health Queries Originating From Personal Computers and Smart Devices on a Consumer Health Information Portal

PubMed Central

Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen

2014-01-01

Background The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. Objective The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Methods Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic’s consumer health information website. We performed analyses on “Queries with considering repetition counts (QwR)” and “Queries without considering repetition counts (QwoR)”. The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Results Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are “Symptoms” (1 in 3 search queries), “Causes”, and “Treatments & Drugs”. The distribution of search queries for different health categories differs with the device used for the search. Health queries tend to be longer and more specific than general search queries. Health queries from SDs are longer and have slightly fewer spelling mistakes than those from PCs. Users specify words related to women and children more often than that of men and any other age group. Most of the health queries are formulated using keywords; the second-most common are wh- and yes/no questions. Users ask more health questions using SDs than PCs. Almost all health queries have at least one noun and health queries from SDs are more descriptive than those from PCs. Conclusions This study is a large-scale comparative analysis of health search queries to understand the effects of device type (PCs vs SDs) used on OHISB. The study indicates that the device used for online health information search plays an important role in shaping how health information searches by consumers and patients are executed. PMID:25000537
Comparative analysis of online health queries originating from personal computers and smart devices on a consumer health information portal.

PubMed

Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen; Pathak, Jyotishman

2014-07-04

The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic's consumer health information website. We performed analyses on "Queries with considering repetition counts (QwR)" and "Queries without considering repetition counts (QwoR)". The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are "Symptoms" (1 in 3 search queries), "Causes", and "Treatments & Drugs". The distribution of search queries for different health categories differs with the device used for the search. Health queries tend to be longer and more specific than general search queries. Health queries from SDs are longer and have slightly fewer spelling mistakes than those from PCs. Users specify words related to women and children more often than that of men and any other age group. Most of the health queries are formulated using keywords; the second-most common are wh- and yes/no questions. Users ask more health questions using SDs than PCs. Almost all health queries have at least one noun and health queries from SDs are more descriptive than those from PCs. This study is a large-scale comparative analysis of health search queries to understand the effects of device type (PCs vs. SDs) used on OHISB. The study indicates that the device used for online health information search plays an important role in shaping how health information searches by consumers and patients are executed.
Wikipedia: a key tool for global public health promotion.

PubMed

Heilman, James M; Kemmann, Eckhard; Bonert, Michael; Chatterjee, Anwesh; Ragar, Brent; Beards, Graham M; Iberri, David J; Harvey, Matthew; Thomas, Brendan; Stomp, Wouter; Martone, Michael F; Lodge, Daniel J; Vondracek, Andrea; de Wolff, Jacob F; Liber, Casimir; Grover, Samir C; Vickers, Tim J; Meskó, Bertalan; Laurent, Michaël R

2011-01-31

The Internet has become an important health information resource for patients and the general public. Wikipedia, a collaboratively written Web-based encyclopedia, has become the dominant online reference work. It is usually among the top results of search engine queries, including when medical information is sought. Since April 2004, editors have formed a group called WikiProject Medicine to coordinate and discuss the English-language Wikipedia's medical content. This paper, written by members of the WikiProject Medicine, discusses the intricacies, strengths, and weaknesses of Wikipedia as a source of health information and compares it with other medical wikis. Medical professionals, their societies, patient groups, and institutions can help improve Wikipedia's health-related entries. Several examples of partnerships already show that there is enthusiasm to strengthen Wikipedia's biomedical content. Given its unique global reach, we believe its possibilities for use as a tool for worldwide health promotion are underestimated. We invite the medical community to join in editing Wikipedia, with the goal of providing people with free access to reliable, understandable, and up-to-date health information.
Wikipedia: A Key Tool for Global Public Health Promotion

PubMed Central

Heilman, James M; Kemmann, Eckhard; Bonert, Michael; Chatterjee, Anwesh; Ragar, Brent; Beards, Graham M; Iberri, David J; Harvey, Matthew; Thomas, Brendan; Stomp, Wouter; Martone, Michael F; Lodge, Daniel J; Vondracek, Andrea; de Wolff, Jacob F; Liber, Casimir; Grover, Samir C; Vickers, Tim J; Meskó, Bertalan

2011-01-01

The Internet has become an important health information resource for patients and the general public. Wikipedia, a collaboratively written Web-based encyclopedia, has become the dominant online reference work. It is usually among the top results of search engine queries, including when medical information is sought. Since April 2004, editors have formed a group called WikiProject Medicine to coordinate and discuss the English-language Wikipedia’s medical content. This paper, written by members of the WikiProject Medicine, discusses the intricacies, strengths, and weaknesses of Wikipedia as a source of health information and compares it with other medical wikis. Medical professionals, their societies, patient groups, and institutions can help improve Wikipedia’s health-related entries. Several examples of partnerships already show that there is enthusiasm to strengthen Wikipedia’s biomedical content. Given its unique global reach, we believe its possibilities for use as a tool for worldwide health promotion are underestimated. We invite the medical community to join in editing Wikipedia, with the goal of providing people with free access to reliable, understandable, and up-to-date health information. PMID:21282098
The Localized Discovery and Recovery for Query Packet Losses in Wireless Sensor Networks with Distributed Detector Clusters

PubMed Central

Teng, Rui; Leibnitz, Kenji; Miura, Ryu

2013-01-01

An essential application of wireless sensor networks is to successfully respond to user queries. Query packet losses occur in the query dissemination due to wireless communication problems such as interference, multipath fading, packet collisions, etc. The losses of query messages at sensor nodes result in the failure of sensor nodes reporting the requested data. Hence, the reliable and successful dissemination of query messages to sensor nodes is a non-trivial problem. The target of this paper is to enable highly successful query delivery to sensor nodes by localized and energy-efficient discovery, and recovery of query losses. We adopt local and collective cooperation among sensor nodes to increase the success rate of distributed discoveries and recoveries. To enable the scalability in the operations of discoveries and recoveries, we employ a distributed name resolution mechanism at each sensor node to allow sensor nodes to self-detect the correlated queries and query losses, and then efficiently locally respond to the query losses. We prove that the collective discovery of query losses has a high impact on the success of query dissemination and reveal that scalability can be achieved by using the proposed approach. We further study the novel features of the cooperation and competition in the collective recovery at PHY and MAC layers, and show that the appropriate number of detectors can achieve optimal successful recovery rate. We evaluate the proposed approach with both mathematical analyses and computer simulations. The proposed approach enables a high rate of successful delivery of query messages and it results in short route lengths to recover from query losses. The proposed approach is scalable and operates in a fully distributed manner. PMID:23748172
Observation of a Coulomb flux tube

NASA Astrophysics Data System (ADS)

Greensite, Jeff; Chung, Kristian

2018-03-01

In Coulomb gauge there is a longitudinal color electric field associated with a static quark-antiquark pair. We have measured the spatial distribution of this field, and find that it falls off exponentially with transverse distance from a line joining the two quarks. In other words there is a Coulomb flux tube, with a width that is somewhat smaller than that of the minimal energy flux tube associated with the asymptotic string tension. A confinement criterion for gauge theories with matter fields is also proposed.
BEAM CONTROL PROBE

DOEpatents

Chesterman, A.W.

1959-03-17

A probe is described for intercepting a desired portion of a beam of charged particles and for indicating the spatial disposition of the beam. The disclosed probe assembly includes a pair of pivotally mounted vanes moveable into a single plane with adjacent edges joining and a calibrated mechanical arrangement for pivoting the vancs apart. When the probe is disposed in the path of a charged particle beam, the vanes may be adjusted according to the beam current received in each vane to ascertain the dimension of the beam.
Ontological Approach to Military Knowledge Modeling and Management

DTIC Science & Technology

2004-03-01

federated search mechanism has to reformulate user queries (expressed using the ontology) in the query languages of the different sources (e.g. SQL...ontologies as a common terminology – Unified query to perform federated search • Query processing – Ontology mapping to sources reformulate queries
VISAGE: Interactive Visual Graph Querying.

PubMed

Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

2016-06-01

Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete , an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with "wildcard" nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE's ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries.
VISAGE: Interactive Visual Graph Querying

PubMed Central

Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

2017-01-01

Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with “wildcard” nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE’s ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries. PMID:28553670
A Visual Interface for Querying Heterogeneous Phylogenetic Databases.

PubMed

Jamil, Hasan M

2017-01-01

Despite the recent growth in the number of phylogenetic databases, access to these wealth of resources remain largely tool or form-based interface driven. It is our thesis that the flexibility afforded by declarative query languages may offer the opportunity to access these repositories in a better way, and to use such a language to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way, and is used as the storage model for our PhyloBase database for which PhyQL serves as the query language. We have implemented a visual interface for the end users to pose PhyQL queries using visual icons, and drag and drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in PhyQL buffer allows secondary querying on the computed results making it a truly powerful querying architecture.
Which factors predict the time spent answering queries to a drug information centre?

PubMed Central

Reppe, Linda A.; Spigset, Olav

2010-01-01

Objective To develop a model based upon factors able to predict the time spent answering drug-related queries to Norwegian drug information centres (DICs). Setting and method Drug-related queries received at 5 DICs in Norway from March to May 2007 were randomly assigned to 20 employees until each of them had answered a minimum of five queries. The employees reported the number of drugs involved, the type of literature search performed, and whether the queries were considered judgmental or not, using a specifically developed scoring system. Main outcome measures The scores of these three factors were added together to define a workload score for each query. Workload and its individual factors were subsequently related to the measured time spent answering the queries by simple or multiple linear regression analyses. Results Ninety-six query/answer pairs were analyzed. Workload significantly predicted the time spent answering the queries (adjusted R2 = 0.22, P < 0.001). Literature search was the individual factor best predicting the time spent answering the queries (adjusted R2 = 0.17, P < 0.001), and this variable also contributed the most in the multiple regression analyses. Conclusion The most important workload factor predicting the time spent handling the queries in this study was the type of literature search that had to be performed. The categorisation of queries as judgmental or not, also affected the time spent answering the queries. The number of drugs involved did not significantly influence the time spent answering drug information queries. PMID:20922480
Personalized query suggestion based on user behavior

NASA Astrophysics Data System (ADS)

Chen, Wanyu; Hao, Zepeng; Shao, Taihua; Chen, Honghui

Query suggestions help users refine their queries after they input an initial query. Previous work mainly concentrated on similarity-based and context-based query suggestion approaches. However, models that focus on adapting to a specific user (personalization) can help to improve the probability of the user being satisfied. In this paper, we propose a personalized query suggestion model based on users’ search behavior (UB model), where we inject relevance between queries and users’ search behavior into a basic probabilistic model. For the relevance between queries, we consider their semantical similarity and co-occurrence which indicates the behavior information from other users in web search. Regarding the current user’s preference to a query, we combine the user’s short-term and long-term search behavior in a linear fashion and deal with the data sparse problem with Bayesian probabilistic matrix factorization (BPMF). In particular, we also investigate the impact of different personalization strategies (the combination of the user’s short-term and long-term search behavior) on the performance of query suggestion reranking. We quantify the improvement of our proposed UB model against a state-of-the-art baseline using the public AOL query logs and show that it beats the baseline in terms of metrics used in query suggestion reranking. The experimental results show that: (i) for personalized ranking, users’ behavioral information helps to improve query suggestion effectiveness; and (ii) given a query, merging information inferred from the short-term and long-term search behavior of a particular user can result in a better performance than both plain approaches.
Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.

PubMed

Woo, Hyekyung; Cho, Youngtae; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

2016-07-04

As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data.
Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea

PubMed Central

Woo, Hyekyung; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

2016-01-01

Background As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. Objective In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Methods Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. Results In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). Conclusions These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data. PMID:27377323
Lost in translation? A multilingual Query Builder improves the quality of PubMed queries: a randomised controlled trial.

PubMed

Schuers, Matthieu; Joulakian, Mher; Kerdelhué, Gaetan; Segas, Léa; Grosjean, Julien; Darmoni, Stéfan J; Griffon, Nicolas

2017-07-03

MEDLINE is the most widely used medical bibliographic database in the world. Most of its citations are in English and this can be an obstacle for some researchers to access the information the database contains. We created a multilingual query builder to facilitate access to the PubMed subset using a language other than English. The aim of our study was to assess the impact of this multilingual query builder on the quality of PubMed queries for non-native English speaking physicians and medical researchers. A randomised controlled study was conducted among French speaking general practice residents. We designed a multi-lingual query builder to facilitate information retrieval, based on available MeSH translations and providing users with both an interface and a controlled vocabulary in their own language. Participating residents were randomly allocated either the French or the English version of the query builder. They were asked to translate 12 short medical questions into MeSH queries. The main outcome was the quality of the query. Two librarians blind to the arm independently evaluated each query, using a modified published classification that differentiated eight types of errors. Twenty residents used the French version of the query builder and 22 used the English version. 492 queries were analysed. There were significantly more perfect queries in the French group vs. the English group (respectively 37.9% vs. 17.9%; p < 0.01). It took significantly more time for the members of the English group than the members of the French group to build each query, respectively 194 sec vs. 128 sec; p < 0.01. This multi-lingual query builder is an effective tool to improve the quality of PubMed queries in particular for researchers whose first language is not English.
A Web-Based Data-Querying Tool Based on Ontology-Driven Methodology and Flowchart-Based Model

PubMed Central

Ping, Xiao-Ou; Chung, Yufang; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei

2013-01-01

Background Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. Objective The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. Methods The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. Results In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, “degree of liver damage,” “degree of liver damage when applying a mutually exclusive setting,” and “treatments for liver cancer”) was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. Conclusions The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks. PMID:25600078
Mining Longitudinal Web Queries: Trends and Patterns.

ERIC Educational Resources Information Center

Wang, Peiling; Berry, Michael W.; Yang, Yiheng

2003-01-01

Analyzed user queries submitted to an academic Web site during a four-year period, using a relational database, to examine users' query behavior, to identify problems they encounter, and to develop techniques for optimizing query analysis and mining. Linguistic analyses focus on query structures, lexicon, and word associations using statistical…

Optimizing a Query by Transformation and Expansion.

PubMed

Glocker, Katrin; Knurr, Alexander; Dieter, Julia; Dominick, Friederike; Forche, Melanie; Koch, Christian; Pascoe Pérez, Analie; Roth, Benjamin; Ückert, Frank

2017-01-01

In the biomedical sector not only the amount of information produced and uploaded into the web is enormous, but also the number of sources where these data can be found. Clinicians and researchers spend huge amounts of time on trying to access this information and to filter the most important answers to a given question. As the formulation of these queries is crucial, automated query expansion is an effective tool to optimize a query and receive the best possible results. In this paper we introduce the concept of a workflow for an optimization of queries in the medical and biological sector by using a series of tools for expansion and transformation of the query. After the definition of attributes by the user, the query string is compared to previous queries in order to add semantic co-occurring terms to the query. Additionally, the query is enlarged by an inclusion of synonyms. The translation into database specific ontologies ensures the optimal query formulation for the chosen database(s). As this process can be performed in various databases at once, the results are ranked and normalized in order to achieve a comparable list of answers for a question.
WATCHMAN: A Data Warehouse Intelligent Cache Manager

NASA Technical Reports Server (NTRS)

Scheuermann, Peter; Shim, Junho; Vingralek, Radek

1996-01-01

Data warehouses store large volumes of data which are used frequently by decision support applications. Such applications involve complex queries. Query performance in such an environment is critical because decision support applications often require interactive query response time. Because data warehouses are updated infrequently, it becomes possible to improve query performance by caching sets retrieved by queries in addition to query execution plans. In this paper we report on the design of an intelligent cache manager for sets retrieved by queries called WATCHMAN, which is particularly well suited for data warehousing environment. Our cache manager employs two novel, complementary algorithms for cache replacement and for cache admission. WATCHMAN aims at minimizing query response time and its cache replacement policy swaps out entire retrieved sets of queries instead of individual pages. The cache replacement and admission algorithms make use of a profit metric, which considers for each retrieved set its average rate of reference, its size, and execution cost of the associated query. We report on a performance evaluation based on the TPC-D and Set Query benchmarks. These experiments show that WATCHMAN achieves a substantial performance improvement in a decision support environment when compared to a traditional LRU replacement algorithm.
CampusGIS of the University of Cologne: a tool for orientation, navigation, and management

NASA Astrophysics Data System (ADS)

Baaser, U.; Gnyp, M. L.; Hennig, S.; Hoffmeister, D.; Köhn, N.; Laudien, R.; Bareth, G.

2006-10-01

The working group for GIS and Remote Sensing at the Department of Geography at the University of Cologne has established a WebGIS called CampusGIS of the University of Cologne. The overall task of the CampusGIS is the connection of several existing databases at the University of Cologne with spatial data. These existing databases comprise data about staff, buildings, rooms, lectures, and general infrastructure like bus stops etc. These information were yet not linked to their spatial relation. Therefore, a GIS-based method is developed to link all the different databases to spatial entities. Due to the philosophy of the CampusGIS, an online-GUI is programmed which enables users to search for staff, buildings, or institutions. The query results are linked to the GIS database which allows the visualization of the spatial location of the searched entity. This system was established in 2005 and is operational since early 2006. In this contribution, the focus is on further developments. First results of (i) including routing services in, (ii) programming GUIs for mobile devices for, and (iii) including infrastructure management tools in the CampusGIS are presented. Consequently, the CampusGIS is not only available for spatial information retrieval and orientation. It also serves for on-campus navigation and administrative management.
Integration and management of massive remote-sensing data based on GeoSOT subdivision model

NASA Astrophysics Data System (ADS)

Li, Shuang; Cheng, Chengqi; Chen, Bo; Meng, Li

2016-07-01

Owing to the rapid development of earth observation technology, the volume of spatial information is growing rapidly; therefore, improving query retrieval speed from large, rich data sources for remote-sensing data management systems is quite urgent. A global subdivision model, geographic coordinate subdivision grid with one-dimension integer coding on 2n-tree, which we propose as a solution, has been used in data management organizations. However, because a spatial object may cover several grids, ample data redundancy will occur when data are stored in relational databases. To solve this redundancy problem, we first combined the subdivision model with the spatial array database containing the inverted index. We proposed an improved approach for integrating and managing massive remote-sensing data. By adding a spatial code column in an array format in a database, spatial information in remote-sensing metadata can be stored and logically subdivided. We implemented our method in a Kingbase Enterprise Server database system and compared the results with the Oracle platform by simulating worldwide image data. Experimental results showed that our approach performed better than Oracle in terms of data integration and time and space efficiency. Our approach also offers an efficient storage management system for existing storage centers and management systems.
Implementation and evaluation of a hypercube-based method for spatiotemporal exploration and analysis

NASA Astrophysics Data System (ADS)

Marchand, Pierre; Brisebois, Alexandre; Bédard, Yvan; Edwards, Geoffrey

This paper presents the results obtained with a new type of spatiotemporal topological dimension implemented within a hypercube, i.e., within a multidimensional database (MDDB) structure formed by the conjunction of several thematic, spatial and temporal dimensions. Our goal is to support efficient SpatioTemporal Exploration and Analysis (STEA) in the context of Automatic Position Reporting System (APRS), the worldwide amateur radio system for position report transmission. Mobile APRS stations are equipped with GPS navigation systems to provide real-time positioning reports. Previous research about the multidimensional approach has proved good potential for spatiotemporal exploration and analysis despite a lack of explicit topological operators (spatial, temporal and spatiotemporal). Our project implemented such operators through a hierarchy of operators that are applied to pairs of instances of objects. At the top of the hierarchy, users can use simple operators such as "same place", "same time" or "same time, same place". As they drill down into the hierarchy, more detailed topological operators are made available such as "adjacent immediately after", "touch during" or more detailed operators. This hierarchy is structured according to four levels of granularity based on cognitive models, generalized relationships and formal models of topological relationships. In this paper, we also describe the generic approach which allows efficient STEA within the multidimensional approach. Finally, we demonstrate that such an implementation offers query run times which permit to maintain a "train-of-thought" during exploration and analysis operations as they are compatible with Newell's cognitive band (query runtime<10 s) (Newell, A., 1990. Unified theories of cognition. Harvard University Press, Cambridge MA, 549 p.).
Making Temporal Search More Central in Spatial Data Infrastructures

NASA Astrophysics Data System (ADS)

Corti, P.; Lewis, B.

2017-10-01

A temporally enabled Spatial Data Infrastructure (SDI) is a framework of geospatial data, metadata, users, and tools intended to provide an efficient and flexible way to use spatial information which includes the historical dimension. One of the key software components of an SDI is the catalogue service which is needed to discover, query, and manage the metadata. A search engine is a software system capable of supporting fast and reliable search, which may use any means necessary to get users to the resources they need quickly and efficiently. These techniques may include features such as full text search, natural language processing, weighted results, temporal search based on enrichment, visualization of patterns in distributions of results in time and space using temporal and spatial faceting, and many others. In this paper we will focus on the temporal aspects of search which include temporal enrichment using a time miner - a software engine able to search for date components within a larger block of text, the storage of time ranges in the search engine, handling historical dates, and the use of temporal histograms in the user interface to display the temporal distribution of search results.
Assisting Consumer Health Information Retrieval with Query Recommendations

PubMed Central

Zeng, Qing T.; Crowell, Jonathan; Plovnick, Robert M.; Kim, Eunjung; Ngo, Long; Dibble, Emily

2006-01-01

Objective: Health information retrieval (HIR) on the Internet has become an important practice for millions of people, many of whom have problems forming effective queries. We have developed and evaluated a tool to assist people in health-related query formation. Design: We developed the Health Information Query Assistant (HIQuA) system. The system suggests alternative/additional query terms related to the user's initial query that can be used as building blocks to construct a better, more specific query. The recommended terms are selected according to their semantic distance from the original query, which is calculated on the basis of concept co-occurrences in medical literature and log data as well as semantic relations in medical vocabularies. Measurements: An evaluation of the HIQuA system was conducted and a total of 213 subjects participated in the study. The subjects were randomized into 2 groups. One group was given query recommendations and the other was not. Each subject performed HIR for both a predefined and a self-defined task. Results: The study showed that providing HIQuA recommendations resulted in statistically significantly higher rates of successful queries (odds ratio = 1.66, 95% confidence interval = 1.16–2.38), although no statistically significant impact on user satisfaction or the users' ability to accomplish the predefined retrieval task was found. Conclusion: Providing semantic-distance-based query recommendations can help consumers with query formation during HIR. PMID:16221944
PAQ: Persistent Adaptive Query Middleware for Dynamic Environments

NASA Astrophysics Data System (ADS)

Rajamani, Vasanth; Julien, Christine; Payton, Jamie; Roman, Gruia-Catalin

Pervasive computing applications often entail continuous monitoring tasks, issuing persistent queries that return continuously updated views of the operational environment. We present PAQ, a middleware that supports applications' needs by approximating a persistent query as a sequence of one-time queries. PAQ introduces an integration strategy abstraction that allows composition of one-time query responses into streams representing sophisticated spatio-temporal phenomena of interest. A distinguishing feature of our middleware is the realization that the suitability of a persistent query's result is a function of the application's tolerance for accuracy weighed against the associated overhead costs. In PAQ, programmers can specify an inquiry strategy that dictates how information is gathered. Since network dynamics impact the suitability of a particular inquiry strategy, PAQ associates an introspection strategy with a persistent query, that evaluates the quality of the query's results. The result of introspection can trigger application-defined adaptation strategies that alter the nature of the query. PAQ's simple API makes developing adaptive querying systems easily realizable. We present the key abstractions, describe their implementations, and demonstrate the middleware's usefulness through application examples and evaluation.
Analysis of queries sent to PubMed at the point of care: Observation of search behaviour in a medical teaching hospital

PubMed Central

Hoogendam, Arjen; Stalenhoef, Anton FH; Robbé, Pieter F de Vries; Overbeke, A John PM

2008-01-01

Background The use of PubMed to answer daily medical care questions is limited because it is challenging to retrieve a small set of relevant articles and time is restricted. Knowing what aspects of queries are likely to retrieve relevant articles can increase the effectiveness of PubMed searches. The objectives of our study were to identify queries that are likely to retrieve relevant articles by relating PubMed search techniques and tools to the number of articles retrieved and the selection of articles for further reading. Methods This was a prospective observational study of queries regarding patient-related problems sent to PubMed by residents and internists in internal medicine working in an Academic Medical Centre. We analyzed queries, search results, query tools (Mesh, Limits, wildcards, operators), selection of abstract and full-text for further reading, using a portal that mimics PubMed. Results PubMed was used to solve 1121 patient-related problems, resulting in 3205 distinct queries. Abstracts were viewed in 999 (31%) of these queries, and in 126 (39%) of 321 queries using query tools. The average term count per query was 2.5. Abstracts were selected in more than 40% of queries using four or five terms, increasing to 63% if the use of four or five terms yielded 2–161 articles. Conclusion Queries sent to PubMed by physicians at our hospital during daily medical care contain fewer than three terms. Queries using four to five terms, retrieving less than 161 article titles, are most likely to result in abstract viewing. PubMed search tools are used infrequently by our population and are less effective than the use of four or five terms. Methods to facilitate the formulation of precise queries, using more relevant terms, should be the focus of education and research. PMID:18816391
PRAIRIEMAP: A GIS database for prairie grassland management in western North America

USGS Publications Warehouse

,

2003-01-01

The USGS Forest and Rangeland Ecosystem Science Center, Snake River Field Station (SRFS) maintains a database of spatial information, called PRAIRIEMAP, which is needed to address the management of prairie grasslands in western North America. We identify and collect spatial data for the region encompassing the historical extent of prairie grasslands (Figure 1). State and federal agencies, the primary entities responsible for management of prairie grasslands, need this information to develop proactive management strategies to prevent prairie-grassland wildlife species from being listed as Endangered Species, or to develop appropriate responses if listing does occur. Spatial data are an important component in documenting current habitat and other environmental conditions, which can be used to identify areas that have undergone significant changes in land cover and to identify underlying causes. Spatial data will also be a critical component guiding the decision processes for restoration of habitat in the Great Plains. As such, the PRAIRIEMAP database will facilitate analyses of large-scale and range-wide factors that may be causing declines in grassland habitat and populations of species that depend on it for their survival. Therefore, development of a reliable spatial database carries multiple benefits for land and wildlife management. The project consists of 3 phases: (1) identify relevant spatial data, (2) assemble, document, and archive spatial data on a computer server, and (3) develop and maintain the web site (http://prairiemap.wr.usgs.gov) for query and transfer of GIS data to managers and researchers.
Real-Time Earthquake Monitoring with Spatio-Temporal Fields

NASA Astrophysics Data System (ADS)

Whittier, J. C.; Nittel, S.; Subasinghe, I.

2017-10-01

With live streaming sensors and sensor networks, increasingly large numbers of individual sensors are deployed in physical space. Sensor data streams are a fundamentally novel mechanism to deliver observations to information systems. They enable us to represent spatio-temporal continuous phenomena such as radiation accidents, toxic plumes, or earthquakes almost as instantaneously as they happen in the real world. Sensor data streams discretely sample an earthquake, while the earthquake is continuous over space and time. Programmers attempting to integrate many streams to analyze earthquake activity and scope need to write code to integrate potentially very large sets of asynchronously sampled, concurrent streams in tedious application code. In previous work, we proposed the field stream data model (Liang et al., 2016) for data stream engines. Abstracting the stream of an individual sensor as a temporal field, the field represents the Earth's movement at the sensor position as continuous. This simplifies analysis across many sensors significantly. In this paper, we undertake a feasibility study of using the field stream model and the open source Data Stream Engine (DSE) Apache Spark(Apache Spark, 2017) to implement a real-time earthquake event detection with a subset of the 250 GPS sensor data streams of the Southern California Integrated GPS Network (SCIGN). The field-based real-time stream queries compute maximum displacement values over the latest query window of each stream, and related spatially neighboring streams to identify earthquake events and their extent. Further, we correlated the detected events with an USGS earthquake event feed. The query results are visualized in real-time.
Improvement of density models of geological structures by fusion of gravity data and cosmic muon radiographies

NASA Astrophysics Data System (ADS)

Jourde, K.; Gibert, D.; Marteau, J.

2015-04-01

This paper examines how the resolution of small-scale geological density models is improved through the fusion of information provided by gravity measurements and density muon radiographies. Muon radiography aims at determining the density of geological bodies by measuring their screening effect on the natural flux of cosmic muons. Muon radiography essentially works like medical X-ray scan and integrates density information along elongated narrow conical volumes. Gravity measurements are linked to density by a 3-D integration encompassing the whole studied domain. We establish the mathematical expressions of these integration formulas - called acquisition kernels - and derive the resolving kernels that are spatial filters relating the true unknown density structure to the density distribution actually recovered from the available data. The resolving kernels approach allows to quantitatively describe the improvement of the resolution of the density models achieved by merging gravity data and muon radiographies. The method developed in this paper may be used to optimally design the geometry of the field measurements to perform in order to obtain a given spatial resolution pattern of the density model to construct. The resolving kernels derived in the joined muon/gravimetry case indicate that gravity data are almost useless to constrain the density structure in regions sampled by more than two muon tomography acquisitions. Interestingly the resolution in deeper regions not sampled by muon tomography is significantly improved by joining the two techniques. The method is illustrated with examples for La Soufrière of Guadeloupe volcano.
LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions.

PubMed

Chen, Jinbo; Scholz, Uwe; Zhou, Ruonan; Lange, Matthias

2018-03-01

In order to access and filter content of life-science databases, full text search is a widely applied query interface. But its high flexibility and intuitiveness is paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspelling and support expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries by their query profile like research field, geographic location, co-authorship, affiliation etc. require user's registration and its public accessibility that contradict privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstruct possible linguistic contexts of a given keyword query. The context is referred from the text records that are stored in the databases that are going to be queried or extracted for a general purpose query suggestion from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the further computation of customized distributed word vectors. The latter are used to suggest alternative keyword queries. An evaluated of the query suggestion quality was done for plant science use cases. Locally present experts enable a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function which has been performed using ontology term similarities. LAILAPS-QSM mean information content similarity for 15 representative queries is 0.70, whereas 34% have a score above 0.80. In comparison, the information content similarity for human expert made query suggestions is 0.90. The software is either available as tool set to build and train dedicated query suggestion services or as already trained general purpose RESTful web service. The service uses open interfaces to be seamless embeddable into database frontends. The JAVA implementation uses highly optimized data structures and streamlined code to provide fast and scalable response for web service calls. The source code of LAILAPS-QSM is available under GNU General Public License version 2 in Bitbucket GIT repository: https://bitbucket.org/ipk_bit_team/bioescorte-suggestion.
Facilitating Cohort Discovery by Enhancing Ontology Exploration, Query Management and Query Sharing for Large Clinical Data Repositories.

PubMed

Tao, Shiqiang; Cui, Licong; Wu, Xi; Zhang, Guo-Qiang

2017-01-01

To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution.
Facilitating Cohort Discovery by Enhancing Ontology Exploration, Query Management and Query Sharing for Large Clinical Data Repositories

PubMed Central

Tao, Shiqiang; Cui, Licong; Wu, Xi; Zhang, Guo-Qiang

2017-01-01

To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution. PMID:29854239
Improve Performance of Data Warehouse by Query Cache

NASA Astrophysics Data System (ADS)

Gour, Vishal; Sarangdevot, S. S.; Sharma, Anand; Choudhary, Vinod

2010-11-01

The primary goal of data warehouse is to free the information locked up in the operational database so that decision makers and business analyst can make queries, analysis and planning regardless of the data changes in operational database. As the number of queries is large, therefore, in certain cases there is reasonable probability that same query submitted by the one or multiple users at different times. Each time when query is executed, all the data of warehouse is analyzed to generate the result of that query. In this paper we will study how using query cache improves performance of Data Warehouse and try to find the common problems faced. These kinds of problems are faced by Data Warehouse administrators which are minimizes response time and improves the efficiency of query in data warehouse overall, particularly when data warehouse is updated at regular interval.
Complex analyses on clinical information systems using restricted natural language querying to resolve time-event dependencies.

PubMed

Safari, Leila; Patrick, Jon D

2018-06-01

This paper reports on a generic framework to provide clinicians with the ability to conduct complex analyses on elaborate research topics using cascaded queries to resolve internal time-event dependencies in the research questions, as an extension to the proposed Clinical Data Analytics Language (CliniDAL). A cascaded query model is proposed to resolve internal time-event dependencies in the queries which can have up to five levels of criteria starting with a query to define subjects to be admitted into a study, followed by a query to define the time span of the experiment. Three more cascaded queries can be required to define control groups, control variables and output variables which all together simulate a real scientific experiment. According to the complexity of the research questions, the cascaded query model has the flexibility of merging some lower level queries for simple research questions or adding a nested query to each level to compose more complex queries. Three different scenarios (one of them contains two studies) are described and used for evaluation of the proposed solution. CliniDAL's complex analyses solution enables answering complex queries with time-event dependencies at most in a few hours which manually would take many days. An evaluation of results of the research studies based on the comparison between CliniDAL and SQL solutions reveals high usability and efficiency of CliniDAL's solution. Copyright © 2018 Elsevier Inc. All rights reserved.
Evaluation of Sub Query Performance in SQL Server

NASA Astrophysics Data System (ADS)

Oktavia, Tanty; Sujarwo, Surya

2014-03-01

The paper explores several sub query methods used in a query and their impact on the query performance. The study uses experimental approach to evaluate the performance of each sub query methods combined with indexing strategy. The sub query methods consist of in, exists, relational operator and relational operator combined with top operator. The experimental shows that using relational operator combined with indexing strategy in sub query has greater performance compared with using same method without indexing strategy and also other methods. In summary, for application that emphasized on the performance of retrieving data from database, it better to use relational operator combined with indexing strategy. This study is done on Microsoft SQL Server 2012.
Secure Skyline Queries on Cloud Platform.

PubMed

Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

2017-04-01

Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.
Distributed query plan generation using multiobjective genetic algorithm.

PubMed

Panicker, Shina; Kumar, T V Vijay

2014-01-01

A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability.

Distributed Query Plan Generation Using Multiobjective Genetic Algorithm

PubMed Central

Panicker, Shina; Vijay Kumar, T. V.

2014-01-01

A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability. PMID:24963513
Towards Hybrid Online On-Demand Querying of Realtime Data with Stateful Complex Event Processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhou, Qunzhi; Simmhan, Yogesh; Prasanna, Viktor K.

Emerging Big Data applications in areas like e-commerce and energy industry require both online and on-demand queries to be performed over vast and fast data arriving as streams. These present novel challenges to Big Data management systems. Complex Event Processing (CEP) is recognized as a high performance online query scheme which in particular deals with the velocity aspect of the 3-V’s of Big Data. However, traditional CEP systems do not consider data variety and lack the capability to embed ad hoc queries over the volume of data streams. In this paper, we propose H2O, a stateful complex event processing framework,more » to support hybrid online and on-demand queries over realtime data. We propose a semantically enriched event and query model to address data variety. A formal query algebra is developed to precisely capture the stateful and containment semantics of online and on-demand queries. We describe techniques to achieve the interactive query processing over realtime data featured by efficient online querying, dynamic stream data persistence and on-demand access. The system architecture is presented and the current implementation status reported.« less
Semantic Metadata for Heterogeneous Spatial Planning Documents

NASA Astrophysics Data System (ADS)

Iwaniak, A.; Kaczmarek, I.; Łukowicz, J.; Strzelecki, M.; Coetzee, S.; Paluszyński, W.

2016-09-01

Spatial planning documents contain information about the principles and rights of land use in different zones of a local authority. They are the basis for administrative decision making in support of sustainable development. In Poland these documents are published on the Web according to a prescribed non-extendable XML schema, designed for optimum presentation to humans in HTML web pages. There is no document standard, and limited functionality exists for adding references to external resources. The text in these documents is discoverable and searchable by general-purpose web search engines, but the semantics of the content cannot be discovered or queried. The spatial information in these documents is geographically referenced but not machine-readable. Major manual efforts are required to integrate such heterogeneous spatial planning documents from various local authorities for analysis, scenario planning and decision support. This article presents results of an implementation using machine-readable semantic metadata to identify relationships among regulations in the text, spatial objects in the drawings and links to external resources. A spatial planning ontology was used to annotate different sections of spatial planning documents with semantic metadata in the Resource Description Framework in Attributes (RDFa). The semantic interpretation of the content, links between document elements and links to external resources were embedded in XHTML pages. An example and use case from the spatial planning domain in Poland is presented to evaluate its efficiency and applicability. The solution enables the automated integration of spatial planning documents from multiple local authorities to assist decision makers with understanding and interpreting spatial planning information. The approach is equally applicable to legal documents from other countries and domains, such as cultural heritage and environmental management.
Query Health: standards-based, cross-platform population health surveillance

PubMed Central

Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N

2014-01-01

Objective Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Materials and methods Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. Results We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. Discussions This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Conclusions Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites. PMID:24699371
Query Health: standards-based, cross-platform population health surveillance.

PubMed

Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N

2014-01-01

Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Land Treatment Digital Library

USGS Publications Warehouse

Pilliod, David S.; Welty, Justin L.

2013-01-01

The Land Treatment Digital Library (LTDL) was created by the U.S. Geological Survey to catalog legacy land treatment information on Bureau of Land Management lands in the western United States. The LTDL can be used by federal managers and scientists for compiling information for data-calls, producing maps, generating reports, and conducting analyses at varying spatial and temporal scales. The LTDL currently houses thousands of treatments from BLM lands across 10 states. Users can browse a map to find information on individual treatments, perform more complex queries to identify a set of treatments, and view graphs of treatment summary statistics.
A multi-agent architecture for geosimulation of moving agents

NASA Astrophysics Data System (ADS)

Vahidnia, Mohammad H.; Alesheikh, Ali A.; Alavipanah, Seyed Kazem

2015-10-01

In this paper, a novel architecture is proposed in which an axiomatic derivation system in the form of first-order logic facilitates declarative explanation and spatial reasoning. Simulation of environmental perception and interaction between autonomous agents is designed with a geographic belief-desire-intention and a request-inform-query model. The architecture has a complementary quantitative component that supports collaborative planning based on the concept of equilibrium and game theory. This new architecture presents a departure from current best practices geographic agent-based modelling. Implementation tasks are discussed in some detail, as well as scenarios for fleet management and disaster management.
The LSST Data Mining Research Agenda

NASA Astrophysics Data System (ADS)

Borne, K.; Becla, J.; Davidson, I.; Szalay, A.; Tyson, J. A.

2008-12-01

We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabytes scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night) multi-resolution methods for exploration of petascale databases; indexing of multi-attribute multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more.
Using search engine query data to track pharmaceutical utilization: a study of statins.

PubMed

Schuster, Nathaniel M; Rogers, Mary A M; McMahon, Laurence F

2010-08-01

To examine temporal and geographic associations between Google queries for health information and healthcare utilization benchmarks. Retrospective longitudinal study. Using Google Trends and Google Insights for Search data, the search terms Lipitor (atorvastatin calcium; Pfizer, Ann Arbor, MI) and simvastatin were evaluated for change over time and for association with Lipitor revenues. The relationship between query data and community-based resource use per Medicare beneficiary was assessed for 35 US metropolitan areas. Google queries for Lipitor significantly decreased from January 2004 through June 2009 and queries for simvastatin significantly increased (P <.001 for both), particularly after Lipitor came off patent (P <.001 for change in slope). The mean number of Google queries for Lipitor correlated (r = 0.98) with the percentage change in Lipitor global revenues from 2004 to 2008 (P <.001). Query preference for Lipitor over simvastatin was positively associated (r = 0.40) with a community's use of Medicare services. For every 1% increase in utilization of Medicare services in a community, there was a 0.2-unit increase in the ratio of Lipitor queries to simvastatin queries in that community (P = .02). Specific search engine queries for medical information correlate with pharmaceutical revenue and with overall healthcare utilization in a community. This suggests that search query data can track community-wide characteristics in healthcare utilization and have the potential for informing payers and policy makers regarding trends in utilization.
Knowledge-based geographic information systems (KBGIS): New analytic and data management tools

USGS Publications Warehouse

Albert, T.M.

1988-01-01

In its simplest form, a geographic information system (GIS) may be viewed as a data base management system in which most of the data are spatially indexed, and upon which sets of procedures operate to answer queries about spatial entities represented in the data base. Utilization of artificial intelligence (AI) techniques can enhance greatly the capabilities of a GIS, particularly in handling very large, diverse data bases involved in the earth sciences. A KBGIS has been developed by the U.S. Geological Survey which incorporates AI techniques such as learning, expert systems, new data representation, and more. The system, which will be developed further and applied, is a prototype of the next generation of GIS's, an intelligent GIS, as well as an example of a general-purpose intelligent data handling system. The paper provides a description of KBGIS and its application, as well as the AI techniques involved. ?? 1988 International Association for Mathematical Geology.
Model for Semantically Rich Point Cloud Data

NASA Astrophysics Data System (ADS)

Poux, F.; Neuville, R.; Hallot, P.; Billen, R.

2017-10-01

This paper proposes an interoperable model for managing high dimensional point clouds while integrating semantics. Point clouds from sensors are a direct source of information physically describing a 3D state of the recorded environment. As such, they are an exhaustive representation of the real world at every scale: 3D reality-based spatial data. Their generation is increasingly fast but processing routines and data models lack of knowledge to reason from information extraction rather than interpretation. The enhanced smart point cloud developed model allows to bring intelligence to point clouds via 3 connected meta-models while linking available knowledge and classification procedures that permits semantic injection. Interoperability drives the model adaptation to potentially many applications through specialized domain ontologies. A first prototype is implemented in Python and PostgreSQL database and allows to combine semantic and spatial concepts for basic hybrid queries on different point clouds.
VizieR Online Data Catalog: Bright white dwarfs IRAC photometry (Barber+, 2016)

NASA Astrophysics Data System (ADS)

Barber, S. D.; Belardi, C.; Kilic, M.; Gianninas, A.

2017-07-01

Mid-infrared photometry, like the 3.4 and 4.6um photometry available from WISE, is necessary to detect emission from a debris disc orbiting a WD. WISE, however, has poor spatial resolution (6 arcsec beam size) and is known to have a 75 per cent false positive rate for detecting dusty discs around WDs fainter than 14.5(15) mag in W1(W2) (Barber et al. (2014ApJ...786...77B). To mitigate this high rate of spurious detections, we compile higher spatial resolution archival data from the InfraRed Array Camera (IRAC) on the Spitzer Space Telescope. We query the Spitzer Heritage Archive for any observations within 10 arcsec of the 1265 WDs from Gianninas et al. (2011, Cat. J/ApJ/743/138) and find 907 Astronomical Observing Requests (AORs) for 381 WDs. (1 data file).
CSRQ: Communication-Efficient Secure Range Queries in Two-Tiered Sensor Networks

PubMed Central

Dai, Hua; Ye, Qingqun; Yang, Geng; Xu, Jia; He, Ruiliang

2016-01-01

In recent years, we have seen many applications of secure query in two-tiered wireless sensor networks. Storage nodes are responsible for storing data from nearby sensor nodes and answering queries from Sink. It is critical to protect data security from a compromised storage node. In this paper, the Communication-efficient Secure Range Query (CSRQ)—a privacy and integrity preserving range query protocol—is proposed to prevent attackers from gaining information of both data collected by sensor nodes and queries issued by Sink. To preserve privacy and integrity, in addition to employing the encoding mechanisms, a novel data structure called encrypted constraint chain is proposed, which embeds the information of integrity verification. Sink can use this encrypted constraint chain to verify the query result. The performance evaluation shows that CSRQ has lower communication cost than the current range query protocols. PMID:26907293
SPARQLGraph: a web-based platform for graphically querying biological Semantic Web databases.

PubMed

Schweiger, Dominik; Trajanoski, Zlatko; Pabinger, Stephan

2014-08-15

Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way. SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the just recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers. This new graphical way of creating queries for biological Semantic Web databases considerably facilitates usability as it removes the requirement of knowing specific query languages and database structures. The system is freely available at http://sparqlgraph.i-med.ac.at.
Torsional Shear Strength Tests for Glass-Ceramic Joined Silicon Carbide

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ferraris, Monica; Ventrella, Andrea; Salvo, Milena

2014-03-17

A torsion test on hour-glass-shaped samples with a full joined or a ring-shaped joined area was chosen in this study to measure shear strength of glass-ceramic joined silicon carbide. Shear strength of about 100 MPa was measured for full joined SiC with fracture completely inside their joined area. Attempts to obtain this shear strength with a ring-shaped joined area failed due to mixed mode fractures. However, full joined and ring-shaped steel hour-glasses joined by a glass-ceramic gave the same shear strength, thus suggesting that this test measures shear strength of joined components only when their fracture is completely inside theirmore » joined area.« less
Spatial rule-based assessment of habitat potential to predict impact of land use changes on biodiversity at municipal scale.

PubMed

Scolozzi, Rocco; Geneletti, Davide

2011-03-01

In human dominated landscapes, ecosystems are under increasing pressures caused by urbanization and infrastructure development. In Alpine valleys remnant natural areas are increasingly affected by habitat fragmentation and loss. In these contexts, there is a growing risk of local extinction for wildlife populations; hence assessing the consequences on biodiversity of proposed land use changes is extremely important. The article presents a methodology to assess the impacts of land use changes on target species at a local scale. The approach relies on the application of ecological profiles of target species for habitat potential (HP) assessment, using high resolution GIS-data within a multiple level framework. The HP, in this framework, is based on a species-specific assessment of the suitability of a site, as well of surrounding areas. This assessment is performed through spatial rules, structured as sets of queries on landscape objects. We show that by considering spatial dependencies in habitat assessment it is possible to perform better quantification of impacts of local-level land use changes on habitats.
Improving accuracy for identifying related PubMed queries by an integrated approach.

PubMed

Lu, Zhiyong; Wilbur, W John

2009-10-01

PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users' search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments.
Improving accuracy for identifying related PubMed queries by an integrated approach

PubMed Central

Lu, Zhiyong; Wilbur, W. John

2009-01-01

PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users’ search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1,539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1,396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments. PMID:19162232
Multi-Bit Quantum Private Query

NASA Astrophysics Data System (ADS)

Shi, Wei-Xu; Liu, Xing-Tong; Wang, Jian; Tang, Chao-Jing

2015-09-01

Most of the existing Quantum Private Queries (QPQ) protocols provide only single-bit queries service, thus have to be repeated several times when more bits are retrieved. Wei et al.'s scheme for block queries requires a high-dimension quantum key distribution system to sustain, which is still restricted in the laboratory. Here, based on Markus Jakobi et al.'s single-bit QPQ protocol, we propose a multi-bit quantum private query protocol, in which the user can get access to several bits within one single query. We also extend the proposed protocol to block queries, using a binary matrix to guard database security. Analysis in this paper shows that our protocol has better communication complexity, implementability and can achieve a considerable level of security.
Estimating Missing Features to Improve Multimedia Information Retrieval

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bagherjeiran, A; Love, N S; Kamath, C

Retrieval in a multimedia database usually involves combining information from different modalities of data, such as text and images. However, all modalities of the data may not be available to form the query. The retrieval results from such a partial query are often less than satisfactory. In this paper, we present an approach to complete a partial query by estimating the missing features in the query. Our experiments with a database of images and their associated captions show that, with an initial text-only query, our completion method has similar performance to a full query with both image and text features.more » In addition, when we use relevance feedback, our approach outperforms the results obtained using a full query.« less

A Framework for WWW Query Processing

NASA Technical Reports Server (NTRS)

Wu, Binghui Helen; Wharton, Stephen (Technical Monitor)

2000-01-01

Query processing is the most common operation in a DBMS. Sophisticated query processing has been mainly targeted at a single enterprise environment providing centralized control over data and metadata. Submitting queries by anonymous users on the web is different in such a way that load balancing or DBMS' accessing control becomes the key issue. This paper provides a solution by introducing a framework for WWW query processing. The success of this framework lies in the utilization of query optimization techniques and the ontological approach. This methodology has proved to be cost effective at the NASA Goddard Space Flight Center Distributed Active Archive Center (GDAAC).
QBIC project: querying images by content, using color, texture, and shape

NASA Astrophysics Data System (ADS)

Niblack, Carlton W.; Barber, Ron; Equitz, Will; Flickner, Myron D.; Glasman, Eduardo H.; Petkovic, Dragutin; Yanker, Peter; Faloutsos, Christos; Taubin, Gabriel

1993-04-01

In the query by image content (QBIC) project we are studying methods to query large on-line image databases using the images' content as the basis of the queries. Examples of the content we use include color, texture, and shape of image objects and regions. Potential applications include medical (`Give me other images that contain a tumor with a texture like this one'), photo-journalism (`Give me images that have blue at the top and red at the bottom'), and many others in art, fashion, cataloging, retailing, and industry. Key issues include derivation and computation of attributes of images and objects that provide useful query functionality, retrieval methods based on similarity as opposed to exact match, query by image example or user drawn image, the user interfaces, query refinement and navigation, high dimensional database indexing, and automatic and semi-automatic database population. We currently have a prototype system written in X/Motif and C running on an RS/6000 that allows a variety of queries, and a test database of over 1000 images and 1000 objects populated from commercially available photo clip art images. In this paper we present the main algorithms for color texture, shape and sketch query that we use, show example query results, and discuss future directions.
Does query expansion limit our learning? A comparison of social-based expansion to content-based expansion for medical queries on the internet.

PubMed

Pentoney, Christopher; Harwell, Jeff; Leroy, Gondy

2014-01-01

Searching for medical information online is a common activity. While it has been shown that forming good queries is difficult, Google's query suggestion tool, a type of query expansion, aims to facilitate query formation. However, it is unknown how this expansion, which is based on what others searched for, affects the information gathering of the online community. To measure the impact of social-based query expansion, this study compared it with content-based expansion, i.e., what is really in the text. We used 138,906 medical queries from the AOL User Session Collection and expanded them using Google's Autocomplete method (social-based) and the content of the Google Web Corpus (content-based). We evaluated the specificity and ambiguity of the expansion terms for trigram queries. We also looked at the impact on the actual results using domain diversity and expansion edit distance. Results showed that the social-based method provided more precise expansion terms as well as terms that were less ambiguous. Expanded queries do not differ significantly in diversity when expanded using the social-based method (6.72 different domains returned in the first ten results, on average) vs. content-based method (6.73 different domains, on average).
Secure Skyline Queries on Cloud Platform

PubMed Central

Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

2017-01-01

Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions. PMID:28883710
Heuristic query optimization for query multiple table and multiple clausa on mobile finance application

NASA Astrophysics Data System (ADS)

Indrayana, I. N. E.; P, N. M. Wirasyanti D.; Sudiartha, I. KG

2018-01-01

Mobile application allow many users to access data from the application without being limited to space, space and time. Over time the data population of this application will increase. Data access time will cause problems if the data record has reached tens of thousands to millions of records.The objective of this research is to maintain the performance of data execution for large data records. One effort to maintain data access time performance is to apply query optimization method. The optimization used in this research is query heuristic optimization method. The built application is a mobile-based financial application using MySQL database with stored procedure therein. This application is used by more than one business entity in one database, thus enabling rapid data growth. In this stored procedure there is an optimized query using heuristic method. Query optimization is performed on a “Select” query that involves more than one table with multiple clausa. Evaluation is done by calculating the average access time using optimized and unoptimized queries. Access time calculation is also performed on the increase of population data in the database. The evaluation results shown the time of data execution with query heuristic optimization relatively faster than data execution time without using query optimization.
A high performance, ad-hoc, fuzzy query processing system for relational databases

NASA Technical Reports Server (NTRS)

Mansfield, William H., Jr.; Fleischman, Robert M.

1992-01-01

Database queries involving imprecise or fuzzy predicates are currently an evolving area of academic and industrial research. Such queries place severe stress on the indexing and I/O subsystems of conventional database environments since they involve the search of large numbers of records. The Datacycle architecture and research prototype is a database environment that uses filtering technology to perform an efficient, exhaustive search of an entire database. It has recently been modified to include fuzzy predicates in its query processing. The approach obviates the need for complex index structures, provides unlimited query throughput, permits the use of ad-hoc fuzzy membership functions, and provides a deterministic response time largely independent of query complexity and load. This paper describes the Datacycle prototype implementation of fuzzy queries and some recent performance results.
Systems and methods for an extensible business application framework

NASA Technical Reports Server (NTRS)

Bell, David G. (Inventor); Crawford, Michael (Inventor)

2012-01-01

Method and systems for editing data from a query result include requesting a query result using a unique collection identifier for a collection of individual files and a unique identifier for a configuration file that specifies a data structure for the query result. A query result is generated that contains a plurality of fields as specified by the configuration file, by combining each of the individual files associated with a unique identifier for a collection of individual files. The query result data is displayed with a plurality of labels as specified in the configuration file. Edits can be performed by querying a collection of individual files using the configuration file, editing a portion of the query result, and transmitting only the edited information for storage back into a data repository.
Analysis of Information Needs of Users of MEDLINEplus, 2002 – 2003

PubMed Central

Scott-Wright, Alicia; Crowell, Jon; Zeng, Qing; Bates, David W.; Greenes, Robert

2006-01-01

We analyzed query logs from use of MEDLINEplus to answer the questions: Are consumers’ health information needs stable over time? and To what extent do users’ queries change over time? To determine log stability, we assessed an Overlap Rate (OR) defined as the number of unique queries common to two adjacent months divided by the total number of unique queries in those months. All exactly matching queries were considered as one unique query. We measured ORs for the top 10 and 100 unique queries of a month and compared these to ORs for the following month. Over ten months, users submitted 12,234,737 queries; only 2,179,571 (17.8%) were unique and these had a mean word count of 2.73 (S.D., 0.24); 121 of 137 (88.3%) unique queries each comprised of exactly matching search term(s) used at least 5000 times were of only one word. We could predict with 95% confidence that the monthly OR for the top 100 unique queries would lie between 67% – 87% when compared with the top 100 from the previous month. The mean month-to-month OR for top 10 queries was 62% (S.D., 20%) indicating significant variability; the lowest OR of 33% between the top 10 in Mar. compared to Apr. was likely due to “new” interest in information about SARS pneumonia in Apr. 2003. Consumers’ health information needs are relatively stable and the 100 most common unique queries are about 77% the same from month to month. Website sponsors should provide a broad range of information about a relatively stable number of topics. Analyses of log similarity may identify media-induced, cyclical, or seasonal changes in areas of consumer interest. PMID:17238431
Big Data and Dysmenorrhea: What Questions Do Women and Men Ask About Menstrual Pain?

PubMed

Chen, Chen X; Groves, Doyle; Miller, Wendy R; Carpenter, Janet S

2018-04-30

Menstrual pain is highly prevalent among women of reproductive age. As the general public increasingly obtains health information online, Big Data from online platforms provide novel sources to understand the public's perspectives and information needs about menstrual pain. The study's purpose was to describe salient queries about dysmenorrhea using Big Data from a question and answer platform. We performed text-mining of 1.9 billion queries from ChaCha, a United States-based question and answer platform. Dysmenorrhea-related queries were identified by using keyword searching. Each relevant query was split into token words (i.e., meaningful words or phrases) and stop words (i.e., not meaningful functional words). Word Adjacency Graph (WAG) modeling was used to detect clusters of queries and visualize the range of dysmenorrhea-related topics. We constructed two WAG models respectively from queries by women of reproductive age and bymen. Salient themes were identified through inspecting clusters of WAG models. We identified two subsets of queries: Subset 1 contained 507,327 queries from women aged 13-50 years. Subset 2 contained 113,888 queries from men aged 13 or above. WAG modeling revealed topic clusters for each subset. Between female and male subsets, topic clusters overlapped on dysmenorrhea symptoms and management. Among female queries, there were distinctive topics on approaching menstrual pain at school and menstrual pain-related conditions; while among male queries, there was a distinctive cluster of queries on menstrual pain from male's perspectives. Big Data mining of the ChaCha ® question and answer service revealed a series of information needs among women and men on menstrual pain. Findings may be useful in structuring the content and informing the delivery platform for educational interventions.
Modeling shape and topology of low-resolution density maps of biological macromolecules.

PubMed Central

De-Alarcón, Pedro A; Pascual-Montano, Alberto; Gupta, Amarnath; Carazo, Jose M

2002-01-01

In the present work we develop an efficient way of representing the geometry and topology of volumetric datasets of biological structures from medium to low resolution, aiming at storing and querying them in a database framework. We make use of a new vector quantization algorithm to select the points within the macromolecule that best approximate the probability density function of the original volume data. Connectivity among points is obtained with the use of the alpha shapes theory. This novel data representation has a number of interesting characteristics, such as 1) it allows us to automatically segment and quantify a number of important structural features from low-resolution maps, such as cavities and channels, opening the possibility of querying large collections of maps on the basis of these quantitative structural features; 2) it provides a compact representation in terms of size; 3) it contains a subset of three-dimensional points that optimally quantify the densities of medium resolution data; and 4) a general model of the geometry and topology of the macromolecule (as opposite to a spatially unrelated bunch of voxels) is easily obtained by the use of the alpha shapes theory. PMID:12124252
Multiple Query Evaluation Based on an Enhanced Genetic Algorithm.

ERIC Educational Resources Information Center

Tamine, Lynda; Chrisment, Claude; Boughanem, Mohand

2003-01-01

Explains the use of genetic algorithms to combine results from multiple query evaluations to improve relevance in information retrieval. Discusses niching techniques, relevance feedback techniques, and evolution heuristics, and compares retrieval results obtained by both genetic multiple query evaluation and classical single query evaluation…
Relational Algebra and SQL: Better Together

ERIC Educational Resources Information Center

McMaster, Kirby; Sambasivam, Samuel; Hadfield, Steven; Wolthuis, Stuart

2013-01-01

In this paper, we describe how database instructors can teach Relational Algebra and Structured Query Language together through programming. Students write query programs consisting of sequences of Relational Algebra operations vs. Structured Query Language SELECT statements. The query programs can then be run interactively, allowing students to…
A Firefly Algorithm-based Approach for Pseudo-Relevance Feedback: Application to Medical Database.

PubMed

Khennak, Ilyes; Drias, Habiba

2016-11-01

The difficulty of disambiguating the sense of the incomplete and imprecise keywords that are extensively used in the search queries has caused the failure of search systems to retrieve the desired information. One of the most powerful and promising method to overcome this shortcoming and improve the performance of search engines is Query Expansion, whereby the user's original query is augmented by new keywords that best characterize the user's information needs and produce more useful query. In this paper, a new Firefly Algorithm-based approach is proposed to enhance the retrieval effectiveness of query expansion while maintaining low computational complexity. In contrast to the existing literature, the proposed approach uses a Firefly Algorithm to find the best expanded query among a set of expanded query candidates. Moreover, this new approach allows the determination of the length of the expanded query empirically. Experimental results on MEDLINE, the on-line medical information database, show that our proposed approach is more effective and efficient compared to the state-of-the-art.
RiPPAS: A Ring-Based Privacy-Preserving Aggregation Scheme in Wireless Sensor Networks

PubMed Central

Zhang, Kejia; Han, Qilong; Cai, Zhipeng; Yin, Guisheng

2017-01-01

Recently, data privacy in wireless sensor networks (WSNs) has been paid increased attention. The characteristics of WSNs determine that users’ queries are mainly aggregation queries. In this paper, the problem of processing aggregation queries in WSNs with data privacy preservation is investigated. A Ring-based Privacy-Preserving Aggregation Scheme (RiPPAS) is proposed. RiPPAS adopts ring structure to perform aggregation. It uses pseudonym mechanism for anonymous communication and uses homomorphic encryption technique to add noise to the data easily to be disclosed. RiPPAS can handle both sum() queries and min()/max() queries, while the existing privacy-preserving aggregation methods can only deal with sum() queries. For processing sum() queries, compared with the existing methods, RiPPAS has advantages in the aspects of privacy preservation and communication efficiency, which can be proved by theoretical analysis and simulation results. For processing min()/max() queries, RiPPAS provides effective privacy preservation and has low communication overhead. PMID:28178197
Handling Different Spatial Resolutions in Image Fusion by Multivariate Curve Resolution-Alternating Least Squares for Incomplete Image Multisets.

PubMed

Piqueras, Sara; Bedia, Carmen; Beleites, Claudia; Krafft, Christoph; Popp, Jürgen; Maeder, Marcel; Tauler, Romà; de Juan, Anna

2018-06-05

Data fusion of different imaging techniques allows a comprehensive description of chemical and biological systems. Yet, joining images acquired with different spectroscopic platforms is complex because of the different sample orientation and image spatial resolution. Whereas matching sample orientation is often solved by performing suitable affine transformations of rotation, translation, and scaling among images, the main difficulty in image fusion is preserving the spatial detail of the highest spatial resolution image during multitechnique image analysis. In this work, a special variant of the unmixing algorithm Multivariate Curve Resolution Alternating Least Squares (MCR-ALS) for incomplete multisets is proposed to provide a solution for this kind of problem. This algorithm allows analyzing simultaneously images collected with different spectroscopic platforms without losing spatial resolution and ensuring spatial coherence among the images treated. The incomplete multiset structure concatenates images of the two platforms at the lowest spatial resolution with the image acquired with the highest spatial resolution. As a result, the constituents of the sample analyzed are defined by a single set of distribution maps, common to all platforms used and with the highest spatial resolution, and their related extended spectral signatures, covering the signals provided by each of the fused techniques. We demonstrate the potential of the new variant of MCR-ALS for multitechnique analysis on three case studies: (i) a model example of MIR and Raman images of pharmaceutical mixture, (ii) FT-IR and Raman images of palatine tonsil tissue, and (iii) mass spectrometry and Raman images of bean tissue.
Combinational Reasoning of Quantitative Fuzzy Topological Relations for Simple Fuzzy Regions

PubMed Central

Liu, Bo; Li, Dajun; Xia, Yuanping; Ruan, Jian; Xu, Lili; Wu, Huanyi

2015-01-01

In recent years, formalization and reasoning of topological relations have become a hot topic as a means to generate knowledge about the relations between spatial objects at the conceptual and geometrical levels. These mechanisms have been widely used in spatial data query, spatial data mining, evaluation of equivalence and similarity in a spatial scene, as well as for consistency assessment of the topological relations of multi-resolution spatial databases. The concept of computational fuzzy topological space is applied to simple fuzzy regions to efficiently and more accurately solve fuzzy topological relations. Thus, extending the existing research and improving upon the previous work, this paper presents a new method to describe fuzzy topological relations between simple spatial regions in Geographic Information Sciences (GIS) and Artificial Intelligence (AI). Firstly, we propose a new definition for simple fuzzy line segments and simple fuzzy regions based on the computational fuzzy topology. And then, based on the new definitions, we also propose a new combinational reasoning method to compute the topological relations between simple fuzzy regions, moreover, this study has discovered that there are (1) 23 different topological relations between a simple crisp region and a simple fuzzy region; (2) 152 different topological relations between two simple fuzzy regions. In the end, we have discussed some examples to demonstrate the validity of the new method, through comparisons with existing fuzzy models, we showed that the proposed method can compute more than the existing models, as it is more expressive than the existing fuzzy models. PMID:25775452
The construction of the spatio-temporal database of the ancient Silk Road within Xinjiang province during the Han and Tang dynasties

NASA Astrophysics Data System (ADS)

Bi, Jiantao; Luo, Guilin; Wang, Xingxing; Zhu, Zuojia

2014-03-01

As the bridge over the Chinese and Western civilization, the ancient Silk Road has made a huge contribution to cultural, economic, political exchanges between China and western countries. In this paper, we treated the historical period of Western Han Dynasty, Eastern Han Dynasty and Tang Dynasty as the research time domain, and the Western Regions' countries that were existed along the Silk Road at the mean time as the research spatial domain. Then we imported these data into the SQL Server database we constructed, from which we could either query the attribute information such as population, military force, the era of the Central Plains empire, the significant events taking place in the country and some related attribute information of these events like the happened calendar year in addition to some related spatial information such as the present location, the coordinates of the capital and the territory by inputting the name of the Western countries. At the same time we could query the significant events, government institution in Central Plains and the existent Western countries at the mean time by inputting the calendar year. Based on the database, associated with GIS, RS, Flex, C# and other related information technology and network technology, we could not only browsing, searching and editing the information of the ancient Silk Road in Xinjiang Province during the Han and Tang Dynasties, but preliminary analysing as well. This is the combination of archaeology and modern information technology, and the database could also be a reference to further study, research and practice in the related fields in the future.
a Webgis for the Knowledge and Conservation of the Historical Wall Structures of the 13TH-18TH Centuries

NASA Astrophysics Data System (ADS)

Vacca, G.; Pili, D.; Fiorino, D. R.; Pintus, V.

2017-05-01

The presented work is part of the research project, titled "Tecniche murarie tradizionali: conoscenza per la conservazione ed il miglioramento prestazionale" (Traditional building techniques: from knowledge to conservation and performance improvement), with the purpose of studying the building techniques of the 13th-18th centuries in the Sardinia Region (Italy) for their knowledge, conservation, and promotion. The end purpose of the entire study is to improve the performance of the examined structures. In particular, the task of the authors within the research project was to build a WebGIS to manage the data collected during the examination and study phases. This infrastructure was entirely built using Open Source software. The work consisted of designing a database built in PostgreSQL and its spatial extension PostGIS, which allows to store and manage feature geometries and spatial data. The data input is performed via a form built in HTML and PHP. The HTML part is based on Bootstrap, an open tools library for websites and web applications. The implementation of this template used both PHP and Javascript code. The PHP code manages the reading and writing of data to the database, using embedded SQL queries. As of today, we surveyed and archived more than 300 buildings, belonging to three main macro categories: fortification architectures, religious architectures, residential architectures. The masonry samples investigated in relation to the construction techniques are more than 150. The database is published on the Internet as a WebGIS built using the Leaflet Javascript open libraries, which allows creating map sites with background maps and navigation, input and query tools. This too uses an interaction of HTML, Javascript, PHP and SQL code.
Dynamic Querying of Mass-Storage RDF Data with Rule-Based Entailment Regimes

NASA Astrophysics Data System (ADS)

Ianni, Giovambattista; Krennwallner, Thomas; Martello, Alessandra; Polleres, Axel

RDF Schema (RDFS) as a lightweight ontology language is gaining popularity and, consequently, tools for scalable RDFS inference and querying are needed. SPARQL has become recently a W3C standard for querying RDF data, but it mostly provides means for querying simple RDF graphs only, whereas querying with respect to RDFS or other entailment regimes is left outside the current specification. In this paper, we show that SPARQL faces certain unwanted ramifications when querying ontologies in conjunction with RDF datasets that comprise multiple named graphs, and we provide an extension for SPARQL that remedies these effects. Moreover, since RDFS inference has a close relationship with logic rules, we generalize our approach to select a custom ruleset for specifying inferences to be taken into account in a SPARQL query. We show that our extensions are technically feasible by providing benchmark results for RDFS querying in our prototype system GiaBATA, which uses Datalog coupled with a persistent Relational Database as a back-end for implementing SPARQL with dynamic rule-based inference. By employing different optimization techniques like magic set rewriting our system remains competitive with state-of-the-art RDFS querying systems.
Mining the SDSS SkyServer SQL queries log

NASA Astrophysics Data System (ADS)

Hirota, Vitor M.; Santos, Rafael; Raddick, Jordan; Thakar, Ani

2016-05-01

SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) astronomic catalog, provides a set of tools that allows data access for astronomers and scientific education. One of SkyServer data access interfaces allows users to enter ad-hoc SQL statements to query the catalog. SkyServer also presents some template queries that can be used as basis for more complex queries. This interface has logged over 330 million queries submitted since 2001. It is expected that analysis of this data can be used to investigate usage patterns, identify potential new classes of queries, find similar queries, etc. and to shed some light on how users interact with the Sloan Digital Sky Survey data and how scientists have adopted the new paradigm of e-Science, which could in turn lead to enhancements on the user interfaces and experience in general. In this paper we review some approaches to SQL query mining, apply the traditional techniques used in the literature and present lessons learned, namely, that the general text mining approach for feature extraction and clustering does not seem to be adequate for this type of data, and, most importantly, we find that this type of analysis can result in very different queries being clustered together.

Applying Query Structuring in Cross-language Retrieval.

ERIC Educational Resources Information Center

Pirkola, Ari; Puolamaki, Deniz; Jarvelin, Kalervo

2003-01-01

Explores ways to apply query structuring in cross-language information retrieval. Tested were: English queries translated into Finnish using an electronic dictionary, and run in a Finnish newspaper databases; effects of compound-based structuring using a proximity operator for translation equivalents of query language compound components; and a…
Querying and Ranking XML Documents.

ERIC Educational Resources Information Center

Schlieder, Torsten; Meuss, Holger

2002-01-01

Discussion of XML, information retrieval, precision, and recall focuses on a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Topics include a query model based on tree matching; structured queries and term-based ranking; and term frequency and…
Advanced Query Formulation in Deductive Databases.

ERIC Educational Resources Information Center

Niemi, Timo; Jarvelin, Kalervo

1992-01-01

Discusses deductive databases and database management systems (DBMS) and introduces a framework for advanced query formulation for end users. Recursive processing is described, a sample extensional database is presented, query types are explained, and criteria for advanced query formulation from the end user's viewpoint are examined. (31…
A Semantic Graph Query Language

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kaplan, I L

2006-10-16

Semantic graphs can be used to organize large amounts of information from a number of sources into one unified structure. A semantic query language provides a foundation for extracting information from the semantic graph. The graph query language described here provides a simple, powerful method for querying semantic graphs.
Using Common Table Expressions to Build a Scalable Boolean Query Generator for Clinical Data Warehouses

PubMed Central

Harris, Daniel R.; Henderson, Darren W.; Kavuluru, Ramakanth; Stromberg, Arnold J.; Johnson, Todd R.

2015-01-01

We present a custom, Boolean query generator utilizing common-table expressions (CTEs) that is capable of scaling with big datasets. The generator maps user-defined Boolean queries, such as those interactively created in clinical-research and general-purpose healthcare tools, into SQL. We demonstrate the effectiveness of this generator by integrating our work into the Informatics for Integrating Biology and the Bedside (i2b2) query tool and show that it is capable of scaling. Our custom generator replaces and outperforms the default query generator found within the Clinical Research Chart (CRC) cell of i2b2. In our experiments, sixteen different types of i2b2 queries were identified by varying four constraints: date, frequency, exclusion criteria, and whether selected concepts occurred in the same encounter. We generated non-trivial, random Boolean queries based on these 16 types; the corresponding SQL queries produced by both generators were compared by execution times. The CTE-based solution significantly outperformed the default query generator and provided a much more consistent response time across all query types (M=2.03, SD=6.64 vs. M=75.82, SD=238.88 seconds). Without costly hardware upgrades, we provide a scalable solution based on CTEs with very promising empirical results centered on performance gains. The evaluation methodology used for this provides a means of profiling clinical data warehouse performance. PMID:25192572
Query Language for Location-Based Services: A Model Checking Approach

NASA Astrophysics Data System (ADS)

Hoareau, Christian; Satoh, Ichiro

We present a model checking approach to the rationale, implementation, and applications of a query language for location-based services. Such query mechanisms are necessary so that users, objects, and/or services can effectively benefit from the location-awareness of their surrounding environment. The underlying data model is founded on a symbolic model of space organized in a tree structure. Once extended to a semantic model for modal logic, we regard location query processing as a model checking problem, and thus define location queries as hybrid logicbased formulas. Our approach is unique to existing research because it explores the connection between location models and query processing in ubiquitous computing systems, relies on a sound theoretical basis, and provides modal logic-based query mechanisms for expressive searches over a decentralized data structure. A prototype implementation is also presented and will be discussed.
Query Expansion and Query Translation as Logical Inference.

ERIC Educational Resources Information Center

Nie, Jian-Yun

2003-01-01

Examines query expansion during query translation in cross language information retrieval and develops a general framework for inferential information retrieval in two particular contexts: using fuzzy logic and probability theory. Obtains evaluation formulas that are shown to strongly correspond to those used in other information retrieval models.…
End-User Use of Data Base Query Language: Pros and Cons.

ERIC Educational Resources Information Center

Nicholes, Walter

1988-01-01

Man-machine interface, the concept of a computer "query," a review of database technology, and a description of the use of query languages at Brigham Young University are discussed. The pros and cons of end-user use of database query languages are explored. (Author/MLW)
Information Retrieval Using UMLS-based Structured Queries

PubMed Central

Fagan, Lawrence M.; Berrios, Daniel C.; Chan, Albert; Cucina, Russell; Datta, Anupam; Shah, Maulik; Surendran, Sujith

2001-01-01

During the last three years, we have developed and described components of ELBook, a semantically based information-retrieval system [1-4]. Using these components, domain experts can specify a query model, indexers can use the query model to index documents, and end-users can search these documents for instances of indexed queries.
A Relational Algebra Query Language for Programming Relational Databases

ERIC Educational Resources Information Center

McMaster, Kirby; Sambasivam, Samuel; Anderson, Nicole

2011-01-01

In this paper, we describe a Relational Algebra Query Language (RAQL) and Relational Algebra Query (RAQ) software product we have developed that allows database instructors to teach relational algebra through programming. Instead of defining query operations using mathematical notation (the approach commonly taken in database textbooks), students…
A spatial database for landslides in northern Bavaria: A methodological approach

NASA Astrophysics Data System (ADS)

Jäger, Daniel; Kreuzer, Thomas; Wilde, Martina; Bemm, Stefan; Terhorst, Birgit

2018-04-01

Landslide databases provide essential information for hazard modeling, damages on buildings and infrastructure, mitigation, and research needs. This study presents the development of a landslide database system named WISL (Würzburg Information System on Landslides), currently storing detailed landslide data for northern Bavaria, Germany, in order to enable scientific queries as well as comparisons with other regional landslide inventories. WISL is based on free open source software solutions (PostgreSQL, PostGIS) assuring good correspondence of the various softwares and to enable further extensions with specific adaptions of self-developed software. Apart from that, WISL was designed to be particularly compatible for easy communication with other databases. As a central pre-requisite for standardized, homogeneous data acquisition in the field, a customized data sheet for landslide description was compiled. This sheet also serves as an input mask for all data registration procedures in WISL. A variety of "in-database" solutions for landslide analysis provides the necessary scalability for the database, enabling operations at the local server. In its current state, WISL already enables extensive analysis and queries. This paper presents an example analysis of landslides in Oxfordian Limestones in the northeastern Franconian Alb, northern Bavaria. The results reveal widely differing landslides in terms of geometry and size. Further queries related to landslide activity classifies the majority of the landslides as currently inactive, however, they clearly possess a certain potential for remobilization. Along with some active mass movements, a significant percentage of landslides potentially endangers residential areas or infrastructure. The main aspect of future enhancements of the WISL database is related to data extensions in order to increase research possibilities, as well as to transfer the system to other regions and countries.
Bim-Gis Integrated Geospatial Information Model Using Semantic Web and Rdf Graphs

NASA Astrophysics Data System (ADS)

Hor, A.-H.; Jadidi, A.; Sohn, G.

2016-06-01

In recent years, 3D virtual indoor/outdoor urban modelling becomes a key spatial information framework for many civil and engineering applications such as evacuation planning, emergency and facility management. For accomplishing such sophisticate decision tasks, there is a large demands for building multi-scale and multi-sourced 3D urban models. Currently, Building Information Model (BIM) and Geographical Information Systems (GIS) are broadly used as the modelling sources. However, data sharing and exchanging information between two modelling domains is still a huge challenge; while the syntactic or semantic approaches do not fully provide exchanging of rich semantic and geometric information of BIM into GIS or vice-versa. This paper proposes a novel approach for integrating BIM and GIS using semantic web technologies and Resources Description Framework (RDF) graphs. The novelty of the proposed solution comes from the benefits of integrating BIM and GIS technologies into one unified model, so-called Integrated Geospatial Information Model (IGIM). The proposed approach consists of three main modules: BIM-RDF and GIS-RDF graphs construction, integrating of two RDF graphs, and query of information through IGIM-RDF graph using SPARQL. The IGIM generates queries from both the BIM and GIS RDF graphs resulting a semantically integrated model with entities representing both BIM classes and GIS feature objects with respect to the target-client application. The linkage between BIM-RDF and GIS-RDF is achieved through SPARQL endpoints and defined by a query using set of datasets and entity classes with complementary properties, relationships and geometries. To validate the proposed approach and its performance, a case study was also tested using IGIM system design.
An Ensemble Approach for Expanding Queries

DTIC Science & Technology

2012-11-01

0.39 pain^0.39 Hospital 15094 0.82 hospital^0.82 Miscarriage 45 3.35 miscarriage ^3.35 Radiotherapy 53 3.28 radiotherapy^3.28 Hypoaldosteronism 3...negated query is the expansion of the original query with negation terms preceding each word. For example, the negated version of “ miscarriage ^3.35...includes “no miscarriage ”^3.35 and “not miscarriage ”^3.35. If a document is the result of both original query and negated query, its score is
A Story of a Crashed Plane in US-Mexican border

NASA Astrophysics Data System (ADS)

Bermudez, Luis; Hobona, Gobe; Vretanos, Peter; Peterson, Perry

2013-04-01

A plane has crashed on the US-Mexican border. The search and rescue command center planner needs to find information about the crash site, a mountain, nearby mountains for the establishment of a communications tower, as well as ranches for setting up a local incident center. Events like this one occur all over the world and exchanging information seamlessly is key to save lives and prevent further disasters. This abstract describes an interoperability testbed that applied this scenario using technologies based on Open Geospatial Consortium (OGC) standards. The OGC, which has about 500 members, serves as a global forum for the collaboration of developers and users of spatial data products and services, and to advance the development of international standards for geospatial interoperability. The OGC Interoperability Program conducts international interoperability testbeds, such as the OGC Web Services Phase 9 (OWS-9), that encourages rapid development, testing, validation, demonstration and adoption of open, consensus based standards and best practices. The Cross-Community Interoperability (CCI) thread in OWS-9 advanced the Web Feature Service for Gazetteers (WFS-G) by providing a Single Point of Entry Global Gazetteer (SPEGG), where a user can submit a single query and access global geographic names data across multiple Federal names databases. Currently users must make two queries with differing input parameters against two separate databases to obtain authoritative cross border geographic names data. The gazetteers in this scenario included: GNIS and GNS. GNIS or Geographic Names Information System is managed by USGS. It was first developed in 1964 and contains information about domestic and Antarctic names. GNS or GeoNET Names Server provides the Geographic Names Data Base (GNDB) and it is managed by National Geospatial Intelligence Agency (NGA). GNS has been in service since 1994, and serves names for areas outside the United States and its dependent areas, as well as names for undersea features. The following challenges were advanced: Cascaded WFS-G servers (allowing to query multiple WFSs with a "parent" WFS), implemented query names filters (e.g. fuzzy search, text search), implemented dealing with multilingualism and diacritics, implemented advanced spatial constraints (e.g. search by radial search and nearest neighbor) and semantically mediated feature types (e.g. mountain vs. hill). To enable semantic mediation, a series of semantic mappings were defined between the NGA GNS, USGS GNIS and the Alexandria Digital Library (ADL) Gazetteer. The mappings were encoded in the Web Ontology Language (OWL) to enable them to be used by semantic web technologies. The semantic mappings were then published for ingestion into a semantic mediator that used the mappings to associate location types from one gazetteer with location types in another. The semantic mediator was then able to transform requests on the fly, providing a single point of entry WFS-G to multiple gazetteers. The presentation will provide a live presentation of the work performed, highlight main developments, and discuss future development.
CCTop: An Intuitive, Flexible and Reliable CRISPR/Cas9 Target Prediction Tool

PubMed Central

del Sol Keyer, Maria; Wittbrodt, Joachim; Mateo, Juan L.

2015-01-01

Engineering of the CRISPR/Cas9 system has opened a plethora of new opportunities for site-directed mutagenesis and targeted genome modification. Fundamental to this is a stretch of twenty nucleotides at the 5’ end of a guide RNA that provides specificity to the bound Cas9 endonuclease. Since a sequence of twenty nucleotides can occur multiple times in a given genome and some mismatches seem to be accepted by the CRISPR/Cas9 complex, an efficient and reliable in silico selection and evaluation of the targeting site is key prerequisite for the experimental success. Here we present the CRISPR/Cas9 target online predictor (CCTop, http://crispr.cos.uni-heidelberg.de) to overcome limitations of already available tools. CCTop provides an intuitive user interface with reasonable default parameters that can easily be tuned by the user. From a given query sequence, CCTop identifies and ranks all candidate sgRNA target sites according to their off-target quality and displays full documentation. CCTop was experimentally validated for gene inactivation, non-homologous end-joining as well as homology directed repair. Thus, CCTop provides the bench biologist with a tool for the rapid and efficient identification of high quality target sites. PMID:25909470
The igmspec database of public spectra probing the intergalactic medium

NASA Astrophysics Data System (ADS)

Prochaska, J. X.

2017-04-01

We describe v02 of igmspec, a database of publicly available ultraviolet, optical, and near-infrared spectra that probe the intergalactic medium (IGM). This database, a child of the specdb repository in the specdb github organization, comprises 403 277 unique sources and 434 686 spectra obtained with the world's greatest observatories. All of these data are distributed in a single ≈ 25GB HDF5 file maintained at the University of California Observatories and the University of California, Santa Cruz. The specdb software package includes Python scripts and modules for searching the source catalog and spectral datasets, and software links to the linetools package for spectral analysis. The repository also includes software to generate private spectral datasets that are compliant with International Virtual Observatory Alliance (IVOA) protocols and a Python-based interface for IVOA Simple Spectral Access queries. Future versions of igmspec will ingest other sources (e.g. gamma-ray burst afterglows) and other surveys as they become publicly available. The overall goal is to include every spectrum that effectively probes the IGM. Future databases of specdb may include publicly available galaxy spectra (exgalspec) and published supernovae spectra (snspec). The community is encouraged to join the effort on github: https://github.com/specdb.
A novel adaptive Cuckoo search for optimal query plan generation.

PubMed

Gomathi, Ramalingam; Sharmila, Dhandapani

2014-01-01

The emergence of multiple web pages day by day leads to the development of the semantic web technology. A World Wide Web Consortium (W3C) standard for storing semantic web data is the resource description framework (RDF). To enhance the efficiency in the execution time for querying large RDF graphs, the evolving metaheuristic algorithms become an alternate to the traditional query optimization methods. This paper focuses on the problem of query optimization of semantic web data. An efficient algorithm called adaptive Cuckoo search (ACS) for querying and generating optimal query plan for large RDF graphs is designed in this research. Experiments were conducted on different datasets with varying number of predicates. The experimental results have exposed that the proposed approach has provided significant results in terms of query execution time. The extent to which the algorithm is efficient is tested and the results are documented.
Query-Based Outlier Detection in Heterogeneous Information Networks.

PubMed

Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

2015-03-01

Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user's search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks.
Query-Based Outlier Detection in Heterogeneous Information Networks

PubMed Central

Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

2015-01-01

Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user’s search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks. PMID:27064397
Querying and Extracting Timeline Information from Road Traffic Sensor Data

PubMed Central

Imawan, Ardi; Indikawati, Fitri Indra; Kwon, Joonho; Rao, Praveen

2016-01-01

The escalation of traffic congestion in urban cities has urged many countries to use intelligent transportation system (ITS) centers to collect historical traffic sensor data from multiple heterogeneous sources. By analyzing historical traffic data, we can obtain valuable insights into traffic behavior. Many existing applications have been proposed with limited analysis results because of the inability to cope with several types of analytical queries. In this paper, we propose the QET (querying and extracting timeline information) system—a novel analytical query processing method based on a timeline model for road traffic sensor data. To address query performance, we build a TQ-index (timeline query-index) that exploits spatio-temporal features of timeline modeling. We also propose an intuitive timeline visualization method to display congestion events obtained from specified query parameters. In addition, we demonstrate the benefit of our system through a performance evaluation using a Busan ITS dataset and a Seattle freeway dataset. PMID:27563900

Improvement of density models of geological structures by fusion of gravity data and cosmic muon radiographies

NASA Astrophysics Data System (ADS)

Jourde, K.; Gibert, D.; Marteau, J.

2015-08-01

This paper examines how the resolution of small-scale geological density models is improved through the fusion of information provided by gravity measurements and density muon radiographies. Muon radiography aims at determining the density of geological bodies by measuring their screening effect on the natural flux of cosmic muons. Muon radiography essentially works like a medical X-ray scan and integrates density information along elongated narrow conical volumes. Gravity measurements are linked to density by a 3-D integration encompassing the whole studied domain. We establish the mathematical expressions of these integration formulas - called acquisition kernels - and derive the resolving kernels that are spatial filters relating the true unknown density structure to the density distribution actually recovered from the available data. The resolving kernel approach allows one to quantitatively describe the improvement of the resolution of the density models achieved by merging gravity data and muon radiographies. The method developed in this paper may be used to optimally design the geometry of the field measurements to be performed in order to obtain a given spatial resolution pattern of the density model to be constructed. The resolving kernels derived in the joined muon-gravimetry case indicate that gravity data are almost useless for constraining the density structure in regions sampled by more than two muon tomography acquisitions. Interestingly, the resolution in deeper regions not sampled by muon tomography is significantly improved by joining the two techniques. The method is illustrated with examples for the La Soufrière volcano of Guadeloupe.
Policy Compliance of Queries for Private Information Retrieval

DTIC Science & Technology

2010-11-01

SPARQL, unfortunately, is not in RDF and so we had to develop tools to translate SPARQL queries into RDF to be used by our policy compliance prototype...policy-assurance/sparql2n3.py) that accepts SPARQL queries and returns the translated query in our simplified ontology. An example of a translated
Knowledge Query Language (KQL)

DTIC Science & Technology

2016-02-12

Lexington Massachusetts This page intentionally left blank. iii EXECUTIVE SUMMARY Currently, queries for data ...retrieval from non-Structured Query Language (NoSQL) data stores are tightly coupled to the specific implementation of the data store implementation...independent of the storage content and format for querying NoSQL or relational data stores. This approach uses address expressions (or A-Expressions
Fragger: a protein fragment picker for structural queries.

PubMed

Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J

2017-01-01

Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of a new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
Query by forms: User-oriented relational database retrieving system and its application in analysis of experiment data

NASA Astrophysics Data System (ADS)

Skotniczny, Zbigniew

1989-12-01

The Query by Forms (QbF) system is a user-oriented interactive tool for querying large relational database with minimal queries difinition cost. The system was worked out under the assumption that user's time and effort for defining needed queries is the most severe bottleneck. The system may be applied in any Rdb/VMS databases system and is recommended for specific information systems of any project where end-user queries cannot be foreseen. The tool is dedicated to specialist of an application domain who have to analyze data maintained in database from any needed point of view, who do not need to know commercial databases languages. The paper presents the system developed as a compromise between its functionality and usability. User-system communication via a menu-driven "tree-like" structure of screen-forms which produces a query difinition and execution is discussed in detail. Output of query results (printed reports and graphics) is also discussed. Finally the paper shows one application of QbF to a HERA-project.
A study on spatial decision support systems for HIV/AIDS prevention based on COM GIS technology

NASA Astrophysics Data System (ADS)

Yang, Kun; Luo, Huasong; Peng, Shungyun; Xu, Quanli

2007-06-01

Based on the deeply analysis of the current status and the existing problems of GIS technology applications in Epidemiology, this paper has proposed the method and process for establishing the spatial decision support systems of AIDS epidemic prevention by integrating the COM GIS, Spatial Database, GPS, Remote Sensing, and Communication technologies, as well as ASP and ActiveX software development technologies. One of the most important issues for constructing the spatial decision support systems of AIDS epidemic prevention is how to integrate the AIDS spreading models with GIS. The capabilities of GIS applications in the AIDS epidemic prevention have been described here in this paper firstly. Then some mature epidemic spreading models have also been discussed for extracting the computation parameters. Furthermore, a technical schema has been proposed for integrating the AIDS spreading models with GIS and relevant geospatial technologies, in which the GIS and model running platforms share a common spatial database and the computing results can be spatially visualized on Desktop or Web GIS clients. Finally, a complete solution for establishing the decision support systems of AIDS epidemic prevention has been offered in this paper based on the model integrating methods and ESRI COM GIS software packages. The general decision support systems are composed of data acquisition sub-systems, network communication sub-systems, model integrating sub-systems, AIDS epidemic information spatial database sub-systems, AIDS epidemic information querying and statistical analysis sub-systems, AIDS epidemic dynamic surveillance sub-systems, AIDS epidemic information spatial analysis and decision support sub-systems, as well as AIDS epidemic information publishing sub-systems based on Web GIS.
Hybrid ontology for semantic information retrieval model using keyword matching indexing system.

PubMed

Uthayan, K R; Mala, G S Anandha

2015-01-01

Ontology is the process of growth and elucidation of concepts of an information domain being common for a group of users. Establishing ontology into information retrieval is a normal method to develop searching effects of relevant information users require. Keywords matching process with historical or information domain is significant in recent calculations for assisting the best match for specific input queries. This research presents a better querying mechanism for information retrieval which integrates the ontology queries with keyword search. The ontology-based query is changed into a primary order to predicate logic uncertainty which is used for routing the query to the appropriate servers. Matching algorithms characterize warm area of researches in computer science and artificial intelligence. In text matching, it is more dependable to study semantics model and query for conditions of semantic matching. This research develops the semantic matching results between input queries and information in ontology field. The contributed algorithm is a hybrid method that is based on matching extracted instances from the queries and information field. The queries and information domain is focused on semantic matching, to discover the best match and to progress the executive process. In conclusion, the hybrid ontology in semantic web is sufficient to retrieve the documents when compared to standard ontology.
Hybrid Ontology for Semantic Information Retrieval Model Using Keyword Matching Indexing System

PubMed Central

Uthayan, K. R.; Anandha Mala, G. S.

2015-01-01

Ontology is the process of growth and elucidation of concepts of an information domain being common for a group of users. Establishing ontology into information retrieval is a normal method to develop searching effects of relevant information users require. Keywords matching process with historical or information domain is significant in recent calculations for assisting the best match for specific input queries. This research presents a better querying mechanism for information retrieval which integrates the ontology queries with keyword search. The ontology-based query is changed into a primary order to predicate logic uncertainty which is used for routing the query to the appropriate servers. Matching algorithms characterize warm area of researches in computer science and artificial intelligence. In text matching, it is more dependable to study semantics model and query for conditions of semantic matching. This research develops the semantic matching results between input queries and information in ontology field. The contributed algorithm is a hybrid method that is based on matching extracted instances from the queries and information field. The queries and information domain is focused on semantic matching, to discover the best match and to progress the executive process. In conclusion, the hybrid ontology in semantic web is sufficient to retrieve the documents when compared to standard ontology. PMID:25922851
Triage by ranking to support the curation of protein interactions

PubMed Central

Pasche, Emilie; Gobeill, Julien; Rech de Laval, Valentine; Gleizes, Anne; Michel, Pierre-André; Bairoch, Amos

2017-01-01

Abstract Today, molecular biology databases are the cornerstone of knowledge sharing for life and health sciences. The curation and maintenance of these resources are labour intensive. Although text mining is gaining impetus among curators, its integration in curation workflow has not yet been widely adopted. The Swiss Institute of Bioinformatics Text Mining and CALIPHO groups joined forces to design a new curation support system named nextA5. In this report, we explore the integration of novel triage services to support the curation of two types of biological data: protein–protein interactions (PPIs) and post-translational modifications (PTMs). The recognition of PPIs and PTMs poses a special challenge, as it not only requires the identification of biological entities (proteins or residues), but also that of particular relationships (e.g. binding or position). These relationships cannot be described with onto-terminological descriptors such as the Gene Ontology for molecular functions, which makes the triage task more challenging. Prioritizing papers for these tasks thus requires the development of different approaches. In this report, we propose a new method to prioritize articles containing information specific to PPIs and PTMs. The new resources (RESTful APIs, semantically annotated MEDLINE library) enrich the neXtA5 platform. We tuned the article prioritization model on a set of 100 proteins previously annotated by the CALIPHO group. The effectiveness of the triage service was tested with a dataset of 200 annotated proteins. We defined two sets of descriptors to support automatic triage: the first set to enrich for papers with PPI data, and the second for PTMs. All occurrences of these descriptors were marked-up in MEDLINE and indexed, thus constituting a semantically annotated version of MEDLINE. These annotations were then used to estimate the relevance of a particular article with respect to the chosen annotation type. This relevance score was combined with a local vector-space search engine to generate a ranked list of PMIDs. We also evaluated a query refinement strategy, which adds specific keywords (such as ‘binds’ or ‘interacts’) to the original query. Compared to PubMed, the search effectiveness of the nextA5 triage service is improved by 190% for the prioritization of papers with PPIs information and by 260% for papers with PTMs information. Combining advanced retrieval and query refinement strategies with automatically enriched MEDLINE contents is effective to improve triage in complex curation tasks such as the curation of protein PPIs and PTMs. Database URL: http://candy.hesge.ch/nextA5 PMID:29220432
The role of economics in the QUERI program: QUERI Series

PubMed Central

Smith, Mark W; Barnett, Paul G

2008-01-01

Background The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. Methods We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Results Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Conclusion Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics. PMID:18430199
The role of economics in the QUERI program: QUERI Series.

PubMed

Smith, Mark W; Barnett, Paul G

2008-04-22

The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics.
East-China Geochemistry Database (ECGD):A New Networking Database for North China Craton

NASA Astrophysics Data System (ADS)

Wang, X.; Ma, W.

2010-12-01

North China Craton is one of the best natural laboratories that research some Earth Dynamic questions[1]. Scientists made much progress in research on this area, and got vast geochemistry data, which are essential for answering many fundamental questions about the age, composition, structure, and evolution of the East China area. But the geochemical data have long been accessible only through the scientific literature and theses where they have been widely dispersed, making it difficult for the broad Geosciences community to find, access and efficiently use the full range of available data[2]. How to effectively store, manage, share and reuse the existing geochemical data in the North China Craton area? East-China Geochemistry Database(ECGD) is a networking geochemical scientific database system that has been designed based on WebGIS and relational database for the structured storage and retrieval of geochemical data and geological map information. It is integrated the functions of data retrieval, spatial visualization and online analysis. ECGD focus on three areas: 1.Storage and retrieval of geochemical data and geological map information. Research on the characters of geochemical data, including its composing and connecting of each other, we designed a relational database, which based on geochemical relational data model, to store a variety of geological sample information such as sampling locality, age, sample characteristics, reference, major elements, rare earth elements, trace elements and isotope system et al. And a web-based user-friendly interface is provided for constructing queries. 2.Data view. ECGD is committed to online data visualization by different ways, especially to view data in digital map with dynamic way. Because ECGD was integrated WebGIS technology, the query results can be mapped on digital map, which can be zoomed, translation and dot selection. Besides of view and output query results data by html, txt or xls formats, researchers also can generate classification thematic maps using query results, according different parameters. 3.Data analysis on-line. Here we designed lots of geochemical online analysis tools, including geochemical diagrams, CIPW computing, and so on, which allows researchers to analyze query data without download query results. Operation of all these analysis tools is very easy; users just do it by click mouse one or two time. In summary, ECGD provide a geochemical platform for researchers, whom to know where various data are, to view various data in a synthetic and dynamic way, and analyze interested data online. REFERENCES [1] S. Gao, R.L. Rudnick, and W.L. Xu, “Recycling deep cratonic lithosphere and generation of intraplate magmatism in the North China Craton,” Earth and Planetary Science Letters,270,41-53,2008. [2] K.A. Lehnert, U. Harms, and E. Ito, “Promises, Achievements, and Challenges of Networking Global Geoinformatics Resources - Experiences of GeosciNET and EarthChem,” Geophysical Research Abstracts, Vol.10, EGU2008-A-05242,2008.
Geospatial Data Management Platform for Urban Groundwater

NASA Astrophysics Data System (ADS)

Gaitanaru, D.; Priceputu, A.; Gogu, C. R.

2012-04-01

Due to the large amount of civil work projects and research studies, large quantities of geo-data are produced for the urban environments. These data are usually redundant as well as they are spread in different institutions or private companies. Time consuming operations like data processing and information harmonisation represents the main reason to systematically avoid the re-use of data. The urban groundwater data shows the same complex situation. The underground structures (subway lines, deep foundations, underground parkings, and others), the urban facility networks (sewer systems, water supply networks, heating conduits, etc), the drainage systems, the surface water works and many others modify continuously. As consequence, their influence on groundwater changes systematically. However, these activities provide a large quantity of data, aquifers modelling and then behaviour prediction can be done using monitored quantitative and qualitative parameters. Due to the rapid evolution of technology in the past few years, transferring large amounts of information through internet has now become a feasible solution for sharing geoscience data. Furthermore, standard platform-independent means to do this have been developed (specific mark-up languages like: GML, GeoSciML, WaterML, GWML, CityML). They allow easily large geospatial databases updating and sharing through internet, even between different companies or between research centres that do not necessarily use the same database structures. For Bucharest City (Romania) an integrated platform for groundwater geospatial data management is developed under the framework of a national research project - "Sedimentary media modeling platform for groundwater management in urban areas" (SIMPA) financed by the National Authority for Scientific Research of Romania. The platform architecture is based on three components: a geospatial database, a desktop application (a complex set of hydrogeological and geological analysis tools) and a front-end geoportal service. The SIMPA platform makes use of mark-up transfer standards to provide a user-friendly application that can be accessed through internet to query, analyse, and visualise geospatial data related to urban groundwater. The platform holds the information within the local groundwater geospatial databases and the user is able to access this data through a geoportal service. The database architecture allows storing accurate and very detailed geological, hydrogeological, and infrastructure information that can be straightforwardly generalized and further upscaled. The geoportal service offers the possibility of querying a dataset from the spatial database. The query is coded in a standard mark-up language, and sent to the server through a standard Hyper Text Transfer Protocol (http) to be processed by the local application. After the validation of the query, the results are sent back to the user to be displayed by the geoportal application. The main advantage of the SIMPA platform is that it offers to the user the possibility to make a primary multi-criteria query, which results in a smaller set of data to be analysed afterwards. This improves both the transfer process parameters and the user's means of creating the desired query.
Processing SPARQL queries with regular expressions in RDF databases

PubMed Central

2011-01-01

Background As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users’ requests for extracting information from the RDF data as well as the lack of users’ knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. Results In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns. PMID:21489225
Processing SPARQL queries with regular expressions in RDF databases.

PubMed

Lee, Jinsoo; Pham, Minh-Duc; Lee, Jihwan; Han, Wook-Shin; Cho, Hune; Yu, Hwanjo; Lee, Jeong-Hoon

2011-03-29

As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users' requests for extracting information from the RDF data as well as the lack of users' knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.
Exploring performance issues for a clinical database organized using an entity-attribute-value representation.

PubMed

Chen, R S; Nadkarni, P; Marenco, L; Levin, F; Erdos, J; Miller, P L

2000-01-01

The entity-attribute-value representation with classes and relationships (EAV/CR) provides a flexible and simple database schema to store heterogeneous biomedical data. In certain circumstances, however, the EAV/CR model is known to retrieve data less efficiently than conventionally based database schemas. To perform a pilot study that systematically quantifies performance differences for database queries directed at real-world microbiology data modeled with EAV/CR and conventional representations, and to explore the relative merits of different EAV/CR query implementation strategies. Clinical microbiology data obtained over a ten-year period were stored using both database models. Query execution times were compared for four clinically oriented attribute-centered and entity-centered queries operating under varying conditions of database size and system memory. The performance characteristics of three different EAV/CR query strategies were also examined. Performance was similar for entity-centered queries in the two database models. Performance in the EAV/CR model was approximately three to five times less efficient than its conventional counterpart for attribute-centered queries. The differences in query efficiency became slightly greater as database size increased, although they were reduced with the addition of system memory. The authors found that EAV/CR queries formulated using multiple, simple SQL statements executed in batch were more efficient than single, large SQL statements. This paper describes a pilot project to explore issues in and compare query performance for EAV/CR and conventional database representations. Although attribute-centered queries were less efficient in the EAV/CR model, these inefficiencies may be addressable, at least in part, by the use of more powerful hardware or more memory, or both.
Correlation between National Influenza Surveillance Data and Search Queries from Mobile Devices and Desktops in South Korea

PubMed Central

Seo, Dong-Woo; Sohn, Chang Hwan; Kim, Sung-Hoon; Ryoo, Seung Mok; Lee, Yoon-Seon; Lee, Jae Ho; Kim, Won Young; Lim, Kyoung Soo

2016-01-01

Background Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that the mobile search volume surpasses the desktop search volume and mobile search patterns differ from desktop search patterns, the previous digital surveillance systems did not distinguish mobile and desktop search queries. The purpose of this study was to compare the performance of mobile and desktop search queries in terms of digital influenza surveillance. Methods and Results The study period was from September 6, 2010 through August 30, 2014, which consisted of four epidemiological years. Influenza-like illness (ILI) and virologic surveillance data from the Korea Centers for Disease Control and Prevention were used. A total of 210 combined queries from our previous survey work were used for this study. Mobile and desktop weekly search data were extracted from Naver, which is the largest search engine in Korea. Spearman’s correlation analysis was used to examine the correlation of the mobile and desktop data with ILI and virologic data in Korea. We also performed lag correlation analysis. We observed that the influenza surveillance performance of mobile search queries matched or exceeded that of desktop search queries over time. The mean correlation coefficients of mobile search queries and the number of queries with an r-value of ≥ 0.7 equaled or became greater than those of desktop searches over the four epidemiological years. A lag correlation analysis of up to two weeks showed similar trends. Conclusion Our study shows that mobile search queries for influenza surveillance have equaled or even become greater than desktop search queries over time. In the future development of influenza surveillance using search queries, the recognition of changing trend of mobile search data could be necessary. PMID:27391028
Correlation between National Influenza Surveillance Data and Search Queries from Mobile Devices and Desktops in South Korea.

PubMed

Shin, Soo-Yong; Kim, Taerim; Seo, Dong-Woo; Sohn, Chang Hwan; Kim, Sung-Hoon; Ryoo, Seung Mok; Lee, Yoon-Seon; Lee, Jae Ho; Kim, Won Young; Lim, Kyoung Soo

2016-01-01

Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that the mobile search volume surpasses the desktop search volume and mobile search patterns differ from desktop search patterns, the previous digital surveillance systems did not distinguish mobile and desktop search queries. The purpose of this study was to compare the performance of mobile and desktop search queries in terms of digital influenza surveillance. The study period was from September 6, 2010 through August 30, 2014, which consisted of four epidemiological years. Influenza-like illness (ILI) and virologic surveillance data from the Korea Centers for Disease Control and Prevention were used. A total of 210 combined queries from our previous survey work were used for this study. Mobile and desktop weekly search data were extracted from Naver, which is the largest search engine in Korea. Spearman's correlation analysis was used to examine the correlation of the mobile and desktop data with ILI and virologic data in Korea. We also performed lag correlation analysis. We observed that the influenza surveillance performance of mobile search queries matched or exceeded that of desktop search queries over time. The mean correlation coefficients of mobile search queries and the number of queries with an r-value of ≥ 0.7 equaled or became greater than those of desktop searches over the four epidemiological years. A lag correlation analysis of up to two weeks showed similar trends. Our study shows that mobile search queries for influenza surveillance have equaled or even become greater than desktop search queries over time. In the future development of influenza surveillance using search queries, the recognition of changing trend of mobile search data could be necessary.
Searching for cancer information on the internet: analyzing natural language search queries.

PubMed

Bader, Judith L; Theofanos, Mary Frances

2003-12-11

Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine result. To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared >or= 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized. Natural-language searching affords users the opportunity to fully express their information needs and can aid users naïve to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience.
Searching for Cancer Information on the Internet: Analyzing Natural Language Search Queries

PubMed Central

Theofanos, Mary Frances

2003-01-01

Background Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine result. Objective To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. Methods The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Results Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized. Conclusions Natural-language searching affords users the opportunity to fully express their information needs and can aid users naïve to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience. PMID:14713659

Inter-specific territoriality in a Canis hybrid zone: spatial segregation between wolves, coyotes, and hybrids.

PubMed

Benson, John F; Patterson, Brent R

2013-12-01

Gray wolves (Canis lupus) and coyotes (Canis latrans) generally exhibit intraspecific territoriality manifesting in spatial segregation between adjacent packs. However, previous studies have found a high degree of interspecific spatial overlap between sympatric wolves and coyotes. Eastern wolves (Canis lycaon) are the most common wolf in and around Algonquin Provincial Park (APP), Ontario, Canada and hybridize with sympatric gray wolves and coyotes. We hypothesized that all Canis types (wolves, coyotes, and hybrids) exhibit a high degree of spatial segregation due to greater genetic, morphologic, and ecological similarities between wolves and coyotes in this hybrid system compared with western North American ecosystems. We used global positioning system telemetry and probabilistic measures of spatial overlap to investigate spatial segregation between adjacent Canis packs. Our hypothesis was supported as: (1) the probability of locating wolves, coyotes, and hybrids within home ranges ([Formula: see text] = 0.05) or core areas ([Formula: see text] < 0.01) of adjacent packs was low; and (2) the amount of shared space use was negligible. Spatial segregation did not vary substantially in relation to genotypes of adjacent packs or local environmental conditions (i.e., harvest regulations or road densities). We provide the first telemetry-based demonstration of spatial segregation between wolves and coyotes, highlighting the novel relationships between Canis types in the Ontario hybrid zone relative to areas where wolves and coyotes are reproductively isolated. Territoriality among Canis may increase the likelihood of eastern wolves joining coyote and hybrid packs, facilitate hybridization, and could play a role in limiting expansion of the genetically distinct APP eastern wolf population.
Nebhydro: Sharing Geospatial Data to Supportwater Management in Nebraska

NASA Astrophysics Data System (ADS)

Kamble, B.; Irmak, A.; Hubbard, K.; Deogun, J.; Dvorak, B.

2012-12-01

Recent advances in web-enabled geographical technologies have the potential to make a dramatic impact on development of highly interactive spatial applications on the web for visualization of large-scale geospatial data by water resources and irrigation scientists. Spatial and point scale water resources data visualization are an emerging and challenging application domain. Query based visual explorations of geospatial hydrological data can play an important role in stimulating scientific hypotheses and seeking causal relationships among hydro variables. The Nebraska Hydrological Information System (NebHydro) utilizes ESRI's ArcGIS server technology to increase technological awareness among farmers, irrigation managers and policy makers. Web-based geospatial applications are an effective way to expose scientific hydrological datasets to the research community and the public. NebHydro uses Adobe Flex technology to offer an online visualization and data analysis system for presentation of social and economic data. Internet mapping services is an integrated product of GIS and Internet technologies; it is a favored solution to achieve the interoperability of GIS. The development of Internet based GIS services in the state of Nebraska showcases the benefits of sharing geospatial hydrological data among agencies, resource managers and policy makers. Geospatial hydrological Information (Evapotranspiration from Remote Sensing, vegetation indices (NDVI), USGS Stream gauge data, Climatic data etc.) is generally generated through model simulation (METRIC, SWAP, Linux, Python based scripting etc). Information is compiled into and stored within object oriented relational spatial databases using a geodatabase information model that supports the key data types needed by applications including features, relationships, networks, imagery, terrains, maps and layers. The system provides online access, querying, visualization, and analysis of the hydrological data from several sources at one place. The study indicates that internet GIS, developed using advanced technologies, provides valuable education potential to users in hydrology and irrigation engineering and suggests that such a system can support advanced hydrological data access and analysis tools to improve utility of data in operations. Keywords: Hydrological Information System, NebHydro, Water Management, data sharing, data visualization, ArcGIS server.
The Huaihe Basin Water Resource and Water Quality Management Platform Implemented with a Spatio-Temporal Data Model

NASA Astrophysics Data System (ADS)

Liu, Y.; Zhang, W.; Yan, C.

2012-07-01

Presently, planning and assessment in maintenance, renewal and decision-making for watershed hydrology, water resource management and water quality assessment are evolving toward complex, spatially explicit regional environmental assessments. These problems have to be addressed with object-oriented spatio-temporal data models that can restore, manage, query and visualize various historic and updated basic information concerning with watershed hydrology, water resource management and water quality as well as compute and evaluate the watershed environmental conditions so as to provide online forecasting to police-makers and relevant authorities for supporting decision-making. The extensive data requirements and the difficult task of building input parameter files, however, has long been an obstacle to use of such complex models timely and effectively by resource managers. Success depends on an integrated approach that brings together scientific, education and training advances made across many individual disciplines and modified to fit the needs of the individuals and groups who must write, implement, evaluate, and adjust their watershed management plans. The centre for Hydro-science Research, Nanjing University, in cooperation with the relevant watershed management authorities, has developed a WebGIS management platform to facilitate this complex process. Improve the management of watersheds over the Huaihe basin through the development, promotion and use of a web-based, user-friendly, geospatial watershed management data and decision support system (WMDDSS) involved many difficulties for the development of this complicated System. In terms of the spatial and temporal characteristics of historic and currently available information on meteorological, hydrological, geographical, environmental and other relevant disciplines, we designed an object-oriented spatiotemporal data model that combines spatial, attribute and temporal information to implement the management system. Using this system, we can update, query and analyze environmental information as well as manage historical data, and a visualization tool was provided to help the user interpret results so as to provide scientific support for decision-making. The utility of the system has been demonstrated its values by being used in watershed management and environmental assessments.
Searching for Images: The Analysis of Users' Queries for Image Retrieval in American History.

ERIC Educational Resources Information Center

Choi, Youngok; Rasmussen, Edie M.

2003-01-01

Studied users' queries for visual information in American history to identify the image attributes important for retrieval and the characteristics of users' queries for digital images, based on queries from 38 faculty and graduate students. Results of pre- and post-test questionnaires and interviews suggest principle categories of search terms.…
Searching and Filtering Tweets: CSIRO at the TREC 2012 Microblog Track

DTIC Science & Technology

2012-11-01

stages. We first evaluate the effect of tweet corpus pre- processing in vanilla runs (no query expansion), and then assess the effect of query expansion...Effect of a vanilla run on D4 index (both realtime and non-real-time), and query expansion methods based on the submitted runs for two sets of queries
Knowledge Query Language (KQL)

DTIC Science & Technology

2016-02-01

unlimited. This page intentionally left blank. iii EXECUTIVE SUMMARY Currently, queries for data ...retrieval from non-Structured Query Language (NoSQL) data stores are tightly coupled to the specific implementation of the data store implementation, making...of the storage content and format for querying NoSQL or relational data stores. This approach uses address expressions (or A-Expressions) embedded in
System, method and apparatus for conducting a keyterm search

NASA Technical Reports Server (NTRS)

McGreevy, Michael W. (Inventor)

2004-01-01

A keyterm search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more keyterms. Next, a gleaning model of the query is created. The gleaning model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.
System, method and apparatus for conducting a phrase search

NASA Technical Reports Server (NTRS)

McGreevy, Michael W. (Inventor)

2004-01-01

A phrase search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more sequences of terms. Next, a relational model of the query is created. The relational model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.
Targeted exploration and analysis of large cross-platform human transcriptomic compendia

PubMed Central

Zhu, Qian; Wong, Aaron K; Krishnan, Arjun; Aure, Miriam R; Tadych, Alicja; Zhang, Ran; Corney, David C; Greene, Casey S; Bongo, Lars A; Kristensen, Vessela N; Charikar, Moses; Li, Kai; Troyanskaya, Olga G.

2016-01-01

We present SEEK (http://seek.princeton.edu), a query-based search engine across very large transcriptomic data collections, including thousands of human data sets from almost 50 microarray and next-generation sequencing platforms. SEEK uses a novel query-level cross-validation-based algorithm to automatically prioritize data sets relevant to the query and a robust search approach to identify query-coregulated genes, pathways, and processes. SEEK provides cross-platform handling, multi-gene query search, iterative metadata-based search refinement, and extensive visualization-based analysis options. PMID:25581801
Optimization of the Controlled Evaluation of Closed Relational Queries

NASA Astrophysics Data System (ADS)

Biskup, Joachim; Lochner, Jan-Hendrik; Sonntag, Sebastian

For relational databases, controlled query evaluation is an effective inference control mechanism preserving confidentiality regarding a previously declared confidentiality policy. Implementations of controlled query evaluation usually lack efficiency due to costly theorem prover calls. Suitably constrained controlled query evaluation can be implemented efficiently, but is not flexible enough from the perspective of database users and security administrators. In this paper, we propose an optimized framework for controlled query evaluation in relational databases, being efficiently implementable on the one hand and relaxing the constraints of previous approaches on the other hand.
An index-based algorithm for fast on-line query processing of latent semantic analysis

PubMed Central

Li, Pohan; Wang, Wei

2017-01-01

Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantic is similar to the query of keywords. Although LSA yield promising similar results, the existing LSA algorithms involve lots of unnecessary operations in similarity computation and candidate check during on-line query processing, which is expensive in terms of time cost and cannot efficiently response the query request especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA towards efficiently searching the similar documents to a given query. We rewrite the similarity equation of LSA combined with an intermediate value called partial similarity that is stored in a designed index called partial index. For reducing the searching space, we give an approximate form of similarity equation, and then develop an efficient algorithm for building partial index, which skips the partial similarities lower than a given threshold θ. Based on partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponds to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm. PMID:28520747
Analyzing Medical Image Search Behavior: Semantics and Prediction of Query Results.

PubMed

De-Arteaga, Maria; Eggel, Ivan; Kahn, Charles E; Müller, Henning

2015-10-01

Log files of information retrieval systems that record user behavior have been used to improve the outcomes of retrieval systems, understand user behavior, and predict events. In this article, a log file of the ARRS GoldMiner search engine containing 222,005 consecutive queries is analyzed. Time stamps are available for each query, as well as masked IP addresses, which enables to identify queries from the same person. This article describes the ways in which physicians (or Internet searchers interested in medical images) search and proposes potential improvements by suggesting query modifications. For example, many queries contain only few terms and therefore are not specific; others contain spelling mistakes or non-medical terms that likely lead to poor or empty results. One of the goals of this report is to predict the number of results a query will have since such a model allows search engines to automatically propose query modifications in order to avoid result lists that are empty or too large. This prediction is made based on characteristics of the query terms themselves. Prediction of empty results has an accuracy above 88%, and thus can be used to automatically modify the query to avoid empty result sets for a user. The semantic analysis and data of reformulations done by users in the past can aid the development of better search systems, particularly to improve results for novice users. Therefore, this paper gives important ideas to better understand how people search and how to use this knowledge to improve the performance of specialized medical search engines.
An index-based algorithm for fast on-line query processing of latent semantic analysis.

PubMed

Zhang, Mingxi; Li, Pohan; Wang, Wei

2017-01-01

Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantic is similar to the query of keywords. Although LSA yield promising similar results, the existing LSA algorithms involve lots of unnecessary operations in similarity computation and candidate check during on-line query processing, which is expensive in terms of time cost and cannot efficiently response the query request especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA towards efficiently searching the similar documents to a given query. We rewrite the similarity equation of LSA combined with an intermediate value called partial similarity that is stored in a designed index called partial index. For reducing the searching space, we give an approximate form of similarity equation, and then develop an efficient algorithm for building partial index, which skips the partial similarities lower than a given threshold θ. Based on partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponds to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm.
Mitochondrial phylogeography of a Beringian relict: the endemic freshwater genus of blackfish Dallia (Esociformes).

PubMed

Campbell, M A; Lopéz, J A

2014-02-01

Mitochondrial genetic variability among populations of the blackfish genus Dallia (Esociformes) across Beringia was examined. Levels of divergence and patterns of geographic distribution of mitochondrial DNA lineages were characterized using phylogenetic inference, median-joining haplotype networks, Bayesian skyline plots, mismatch analysis and spatial analysis of molecular variance (SAMOVA) to infer genealogical relationships and to assess patterns of phylogeography among extant mitochondrial lineages in populations of species of Dallia. The observed variation includes extensive standing mitochondrial genetic diversity and patterns of distinct spatial segregation corresponding to historical and contemporary barriers with minimal or no mixing of mitochondrial haplotypes between geographic areas. Mitochondrial diversity is highest in the common delta formed by the Yukon and Kuskokwim Rivers where they meet the Bering Sea. Other regions sampled in this study host comparatively low levels of mitochondrial diversity. The observed levels of mitochondrial diversity and the spatial distribution of that diversity are consistent with persistence of mitochondrial lineages in multiple refugia through the last glacial maximum. © 2014 The Fisheries Society of the British Isles.
Bat-Inspired Algorithm Based Query Expansion for Medical Web Information Retrieval.

PubMed

Khennak, Ilyes; Drias, Habiba

2017-02-01

With the increasing amount of medical data available on the Web, looking for health information has become one of the most widely searched topics on the Internet. Patients and people of several backgrounds are now using Web search engines to acquire medical information, including information about a specific disease, medical treatment or professional advice. Nonetheless, due to a lack of medical knowledge, many laypeople have difficulties in forming appropriate queries to articulate their inquiries, which deem their search queries to be imprecise due the use of unclear keywords. The use of these ambiguous and vague queries to describe the patients' needs has resulted in a failure of Web search engines to retrieve accurate and relevant information. One of the most natural and promising method to overcome this drawback is Query Expansion. In this paper, an original approach based on Bat Algorithm is proposed to improve the retrieval effectiveness of query expansion in medical field. In contrast to the existing literature, the proposed approach uses Bat Algorithm to find the best expanded query among a set of expanded query candidates, while maintaining low computational complexity. Moreover, this new approach allows the determination of the length of the expanded query empirically. Numerical results on MEDLINE, the on-line medical information database, show that the proposed approach is more effective and efficient compared to the baseline.
Semantic querying of relational data for clinical intelligence: a semantic web services-based approach

PubMed Central

2013-01-01

Background Clinical Intelligence, as a research and engineering discipline, is dedicated to the development of tools for data analysis for the purposes of clinical research, surveillance, and effective health care management. Self-service ad hoc querying of clinical data is one desirable type of functionality. Since most of the data are currently stored in relational or similar form, ad hoc querying is problematic as it requires specialised technical skills and the knowledge of particular data schemas. Results A possible solution is semantic querying where the user formulates queries in terms of domain ontologies that are much easier to navigate and comprehend than data schemas. In this article, we are exploring the possibility of using SADI Semantic Web services for semantic querying of clinical data. We have developed a prototype of a semantic querying infrastructure for the surveillance of, and research on, hospital-acquired infections. Conclusions Our results suggest that SADI can support ad-hoc, self-service, semantic queries of relational data in a Clinical Intelligence context. The use of SADI compares favourably with approaches based on declarative semantic mappings from data schemas to ontologies, such as query rewriting and RDFizing by materialisation, because it can easily cope with situations when (i) some computation is required to turn relational data into RDF or OWL, e.g., to implement temporal reasoning, or (ii) integration with external data sources is necessary. PMID:23497556
Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records.

PubMed

Luo, Yuan; Szolovits, Peter

2016-01-01

In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing on narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem into the interval query problem, for which optimal query/update time is in general logarithm. We next perform a tight time complexity analysis on the basic interval tree query algorithm and show its nonoptimality when being applied to a collection of 13 query types from Allen's interval algebra. We then study two closely related state-of-the-art interval query algorithms, proposed query reformulations, and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic time stabbing-max query time complexity and solves the stabbing-interval query tasks on all of Allen's relations in logarithmic time, attaining the theoretic lower bound. Updating time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions.
Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records

PubMed Central

Luo, Yuan; Szolovits, Peter

2016-01-01

In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing on narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem into the interval query problem, for which optimal query/update time is in general logarithm. We next perform a tight time complexity analysis on the basic interval tree query algorithm and show its nonoptimality when being applied to a collection of 13 query types from Allen’s interval algebra. We then study two closely related state-of-the-art interval query algorithms, proposed query reformulations, and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic time stabbing-max query time complexity and solves the stabbing-interval query tasks on all of Allen’s relations in logarithmic time, attaining the theoretic lower bound. Updating time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions. PMID:27478379
Executing SPARQL Queries over the Web of Linked Data

NASA Astrophysics Data System (ADS)

Hartig, Olaf; Bizer, Christian; Freytag, Johann-Christoph

The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that is not addressed by traditional research on federated query processing. In this paper we present an approach to execute SPARQL queries over the Web of Linked Data. The main idea of our approach is to discover data that might be relevant for answering a query during the query execution itself. This discovery is driven by following RDF links between data sources based on URIs in the query and in partial results. The URIs are resolved over the HTTP protocol into RDF data which is continuously added to the queried dataset. This paper describes concepts and algorithms to implement our approach using an iterator-based pipeline. We introduce a formalization of the pipelining approach and show that classical iterators may cause blocking due to the latency of HTTP requests. To avoid blocking, we propose an extension of the iterator paradigm. The evaluation of our approach shows its strengths as well as the still existing challenges.
A Natural Language Interface Concordant with a Knowledge Base.

PubMed

Han, Yong-Jin; Park, Seong-Bae; Park, Se-Young

2016-01-01

The discordance between expressions interpretable by a natural language interface (NLI) system and those answerable by a knowledge base is a critical problem in the field of NLIs. In order to solve this discordance problem, this paper proposes a method to translate natural language questions into formal queries that can be generated from a graph-based knowledge base. The proposed method considers a subgraph of a knowledge base as a formal query. Thus, all formal queries corresponding to a concept or a predicate in the knowledge base can be generated prior to query time and all possible natural language expressions corresponding to each formal query can also be collected in advance. A natural language expression has a one-to-one mapping with a formal query. Hence, a natural language question is translated into a formal query by matching the question with the most appropriate natural language expression. If the confidence of this matching is not sufficiently high the proposed method rejects the question and does not answer it. Multipredicate queries are processed by regarding them as a set of collected expressions. The experimental results show that the proposed method thoroughly handles answerable questions from the knowledge base and rejects unanswerable ones effectively.

Saying What You're Looking For: Linguistics Meets Video Search.

PubMed

Barrett, Daniel Paul; Barbu, Andrei; Siddharth, N; Siskind, Jeffrey Mark

2016-10-01

We present an approach to searching large video corpora for clips which depict a natural-language query in the form of a sentence. Compositional semantics is used to encode subtle meaning differences lost in other approaches, such as the difference between two sentences which have identical words but entirely different meaning: The person rode the horse versus The horse rode the person. Given a sentential query and a natural-language parser, we produce a score indicating how well a video clip depicts that sentence for each clip in a corpus and return a ranked list of clips. Two fundamental problems are addressed simultaneously: detecting and tracking objects, and recognizing whether those tracks depict the query. Because both tracking and object detection are unreliable, our approach uses the sentential query to focus the tracker on the relevant participants and ensures that the resulting tracks are described by the sentential query. While most earlier work was limited to single-word queries which correspond to either verbs or nouns, we search for complex queries which contain multiple phrases, such as prepositional phrases, and modifiers, such as adverbs. We demonstrate this approach by searching for 2,627 naturally elicited sentential queries in 10 Hollywood movies.
Context-Aware Online Commercial Intention Detection

NASA Astrophysics Data System (ADS)

Hu, Derek Hao; Shen, Dou; Sun, Jian-Tao; Yang, Qiang; Chen, Zheng

With more and more commercial activities moving onto the Internet, people tend to purchase what they need through Internet or conduct some online research before the actual transactions happen. For many Web users, their online commercial activities start from submitting a search query to search engines. Just like the common Web search queries, the queries with commercial intention are usually very short. Recognizing the queries with commercial intention against the common queries will help search engines provide proper search results and advertisements, help Web users obtain the right information they desire and help the advertisers benefit from the potential transactions. However, the intentions behind a query vary a lot for users with different background and interest. The intentions can even be different for the same user, when the query is issued in different contexts. In this paper, we present a new algorithm framework based on skip-chain conditional random field (SCCRF) for automatically classifying Web queries according to context-based online commercial intention. We analyze our algorithm performance both theoretically and empirically. Extensive experiments on several real search engine log datasets show that our algorithm can improve more than 10% on F1 score than previous algorithms on commercial intention detection.
A Tracking Analyst for large 3D spatiotemporal data from multiple sources (case study: Tracking volcanic eruptions in the atmosphere)

NASA Astrophysics Data System (ADS)

Gad, Mohamed A.; Elshehaly, Mai H.; Gračanin, Denis; Elmongui, Hicham G.

2018-02-01

This research presents a novel Trajectory-based Tracking Analyst (TTA) that can track and link spatiotemporally variable data from multiple sources. The proposed technique uses trajectory information to determine the positions of time-enabled and spatially variable scatter data at any given time through a combination of along trajectory adjustment and spatial interpolation. The TTA is applied in this research to track large spatiotemporal data of volcanic eruptions (acquired using multi-sensors) in the unsteady flow field of the atmosphere. The TTA enables tracking injections into the atmospheric flow field, the reconstruction of the spatiotemporally variable data at any desired time, and the spatiotemporal join of attribute data from multiple sources. In addition, we were able to create a smooth animation of the volcanic ash plume at interactive rates. The initial results indicate that the TTA can be applied to a wide range of multiple-source data.
Incremental Query Rewriting with Resolution

NASA Astrophysics Data System (ADS)

Riazanov, Alexandre; Aragão, Marcelo A. T.

We address the problem of semantic querying of relational databases (RDB) modulo knowledge bases using very expressive knowledge representation formalisms, such as full first-order logic or its various fragments. We propose to use a resolution-based first-order logic (FOL) reasoner for computing schematic answers to deductive queries, with the subsequent translation of these schematic answers to SQL queries which are evaluated using a conventional relational DBMS. We call our method incremental query rewriting, because an original semantic query is rewritten into a (potentially infinite) series of SQL queries. In this chapter, we outline the main idea of our technique - using abstractions of databases and constrained clauses for deriving schematic answers, and provide completeness and soundness proofs to justify the applicability of this technique to the case of resolution for FOL without equality. The proposed method can be directly used with regular RDBs, including legacy databases. Moreover, we propose it as a potential basis for an efficient Web-scale semantic search technology.
In-context query reformulation for failing SPARQL queries

NASA Astrophysics Data System (ADS)

Viswanathan, Amar; Michaelis, James R.; Cassidy, Taylor; de Mel, Geeth; Hendler, James

2017-05-01

Knowledge bases for decision support systems are growing increasingly complex, through continued advances in data ingest and management approaches. However, humans do not possess the cognitive capabilities to retain a bird's-eyeview of such knowledge bases, and may end up issuing unsatisfiable queries to such systems. This work focuses on the implementation of a query reformulation approach for graph-based knowledge bases, specifically designed to support the Resource Description Framework (RDF). The reformulation approach presented is instance-and schema-aware. Thus, in contrast to relaxation techniques found in the state-of-the-art, the presented approach produces in-context query reformulation.
Model-based query language for analyzing clinical processes.

PubMed

Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

2013-01-01

Nowadays large databases of clinical process data exist in hospitals. However, these data are rarely used in full scope. In order to perform queries on hospital processes, one must either choose from the predefined queries or develop queries using MS Excel-type software system, which is not always a trivial task. In this paper we propose a new query language for analyzing clinical processes that is easily perceptible also by non-IT professionals. We develop this language based on a process modeling language which is also described in this paper. Prototypes of both languages have already been verified using real examples from hospitals.
AQBE — QBE Style Queries for Archetyped Data

NASA Astrophysics Data System (ADS)

Sachdeva, Shelly; Yaginuma, Daigo; Chu, Wanming; Bhalla, Subhash

Large-scale adoption of electronic healthcare applications requires semantic interoperability. The new proposals propose an advanced (multi-level) DBMS architecture for repository services for health records of patients. These also require query interfaces at multiple levels and at the level of semi-skilled users. In this regard, a high-level user interface for querying the new form of standardized Electronic Health Records system has been examined in this study. It proposes a step-by-step graphical query interface to allow semi-skilled users to write queries. Its aim is to decrease user effort and communication ambiguities, and increase user friendliness.
3D GIS spatial operation based on extended Euler operators

NASA Astrophysics Data System (ADS)

Xu, Hongbo; Lu, Guonian; Sheng, Yehua; Zhou, Liangchen; Guo, Fei; Shang, Zuoyan; Wang, Jing

2008-10-01

The implementation of 3 dimensions spatial operations, based on certain data structure, has a lack of universality and is not able to treat with non-manifold cases, at present. ISO/DIS 19107 standard just presents the definition of Boolean operators and set operators for topological relationship query, and OGC GeoXACML gives formal definitions for several set functions without implementation detail. Aiming at these problems, based mathematical foundation on cell complex theory, supported by non-manifold data structure and using relevant research in the field of non-manifold geometry modeling for reference, firstly, this paper according to non-manifold Euler-Poincaré formula constructs 6 extended Euler operators and inverse operators to carry out creating, updating and deleting 3D spatial elements, as well as several pairs of supplementary Euler operators to convenient for implementing advanced functions. Secondly, we change topological element operation sequence of Boolean operation and set operation as well as set functions defined in GeoXACML into combination of extended Euler operators, which separates the upper functions and lower data structure. Lastly, we develop underground 3D GIS prototype system, in which practicability and credibility of extended Euler operators faced to 3D GIS presented by this paper are validated.
Intelligent Context-Aware and Adaptive Interface for Mobile LBS

PubMed Central

Liu, Yanhong

2015-01-01

Context-aware user interface plays an important role in many human-computer Interaction tasks of location based services. Although spatial models for context-aware systems have been studied extensively, how to locate specific spatial information for users is still not well resolved, which is important in the mobile environment where location based services users are impeded by device limitations. Better context-aware human-computer interaction models of mobile location based services are needed not just to predict performance outcomes, such as whether people will be able to find the information needed to complete a human-computer interaction task, but to understand human processes that interact in spatial query, which will in turn inform the detailed design of better user interfaces in mobile location based services. In this study, a context-aware adaptive model for mobile location based services interface is proposed, which contains three major sections: purpose, adjustment, and adaptation. Based on this model we try to describe the process of user operation and interface adaptation clearly through the dynamic interaction between users and the interface. Then we show how the model applies users' demands in a complicated environment and suggested the feasibility by the experimental results. PMID:26457077
The Design of a High Performance Earth Imagery and Raster Data Management and Processing Platform

NASA Astrophysics Data System (ADS)

Xie, Qingyun

2016-06-01

This paper summarizes the general requirements and specific characteristics of both geospatial raster database management system and raster data processing platform from a domain-specific perspective as well as from a computing point of view. It also discusses the need of tight integration between the database system and the processing system. These requirements resulted in Oracle Spatial GeoRaster, a global scale and high performance earth imagery and raster data management and processing platform. The rationale, design, implementation, and benefits of Oracle Spatial GeoRaster are described. Basically, as a database management system, GeoRaster defines an integrated raster data model, supports image compression, data manipulation, general and spatial indices, content and context based queries and updates, versioning, concurrency, security, replication, standby, backup and recovery, multitenancy, and ETL. It provides high scalability using computer and storage clustering. As a raster data processing platform, GeoRaster provides basic operations, image processing, raster analytics, and data distribution featuring high performance computing (HPC). Specifically, HPC features include locality computing, concurrent processing, parallel processing, and in-memory computing. In addition, the APIs and the plug-in architecture are discussed.
Building a Pre-Competitive Knowledge Base to Support Australia's Wave Energy Industry

NASA Astrophysics Data System (ADS)

Hoeke, R. K.; Hemer, M. A.; Symonds, G.; Rosebrock, U.; Kenyon, R.; Zieger, S.; Durrant, T.; Contardo, S.; O'Grady, J.; Mcinnes, K. L.

2016-02-01

A pre-competitive, query-able and openly available spatio-temporal atlas of Australia's wind-wave energy resource and marine management uses is being delivered. To provide the best representation of wave energy resource information, accounting for both spatial and temporal characteristics of the resource, a 34+yr numerical hindcast of wave conditions in the Australian region has been developed. Considerable in situ and remotely sensed data have been collected to support calibration and validation of the hindcast, resulting in a high-quality characterisation of the available wave resource in the Australian domain. Planning for wave energy projects is also subject to other spatial constraints. Spatial information on alternative uses of the marine domain including, for example, fisheries and aquaculture, oil and gas, shipping, navigation and ports, marine parks and reserves, sub-sea cables and infrastructure, shipwrecks and sites of cultural significance, have been compiled to complement the spatial characterisation of resource and support spatial planning of future wave energy projects. Both resource and spatial constraint information are being disseminated via a state-of-the-art portal, designed to meet the needs of all industry stakeholders. Another aspect currently impeding the industry in Australia is the limited evidence-base of impacts of wave energy extraction on adjacent marine and coastal environments. To build this evidence base, a network of in situ wave measurement devices have been deployed surrounding the 3 wave energy converters of Carnegie Wave Energy Limited's Perth Wave Energy Project. This data is being used to calibrate and validate numerical simulations of the project site. Early stage results will be presented.
Querying Proofs

NASA Technical Reports Server (NTRS)

Aspinall, David; Denney, Ewen; Lueth, Christoph

2012-01-01

We motivate and introduce a query language PrQL designed for inspecting machine representations of proofs. PrQL natively supports hiproofs which express proof structure using hierarchical nested labelled trees. The core language presented in this paper is locally structured (first-order), with queries built using recursion and patterns over proof structure and rule names. We define the syntax and semantics of locally structured queries, demonstrate their power, and sketch some implementation experiments.
Effective Multi-Query Expansions: Collaborative Deep Networks for Robust Landmark Retrieval.

PubMed

Wang, Yang; Lin, Xuemin; Wu, Lin; Zhang, Wenjie

2017-03-01

Given a query photo issued by a user (q-user), the landmark retrieval is to return a set of photos with their landmarks similar to those of the query, while the existing studies on the landmark retrieval focus on exploiting geometries of landmarks for similarity matches between candidate photos and a query photo. We observe that the same landmarks provided by different users over social media community may convey different geometry information depending on the viewpoints and/or angles, and may, subsequently, yield very different results. In fact, dealing with the landmarks with low quality shapes caused by the photography of q-users is often nontrivial and has seldom been studied. In this paper, we propose a novel framework, namely, multi-query expansions, to retrieve semantically robust landmarks by two steps. First, we identify the top- k photos regarding the latent topics of a query landmark to construct multi-query set so as to remedy its possible low quality shape. For this purpose, we significantly extend the techniques of Latent Dirichlet Allocation. Then, motivated by the typical collaborative filtering methods, we propose to learn a collaborative deep networks-based semantically, nonlinear, and high-level features over the latent factor for landmark photo as the training set, which is formed by matrix factorization over collaborative user-photo matrix regarding the multi-query set. The learned deep network is further applied to generate the features for all the other photos, meanwhile resulting into a compact multi-query set within such space. Then, the final ranking scores are calculated over the high-level feature space between the multi-query set and all other photos, which are ranked to serve as the final ranking list of landmark retrieval. Extensive experiments are conducted on real-world social media data with both landmark photos together with their user information to show the superior performance over the existing methods, especially our recently proposed multi-query based mid-level pattern representation method [1].
CUFID-query: accurate network querying through random walk based network flow estimation.

PubMed

Jeong, Hyundoo; Qian, Xiaoning; Yoon, Byung-Jun

2017-12-28

Functional modules in biological networks consist of numerous biomolecules and their complicated interactions. Recent studies have shown that biomolecules in a functional module tend to have similar interaction patterns and that such modules are often conserved across biological networks of different species. As a result, such conserved functional modules can be identified through comparative analysis of biological networks. In this work, we propose a novel network querying algorithm based on the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) framework combined with an efficient seed-and-extension approach. The proposed algorithm, CUFID-query, can accurately detect conserved functional modules as small subnetworks in the target network that are expected to perform similar functions to the given query functional module. The CUFID framework was recently developed for probabilistic pairwise global comparison of biological networks, and it has been applied to pairwise global network alignment, where the framework was shown to yield accurate network alignment results. In the proposed CUFID-query algorithm, we adopt the CUFID framework and extend it for local network alignment, specifically to solve network querying problems. First, in the seed selection phase, the proposed method utilizes the CUFID framework to compare the query and the target networks and to predict the probabilistic node-to-node correspondence between the networks. Next, the algorithm selects and greedily extends the seed in the target network by iteratively adding nodes that have frequent interactions with other nodes in the seed network, in a way that the conductance of the extended network is maximally reduced. Finally, CUFID-query removes irrelevant nodes from the querying results based on the personalized PageRank vector for the induced network that includes the fully extended network and its neighboring nodes. Through extensive performance evaluation based on biological networks with known functional modules, we show that CUFID-query outperforms the existing state-of-the-art algorithms in terms of prediction accuracy and biological significance of the predictions.
Querying graphs in protein-protein interactions networks using feedback vertex set.

PubMed

Blin, Guillaume; Sikora, Florian; Vialette, Stéphane

2010-01-01

Recent techniques increase rapidly the amount of our knowledge on interactions between proteins. The interpretation of these new information depends on our ability to retrieve known substructures in the data, the Protein-Protein Interactions (PPIs) networks. In an algorithmic point of view, it is an hard task since it often leads to NP-hard problems. To overcome this difficulty, many authors have provided tools for querying patterns with a restricted topology, i.e., paths or trees in PPI networks. Such restriction leads to the development of fixed parameter tractable (FPT) algorithms, which can be practicable for restricted sizes of queries. Unfortunately, Graph Homomorphism is a W[1]-hard problem, and hence, no FPT algorithm can be found when patterns are in the shape of general graphs. However, Dost et al. gave an algorithm (which is not implemented) to query graphs with a bounded treewidth in PPI networks (the treewidth of the query being involved in the time complexity). In this paper, we propose another algorithm for querying pattern in the shape of graphs, also based on dynamic programming and the color-coding technique. To transform graphs queries into trees without loss of informations, we use feedback vertex set coupled to a node duplication mechanism. Hence, our algorithm is FPT for querying graphs with a bounded size of their feedback vertex set. It gives an alternative to the treewidth parameter, which can be better or worst for a given query. We provide a python implementation which allows us to validate our implementation on real data. Especially, we retrieve some human queries in the shape of graphs into the fly PPI network.
Development and empirical user-centered evaluation of semantically-based query recommendation for an electronic health record search engine.

PubMed

Hanauer, David A; Wu, Danny T Y; Yang, Lei; Mei, Qiaozhu; Murkowski-Steffy, Katherine B; Vydiswaran, V G Vinod; Zheng, Kai

2017-03-01

The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). The query recommendation algorithm utilizes MetaMap to identify medical concepts from search queries and indexed EHR documents. Synonym variants from UMLS are used to expand the concepts along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. The search engine's performance was rated consistently higher with the query recommendation feature turned on vs. off. The relevance of computer-recommended search terms was also rated high, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. Challenges persist for users to construct effective search queries when retrieving information from biomedical documents including those from EHRs. This study demonstrates that semantically-based query recommendation is a viable solution to addressing this challenge. Published by Elsevier Inc.
Interactive visual exploration and analysis of origin-destination data

NASA Astrophysics Data System (ADS)

Ding, Linfang; Meng, Liqiu; Yang, Jian; Krisp, Jukka M.

2018-05-01

In this paper, we propose a visual analytics approach for the exploration of spatiotemporal interaction patterns of massive origin-destination data. Firstly, we visually query the movement database for data at certain time windows. Secondly, we conduct interactive clustering to allow the users to select input variables/features (e.g., origins, destinations, distance, and duration) and to adjust clustering parameters (e.g. distance threshold). The agglomerative hierarchical clustering method is applied for the multivariate clustering of the origin-destination data. Thirdly, we design a parallel coordinates plot for visualizing the precomputed clusters and for further exploration of interesting clusters. Finally, we propose a gradient line rendering technique to show the spatial and directional distribution of origin-destination clusters on a map view. We implement the visual analytics approach in a web-based interactive environment and apply it to real-world floating car data from Shanghai. The experiment results show the origin/destination hotspots and their spatial interaction patterns. They also demonstrate the effectiveness of our proposed approach.
Combining qualitative and quantitative spatial and temporal information in a hierarchical structure: Approximate reasoning for plan execution monitoring

NASA Technical Reports Server (NTRS)

Hoebel, Louis J.

1993-01-01

The problem of plan generation (PG) and the problem of plan execution monitoring (PEM), including updating, queries, and resource-bounded replanning, have different reasoning and representation requirements. PEM requires the integration of qualitative and quantitative information. PEM is the receiving of data about the world in which a plan or agent is executing. The problem is to quickly determine the relevance of the data, the consistency of the data with respect to the expected effects, and if execution should continue. Only spatial and temporal aspects of the plan are addressed for relevance in this work. Current temporal reasoning systems are deficient in computational aspects or expressiveness. This work presents a hybrid qualitative and quantitative system that is fully expressive in its assertion language while offering certain computational efficiencies. In order to proceed, methods incorporating approximate reasoning using hierarchies, notions of locality, constraint expansion, and absolute parameters need be used and are shown to be useful for the anytime nature of PEM.
Development of a 3D GIS and its application to karst areas

NASA Astrophysics Data System (ADS)

Wu, Qiang; Xu, Hua; Zhou, Wanfang

2008-05-01

There is a growing interest in modeling and analyzing karst phenomena in three dimensions. This paper integrates geology, groundwater hydrology, geographic information system (GIS), database management system (DBMS), visualization and data mining to study karst features in Huaibei, China. The 3D geo-objects retrieved from the karst area are analyzed and mapped into different abstract levels. The spatial relationships among the objects are constructed by a dual-linker. The shapes of the 3D objects and the topological models with attributes are stored and maintained in the DBMS. Spatial analysis was then used to integrate the data in the DBMS and the 3D model to form a virtual reality (VR) to provide analytical functions such as distribution analysis, correlation query, and probability assessment. The research successfully implements 3D modeling and analyses in the karst area, and meanwhile provides an efficient tool for government policy-makers to set out restrictions on water resource development in the area.
Relational Database for the Geology of the Northern Rocky Mountains - Idaho, Montana, and Washington

USGS Publications Warehouse

Causey, J. Douglas; Zientek, Michael L.; Bookstrom, Arthur A.; Frost, Thomas P.; Evans, Karl V.; Wilson, Anna B.; Van Gosen, Bradley S.; Boleneus, David E.; Pitts, Rebecca A.

2008-01-01

A relational database was created to prepare and organize geologic map-unit and lithologic descriptions for input into a spatial database for the geology of the northern Rocky Mountains, a compilation of forty-three geologic maps for parts of Idaho, Montana, and Washington in U.S. Geological Survey Open File Report 2005-1235. Not all of the information was transferred to and incorporated in the spatial database due to physical file limitations. This report releases that part of the relational database that was completed for that earlier product. In addition to descriptive geologic information for the northern Rocky Mountains region, the relational database contains a substantial bibliography of geologic literature for the area. The relational database nrgeo.mdb (linked below) is available in Microsoft Access version 2000, a proprietary database program. The relational database contains data tables and other tables used to define terms, relationships between the data tables, and hierarchical relationships in the data; forms used to enter data; and queries used to extract data.

Occam's razor: supporting visual query expression for content-based image queries

NASA Astrophysics Data System (ADS)

Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.

2005-01-01

This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).
Geometric Representations of Condition Queries on Three-Dimensional Vector Fields

NASA Technical Reports Server (NTRS)

Henze, Chris

1999-01-01

Condition queries on distributed data ask where particular conditions are satisfied. It is possible to represent condition queries as geometric objects by plotting field data in various spaces derived from the data, and by selecting loci within these derived spaces which signify the desired conditions. Rather simple geometric partitions of derived spaces can represent complex condition queries because much complexity can be encapsulated in the derived space mapping itself A geometric view of condition queries provides a useful conceptual unification, allowing one to intuitively understand many existing vector field feature detection algorithms -- and to design new ones -- as variations on a common theme. A geometric representation of condition queries also provides a simple and coherent basis for computer implementation, reducing a wide variety of existing and potential vector field feature detection techniques to a few simple geometric operations.
Occam"s razor: supporting visual query expression for content-based image queries

NASA Astrophysics Data System (ADS)

Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.

2004-12-01

This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).
Retrieval feedback in MEDLINE.

PubMed Central

Srinivasan, P

1996-01-01

OBJECTIVE: To investigate a new approach for query expansion based on retrieval feedback. The first objective in this study was to examine alternative query-expansion methods within the same retrieval-feedback framework. The three alternatives proposed are: expansion on the MeSH query field alone, expansion on the free-text field alone, and expansion on both the MeSH and the free-text fields. The second objective was to gain further understanding of retrieval feedback by examining possible dependencies on relevant documents during the feedback cycle. DESIGN: Comparative study of retrieval effectiveness using the original unexpanded and the alternative expanded user queries on a MEDLINE test collection of 75 queries and 2,334 MEDLINE citations. MEASUREMENTS: Retrieval effectivenesses of the original unexpanded and the alternative expanded queries were compared using 11-point-average precision scores (11-AvgP). These are averages of precision scores obtained at 11 standard recall points. RESULTS: All three expansion strategies significantly improved the original queries in terms of retrieval effectiveness. Expansion on MeSH alone was equivalent to expansion on both MeSH and the free-text fields. Expansion on the free-text field alone improved the queries significantly less than did the other two strategies. The second part of the study indicated that retrieval-feedback-based expansion yields significant performance improvements independent of the availability of relevant documents for feedback information. CONCLUSIONS: Retrieval feedback offers a robust procedure for query expansion that is most effective for MEDLINE when applied to the MeSH field. PMID:8653452
Dual Roles for DNA Polymerase Theta in Alternative End-Joining Repair of Double-Strand Breaks in Drosophila

PubMed Central

McVey, Mitch

2010-01-01

DNA double-strand breaks are repaired by multiple mechanisms that are roughly grouped into the categories of homology-directed repair and non-homologous end joining. End-joining repair can be further classified as either classical non-homologous end joining, which requires DNA ligase 4, or “alternative” end joining, which does not. Alternative end joining has been associated with genomic deletions and translocations, but its molecular mechanism(s) are largely uncharacterized. Here, we report that Drosophila melanogaster DNA polymerase theta (pol theta), encoded by the mus308 gene and previously implicated in DNA interstrand crosslink repair, plays a crucial role in DNA ligase 4-independent alternative end joining. In the absence of pol theta, end joining is impaired and residual repair often creates large deletions flanking the break site. Analysis of break repair junctions from flies with mus308 separation-of-function alleles suggests that pol theta promotes the use of long microhomologies during alternative end joining and increases the likelihood of complex insertion events. Our results establish pol theta as a key protein in alternative end joining in Drosophila and suggest a potential mechanistic link between alternative end joining and interstrand crosslink repair. PMID:20617203
Query Expansion Using SNOMED-CT and Weighing Schemes

DTIC Science & Technology

2014-11-01

For this research, we have used SNOMED-CT along with UMLS Methathesaurus as our ontology in medical domain to expand the queries. General Terms...CT along with UMLS Methathesaurus as our ontology in medical domain to expand the queries. 15. SUBJECT TERMS 16. SECURITY CLASSIFICATION OF: 17...University of the Basque country discuss their finding on query expansion using external sources headlined by Unified Medical Language System ( UMLS
Categorical and Specificity Differences between User-Supplied Tags and Search Query Terms for Images. An Analysis of "Flickr" Tags and Web Image Search Queries

ERIC Educational Resources Information Center

Chung, EunKyung; Yoon, JungWon

2009-01-01

Introduction: The purpose of this study is to compare characteristics and features of user supplied tags and search query terms for images on the "Flickr" Website in terms of categories of pictorial meanings and level of term specificity. Method: This study focuses on comparisons between tags and search queries using Shatford's categorization…
Design Recommendations for Query Languages

DTIC Science & Technology

1980-09-01

DESIGN RECOMMENDATIONS FOR QUERY LANGUAGES S.L. Ehrenreich Submitted by: Stanley M. Halpin, Acting Chief HUMAN FACTORS TECHNICAL AREA Approved by: Edgar ...respond to que- ries that it recognizes as faulty. Codd (1974) states that in designing a nat- ural query language, attention must be given to dealing...impaired. Codd (1974) also regarded the user’s perception of the data base to be of critical importance in properly designing a query language system
Agent-Based Framework for Discrete Entity Simulations

DTIC Science & Technology

2006-11-01

Postgres database server for environment queries of neighbors and continuum data. As expected for raw database queries (no database optimizations in...form. Eventually the code was ported to GNU C++ on the same single Intel Pentium 4 CPU running RedHat Linux 9.0 and Postgres database server...Again Postgres was used for environmental queries, and the tool remained relatively slow because of the immense number of queries necessary to assess
An SSVEP-Based Brain-Computer Interface for Text Spelling With Adaptive Queries That Maximize Information Gain Rates.

PubMed

Akce, Abdullah; Norton, James J S; Bretl, Timothy

2015-09-01

This paper presents a brain-computer interface for text entry using steady-state visually evoked potentials (SSVEP). Like other SSVEP-based spellers, ours identifies the desired input character by posing questions (or queries) to users through a visual interface. Each query defines a mapping from possible characters to steady-state stimuli. The user responds by attending to one of these stimuli. Unlike other SSVEP-based spellers, ours chooses from a much larger pool of possible queries-on the order of ten thousand instead of ten. The larger query pool allows our speller to adapt more effectively to the inherent structure of what is being typed and to the input performance of the user, both of which make certain queries provide more information than others. In particular, our speller chooses queries from this pool that maximize the amount of information to be received per unit of time, a measure of mutual information that we call information gain rate. To validate our interface, we compared it with two other state-of-the-art SSVEP-based spellers, which were re-implemented to use the same input mechanism. Results showed that our interface, with the larger query pool, allowed users to spell multiple-word texts nearly twice as fast as they could with the compared spellers.
Query construction, entropy, and generalization in neural-network models

NASA Astrophysics Data System (ADS)

Sollich, Peter

1994-05-01

We study query construction algorithms, which aim at improving the generalization ability of systems that learn from examples by choosing optimal, nonredundant training sets. We set up a general probabilistic framework for deriving such algorithms from the requirement of optimizing a suitable objective function; specifically, we consider the objective functions entropy (or information gain) and generalization error. For two learning scenarios, the high-low game and the linear perceptron, we evaluate the generalization performance obtained by applying the corresponding query construction algorithms and compare it to training on random examples. We find qualitative differences between the two scenarios due to the different structure of the underlying rules (nonlinear and ``noninvertible'' versus linear); in particular, for the linear perceptron, random examples lead to the same generalization ability as a sequence of queries in the limit of an infinite number of examples. We also investigate learning algorithms which are ill matched to the learning environment and find that, in this case, minimum entropy queries can in fact yield a lower generalization ability than random examples. Finally, we study the efficiency of single queries and its dependence on the learning history, i.e., on whether the previous training examples were generated randomly or by querying, and the difference between globally and locally optimal query construction.
DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data.

PubMed

Putri, Fadhilah Kurnia; Song, Giltae; Kwon, Joonho; Rao, Praveen

2017-09-25

One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query ( DISPAQ ) which efficiently identifies profitable areas by exploiting the Apache Software Foundation's Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data.
A Novel Two-Tier Cooperative Caching Mechanism for the Optimization of Multi-Attribute Periodic Queries in Wireless Sensor Networks

PubMed Central

Zhou, ZhangBing; Zhao, Deng; Shu, Lei; Tsang, Kim-Fung

2015-01-01

Wireless sensor networks, serving as an important interface between physical environments and computational systems, have been used extensively for supporting domain applications, where multiple-attribute sensory data are queried from the network continuously and periodically. Usually, certain sensory data may not vary significantly within a certain time duration for certain applications. In this setting, sensory data gathered at a certain time slot can be used for answering concurrent queries and may be reused for answering the forthcoming queries when the variation of these data is within a certain threshold. To address this challenge, a popularity-based cooperative caching mechanism is proposed in this article, where the popularity of sensory data is calculated according to the queries issued in recent time slots. This popularity reflects the possibility that sensory data are interested in the forthcoming queries. Generally, sensory data with the highest popularity are cached at the sink node, while sensory data that may not be interested in the forthcoming queries are cached in the head nodes of divided grid cells. Leveraging these cooperatively cached sensory data, queries are answered through composing these two-tier cached data. Experimental evaluation shows that this approach can reduce the network communication cost significantly and increase the network capability. PMID:26131665
DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data †

PubMed Central

Putri, Fadhilah Kurnia; Song, Giltae; Rao, Praveen

2017-01-01

One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query (DISPAQ) which efficiently identifies profitable areas by exploiting the Apache Software Foundation’s Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data. PMID:28946679
VPipe: Virtual Pipelining for Scheduling of DAG Stream Query Plans

NASA Astrophysics Data System (ADS)

Wang, Song; Gupta, Chetan; Mehta, Abhay

There are data streams all around us that can be harnessed for tremendous business and personal advantage. For an enterprise-level stream processing system such as CHAOS [1] (Continuous, Heterogeneous Analytic Over Streams), handling of complex query plans with resource constraints is challenging. While several scheduling strategies exist for stream processing, efficient scheduling of complex DAG query plans is still largely unsolved. In this paper, we propose a novel execution scheme for scheduling complex directed acyclic graph (DAG) query plans with meta-data enriched stream tuples. Our solution, called Virtual Pipelined Chain (or VPipe Chain for short), effectively extends the "Chain" pipelining scheduling approach to complex DAG query plans.
Pulsed coherent population trapping with repeated queries for producing single-peaked high contrast Ramsey interference

NASA Astrophysics Data System (ADS)

Warren, Z.; Shahriar, M. S.; Tripathi, R.; Pati, G. S.

2018-02-01

A repeated query technique has been demonstrated as a new interrogation method in pulsed coherent population trapping for producing single-peaked Ramsey interference with high contrast. This technique enhances the contrast of the central Ramsey fringe by nearly 1.5 times and significantly suppresses the side fringes by using more query pulses ( >10) in the pulse cycle. Theoretical models have been developed to simulate Ramsey interference and analyze the characteristics of the Ramsey spectrum produced by the repeated query technique. Experiments have also been carried out employing a repeated query technique in a prototype rubidium clock to study its frequency stability performance.
Concept locator: a client-server application for retrieval of UMLS metathesaurus concepts through complex boolean query.

PubMed

Nadkarni, P M

1997-08-01

Concept Locator (CL) is a client-server application that accesses a Sybase relational database server containing a subset of the UMLS Metathesaurus for the purpose of retrieval of concepts corresponding to one or more query expressions supplied to it. CL's query grammar permits complex Boolean expressions, wildcard patterns, and parenthesized (nested) subexpressions. CL translates the query expressions supplied to it into one or more SQL statements that actually perform the retrieval. The generated SQL is optimized by the client to take advantage of the strengths of the server's query optimizer, and sidesteps its weaknesses, so that execution is reasonably efficient.
Evolution of Query Optimization Methods

NASA Astrophysics Data System (ADS)

Hameurlain, Abdelkader; Morvan, Franck

Query optimization is the most critical phase in query processing. In this paper, we try to describe synthetically the evolution of query optimization methods from uniprocessor relational database systems to data Grid systems through parallel, distributed and data integration systems. We point out a set of parameters to characterize and compare query optimization methods, mainly: (i) size of the search space, (ii) type of method (static or dynamic), (iii) modification types of execution plans (re-optimization or re-scheduling), (iv) level of modification (intra-operator and/or inter-operator), (v) type of event (estimation errors, delay, user preferences), and (vi) nature of decision-making (centralized or decentralized control).
A Comparative Study on Two Typical Schemes for Securing Spatial-Temporal Top-k Queries in Two-Tiered Mobile Wireless Sensor Networks.

PubMed

Ma, Xingpo; Liu, Xingjian; Liang, Junbin; Li, Yin; Li, Ran; Ma, Wenpeng; Qi, Chuanda

2018-03-15

A novel network paradigm of mobile edge computing, namely TMWSNs (two-tiered mobile wireless sensor networks), has just been proposed by researchers in recent years for its high scalability and robustness. However, only a few works have considered the security of TMWSNs. In fact, the storage nodes, which are located at the upper layer of TMWSNs, are prone to being attacked by the adversaries because they play a key role in bridging both the sensor nodes and the sink, which may lead to the disclosure of all data stored on them as well as some other potentially devastating results. In this paper, we make a comparative study on two typical schemes, EVTopk and VTMSN, which have been proposed recently for securing Top- k queries in TMWSNs, through both theoretical analysis and extensive simulations, aiming at finding out their disadvantages and advancements. We find that both schemes unsatisfactorily raise communication costs. Specifically, the extra communication cost brought about by transmitting the proof information uses up more than 40% of the total communication cost between the sensor nodes and the storage nodes, and 80% of that between the storage nodes and the sink. We discuss the corresponding reasons and present our suggestions, hoping that it will inspire the researchers researching this subject.
An alternative database approach for management of SNOMED CT and improved patient data queries.

PubMed

Campbell, W Scott; Pedersen, Jay; McClay, James C; Rao, Praveen; Bastola, Dhundy; Campbell, James R

2015-10-01

SNOMED CT is the international lingua franca of terminologies for human health. Based in Description Logics (DL), the terminology enables data queries that incorporate inferences between data elements, as well as, those relationships that are explicitly stated. However, the ontologic and polyhierarchical nature of the SNOMED CT concept model make it difficult to implement in its entirety within electronic health record systems that largely employ object oriented or relational database architectures. The result is a reduction of data richness, limitations of query capability and increased systems overhead. The hypothesis of this research was that a graph database (graph DB) architecture using SNOMED CT as the basis for the data model and subsequently modeling patient data upon the semantic core of SNOMED CT could exploit the full value of the terminology to enrich and support advanced data querying capability of patient data sets. The hypothesis was tested by instantiating a graph DB with the fully classified SNOMED CT concept model. The graph DB instance was tested for integrity by calculating the transitive closure table for the SNOMED CT hierarchy and comparing the results with transitive closure tables created using current, validated methods. The graph DB was then populated with 461,171 anonymized patient record fragments and over 2.1 million associated SNOMED CT clinical findings. Queries, including concept negation and disjunction, were then run against the graph database and an enterprise Oracle relational database (RDBMS) of the same patient data sets. The graph DB was then populated with laboratory data encoded using LOINC, as well as, medication data encoded with RxNorm and complex queries performed using LOINC, RxNorm and SNOMED CT to identify uniquely described patient populations. A graph database instance was successfully created for two international releases of SNOMED CT and two US SNOMED CT editions. Transitive closure tables and descriptive statistics generated using the graph database were identical to those using validated methods. Patient queries produced identical patient count results to the Oracle RDBMS with comparable times. Database queries involving defining attributes of SNOMED CT concepts were possible with the graph DB. The same queries could not be directly performed with the Oracle RDBMS representation of the patient data and required the creation and use of external terminology services. Further, queries of undefined depth were successful in identifying unknown relationships between patient cohorts. The results of this study supported the hypothesis that a patient database built upon and around the semantic model of SNOMED CT was possible. The model supported queries that leveraged all aspects of the SNOMED CT logical model to produce clinically relevant query results. Logical disjunction and negation queries were possible using the data model, as well as, queries that extended beyond the structural IS_A hierarchy of SNOMED CT to include queries that employed defining attribute-values of SNOMED CT concepts as search parameters. As medical terminologies, such as SNOMED CT, continue to expand, they will become more complex and model consistency will be more difficult to assure. Simultaneously, consumers of data will increasingly demand improvements to query functionality to accommodate additional granularity of clinical concepts without sacrificing speed. This new line of research provides an alternative approach to instantiating and querying patient data represented using advanced computable clinical terminologies. Copyright © 2015 Elsevier Inc. All rights reserved.

Content-Aware DataGuide with Incremental Index Update using Frequently Used Paths

NASA Astrophysics Data System (ADS)

Sharma, A. K.; Duhan, Neelam; Khattar, Priyanka

2010-11-01

Size of the WWW is increasing day by day. Due to the absence of structured data on the Web, it becomes very difficult for information retrieval tools to fully utilize the Web information. As a solution to this problem, XML pages come into play, which provide structural information to the users to some extent. Without efficient indexes, query processing can be quite inefficient due to an exhaustive traversal on XML data. In this paper an improved content-centric approach of Content-Aware DataGuide, which is an indexing technique for XML databases, is being proposed that uses frequently used paths from historical query logs to improve query performance. The index can be updated incrementally according to the changes in query workload and thus, the overhead of reconstruction can be minimized. Frequently used paths are extracted using any Sequential Pattern mining algorithm on subsequent queries in the query workload. After this, the data structures are incrementally updated. This indexing technique proves to be efficient as partial matching queries can be executed efficiently and users can now get the more relevant documents in results.
Automatic Extraction of Destinations, Origins and Route Parts from Human Generated Route Directions

NASA Astrophysics Data System (ADS)

Zhang, Xiao; Mitra, Prasenjit; Klippel, Alexander; Maceachren, Alan

Researchers from the cognitive and spatial sciences are studying text descriptions of movement patterns in order to examine how humans communicate and understand spatial information. In particular, route directions offer a rich source of information on how cognitive systems conceptualize movement patterns by segmenting them into meaningful parts. Route directions are composed using a plethora of cognitive spatial organization principles: changing levels of granularity, hierarchical organization, incorporation of cognitively and perceptually salient elements, and so forth. Identifying such information in text documents automatically is crucial for enabling machine-understanding of human spatial language. The benefits are: a) creating opportunities for large-scale studies of human linguistic behavior; b) extracting and georeferencing salient entities (landmarks) that are used by human route direction providers; c) developing methods to translate route directions to sketches and maps; and d) enabling queries on large corpora of crawled/analyzed movement data. In this paper, we introduce our approach and implementations that bring us closer to the goal of automatically processing linguistic route directions. We report on research directed at one part of the larger problem, that is, extracting the three most critical parts of route directions and movement patterns in general: origin, destination, and route parts. We use machine-learning based algorithms to extract these parts of routes, including, for example, destination names and types. We prove the effectiveness of our approach in several experiments using hand-tagged corpora.
Spatial Digital Database for the Geologic Map of Oregon

USGS Publications Warehouse

Walker, George W.; MacLeod, Norman S.; Miller, Robert J.; Raines, Gary L.; Connors, Katherine A.

2003-01-01

Introduction This report describes and makes available a geologic digital spatial database (orgeo) representing the geologic map of Oregon (Walker and MacLeod, 1991). The original paper publication was printed as a single map sheet at a scale of 1:500,000, accompanied by a second sheet containing map unit descriptions and ancillary data. A digital version of the Walker and MacLeod (1991) map was included in Raines and others (1996). The dataset provided by this open-file report supersedes the earlier published digital version (Raines and others, 1996). This digital spatial database is one of many being created by the U.S. Geological Survey as an ongoing effort to provide geologic information for use in spatial analysis in a geographic information system (GIS). This database can be queried in many ways to produce a variety of geologic maps. This database is not meant to be used or displayed at any scale larger than 1:500,000 (for example, 1:100,000). This report describes the methods used to convert the geologic map data into a digital format, describes the ArcInfo GIS file structures and relationships, and explains how to download the digital files from the U.S. Geological Survey public access World Wide Web site on the Internet. Scanned images of the printed map (Walker and MacLeod, 1991), their correlation of map units, and their explanation of map symbols are also available for download.
Autocorrelation and Regularization of Query-Based Information Retrieval Scores

DTIC Science & Technology

2008-02-01

of the most general information retrieval models [ Salton , 1968]. By treating a query as a very short document, documents and queries can be rep... Salton , 1971]. In the context of single link hierarchical clustering, Jardine and van Rijsbergen showed that ranking all k clusters and retrieving a...a document about “dogs”, then the system will always miss this document when a user queries “dog”. Salton recognized that a document’s representation
Query Log Analysis of an Electronic Health Record Search Engine

PubMed Central

Yang, Lei; Mei, Qiaozhu; Zheng, Kai; Hanauer, David A.

2011-01-01

We analyzed a longitudinal collection of query logs of a full-text search engine designed to facilitate information retrieval in electronic health records (EHR). The collection, 202,905 queries and 35,928 user sessions recorded over a course of 4 years, represents the information-seeking behavior of 533 medical professionals, including frontline practitioners, coding personnel, patient safety officers, and biomedical researchers for patient data stored in EHR systems. In this paper, we present descriptive statistics of the queries, a categorization of information needs manifested through the queries, as well as temporal patterns of the users’ information-seeking behavior. The results suggest that information needs in medical domain are substantially more sophisticated than those that general-purpose web search engines need to accommodate. Therefore, we envision there exists a significant challenge, along with significant opportunities, to provide intelligent query recommendations to facilitate information retrieval in EHR. PMID:22195150
A Fuzzy Query Mechanism for Human Resource Websites

NASA Astrophysics Data System (ADS)

Lai, Lien-Fu; Wu, Chao-Chin; Huang, Liang-Tsung; Kuo, Jung-Chih

Users' preferences often contain imprecision and uncertainty that are difficult for traditional human resource websites to deal with. In this paper, we apply the fuzzy logic theory to develop a fuzzy query mechanism for human resource websites. First, a storing mechanism is proposed to store fuzzy data into conventional database management systems without modifying DBMS models. Second, a fuzzy query language is proposed for users to make fuzzy queries on fuzzy databases. User's fuzzy requirement can be expressed by a fuzzy query which consists of a set of fuzzy conditions. Third, each fuzzy condition associates with a fuzzy importance to differentiate between fuzzy conditions according to their degrees of importance. Fourth, the fuzzy weighted average is utilized to aggregate all fuzzy conditions based on their degrees of importance and degrees of matching. Through the mutual compensation of all fuzzy conditions, the ordering of query results can be obtained according to user's preference.
Integration Of 3D Geographic Information System (GIS) For Effective Waste Management Practice

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rood, G.J.; Hecox, G.R.

2006-07-01

Soil remediation in response to the presence of residual radioactivity resulting from past MED/AEC activities is currently in progress under the Formerly Utilized Sites Remedial Action Program near the St. Louis, MO airport. During GY05, approximately 92,000 cubic meters (120,000 cubic yards) of radioactive soil was excavated, packaged and transported via rail for disposal at U.S. Ecology or Envirocare of Utah, LLC. To facilitate the management of excavation/transportation/disposal activities, a 3D GIS was developed for the site that was used to estimate the in-situ radionuclide activities, activities in excavation block areas, and shipping activities using a sum-of ratio (SOR) methodmore » for combining various radionuclide compounds into applicable transportation and disposal SOR values. The 3D GIS was developed starting with the SOR values for the approximately 900 samples from 90 borings. These values were processed into a three-dimensional (3D) point grid using kriging with nominal grid spacing of 1.5 by 1.5 meter horizontal by 0.3 meter vertical. The final grid, clipped to the area and soil interval above the planned base of excavation, consisted of 210,000 individual points. Standard GIS volumetric and spatial join procedures were used to calculate the volume of soil represented by each grid point, the base of excavation, depth below ground surface, elevation, surface elevation and SOR values for each point in the final grid. To create the maps needed for management, the point grid results were spatially joined to each excavation area in 0.9 meter (3 foot) depth intervals and the average SOR and total volumes were calculations. The final maps were color-coded for easy identification of areas above the specific transportation or disposal criteria. (authors)« less
Integrating Smart Health in the US Health Care System: Infodemiology Study of Asthma Monitoring in the Google Era

PubMed Central

Sampri, Alexia; Sypsa, Karla; Tsagarakis, Konstantinos P

2018-01-01

Background With the internet’s penetration and use constantly expanding, this vast amount of information can be employed in order to better assess issues in the US health care system. Google Trends, a popular tool in big data analytics, has been widely used in the past to examine interest in various medical and health-related topics and has shown great potential in forecastings, predictions, and nowcastings. As empirical relationships between online queries and human behavior have been shown to exist, a new opportunity to explore the behavior toward asthma—a common respiratory disease—is present. Objective This study aimed at forecasting the online behavior toward asthma and examined the correlations between queries and reported cases in order to explore the possibility of nowcasting asthma prevalence in the United States using online search traffic data. Methods Applying Holt-Winters exponential smoothing to Google Trends time series from 2004 to 2015 for the term “asthma,” forecasts for online queries at state and national levels are estimated from 2016 to 2020 and validated against available Google query data from January 2016 to June 2017. Correlations among yearly Google queries and between Google queries and reported asthma cases are examined. Results Our analysis shows that search queries exhibit seasonality within each year and the relationships between each 2 years’ queries are statistically significant (P<.05). Estimated forecasting models for a 5-year period (2016 through 2020) for Google queries are robust and validated against available data from January 2016 to June 2017. Significant correlations were found between (1) online queries and National Health Interview Survey lifetime asthma (r=–.82, P=.001) and current asthma (r=–.77, P=.004) rates from 2004 to 2015 and (2) between online queries and Behavioral Risk Factor Surveillance System lifetime (r=–.78, P=.003) and current asthma (r=–.79, P=.002) rates from 2004 to 2014. The correlations are negative, but lag analysis to identify the period of response cannot be employed until short-interval data on asthma prevalence are made available. Conclusions Online behavior toward asthma can be accurately predicted, and significant correlations between online queries and reported cases exist. This method of forecasting Google queries can be used by health care officials to nowcast asthma prevalence by city, state, or nationally, subject to future availability of daily, weekly, or monthly data on reported cases. This method could therefore be used for improved monitoring and assessment of the needs surrounding the current population of patients with asthma. PMID:29530839
Menopause and big data: Word Adjacency Graph modeling of menopause-related ChaCha data.

PubMed

Carpenter, Janet S; Groves, Doyle; Chen, Chen X; Otte, Julie L; Miller, Wendy R

2017-07-01

To detect and visualize salient queries about menopause using Big Data from ChaCha. We used Word Adjacency Graph (WAG) modeling to detect clusters and visualize the range of menopause-related topics and their mutual proximity. The subset of relevant queries was fully modeled. We split each query into token words (ie, meaningful words and phrases) and removed stopwords (ie, not meaningful functional words). The remaining words were considered in sequence to build summary tables of words and two and three-word phrases. Phrases occurring at least 10 times were used to build a network graph model that was iteratively refined by observing and removing clusters of unrelated content. We identified two menopause-related subsets of queries by searching for questions containing menopause and menopause-related terms (eg, climacteric, hot flashes, night sweats, hormone replacement). The first contained 263,363 queries from individuals aged 13 and older and the second contained 5,892 queries from women aged 40 to 62 years. In the first set, we identified 12 topic clusters: 6 relevant to menopause and 6 less relevant. In the second set, we identified 15 topic clusters: 11 relevant to menopause and 4 less relevant. Queries about hormones were pervasive within both WAG models. Many of the queries reflected low literacy levels and/or feelings of embarrassment. We modeled menopause-related queries posed by ChaCha users between 2009 and 2012. ChaCha data may be used on its own or in combination with other Big Data sources to identify patient-driven educational needs and create patient-centered interventions.
Fast Inbound Top-K Query for Random Walk with Restart.

PubMed

Zhang, Chao; Jiang, Shan; Chen, Yucheng; Sun, Yidan; Han, Jiawei

2015-09-01

Random walk with restart (RWR) is widely recognized as one of the most important node proximity measures for graphs, as it captures the holistic graph structure and is robust to noise in the graph. In this paper, we study a novel query based on the RWR measure, called the inbound top-k (Ink) query. Given a query node q and a number k , the Ink query aims at retrieving k nodes in the graph that have the largest weighted RWR scores to q . Ink queries can be highly useful for various applications such as traffic scheduling, disease treatment, and targeted advertising. Nevertheless, none of the existing RWR computation techniques can accurately and efficiently process the Ink query in large graphs. We propose two algorithms, namely Squeeze and Ripple, both of which can accurately answer the Ink query in a fast and incremental manner. To identify the top- k nodes, Squeeze iteratively performs matrix-vector multiplication and estimates the lower and upper bounds for all the nodes in the graph. Ripple employs a more aggressive strategy by only estimating the RWR scores for the nodes falling in the vicinity of q , the nodes outside the vicinity do not need to be evaluated because their RWR scores are propagated from the boundary of the vicinity and thus upper bounded. Ripple incrementally expands the vicinity until the top- k result set can be obtained. Our extensive experiments on real-life graph data sets show that Ink queries can retrieve interesting results, and the proposed algorithms are orders of magnitude faster than state-of-the-art method.
Sleep-wake time perception varies by direct or indirect query.

PubMed

Alameddine, Y; Ellenbogen, J M; Bianchi, M T

2015-01-15

The diagnosis of insomnia rests on self-report of difficulty initiating or maintaining sleep. However, subjective reports may be unreliable, and possibly may vary by the method of inquiry. We investigated this possibility by comparing within-individual response to direct versus indirect time queries after overnight polysomnography. We obtained self-reported sleep-wake times via morning questionnaires in 879 consecutive adult diagnostic polysomnograms. Responses were compared within subjects (direct versus indirect query) and across groups defined by apnea-hypopnea index and by self-reported insomnia symptoms in pre-sleep questionnaires. Direct queries required a time duration response, while indirect queries required clock times from which we calculated time durations. Direct and indirect queries of sleep latency were the same in only 41% of cases, and total sleep time queries matched in only 5.4%. For both latency and total sleep, the most common discrepancy involved the indirect value being larger than the direct response. The discrepancy between direct and indirect queries was not related to objective sleep metrics. The degree of discrepancy was not related to the presence of insomnia symptoms, although patients reporting insomnia symptoms showed underestimation of total sleep duration by direct response. Self-reported sleep latency and total sleep time are often internally inconsistent when comparing direct and indirect survey queries of each measure. These discrepancies represent substantive challenges to effective clinical practice, particularly when diagnosis and management depends on self-reported sleep patterns, as with insomnia. Although self-reported sleep-wake times remains fundamental to clinical practice, objective measures provide clinically relevant adjunctive information. © 2015 American Academy of Sleep Medicine.
Digital soil mapping for the support of delineation of Areas Facing Natural Constraints defined by common European biophysical criteria

NASA Astrophysics Data System (ADS)

Pásztor, László; Bakacsi, Zsófia; Laborczi, Annamária; Takács, Katalin; Szatmári, Gábor; Tóth, Tibor; Szabó, József

2016-04-01

One of the main objectives of the EU's Common Agricultural Policy is to encourage maintaining agricultural production in Areas Facing Natural Constraints (ANC) in order to sustain agricultural production and use natural resources, in such a way to secure both stable production and income to farmers and to protect the environment. ANC assignment has both ecological and severe economical aspects. Recently the delimitation of ANCs is suggested to be carried out by using common biophysical diagnostic criteria on low soil productivity and poor climate conditions all over Europe. The criterion system was elaborated and has been repeatedly upgraded by JRC. The operational implementation is under member state competence. This process requires application of available soil databases and proper thematic and spatial inference methods. In our paper we present the inferences applied for the latest identification and delineation of areas with low soil productivity in Hungary according to JRC biophysical criteria related to soil: limited soil drainage, texture and stoniness (coarse texture, heavy clay, vertic properties), shallow rooting depth, chemical properties (salinity, sodicity, low pH). The compilation of target specific maps were based on the available legacy and recently collected data. In the present work three different data sources were used. The most relevant available data were queried from the datasets for each mapped criterion for either direct application or for the compilation a suitable, synthetic (non-measured) parameter. In some cases the values of the target variable originated from only one, in other cases from more databases. The reference dataset used in the mapping process was set up after substantial statistical analysis and filtering. It consisted of the values of the target variable attributed to the finally selected georeferenced locations. For spatial inference regression kriging was applied. Accuracy assessment was carried out by Leave One Out Cross Validation (LOOCV). In some cases the DSM product directly provided the delineation result by simple querying, in other cases further interpretation of the map was necessary. As the result of our work not only spatial fulfilment of the European biophysical criteria was assessed and provided for decision makers, but unique digital soil map products were elaborated regionalizing specific soil features, which were never mapped before, even nationally with 1 ha spatial resolution. Acknowledgement: Our work was supported by the "European Fund for Agricultural and Rural Development: Europe investing in rural areas" with the support of the European Union and Hungary and by the Hungarian National Scientific Research Foundation (OTKA, Grant No. K105167).
Distributed Multi-interface Catalogue for Geospatial Data

NASA Astrophysics Data System (ADS)

Nativi, S.; Bigagli, L.; Mazzetti, P.; Mattia, U.; Boldrini, E.

2007-12-01

Several geosciences communities (e.g. atmospheric science, oceanography, hydrology) have developed tailored data and metadata models and service protocol specifications for enabling online data discovery, inventory, evaluation, access and download. These specifications are conceived either profiling geospatial information standards or extending the well-accepted geosciences data models and protocols in order to capture more semantics. These artifacts have generated a set of related catalog -and inventory services- characterizing different communities, initiatives and projects. In fact, these geospatial data catalogs are discovery and access systems that use metadata as the target for query on geospatial information. The indexed and searchable metadata provide a disciplined vocabulary against which intelligent geospatial search can be performed within or among communities. There exists a clear need to conceive and achieve solutions to implement interoperability among geosciences communities, in the context of the more general geospatial information interoperability framework. Such solutions should provide search and access capabilities across catalogs, inventory lists and their registered resources. Thus, the development of catalog clearinghouse solutions is a near-term challenge in support of fully functional and useful infrastructures for spatial data (e.g. INSPIRE, GMES, NSDI, GEOSS). This implies the implementation of components for query distribution and virtual resource aggregation. These solutions must implement distributed discovery functionalities in an heterogeneous environment, requiring metadata profiles harmonization as well as protocol adaptation and mediation. We present a catalog clearinghouse solution for the interoperability of several well-known cataloguing systems (e.g. OGC CSW, THREDDS catalog and data services). The solution implements consistent resource discovery and evaluation over a dynamic federation of several well-known cataloguing and inventory systems. Prominent features include: 1)Support to distributed queries over a hierarchical data model, supporting incremental queries (i.e. query over collections, to be subsequently refined) and opaque/translucent chaining; 2)Support to several client protocols, through a compound front-end interface module. This allows to accommodate a (growing) number of cataloguing standards, or profiles thereof, including the OGC CSW interface, ebRIM Application Profile (for Core ISO Metadata and other data models), and the ISO Application Profile. The presented catalog clearinghouse supports both the opaque and translucent pattern for service chaining. In fact, the clearinghouse catalog may be configured either to completely hide the underlying federated services or to provide clients with services information. In both cases, the clearinghouse solution presents a higher level interface (i.e. OGC CSW) which harmonizes multiple lower level services (e.g. OGC CSW, WMS and WCS, THREDDS, etc.), and handles all control and interaction with them. In the translucent case, client has the option to directly access the lower level services (e.g. to improve performances). In the GEOSS context, the solution has been experimented both as a stand-alone user application and as a service framework. The first scenario allows a user to download a multi-platform client software and query a federation of cataloguing systems, that he can customize at will. The second scenario support server-side deployment and can be flexibly adapted to several use-cases, such as intranet proxy, catalog broker, etc.
Secure and Efficient k-NN Queries⋆

PubMed Central

Asif, Hafiz; Vaidya, Jaideep; Shafiq, Basit; Adam, Nabil

2017-01-01

Given the morass of available data, ranking and best match queries are often used to find records of interest. As such, k-NN queries, which give the k closest matches to a query point, are of particular interest, and have many applications. We study this problem in the context of the financial sector, wherein an investment portfolio database is queried for matching portfolios. Given the sensitivity of the information involved, our key contribution is to develop a secure k-NN computation protocol that can enable the computation k-NN queries in a distributed multi-party environment while taking domain semantics into account. The experimental results show that the proposed protocols are extremely efficient. PMID:29218333
Nearest private query based on quantum oblivious key distribution

NASA Astrophysics Data System (ADS)

Xu, Min; Shi, Run-hua; Luo, Zhen-yu; Peng, Zhen-wan

2017-12-01

Nearest private query is a special private query which involves two parties, a user and a data owner, where the user has a private input (e.g., an integer) and the data owner has a private data set, and the user wants to query which element in the owner's private data set is the nearest to his input without revealing their respective private information. In this paper, we first present a quantum protocol for nearest private query, which is based on quantum oblivious key distribution (QOKD). Compared to the classical related protocols, our protocol has the advantages of the higher security and the better feasibility, so it has a better prospect of applications.
ODM2 (Observation Data Model): The EarthChem Use Case

NASA Astrophysics Data System (ADS)

Lehnert, Kerstin; Song, Lulin; Hsu, Leslie; Horsburgh, Jeffrey S.; Aufdenkampe, Anthony K.; Mayorga, Emilio; Tarboton, David; Zaslavsky, Ilya

2014-05-01

PetDB is an online data system that was created in the late 1990's to serve online a synthesis of published geochemical and petrological data of igneous and metamorphic rocks. PetDB has today reached a volume of 2.5 million analytical values for nearly 70,000 rock samples. PetDB's data model (Lehnert et al., G-Cubed 2000) was designed to store sample-based observational data generated by the analysis of rocks, together with a wide range of metadata documenting provenance of the samples, analytical procedures, data quality, and data source. Attempts to store additional types of geochemical data such as time-series data of seafloor hydrothermal springs and volcanic gases, depth-series data for marine sediments and soils, and mineral or mineral inclusion data revealed the limitations of the schema: the inability to properly record sample hierarchies (for example, a garnet that is included in a diamond that is included in a xenolith that is included in a kimberlite rock sample), inability to properly store time-series data, inability to accommodate classification schemes other than rock lithologies, deficiencies of identifying and documenting datasets that are not part of publications. In order to overcome these deficiencies, PetDB has been developing a new data schema using the ODM2 information model (ODM=Observation Data Model). The development of ODM2 is a collaborative project that leverages the experience of several existing information representations, including PetDB and EarthChem, and the CUAHSI HIS Observations Data Model (ODM), as well as the general specification for encoding observational data called Observations and Measurements (O&M) to develop a uniform information model that seamlessly manages spatially discrete, feature-based earth observations from environmental samples and sample fractions as well as in-situ sensors, and to test its initial implementation in a variety of user scenarios. The O&M model, adopted as an international standard by the Open Geospatial Consortium, and later by ISO, is the foundation of several domain markup languages such as OGC WaterML 2, used for exchanging hydrologic time series. O&M profiles for samples and sample fractions have not been standardized yet, and there is a significant variety in sample data representations used across agencies and academic projects. The intent of the ODM2 project is to create a unified relational representation for different types of spatially discrete observational data, ensuring that the data can be efficiently stored, transferred, catalogued and queried within a variety of earth science applications. We will report on the initial design and implementation of the new model for PetDB, and results of testing the model against a set of common queries. We have explored several aspects of the model, including: semantic consistency, validation and integrity checking, portability and maintainability, query efficiency, and scalability. The sample datasets from PetDB have been loaded in the initial physical implementation for testing. The results of the experiments point to both benefits and challenges of the initial design, and illustrate the key trade-off between the generality of design, ease of interpretation, and query efficiency, especially as the system needs to scale to millions of records.
Blind Seer: A Scalable Private DBMS

DTIC Science & Technology

2014-05-01

searchable index terms per DB row, in time comparable to (insecure) MySQL (many practical queries can be privately executed with work 1.2-3 times slower...than MySQL , although some queries are costlier). We support a rich query set, including searching on arbitrary boolean formulas on keywords and ranges...index terms per DB row, in time comparable to (insecure) MySQL (many practical queries can be privately executed with work 1.2-3 times slower than MySQL
Applying Wave (registered trademark) to Build an Air Force Community of Interest Shared Space

DTIC Science & Technology

2007-08-01

Performance. It is essential that an inverse transform be defined for every transform, or else the query mediator must be smart enough to figure out how...to invert it. Without an inverse transform , if an incoming query constrains on the transformed attribute, the query mediator might generate a query...plan that is horribly inefficient. If you must code a custom transformation function, you must also code the inverse transform . Putting the
Concept Based Tie-breaking and Maximal Marginal Relevance Retrieval in Microblog Retrieval

DTIC Science & Technology

2014-11-01

the same score, another singal will be used to rank these documents to break the ties , but the relative orders of other documents against these...documents remain the same. The tie- breaking step above is repeatedly applied to further break ties until all candidate signals are applied and the ranking...searched it on the Yahoo! search engine, which returned some query sug- gestions for the query. The original queries as well as their query suggestions
Multi-field query expansion is effective for biomedical dataset retrieval.

PubMed

Bouadjenek, Mohamed Reda; Verspoor, Karin

2017-01-01

In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogenous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expanstion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in term of trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in term of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one. © The Author(s) 2017. Published by Oxford University Press.

Multi-field query expansion is effective for biomedical dataset retrieval

PubMed Central

2017-01-01

Abstract In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogenous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expanstion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in term of trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in term of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one. PMID:29220457
Efficient processing of multiple nested event pattern queries over multi-dimensional event streams based on a triaxial hierarchical model.

PubMed

Xiao, Fuyuan; Aritsugi, Masayoshi; Wang, Qing; Zhang, Rong

2016-09-01

For efficient and sophisticated analysis of complex event patterns that appear in streams of big data from health care information systems and support for decision-making, a triaxial hierarchical model is proposed in this paper. Our triaxial hierarchical model is developed by focusing on hierarchies among nested event pattern queries with an event concept hierarchy, thereby allowing us to identify the relationships among the expressions and sub-expressions of the queries extensively. We devise a cost-based heuristic by means of the triaxial hierarchical model to find an optimised query execution plan in terms of the costs of both the operators and the communications between them. According to the triaxial hierarchical model, we can also calculate how to reuse the results of the common sub-expressions in multiple queries. By integrating the optimised query execution plan with the reuse schemes, a multi-query optimisation strategy is developed to accomplish efficient processing of multiple nested event pattern queries. We present empirical studies in which the performance of multi-query optimisation strategy was examined under various stream input rates and workloads. Specifically, the workloads of pattern queries can be used for supporting monitoring patients' conditions. On the other hand, experiments with varying input rates of streams can correspond to changes of the numbers of patients that a system should manage, whereas burst input rates can correspond to changes of rushes of patients to be taken care of. The experimental results have shown that, in Workload 1, our proposal can improve about 4 and 2 times throughput comparing with the relative works, respectively; in Workload 2, our proposal can improve about 3 and 2 times throughput comparing with the relative works, respectively; in Workload 3, our proposal can improve about 6 times throughput comparing with the relative work. The experimental results demonstrated that our proposal was able to process complex queries efficiently which can support health information systems and further decision-making. Copyright © 2016 Elsevier B.V. All rights reserved.
A Magnetic Petrology Database for Satellite Magnetic Anomaly Interpretations

NASA Astrophysics Data System (ADS)

Nazarova, K.; Wasilewski, P.; Didenko, A.; Genshaft, Y.; Pashkevich, I.

2002-05-01

A Magnetic Petrology Database (MPDB) is now being compiled at NASA/Goddard Space Flight Center in cooperation with Russian and Ukrainian Institutions. The purpose of this database is to provide the geomagnetic community with a comprehensive and user-friendly method of accessing magnetic petrology data via Internet for more realistic interpretation of satellite magnetic anomalies. Magnetic Petrology Data had been accumulated in NASA/Goddard Space Flight Center, United Institute of Physics of the Earth (Russia) and Institute of Geophysics (Ukraine) over several decades and now consists of many thousands of records of data in our archives. The MPDB was, and continues to be in big demand especially since recent launching in near Earth orbit of the mini-constellation of three satellites - Oersted (in 1999), Champ (in 2000), and SAC-C (in 2000) which will provide lithospheric magnetic maps with better spatial and amplitude resolution (about 1 nT). The MPDB is focused on lower crustal and upper mantle rocks and will include data on mantle xenoliths, serpentinized ultramafic rocks, granulites, iron quartzites and rocks from Archean-Proterozoic metamorphic sequences from all around the world. A substantial amount of data is coming from the area of unique Kursk Magnetic Anomaly and Kola Deep Borehole (which recovered 12 km of continental crust). A prototype MPDB can be found on the Geodynamics Branch web server of Goddard Space Flight Center at http://core2.gsfc.nasa.gov/terr_mag/magnpetr.html. The MPDB employs a searchable relational design and consists of 7 interrelated tables. The schema of database is shown at http://core2.gsfc.nasa.gov/terr_mag/doc.html. MySQL database server was utilized to implement MPDB. The SQL (Structured Query Language) is used to query the database. To present the results of queries on WEB and for WEB programming we utilized PHP scripting language and CGI scripts. The prototype MPDB is designed to search database by major satellite magnetic anomaly, tectonic structure, geographical location, rock type, magnetic properties, chemistry and reference, see http://core2.gsfc.nasa.gov/terr_mag/query1.html. The output of database is HTML structured table, text file, and downloadable file. This database will be very useful for studies of lithospheric satellite magnetic anomalies on the Earth and other terrestrial planets.
78 FR 20473 - National Practitioner Data Bank

Federal Register 2010, 2011, 2012, 2013, 2014

2013-04-05

... may self-query. Information under the HCQIA is reported by medical malpractice payers, state medical... Organizations (QIOs). Individual health care practitioners and entities may self-query. Information under... have access to this information. Individual practitioners, providers, and suppliers may self-query the...
A test of theory of planned behavior in Korea: participation in alcohol-related social gatherings.

PubMed

Park, Hee Sun; Lee, Dong Wook

2009-12-01

Two studies are reported using the theory of planned behavior (TPB) to predict and explain joining and not joining alcohol-related social gatherings among Korean undergraduates in various engineering majors. Specifically, considering that the attitudinal component of TPB is behavioral-outcome-based, the current study investigated whether the outcomes of engaging in a behavior and of not engaging in a behavior would similarly predict intentions to engage in a behavior and intentions to not engage in a behavior. The current study also examined whether intentions to engage and intentions to not engage would be significantly related to self-reported behavior a week later. Participants in Study 1 reported TPB components (attitudes toward behavior, subjective norms, perceived behavioral control, and behavioral intentions) concerning joining alcohol-related social gatherings. Participants in Study 2 reported TPB components concerning not joining alcohol-related social gatherings. Additionally, a week later, the participants in both studies reported their participation in alcohol-related social gatherings from the past week. Generally, the results showed that the TPB components were significantly associated with undergraduates' intentions to join and intentions to not join. Specifically, conversation-related attitudes and senior-junior relationship-related attitudes were significantly related to intentions to join, and only group-related attitudes were significantly related to intentions to not join. Intentions to join and intentions to not join were not significantly related to self-reported behavior of joining alcohol-related social gatherings a week later. The findings from the current research provide some evidence that joining or not joining alcohol-related social gatherings may not be mere behavioral opposites, predictable by the presence or absence of the same behavioral outcomes. These two aspects of the behavior may require assessment of different behavioral outcomes or different assessments of the same behavioral outcomes.
Query by example video based on fuzzy c-means initialized by fixed clustering center

NASA Astrophysics Data System (ADS)

Hou, Sujuan; Zhou, Shangbo; Siddique, Muhammad Abubakar

2012-04-01

Currently, the high complexity of video contents has posed the following major challenges for fast retrieval: (1) efficient similarity measurements, and (2) efficient indexing on the compact representations. A video-retrieval strategy based on fuzzy c-means (FCM) is presented for querying by example. Initially, the query video is segmented and represented by a set of shots, each shot can be represented by a key frame, and then we used video processing techniques to find visual cues to represent the key frame. Next, because the FCM algorithm is sensitive to the initializations, here we initialized the cluster center by the shots of query video so that users could achieve appropriate convergence. After an FCM cluster was initialized by the query video, each shot of query video was considered a benchmark point in the aforesaid cluster, and each shot in the database possessed a class label. The similarity between the shots in the database with the same class label and benchmark point can be transformed into the distance between them. Finally, the similarity between the query video and the video in database was transformed into the number of similar shots. Our experimental results demonstrated the performance of this proposed approach.
NCBI2RDF: enabling full RDF-based access to NCBI databases.

PubMed

Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor

2013-01-01

RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments.
[On the seasonality of dermatoses: a retrospective analysis of search engine query data depending on the season].

PubMed

Köhler, M J; Springer, S; Kaatz, M

2014-09-01

The volume of search engine queries about disease-relevant items reflects public interest and correlates with disease prevalence as proven by the example of flu (influenza). Other influences include media attention or holidays. The present work investigates if the seasonality of prevalence or symptom severity of dermatoses correlates with search engine query data. The relative weekly volume of dermatological relevant search terms was assessed by the online tool Google Trends for the years 2009-2013. For each item, the degree of seasonality was calculated via frequency analysis and a geometric approach. Many dermatoses show a marked seasonality, reflected by search engine query volumes. Unexpected seasonal variations of these queries suggest a previously unknown variability of the respective disease prevalence. Furthermore, using the example of allergic rhinitis, a close correlation of search engine query data with actual pollen count can be demonstrated. In many cases, search engine query data are appropriate to estimate seasonal variability in prevalence of common dermatoses. This finding may be useful for real-time analysis and formation of hypotheses concerning pathogenetic or symptom aggravating mechanisms and may thus contribute to improvement of diagnostics and prevention of skin diseases.
Visual graph query formulation and exploration: a new perspective on information retrieval at the edge

NASA Astrophysics Data System (ADS)

Kase, Sue E.; Vanni, Michelle; Knight, Joanne A.; Su, Yu; Yan, Xifeng

2016-05-01

Within operational environments decisions must be made quickly based on the information available. Identifying an appropriate knowledge base and accurately formulating a search query are critical tasks for decision-making effectiveness in dynamic situations. The spreading of graph data management tools to access large graph databases is a rapidly emerging research area of potential benefit to the intelligence community. A graph representation provides a natural way of modeling data in a wide variety of domains. Graph structures use nodes, edges, and properties to represent and store data. This research investigates the advantages of information search by graph query initiated by the analyst and interactively refined within the contextual dimensions of the answer space toward a solution. The paper introduces SLQ, a user-friendly graph querying system enabling the visual formulation of schemaless and structureless graph queries. SLQ is demonstrated with an intelligence analyst information search scenario focused on identifying individuals responsible for manufacturing a mosquito-hosted deadly virus. The scenario highlights the interactive construction of graph queries without prior training in complex query languages or graph databases, intuitive navigation through the problem space, and visualization of results in graphical format.
Representation and alignment of sung queries for music information retrieval

NASA Astrophysics Data System (ADS)

Adams, Norman H.; Wakefield, Gregory H.

2005-09-01

The pursuit of robust and rapid query-by-humming systems, which search melodic databases using sung queries, is a common theme in music information retrieval. The retrieval aspect of this database problem has received considerable attention, whereas the front-end processing of sung queries and the data structure to represent melodies has been based on musical intuition and historical momentum. The present work explores three time series representations for sung queries: a sequence of notes, a ``smooth'' pitch contour, and a sequence of pitch histograms. The performance of the three representations is compared using a collection of naturally sung queries. It is found that the most robust performance is achieved by the representation with highest dimension, the smooth pitch contour, but that this representation presents a formidable computational burden. For all three representations, it is necessary to align the query and target in order to achieve robust performance. The computational cost of the alignment is quadratic, hence it is necessary to keep the dimension small for rapid retrieval. Accordingly, iterative deepening is employed to achieve both robust performance and rapid retrieval. Finally, the conventional iterative framework is expanded to adapt the alignment constraints based on previous iterations, further expediting retrieval without degrading performance.
Concept-based query language approach to enterprise information systems

NASA Astrophysics Data System (ADS)

Niemi, Timo; Junkkari, Marko; Järvelin, Kalervo

2014-01-01

In enterprise information systems (EISs) it is necessary to model, integrate and compute very diverse data. In advanced EISs the stored data often are based both on structured (e.g. relational) and semi-structured (e.g. XML) data models. In addition, the ad hoc information needs of end-users may require the manipulation of data-oriented (structural), behavioural and deductive aspects of data. Contemporary languages capable of treating this kind of diversity suit only persons with good programming skills. In this paper we present a concept-oriented query language approach to manipulate this diversity so that the programming skill requirements are considerably reduced. In our query language, the features which need technical knowledge are hidden in application-specific concepts and structures. Therefore, users need not be aware of the underlying technology. Application-specific concepts and structures are represented by the modelling primitives of the extended RDOOM (relational deductive object-oriented modelling) which contains primitives for all crucial real world relationships (is-a relationship, part-of relationship, association), XML documents and views. Our query language also supports intensional and extensional-intensional queries, in addition to conventional extensional queries. In its query formulation, the end-user combines available application-specific concepts and structures through shared variables.
Relativistic quantum private database queries

NASA Astrophysics Data System (ADS)

Sun, Si-Jia; Yang, Yu-Guang; Zhang, Ming-Ou

2015-04-01

Recently, Jakobi et al. (Phys Rev A 83, 022301, 2011) suggested the first practical private database query protocol (J-protocol) based on the Scarani et al. (Phys Rev Lett 92, 057901, 2004) quantum key distribution protocol. Unfortunately, the J-protocol is just a cheat-sensitive private database query protocol. In this paper, we present an idealized relativistic quantum private database query protocol based on Minkowski causality and the properties of quantum information. Also, we prove that the protocol is secure in terms of the user security and the database security.
Walter User’s Manual (Version 1.0).

DTIC Science & Technology

1987-09-01

queries and/or commands. 1.2 - How Walter Uses the Screen As shown in Figure 1-1, Walter divides the screen of your terminal into five separate areas...our attention to queries and how to submit them to the database. 1.3.1 - Submitting Queries A query is an expression consisting of words, parentheses...dates, but also with ranges of dates, such as "oct 15 : nov 15". Waiter recognizes three kinds of dates: * Specific dates of the form [date <month> <day
Flexible Decision Support in Device-Saturated Environments

DTIC Science & Technology

2003-10-01

also output tuples to a remote MySQL or Postgres database. 3.3 GUI The GUI allows the user to pose queries using SQL and to display query...DatabaseConnection.java – handles connections to an external database (such as MySQL or Postgres ). • Debug.java – contains the code for printing out Debug messages...also provided. It is possible to output the results of queries to a MySQL or Postgres database for archival and the GUI can query those results
High Temperature Joining and Characterization of Joint Properties in Silicon Carbide-Based Composite Materials

NASA Technical Reports Server (NTRS)

Halbig, Michael C.; Singh, Mrityunjay

2015-01-01

Advanced silicon carbide-based ceramics and composites are being developed for a wide variety of high temperature extreme environment applications. Robust high temperature joining and integration technologies are enabling for the fabrication and manufacturing of large and complex shaped components. The development of a new joining approach called SET (Single-step Elevated Temperature) joining will be described along with the overview of previously developed joining approaches including high temperature brazing, ARCJoinT (Affordable, Robust Ceramic Joining Technology), diffusion bonding, and REABOND (Refractory Eutectic Assisted Bonding). Unlike other approaches, SET joining does not have any lower temperature phases and will therefore have a use temperature above 1315C. Optimization of the composition for full conversion to silicon carbide will be discussed. The goal is to find a composition with no remaining carbon or free silicon. Green tape interlayers were developed for joining. Microstructural analysis and preliminary mechanical tests of the joints will be presented.
Improving biomedical information retrieval by linear combinations of different query expansion techniques.

PubMed

Abdulla, Ahmed AbdoAziz Ahmed; Lin, Hongfei; Xu, Bo; Banbhrani, Santosh Kumar

2016-07-25

Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information Retrieval (IR) programs scour unstructured materials such as text documents in large reserves of data that are usually stored on computers. IR is related to the representation, storage, and organization of information items, as well as to access. In IR one of the main problems is to determine which documents are relevant and which are not to the user's needs. Under the current regime, users cannot precisely construct queries in an accurate way to retrieve particular pieces of data from large reserves of data. Basic information retrieval systems are producing low-quality search results. In our proposed system for this paper we present a new technique to refine Information Retrieval searches to better represent the user's information need in order to enhance the performance of information retrieval by using different query expansion techniques and apply a linear combinations between them, where the combinations was linearly between two expansion results at one time. Query expansions expand the search query, for example, by finding synonyms and reweighting original terms. They provide significantly more focused, particularized search results than do basic search queries. The retrieval performance is measured by some variants of MAP (Mean Average Precision) and according to our experimental results, the combination of best results of query expansion is enhanced the retrieved documents and outperforms our baseline by 21.06 %, even it outperforms a previous study by 7.12 %. We propose several query expansion techniques and their combinations (linearly) to make user queries more cognizable to search engines and to produce higher-quality search results.
A distributed query execution engine of big attributed graphs.

PubMed

Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif

2016-01-01

A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the graph vertices and edges need to be associated with descriptive attributes. Such type of graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations including reachability, pattern matching and shortest path where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation in addition to the performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine of G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes while the graph data are maintained in a relational store which is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries which are pushed to the underlying relational stores while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and the scalability of DG-SPARQL on querying massive attributed graph datasets in addition to its ability to outperform the performance of Apache Giraph, a popular distributed graph processing system, by orders of magnitudes.
Folksonomical P2P File Sharing Networks Using Vectorized KANSEI Information as Search Tags

NASA Astrophysics Data System (ADS)

Ohnishi, Kei; Yoshida, Kaori; Oie, Yuji

We present the concept of folksonomical peer-to-peer (P2P) file sharing networks that allow participants (peers) to freely assign structured search tags to files. These networks are similar to folksonomies in the present Web from the point of view that users assign search tags to information distributed over a network. As a concrete example, we consider an unstructured P2P network using vectorized Kansei (human sensitivity) information as structured search tags for file search. Vectorized Kansei information as search tags indicates what participants feel about their files and is assigned by the participant to each of their files. A search query also has the same form of search tags and indicates what participants want to feel about files that they will eventually obtain. A method that enables file search using vectorized Kansei information is the Kansei query-forwarding method, which probabilistically propagates a search query to peers that are likely to hold more files having search tags that are similar to the query. The similarity between the search query and the search tags is measured in terms of their dot product. The simulation experiments examine if the Kansei query-forwarding method can provide equal search performance for all peers in a network in which only the Kansei information and the tendency with respect to file collection are different among all of the peers. The simulation results show that the Kansei query forwarding method and a random-walk-based query forwarding method, for comparison, work effectively in different situations and are complementary. Furthermore, the Kansei query forwarding method is shown, through simulations, to be superior to or equal to the random-walk based one in terms of search speed.
Quantum algorithms on Walsh transform and Hamming distance for Boolean functions

NASA Astrophysics Data System (ADS)

Xie, Zhengwei; Qiu, Daowen; Cai, Guangya

2018-06-01

Walsh spectrum or Walsh transform is an alternative description of Boolean functions. In this paper, we explore quantum algorithms to approximate the absolute value of Walsh transform W_f at a single point z0 (i.e., |W_f(z0)|) for n-variable Boolean functions with probability at least 8/π 2 using the number of O(1/|W_f(z_{0)|ɛ }) queries, promised that the accuracy is ɛ , while the best known classical algorithm requires O(2n) queries. The Hamming distance between Boolean functions is used to study the linearity testing and other important problems. We take advantage of Walsh transform to calculate the Hamming distance between two n-variable Boolean functions f and g using O(1) queries in some cases. Then, we exploit another quantum algorithm which converts computing Hamming distance between two Boolean functions to quantum amplitude estimation (i.e., approximate counting). If Ham(f,g)=t≠0, we can approximately compute Ham( f, g) with probability at least 2/3 by combining our algorithm and {Approx-Count(f,ɛ ) algorithm} using the expected number of Θ( √{N/(\\lfloor ɛ t\\rfloor +1)}+√{t(N-t)}/\\lfloor ɛ t\\rfloor +1) queries, promised that the accuracy is ɛ . Moreover, our algorithm is optimal, while the exact query complexity for the above problem is Θ(N) and the query complexity with the accuracy ɛ is O(1/ɛ 2N/(t+1)) in classical algorithm, where N=2n. Finally, we present three exact quantum query algorithms for two promise problems on Hamming distance using O(1) queries, while any classical deterministic algorithm solving the problem uses Ω(2n) queries.
Using web search query data to monitor dengue epidemics: a new model for neglected tropical disease surveillance.

PubMed

Chan, Emily H; Sahai, Vikram; Conrad, Corrie; Brownstein, John S

2011-05-01

A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003-2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent valuable complement to assist with traditional dengue surveillance.

Parasol: An Architecture for Cross-Cloud Federated Graph Querying

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lieberman, Michael; Choudhury, Sutanay; Hughes, Marisa

2014-06-22

Large scale data fusion of multiple datasets can often provide in- sights that examining datasets individually cannot. However, when these datasets reside in different data centers and cannot be collocated due to technical, administrative, or policy barriers, a unique set of problems arise that hamper querying and data fusion. To ad- dress these problems, a system and architecture named Parasol is presented that enables federated queries over graph databases residing in multiple clouds. Parasol’s design is flexible and requires only minimal assumptions for participant clouds. Query optimization techniques are also described that are compatible with Parasol’s lightweight architecture. Experiments onmore » a prototype implementation of Parasol indicate its suitability for cross-cloud federated graph queries.« less
Querying XML Data with SPARQL

NASA Astrophysics Data System (ADS)

Bikakis, Nikos; Gioldasis, Nektarios; Tsinaraki, Chrisa; Christodoulakis, Stavros

SPARQL is today the standard access language for Semantic Web data. In the recent years XML databases have also acquired industrial importance due to the widespread applicability of XML in the Web. In this paper we present a framework that bridges the heterogeneity gap and creates an interoperable environment where SPARQL queries are used to access XML databases. Our approach assumes that fairly generic mappings between ontology constructs and XML Schema constructs have been automatically derived or manually specified. The mappings are used to automatically translate SPARQL queries to semantically equivalent XQuery queries which are used to access the XML databases. We present the algorithms and the implementation of SPARQL2XQuery framework, which is used for answering SPARQL queries over XML databases.
Searching Electronic Health Records for Temporal Patterns in Patient Histories: A Case Study with Microsoft Amalga

PubMed Central

Plaisant, Catherine; Lam, Stanley; Shneiderman, Ben; Smith, Mark S.; Roseman, David; Marchand, Greg; Gillam, Michael; Feied, Craig; Handler, Jonathan; Rappaport, Hank

2008-01-01

As electronic health records (EHR) become more widespread, they enable clinicians and researchers to pose complex queries that can benefit immediate patient care and deepen understanding of medical treatment and outcomes. However, current query tools make complex temporal queries difficult to pose, and physicians have to rely on computer professionals to specify the queries for them. This paper describes our efforts to develop a novel query tool implemented in a large operational system at the Washington Hospital Center (Microsoft Amalga, formerly known as Azyxxi). We describe our design of the interface to specify temporal patterns and the visual presentation of results, and report on a pilot user study looking for adverse reactions following radiology studies using contrast. PMID:18999158
Participation in voluntary and community organisations in the United Kingdom and the influences on the self-management of long-term conditions.

PubMed

Jeffries, Mark; Mathieson, Amy; Kennedy, Anne; Kirk, Susan; Morris, Rebecca; Blickem, Christian; Vassilev, Ivalyo; Rogers, Anne

2015-05-01

Voluntary and community organisations (VCOs) have health benefits for those who attend and are viewed as having the potential to support long-term condition management. However, existing community-level understandings of participation do not explain the involvement with VCOs at an individual level, or the nature of support, which may elicit health benefits. Framing active participation as 'doing and experiencing', the aim of this qualitative study was to explore why people with long-term vascular conditions join VCOs, maintain their membership and what prevents participation. Twenty participants, self-diagnosed as having diabetes, chronic heart disease or chronic kidney disease, were purposefully sampled and recruited from a range of VCOs in the North West of England identified from a mapping of local organisations. In semi-structured interviews, we explored the nature of their participation. Analysis was thematic and iterative involving a continual reflection on the data. People gave various reasons for joining groups. These included health and well-being, the need for social contact and pursuing a particular hobby. Barriers to participation included temporal and spatial barriers and those associated with group dynamics. Members maintained their membership on the basis of an identity and sense of belonging to the group, developing close relationships within it and the availability of support and trust. Participants joined community groups often in response to a health-related event. Our findings demonstrate the ways in which the social contact associated with continued participation in VCOs is seen as helping with long-term condition management. Interventions designed at improving chronic illness management might usefully consider the role of VCOs. © 2014 John Wiley & Sons Ltd.
Improved data retrieval from TreeBASE via taxonomic and linguistic data enrichment

PubMed Central

Anwar, Nadia; Hunt, Ela

2009-01-01

Background TreeBASE, the only data repository for phylogenetic studies, is not being used effectively since it does not meet the taxonomic data retrieval requirements of the systematics community. We show, through an examination of the queries performed on TreeBASE, that data retrieval using taxon names is unsatisfactory. Results We report on a new wrapper supporting taxon queries on TreeBASE by utilising a Taxonomy and Classification Database (TCl-Db) we created. TCl-Db holds merged and consolidated taxonomic names from multiple data sources and can be used to translate hierarchical, vernacular and synonym queries into specific query terms in TreeBASE. The query expansion supported by TCl-Db shows very significant information retrieval quality improvement. The wrapper can be accessed at the URL The methodology we developed is scalable and can be applied to new data, as those become available in the future. Conclusion Significantly improved data retrieval quality is shown for all queries, and additional flexibility is achieved via user-driven taxonomy selection. PMID:19426482
SPARQL Assist language-neutral query composer

PubMed Central

2012-01-01

Background SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. Results We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. Conclusions To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources. PMID:22373327
SPARQL assist language-neutral query composer.

PubMed

McCarthy, Luke; Vandervalk, Ben; Wilkinson, Mark

2012-01-25

SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources.
Querying Safety Cases

NASA Technical Reports Server (NTRS)

Denney, Ewen W.; Naylor, Dwight; Pai, Ganesh

2014-01-01

Querying a safety case to show how the various stakeholders' concerns about system safety are addressed has been put forth as one of the benefits of argument-based assurance (in a recent study by the Health Foundation, UK, which reviewed the use of safety cases in safety-critical industries). However, neither the literature nor current practice offer much guidance on querying mechanisms appropriate for, or available within, a safety case paradigm. This paper presents a preliminary approach that uses a formal basis for querying safety cases, specifically Goal Structuring Notation (GSN) argument structures. Our approach semantically enriches GSN arguments with domain-specific metadata that the query language leverages, along with its inherent structure, to produce views. We have implemented the approach in our toolset AdvoCATE, and illustrate it by application to a fragment of the safety argument for an Unmanned Aircraft System (UAS) being developed at NASA Ames. We also discuss the potential practical utility of our query mechanism within the context of the existing framework for UAS safety assurance.
Parallel Index and Query for Large Scale Data Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chou, Jerry; Wu, Kesheng; Ruebel, Oliver

2011-07-18

Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for process- ing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that address these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process mas- sive datasets on modern supercomputing platforms. We apply FastQuery to processing ofmore » a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for inter- esting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.« less
Information Network Model Query Processing

NASA Astrophysics Data System (ADS)

Song, Xiaopu

Information Networking Model (INM) [31] is a novel database model for real world objects and relationships management. It naturally and directly supports various kinds of static and dynamic relationships between objects. In INM, objects are networked through various natural and complex relationships. INM Query Language (INM-QL) [30] is designed to explore such information network, retrieve information about schema, instance, their attributes, relationships, and context-dependent information, and process query results in the user specified form. INM database management system has been implemented using Berkeley DB, and it supports INM-QL. This thesis is mainly focused on the implementation of the subsystem that is able to effectively and efficiently process INM-QL. The subsystem provides a lexical and syntactical analyzer of INM-QL, and it is able to choose appropriate evaluation strategies and index mechanism to process queries in INM-QL without the user's intervention. It also uses intermediate result structure to hold intermediate query result and other helping structures to reduce complexity of query processing.
Using Bitmap Indexing Technology for Combined Numerical and TextQueries

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stockinger, Kurt; Cieslewicz, John; Wu, Kesheng

2006-10-16

In this paper, we describe a strategy of using compressedbitmap indices to speed up queries on both numerical data and textdocuments. By using an efficient compression algorithm, these compressedbitmap indices are compact even for indices with millions of distinctterms. Moreover, bitmap indices can be used very efficiently to answerBoolean queries over text documents involving multiple query terms.Existing inverted indices for text searches are usually inefficient forcorpora with a very large number of terms as well as for queriesinvolving a large number of hits. We demonstrate that our compressedbitmap index technology overcomes both of those short-comings. In aperformance comparison against amore » commonly used database system, ourindices answer queries 30 times faster on average. To provide full SQLsupport, we integrated our indexing software, called FastBit, withMonetDB. The integrated system MonetDB/FastBit provides not onlyefficient searches on a single table as FastBit does, but also answersjoin queries efficiently. Furthermore, MonetDB/FastBit also provides avery efficient retrieval mechanism of result records.« less
Integrating Smart Health in the US Health Care System: Infodemiology Study of Asthma Monitoring in the Google Era.

PubMed

Mavragani, Amaryllis; Sampri, Alexia; Sypsa, Karla; Tsagarakis, Konstantinos P

2018-03-12

With the internet's penetration and use constantly expanding, this vast amount of information can be employed in order to better assess issues in the US health care system. Google Trends, a popular tool in big data analytics, has been widely used in the past to examine interest in various medical and health-related topics and has shown great potential in forecastings, predictions, and nowcastings. As empirical relationships between online queries and human behavior have been shown to exist, a new opportunity to explore the behavior toward asthma-a common respiratory disease-is present. This study aimed at forecasting the online behavior toward asthma and examined the correlations between queries and reported cases in order to explore the possibility of nowcasting asthma prevalence in the United States using online search traffic data. Applying Holt-Winters exponential smoothing to Google Trends time series from 2004 to 2015 for the term "asthma," forecasts for online queries at state and national levels are estimated from 2016 to 2020 and validated against available Google query data from January 2016 to June 2017. Correlations among yearly Google queries and between Google queries and reported asthma cases are examined. Our analysis shows that search queries exhibit seasonality within each year and the relationships between each 2 years' queries are statistically significant (P<.05). Estimated forecasting models for a 5-year period (2016 through 2020) for Google queries are robust and validated against available data from January 2016 to June 2017. Significant correlations were found between (1) online queries and National Health Interview Survey lifetime asthma (r=-.82, P=.001) and current asthma (r=-.77, P=.004) rates from 2004 to 2015 and (2) between online queries and Behavioral Risk Factor Surveillance System lifetime (r=-.78, P=.003) and current asthma (r=-.79, P=.002) rates from 2004 to 2014. The correlations are negative, but lag analysis to identify the period of response cannot be employed until short-interval data on asthma prevalence are made available. Online behavior toward asthma can be accurately predicted, and significant correlations between online queries and reported cases exist. This method of forecasting Google queries can be used by health care officials to nowcast asthma prevalence by city, state, or nationally, subject to future availability of daily, weekly, or monthly data on reported cases. This method could therefore be used for improved monitoring and assessment of the needs surrounding the current population of patients with asthma. ©Amaryllis Mavragani, Alexia Sampri, Karla Sypsa, Konstantinos P Tsagarakis. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 12.03.2018.
BioFed: federated query processing over life sciences linked open data.

PubMed

Hasnain, Ali; Mehmood, Qaiser; Sana E Zainab, Syeda; Saleem, Muhammad; Warren, Claude; Zehra, Durre; Decker, Stefan; Rebholz-Schuhmann, Dietrich

2017-03-15

Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but this still requires solutions to query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality have to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state of the art solution FedX and forms an important benchmark for the life science domain. The efficient cataloguing approach of the federated query processing system 'BioFed', the triple pattern wise source selection and the semantic source normalisation forms the core to our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Last but not least, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., Dugbank, Sider). BioFed is a solution for a single-point-of-access for a large number of SPARQL endpoints providing life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. BioFed fully supports SPARQL 1.1 and gives access to the endpoint's availability based on the EndpointData graph. Our evaluation of BioFed against FedX is based on 20 heterogeneous federated SPARQL queries and shows competitive execution performance in comparison to FedX, which can be attributed to the provision of provenance information for the source selection. Developing and testing federated query engines for life sciences data is still a challenging task. According to our findings, it is advantageous to optimise the source selection. The cataloguing of SPARQL endpoints, including type and property indexing, leads to efficient querying of data resources over the Web of Data. This could even be further improved through the use of ontologies, e.g., for abstract normalisation of query terms.
An adaptable architecture for patient cohort identification from diverse data sources.

PubMed

Bache, Richard; Miles, Simon; Taweel, Adel

2013-12-01

We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. The architecture has the key feature that queries defined according to the query model are both pre and post-processed and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a data-based management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity.
Biological and Chemical Aspects of Natural Biflavonoids from Plants: A Brief Review.

PubMed

Gontijo, Vanessa Silva; Dos Santos, Marcelo Henrique; Viegas, Claudio

2017-01-01

Biflavonoids belong to a subclass of the plant flavonoids family and are limited to several species in the plant kingdom. In the literature, biflavonoids are extensively reported for their pharmacological properties including anti-inflammatory, antioxidant, inhibitory activity against phospholipase A2 (PLA2) and antiprotozoal activity. These activities have been discovered from the small number of biflavonoid structures that have been investigated, although the natural biflavonoids library is likely to be large. In addition, many medicinal properties and traditional use of plants are attributed to the presence of bioflavonoids among their secondary metabolites. Structurally, biflavonoids are polyphenol compounds comprising of two identical or non-identical flavonflavonoid units joined in a symmetrical or unsymmetrical manner through an alkyl or an alkoxy-based linker of varying length. Due to their chemical and biological importance, several bioprospective phytochemical studies and chemical approaches using coupling and molecular rearrangement strategies have been developed to identify and synthesize new bioactive biflavonoids. In this brief review, we present some basic structural aspects for classification and nomenclature of bioflavonoids and a compilation of the literature data published in the last 7 years, concerning the discovery of new natural biflavonoids of plant origin and their pharmacological and biological properties. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
DNA barcoding reveal patterns of species diversity among northwestern Pacific molluscs

PubMed Central

Sun, Shao’e; Li, Qi; Kong, Lingfeng; Yu, Hong; Zheng, Xiaodong; Yu, Ruihai; Dai, Lina; Sun, Yan; Chen, Jun; Liu, Jun; Ni, Lehai; Feng, Yanwei; Yu, Zhenzhen; Zou, Shanmei; Lin, Jiping

2016-01-01

This study represents the first comprehensive molecular assessment of northwestern Pacific molluscs. In total, 2801 DNA barcodes belonging to 569 species from China, Japan and Korea were analyzed. An overlap between intra- and interspecific genetic distances was present in 71 species. We tested the efficacy of this library by simulating a sequence-based specimen identification scenario using Best Match (BM), Best Close Match (BCM) and All Species Barcode (ASB) criteria with three threshold values. BM approach returned 89.15% true identifications (95.27% when excluding singletons). The highest success rate of congruent identifications was obtained with BCM at 0.053 threshold. The analysis of our barcode library together with public data resulted in 582 Barcode Index Numbers (BINs), 72.2% of which was found to be concordantly with morphology-based identifications. The discrepancies were divided in two groups: sequences from different species clustered in a single BIN and conspecific sequences divided in one more BINs. In Neighbour-Joining phenogram, 2,320 (83.0%) queries fromed 355 (62.4%) species-specific barcode clusters allowing their successful identification. 33 species showed paraphyletic and haplotype sharing. 62 cases are represented by deeply diverged lineages. This study suggest an increased species diversity in this region, highlighting taxonomic revision and conservation strategy for the cryptic complexes. PMID:27640675
Molecular characterization of the Indian poultry nodular tapeworm, Raillietina echinobothrida (Cestoda: Cyclophyllidea: Davaineidae) based on rDNA internal transcribed spacer 2 region.

PubMed

Ramnath; Jyrwa, D B; Dutta, A K; Das, B; Tandon, V

2014-03-01

The nodular tapeworm, Raillietina echinobothrida is a well studied avian gastrointestinal parasite of family Davaineidae (Cestoda: Cyclophyllidea). It is reported to be the largest in size and second most prevalent species infecting chicken in north-east India. In the present study, morphometrical methods coupled with the molecular analysis of the second internal transcribed spacer (ITS2) region of ribosomal DNA were employed for precise identification of the parasite. The annotated ITS2 region was found to be 446 bp long and further utilized to elucidate the phylogenetic relationships and its species-interrelationships at the molecular level. In phylogenetic analysis similar topology was observed among the trees obtained by distance-based neighbor-joining as well as character-based maximum parsimony tree building methods. The query sequence R. echinobothrida is well aligned and placed within the Davaineidae group, with all Raillietina species well separated from the other cyclophyllidean (taeniid and hymenolepid) cestodes, while Diphyllobothrium latum (Pseudophyllidea: Diphyllobothriidae) was rooted as an out-group. Sequence similarities indeed confirmed our hypothesis that Raillietina spp. are neighboring the position with other studied species of order Cyclophyllidea against the out-group order Pseudophyllidea. The present study strengthens the potential of ITS2 as a reliable marker for phylogenetic reconstructions.
Joining of graphene flakes by low energy N ion beam irradiation

NASA Astrophysics Data System (ADS)

Wu, Xin; Zhao, Haiyan; Pei, Jiayun; Yan, Dong

2017-03-01

An approach utilizing low energy N ion beam irradiation is applied in joining two monolayer graphene flakes. Raman spectrometry and atomic force microscopy show the joining signal under 40 eV and 1 × 1014 cm-2 N ion irradiation. Molecular dynamics simulations demonstrate that the joining phenomenon is attributed to the punch-down effect and the subsequent chemical bond generation between the two sheets. The generated chemical bonds are made up of inserted ions (embedded joining) and knocked-out carbon atoms (saturation joining). The electronic transport properties of the joint are also calculated for its applications.
Precision Joining Center

NASA Astrophysics Data System (ADS)

Powell, J. W.; Westphal, D. A.

1991-08-01

A workshop to obtain input from industry on the establishment of the Precision Joining Center (PJC) was held on July 10-12, 1991. The PJC is a center for training Joining Technologists in advanced joining techniques and concepts in order to promote the competitiveness of U.S. industry. The center will be established as part of the DOE Defense Programs Technology Commercialization Initiative, and operated by EG&G Rocky Flats in cooperation with the American Welding Society and the Colorado School of Mines Center for Welding and Joining Research. The overall objectives of the workshop were to validate the need for a Joining Technologists to fill the gap between the welding operator and the welding engineer, and to assure that the PJC will train individuals to satisfy that need. The consensus of the workshop participants was that the Joining Technologist is a necessary position in industry, and is currently used, with some variation, by many companies. It was agreed that the PJC core curriculum, as presented, would produce a Joining Technologist of value to industries that use precision joining techniques. The advantage of the PJC would be to train the Joining Technologist much more quickly and more completely. The proposed emphasis of the PJC curriculum on equipment intensive and hands-on training was judged to be essential.
Spatial problem-solving strategies of middle school students: Wayfinding with geographic information systems

NASA Astrophysics Data System (ADS)

Wigglesworth, John C.

2000-06-01

Geographic Information Systems (GIS) is a powerful computer software package that emphasizes the use of maps and the management of spatially referenced environmental data archived in a systems data base. Professional applications of GIS have been in place since the 1980's, but only recently has GIS gained significant attention in the K--12 classroom. Students using GIS are able to manipulate and query data in order to solve all manners of spatial problems. Very few studies have examined how this technological innovation can support classroom learning. In particular, there has been little research on how experience in using the software correlates with a child's spatial cognition and his/her ability to understand spatial relationships. This study investigates the strategies used by middle school students to solve a wayfinding (route-finding) problem using the ArcView GIS software. The research design combined an individual background questionnaire, results from the Group Assessment of Logical Thinking (GALT) test, and analysis of reflective think-aloud sessions to define the characteristics of the strategies students' used to solve this particular class of spatial problem. Three uniquely different spatial problem solving strategies were identified. Visual/Concrete Wayfinders used a highly visual strategy; Logical/Abstract Wayfinders used GIS software tools to apply a more analytical and systematic approach; Transitional Wayfinders used an approach that showed evidence of one that was shifting from a visual strategy to one that was more analytical. The triangulation of data sources indicates that this progression of wayfinding strategy can be correlated both to Piagetian stages of logical thought and to experience with the use of maps. These findings suggest that GIS teachers must be aware that their students' performance will lie on a continuum that is based on cognitive development, spatial ability, and prior experience with maps. To be most effective, GIS teaching strategies and curriculum development should also represent a progression that correlates to the learners' current skills and experience.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.