Sample records for spatial query processing

  1. Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.

    PubMed

    Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

    2013-08-01

    Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive.

  2. Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce

    PubMed Central

    Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

    2013-01-01

    Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS – a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive. PMID:24187650

  3. a Novel Approach of Indexing and Retrieving Spatial Polygons for Efficient Spatial Region Queries

    NASA Astrophysics Data System (ADS)

    Zhao, J. H.; Wang, X. Z.; Wang, F. Y.; Shen, Z. H.; Zhou, Y. C.; Wang, Y. L.

    2017-10-01

    Spatial region queries are more and more widely used in web-based applications. Mechanisms to provide efficient query processing over geospatial data are essential. However, due to the massive geospatial data volume, heavy geometric computation, and high access concurrency, it is difficult to get response in real time. Spatial indexes are usually used in this situation. In this paper, based on k-d tree, we introduce a distributed KD-Tree (DKD-Tree) suitbable for polygon data, and a two-step query algorithm. The spatial index construction is recursive and iterative, and the query is an in memory process. Both the index and query methods can be processed in parallel, and are implemented based on HDFS, Spark and Redis. Experiments on a large volume of Remote Sensing images metadata have been carried out, and the advantages of our method are investigated by comparing with spatial region queries executed on PostgreSQL and PostGIS. Results show that our approach not only greatly improves the efficiency of spatial region query, but also has good scalability, Moreover, the two-step spatial range query algorithm can also save cluster resources to support a large number of concurrent queries. Therefore, this method is very useful when building large geographic information systems.

  4. Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.

    PubMed

    Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

    2013-11-01

    The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS - a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.

  5. A Big Spatial Data Processing Framework Applying to National Geographic Conditions Monitoring

    NASA Astrophysics Data System (ADS)

    Xiao, F.

    2018-04-01

    In this paper, a novel framework for spatial data processing is proposed, which apply to National Geographic Conditions Monitoring project of China. It includes 4 layers: spatial data storage, spatial RDDs, spatial operations, and spatial query language. The spatial data storage layer uses HDFS to store large size of spatial vector/raster data in the distributed cluster. The spatial RDDs are the abstract logical dataset of spatial data types, and can be transferred to the spark cluster to conduct spark transformations and actions. The spatial operations layer is a series of processing on spatial RDDs, such as range query, k nearest neighbor and spatial join. The spatial query language is a user-friendly interface which provide people not familiar with Spark with a comfortable way to operation the spatial operation. Compared with other spatial frameworks, it is highlighted that comprehensive technologies are referred for big spatial data processing. Extensive experiments on real datasets show that the framework achieves better performance than traditional process methods.

  6. Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce

    PubMed Central

    Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

    2016-01-01

    The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS – a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing. PMID:27617325

  7. Cognitive issues in searching images with visual queries

    NASA Astrophysics Data System (ADS)

    Yu, ByungGu; Evens, Martha W.

    1999-01-01

    In this paper, we propose our image indexing technique and visual query processing technique. Our mental images are different from the actual retinal images and many things, such as personal interests, personal experiences, perceptual context, the characteristics of spatial objects, and so on, affect our spatial perception. These private differences are propagated into our mental images and so our visual queries become different from the real images that we want to find. This is a hard problem and few people have tried to work on it. In this paper, we survey the human mental imagery system, the human spatial perception, and discuss several kinds of visual queries. Also, we propose our own approach to visual query interpretation and processing.

  8. Spatial information semantic query based on SPARQL

    NASA Astrophysics Data System (ADS)

    Xiao, Zhifeng; Huang, Lei; Zhai, Xiaofang

    2009-10-01

    How can the efficiency of spatial information inquiries be enhanced in today's fast-growing information age? We are rich in geospatial data but poor in up-to-date geospatial information and knowledge that are ready to be accessed by public users. This paper adopts an approach for querying spatial semantic by building an Web Ontology language(OWL) format ontology and introducing SPARQL Protocol and RDF Query Language(SPARQL) to search spatial semantic relations. It is important to establish spatial semantics that support for effective spatial reasoning for performing semantic query. Compared to earlier keyword-based and information retrieval techniques that rely on syntax, we use semantic approaches in our spatial queries system. Semantic approaches need to be developed by ontology, so we use OWL to describe spatial information extracted by the large-scale map of Wuhan. Spatial information expressed by ontology with formal semantics is available to machines for processing and to people for understanding. The approach is illustrated by introducing a case study for using SPARQL to query geo-spatial ontology instances of Wuhan. The paper shows that making use of SPARQL to search OWL ontology instances can ensure the result's accuracy and applicability. The result also indicates constructing a geo-spatial semantic query system has positive efforts on forming spatial query and retrieval.

  9. Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data.

    PubMed

    Aji, Ablimit; Wang, Fusheng; Saltz, Joel H

    2012-11-06

    Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amount of data with fast response, which is faced with two major challenges: the "big data" challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce.

  10. Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data

    PubMed Central

    Aji, Ablimit; Wang, Fusheng; Saltz, Joel H.

    2013-01-01

    Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amount of data with fast response, which is faced with two major challenges: the “big data” challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce. PMID:24501719

  11. An intelligent user interface for browsing satellite data catalogs

    NASA Technical Reports Server (NTRS)

    Cromp, Robert F.; Crook, Sharon

    1989-01-01

    A large scale domain-independent spatial data management expert system that serves as a front-end to databases containing spatial data is described. This system is unique for two reasons. First, it uses spatial search techniques to generate a list of all the primary keys that fall within a user's spatial constraints prior to invoking the database management system, thus substantially decreasing the amount of time required to answer a user's query. Second, a domain-independent query expert system uses a domain-specific rule base to preprocess the user's English query, effectively mapping a broad class of queries into a smaller subset that can be handled by a commercial natural language processing system. The methods used by the spatial search module and the query expert system are explained, and the system architecture for the spatial data management expert system is described. The system is applied to data from the International Ultraviolet Explorer (IUE) satellite, and results are given.

  12. Spatial aggregation query in dynamic geosensor networks

    NASA Astrophysics Data System (ADS)

    Yi, Baolin; Feng, Dayang; Xiao, Shisong; Zhao, Erdun

    2007-11-01

    Wireless sensor networks have been widely used for civilian and military applications, such as environmental monitoring and vehicle tracking. In many of these applications, the researches mainly aim at building sensor network based systems to leverage the sensed data to applications. However, the existing works seldom exploited spatial aggregation query considering the dynamic characteristics of sensor networks. In this paper, we investigate how to process spatial aggregation query over dynamic geosensor networks where both the sink node and sensor nodes are mobile and propose several novel improvements on enabling techniques. The mobility of sensors makes the existing routing protocol based on information of fixed framework or the neighborhood infeasible. We present an improved location-based stateless implicit geographic forwarding (IGF) protocol for routing a query toward the area specified by query window, a diameter-based window aggregation query (DWAQ) algorithm for query propagation and data aggregation in the query window, finally considering the location changing of the sink node, we present two schemes to forward the result to the sink node. Simulation results show that the proposed algorithms can improve query latency and query accuracy.

  13. a Spatiotemporal Aggregation Query Method Using Multi-Thread Parallel Technique Based on Regional Division

    NASA Astrophysics Data System (ADS)

    Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.

    2015-07-01

    Existing spatiotemporal database supports spatiotemporal aggregation query over massive moving objects datasets. Due to the large amounts of data and single-thread processing method, the query speed cannot meet the application requirements. On the other hand, the query efficiency is more sensitive to spatial variation then temporal variation. In this paper, we proposed a spatiotemporal aggregation query method using multi-thread parallel technique based on regional divison and implemented it on the server. Concretely, we divided the spatiotemporal domain into several spatiotemporal cubes, computed spatiotemporal aggregation on all cubes using the technique of multi-thread parallel processing, and then integrated the query results. By testing and analyzing on the real datasets, this method has improved the query speed significantly.

  14. Spatial and symbolic queries for 3D image data

    NASA Astrophysics Data System (ADS)

    Benson, Daniel C.; Zick, Gregory L.

    1992-04-01

    We present a query system for an object-oriented biomedical imaging database containing 3-D anatomical structures and their corresponding 2-D images. The graphical interface facilitates the formation of spatial queries, nonspatial or symbolic queries, and combined spatial/symbolic queries. A query editor is used for the creation and manipulation of 3-D query objects as volumes, surfaces, lines, and points. Symbolic predicates are formulated through a combination of text fields and multiple choice selections. Query results, which may include images, image contents, composite objects, graphics, and alphanumeric data, are displayed in multiple views. Objects returned by the query may be selected directly within the views for further inspection or modification, or for use as query objects in subsequent queries. Our image database query system provides visual feedback and manipulation of spatial query objects, multiple views of volume data, and the ability to combine spatial and symbolic queries. The system allows for incremental enhancement of existing objects and the addition of new objects and spatial relationships. The query system is designed for databases containing symbolic and spatial data. This paper discuses its application to data acquired in biomedical 3- D image reconstruction, but it is applicable to other areas such as CAD/CAM, geographical information systems, and computer vision.

  15. SPARQL Query Re-writing Using Partonomy Based Transformation Rules

    NASA Astrophysics Data System (ADS)

    Jain, Prateek; Yeh, Peter Z.; Verma, Kunal; Henson, Cory A.; Sheth, Amit P.

    Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. For querying ontology's containing spatial information, the precise relationships between spatial entities has to be specified in the basic graph pattern of SPARQL query which can result in long and complex queries. We present a novel approach to help users intuitively write SPARQL queries to query spatial data, rather than relying on knowledge of the ontology structure. Our framework re-writes queries, using transformation rules to exploit part-whole relations between geographical entities to address the mismatches between query constraints and knowledge base. Our experiments were performed on completely third party datasets and queries. Evaluations were performed on Geonames dataset using questions from National Geographic Bee serialized into SPARQL and British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in retrieval of results and ease in writing queries.

  16. Remote sensing and GIS integration: Towards intelligent imagery within a spatial data infrastructure

    NASA Astrophysics Data System (ADS)

    Abdelrahim, Mohamed Mahmoud Hosny

    2001-11-01

    In this research, an "Intelligent Imagery System Prototype" (IISP) was developed. IISP is an integration tool that facilitates the environment for active, direct, and on-the-fly usage of high resolution imagery, internally linked to hidden GIS vector layers, to query the real world phenomena and, consequently, to perform exploratory types of spatial analysis based on a clear/undisturbed image scene. The IISP was designed and implemented using the software components approach to verify the hypothesis that a fully rectified, partially rectified, or even unrectified digital image can be internally linked to a variety of different hidden vector databases/layers covering the end user area of interest, and consequently may be reliably used directly as a base for "on-the-fly" querying of real-world phenomena and for performing exploratory types of spatial analysis. Within IISP, differentially rectified, partially rectified (namely, IKONOS GEOCARTERRA(TM)), and unrectified imagery (namely, scanned aerial photographs and captured video frames) were investigated. The system was designed to handle four types of spatial functions, namely, pointing query, polygon/line-based image query, database query, and buffering. The system was developed using ESRI MapObjects 2.0a as the core spatial component within Visual Basic 6.0. When used to perform the pre-defined spatial queries using different combinations of image and vector data, the IISP provided the same results as those obtained by querying pre-processed vector layers even when the image used was not orthorectified and the vector layers had different parameters. In addition, the real-time pixel location orthorectification technique developed and presented within the IKONOS GEOCARTERRA(TM) case provided a horizontal accuracy (RMSE) of +/- 2.75 metres. This accuracy is very close to the accuracy level obtained when purchasing the orthorectified IKONOS PRECISION products (RMSE of +/- 1.9 metre). The latter cost approximately four times as much as the IKONOS GEOCARTERRA(TM) products. The developed IISP is a step closer towards the direct and active involvement of high-resolution remote sensing imagery in querying the real world and performing exploratory types of spatial analysis. (Abstract shortened by UMI.)

  17. Research on presentation and query service of geo-spatial data based on ontology

    NASA Astrophysics Data System (ADS)

    Li, Hong-wei; Li, Qin-chao; Cai, Chang

    2008-10-01

    The paper analyzed the deficiency on presentation and query of geo-spatial data existed in current GIS, discussed the advantages that ontology possessed in formalization of geo-spatial data and the presentation of semantic granularity, taken land-use classification system as an example to construct domain ontology, and described it by OWL; realized the grade level and category presentation of land-use data benefited from the thoughts of vertical and horizontal navigation; and then discussed query mode of geo-spatial data based on ontology, including data query based on types and grade levels, instances and spatial relation, and synthetic query based on types and instances; these methods enriched query mode of current GIS, and is a useful attempt; point out that the key point of the presentation and query of spatial data based on ontology is to construct domain ontology that can correctly reflect geo-concept and its spatial relation and realize its fine formalization description.

  18. Army technology development. IBIS query. Software to support the Image Based Information System (IBIS) expansion for mapping, charting and geodesy

    NASA Technical Reports Server (NTRS)

    Friedman, S. Z.; Walker, R. E.; Aitken, R. B.

    1986-01-01

    The Image Based Information System (IBIS) has been under development at the Jet Propulsion Laboratory (JPL) since 1975. It is a collection of more than 90 programs that enable processing of image, graphical, tabular data for spatial analysis. IBIS can be utilized to create comprehensive geographic data bases. From these data, an analyst can study various attributes describing characteristics of a given study area. Even complex combinations of disparate data types can be synthesized to obtain a new perspective on spatial phenomena. In 1984, new query software was developed enabling direct Boolean queries of IBIS data bases through the submission of easily understood expressions. An improved syntax methodology, a data dictionary, and display software simplified the analysts' tasks associated with building, executing, and subsequently displaying the results of a query. The primary purpose of this report is to describe the features and capabilities of the new query software. A secondary purpose of this report is to compare this new query software to the query software developed previously (Friedman, 1982). With respect to this topic, the relative merits and drawbacks of both approaches are covered.

  19. Object-Oriented Query Language For Events Detection From Images Sequences

    NASA Astrophysics Data System (ADS)

    Ganea, Ion Eugen

    2015-09-01

    In this paper is presented a method to represent the events extracted from images sequences and the query language used for events detection. Using an object oriented model the spatial and temporal relationships between salient objects and also between events are stored and queried. This works aims to unify the storing and querying phases for video events processing. The object oriented language syntax used for events processing allow the instantiation of the indexes classes in order to improve the accuracy of the query results. The experiments were performed on images sequences provided from sport domain and it shows the reliability and the robustness of the proposed language. To extend the language will be added a specific syntax for constructing the templates for abnormal events and for detection of the incidents as the final goal of the research.

  20. Bin-Hash Indexing: A Parallel Method for Fast Query Processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bethel, Edward W; Gosink, Luke J.; Wu, Kesheng

    2008-06-27

    This paper presents a new parallel indexing data structure for answering queries. The index, called Bin-Hash, offers extremely high levels of concurrency, and is therefore well-suited for the emerging commodity of parallel processors, such as multi-cores, cell processors, and general purpose graphics processing units (GPU). The Bin-Hash approach first bins the base data, and then partitions and separately stores the values in each bin as a perfect spatial hash table. To answer a query, we first determine whether or not a record satisfies the query conditions based on the bin boundaries. For the bins with records that can not bemore » resolved, we examine the spatial hash tables. The procedures for examining the bin numbers and the spatial hash tables offer the maximum possible level of concurrency; all records are able to be evaluated by our procedure independently in parallel. Additionally, our Bin-Hash procedures access much smaller amounts of data than similar parallel methods, such as the projection index. This smaller data footprint is critical for certain parallel processors, like GPUs, where memory resources are limited. To demonstrate the effectiveness of Bin-Hash, we implement it on a GPU using the data-parallel programming language CUDA. The concurrency offered by the Bin-Hash index allows us to fully utilize the GPU's massive parallelism in our work; over 12,000 records can be simultaneously evaluated at any one time. We show that our new query processing method is an order of magnitude faster than current state-of-the-art CPU-based indexing technologies. Additionally, we compare our performance to existing GPU-based projection index strategies.« less

  1. Spatial Knowledge Infrastructures - Creating Value for Policy Makers and Benefits the Community

    NASA Astrophysics Data System (ADS)

    Arnold, L. M.

    2016-12-01

    The spatial data infrastructure is arguably one of the most significant advancements in the spatial sector. It's been a game changer for governments, providing for the coordination and sharing of spatial data across organisations and the provision of accessible information to the broader community of users. Today however, end-users such as policy-makers require far more from these spatial data infrastructures. They want more than just data; they want the knowledge that can be extracted from data and they don't want to have to download, manipulate and process data in order to get the knowledge they seek. It's time for the spatial sector to reduce its focus on data in spatial data infrastructures and take a more proactive step in emphasising and delivering the knowledge value. Nowadays, decision-makers want to be able to query at will the data to meet their immediate need for knowledge. This is a new value proposal for the decision-making consumer and will require a shift in thinking. This paper presents a model for a Spatial Knowledge Infrastructure and underpinning methods that will realise a new real-time approach to delivering knowledge. The methods embrace the new capabilities afforded through the sematic web, domain and process ontologies and natural query language processing. Semantic Web technologies today have the potential to transform the spatial industry into more than just a distribution channel for data. The Semantic Web RDF (Resource Description Framework) enables meaning to be drawn from data automatically. While pushing data out to end-users will remain a central role for data producers, the power of the semantic web is that end-users have the ability to marshal a broad range of spatial resources via a query to extract knowledge from available data. This can be done without actually having to configure systems specifically for the end-user. All data producers need do is make data accessible in RDF and the spatial analytics does the rest.

  2. Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments.

    PubMed

    Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

    2015-09-18

    A content-matched (CM) rangemonitoring query overmoving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CMrange monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods.

  3. Research on Extension of Sparql Ontology Query Language Considering the Computation of Indoor Spatial Relations

    NASA Astrophysics Data System (ADS)

    Li, C.; Zhu, X.; Guo, W.; Liu, Y.; Huang, H.

    2015-05-01

    A method suitable for indoor complex semantic query considering the computation of indoor spatial relations is provided According to the characteristics of indoor space. This paper designs ontology model describing the space related information of humans, events and Indoor space objects (e.g. Storey and Room) as well as their relations to meet the indoor semantic query. The ontology concepts are used in IndoorSPARQL query language which extends SPARQL syntax for representing and querying indoor space. And four types specific primitives for indoor query, "Adjacent", "Opposite", "Vertical" and "Contain", are defined as query functions in IndoorSPARQL used to support quantitative spatial computations. Also a method is proposed to analysis the query language. Finally this paper adopts this method to realize indoor semantic query on the study area through constructing the ontology model for the study building. The experimental results show that the method proposed in this paper can effectively support complex indoor space semantic query.

  4. Approximate Algorithms for Computing Spatial Distance Histograms with Accuracy Guarantees

    PubMed Central

    Grupcev, Vladimir; Yuan, Yongke; Tu, Yi-Cheng; Huang, Jin; Chen, Shaoping; Pandit, Sagar; Weng, Michael

    2014-01-01

    Particle simulation has become an important research tool in many scientific and engineering fields. Data generated by such simulations impose great challenges to database storage and query processing. One of the queries against particle simulation data, the spatial distance histogram (SDH) query, is the building block of many high-level analytics, and requires quadratic time to compute using a straightforward algorithm. Previous work has developed efficient algorithms that compute exact SDHs. While beating the naive solution, such algorithms are still not practical in processing SDH queries against large-scale simulation data. In this paper, we take a different path to tackle this problem by focusing on approximate algorithms with provable error bounds. We first present a solution derived from the aforementioned exact SDH algorithm, and this solution has running time that is unrelated to the system size N. We also develop a mathematical model to analyze the mechanism that leads to errors in the basic approximate algorithm. Our model provides insights on how the algorithm can be improved to achieve higher accuracy and efficiency. Such insights give rise to a new approximate algorithm with improved time/accuracy tradeoff. Experimental results confirm our analysis. PMID:24693210

  5. Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments

    PubMed Central

    Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

    2015-01-01

    A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods. PMID:26393613

  6. Rasdaman for Big Spatial Raster Data

    NASA Astrophysics Data System (ADS)

    Hu, F.; Huang, Q.; Scheele, C. J.; Yang, C. P.; Yu, M.; Liu, K.

    2015-12-01

    Spatial raster data have grown exponentially over the past decade. Recent advancements on data acquisition technology, such as remote sensing, have allowed us to collect massive observation data of various spatial resolution and domain coverage. The volume, velocity, and variety of such spatial data, along with the computational intensive nature of spatial queries, pose grand challenge to the storage technologies for effective big data management. While high performance computing platforms (e.g., cloud computing) can be used to solve the computing-intensive issues in big data analysis, data has to be managed in a way that is suitable for distributed parallel processing. Recently, rasdaman (raster data manager) has emerged as a scalable and cost-effective database solution to store and retrieve massive multi-dimensional arrays, such as sensor, image, and statistics data. Within this paper, the pros and cons of using rasdaman to manage and query spatial raster data will be examined and compared with other common approaches, including file-based systems, relational databases (e.g., PostgreSQL/PostGIS), and NoSQL databases (e.g., MongoDB and Hive). Earth Observing System (EOS) data collected from NASA's Atmospheric Scientific Data Center (ASDC) will be used and stored in these selected database systems, and a set of spatial and non-spatial queries will be designed to benchmark their performance on retrieving large-scale, multi-dimensional arrays of EOS data. Lessons learnt from using rasdaman will be discussed as well.

  7. Indexing and retrieving point and region objects

    NASA Astrophysics Data System (ADS)

    Ibrahim, Azzam T.; Fotouhi, Farshad A.

    1996-03-01

    R-tree and its variants are examples of spatial data structures for paged-secondary memory. To process a query, these structures require multiple path traversals. In this paper, we present a new image access method, SB+-tree which requires a single path traversal to process a query. Also, SB+-tree will allow commercial databases an access method for spatial objects without a major change, since most commercial databases already support B+-tree as an access method for text data. The SB+-tree can be used for zero and non-zero size data objects. Non-zero size objects are approximated by their minimum bounding rectangles (MBRs). The number of SB+-trees generated is dependent upon the number of dimensions of the approximation of the object. The structure supports efficient spatial operations such as regions-overlap, distance and direction. In this paper, we experimentally and analytically demonstrate the superiority of SB+-tree over R-tree.

  8. Content-based retrieval of historical Ottoman documents stored as textual images.

    PubMed

    Saykol, Ediz; Sinop, Ali Kemal; Güdükbay, Ugur; Ulusoy, Ozgür; Cetin, A Enis

    2004-03-01

    There is an accelerating demand to access the visual content of documents stored in historical and cultural archives. Availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. In this paper, a framework for content-based retrieval of historical documents in the Ottoman Empire archives is presented. The documents are stored as textual images, which are compressed by constructing a library of symbols occurring in a document, and the symbols in the original image are then replaced with pointers into the codebook to obtain a compressed representation of the image. The features in wavelet and spatial domain based on angular and distance span of shapes are used to extract the symbols. In order to make content-based retrieval in historical archives, a query is specified as a rectangular region in an input image and the same symbol-extraction process is applied to the query region. The queries are processed on the codebook of documents and the query images are identified in the resulting documents using the pointers in textual images. The querying process does not require decompression of images. The new content-based retrieval framework is also applicable to many other document archives using different scripts.

  9. Modeling and query the uncertainty of network constrained moving objects based on RFID data

    NASA Astrophysics Data System (ADS)

    Han, Liang; Xie, Kunqing; Ma, Xiujun; Song, Guojie

    2007-06-01

    The management of network constrained moving objects is more and more practical, especially in intelligent transportation system. In the past, the location information of moving objects on network is collected by GPS, which cost high and has the problem of frequent update and privacy. The RFID (Radio Frequency IDentification) devices are used more and more widely to collect the location information. They are cheaper and have less update. And they interfere in the privacy less. They detect the id of the object and the time when moving object passed by the node of the network. They don't detect the objects' exact movement in side the edge, which lead to a problem of uncertainty. How to modeling and query the uncertainty of the network constrained moving objects based on RFID data becomes a research issue. In this paper, a model is proposed to describe the uncertainty of network constrained moving objects. A two level index is presented to provide efficient access to the network and the data of movement. The processing of imprecise time-slice query and spatio-temporal range query are studied in this paper. The processing includes four steps: spatial filter, spatial refinement, temporal filter and probability calculation. Finally, some experiments are done based on the simulated data. In the experiments the performance of the index is studied. The precision and recall of the result set are defined. And how the query arguments affect the precision and recall of the result set is also discussed.

  10. An Efficient Method for the Retrieval of Objects by Topological Relations in Spatial Database Systems.

    ERIC Educational Resources Information Center

    Lin, P. L.; Tan, W. H.

    2003-01-01

    Presents a new method to improve the performance of query processing in a spatial database. Experiments demonstrated that performance of database systems can be improved because both the number of objects accessed and number of objects requiring detailed inspection are much less than those in the previous approach. (AEF)

  11. Multidimensional indexing structure for use with linear optimization queries

    NASA Technical Reports Server (NTRS)

    Bergman, Lawrence David (Inventor); Castelli, Vittorio (Inventor); Chang, Yuan-Chi (Inventor); Li, Chung-Sheng (Inventor); Smith, John Richard (Inventor)

    2002-01-01

    Linear optimization queries, which usually arise in various decision support and resource planning applications, are queries that retrieve top N data records (where N is an integer greater than zero) which satisfy a specific optimization criterion. The optimization criterion is to either maximize or minimize a linear equation. The coefficients of the linear equation are given at query time. Methods and apparatus are disclosed for constructing, maintaining and utilizing a multidimensional indexing structure of database records to improve the execution speed of linear optimization queries. Database records with numerical attributes are organized into a number of layers and each layer represents a geometric structure called convex hull. Such linear optimization queries are processed by searching from the outer-most layer of this multi-layer indexing structure inwards. At least one record per layer will satisfy the query criterion and the number of layers needed to be searched depends on the spatial distribution of records, the query-issued linear coefficients, and N, the number of records to be returned. When N is small compared to the total size of the database, answering the query typically requires searching only a small fraction of all relevant records, resulting in a tremendous speedup as compared to linearly scanning the entire dataset.

  12. Astronomical Data Processing Using SciQL, an SQL Based Query Language for Array Data

    NASA Astrophysics Data System (ADS)

    Zhang, Y.; Scheers, B.; Kersten, M.; Ivanova, M.; Nes, N.

    2012-09-01

    SciQL (pronounced as ‘cycle’) is a novel SQL-based array query language for scientific applications with both tables and arrays as first class citizens. SciQL lowers the entrance fee of adopting relational DBMS (RDBMS) in scientific domains, because it includes functionality often only found in mathematics software packages. In this paper, we demonstrate the usefulness of SciQL for astronomical data processing using examples from the Transient Key Project of the LOFAR radio telescope. In particular, how the LOFAR light-curve database of all detected sources can be constructed, by correlating sources across the spatial, frequency, time and polarisation domains.

  13. Location Prediction Based on Transition Probability Matrices Constructing from Sequential Rules for Spatial-Temporal K-Anonymity Dataset

    PubMed Central

    Liu, Zhao; Zhu, Yunhong; Wu, Chenxue

    2016-01-01

    Spatial-temporal k-anonymity has become a mainstream approach among techniques for protection of users’ privacy in location-based services (LBS) applications, and has been applied to several variants such as LBS snapshot queries and continuous queries. Analyzing large-scale spatial-temporal anonymity sets may benefit several LBS applications. In this paper, we propose two location prediction methods based on transition probability matrices constructing from sequential rules for spatial-temporal k-anonymity dataset. First, we define single-step sequential rules mined from sequential spatial-temporal k-anonymity datasets generated from continuous LBS queries for multiple users. We then construct transition probability matrices from mined single-step sequential rules, and normalize the transition probabilities in the transition matrices. Next, we regard a mobility model for an LBS requester as a stationary stochastic process and compute the n-step transition probability matrices by raising the normalized transition probability matrices to the power n. Furthermore, we propose two location prediction methods: rough prediction and accurate prediction. The former achieves the probabilities of arriving at target locations along simple paths those include only current locations, target locations and transition steps. By iteratively combining the probabilities for simple paths with n steps and the probabilities for detailed paths with n-1 steps, the latter method calculates transition probabilities for detailed paths with n steps from current locations to target locations. Finally, we conduct extensive experiments, and correctness and flexibility of our proposed algorithm have been verified. PMID:27508502

  14. Genomes as geography: using GIS technology to build interactive genome feature maps

    PubMed Central

    Dolan, Mary E; Holden, Constance C; Beard, M Kate; Bult, Carol J

    2006-01-01

    Background Many commonly used genome browsers display sequence annotations and related attributes as horizontal data tracks that can be toggled on and off according to user preferences. Most genome browsers use only simple keyword searches and limit the display of detailed annotations to one chromosomal region of the genome at a time. We have employed concepts, methodologies, and tools that were developed for the display of geographic data to develop a Genome Spatial Information System (GenoSIS) for displaying genomes spatially, and interacting with genome annotations and related attribute data. In contrast to the paradigm of horizontally stacked data tracks used by most genome browsers, GenoSIS uses the concept of registered spatial layers composed of spatial objects for integrated display of diverse data. In addition to basic keyword searches, GenoSIS supports complex queries, including spatial queries, and dynamically generates genome maps. Our adaptation of the geographic information system (GIS) model in a genome context supports spatial representation of genome features at multiple scales with a versatile and expressive query capability beyond that supported by existing genome browsers. Results We implemented an interactive genome sequence feature map for the mouse genome in GenoSIS, an application that uses ArcGIS, a commercially available GIS software system. The genome features and their attributes are represented as spatial objects and data layers that can be toggled on and off according to user preferences or displayed selectively in response to user queries. GenoSIS supports the generation of custom genome maps in response to complex queries about genome features based on both their attributes and locations. Our example application of GenoSIS to the mouse genome demonstrates the powerful visualization and query capability of mature GIS technology applied in a novel domain. Conclusion Mapping tools developed specifically for geographic data can be exploited to display, explore and interact with genome data. The approach we describe here is organism independent and is equally useful for linear and circular chromosomes. One of the unique capabilities of GenoSIS compared to existing genome browsers is the capacity to generate genome feature maps dynamically in response to complex attribute and spatial queries. PMID:16984652

  15. Innovations in individual feature history management - The significance of feature-based temporal model

    USGS Publications Warehouse

    Choi, J.; Seong, J.C.; Kim, B.; Usery, E.L.

    2008-01-01

    A feature relies on three dimensions (space, theme, and time) for its representation. Even though spatiotemporal models have been proposed, they have principally focused on the spatial changes of a feature. In this paper, a feature-based temporal model is proposed to represent the changes of both space and theme independently. The proposed model modifies the ISO's temporal schema and adds new explicit temporal relationship structure that stores temporal topological relationship with the ISO's temporal primitives of a feature in order to keep track feature history. The explicit temporal relationship can enhance query performance on feature history by removing topological comparison during query process. Further, a prototype system has been developed to test a proposed feature-based temporal model by querying land parcel history in Athens, Georgia. The result of temporal query on individual feature history shows the efficiency of the explicit temporal relationship structure. ?? Springer Science+Business Media, LLC 2007.

  16. A Lightweight I/O Scheme to Facilitate Spatial and Temporal Queries of Scientific Data Analytics

    NASA Technical Reports Server (NTRS)

    Tian, Yuan; Liu, Zhuo; Klasky, Scott; Wang, Bin; Abbasi, Hasan; Zhou, Shujia; Podhorszki, Norbert; Clune, Tom; Logan, Jeremy; Yu, Weikuan

    2013-01-01

    In the era of petascale computing, more scientific applications are being deployed on leadership scale computing platforms to enhance the scientific productivity. Many I/O techniques have been designed to address the growing I/O bottleneck on large-scale systems by handling massive scientific data in a holistic manner. While such techniques have been leveraged in a wide range of applications, they have not been shown as adequate for many mission critical applications, particularly in data post-processing stage. One of the examples is that some scientific applications generate datasets composed of a vast amount of small data elements that are organized along many spatial and temporal dimensions but require sophisticated data analytics on one or more dimensions. Including such dimensional knowledge into data organization can be beneficial to the efficiency of data post-processing, which is often missing from exiting I/O techniques. In this study, we propose a novel I/O scheme named STAR (Spatial and Temporal AggRegation) to enable high performance data queries for scientific analytics. STAR is able to dive into the massive data, identify the spatial and temporal relationships among data variables, and accordingly organize them into an optimized multi-dimensional data structure before storing to the storage. This technique not only facilitates the common access patterns of data analytics, but also further reduces the application turnaround time. In particular, STAR is able to enable efficient data queries along the time dimension, a practice common in scientific analytics but not yet supported by existing I/O techniques. In our case study with a critical climate modeling application GEOS-5, the experimental results on Jaguar supercomputer demonstrate an improvement up to 73 times for the read performance compared to the original I/O method.

  17. Extended Maptree: a Representation of Fine-Grained Topology and Spatial Hierarchy of Bim

    NASA Astrophysics Data System (ADS)

    Wu, Y.; Shang, J.; Hu, X.; Zhou, Z.

    2017-09-01

    Spatial queries play significant roles in exchanging Building Information Modeling (BIM) data and integrating BIM with indoor spatial information. However, topological operators implemented for BIM spatial queries are limited to qualitative relations (e.g. touching, intersecting). To overcome this limitation, we propose an extended maptree model to represent the fine-grained topology and spatial hierarchy of indoor spaces. The model is based on a maptree which consists of combinatorial maps and an adjacency tree. Topological relations (e.g., adjacency, incidence, and covering) derived from BIM are represented explicitly and formally by extended maptrees, which can facilitate the spatial queries of BIM. To construct an extended maptree, we first use a solid model represented by vertical extrusion and boundary representation to generate the isolated 3-cells of combinatorial maps. Then, the spatial relationships defined in IFC are used to sew them together. Furthermore, the incremental edges of extended maptrees are labeled as removed 2-cells. Based on this, we can merge adjacent 3-cells according to the spatial hierarchy of IFC.

  18. A high-performance spatial database based approach for pathology imaging algorithm evaluation

    PubMed Central

    Wang, Fusheng; Kong, Jun; Gao, Jingjing; Cooper, Lee A.D.; Kurc, Tahsin; Zhou, Zhengwen; Adler, David; Vergara-Niedermayr, Cristobal; Katigbak, Bryan; Brat, Daniel J.; Saltz, Joel H.

    2013-01-01

    Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high performance computation capacity via a parallel data management infrastructure, parallel data loading and spatial indexing optimizations in this infrastructure. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared nothing parallel database architecture, which distributes data homogenously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput. Results: Our work proposes a high performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download. Conclusions: Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data. Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provide a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation. PMID:23599905

  19. A geo-spatial data management system for potentially active volcanoes—GEOWARN project

    NASA Astrophysics Data System (ADS)

    Gogu, Radu C.; Dietrich, Volker J.; Jenny, Bernhard; Schwandner, Florian M.; Hurni, Lorenz

    2006-02-01

    Integrated studies of active volcanic systems for the purpose of long-term monitoring and forecast and short-term eruption prediction require large numbers of data-sets from various disciplines. A modern database concept has been developed for managing and analyzing multi-disciplinary volcanological data-sets. The GEOWARN project (choosing the "Kos-Yali-Nisyros-Tilos volcanic field, Greece" and the "Campi Flegrei, Italy" as test sites) is oriented toward potentially active volcanoes situated in regions of high geodynamic unrest. This article describes the volcanological database of the spatial and temporal data acquired within the GEOWARN project. As a first step, a spatial database embedded in a Geographic Information System (GIS) environment was created. Digital data of different spatial resolution, and time-series data collected at different intervals or periods, were unified in a common, four-dimensional representation of space and time. The database scheme comprises various information layers containing geographic data (e.g. seafloor and land digital elevation model, satellite imagery, anthropogenic structures, land-use), geophysical data (e.g. from active and passive seismicity, gravity, tomography, SAR interferometry, thermal imagery, differential GPS), geological data (e.g. lithology, structural geology, oceanography), and geochemical data (e.g. from hydrothermal fluid chemistry and diffuse degassing features). As a second step based on the presented database, spatial data analysis has been performed using custom-programmed interfaces that execute query scripts resulting in a graphical visualization of data. These query tools were designed and compiled following scenarios of known "behavior" patterns of dormant volcanoes and first candidate signs of potential unrest. The spatial database and query approach is intended to facilitate scientific research on volcanic processes and phenomena, and volcanic surveillance.

  20. Spatial Query for Planetary Data

    NASA Technical Reports Server (NTRS)

    Shams, Khawaja S.; Crockett, Thomas M.; Powell, Mark W.; Joswig, Joseph C.; Fox, Jason M.

    2011-01-01

    Science investigators need to quickly and effectively assess past observations of specific locations on a planetary surface. This innovation involves a location-based search technology that was adapted and applied to planetary science data to support a spatial query capability for mission operations software. High-performance location-based searching requires the use of spatial data structures for database organization. Spatial data structures are designed to organize datasets based on their coordinates in a way that is optimized for location-based retrieval. The particular spatial data structure that was adapted for planetary data search is the R+ tree.

  1. What Is Spatio-Temporal Data Warehousing?

    NASA Astrophysics Data System (ADS)

    Vaisman, Alejandro; Zimányi, Esteban

    In the last years, extending OLAP (On-Line Analytical Processing) systems with spatial and temporal features has attracted the attention of the GIS (Geographic Information Systems) and database communities. However, there is no a commonly agreed definition of what is a spatio-temporal data warehouse and what functionality such a data warehouse should support. Further, the solutions proposed in the literature vary considerably in the kind of data that can be represented as well as the kind of queries that can be expressed. In this paper we present a conceptual framework for defining spatio-temporal data warehouses using an extensible data type system. We also define a taxonomy of different classes of queries of increasing expressive power, and show how to express such queries using an extension of the tuple relational calculus with aggregated functions.

  2. Geographic Video 3d Data Model And Retrieval

    NASA Astrophysics Data System (ADS)

    Han, Z.; Cui, C.; Kong, Y.; Wu, H.

    2014-04-01

    Geographic video includes both spatial and temporal geographic features acquired through ground-based or non-ground-based cameras. With the popularity of video capture devices such as smartphones, the volume of user-generated geographic video clips has grown significantly and the trend of this growth is quickly accelerating. Such a massive and increasing volume poses a major challenge to efficient video management and query. Most of the today's video management and query techniques are based on signal level content extraction. They are not able to fully utilize the geographic information of the videos. This paper aimed to introduce a geographic video 3D data model based on spatial information. The main idea of the model is to utilize the location, trajectory and azimuth information acquired by sensors such as GPS receivers and 3D electronic compasses in conjunction with video contents. The raw spatial information is synthesized to point, line, polygon and solid according to the camcorder parameters such as focal length and angle of view. With the video segment and video frame, we defined the three categories geometry object using the geometry model of OGC Simple Features Specification for SQL. We can query video through computing the spatial relation between query objects and three categories geometry object such as VFLocation, VSTrajectory, VSFOView and VFFovCone etc. We designed the query methods using the structured query language (SQL) in detail. The experiment indicate that the model is a multiple objective, integration, loosely coupled, flexible and extensible data model for the management of geographic stereo video.

  3. The Grid File: A Data Structure Designed to Support Proximity Queries on Spatial Objects.

    DTIC Science & Technology

    1983-06-01

    dimensional space. The technique to be presented for storing spatial objects works for any choice of parameters by which * simple objects can be represented...However, depending on characteristics of the data to be processed , some choices of parameters are better than others. Let us discuss some...considerations that may determine the choice of parameters. 1) istinction between lmaerba peuwuers ad extensiem prwuneert For some clasm of simple objects It

  4. KBGIS-II: A knowledge-based geographic information system

    NASA Technical Reports Server (NTRS)

    Smith, Terence; Peuquet, Donna; Menon, Sudhakar; Agarwal, Pankaj

    1986-01-01

    The architecture and working of a recently implemented Knowledge-Based Geographic Information System (KBGIS-II), designed to satisfy several general criteria for the GIS, is described. The system has four major functions including query-answering, learning and editing. The main query finds constrained locations for spatial objects that are describable in a predicate-calculus based spatial object language. The main search procedures include a family of constraint-satisfaction procedures that use a spatial object knowledge base to search efficiently for complex spatial objects in large, multilayered spatial data bases. These data bases are represented in quadtree form. The search strategy is designed to reduce the computational cost of search in the average case. The learning capabilities of the system include the addition of new locations of complex spatial objects to the knowledge base as queries are answered, and the ability to learn inductively definitions of new spatial objects from examples. The new definitions are added to the knowledge base by the system. The system is performing all its designated tasks successfully. Future reports will relate performance characteristics of the system.

  5. Spatial Processes in Linear Ordering

    ERIC Educational Resources Information Center

    von Hecker, Ulrich; Klauer, Karl Christoph; Wolf, Lukas; Fazilat-Pour, Masoud

    2016-01-01

    Memory performance in linear order reasoning tasks (A > B, B > C, C > D, etc.) shows quicker, and more accurate responses to queries on wider (AD) than narrower (AB) pairs on a hypothetical linear mental model (A -- B -- C -- D). While indicative of an analogue representation, research so far did not provide positive evidence for spatial…

  6. Geospatial data sharing, online spatial analysis and processing of Indian Biodiversity data in Internet GIS domain - A case study for raster based online geo-processing

    NASA Astrophysics Data System (ADS)

    Karnatak, H.; Pandey, K.; Oberai, K.; Roy, A.; Joshi, D.; Singh, H.; Raju, P. L. N.; Krishna Murthy, Y. V. N.

    2014-11-01

    National Biodiversity Characterization at Landscape Level, a project jointly sponsored by Department of Biotechnology and Department of Space, was implemented to identify and map the potential biodiversity rich areas in India. This project has generated spatial information at three levels viz. Satellite based primary information (Vegetation Type map, spatial locations of road & village, Fire occurrence); geospatially derived or modelled information (Disturbance Index, Fragmentation, Biological Richness) and geospatially referenced field samples plots. The study provides information of high disturbance and high biological richness areas suggesting future management strategies and formulating action plans. The study has generated for the first time baseline database in India which will be a valuable input towards climate change study in the Indian Subcontinent. The spatial data generated during the study is organized as central data repository in Geo-RDBMS environment using PostgreSQL and POSTGIS. The raster and vector data is published as OGC WMS and WFS standard for development of web base geoinformation system using Service Oriented Architecture (SOA). The WMS and WFS based system allows geo-visualization, online query and map outputs generation based on user request and response. This is a typical mashup architecture based geo-information system which allows access to remote web services like ISRO Bhuvan, Openstreet map, Google map etc., with overlay on Biodiversity data for effective study on Bio-resources. The spatial queries and analysis with vector data is achieved through SQL queries on POSTGIS and WFS-T operations. But the most important challenge is to develop a system for online raster based geo-spatial analysis and processing based on user defined Area of Interest (AOI) for large raster data sets. The map data of this study contains approximately 20 GB of size for each data layer which are five in number. An attempt has been to develop system using python, PostGIS and PHP for raster data analysis over the web for Biodiversity conservation and prioritization. The developed system takes inputs from users as WKT, Openlayer based Polygon geometry and Shape file upload as AOI to perform raster based operation using Python and GDAL/OGR. The intermediate products are stored in temporary files and tables which generate XML outputs for web representation. The raster operations like clip-zip-ship, class wise area statistics, single to multi-layer operations, diagrammatic representation and other geo-statistical analysis are performed. This is indigenous geospatial data processing engine developed using Open system architecture for spatial analysis of Biodiversity data sets in Internet GIS environment. The performance of this applications in multi-user environment like Internet domain is another challenging task which is addressed by fine tuning the source code, server hardening, spatial indexing and running the process in load balance mode. The developed system is hosted in Internet domain (http://bis.iirs.gov.in) for user access.

  7. Spatial Relation Predicates in Topographic Feature Semantics

    USGS Publications Warehouse

    Varanka, Dalia E.; Caro, Holly K.

    2013-01-01

    Topographic data are designed and widely used for base maps of diverse applications, yet the power of these information sources largely relies on the interpretive skills of map readers and relational database expert users once the data are in map or geographic information system (GIS) form. Advances in geospatial semantic technology offer data model alternatives for explicating concepts and articulating complex data queries and statements. To understand and enrich the vocabulary of topographic feature properties for semantic technology, English language spatial relation predicates were analyzed in three standard topographic feature glossaries. The analytical approach drew from disciplinary concepts in geography, linguistics, and information science. Five major classes of spatial relation predicates were identified from the analysis; representations for most of these are not widely available. The classes are: part-whole (which are commonly modeled throughout semantic and linked-data networks), geometric, processes, human intention, and spatial prepositions. These are commonly found in the ‘real world’ and support the environmental science basis for digital topographical mapping. The spatial relation concepts are based on sets of relation terms presented in this chapter, though these lists are not prescriptive or exhaustive. The results of this study make explicit the concepts forming a broad set of spatial relation expressions, which in turn form the basis for expanding the range of possible queries for topographical data analysis and mapping.

  8. SkyQuery - A Prototype Distributed Query and Cross-Matching Web Service for the Virtual Observatory

    NASA Astrophysics Data System (ADS)

    Thakar, A. R.; Budavari, T.; Malik, T.; Szalay, A. S.; Fekete, G.; Nieto-Santisteban, M.; Haridas, V.; Gray, J.

    2002-12-01

    We have developed a prototype distributed query and cross-matching service for the VO community, called SkyQuery, which is implemented with hierarchichal Web Services. SkyQuery enables astronomers to run combined queries on existing distributed heterogeneous astronomy archives. SkyQuery provides a simple, user-friendly interface to run distributed queries over the federation of registered astronomical archives in the VO. The SkyQuery client connects to the portal Web Service, which farms the query out to the individual archives, which are also Web Services called SkyNodes. The cross-matching algorithm is run recursively on each SkyNode. Each archive is a relational DBMS with a HTM index for fast spatial lookups. The results of the distributed query are returned as an XML DataSet that is automatically rendered by the client. SkyQuery also returns the image cutout corresponding to the query result. SkyQuery finds not only matches between the various catalogs, but also dropouts - objects that exist in some of the catalogs but not in others. This is often as important as finding matches. We demonstrate the utility of SkyQuery with a brown-dwarf search between SDSS and 2MASS, and a search for radio-quiet quasars in SDSS, 2MASS and FIRST. The importance of a service like SkyQuery for the worldwide astronomical community cannot be overstated: data on the same objects in various archives is mapped in different wavelength ranges and looks very different due to different errors, instrument sensitivities and other peculiarities of each archive. Our cross-matching algorithm preforms a fuzzy spatial join across multiple catalogs. This type of cross-matching is currently often done by eye, one object at a time. A static cross-identification table for a set of archives would become obsolete by the time it was built - the exponential growth of astronomical data means that a dynamic cross-identification mechanism like SkyQuery is the only viable option. SkyQuery was funded by a grant from the NASA AISR program.

  9. DownscaleConcept 2.3 User Manual. Downscaled, Spatially Distributed Soil Moisture Calculator

    DTIC Science & Technology

    2011-01-01

    be first presented with the dataset 28 results to your query. From this page, check the box next to the ASTER GDEM dataset and press the "List...information for verification. No charge will be associated with GDEM data archives. 14. Select "Submit Order Now!" to process your order. 15. Wait for

  10. PlanetServer: Innovative approaches for the online analysis of hyperspectral satellite data from Mars

    NASA Astrophysics Data System (ADS)

    Oosthoek, J. H. P.; Flahaut, J.; Rossi, A. P.; Baumann, P.; Misev, D.; Campalani, P.; Unnithan, V.

    2014-06-01

    PlanetServer is a WebGIS system, currently under development, enabling the online analysis of Compact Reconnaissance Imaging Spectrometer (CRISM) hyperspectral data from Mars. It is part of the EarthServer project which builds infrastructure for online access and analysis of huge Earth Science datasets. Core functionality consists of the rasdaman Array Database Management System (DBMS) for storage, and the Open Geospatial Consortium (OGC) Web Coverage Processing Service (WCPS) for data querying. Various WCPS queries have been designed to access spatial and spectral subsets of the CRISM data. The client WebGIS, consisting mainly of the OpenLayers javascript library, uses these queries to enable online spatial and spectral analysis. Currently the PlanetServer demonstration consists of two CRISM Full Resolution Target (FRT) observations, surrounding the NASA Curiosity rover landing site. A detailed analysis of one of these observations is performed in the Case Study section. The current PlanetServer functionality is described step by step, and is tested by focusing on detecting mineralogical evidence described in earlier Gale crater studies. Both the PlanetServer methodology and its possible use for mineralogical studies will be further discussed. Future work includes batch ingestion of CRISM data and further development of the WebGIS and analysis tools.

  11. Semantic-based surveillance video retrieval.

    PubMed

    Hu, Weiming; Xie, Dan; Fu, Zhouyu; Zeng, Wenrong; Maybank, Steve

    2007-04-01

    Visual surveillance produces large amounts of video data. Effective indexing and retrieval from surveillance video databases are very important. Although there are many ways to represent the content of video clips in current video retrieval algorithms, there still exists a semantic gap between users and retrieval systems. Visual surveillance systems supply a platform for investigating semantic-based video retrieval. In this paper, a semantic-based video retrieval framework for visual surveillance is proposed. A cluster-based tracking algorithm is developed to acquire motion trajectories. The trajectories are then clustered hierarchically using the spatial and temporal information, to learn activity models. A hierarchical structure of semantic indexing and retrieval of object activities, where each individual activity automatically inherits all the semantic descriptions of the activity model to which it belongs, is proposed for accessing video clips and individual objects at the semantic level. The proposed retrieval framework supports various queries including queries by keywords, multiple object queries, and queries by sketch. For multiple object queries, succession and simultaneity restrictions, together with depth and breadth first orders, are considered. For sketch-based queries, a method for matching trajectories drawn by users to spatial trajectories is proposed. The effectiveness and efficiency of our framework are tested in a crowded traffic scene.

  12. KBGIS-2: A knowledge-based geographic information system

    NASA Technical Reports Server (NTRS)

    Smith, T.; Peuquet, D.; Menon, S.; Agarwal, P.

    1986-01-01

    The architecture and working of a recently implemented knowledge-based geographic information system (KBGIS-2) that was designed to satisfy several general criteria for the geographic information system are described. The system has four major functions that include query-answering, learning, and editing. The main query finds constrained locations for spatial objects that are describable in a predicate-calculus based spatial objects language. The main search procedures include a family of constraint-satisfaction procedures that use a spatial object knowledge base to search efficiently for complex spatial objects in large, multilayered spatial data bases. These data bases are represented in quadtree form. The search strategy is designed to reduce the computational cost of search in the average case. The learning capabilities of the system include the addition of new locations of complex spatial objects to the knowledge base as queries are answered, and the ability to learn inductively definitions of new spatial objects from examples. The new definitions are added to the knowledge base by the system. The system is currently performing all its designated tasks successfully, although currently implemented on inadequate hardware. Future reports will detail the performance characteristics of the system, and various new extensions are planned in order to enhance the power of KBGIS-2.

  13. Age-related differences in the use of spatial and categorical relationships in a visuo-spatial working memory task.

    PubMed

    Dai, Ruizhi; Thomas, Ayanna K; Taylor, Holly A

    2018-01-30

    Research examining object identity and location processing in visuo-spatial working memory (VSWM) has yielded inconsistent results on whether age differences exist in VSWM. The present study investigated whether these inconsistencies may stem from age-related differences in VSWM sub-processes, and whether processing of component VSWM information can be facilitated. In two experiments, younger and older adults studied 5 × 5 grids containing five objects in separate locations. In a continuous recognition paradigm, participants were tested on memory for object identity, location, or identity and location information combined. Spatial and categorical relationships were manipulated within grids to provide trial-level facilitation. In Experiment 1, randomizing trial types (location, identity, combination) assured that participants could not predict the information that would be queried. In Experiment 2, blocking trials by type encouraged strategic processing. Thus, we manipulated the nature of the task through object categorical relationship and spatial organization, and trial blocking. Our findings support age-related declines in VSWM. Additionally, grid organizations (categorical and spatial relationships), and trial blocking differentially affected younger and older adults. Younger adults used spatial organizations more effectively whereas older adults demonstrated an association bias. Our finding also suggests that older adults may be less efficient than younger adults in strategically engaging information processing.

  14. Random and Directed Walk-Based Top-k Queries in Wireless Sensor Networks

    PubMed Central

    Fu, Jun-Song; Liu, Yun

    2015-01-01

    In wireless sensor networks, filter-based top-k query approaches are the state-of-the-art solutions and have been extensively researched in the literature, however, they are very sensitive to the network parameters, including the size of the network, dynamics of the sensors’ readings and declines in the overall range of all the readings. In this work, a random walk-based top-k query approach called RWTQ and a directed walk-based top-k query approach called DWTQ are proposed. At the beginning of a top-k query, one or several tokens are sent to the specific node(s) in the network by the base station. Then, each token walks in the network independently to record and process the readings in a random or directed way. A strategy of choosing the “right” way in DWTQ is carefully designed for the token(s) to arrive at the high-value regions as soon as possible. When designing the walking strategy for DWTQ, the spatial correlations of the readings are also considered. Theoretical analysis and simulation results indicate that RWTQ and DWTQ both are very robust against these parameters discussed previously. In addition, DWTQ outperforms TAG, FILA and EXTOK in transmission cost, energy consumption and network lifetime. PMID:26016914

  15. Titanbrowse: a new paradigm for access, visualization and analysis of hyperspectral imaging

    NASA Astrophysics Data System (ADS)

    Penteado, Paulo F.

    2016-10-01

    Currently there are archives and tools to explore remote sensing imaging, but these lack some functionality needed for hyperspectral imagers: 1) Querying and serving only whole datacubes is not enough, since in each cube there is typically a large variation in observation geometry over the spatial pixels. Thus, often the most useful unit for selecting observations of interest is not a whole cube but rather a single spectrum. 2) Pixel-specific geometric data included in the standard pipelines is calculated at only one point per pixel. Particularly for selections of pixels from many different cubes, or observations near the limb, it is necessary to know the actual extent of each pixel. 3) Database queries need not only metadata, but also by the spectral data. For instance, one query might look for atypical values of some band, or atypical relations between bands, denoting spectral features (such as ratios or differences between bands). 4) There is the need to evaluate arbitrary, dynamically-defined, complex functions of the data (beyond just simple arithmetic operations), both for selection in the queries, and for visualization, to interactively tune the queries to the observations of interest. 5) Making the most useful query for some analysis often requires interactive visualization integrated with data selection and processing, because the user needs to explore how different functions of the data vary over the observations without having to download data and import it into visualization software. 6) Complementary to interactive use, an API allowing programmatic access to the system is needed for systematic data analyses. 7) Direct access to calibrated and georeferenced data, without the need to download data and software and learn to process it.We present titanbrowse, a database, exploration and visualization system for Cassini VIMS observations of Titan, designed to fullfill the aforementioned needs. While it originallly ran on data in the user's computer, we are now developing an online version, so that users do not need to download software and data. The server, which we maintain, processes the queries and communicates the results to the client the user runs. http://ppenteado.net/titanbrowse.

  16. VisGets: coordinated visualizations for web-based information exploration and discovery.

    PubMed

    Dörk, Marian; Carpendale, Sheelagh; Collins, Christopher; Williamson, Carey

    2008-01-01

    In common Web-based search interfaces, it can be difficult to formulate queries that simultaneously combine temporal, spatial, and topical data filters. We investigate how coordinated visualizations can enhance search and exploration of information on the World Wide Web by easing the formulation of these types of queries. Drawing from visual information seeking and exploratory search, we introduce VisGets--interactive query visualizations of Web-based information that operate with online information within a Web browser. VisGets provide the information seeker with visual overviews of Web resources and offer a way to visually filter the data. Our goal is to facilitate the construction of dynamic search queries that combine filters from more than one data dimension. We present a prototype information exploration system featuring three linked VisGets (temporal, spatial, and topical), and used it to visually explore news items from online RSS feeds.

  17. Spatial Data Services for Interdisciplinary Applications from the NASA Socioeconomic Data and Applications Center

    NASA Astrophysics Data System (ADS)

    Chen, R. S.; MacManus, K.; Vinay, S.; Yetman, G.

    2016-12-01

    The Socioeconomic Data and Applications Center (SEDAC), one of 12 Distributed Active Archive Centers (DAACs) in the NASA Earth Observing System Data and Information System (EOSDIS), has developed a variety of operational spatial data services aimed at providing online access, visualization, and analytic functions for geospatial socioeconomic and environmental data. These services include: open web services that implement Open Geospatial Consortium (OGC) specifications such as Web Map Service (WMS), Web Feature Service (WFS), and Web Coverage Service (WCS); spatial query services that support Web Processing Service (WPS) and Representation State Transfer (REST); and web map clients and a mobile app that utilize SEDAC and other open web services. These services may be accessed from a variety of external map clients and visualization tools such as NASA's WorldView, NOAA's Climate Explorer, and ArcGIS Online. More than 200 data layers related to population, settlements, infrastructure, agriculture, environmental pollution, land use, health, hazards, climate change and other aspects of sustainable development are available through WMS, WFS, and/or WCS. Version 2 of the SEDAC Population Estimation Service (PES) supports spatial queries through WPS and REST in the form of a user-defined polygon or circle. The PES returns an estimate of the population residing in the defined area for a specific year (2000, 2005, 2010, 2015, or 2020) based on SEDAC's Gridded Population of the World version 4 (GPWv4) dataset, together with measures of accuracy. The SEDAC Hazards Mapper and the recently released HazPop iOS mobile app enable users to easily submit spatial queries to the PES and see the results. SEDAC has developed an operational virtualized backend infrastructure to manage these services and support their continual improvement as standards change, new data and services become available, and user needs evolve. An ongoing challenge is to improve the reliability and performance of the infrastructure, in conjunction with external services, to meet both research and operational needs.

  18. Implementation of Quantum Private Queries Using Nuclear Magnetic Resonance

    NASA Astrophysics Data System (ADS)

    Wang, Chuan; Hao, Liang; Zhao, Lian-Jie

    2011-08-01

    We present a modified protocol for the realization of a quantum private query process on a classical database. Using one-qubit query and CNOT operation, the query process can be realized in a two-mode database. In the query process, the data privacy is preserved as the sender would not reveal any information about the database besides her query information, and the database provider cannot retain any information about the query. We implement the quantum private query protocol in a nuclear magnetic resonance system. The density matrix of the memory registers are constructed.

  19. Design and development of linked data from the National Map

    USGS Publications Warehouse

    Usery, E. Lynn; Varanka, Dalia E.

    2012-01-01

    The development of linked data on the World-Wide Web provides the opportunity for the U.S. Geological Survey (USGS) to supply its extensive volumes of geospatial data, information, and knowledge in a machine interpretable form and reach users and applications that heretofore have been unavailable. To pilot a process to take advantage of this opportunity, the USGS is developing an ontology for The National Map and converting selected data from nine research test areas to a Semantic Web format to support machine processing and linked data access. In a case study, the USGS has developed initial methods for legacy vector and raster formatted geometry, attributes, and spatial relationships to be accessed in a linked data environment maintaining the capability to generate graphic or image output from semantic queries. The description of an initial USGS approach to developing ontology, linked data, and initial query capability from The National Map databases is presented.

  20. Spatiotemporal conceptual platform for querying archaeological information systems

    NASA Astrophysics Data System (ADS)

    Partsinevelos, Panagiotis; Sartzetaki, Mary; Sarris, Apostolos

    2015-04-01

    Spatial and temporal distribution of archaeological sites has been shown to associate with several attributes including marine, water, mineral and food resources, climate conditions, geomorphological features, etc. In this study, archeological settlement attributes are evaluated under various associations in order to provide a specialized query platform in a geographic information system (GIS). Towards this end, a spatial database is designed to include a series of archaeological findings for a secluded geographic area of Crete in Greece. The key categories of the geodatabase include the archaeological type (palace, burial site, village, etc.), temporal information of the habitation/usage period (pre Minoan, Minoan, Byzantine, etc.), and the extracted geographical attributes of the sites (distance to sea, altitude, resources, etc.). Most of the related spatial attributes are extracted with readily available GIS tools. Additionally, a series of conceptual data attributes are estimated, including: Temporal relation of an era to a future one in terms of alteration of the archaeological type, topologic relations of various types and attributes, spatial proximity relations between various types. These complex spatiotemporal relational measures reveal new attributes towards better understanding of site selection for prehistoric and/or historic cultures, yet their potential combinations can become numerous. Therefore, after the quantification of the above mentioned attributes, they are classified as of their importance for archaeological site location modeling. Under this new classification scheme, the user may select a geographic area of interest and extract only the important attributes for a specific archaeological type. These extracted attributes may then be queried against the entire spatial database and provide a location map of possible new archaeological sites. This novel type of querying is robust since the user does not have to type a standard SQL query but graphically select an area of interest. In addition, according to the application at hand, novel spatiotemporal attributes and relations can be supported, towards the understanding of historical settlement patterns.

  1. DCMS: A data analytics and management system for molecular simulation.

    PubMed

    Kumar, Anand; Grupcev, Vladimir; Berrada, Meryem; Fogarty, Joseph C; Tu, Yi-Cheng; Zhu, Xingquan; Pandit, Sagar A; Xia, Yuni

    Molecular Simulation (MS) is a powerful tool for studying physical/chemical features of large systems and has seen applications in many scientific and engineering domains. During the simulation process, the experiments generate a very large number of atoms and intend to observe their spatial and temporal relationships for scientific analysis. The sheer data volumes and their intensive interactions impose significant challenges for data accessing, managing, and analysis. To date, existing MS software systems fall short on storage and handling of MS data, mainly because of the missing of a platform to support applications that involve intensive data access and analytical process. In this paper, we present the database-centric molecular simulation (DCMS) system our team developed in the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface ( i.e. , SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is to handle the analytical queries that are often compute-intensive. For that, we developed novel indexing and query processing strategies (including algorithms running on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS. Index structures are generated to store analysis results that may be interesting to other users, so that the results are readily available without duplicating the analysis. We have developed a prototype of DCMS based on the PostgreSQL system and experiments using real MS data and workload show that DCMS significantly outperforms existing MS software systems. We also used it as a platform to test other data management issues such as security and compression.

  2. geophylobuilder 1.0: an arcgis extension for creating 'geophylogenies'.

    PubMed

    Kidd, David M; Liu, Xianhua

    2008-01-01

    Evolution is inherently a spatiotemporal process; however, despite this, phylogenetic and geographical data and models remain largely isolated from one another. Geographical information systems provide a ready-made spatial modelling, analysis and dissemination environment within which phylogenetic models can be explicitly linked with their associated spatial data and subsequently integrated with other georeferenced data sets describing the biotic and abiotic environment. geophylobuilder 1.0 is an extension for the arcgis geographical information system that builds a 'geophylogenetic' data model from a phylogenetic tree and associated geographical data. Geophylogenetic database objects can subsequently be queried, spatially analysed and visualized in both 2D and 3D within a geographical information systems. © 2007 The Authors.

  3. Nosql for Storage and Retrieval of Large LIDAR Data Collections

    NASA Astrophysics Data System (ADS)

    Boehm, J.; Liu, K.

    2015-08-01

    Developments in LiDAR technology over the past decades have made LiDAR to become a mature and widely accepted source of geospatial information. This in turn has led to an enormous growth in data volume. The central idea for a file-centric storage of LiDAR point clouds is the observation that large collections of LiDAR data are typically delivered as large collections of files, rather than single files of terabyte size. This split of the dataset, commonly referred to as tiling, was usually done to accommodate a specific processing pipeline. It makes therefore sense to preserve this split. A document oriented NoSQL database can easily emulate this data partitioning, by representing each tile (file) in a separate document. The document stores the metadata of the tile. The actual files are stored in a distributed file system emulated by the NoSQL database. We demonstrate the use of MongoDB a highly scalable document oriented NoSQL database for storing large LiDAR files. MongoDB like any NoSQL database allows for queries on the attributes of the document. As a specialty MongoDB also allows spatial queries. Hence we can perform spatial queries on the bounding boxes of the LiDAR tiles. Inserting and retrieving files on a cloud-based database is compared to native file system and cloud storage transfer speed.

  4. Ibmdbpy-spatial : An Open-source implementation of in-database geospatial analytics in Python

    NASA Astrophysics Data System (ADS)

    Roy, Avipsa; Fouché, Edouard; Rodriguez Morales, Rafael; Moehler, Gregor

    2017-04-01

    As the amount of spatial data acquired from several geodetic sources has grown over the years and as data infrastructure has become more powerful, the need for adoption of in-database analytic technology within geosciences has grown rapidly. In-database analytics on spatial data stored in a traditional enterprise data warehouse enables much faster retrieval and analysis for making better predictions about risks and opportunities, identifying trends and spot anomalies. Although there are a number of open-source spatial analysis libraries like geopandas and shapely available today, most of them have been restricted to manipulation and analysis of geometric objects with a dependency on GEOS and similar libraries. We present an open-source software package, written in Python, to fill the gap between spatial analysis and in-database analytics. Ibmdbpy-spatial provides a geospatial extension to the ibmdbpy package, implemented in 2015. It provides an interface for spatial data manipulation and access to in-database algorithms in IBM dashDB, a data warehouse platform with a spatial extender that runs as a service on IBM's cloud platform called Bluemix. Working in-database reduces the network overload, as the complete data need not be replicated into the user's local system altogether and only a subset of the entire dataset can be fetched into memory in a single instance. Ibmdbpy-spatial accelerates Python analytics by seamlessly pushing operations written in Python into the underlying database for execution using the dashDB spatial extender, thereby benefiting from in-database performance-enhancing features, such as columnar storage and parallel processing. The package is currently supported on Python versions from 2.7 up to 3.4. The basic architecture of the package consists of three main components - 1) a connection to the dashDB represented by the instance IdaDataBase, which uses a middleware API namely - pypyodbc or jaydebeapi to establish the database connection via ODBC or JDBC respectively, 2) an instance to represent the spatial data stored in the database as a dataframe in Python, called the IdaGeoDataFrame, with a specific geometry attribute which recognises a planar geometry column in dashDB and 3) Python wrappers for spatial functions like within, distance, area, buffer} and more which dashDB currently supports to make the querying process from Python much simpler for the users. The spatial functions translate well-known geopandas-like syntax into SQL queries utilising the database connection to perform spatial operations in-database and can operate on single geometries as well two different geometries from different IdaGeoDataFrames. The in-database queries strictly follow the standards of OpenGIS Implementation Specification for Geographic information - Simple feature access for SQL. The results of the operations obtained can thereby be accessed dynamically via interactive Jupyter notebooks from any system which supports Python, without any additional dependencies and can also be combined with other open source libraries such as matplotlib and folium in-built within Jupyter notebooks for visualization purposes. We built a use case to analyse crime hotspots in New York city to validate our implementation and visualized the results as a choropleth map for each borough.

  5. Data Container Study for Handling array-based data using Hive, Spark, MongoDB, SciDB and Rasdaman

    NASA Astrophysics Data System (ADS)

    Xu, M.; Hu, F.; Yang, J.; Yu, M.; Yang, C. P.

    2017-12-01

    Geoscience communities have come up with various big data storage solutions, such as Rasdaman and Hive, to address the grand challenges for massive Earth observation data management and processing. To examine the readiness of current solutions in supporting big Earth observation, we propose to investigate and compare four popular data container solutions, including Rasdaman, Hive, Spark, SciDB and MongoDB. Using different types of spatial and non-spatial queries, datasets stored in common scientific data formats (e.g., NetCDF and HDF), and two applications (i.e. dust storm simulation data mining and MERRA data analytics), we systematically compare and evaluate the feature and performance of these four data containers in terms of data discover and access. The computing resources (e.g. CPU, memory, hard drive, network) consumed while performing various queries and operations are monitored and recorded for the performance evaluation. The initial results show that 1) the popular data container clusters are able to handle large volume of data, but their performances vary in different situations. Meanwhile, there is a trade-off between data preprocessing, disk saving, query-time saving, and resource consuming. 2) ClimateSpark, MongoDB and SciDB perform the best among all the containers in all the queries tests, and Hive performs the worst. 3) These studied data containers can be applied on other array-based datasets, such as high resolution remote sensing data and model simulation data. 4) Rasdaman clustering configuration is more complex than the others. A comprehensive report will detail the experimental results, and compare their pros and cons regarding system performance, ease of use, accessibility, scalability, compatibility, and flexibility.

  6. A Framework for WWW Query Processing

    NASA Technical Reports Server (NTRS)

    Wu, Binghui Helen; Wharton, Stephen (Technical Monitor)

    2000-01-01

    Query processing is the most common operation in a DBMS. Sophisticated query processing has been mainly targeted at a single enterprise environment providing centralized control over data and metadata. Submitting queries by anonymous users on the web is different in such a way that load balancing or DBMS' accessing control becomes the key issue. This paper provides a solution by introducing a framework for WWW query processing. The success of this framework lies in the utilization of query optimization techniques and the ontological approach. This methodology has proved to be cost effective at the NASA Goddard Space Flight Center Distributed Active Archive Center (GDAAC).

  7. Effective 3-D surface modeling for geographic information systems

    NASA Astrophysics Data System (ADS)

    Yüksek, K.; Alparslan, M.; Mendi, E.

    2013-11-01

    In this work, we propose a dynamic, flexible and interactive urban digital terrain platform (DTP) with spatial data and query processing capabilities of Geographic Information Systems (GIS), multimedia database functionality and graphical modeling infrastructure. A new data element, called Geo-Node, which stores image, spatial data and 3-D CAD objects is developed using an efficient data structure. The system effectively handles data transfer of Geo-Nodes between main memory and secondary storage with an optimized Directional Replacement Policy (DRP) based buffer management scheme. Polyhedron structures are used in Digital Surface Modeling (DSM) and smoothing process is performed by interpolation. The experimental results show that our framework achieves high performance and works effectively with urban scenes independent from the amount of spatial data and image size. The proposed platform may contribute to the development of various applications such as Web GIS systems based on 3-D graphics standards (e.g. X3-D and VRML) and services which integrate multi-dimensional spatial information and satellite/aerial imagery.

  8. Effective 3-D surface modeling for geographic information systems

    NASA Astrophysics Data System (ADS)

    Yüksek, K.; Alparslan, M.; Mendi, E.

    2016-01-01

    In this work, we propose a dynamic, flexible and interactive urban digital terrain platform with spatial data and query processing capabilities of geographic information systems, multimedia database functionality and graphical modeling infrastructure. A new data element, called Geo-Node, which stores image, spatial data and 3-D CAD objects is developed using an efficient data structure. The system effectively handles data transfer of Geo-Nodes between main memory and secondary storage with an optimized directional replacement policy (DRP) based buffer management scheme. Polyhedron structures are used in digital surface modeling and smoothing process is performed by interpolation. The experimental results show that our framework achieves high performance and works effectively with urban scenes independent from the amount of spatial data and image size. The proposed platform may contribute to the development of various applications such as Web GIS systems based on 3-D graphics standards (e.g., X3-D and VRML) and services which integrate multi-dimensional spatial information and satellite/aerial imagery.

  9. Distributed query plan generation using multiobjective genetic algorithm.

    PubMed

    Panicker, Shina; Kumar, T V Vijay

    2014-01-01

    A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability.

  10. Distributed Query Plan Generation Using Multiobjective Genetic Algorithm

    PubMed Central

    Panicker, Shina; Vijay Kumar, T. V.

    2014-01-01

    A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability. PMID:24963513

  11. The Design of a High Performance Earth Imagery and Raster Data Management and Processing Platform

    NASA Astrophysics Data System (ADS)

    Xie, Qingyun

    2016-06-01

    This paper summarizes the general requirements and specific characteristics of both geospatial raster database management system and raster data processing platform from a domain-specific perspective as well as from a computing point of view. It also discusses the need of tight integration between the database system and the processing system. These requirements resulted in Oracle Spatial GeoRaster, a global scale and high performance earth imagery and raster data management and processing platform. The rationale, design, implementation, and benefits of Oracle Spatial GeoRaster are described. Basically, as a database management system, GeoRaster defines an integrated raster data model, supports image compression, data manipulation, general and spatial indices, content and context based queries and updates, versioning, concurrency, security, replication, standby, backup and recovery, multitenancy, and ETL. It provides high scalability using computer and storage clustering. As a raster data processing platform, GeoRaster provides basic operations, image processing, raster analytics, and data distribution featuring high performance computing (HPC). Specifically, HPC features include locality computing, concurrent processing, parallel processing, and in-memory computing. In addition, the APIs and the plug-in architecture are discussed.

  12. Model-based query language for analyzing clinical processes.

    PubMed

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    Nowadays large databases of clinical process data exist in hospitals. However, these data are rarely used in full scope. In order to perform queries on hospital processes, one must either choose from the predefined queries or develop queries using MS Excel-type software system, which is not always a trivial task. In this paper we propose a new query language for analyzing clinical processes that is easily perceptible also by non-IT professionals. We develop this language based on a process modeling language which is also described in this paper. Prototypes of both languages have already been verified using real examples from hospitals.

  13. Merged data models for multi-parameterized querying: Spectral data base meets GIS-based map archive

    NASA Astrophysics Data System (ADS)

    Naß, A.; D'Amore, M.; Helbert, J.

    2017-09-01

    Current and upcoming planetary missions deliver a huge amount of different data (remote sensing data, in-situ data, and derived products). Within this contribution present how different data (bases) can be managed and merged, to enable multi-parameterized querying based on the constant spatial context.

  14. Data Container Study for Handling Array-based Data Using Rasdaman, Hive, Spark, and MongoDB

    NASA Astrophysics Data System (ADS)

    Xu, M.; Hu, F.; Yu, M.; Scheele, C.; Liu, K.; Huang, Q.; Yang, C. P.; Little, M. M.

    2016-12-01

    Geoscience communities have come up with various big data storage solutions, such as Rasdaman and Hive, to address the grand challenges for massive Earth observation data management and processing. To examine the readiness of current solutions in supporting big Earth observation, we propose to investigate and compare four popular data container solutions, including Rasdaman, Hive, Spark, and MongoDB. Using different types of spatial and non-spatial queries, datasets stored in common scientific data formats (e.g., NetCDF and HDF), and two applications (i.e. dust storm simulation data mining and MERRA data analytics), we systematically compare and evaluate the feature and performance of these four data containers in terms of data discover and access. The computing resources (e.g. CPU, memory, hard drive, network) consumed while performing various queries and operations are monitored and recorded for the performance evaluation. The initial results show that 1) Rasdaman has the best performance for queries on statistical and operational functions, and supports NetCDF data format better than HDF; 2) Rasdaman clustering configuration is more complex than the others; 3) Hive performs better on single pixel extraction from multiple images; and 4) Except for the single pixel extractions, Spark performs better than Hive and its performance is close to Rasdaman. A comprehensive report will detail the experimental results, and compare their pros and cons regarding system performance, ease of use, accessibility, scalability, compatibility, and flexibility.

  15. A fast and robust bulk-loading algorithm for indexing very large digital elevation datasets II. Experimental results

    NASA Astrophysics Data System (ADS)

    Rodríguez, Félix R.; Barrena, Manuel

    2011-07-01

    The spatial indexing of eventually all the available topographic information of Earth is a highly valuable tool for different geoscientific application domains. The Shuttle Radar Topography Mission (SRTM) collected and made available to the public one of the world's largest digital elevation models (DEMs). With the aim of providing on easier and faster access to these data by improving their further analysis and processing, we have indexed the SRTM DEM by means of a spatial index based on the kd-tree data structure, called the Q-tree. This paper is the second in a two-part series that includes a thorough performance analysis to validate the bulk-load algorithm efficiency of the Q-tree. We investigate performance measuring elapsed time in different contexts, analyzing disk space usage, testing response time with typical queries, and validating the final index structure balance. In addition, the paper includes performance comparisons with Oracle 11g that helps to understand the real cost of our proposal. Our tests prove that the proposed algorithm outperforms Oracle 11g using around a 9% of the elapsed time, taking six times less storage with more than 96% of page utilization, and getting faster response times to spatial queries issued on 4.5 million points. In addition to this, the behavior of the spatial index has been successfully tested on both an open GIS (VT Builder) and a visualizer tool derived from the previous one.

  16. Towards Hybrid Online On-Demand Querying of Realtime Data with Stateful Complex Event Processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, Qunzhi; Simmhan, Yogesh; Prasanna, Viktor K.

    Emerging Big Data applications in areas like e-commerce and energy industry require both online and on-demand queries to be performed over vast and fast data arriving as streams. These present novel challenges to Big Data management systems. Complex Event Processing (CEP) is recognized as a high performance online query scheme which in particular deals with the velocity aspect of the 3-V’s of Big Data. However, traditional CEP systems do not consider data variety and lack the capability to embed ad hoc queries over the volume of data streams. In this paper, we propose H2O, a stateful complex event processing framework,more » to support hybrid online and on-demand queries over realtime data. We propose a semantically enriched event and query model to address data variety. A formal query algebra is developed to precisely capture the stateful and containment semantics of online and on-demand queries. We describe techniques to achieve the interactive query processing over realtime data featured by efficient online querying, dynamic stream data persistence and on-demand access. The system architecture is presented and the current implementation status reported.« less

  17. Processing SPARQL queries with regular expressions in RDF databases

    PubMed Central

    2011-01-01

    Background As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users’ requests for extracting information from the RDF data as well as the lack of users’ knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. Results In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns. PMID:21489225

  18. Processing SPARQL queries with regular expressions in RDF databases.

    PubMed

    Lee, Jinsoo; Pham, Minh-Duc; Lee, Jihwan; Han, Wook-Shin; Cho, Hune; Yu, Hwanjo; Lee, Jeong-Hoon

    2011-03-29

    As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users' requests for extracting information from the RDF data as well as the lack of users' knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.

  19. Web-based Visualization and Query of semantically segmented multiresolution 3D Models in the Field of Cultural Heritage

    NASA Astrophysics Data System (ADS)

    Auer, M.; Agugiaro, G.; Billen, N.; Loos, L.; Zipf, A.

    2014-05-01

    Many important Cultural Heritage sites have been studied over long periods of time by different means of technical equipment, methods and intentions by different researchers. This has led to huge amounts of heterogeneous "traditional" datasets and formats. The rising popularity of 3D models in the field of Cultural Heritage in recent years has brought additional data formats and makes it even more necessary to find solutions to manage, publish and study these data in an integrated way. The MayaArch3D project aims to realize such an integrative approach by establishing a web-based research platform bringing spatial and non-spatial databases together and providing visualization and analysis tools. Especially the 3D components of the platform use hierarchical segmentation concepts to structure the data and to perform queries on semantic entities. This paper presents a database schema to organize not only segmented models but also different Levels-of-Details and other representations of the same entity. It is further implemented in a spatial database which allows the storing of georeferenced 3D data. This enables organization and queries by semantic, geometric and spatial properties. As service for the delivery of the segmented models a standardization candidate of the OpenGeospatialConsortium (OGC), the Web3DService (W3DS) has been extended to cope with the new database schema and deliver a web friendly format for WebGL rendering. Finally a generic user interface is presented which uses the segments as navigation metaphor to browse and query the semantic segmentation levels and retrieve information from an external database of the German Archaeological Institute (DAI).

  20. Privacy-Preserving Location-Based Query Using Location Indexes and Parallel Searching in Distributed Networks

    PubMed Central

    Liu, Lei; Zhao, Jing

    2014-01-01

    An efficient location-based query algorithm of protecting the privacy of the user in the distributed networks is given. This algorithm utilizes the location indexes of the users and multiple parallel threads to search and select quickly all the candidate anonymous sets with more users and their location information with more uniform distribution to accelerate the execution of the temporal-spatial anonymous operations, and it allows the users to configure their custom-made privacy-preserving location query requests. The simulated experiment results show that the proposed algorithm can offer simultaneously the location query services for more users and improve the performance of the anonymous server and satisfy the anonymous location requests of the users. PMID:24790579

  1. Privacy-preserving location-based query using location indexes and parallel searching in distributed networks.

    PubMed

    Zhong, Cheng; Liu, Lei; Zhao, Jing

    2014-01-01

    An efficient location-based query algorithm of protecting the privacy of the user in the distributed networks is given. This algorithm utilizes the location indexes of the users and multiple parallel threads to search and select quickly all the candidate anonymous sets with more users and their location information with more uniform distribution to accelerate the execution of the temporal-spatial anonymous operations, and it allows the users to configure their custom-made privacy-preserving location query requests. The simulated experiment results show that the proposed algorithm can offer simultaneously the location query services for more users and improve the performance of the anonymous server and satisfy the anonymous location requests of the users.

  2. Spatial coding-based approach for partitioning big spatial data in Hadoop

    NASA Astrophysics Data System (ADS)

    Yao, Xiaochuang; Mokbel, Mohamed F.; Alarabi, Louai; Eldawy, Ahmed; Yang, Jianyu; Yun, Wenju; Li, Lin; Ye, Sijing; Zhu, Dehai

    2017-09-01

    Spatial data partitioning (SDP) plays a powerful role in distributed storage and parallel computing for spatial data. However, due to skew distribution of spatial data and varying volume of spatial vector objects, it leads to a significant challenge to ensure both optimal performance of spatial operation and data balance in the cluster. To tackle this problem, we proposed a spatial coding-based approach for partitioning big spatial data in Hadoop. This approach, firstly, compressed the whole big spatial data based on spatial coding matrix to create a sensing information set (SIS), including spatial code, size, count and other information. SIS was then employed to build spatial partitioning matrix, which was used to spilt all spatial objects into different partitions in the cluster finally. Based on our approach, the neighbouring spatial objects can be partitioned into the same block. At the same time, it also can minimize the data skew in Hadoop distributed file system (HDFS). The presented approach with a case study in this paper is compared against random sampling based partitioning, with three measurement standards, namely, the spatial index quality, data skew in HDFS, and range query performance. The experimental results show that our method based on spatial coding technique can improve the query performance of big spatial data, as well as the data balance in HDFS. We implemented and deployed this approach in Hadoop, and it is also able to support efficiently any other distributed big spatial data systems.

  3. RiPPAS: A Ring-Based Privacy-Preserving Aggregation Scheme in Wireless Sensor Networks

    PubMed Central

    Zhang, Kejia; Han, Qilong; Cai, Zhipeng; Yin, Guisheng

    2017-01-01

    Recently, data privacy in wireless sensor networks (WSNs) has been paid increased attention. The characteristics of WSNs determine that users’ queries are mainly aggregation queries. In this paper, the problem of processing aggregation queries in WSNs with data privacy preservation is investigated. A Ring-based Privacy-Preserving Aggregation Scheme (RiPPAS) is proposed. RiPPAS adopts ring structure to perform aggregation. It uses pseudonym mechanism for anonymous communication and uses homomorphic encryption technique to add noise to the data easily to be disclosed. RiPPAS can handle both sum() queries and min()/max() queries, while the existing privacy-preserving aggregation methods can only deal with sum() queries. For processing sum() queries, compared with the existing methods, RiPPAS has advantages in the aspects of privacy preservation and communication efficiency, which can be proved by theoretical analysis and simulation results. For processing min()/max() queries, RiPPAS provides effective privacy preservation and has low communication overhead. PMID:28178197

  4. DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data.

    PubMed

    Putri, Fadhilah Kurnia; Song, Giltae; Kwon, Joonho; Rao, Praveen

    2017-09-25

    One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query ( DISPAQ ) which efficiently identifies profitable areas by exploiting the Apache Software Foundation's Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data.

  5. DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data †

    PubMed Central

    Putri, Fadhilah Kurnia; Song, Giltae; Rao, Praveen

    2017-01-01

    One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query (DISPAQ) which efficiently identifies profitable areas by exploiting the Apache Software Foundation’s Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data. PMID:28946679

  6. An index-based algorithm for fast on-line query processing of latent semantic analysis

    PubMed Central

    Li, Pohan; Wang, Wei

    2017-01-01

    Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantic is similar to the query of keywords. Although LSA yield promising similar results, the existing LSA algorithms involve lots of unnecessary operations in similarity computation and candidate check during on-line query processing, which is expensive in terms of time cost and cannot efficiently response the query request especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA towards efficiently searching the similar documents to a given query. We rewrite the similarity equation of LSA combined with an intermediate value called partial similarity that is stored in a designed index called partial index. For reducing the searching space, we give an approximate form of similarity equation, and then develop an efficient algorithm for building partial index, which skips the partial similarities lower than a given threshold θ. Based on partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponds to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm. PMID:28520747

  7. An index-based algorithm for fast on-line query processing of latent semantic analysis.

    PubMed

    Zhang, Mingxi; Li, Pohan; Wang, Wei

    2017-01-01

    Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantic is similar to the query of keywords. Although LSA yield promising similar results, the existing LSA algorithms involve lots of unnecessary operations in similarity computation and candidate check during on-line query processing, which is expensive in terms of time cost and cannot efficiently response the query request especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA towards efficiently searching the similar documents to a given query. We rewrite the similarity equation of LSA combined with an intermediate value called partial similarity that is stored in a designed index called partial index. For reducing the searching space, we give an approximate form of similarity equation, and then develop an efficient algorithm for building partial index, which skips the partial similarities lower than a given threshold θ. Based on partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponds to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm.

  8. Information Network Model Query Processing

    NASA Astrophysics Data System (ADS)

    Song, Xiaopu

    Information Networking Model (INM) [31] is a novel database model for real world objects and relationships management. It naturally and directly supports various kinds of static and dynamic relationships between objects. In INM, objects are networked through various natural and complex relationships. INM Query Language (INM-QL) [30] is designed to explore such information network, retrieve information about schema, instance, their attributes, relationships, and context-dependent information, and process query results in the user specified form. INM database management system has been implemented using Berkeley DB, and it supports INM-QL. This thesis is mainly focused on the implementation of the subsystem that is able to effectively and efficiently process INM-QL. The subsystem provides a lexical and syntactical analyzer of INM-QL, and it is able to choose appropriate evaluation strategies and index mechanism to process queries in INM-QL without the user's intervention. It also uses intermediate result structure to hold intermediate query result and other helping structures to reduce complexity of query processing.

  9. Ontological Approach to Military Knowledge Modeling and Management

    DTIC Science & Technology

    2004-03-01

    federated search mechanism has to reformulate user queries (expressed using the ontology) in the query languages of the different sources (e.g. SQL...ontologies as a common terminology – Unified query to perform federated search • Query processing – Ontology mapping to sources reformulate queries

  10. Resource Needs and Pedagogical Value of Web Mapping for Spatial Thinking

    ERIC Educational Resources Information Center

    Manson, Steven; Shannon, Jerry; Eria, Sami; Kne, Len; Dyke, Kevin; Nelson, Sara; Batra, Lalit; Bonsal, Dudley; Kernik, Melinda; Immich, Jennifer; Matson, Laura

    2014-01-01

    Web mapping involves publishing and using maps via the Internet, and can range from presenting static maps to offering dynamic data querying and spatial analysis. Web mapping is seen as a promising way to support development of spatial thinking in the classroom but there are unanswered questions about how this promise plays out in reality. This…

  11. Application based on ArcObject inquiry and Google maps demonstration to real estate database

    NASA Astrophysics Data System (ADS)

    Hwang, JinTsong

    2007-06-01

    Real estate industry in Taiwan has been flourishing in recent years. To acquire various and abundant information of real estate for sale is the same goal for the consumers and the brokerages. Therefore, before looking at the property, it is important to get all pertinent information possible. Not only this beneficial for the real estate agent as they can provide the sellers with the most information, thereby solidifying the interest of the buyer, but may also save time and the cost of manpower were something out of place. Most of the brokerage sites are aware of utilizes Internet as form of media for publicity however; the contents are limited to specific property itself and the functions of query are mostly just provided searching by condition. This paper proposes a query interface on website which gives function of zone query by spatial analysis for non-GIS users, developing a user-friendly interface with ArcObject in VB6, and query by condition. The inquiry results can show on the web page which is embedded functions of Google Maps and the UrMap API on it. In addition, the demonstration of inquiry results will give the multimedia present way which includes hyperlink to Google Earth with surrounding of the property, the Virtual Reality scene of house, panorama of interior of building and so on. Therefore, the website provides extra spatial solution for query and demonstration abundant information of real estate in two-dimensional and three-dimensional types of view.

  12. HodDB: Design and Analysis of a Query Processor for Brick.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fierro, Gabriel; Culler, David

    Brick is a recently proposed metadata schema and ontology for describing building components and the relationships between them. It represents buildings as directed labeled graphs using the RDF data model. Using the SPARQL query language, building-agnostic applications query a Brick graph to discover the set of resources and relationships they require to operate. Latency-sensitive applications, such as user interfaces, demand response and modelpredictive control, require fast queries — conventionally less than 100ms. We benchmark a set of popular open-source and commercial SPARQL databases against three real Brick models using seven application queries and find that none of them meet thismore » performance target. This lack of performance can be attributed to design decisions that optimize for queries over large graphs consisting of billions of triples, but give poor spatial locality and join performance on the small dense graphs typical of Brick. We present the design and evaluation of HodDB, a RDF/SPARQL database for Brick built over a node-based index structure. HodDB performs Brick queries 3-700x faster than leading SPARQL databases and consistently meets the 100ms threshold, enabling the portability of important latency-sensitive building applications.« less

  13. EMAP and EMAGE: a framework for understanding spatially organized data.

    PubMed

    Baldock, Richard A; Bard, Jonathan B L; Burger, Albert; Burton, Nicolas; Christiansen, Jeff; Feng, Guanjie; Hill, Bill; Houghton, Derek; Kaufman, Matthew; Rao, Jianguo; Sharpe, James; Ross, Allyson; Stevenson, Peter; Venkataraman, Shanmugasundaram; Waterhouse, Andrew; Yang, Yiya; Davidson, Duncan R

    2003-01-01

    The Edinburgh MouseAtlas Project (EMAP) is a time-series of mouse-embryo volumetric models. The models provide a context-free spatial framework onto which structural interpretations and experimental data can be mapped. This enables collation, comparison, and query of complex spatial patterns with respect to each other and with respect to known or hypothesized structure. The atlas also includes a time-dependent anatomical ontology and mapping between the ontology and the spatial models in the form of delineated anatomical regions or tissues. The models provide a natural, graphical context for browsing and visualizing complex data. The Edinburgh Mouse Atlas Gene-Expression Database (EMAGE) is one of the first applications of the EMAP framework and provides a spatially mapped gene-expression database with associated tools for data mapping, submission, and query. In this article, we describe the underlying principles of the Atlas and the gene-expression database, and provide a practical introduction to the use of the EMAP and EMAGE tools, including use of new techniques for whole body gene-expression data capture and mapping.

  14. Relational Algebra in Spatial Decision Support Systems Ontologies.

    PubMed

    Diomidous, Marianna; Chardalias, Kostis; Koutonias, Panagiotis; Magnita, Adrianna; Andrianopoulos, Charalampos; Zimeras, Stelios; Mechili, Enkeleint Aggelos

    2017-01-01

    Decision Support Systems (DSS) is a powerful tool, for facilitates researchers to choose the correct decision based on their final results. Especially in medical cases where doctors could use these systems, to overcome the problem with the clinical misunderstanding. Based on these systems, queries must be constructed based on the particular questions that doctors must answer. In this work, combination between questions and queries would be presented via relational algebra.

  15. The research and implementation of coalfield spontaneous combustion of carbon emission WebGIS based on Silverlight and ArcGIS server

    NASA Astrophysics Data System (ADS)

    Zhu, Z.; Bi, J.; Wang, X.; Zhu, W.

    2014-02-01

    As an important sub-topic of the natural process of carbon emission data public information platform construction, coalfield spontaneous combustion of carbon emission WebGIS system has become an important study object. In connection with data features of coalfield spontaneous combustion carbon emissions (i.e. a wide range of data, which is rich and complex) and the geospatial characteristics, data is divided into attribute data and spatial data. Based on full analysis of the data, completed the detailed design of the Oracle database and stored on the Oracle database. Through Silverlight rich client technology and the expansion of WCF services, achieved the attribute data of web dynamic query, retrieval, statistical, analysis and other functions. For spatial data, we take advantage of ArcGIS Server and Silverlight-based API to invoke GIS server background published map services, GP services, Image services and other services, implemented coalfield spontaneous combustion of remote sensing image data and web map data display, data analysis, thematic map production. The study found that the Silverlight technology, based on rich client and object-oriented framework for WCF service, can efficiently constructed a WebGIS system. And then, combined with ArcGIS Silverlight API to achieve interactive query attribute data and spatial data of coalfield spontaneous emmission, can greatly improve the performance of WebGIS system. At the same time, it provided a strong guarantee for the construction of public information on China's carbon emission data.

  16. A high performance, ad-hoc, fuzzy query processing system for relational databases

    NASA Technical Reports Server (NTRS)

    Mansfield, William H., Jr.; Fleischman, Robert M.

    1992-01-01

    Database queries involving imprecise or fuzzy predicates are currently an evolving area of academic and industrial research. Such queries place severe stress on the indexing and I/O subsystems of conventional database environments since they involve the search of large numbers of records. The Datacycle architecture and research prototype is a database environment that uses filtering technology to perform an efficient, exhaustive search of an entire database. It has recently been modified to include fuzzy predicates in its query processing. The approach obviates the need for complex index structures, provides unlimited query throughput, permits the use of ad-hoc fuzzy membership functions, and provides a deterministic response time largely independent of query complexity and load. This paper describes the Datacycle prototype implementation of fuzzy queries and some recent performance results.

  17. Hybrid ontology for semantic information retrieval model using keyword matching indexing system.

    PubMed

    Uthayan, K R; Mala, G S Anandha

    2015-01-01

    Ontology is the process of growth and elucidation of concepts of an information domain being common for a group of users. Establishing ontology into information retrieval is a normal method to develop searching effects of relevant information users require. Keywords matching process with historical or information domain is significant in recent calculations for assisting the best match for specific input queries. This research presents a better querying mechanism for information retrieval which integrates the ontology queries with keyword search. The ontology-based query is changed into a primary order to predicate logic uncertainty which is used for routing the query to the appropriate servers. Matching algorithms characterize warm area of researches in computer science and artificial intelligence. In text matching, it is more dependable to study semantics model and query for conditions of semantic matching. This research develops the semantic matching results between input queries and information in ontology field. The contributed algorithm is a hybrid method that is based on matching extracted instances from the queries and information field. The queries and information domain is focused on semantic matching, to discover the best match and to progress the executive process. In conclusion, the hybrid ontology in semantic web is sufficient to retrieve the documents when compared to standard ontology.

  18. Hybrid Ontology for Semantic Information Retrieval Model Using Keyword Matching Indexing System

    PubMed Central

    Uthayan, K. R.; Anandha Mala, G. S.

    2015-01-01

    Ontology is the process of growth and elucidation of concepts of an information domain being common for a group of users. Establishing ontology into information retrieval is a normal method to develop searching effects of relevant information users require. Keywords matching process with historical or information domain is significant in recent calculations for assisting the best match for specific input queries. This research presents a better querying mechanism for information retrieval which integrates the ontology queries with keyword search. The ontology-based query is changed into a primary order to predicate logic uncertainty which is used for routing the query to the appropriate servers. Matching algorithms characterize warm area of researches in computer science and artificial intelligence. In text matching, it is more dependable to study semantics model and query for conditions of semantic matching. This research develops the semantic matching results between input queries and information in ontology field. The contributed algorithm is a hybrid method that is based on matching extracted instances from the queries and information field. The queries and information domain is focused on semantic matching, to discover the best match and to progress the executive process. In conclusion, the hybrid ontology in semantic web is sufficient to retrieve the documents when compared to standard ontology. PMID:25922851

  19. Query Language for Location-Based Services: A Model Checking Approach

    NASA Astrophysics Data System (ADS)

    Hoareau, Christian; Satoh, Ichiro

    We present a model checking approach to the rationale, implementation, and applications of a query language for location-based services. Such query mechanisms are necessary so that users, objects, and/or services can effectively benefit from the location-awareness of their surrounding environment. The underlying data model is founded on a symbolic model of space organized in a tree structure. Once extended to a semantic model for modal logic, we regard location query processing as a model checking problem, and thus define location queries as hybrid logicbased formulas. Our approach is unique to existing research because it explores the connection between location models and query processing in ubiquitous computing systems, relies on a sound theoretical basis, and provides modal logic-based query mechanisms for expressive searches over a decentralized data structure. A prototype implementation is also presented and will be discussed.

  20. Modeling Spatial Relationships within a Fuzzy Framework.

    ERIC Educational Resources Information Center

    Petry, Frederick E.; Cobb, Maria A.

    1998-01-01

    Presents a model for representing and storing binary topological and directional relationships between 2-dimensional objects that is used to provide a basis for fuzzy querying capabilities. A data structure called an abstract spatial graph (ASG) is defined for the binary relationships that maintains all necessary information regarding topology and…

  1. An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring.

    PubMed

    Alirezaie, Marjan; Kiselev, Andrey; Längkvist, Martin; Klügl, Franziska; Loutfi, Amy

    2017-11-05

    This paper presents a framework in which satellite images are classified and augmented with additional semantic information to enable queries about what can be found on the map at a particular location, but also about paths that can be taken. This is achieved by a reasoning framework based on qualitative spatial reasoning that is able to find answers to high level queries that may vary on the current situation. This framework called SemCityMap, provides the full pipeline from enriching the raw image data with rudimentary labels to the integration of a knowledge representation and reasoning methods to user interfaces for high level querying. To illustrate the utility of SemCityMap in a disaster scenario, we use an urban environment-central Stockholm-in combination with a flood simulation. We show that the system provides useful answers to high-level queries also with respect to the current flood status. Examples of such queries concern path planning for vehicles or retrieval of safe regions such as "find all regions close to schools and far from the flooded area". The particular advantage of our approach lies in the fact that ontological information and reasoning is explicitly integrated so that queries can be formulated in a natural way using concepts on appropriate level of abstraction, including additional constraints.

  2. An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring

    PubMed Central

    Alirezaie, Marjan; Klügl, Franziska; Loutfi, Amy

    2017-01-01

    This paper presents a framework in which satellite images are classified and augmented with additional semantic information to enable queries about what can be found on the map at a particular location, but also about paths that can be taken. This is achieved by a reasoning framework based on qualitative spatial reasoning that is able to find answers to high level queries that may vary on the current situation. This framework called SemCityMap, provides the full pipeline from enriching the raw image data with rudimentary labels to the integration of a knowledge representation and reasoning methods to user interfaces for high level querying. To illustrate the utility of SemCityMap in a disaster scenario, we use an urban environment—central Stockholm—in combination with a flood simulation. We show that the system provides useful answers to high-level queries also with respect to the current flood status. Examples of such queries concern path planning for vehicles or retrieval of safe regions such as “find all regions close to schools and far from the flooded area”. The particular advantage of our approach lies in the fact that ontological information and reasoning is explicitly integrated so that queries can be formulated in a natural way using concepts on appropriate level of abstraction, including additional constraints. PMID:29113073

  3. Secure Skyline Queries on Cloud Platform.

    PubMed

    Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

    2017-04-01

    Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.

  4. Cognitive search model and a new query paradigm

    NASA Astrophysics Data System (ADS)

    Xu, Zhonghui

    2001-06-01

    This paper proposes a cognitive model in which people begin to search pictures by using semantic content and find a right picture by judging whether its visual content is a proper visualization of the semantics desired. It is essential that human search is not just a process of matching computation on visual feature but rather a process of visualization of the semantic content known. For people to search electronic images in the way as they manually do in the model, we suggest that querying be a semantic-driven process like design. A query-by-design paradigm is prosed in the sense that what you design is what you find. Unlike query-by-example, query-by-design allows users to specify the semantic content through an iterative and incremental interaction process so that a retrieval can start with association and identification of the given semantic content and get refined while further visual cues are available. An experimental image retrieval system, Kuafu, has been under development using the query-by-design paradigm and an iconic language is adopted.

  5. Query-Based Outlier Detection in Heterogeneous Information Networks.

    PubMed

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-03-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user's search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks.

  6. Query-Based Outlier Detection in Heterogeneous Information Networks

    PubMed Central

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-01-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user’s search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks. PMID:27064397

  7. VPipe: Virtual Pipelining for Scheduling of DAG Stream Query Plans

    NASA Astrophysics Data System (ADS)

    Wang, Song; Gupta, Chetan; Mehta, Abhay

    There are data streams all around us that can be harnessed for tremendous business and personal advantage. For an enterprise-level stream processing system such as CHAOS [1] (Continuous, Heterogeneous Analytic Over Streams), handling of complex query plans with resource constraints is challenging. While several scheduling strategies exist for stream processing, efficient scheduling of complex DAG query plans is still largely unsolved. In this paper, we propose a novel execution scheme for scheduling complex directed acyclic graph (DAG) query plans with meta-data enriched stream tuples. Our solution, called Virtual Pipelined Chain (or VPipe Chain for short), effectively extends the "Chain" pipelining scheduling approach to complex DAG query plans.

  8. Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records.

    PubMed

    Luo, Yuan; Szolovits, Peter

    2016-01-01

    In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing on narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem into the interval query problem, for which optimal query/update time is in general logarithm. We next perform a tight time complexity analysis on the basic interval tree query algorithm and show its nonoptimality when being applied to a collection of 13 query types from Allen's interval algebra. We then study two closely related state-of-the-art interval query algorithms, proposed query reformulations, and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic time stabbing-max query time complexity and solves the stabbing-interval query tasks on all of Allen's relations in logarithmic time, attaining the theoretic lower bound. Updating time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions.

  9. Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records

    PubMed Central

    Luo, Yuan; Szolovits, Peter

    2016-01-01

    In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing on narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem into the interval query problem, for which optimal query/update time is in general logarithm. We next perform a tight time complexity analysis on the basic interval tree query algorithm and show its nonoptimality when being applied to a collection of 13 query types from Allen’s interval algebra. We then study two closely related state-of-the-art interval query algorithms, proposed query reformulations, and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic time stabbing-max query time complexity and solves the stabbing-interval query tasks on all of Allen’s relations in logarithmic time, attaining the theoretic lower bound. Updating time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions. PMID:27478379

  10. On-demand information retrieval in sensor networks with localised query and energy-balanced data collection.

    PubMed

    Teng, Rui; Zhang, Bing

    2011-01-01

    On-demand information retrieval enables users to query and collect up-to-date sensing information from sensor nodes. Since high energy efficiency is required in a sensor network, it is desirable to disseminate query messages with small traffic overhead and to collect sensing data with low energy consumption. However, on-demand query messages are generally forwarded to sensor nodes in network-wide broadcasts, which create large traffic overhead. In addition, since on-demand information retrieval may introduce intermittent and spatial data collections, the construction and maintenance of conventional aggregation structures such as clusters and chains will be at high cost. In this paper, we propose an on-demand information retrieval approach that exploits the name resolution of data queries according to the attribute and location of each sensor node. The proposed approach localises each query dissemination and enable localised data collection with maximised aggregation. To illustrate the effectiveness of the proposed approach, an analytical model that describes the criteria of sink proxy selection is provided. The evaluation results reveal that the proposed scheme significantly reduces energy consumption and improves the balance of energy consumption among sensor nodes by alleviating heavy traffic near the sink.

  11. Advanced Query Formulation in Deductive Databases.

    ERIC Educational Resources Information Center

    Niemi, Timo; Jarvelin, Kalervo

    1992-01-01

    Discusses deductive databases and database management systems (DBMS) and introduces a framework for advanced query formulation for end users. Recursive processing is described, a sample extensional database is presented, query types are explained, and criteria for advanced query formulation from the end user's viewpoint are examined. (31…

  12. Parallel Index and Query for Large Scale Data Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chou, Jerry; Wu, Kesheng; Ruebel, Oliver

    2011-07-18

    Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for process- ing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that address these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process mas- sive datasets on modern supercomputing platforms. We apply FastQuery to processing ofmore » a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for inter- esting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.« less

  13. Gazetteer Brokering through Semantic Mediation

    NASA Astrophysics Data System (ADS)

    Hobona, G.; Bermudez, L. E.; Brackin, R.

    2013-12-01

    A gazetteer is a geographical directory containing some information regarding places. It provides names, location and other attributes for places which may include points of interest (e.g. buildings, oilfields and boreholes), and other features. These features can be published via web services conforming to the Gazetteer Application Profile of the Web Feature Service (WFS) standard of the Open Geospatial Consortium (OGC). Against the backdrop of advances in geophysical surveys, there has been a significant increase in the amount of data referenced to locations. Gazetteers services have played a significant role in facilitating access to such data, including through provision of specialized queries such as text, spatial and fuzzy search. Recent developments in the OGC have led to advances in gazetteers such as support for multilingualism, diacritics, and querying via advanced spatial constraints (e.g. search by radial search and nearest neighbor). A challenge remaining however, is that gazetteers produced by different organizations have typically been modeled differently. Inconsistencies from gazetteers produced by different organizations may include naming the same feature in a different way, naming the attributes differently, locating the feature in a different location, and providing fewer or more attributes than the other services. The Gazetteer application profile of the WFS is a starting point to address such inconsistencies by providing a standardized interface based on rules specified in ISO 19112, the international standard for spatial referencing by geographic identifiers. The profile, however, does not provide rules to deal with semantic inconsistencies. The USGS and NGA commissioned research into the potential for a Single Point of Entry Global Gazetteer (SPEGG). The research was conducted by the Cross Community Interoperability thread of the OGC testbed, referenced OWS-9. The testbed prototyped approaches for brokering gazetteers through use of semantic web technologies, including ontologies and a semantic mediator. The semantically-enhanced SPEGG allowed a client to submit a single query (e.g. ';hills') and to retrieve data from two separate gazetteers with different vocabularies (e.g. where one refers to ';summits' another refers to ';hills'). Supporting the SPEGG was a SPARQL server that held the ontologies and processed queries on them. Earth Science surveys and forecast always have a place on Earth. Being able to share the information about a place and solve inconsistencies about that place from different sources will enable geoscientists to better do their research. In the advent of mobile geo computing and location based services (LBS), brokering gazetteers will provide geoscientists with access to gazetteer services rich with information and functionality beyond that offered by current generic gazetteers.

  14. Study of Automatic Image Rectification and Registration of Scanned Historical Aerial Photographs

    NASA Astrophysics Data System (ADS)

    Chen, H. R.; Tseng, Y. H.

    2016-06-01

    Historical aerial photographs directly provide good evidences of past times. The Research Center for Humanities and Social Sciences (RCHSS) of Taiwan Academia Sinica has collected and scanned numerous historical maps and aerial images of Taiwan and China. Some maps or images have been geo-referenced manually, but most of historical aerial images have not been registered since there are no GPS or IMU data for orientation assisting in the past. In our research, we developed an automatic process of matching historical aerial images by SIFT (Scale Invariant Feature Transform) for handling the great quantity of images by computer vision. SIFT is one of the most popular method of image feature extracting and matching. This algorithm extracts extreme values in scale space into invariant image features, which are robust to changing in rotation scale, noise, and illumination. We also use RANSAC (Random sample consensus) to remove outliers, and obtain good conjugated points between photographs. Finally, we manually add control points for registration through least square adjustment based on collinear equation. In the future, we can use image feature points of more photographs to build control image database. Every new image will be treated as query image. If feature points of query image match the features in database, it means that the query image probably is overlapped with control images.With the updating of database, more and more query image can be matched and aligned automatically. Other research about multi-time period environmental changes can be investigated with those geo-referenced temporal spatial data.

  15. Secure Skyline Queries on Cloud Platform

    PubMed Central

    Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

    2017-01-01

    Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions. PMID:28883710

  16. Usage and administration manual for a geodatabase compendium of water-resources data-Rio Grande Basin from the Rio Arriba-Sandoval County line, New Mexico, to Presidio, Texas, 1889-2009

    USGS Publications Warehouse

    Burley, Thomas E.

    2011-01-01

    The U.S. Geological Survey, in cooperation with the New Mexico Interstate Stream Commission, developed a geodatabase compendium (hereinafter referred to as the 'geodatabase') of available water-resources data for the reach of the Rio Grande from Rio Arriba-Sandoval County line, New Mexico, to Presidio, Texas. Since 1889, a wealth of water-resources data has been collected in the Rio Grande Basin from Rio Arriba-Sandoval County line, New Mexico, to Presidio, Texas, for a variety of purposes. Collecting agencies, researchers, and organizations have included the U.S. Geological Survey, Bureau of Reclamation, International Boundary and Water Commission, State agencies, irrigation districts, municipal water utilities, universities, and other entities. About 1,750 data records were recently (2010) evaluated to enhance their usability by compiling them into a single geospatial relational database (geodatabase). This report is intended as a user's manual and administration guide for the geodatabase. All data available, including water quality, water level, and discharge data (both instantaneous and daily) from January 1, 1889, through December 17, 2009, were compiled for the study area. A flexible and efficient geodatabase design was used, enhancing the ability of the geodatabase to handle data from diverse sources and helping to ensure sustainability of the geodatabase with long-term maintenance. Geodatabase tables include daily data values, site locations and information, sample event information, and parameters, as well as data sources and collecting agencies. The end products of this effort are a comprehensive water-resources geodatabase that enables the visualization of primary sampling sites for surface discharges, groundwater elevations, and water-quality and associated data for the study area. In addition, repeatable data processing scripts, Structured Query Language queries for loading prepared data sources, and a detailed process for refreshing all data in the compendium have been developed. The geodatabase functionality allows users to explore spatial characteristics of the data, conduct spatial analyses, and pose questions to the geodatabase in the form of queries. Users can also customize and extend the geodatabase, combine it with other databases, or use the geodatabase design for other water-resources applications.

  17. Providing R-Tree Support for Mongodb

    NASA Astrophysics Data System (ADS)

    Xiang, Longgang; Shao, Xiaotian; Wang, Dehao

    2016-06-01

    Supporting large amounts of spatial data is a significant characteristic of modern databases. However, unlike some mature relational databases, such as Oracle and PostgreSQL, most of current burgeoning NoSQL databases are not well designed for storing geospatial data, which is becoming increasingly important in various fields. In this paper, we propose a novel method to provide R-tree index, as well as corresponding spatial range query and nearest neighbour query functions, for MongoDB, one of the most prevalent NoSQL databases. First, after in-depth analysis of MongoDB's features, we devise an efficient tabular document structure which flattens R-tree index into MongoDB collections. Further, relevant mechanisms of R-tree operations are issued, and then we discuss in detail how to integrate R-tree into MongoDB. Finally, we present the experimental results which show that our proposed method out-performs the built-in spatial index of MongoDB. Our research will greatly facilitate big data management issues with MongoDB in a variety of geospatial information applications.

  18. Searching and Filtering Tweets: CSIRO at the TREC 2012 Microblog Track

    DTIC Science & Technology

    2012-11-01

    stages. We first evaluate the effect of tweet corpus pre- processing in vanilla runs (no query expansion), and then assess the effect of query expansion...Effect of a vanilla run on D4 index (both realtime and non-real-time), and query expansion methods based on the submitted runs for two sets of queries

  19. A Columnar Storage Strategy with Spatiotemporal Index for Big Climate Data

    NASA Astrophysics Data System (ADS)

    Hu, F.; Bowen, M. K.; Li, Z.; Schnase, J. L.; Duffy, D.; Lee, T. J.; Yang, C. P.

    2015-12-01

    Large collections of observational, reanalysis, and climate model output data may grow to as large as a 100 PB in the coming years, so climate dataset is in the Big Data domain, and various distributed computing frameworks have been utilized to address the challenges by big climate data analysis. However, due to the binary data format (NetCDF, HDF) with high spatial and temporal dimensions, the computing frameworks in Apache Hadoop ecosystem are not originally suited for big climate data. In order to make the computing frameworks in Hadoop ecosystem directly support big climate data, we propose a columnar storage format with spatiotemporal index to store climate data, which will support any project in the Apache Hadoop ecosystem (e.g. MapReduce, Spark, Hive, Impala). With this approach, the climate data will be transferred into binary Parquet data format, a columnar storage format, and spatial and temporal index will be built and attached into the end of Parquet files to enable real-time data query. Then such climate data in Parquet data format could be available to any computing frameworks in Hadoop ecosystem. The proposed approach is evaluated using the NASA Modern-Era Retrospective Analysis for Research and Applications (MERRA) climate reanalysis dataset. Experimental results show that this approach could efficiently overcome the gap between the big climate data and the distributed computing frameworks, and the spatiotemporal index could significantly accelerate data querying and processing.

  20. Ontology-based geospatial data query and integration

    USGS Publications Warehouse

    Zhao, T.; Zhang, C.; Wei, M.; Peng, Z.-R.

    2008-01-01

    Geospatial data sharing is an increasingly important subject as large amount of data is produced by a variety of sources, stored in incompatible formats, and accessible through different GIS applications. Past efforts to enable sharing have produced standardized data format such as GML and data access protocols such as Web Feature Service (WFS). While these standards help enabling client applications to gain access to heterogeneous data stored in different formats from diverse sources, the usability of the access is limited due to the lack of data semantics encoded in the WFS feature types. Past research has used ontology languages to describe the semantics of geospatial data but ontology-based queries cannot be applied directly to legacy data stored in databases or shapefiles, or to feature data in WFS services. This paper presents a method to enable ontology query on spatial data available from WFS services and on data stored in databases. We do not create ontology instances explicitly and thus avoid the problems of data replication. Instead, user queries are rewritten to WFS getFeature requests and SQL queries to database. The method also has the benefits of being able to utilize existing tools of databases, WFS, and GML while enabling query based on ontology semantics. ?? 2008 Springer-Verlag Berlin Heidelberg.

  1. A spatio-temporal index for aerial full waveform laser scanning data

    NASA Astrophysics Data System (ADS)

    Laefer, Debra F.; Vo, Anh-Vu; Bertolotto, Michela

    2018-04-01

    Aerial laser scanning is increasingly available in the full waveform version of the raw signal, which can provide greater insight into and control over the data and, thus, richer information about the scanned scenes. However, when compared to conventional discrete point storage, preserving raw waveforms leads to vastly larger and more complex data volumes. To begin addressing these challenges, this paper introduces a novel bi-level approach for storing and indexing full waveform (FWF) laser scanning data in a relational database environment, while considering both the spatial and the temporal dimensions of that data. In the storage scheme's upper level, the full waveform datasets are partitioned into spatial and temporal coherent groups that are indexed by a two-dimensional R∗-tree. To further accelerate intra-block data retrieval, at the lower level a three-dimensional local octree is created for each pulse block. The local octrees are implemented in-memory and can be efficiently written to a database for reuse. The indexing solution enables scalable and efficient three-dimensional (3D) spatial and spatio-temporal queries on the actual pulse data - functionalities not available in other systems. The proposed FWF laser scanning data solution is capable of managing multiple FWF datasets derived from large flight missions. The flight structure is embedded into the data storage model and can be used for querying predicates. Such functionality is important to FWF data exploration since aircraft locations and orientations are frequently required for FWF data analyses. Empirical tests on real datasets of up to 1 billion pulses from Dublin, Ireland prove the almost perfect scalability of the system. The use of the local 3D octree in the indexing structure accelerated pulse clipping by 1.2-3.5 times for non-axis-aligned (NAA) polyhedron shaped clipping windows, while axis-aligned (AA) polyhedron clipping was better served using only the top indexing layer. The distinct behaviours of the hybrid indexing for AA and NAA clipping windows are attributable to the different proportion of the local-index-related overheads with respect to the total querying costs. When temporal constraints were added, generally the number of costly spatial checks were reduced, thereby shortening the querying times.

  2. An approach for heterogeneous and loosely coupled geospatial data distributed computing

    NASA Astrophysics Data System (ADS)

    Chen, Bin; Huang, Fengru; Fang, Yu; Huang, Zhou; Lin, Hui

    2010-07-01

    Most GIS (Geographic Information System) applications tend to have heterogeneous and autonomous geospatial information resources, and the availability of these local resources is unpredictable and dynamic under a distributed computing environment. In order to make use of these local resources together to solve larger geospatial information processing problems that are related to an overall situation, in this paper, with the support of peer-to-peer computing technologies, we propose a geospatial data distributed computing mechanism that involves loosely coupled geospatial resource directories and a term named as Equivalent Distributed Program of global geospatial queries to solve geospatial distributed computing problems under heterogeneous GIS environments. First, a geospatial query process schema for distributed computing as well as a method for equivalent transformation from a global geospatial query to distributed local queries at SQL (Structured Query Language) level to solve the coordinating problem among heterogeneous resources are presented. Second, peer-to-peer technologies are used to maintain a loosely coupled network environment that consists of autonomous geospatial information resources, thus to achieve decentralized and consistent synchronization among global geospatial resource directories, and to carry out distributed transaction management of local queries. Finally, based on the developed prototype system, example applications of simple and complex geospatial data distributed queries are presented to illustrate the procedure of global geospatial information processing.

  3. Constraint-based Data Mining

    NASA Astrophysics Data System (ADS)

    Boulicaut, Jean-Francois; Jeudy, Baptiste

    Knowledge Discovery in Databases (KDD) is a complex interactive process. The promising theoretical framework of inductive databases considers this is essentially a querying process. It is enabled by a query language which can deal either with raw data or patterns which hold in the data. Mining patterns turns to be the so-called inductive query evaluation process for which constraint-based Data Mining techniques have to be designed. An inductive query specifies declaratively the desired constraints and algorithms are used to compute the patterns satisfying the constraints in the data. We survey important results of this active research domain. This chapter emphasizes a real breakthrough for hard problems concerning local pattern mining under various constraints and it points out the current directions of research as well.

  4. Intelligent Context-Aware and Adaptive Interface for Mobile LBS

    PubMed Central

    Liu, Yanhong

    2015-01-01

    Context-aware user interface plays an important role in many human-computer Interaction tasks of location based services. Although spatial models for context-aware systems have been studied extensively, how to locate specific spatial information for users is still not well resolved, which is important in the mobile environment where location based services users are impeded by device limitations. Better context-aware human-computer interaction models of mobile location based services are needed not just to predict performance outcomes, such as whether people will be able to find the information needed to complete a human-computer interaction task, but to understand human processes that interact in spatial query, which will in turn inform the detailed design of better user interfaces in mobile location based services. In this study, a context-aware adaptive model for mobile location based services interface is proposed, which contains three major sections: purpose, adjustment, and adaptation. Based on this model we try to describe the process of user operation and interface adaptation clearly through the dynamic interaction between users and the interface. Then we show how the model applies users' demands in a complicated environment and suggested the feasibility by the experimental results. PMID:26457077

  5. An adaptable architecture for patient cohort identification from diverse data sources.

    PubMed

    Bache, Richard; Miles, Simon; Taweel, Adel

    2013-12-01

    We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. The architecture has the key feature that queries defined according to the query model are both pre and post-processed and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a data-based management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity.

  6. A spatial data handling system for retrieval of images by unrestricted regions of user interest

    NASA Technical Reports Server (NTRS)

    Dorfman, Erik; Cromp, Robert F.

    1992-01-01

    The Intelligent Data Management (IDM) project at NASA/Goddard Space Flight Center has prototyped an Intelligent Information Fusion System (IIFS), which automatically ingests metadata from remote sensor observations into a large catalog which is directly queryable by end-users. The greatest challenge in the implementation of this catalog was supporting spatially-driven searches, where the user has a possible complex region of interest and wishes to recover those images that overlap all or simply a part of that region. A spatial data management system is described, which is capable of storing and retrieving records of image data regardless of their source. This system was designed and implemented as part of the IIFS catalog. A new data structure, called a hypercylinder, is central to the design. The hypercylinder is specifically tailored for data distributed over the surface of a sphere, such as satellite observations of the Earth or space. Operations on the hypercylinder are regulated by two expert systems. The first governs the ingest of new metadata records, and maintains the efficiency of the data structure as it grows. The second translates, plans, and executes users' spatial queries, performing incremental optimization as partial query results are returned.

  7. Ontology for Transforming Geo-Spatial Data for Discovery and Integration of Scientific Data

    NASA Astrophysics Data System (ADS)

    Nguyen, L.; Chee, T.; Minnis, P.

    2013-12-01

    Discovery and access to geo-spatial scientific data across heterogeneous repositories and multi-discipline datasets can present challenges for scientist. We propose to build a workflow for transforming geo-spatial datasets into semantic environment by using relationships to describe the resource using OWL Web Ontology, RDF, and a proposed geo-spatial vocabulary. We will present methods for transforming traditional scientific dataset, use of a semantic repository, and querying using SPARQL to integrate and access datasets. This unique repository will enable discovery of scientific data by geospatial bound or other criteria.

  8. IJA: an efficient algorithm for query processing in sensor networks.

    PubMed

    Lee, Hyun Chang; Lee, Young Jae; Lim, Ji Hyang; Kim, Dong Hwa

    2011-01-01

    One of main features in sensor networks is the function that processes real time state information after gathering needed data from many domains. The component technologies consisting of each node called a sensor node that are including physical sensors, processors, actuators and power have advanced significantly over the last decade. Thanks to the advanced technology, over time sensor networks have been adopted in an all-round industry sensing physical phenomenon. However, sensor nodes in sensor networks are considerably constrained because with their energy and memory resources they have a very limited ability to process any information compared to conventional computer systems. Thus query processing over the nodes should be constrained because of their limitations. Due to the problems, the join operations in sensor networks are typically processed in a distributed manner over a set of nodes and have been studied. By way of example while simple queries, such as select and aggregate queries, in sensor networks have been addressed in the literature, the processing of join queries in sensor networks remains to be investigated. Therefore, in this paper, we propose and describe an Incremental Join Algorithm (IJA) in Sensor Networks to reduce the overhead caused by moving a join pair to the final join node or to minimize the communication cost that is the main consumer of the battery when processing the distributed queries in sensor networks environments. At the same time, the simulation result shows that the proposed IJA algorithm significantly reduces the number of bytes to be moved to join nodes compared to the popular synopsis join algorithm.

  9. IJA: An Efficient Algorithm for Query Processing in Sensor Networks

    PubMed Central

    Lee, Hyun Chang; Lee, Young Jae; Lim, Ji Hyang; Kim, Dong Hwa

    2011-01-01

    One of main features in sensor networks is the function that processes real time state information after gathering needed data from many domains. The component technologies consisting of each node called a sensor node that are including physical sensors, processors, actuators and power have advanced significantly over the last decade. Thanks to the advanced technology, over time sensor networks have been adopted in an all-round industry sensing physical phenomenon. However, sensor nodes in sensor networks are considerably constrained because with their energy and memory resources they have a very limited ability to process any information compared to conventional computer systems. Thus query processing over the nodes should be constrained because of their limitations. Due to the problems, the join operations in sensor networks are typically processed in a distributed manner over a set of nodes and have been studied. By way of example while simple queries, such as select and aggregate queries, in sensor networks have been addressed in the literature, the processing of join queries in sensor networks remains to be investigated. Therefore, in this paper, we propose and describe an Incremental Join Algorithm (IJA) in Sensor Networks to reduce the overhead caused by moving a join pair to the final join node or to minimize the communication cost that is the main consumer of the battery when processing the distributed queries in sensor networks environments. At the same time, the simulation result shows that the proposed IJA algorithm significantly reduces the number of bytes to be moved to join nodes compared to the popular synopsis join algorithm. PMID:22319375

  10. A similarity-based data warehousing environment for medical images.

    PubMed

    Teixeira, Jefferson William; Annibal, Luana Peixoto; Felipe, Joaquim Cezar; Ciferri, Ricardo Rodrigues; Ciferri, Cristina Dutra de Aguiar

    2015-11-01

    A core issue of the decision-making process in the medical field is to support the execution of analytical (OLAP) similarity queries over images in data warehousing environments. In this paper, we focus on this issue. We propose imageDWE, a non-conventional data warehousing environment that enables the storage of intrinsic features taken from medical images in a data warehouse and supports OLAP similarity queries over them. To comply with this goal, we introduce the concept of perceptual layer, which is an abstraction used to represent an image dataset according to a given feature descriptor in order to enable similarity search. Based on this concept, we propose the imageDW, an extended data warehouse with dimension tables specifically designed to support one or more perceptual layers. We also detail how to build an imageDW and how to load image data into it. Furthermore, we show how to process OLAP similarity queries composed of a conventional predicate and a similarity search predicate that encompasses the specification of one or more perceptual layers. Moreover, we introduce an index technique to improve the OLAP query processing over images. We carried out performance tests over a data warehouse environment that consolidated medical images from exams of several modalities. The results demonstrated the feasibility and efficiency of our proposed imageDWE to manage images and to process OLAP similarity queries. The results also demonstrated that the use of the proposed index technique guaranteed a great improvement in query processing. Copyright © 2015 Elsevier Ltd. All rights reserved.

  11. Data Rods: High Speed, Time-Series Analysis of Massive Cryospheric Data Sets Using Object-Oriented Database Methods

    NASA Astrophysics Data System (ADS)

    Liang, Y.; Gallaher, D. W.; Grant, G.; Lv, Q.

    2011-12-01

    Change over time, is the central driver of climate change detection. The goal is to diagnose the underlying causes, and make projections into the future. In an effort to optimize this process we have developed the Data Rod model, an object-oriented approach that provides the ability to query grid cell changes and their relationships to neighboring grid cells through time. The time series data is organized in time-centric structures called "data rods." A single data rod can be pictured as the multi-spectral data history at one grid cell: a vertical column of data through time. This resolves the long-standing problem of managing time-series data and opens new possibilities for temporal data analysis. This structure enables rapid time- centric analysis at any grid cell across multiple sensors and satellite platforms. Collections of data rods can be spatially and temporally filtered, statistically analyzed, and aggregated for use with pattern matching algorithms. Likewise, individual image pixels can be extracted to generate multi-spectral imagery at any spatial and temporal location. The Data Rods project has created a series of prototype databases to store and analyze massive datasets containing multi-modality remote sensing data. Using object-oriented technology, this method overcomes the operational limitations of traditional relational databases. To demonstrate the speed and efficiency of time-centric analysis using the Data Rods model, we have developed a sea ice detection algorithm. This application determines the concentration of sea ice in a small spatial region across a long temporal window. If performed using traditional analytical techniques, this task would typically require extensive data downloads and spatial filtering. Using Data Rods databases, the exact spatio-temporal data set is immediately available No extraneous data is downloaded, and all selected data querying occurs transparently on the server side. Moreover, fundamental statistical calculations such as running averages are easily implemented against the time-centric columns of data.

  12. Targeted exploration and analysis of large cross-platform human transcriptomic compendia

    PubMed Central

    Zhu, Qian; Wong, Aaron K; Krishnan, Arjun; Aure, Miriam R; Tadych, Alicja; Zhang, Ran; Corney, David C; Greene, Casey S; Bongo, Lars A; Kristensen, Vessela N; Charikar, Moses; Li, Kai; Troyanskaya, Olga G.

    2016-01-01

    We present SEEK (http://seek.princeton.edu), a query-based search engine across very large transcriptomic data collections, including thousands of human data sets from almost 50 microarray and next-generation sequencing platforms. SEEK uses a novel query-level cross-validation-based algorithm to automatically prioritize data sets relevant to the query and a robust search approach to identify query-coregulated genes, pathways, and processes. SEEK provides cross-platform handling, multi-gene query search, iterative metadata-based search refinement, and extensive visualization-based analysis options. PMID:25581801

  13. Graphical modeling and query language for hospitals.

    PubMed

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    So far there has been little evidence that implementation of the health information technologies (HIT) is leading to health care cost savings. One of the reasons for this lack of impact by the HIT likely lies in the complexity of the business process ownership in the hospitals. The goal of our research is to develop a business model-based method for hospital use which would allow doctors to retrieve directly the ad-hoc information from various hospital databases. We have developed a special domain-specific process modelling language called the MedMod. Formally, we define the MedMod language as a profile on UML Class diagrams, but we also demonstrate it on examples, where we explain the semantics of all its elements informally. Moreover, we have developed the Process Query Language (PQL) that is based on MedMod process definition language. The purpose of PQL is to allow a doctor querying (filtering) runtime data of hospital's processes described using MedMod. The MedMod language tries to overcome deficiencies in existing process modeling languages, allowing to specify the loosely-defined sequence of the steps to be performed in the clinical process. The main advantages of PQL are in two main areas - usability and efficiency. They are: 1) the view on data through "glasses" of familiar process, 2) the simple and easy-to-perceive means of setting filtering conditions require no more expertise than using spreadsheet applications, 3) the dynamic response to each step in construction of the complete query that shortens the learning curve greatly and reduces the error rate, and 4) the selected means of filtering and data retrieving allows to execute queries in O(n) time regarding the size of the dataset. We are about to continue developing this project with three further steps. First, we are planning to develop user-friendly graphical editors for the MedMod process modeling and query languages. The second step is to do evaluation of usability the proposed language and tool involving the physicians from several hospitals in Latvia and working with real data from these hospitals. Our third step is to develop an efficient implementation of the query language.

  14. An adaptable architecture for patient cohort identification from diverse data sources

    PubMed Central

    Bache, Richard; Miles, Simon; Taweel, Adel

    2013-01-01

    Objective We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. Method The architecture has the key feature that queries defined according to the query model are both pre and post-processed and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. Results We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Discussion Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a data-based management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. Conclusions The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity. PMID:24064442

  15. Matching health information seekers' queries to medical terms

    PubMed Central

    2012-01-01

    Background The Internet is a major source of health information but most seekers are not familiar with medical vocabularies. Hence, their searches fail due to bad query formulation. Several methods have been proposed to improve information retrieval: query expansion, syntactic and semantic techniques or knowledge-based methods. However, it would be useful to clean those queries which are misspelled. In this paper, we propose a simple yet efficient method in order to correct misspellings of queries submitted by health information seekers to a medical online search tool. Methods In addition to query normalizations and exact phonetic term matching, we tested two approximate string comparators: the similarity score function of Stoilos and the normalized Levenshtein edit distance. We propose here to combine them to increase the number of matched medical terms in French. We first took a sample of query logs to determine the thresholds and processing times. In the second run, at a greater scale we tested different combinations of query normalizations before or after misspelling correction with the retained thresholds in the first run. Results According to the total number of suggestions (around 163, the number of the first sample of queries), at a threshold comparator score of 0.3, the normalized Levenshtein edit distance gave the highest F-Measure (88.15%) and at a threshold comparator score of 0.7, the Stoilos function gave the highest F-Measure (84.31%). By combining Levenshtein and Stoilos, the highest F-Measure (80.28%) is obtained with 0.2 and 0.7 thresholds respectively. However, queries are composed by several terms that may be combination of medical terms. The process of query normalization and segmentation is thus required. The highest F-Measure (64.18%) is obtained when this process is realized before spelling-correction. Conclusions Despite the widely known high performance of the normalized edit distance of Levenshtein, we show in this paper that its combination with the Stoilos algorithm improved the results for misspelling correction of user queries. Accuracy is improved by combining spelling, phoneme-based information and string normalizations and segmentations into medical terms. These encouraging results have enabled the integration of this method into two projects funded by the French National Research Agency-Technologies for Health Care. The first aims to facilitate the coding process of clinical free texts contained in Electronic Health Records and discharge summaries, whereas the second aims at improving information retrieval through Electronic Health Records. PMID:23095521

  16. Efficient processing of multiple nested event pattern queries over multi-dimensional event streams based on a triaxial hierarchical model.

    PubMed

    Xiao, Fuyuan; Aritsugi, Masayoshi; Wang, Qing; Zhang, Rong

    2016-09-01

    For efficient and sophisticated analysis of complex event patterns that appear in streams of big data from health care information systems and support for decision-making, a triaxial hierarchical model is proposed in this paper. Our triaxial hierarchical model is developed by focusing on hierarchies among nested event pattern queries with an event concept hierarchy, thereby allowing us to identify the relationships among the expressions and sub-expressions of the queries extensively. We devise a cost-based heuristic by means of the triaxial hierarchical model to find an optimised query execution plan in terms of the costs of both the operators and the communications between them. According to the triaxial hierarchical model, we can also calculate how to reuse the results of the common sub-expressions in multiple queries. By integrating the optimised query execution plan with the reuse schemes, a multi-query optimisation strategy is developed to accomplish efficient processing of multiple nested event pattern queries. We present empirical studies in which the performance of multi-query optimisation strategy was examined under various stream input rates and workloads. Specifically, the workloads of pattern queries can be used for supporting monitoring patients' conditions. On the other hand, experiments with varying input rates of streams can correspond to changes of the numbers of patients that a system should manage, whereas burst input rates can correspond to changes of rushes of patients to be taken care of. The experimental results have shown that, in Workload 1, our proposal can improve about 4 and 2 times throughput comparing with the relative works, respectively; in Workload 2, our proposal can improve about 3 and 2 times throughput comparing with the relative works, respectively; in Workload 3, our proposal can improve about 6 times throughput comparing with the relative work. The experimental results demonstrated that our proposal was able to process complex queries efficiently which can support health information systems and further decision-making. Copyright © 2016 Elsevier B.V. All rights reserved.

  17. Geospatial Data Management Platform for Urban Groundwater

    NASA Astrophysics Data System (ADS)

    Gaitanaru, D.; Priceputu, A.; Gogu, C. R.

    2012-04-01

    Due to the large amount of civil work projects and research studies, large quantities of geo-data are produced for the urban environments. These data are usually redundant as well as they are spread in different institutions or private companies. Time consuming operations like data processing and information harmonisation represents the main reason to systematically avoid the re-use of data. The urban groundwater data shows the same complex situation. The underground structures (subway lines, deep foundations, underground parkings, and others), the urban facility networks (sewer systems, water supply networks, heating conduits, etc), the drainage systems, the surface water works and many others modify continuously. As consequence, their influence on groundwater changes systematically. However, these activities provide a large quantity of data, aquifers modelling and then behaviour prediction can be done using monitored quantitative and qualitative parameters. Due to the rapid evolution of technology in the past few years, transferring large amounts of information through internet has now become a feasible solution for sharing geoscience data. Furthermore, standard platform-independent means to do this have been developed (specific mark-up languages like: GML, GeoSciML, WaterML, GWML, CityML). They allow easily large geospatial databases updating and sharing through internet, even between different companies or between research centres that do not necessarily use the same database structures. For Bucharest City (Romania) an integrated platform for groundwater geospatial data management is developed under the framework of a national research project - "Sedimentary media modeling platform for groundwater management in urban areas" (SIMPA) financed by the National Authority for Scientific Research of Romania. The platform architecture is based on three components: a geospatial database, a desktop application (a complex set of hydrogeological and geological analysis tools) and a front-end geoportal service. The SIMPA platform makes use of mark-up transfer standards to provide a user-friendly application that can be accessed through internet to query, analyse, and visualise geospatial data related to urban groundwater. The platform holds the information within the local groundwater geospatial databases and the user is able to access this data through a geoportal service. The database architecture allows storing accurate and very detailed geological, hydrogeological, and infrastructure information that can be straightforwardly generalized and further upscaled. The geoportal service offers the possibility of querying a dataset from the spatial database. The query is coded in a standard mark-up language, and sent to the server through a standard Hyper Text Transfer Protocol (http) to be processed by the local application. After the validation of the query, the results are sent back to the user to be displayed by the geoportal application. The main advantage of the SIMPA platform is that it offers to the user the possibility to make a primary multi-criteria query, which results in a smaller set of data to be analysed afterwards. This improves both the transfer process parameters and the user's means of creating the desired query.

  18. Model for Semantically Rich Point Cloud Data

    NASA Astrophysics Data System (ADS)

    Poux, F.; Neuville, R.; Hallot, P.; Billen, R.

    2017-10-01

    This paper proposes an interoperable model for managing high dimensional point clouds while integrating semantics. Point clouds from sensors are a direct source of information physically describing a 3D state of the recorded environment. As such, they are an exhaustive representation of the real world at every scale: 3D reality-based spatial data. Their generation is increasingly fast but processing routines and data models lack of knowledge to reason from information extraction rather than interpretation. The enhanced smart point cloud developed model allows to bring intelligence to point clouds via 3 connected meta-models while linking available knowledge and classification procedures that permits semantic injection. Interoperability drives the model adaptation to potentially many applications through specialized domain ontologies. A first prototype is implemented in Python and PostgreSQL database and allows to combine semantic and spatial concepts for basic hybrid queries on different point clouds.

  19. Producing approximate answers to database queries

    NASA Technical Reports Server (NTRS)

    Vrbsky, Susan V.; Liu, Jane W. S.

    1993-01-01

    We have designed and implemented a query processor, called APPROXIMATE, that makes approximate answers available if part of the database is unavailable or if there is not enough time to produce an exact answer. The accuracy of the approximate answers produced improves monotonically with the amount of data retrieved to produce the result. The exact answer is produced if all of the needed data are available and query processing is allowed to continue until completion. The monotone query processing algorithm of APPROXIMATE works within the standard relational algebra framework and can be implemented on a relational database system with little change to the relational architecture. We describe here the approximation semantics of APPROXIMATE that serves as the basis for meaningful approximations of both set-valued and single-valued queries. We show how APPROXIMATE is implemented to make effective use of semantic information, provided by an object-oriented view of the database, and describe the additional overhead required by APPROXIMATE.

  20. Modelling the spatial distribution of Fasciola hepatica in bovines using decision tree, logistic regression and GIS query approaches for Brazil.

    PubMed

    Bennema, S C; Molento, M B; Scholte, R G; Carvalho, O S; Pritsch, I

    2017-11-01

    Fascioliasis is a condition caused by the trematode Fasciola hepatica. In this paper, the spatial distribution of F. hepatica in bovines in Brazil was modelled using a decision tree approach and a logistic regression, combined with a geographic information system (GIS) query. In the decision tree and the logistic model, isothermality had the strongest influence on disease prevalence. Also, the 50-year average precipitation in the warmest quarter of the year was included as a risk factor, having a negative influence on the parasite prevalence. The risk maps developed using both techniques, showed a predicted higher prevalence mainly in the South of Brazil. The prediction performance seemed to be high, but both techniques failed to reach a high accuracy in predicting the medium and high prevalence classes to the entire country. The GIS query map, based on the range of isothermality, minimum temperature of coldest month, precipitation of warmest quarter of the year, altitude and the average dailyland surface temperature, showed a possibility of presence of F. hepatica in a very large area. The risk maps produced using these methods can be used to focus activities of animal and public health programmes, even on non-evaluated F. hepatica areas.

  1. Comparing fixed sampling with minimizer sampling when using k-mer indexes to find maximal exact matches.

    PubMed

    Almutairy, Meznah; Torng, Eric

    2018-01-01

    Bioinformatics applications and pipelines increasingly use k-mer indexes to search for similar sequences. The major problem with k-mer indexes is that they require lots of memory. Sampling is often used to reduce index size and query time. Most applications use one of two major types of sampling: fixed sampling and minimizer sampling. It is well known that fixed sampling will produce a smaller index, typically by roughly a factor of two, whereas it is generally assumed that minimizer sampling will produce faster query times since query k-mers can also be sampled. However, no direct comparison of fixed and minimizer sampling has been performed to verify these assumptions. We systematically compare fixed and minimizer sampling using the human genome as our database. We use the resulting k-mer indexes for fixed sampling and minimizer sampling to find all maximal exact matches between our database, the human genome, and three separate query sets, the mouse genome, the chimp genome, and an NGS data set. We reach the following conclusions. First, using larger k-mers reduces query time for both fixed sampling and minimizer sampling at a cost of requiring more space. If we use the same k-mer size for both methods, fixed sampling requires typically half as much space whereas minimizer sampling processes queries only slightly faster. If we are allowed to use any k-mer size for each method, then we can choose a k-mer size such that fixed sampling both uses less space and processes queries faster than minimizer sampling. The reason is that although minimizer sampling is able to sample query k-mers, the number of shared k-mer occurrences that must be processed is much larger for minimizer sampling than fixed sampling. In conclusion, we argue that for any application where each shared k-mer occurrence must be processed, fixed sampling is the right sampling method.

  2. Comparing fixed sampling with minimizer sampling when using k-mer indexes to find maximal exact matches

    PubMed Central

    Torng, Eric

    2018-01-01

    Bioinformatics applications and pipelines increasingly use k-mer indexes to search for similar sequences. The major problem with k-mer indexes is that they require lots of memory. Sampling is often used to reduce index size and query time. Most applications use one of two major types of sampling: fixed sampling and minimizer sampling. It is well known that fixed sampling will produce a smaller index, typically by roughly a factor of two, whereas it is generally assumed that minimizer sampling will produce faster query times since query k-mers can also be sampled. However, no direct comparison of fixed and minimizer sampling has been performed to verify these assumptions. We systematically compare fixed and minimizer sampling using the human genome as our database. We use the resulting k-mer indexes for fixed sampling and minimizer sampling to find all maximal exact matches between our database, the human genome, and three separate query sets, the mouse genome, the chimp genome, and an NGS data set. We reach the following conclusions. First, using larger k-mers reduces query time for both fixed sampling and minimizer sampling at a cost of requiring more space. If we use the same k-mer size for both methods, fixed sampling requires typically half as much space whereas minimizer sampling processes queries only slightly faster. If we are allowed to use any k-mer size for each method, then we can choose a k-mer size such that fixed sampling both uses less space and processes queries faster than minimizer sampling. The reason is that although minimizer sampling is able to sample query k-mers, the number of shared k-mer occurrences that must be processed is much larger for minimizer sampling than fixed sampling. In conclusion, we argue that for any application where each shared k-mer occurrence must be processed, fixed sampling is the right sampling method. PMID:29389989

  3. Automation and integration of components for generalized semantic markup of electronic medical texts.

    PubMed

    Dugan, J M; Berrios, D C; Liu, X; Kim, D K; Kaizer, H; Fagan, L M

    1999-01-01

    Our group has built an information retrieval system based on a complex semantic markup of medical textbooks. We describe the construction of a set of web-based knowledge-acquisition tools that expedites the collection and maintenance of the concepts required for text markup and the search interface required for information retrieval from the marked text. In the text markup system, domain experts (DEs) identify sections of text that contain one or more elements from a finite set of concepts. End users can then query the text using a predefined set of questions, each of which identifies a subset of complementary concepts. The search process matches that subset of concepts to relevant points in the text. The current process requires that the DE invest significant time to generate the required concepts and questions. We propose a new system--called ACQUIRE (Acquisition of Concepts and Queries in an Integrated Retrieval Environment)--that assists a DE in two essential tasks in the text-markup process. First, it helps her to develop, edit, and maintain the concept model: the set of concepts with which she marks the text. Second, ACQUIRE helps her to develop a query model: the set of specific questions that end users can later use to search the marked text. The DE incorporates concepts from the concept model when she creates the questions in the query model. The major benefit of the ACQUIRE system is a reduction in the time and effort required for the text-markup process. We compared the process of concept- and query-model creation using ACQUIRE to the process used in previous work by rebuilding two existing models that we previously constructed manually. We observed a significant decrease in the time required to build and maintain the concept and query models.

  4. Architecture and prototypical implementation of a semantic querying system for big Earth observation image bases

    PubMed Central

    Tiede, Dirk; Baraldi, Andrea; Sudmanns, Martin; Belgiu, Mariana; Lang, Stefan

    2017-01-01

    ABSTRACT Spatiotemporal analytics of multi-source Earth observation (EO) big data is a pre-condition for semantic content-based image retrieval (SCBIR). As a proof of concept, an innovative EO semantic querying (EO-SQ) subsystem was designed and prototypically implemented in series with an EO image understanding (EO-IU) subsystem. The EO-IU subsystem is automatically generating ESA Level 2 products (scene classification map, up to basic land cover units) from optical satellite data. The EO-SQ subsystem comprises a graphical user interface (GUI) and an array database embedded in a client server model. In the array database, all EO images are stored as a space-time data cube together with their Level 2 products generated by the EO-IU subsystem. The GUI allows users to (a) develop a conceptual world model based on a graphically supported query pipeline as a combination of spatial and temporal operators and/or standard algorithms and (b) create, save and share within the client-server architecture complex semantic queries/decision rules, suitable for SCBIR and/or spatiotemporal EO image analytics, consistent with the conceptual world model. PMID:29098143

  5. Optimizing a Query by Transformation and Expansion.

    PubMed

    Glocker, Katrin; Knurr, Alexander; Dieter, Julia; Dominick, Friederike; Forche, Melanie; Koch, Christian; Pascoe Pérez, Analie; Roth, Benjamin; Ückert, Frank

    2017-01-01

    In the biomedical sector not only the amount of information produced and uploaded into the web is enormous, but also the number of sources where these data can be found. Clinicians and researchers spend huge amounts of time on trying to access this information and to filter the most important answers to a given question. As the formulation of these queries is crucial, automated query expansion is an effective tool to optimize a query and receive the best possible results. In this paper we introduce the concept of a workflow for an optimization of queries in the medical and biological sector by using a series of tools for expansion and transformation of the query. After the definition of attributes by the user, the query string is compared to previous queries in order to add semantic co-occurring terms to the query. Additionally, the query is enlarged by an inclusion of synonyms. The translation into database specific ontologies ensures the optimal query formulation for the chosen database(s). As this process can be performed in various databases at once, the results are ranked and normalized in order to achieve a comparable list of answers for a question.

  6. Merging OLTP and OLAP - Back to the Future

    NASA Astrophysics Data System (ADS)

    Lehner, Wolfgang

    When the terms "Data Warehousing" and "Online Analytical Processing" were coined in the 1990s by Kimball, Codd, and others, there was an obvious need for separating data and workload for operational transactional-style processing and decision-making implying complex analytical queries over large and historic data sets. Large data warehouse infrastructures have been set up to cope with the special requirements of analytical query answering for multiple reasons: For example, analytical thinking heavily relies on predefined navigation paths to guide the user through the data set and to provide different views on different aggregation levels.Multi-dimensional queries exploiting hierarchically structured dimensions lead to complex star queries at a relational backend, which could hardly be handled by classical relational systems.

  7. Making Temporal Search More Central in Spatial Data Infrastructures

    NASA Astrophysics Data System (ADS)

    Corti, P.; Lewis, B.

    2017-10-01

    A temporally enabled Spatial Data Infrastructure (SDI) is a framework of geospatial data, metadata, users, and tools intended to provide an efficient and flexible way to use spatial information which includes the historical dimension. One of the key software components of an SDI is the catalogue service which is needed to discover, query, and manage the metadata. A search engine is a software system capable of supporting fast and reliable search, which may use any means necessary to get users to the resources they need quickly and efficiently. These techniques may include features such as full text search, natural language processing, weighted results, temporal search based on enrichment, visualization of patterns in distributions of results in time and space using temporal and spatial faceting, and many others. In this paper we will focus on the temporal aspects of search which include temporal enrichment using a time miner - a software engine able to search for date components within a larger block of text, the storage of time ranges in the search engine, handling historical dates, and the use of temporal histograms in the user interface to display the temporal distribution of search results.

  8. Incidental threat during visuospatial working memory in adolescent anxiety: an emotional memory-guided saccade task.

    PubMed

    Mueller, Sven C; Shechner, Tomer; Rosen, Dana; Nelson, Eric E; Pine, Daniel S; Ernst, Monique

    2015-04-01

    Pediatric anxiety disorders are among the most common psychiatric mental illnesses in children and adolescents, and are associated with abnormal cognitive control in emotional, particularly threat, contexts. In a series of studies using eye movement saccade tasks, we reported anxiety-related alterations in the interplay of inhibitory control with incentives, or with emotional distractors. The present study extends these findings to working memory (WM), and queries the interaction of spatial WM with emotional stimuli in pediatric clinical anxiety. Participants were 33 children/adolescents diagnosed with an anxiety disorder, and 22 age-matched healthy comparison youths. Participants completed a novel eye movement task, an affective variant of the memory-guided saccade task. This task assessed the influence of incidental threat on spatial WM processes during high and low cognitive load. Healthy but not anxious children/adolescents showed slowed saccade latencies during incidental threat in low-load but not high-load WM conditions. No other group effects emerged on saccade latency or accuracy. The current data suggest a differential pattern of how emotion interacts with cognitive control in healthy youth relative to anxious youth. These findings extend data from inhibitory processes, reported previously, to spatial WM in pediatric anxiety. © 2015 Wiley Periodicals, Inc.

  9. A novel visualization model for web search results.

    PubMed

    Nguyen, Tien N; Zhang, Jin

    2006-01-01

    This paper presents an interactive visualization system, named WebSearchViz, for visualizing the Web search results and acilitating users' navigation and exploration. The metaphor in our model is the solar system with its planets and asteroids revolving around the sun. Location, color, movement, and spatial distance of objects in the visual space are used to represent the semantic relationships between a query and relevant Web pages. Especially, the movement of objects and their speeds add a new dimension to the visual space, illustrating the degree of relevance among a query and Web search results in the context of users' subjects of interest. By interacting with the visual space, users are able to observe the semantic relevance between a query and a resulting Web page with respect to their subjects of interest, context information, or concern. Users' subjects of interest can be dynamically changed, redefined, added, or deleted from the visual space.

  10. Clinic expert information extraction based on domain model and block importance model.

    PubMed

    Zhang, Yuanpeng; Wang, Li; Qian, Danmin; Geng, Xingyun; Yao, Dengfu; Dong, Jiancheng

    2015-11-01

    To extract expert clinic information from the Deep Web, there are two challenges to face. The first one is to make a judgment on forms. A novel method based on a domain model, which is a tree structure constructed by the attributes of query interfaces is proposed. With this model, query interfaces can be classified to a domain and filled in with domain keywords. Another challenge is to extract information from response Web pages indexed by query interfaces. To filter the noisy information on a Web page, a block importance model is proposed, both content and spatial features are taken into account in this model. The experimental results indicate that the domain model yields a precision 4.89% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method. Copyright © 2015 Elsevier Ltd. All rights reserved.

  11. Evolution of Query Optimization Methods

    NASA Astrophysics Data System (ADS)

    Hameurlain, Abdelkader; Morvan, Franck

    Query optimization is the most critical phase in query processing. In this paper, we try to describe synthetically the evolution of query optimization methods from uniprocessor relational database systems to data Grid systems through parallel, distributed and data integration systems. We point out a set of parameters to characterize and compare query optimization methods, mainly: (i) size of the search space, (ii) type of method (static or dynamic), (iii) modification types of execution plans (re-optimization or re-scheduling), (iv) level of modification (intra-operator and/or inter-operator), (v) type of event (estimation errors, delay, user preferences), and (vi) nature of decision-making (centralized or decentralized control).

  12. Comparing the performance of two CBIRS indexing schemes

    NASA Astrophysics Data System (ADS)

    Mueller, Wolfgang; Robbert, Guenter; Henrich, Andreas

    2003-01-01

    Content based image retrieval (CBIR) as it is known today has to deal with a number of challenges. Quickly summarized, the main challenges are firstly, to bridge the semantic gap between high-level concepts and low-level features using feedback, secondly to provide performance under adverse conditions. High-dimensional spaces, as well as a demanding machine learning task make the right way of indexing an important issue. When indexing multimedia data, most groups opt for extraction of high-dimensional feature vectors from the data, followed by dimensionality reduction like PCA (Principal Components Analysis) or LSI (Latent Semantic Indexing). The resulting vectors are indexed using spatial indexing structures such as kd-trees or R-trees, for example. Other projects, such as MARS and Viper propose the adaptation of text indexing techniques, notably the inverted file. Here, the Viper system is the most direct adaptation of text retrieval techniques to quantized vectors. However, while the Viper query engine provides decent performance together with impressive user-feedback behavior, as well as the possibility for easy integration of long-term learning algorithms, and support for potentially infinite feature vectors, there has been no comparison of vector-based methods and inverted-file-based methods under similar conditions. In this publication, we compare a CBIR query engine that uses inverted files (Bothrops, a rewrite of the Viper query engine based on a relational database), and a CBIR query engine based on LSD (Local Split Decision) trees for spatial indexing using the same feature sets. The Benchathlon initiative works on providing a set of images and ground truth for simulating image queries by example and corresponding user feedback. When performing the Benchathlon benchmark on a CBIR system (the System Under Test, SUT), a benchmarking harness connects over internet to the SUT, performing a number of queries using an agreed-upon protocol, the multimedia retrieval markup language (MRML). Using this benchmark one can measure the quality of retrieval, as well as the overall (speed) performance of the benchmarked system. Our Benchmarks will draw on the Benchathlon"s work for documenting the retrieval performance of both inverted file-based and LSD tree based techniques. However in addition to these results, we will present statistics, that can be obtained only inside the system under test. These statistics will include the number of complex mathematical operations, as well as the amount of data that has to be read from disk during operation of a query.

  13. A Simple Blueprint for Automatic Boolean Query Processing.

    ERIC Educational Resources Information Center

    Salton, G.

    1988-01-01

    Describes a new Boolean retrieval environment in which an extended soft Boolean logic is used to automatically construct queries from original natural language formulations provided by users. Experimental results that compare the retrieval effectiveness of this method to conventional Boolean and vector processing are discussed. (27 references)…

  14. Querying and Extracting Timeline Information from Road Traffic Sensor Data

    PubMed Central

    Imawan, Ardi; Indikawati, Fitri Indra; Kwon, Joonho; Rao, Praveen

    2016-01-01

    The escalation of traffic congestion in urban cities has urged many countries to use intelligent transportation system (ITS) centers to collect historical traffic sensor data from multiple heterogeneous sources. By analyzing historical traffic data, we can obtain valuable insights into traffic behavior. Many existing applications have been proposed with limited analysis results because of the inability to cope with several types of analytical queries. In this paper, we propose the QET (querying and extracting timeline information) system—a novel analytical query processing method based on a timeline model for road traffic sensor data. To address query performance, we build a TQ-index (timeline query-index) that exploits spatio-temporal features of timeline modeling. We also propose an intuitive timeline visualization method to display congestion events obtained from specified query parameters. In addition, we demonstrate the benefit of our system through a performance evaluation using a Busan ITS dataset and a Seattle freeway dataset. PMID:27563900

  15. Geospatial Data Stream Processing in Python Using FOSS4G Components

    NASA Astrophysics Data System (ADS)

    McFerren, G.; van Zyl, T.

    2016-06-01

    One viewpoint of current and future IT systems holds that there is an increase in the scale and velocity at which data are acquired and analysed from heterogeneous, dynamic sources. In the earth observation and geoinformatics domains, this process is driven by the increase in number and types of devices that report location and the proliferation of assorted sensors, from satellite constellations to oceanic buoy arrays. Much of these data will be encountered as self-contained messages on data streams - continuous, infinite flows of data. Spatial analytics over data streams concerns the search for spatial and spatio-temporal relationships within and amongst data "on the move". In spatial databases, queries can assess a store of data to unpack spatial relationships; this is not the case on streams, where spatial relationships need to be established with the incomplete data available. Methods for spatially-based indexing, filtering, joining and transforming of streaming data need to be established and implemented in software components. This article describes the usage patterns and performance metrics of a number of well known FOSS4G Python software libraries within the data stream processing paradigm. In particular, we consider the RTree library for spatial indexing, the Shapely library for geometric processing and transformation and the PyProj library for projection and geodesic calculations over streams of geospatial data. We introduce a message oriented Python-based geospatial data streaming framework called Swordfish, which provides data stream processing primitives, functions, transports and a common data model for describing messages, based on the Open Geospatial Consortium Observations and Measurements (O&M) and Unidata Common Data Model (CDM) standards. We illustrate how the geospatial software components are integrated with the Swordfish framework. Furthermore, we describe the tight temporal constraints under which geospatial functionality can be invoked when processing high velocity, potentially infinite geospatial data streams. The article discusses the performance of these libraries under simulated streaming loads (size, complexity and volume of messages) and how they can be deployed and utilised with Swordfish under real load scenarios, illustrated by a set of Vessel Automatic Identification System (AIS) use cases. We conclude that the described software libraries are able to perform adequately under geospatial data stream processing scenarios - many real application use cases will be handled sufficiently by the software.

  16. Database technology and the management of multimedia data in the Mirror project

    NASA Astrophysics Data System (ADS)

    de Vries, Arjen P.; Blanken, H. M.

    1998-10-01

    Multimedia digital libraries require an open distributed architecture instead of a monolithic database system. In the Mirror project, we use the Monet extensible database kernel to manage different representation of multimedia objects. To maintain independence between content, meta-data, and the creation of meta-data, we allow distribution of data and operations using CORBA. This open architecture introduces new problems for data access. From an end user's perspective, the problem is how to search the available representations to fulfill an actual information need; the conceptual gap between human perceptual processes and the meta-data is too large. From a system's perspective, several representations of the data may semantically overlap or be irrelevant. We address these problems with an iterative query process and active user participating through relevance feedback. A retrieval model based on inference networks assists the user with query formulation. The integration of this model into the database design has two advantages. First, the user can query both the logical and the content structure of multimedia objects. Second, the use of different data models in the logical and the physical database design provides data independence and allows algebraic query optimization. We illustrate query processing with a music retrieval application.

  17. TOWARD AN ACCURATE ANALYSIS OF RANGE QUERIES ON SPATIAL DATA. (R825195)

    EPA Science Inventory

    The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Concl...

  18. Research on spatio-temporal database techniques for spatial information service

    NASA Astrophysics Data System (ADS)

    Zhao, Rong; Wang, Liang; Li, Yuxiang; Fan, Rongshuang; Liu, Ping; Li, Qingyuan

    2007-06-01

    Geographic data should be described by spatial, temporal and attribute components, but the spatio-temporal queries are difficult to be answered within current GIS. This paper describes research into the development and application of spatio-temporal data management system based upon GeoWindows GIS software platform which was developed by Chinese Academy of Surveying and Mapping (CASM). Faced the current and practical requirements of spatial information application, and based on existing GIS platform, one kind of spatio-temporal data model which integrates vector and grid data together was established firstly. Secondly, we solved out the key technique of building temporal data topology, successfully developed a suit of spatio-temporal database management system adopting object-oriented methods. The system provides the temporal data collection, data storage, data management and data display and query functions. Finally, as a case study, we explored the application of spatio-temporal data management system with the administrative region data of multi-history periods of China as the basic data. With all the efforts above, the GIS capacity of management and manipulation in aspect of time and attribute of GIS has been enhanced, and technical reference has been provided for the further development of temporal geographic information system (TGIS).

  19. Multidatabase Query Processing with Uncertainty in Global Keys and Attribute Values.

    ERIC Educational Resources Information Center

    Scheuermann, Peter; Li, Wen-Syan; Clifton, Chris

    1998-01-01

    Presents an approach for dynamic database integration and query processing in the absence of information about attribute correspondences and global IDs. Defines different types of equivalence conditions for the construction of global IDs. Proposes a strategy based on ranked role-sets that makes use of an automated semantic integration procedure…

  20. Method for localizing and isolating an errant process step

    DOEpatents

    Tobin, Jr., Kenneth W.; Karnowski, Thomas P.; Ferrell, Regina K.

    2003-01-01

    A method for localizing and isolating an errant process includes the steps of retrieving from a defect image database a selection of images each image having image content similar to image content extracted from a query image depicting a defect, each image in the selection having corresponding defect characterization data. A conditional probability distribution of the defect having occurred in a particular process step is derived from the defect characterization data. A process step as a highest probable source of the defect according to the derived conditional probability distribution is then identified. A method for process step defect identification includes the steps of characterizing anomalies in a product, the anomalies detected by an imaging system. A query image of a product defect is then acquired. A particular characterized anomaly is then correlated with the query image. An errant process step is then associated with the correlated image.

  1. Toward a Data Scalable Solution for Facilitating Discovery of Science Resources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weaver, Jesse R.; Castellana, Vito G.; Morari, Alessandro

    Science is increasingly motivated by the need to process larger quantities of data. It is facing severe challenges in data collection, management, and processing, so much so that the computational demands of “data scaling” are competing with, and in many fields surpassing, the traditional objective of decreasing processing time. Example domains with large datasets include astronomy, biology, genomics, climate/weather, and material sciences. This paper presents a real-world use case in which we wish to answer queries pro- vided by domain scientists in order to facilitate discovery of relevant science resources. The problem is that the metadata for these science resourcesmore » is very large and is growing quickly, rapidly increasing the need for a data scaling solution. We propose a system – SGEM – designed for answering graph-based queries over large datasets on cluster architectures, and we re- port performance results for queries on the current RDESC dataset of nearly 1.4 billion triples, and on the well-known BSBM SPARQL query benchmark.« less

  2. Semantic based man-machine interface for real-time communication

    NASA Technical Reports Server (NTRS)

    Ali, M.; Ai, C.-S.

    1988-01-01

    A flight expert system (FLES) was developed to assist pilots in monitoring, diagnosing and recovering from in-flight faults. To provide a communications interface between the flight crew and FLES, a natural language interface (NALI) was implemented. Input to NALI is processed by three processors: (1) the semantics parser; (2) the knowledge retriever; and (3) the response generator. First the semantic parser extracts meaningful words and phrases to generate an internal representation of the query. At this point, the semantic parser has the ability to map different input forms related to the same concept into the same internal representation. Then the knowledge retriever analyzes and stores the context of the query to aid in resolving ellipses and pronoun references. At the end of this process, a sequence of retrievel functions is created as a first step in generating the proper response. Finally, the response generator generates the natural language response to the query. The architecture of NALI was designed to process both temporal and nontemporal queries. The architecture and implementation of NALI are described.

  3. Automation and integration of components for generalized semantic markup of electronic medical texts.

    PubMed Central

    Dugan, J. M.; Berrios, D. C.; Liu, X.; Kim, D. K.; Kaizer, H.; Fagan, L. M.

    1999-01-01

    Our group has built an information retrieval system based on a complex semantic markup of medical textbooks. We describe the construction of a set of web-based knowledge-acquisition tools that expedites the collection and maintenance of the concepts required for text markup and the search interface required for information retrieval from the marked text. In the text markup system, domain experts (DEs) identify sections of text that contain one or more elements from a finite set of concepts. End users can then query the text using a predefined set of questions, each of which identifies a subset of complementary concepts. The search process matches that subset of concepts to relevant points in the text. The current process requires that the DE invest significant time to generate the required concepts and questions. We propose a new system--called ACQUIRE (Acquisition of Concepts and Queries in an Integrated Retrieval Environment)--that assists a DE in two essential tasks in the text-markup process. First, it helps her to develop, edit, and maintain the concept model: the set of concepts with which she marks the text. Second, ACQUIRE helps her to develop a query model: the set of specific questions that end users can later use to search the marked text. The DE incorporates concepts from the concept model when she creates the questions in the query model. The major benefit of the ACQUIRE system is a reduction in the time and effort required for the text-markup process. We compared the process of concept- and query-model creation using ACQUIRE to the process used in previous work by rebuilding two existing models that we previously constructed manually. We observed a significant decrease in the time required to build and maintain the concept and query models. Images Figure 1 Figure 2 Figure 4 Figure 5 PMID:10566457

  4. Ontology Based Quality Evaluation for Spatial Data

    NASA Astrophysics Data System (ADS)

    Yılmaz, C.; Cömert, Ç.

    2015-08-01

    Many institutions will be providing data to the National Spatial Data Infrastructure (NSDI). Current technical background of the NSDI is based on syntactic web services. It is expected that this will be replaced by semantic web services. The quality of the data provided is important in terms of the decision-making process and the accuracy of transactions. Therefore, the data quality needs to be tested. This topic has been neglected in Turkey. Data quality control for NSDI may be done by private or public "data accreditation" institutions. A methodology is required for data quality evaluation. There are studies for data quality including ISO standards, academic studies and software to evaluate spatial data quality. ISO 19157 standard defines the data quality elements. Proprietary software such as, 1Spatial's 1Validate and ESRI's Data Reviewer offers quality evaluation based on their own classification of rules. Commonly, rule based approaches are used for geospatial data quality check. In this study, we look for the technical components to devise and implement a rule based approach with ontologies using free and open source software in semantic web context. Semantic web uses ontologies to deliver well-defined web resources and make them accessible to end-users and processes. We have created an ontology conforming to the geospatial data and defined some sample rules to show how to test data with respect to data quality elements including; attribute, topo-semantic and geometrical consistency using free and open source software. To test data against rules, sample GeoSPARQL queries are created, associated with specifications.

  5. A case study in adaptable and reusable infrastructure at the Keck Observatory Archive: VO interfaces, moving targets, and more

    NASA Astrophysics Data System (ADS)

    Berriman, G. Bruce; Cohen, Richard W.; Colson, Andrew; Gelino, Christopher R.; Good, John C.; Kong, Mihseh; Laity, Anastasia C.; Mader, Jeffrey A.; Swain, Melanie A.; Tran, Hien D.; Wang, Shin-Ywan

    2016-08-01

    The Keck Observatory Archive (KOA) (https://koa.ipac.caltech.edu) curates all observations acquired at the W. M. Keck Observatory (WMKO) since it began operations in 1994, including data from eight active instruments and two decommissioned instruments. The archive is a collaboration between WMKO and the NASA Exoplanet Science Institute (NExScI). Since its inception in 2004, the science information system used at KOA has adopted an architectural approach that emphasizes software re-use and adaptability. This paper describes how KOA is currently leveraging and extending open source software components to develop new services and to support delivery of a complete set of instrument metadata, which will enable more sophisticated and extensive queries than currently possible. In August 2015, KOA deployed a program interface to discover public data from all instruments equipped with an imaging mode. The interface complies with version 2 of the Simple Imaging Access Protocol (SIAP), under development by the International Virtual Observatory Alliance (IVOA), which defines a standard mechanism for discovering images through spatial queries. The heart of the KOA service is an R-tree-based, database-indexing mechanism prototyped by the Virtual Astronomical Observatory (VAO) and further developed by the Montage Image Mosaic project, designed to provide fast access to large imaging data sets as a first step in creating wide-area image mosaics (such as mosaics of subsets of the 4.7 million images of the SDSS DR9 release). The KOA service uses the results of the spatial R-tree search to create an SQLite data database for further relational filtering. The service uses a JSON configuration file to describe the association between instrument parameters and the service query parameters, and to make it applicable beyond the Keck instruments. The images generated at the Keck telescope usually do not encode the image footprints as WCS fields in the FITS file headers. Because SIAP searches are spatial, much of the effort in developing the program interface involved processing the instrument and telescope parameters to understand how accurately we can derive the WCS information for each instrument. This knowledge is now being fed back into the KOA databases as part of a program to include complete metadata information for all imaging observations. The R-tree program was itself extended to support temporal (in addition to spatial) indexing, in response to requests from the planetary science community for a search engine to discover observations of Solar System objects. With this 3D-indexing scheme, the service performs very fast time and spatial matches between the target ephemerides, obtained from the JPL SPICE service. Our experiments indicate these matches can be more than 100 times faster than when separating temporal and spatial searches. Images of the tracks of the moving targets, overlaid with the image footprints, are computed with a new command-line visualization tool, mViewer, released with the Montage distribution. The service is currently in test and will be released in late summer 2016.

  6. PRAIRIEMAP: A GIS database for prairie grassland management in western North America

    USGS Publications Warehouse

    ,

    2003-01-01

    The USGS Forest and Rangeland Ecosystem Science Center, Snake River Field Station (SRFS) maintains a database of spatial information, called PRAIRIEMAP, which is needed to address the management of prairie grasslands in western North America. We identify and collect spatial data for the region encompassing the historical extent of prairie grasslands (Figure 1). State and federal agencies, the primary entities responsible for management of prairie grasslands, need this information to develop proactive management strategies to prevent prairie-grassland wildlife species from being listed as Endangered Species, or to develop appropriate responses if listing does occur. Spatial data are an important component in documenting current habitat and other environmental conditions, which can be used to identify areas that have undergone significant changes in land cover and to identify underlying causes. Spatial data will also be a critical component guiding the decision processes for restoration of habitat in the Great Plains. As such, the PRAIRIEMAP database will facilitate analyses of large-scale and range-wide factors that may be causing declines in grassland habitat and populations of species that depend on it for their survival. Therefore, development of a reliable spatial database carries multiple benefits for land and wildlife management. The project consists of 3 phases: (1) identify relevant spatial data, (2) assemble, document, and archive spatial data on a computer server, and (3) develop and maintain the web site (http://prairiemap.wr.usgs.gov) for query and transfer of GIS data to managers and researchers.

  7. The role of organizational research in implementing evidence-based practice: QUERI Series

    PubMed Central

    Yano, Elizabeth M

    2008-01-01

    Background Health care organizations exert significant influence on the manner in which clinicians practice and the processes and outcomes of care that patients experience. A greater understanding of the organizational milieu into which innovations will be introduced, as well as the organizational factors that are likely to foster or hinder the adoption and use of new technologies, care arrangements and quality improvement (QI) strategies are central to the effective implementation of research into practice. Unfortunately, much implementation research seems to not recognize or adequately address the influence and importance of organizations. Using examples from the U.S. Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI), we describe the role of organizational research in advancing the implementation of evidence-based practice into routine care settings. Methods Using the six-step QUERI process as a foundation, we present an organizational research framework designed to improve and accelerate the implementation of evidence-based practice into routine care. Specific QUERI-related organizational research applications are reviewed, with discussion of the measures and methods used to apply them. We describe these applications in the context of a continuum of organizational research activities to be conducted before, during and after implementation. Results Since QUERI's inception, various approaches to organizational research have been employed to foster progress through QUERI's six-step process. We report on how explicit integration of the evaluation of organizational factors into QUERI planning has informed the design of more effective care delivery system interventions and enabled their improved "fit" to individual VA facilities or practices. We examine the value and challenges in conducting organizational research, and briefly describe the contributions of organizational theory and environmental context to the research framework. Conclusion Understanding the organizational context of delivering evidence-based practice is a critical adjunct to efforts to systematically improve quality. Given the size and diversity of VA practices, coupled with unique organizational data sources, QUERI is well-positioned to make valuable contributions to the field of implementation science. More explicit accommodation of organizational inquiry into implementation research agendas has helped QUERI researchers to better frame and extend their work as they move toward regional and national spread activities. PMID:18510749

  8. STARS 2.0: 2nd-generation open-source archiving and query software

    NASA Astrophysics Data System (ADS)

    Winegar, Tom

    2008-07-01

    The Subaru Telescope is in process of developing an open-source alternative to the 1st-generation software and databases (STARS 1) used for archiving and query. For STARS 2, we have chosen PHP and Python for scripting and MySQL as the database software. We have collected feedback from staff and observers, and used this feedback to significantly improve the design and functionality of our future archiving and query software. Archiving - We identified two weaknesses in 1st-generation STARS archiving software: a complex and inflexible table structure and uncoordinated system administration for our business model: taking pictures from the summit and archiving them in both Hawaii and Japan. We adopted a simplified and normalized table structure with passive keyword collection, and we are designing an archive-to-archive file transfer system that automatically reports real-time status and error conditions and permits error recovery. Query - We identified several weaknesses in 1st-generation STARS query software: inflexible query tools, poor sharing of calibration data, and no automatic file transfer mechanisms to observers. We are developing improved query tools and sharing of calibration data, and multi-protocol unassisted file transfer mechanisms for observers. In the process, we have redefined a 'query': from an invisible search result that can only transfer once in-house right now, with little status and error reporting and no error recovery - to a stored search result that can be monitored, transferred to different locations with multiple protocols, reporting status and error conditions and permitting recovery from errors.

  9. Enhancing SAMOS Data Access in DOMS via a Neo4j Property Graph Database.

    NASA Astrophysics Data System (ADS)

    Stallard, A. P.; Smith, S. R.; Elya, J. L.

    2016-12-01

    The Shipboard Automated Meteorological and Oceanographic System (SAMOS) initiative provides routine access to high-quality marine meteorological and near-surface oceanographic observations from research vessels. The Distributed Oceanographic Match-Up Service (DOMS) under development is a centralized service that allows researchers to easily match in situ and satellite oceanographic data from distributed sources to facilitate satellite calibration, validation, and retrieval algorithm development. The service currently uses Apache Solr as a backend search engine on each node in the distributed network. While Solr is a high-performance solution that facilitates creation and maintenance of indexed data, it is limited in the sense that its schema is fixed. The property graph model escapes this limitation by creating relationships between data objects. The authors will present the development of the SAMOS Neo4j property graph database including new search possibilities that take advantage of the property graph model, performance comparisons with Apache Solr, and a vision for graph databases as a storage tool for oceanographic data. The integration of the SAMOS Neo4j graph into DOMS will also be described. Currently, Neo4j contains spatial and temporal records from SAMOS which are modeled into a time tree and r-tree using Graph Aware and Spatial plugin tools for Neo4j. These extensions provide callable Java procedures within CYPHER (Neo4j's query language) that generate in-graph structures. Once generated, these structures can be queried using procedures from these libraries, or directly via CYPHER statements. Neo4j excels at performing relationship and path-based queries, which challenge relational-SQL databases because they require memory intensive joins due to the limitation of their design. Consider a user who wants to find records over several years, but only for specific months. If a traditional database only stores timestamps, this type of query would be complex and likely prohibitively slow. Using the time tree model, one can specify a path from the root to the data which restricts resolutions to certain timeframes (e.g., months). This query can be executed without joins, unions, or other compute-intensive operations, putting Neo4j at a computational advantage to the SQL database alternative.

  10. A Study of the Efficiency of Spatial Indexing Methods Applied to Large Astronomical Databases

    NASA Astrophysics Data System (ADS)

    Donaldson, Tom; Berriman, G. Bruce; Good, John; Shiao, Bernie

    2018-01-01

    Spatial indexing of astronomical databases generally uses quadrature methods, which partition the sky into cells used to create an index (usually a B-tree) written as database column. We report the results of a study to compare the performance of two common indexing methods, HTM and HEALPix, on Solaris and Windows database servers installed with a PostgreSQL database, and a Windows Server installed with MS SQL Server. The indexing was applied to the 2MASS All-Sky Catalog and to the Hubble Source catalog. On each server, the study compared indexing performance by submitting 1 million queries at each index level with random sky positions and random cone search radius, which was computed on a logarithmic scale between 1 arcsec and 1 degree, and measuring the time to complete the query and write the output. These simulated queries, intended to model realistic use patterns, were run in a uniform way on many combinations of indexing method and indexing level. The query times in all simulations are strongly I/O-bound and are linear with number of records returned for large numbers of sources. There are, however, considerable differences between simulations, which reveal that hardware I/O throughput is a more important factor in managing the performance of a DBMS than the choice of indexing scheme. The choice of index itself is relatively unimportant: for comparable index levels, the performance is consistent within the scatter of the timings. At small index levels (large cells; e.g. level 4; cell size 3.7 deg), there is large scatter in the timings because of wide variations in the number of sources found in the cells. At larger index levels, performance improves and scatter decreases, but the improvement at level 8 (14 min) and higher is masked to some extent in the timing scatter caused by the range of query sizes. At very high levels (20; 0.0004 arsec), the granularity of the cells becomes so high that a large number of extraneous empty cells begin to degrade performance. Thus, for the use patterns studied here the database performance is not critically dependent on the exact choices of index or level.

  11. Query Processing for Probabilistic State Diagrams Describing Multiple Robot Navigation in an Indoor Environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Czejdo, Bogdan; Bhattacharya, Sambit; Ferragut, Erik M

    2012-01-01

    This paper describes the syntax and semantics of multi-level state diagrams to support probabilistic behavior of cooperating robots. The techniques are presented to analyze these diagrams by querying combined robots behaviors. It is shown how to use state abstraction and transition abstraction to create, verify and process large probabilistic state diagrams.

  12. Geologic map of the Patagonia Mountains, Santa Cruz County, Arizona

    USGS Publications Warehouse

    Graybeal, Frederick T.; Moyer, Lorre A.; Vikre, Peter; Dunlap, Pamela; Wallis, John C.

    2015-01-01

    Several spatial databases provide data for the geologic map of the Patagonia Mountains in Arizona. The data can be viewed and queried in ArcGIS 10, a geographic information system; a geologic map is also available in PDF format. All products are available online only.

  13. Multi-Mission Geographic Information System for Science Operations: A Test Case Using MSL Data

    NASA Astrophysics Data System (ADS)

    Calef, F. J.; Abarca, H. E.; Soliman, T.; Abercrombie, S. P.; Powell, M. W.

    2017-06-01

    The Multi-Mission Geographic Information System (MMGIS) is a NASA AMMOS project in its second year of development, built to display and query science products in a spatial context. We present our progress building this tool using MSL in situ data.

  14. Supporting diagnosis and treatment in medical care based on Big Data processing.

    PubMed

    Lupşe, Oana-Sorina; Crişan-Vida, Mihaela; Stoicu-Tivadar, Lăcrămioara; Bernard, Elena

    2014-01-01

    With information and data in all domains growing every day, it is difficult to manage and extract useful knowledge for specific situations. This paper presents an integrated system architecture to support the activity in the Ob-Gin departments with further developments in using new technology to manage Big Data processing - using Google BigQuery - in the medical domain. The data collected and processed with Google BigQuery results from different sources: two Obstetrics & Gynaecology Departments, the TreatSuggest application - an application for suggesting treatments, and a home foetal surveillance system. Data is uploaded in Google BigQuery from Bega Hospital Timişoara, Romania. The analysed data is useful for the medical staff, researchers and statisticians from public health domain. The current work describes the technological architecture and its processing possibilities that in the future will be proved based on quality criteria to lead to a better decision process in diagnosis and public health.

  15. Hybrid Schema Matching for Deep Web

    NASA Astrophysics Data System (ADS)

    Chen, Kerui; Zuo, Wanli; He, Fengling; Chen, Yongheng

    Schema matching is the process of identifying semantic mappings, or correspondences, between two or more schemas. Schema matching is a first step and critical part of data integration. For schema matching of deep web, most researches only interested in query interface, while rarely pay attention to abundant schema information contained in query result pages. This paper proposed a mixed schema matching technique, which combines attributes that appeared in query structures and query results of different data sources, and mines the matched schemas inside. Experimental results prove the effectiveness of this method for improving the accuracy of schema matching.

  16. Enabling Incremental Query Re-Optimization.

    PubMed

    Liu, Mengmeng; Ives, Zachary G; Loo, Boon Thau

    2016-01-01

    As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs , and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries ; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations.

  17. A Search Strategy of Level-Based Flooding for the Internet of Things

    PubMed Central

    Qiu, Tie; Ding, Yanhong; Xia, Feng; Ma, Honglian

    2012-01-01

    This paper deals with the query problem in the Internet of Things (IoT). Flooding is an important query strategy. However, original flooding is prone to cause heavy network loads. To address this problem, we propose a variant of flooding, called Level-Based Flooding (LBF). With LBF, the whole network is divided into several levels according to the distances (i.e., hops) between the sensor nodes and the sink node. The sink node knows the level information of each node. Query packets are broadcast in the network according to the levels of nodes. Upon receiving a query packet, sensor nodes decide how to process it according to the percentage of neighbors that have processed it. When the target node receives the query packet, it sends its data back to the sink node via random walk. We show by extensive simulations that the performance of LBF in terms of cost and latency is much better than that of original flooding, and LBF can be used in IoT of different scales. PMID:23112594

  18. Enabling Incremental Query Re-Optimization

    PubMed Central

    Liu, Mengmeng; Ives, Zachary G.; Loo, Boon Thau

    2017-01-01

    As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs, and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations. PMID:28659658

  19. Development of a web-based video management and application processing system

    NASA Astrophysics Data System (ADS)

    Chan, Shermann S.; Wu, Yi; Li, Qing; Zhuang, Yueting

    2001-07-01

    How to facilitate efficient video manipulation and access in a web-based environment is becoming a popular trend for video applications. In this paper, we present a web-oriented video management and application processing system, based on our previous work on multimedia database and content-based retrieval. In particular, we extend the VideoMAP architecture with specific web-oriented mechanisms, which include: (1) Concurrency control facilities for the editing of video data among different types of users, such as Video Administrator, Video Producer, Video Editor, and Video Query Client; different users are assigned various priority levels for different operations on the database. (2) Versatile video retrieval mechanism which employs a hybrid approach by integrating a query-based (database) mechanism with content- based retrieval (CBR) functions; its specific language (CAROL/ST with CBR) supports spatio-temporal semantics of video objects, and also offers an improved mechanism to describe visual content of videos by content-based analysis method. (3) Query profiling database which records the `histories' of various clients' query activities; such profiles can be used to provide the default query template when a similar query is encountered by the same kind of users. An experimental prototype system is being developed based on the existing VideoMAP prototype system, using Java and VC++ on the PC platform.

  20. Efficient hemodynamic event detection utilizing relational databases and wavelet analysis

    NASA Technical Reports Server (NTRS)

    Saeed, M.; Mark, R. G.

    2001-01-01

    Development of a temporal query framework for time-oriented medical databases has hitherto been a challenging problem. We describe a novel method for the detection of hemodynamic events in multiparameter trends utilizing wavelet coefficients in a MySQL relational database. Storage of the wavelet coefficients allowed for a compact representation of the trends, and provided robust descriptors for the dynamics of the parameter time series. A data model was developed to allow for simplified queries along several dimensions and time scales. Of particular importance, the data model and wavelet framework allowed for queries to be processed with minimal table-join operations. A web-based search engine was developed to allow for user-defined queries. Typical queries required between 0.01 and 0.02 seconds, with at least two orders of magnitude improvement in speed over conventional queries. This powerful and innovative structure will facilitate research on large-scale time-oriented medical databases.

  1. Unleashing spatially distributed ecohydrology modeling using Big Data tools

    NASA Astrophysics Data System (ADS)

    Miles, B.; Idaszak, R.

    2015-12-01

    Physically based spatially distributed ecohydrology models are useful for answering science and management questions related to the hydrology and biogeochemistry of prairie, savanna, forested, as well as urbanized ecosystems. However, these models can produce hundreds of gigabytes of spatial output for a single model run over decadal time scales when run at regional spatial scales and moderate spatial resolutions (~100-km2+ at 30-m spatial resolution) or when run for small watersheds at high spatial resolutions (~1-km2 at 3-m spatial resolution). Numerical data formats such as HDF5 can store arbitrarily large datasets. However even in HPC environments, there are practical limits on the size of single files that can be stored and reliably backed up. Even when such large datasets can be stored, querying and analyzing these data can suffer from poor performance due to memory limitations and I/O bottlenecks, for example on single workstations where memory and bandwidth are limited, or in HPC environments where data are stored separately from computational nodes. The difficulty of storing and analyzing spatial data from ecohydrology models limits our ability to harness these powerful tools. Big Data tools such as distributed databases have the potential to surmount the data storage and analysis challenges inherent to large spatial datasets. Distributed databases solve these problems by storing data close to computational nodes while enabling horizontal scalability and fault tolerance. Here we present the architecture of and preliminary results from PatchDB, a distributed datastore for managing spatial output from the Regional Hydro-Ecological Simulation System (RHESSys). The initial version of PatchDB uses message queueing to asynchronously write RHESSys model output to an Apache Cassandra cluster. Once stored in the cluster, these data can be efficiently queried to quickly produce both spatial visualizations for a particular variable (e.g. maps and animations), as well as point time series of arbitrary variables at arbitrary points in space within a watershed or river basin. By treating ecohydrology modeling as a Big Data problem, we hope to provide a platform for answering transformative science and management questions related to water quantity and quality in a world of non-stationary climate.

  2. Semantic Document Library: A Virtual Research Environment for Documents, Data and Workflows Sharing

    NASA Astrophysics Data System (ADS)

    Kotwani, K.; Liu, Y.; Myers, J.; Futrelle, J.

    2008-12-01

    The Semantic Document Library (SDL) was driven by use cases from the environmental observatory communities and is designed to provide conventional document repository features of uploading, downloading, editing and versioning of documents as well as value adding features of tagging, querying, sharing, annotating, ranking, provenance, social networking and geo-spatial mapping services. It allows users to organize a catalogue of watershed observation data, model output, workflows, as well publications and documents related to the same watershed study through the tagging capability. Users can tag all relevant materials using the same watershed name and find all of them easily later using this tag. The underpinning semantic content repository can store materials from other cyberenvironments such as workflow or simulation tools and SDL provides an effective interface to query and organize materials from various sources. Advanced features of the SDL allow users to visualize the provenance of the materials such as the source and how the output data is derived. Other novel features include visualizing all geo-referenced materials on a geospatial map. SDL as a component of a cyberenvironment portal (the NCSA Cybercollaboratory) has goal of efficient management of information and relationships between published artifacts (Validated models, vetted data, workflows, annotations, best practices, reviews and papers) produced from raw research artifacts (data, notes, plans etc.) through agents (people, sensors etc.). Tremendous scientific potential of artifacts is achieved through mechanisms of sharing, reuse and collaboration - empowering scientists to spread their knowledge and protocols and to benefit from the knowledge of others. SDL successfully implements web 2.0 technologies and design patterns along with semantic content management approach that enables use of multiple ontologies and dynamic evolution (e.g. folksonomies) of terminology. Scientific documents involved with many interconnected entities (artifacts or agents) are represented as RDF triples using semantic content repository middleware Tupelo in one or many data/metadata RDF stores. Queries to the RDF enables discovery of relations among data, process and people, digging out valuable aspects, making recommendations to users, such as what tools are typically used to answer certain kinds of questions or with certain types of dataset. This innovative concept brings out coherent information about entities from four different perspectives of the social context (Who-human relations and interactions), the casual context (Why - provenance and history), the geo-spatial context (Where - location or spatially referenced information) and the conceptual context (What - domain specific relations, ontologies etc.).

  3. GeoNetwork powered GI-cat: a geoportal hybrid solution

    NASA Astrophysics Data System (ADS)

    Baldini, Alessio; Boldrini, Enrico; Santoro, Mattia; Mazzetti, Paolo

    2010-05-01

    To the aim of setting up a Spatial Data Infrastructures (SDI) the creation of a system for the metadata management and discovery plays a fundamental role. An effective solution is the use of a geoportal (e.g. FAO/ESA geoportal), that has the important benefit of being accessible from a web browser. With this work we present a solution based integrating two of the available frameworks: GeoNetwork and GI-cat. GeoNetwork is an opensource software designed to improve accessibility of a wide variety of data together with the associated ancillary information (metadata), at different scale and from multidisciplinary sources; data are organized and documented in a standard and consistent way. GeoNetwork implements both the Portal and Catalog components of a Spatial Data Infrastructure (SDI) defined in the OGC Reference Architecture. It provides tools for managing and publishing metadata on spatial data and related services. GeoNetwork allows harvesting of various types of web data sources e.g. OGC Web Services (e.g. CSW, WCS, WMS). GI-cat is a distributed catalog based on a service-oriented framework of modular components and can be customized and tailored to support different deployment scenarios. It can federate a multiplicity of catalogs services, as well as inventory and access services in order to discover and access heterogeneous ESS resources. The federated resources are exposed by GI-cat through several standard catalog interfaces (e.g. OGC CSW AP ISO, OpenSearch, etc.) and by the GI-cat extended interface. Specific components implement mediation services for interfacing heterogeneous service providers, each of which exposes a specific standard specification; such components are called Accessors. These mediating components solve providers data modelmultiplicity by mapping them onto the GI-cat internal data model which implements the ISO 19115 Core profile. Accessors also implement the query protocol mapping; first they translate the query requests expressed according to the interface protocols exposed by GI-cat into the multiple query dialects spoken by the resource service providers. Currently, a number of well-accepted catalog and inventory services are supported, including several OGC Web Services, THREDDS Data Server, SeaDataNet Common Data Index, GBIF and OpenSearch engines. A GeoNetwork powered GI-cat has been developed in order to exploit the best of the two frameworks. The new system uses a modified version of GeoNetwork web interface in order to add the capability of querying also the specified GI-cat catalog and not only the GeoNetwork internal database. The resulting system consists in a geoportal in which GI-cat plays the role of the search engine. This new system allows to distribute the query on the different types of data sources linked to a GI-cat. The metadata results of the query are then visualized by the Geonetwork web interface. This configuration was experimented in the framework of GIIDA, a project of the Italian National Research Council (CNR) focused on data accessibility and interoperability. A second advantage of this solution is achieved setting up a GeoNetwork catalog amongst the accessors of the GI-cat instance. Such a configuration will allow in turn GI-cat to run the query against the internal GeoNetwork database. This allows to have both the harvesting and the metadata editor functionalities provided by GeoNetwork and the distributed search functionality of GI-cat available in a consistent way through the same web interface.

  4. Integrated Web-Based Immersive Exploration of the Coordinated Canyon Experiment Data using Open Source STOQS Software

    NASA Astrophysics Data System (ADS)

    McCann, M. P.; Gwiazda, R.; O'Reilly, T. C.; Maier, K. L.; Lundsten, E. M.; Parsons, D. R.; Paull, C. K.

    2017-12-01

    The Coordinated Canyon Experiment (CCE) in Monterey Submarine Canyon has produced a wealth of oceanographic measurements whose analysis will improve understanding of turbidity current processes. Exploration of this data set, consisting of over 60 parameters from 15 platforms, is facilitated by using the open source Spatial Temporal Oceanographic Query System (STOQS) software (https://github.com/stoqs/stoqs). The Monterey Bay Aquarium Research Institute (MBARI) originally developed STOQS to help manage and visualize upper water column oceanographic measurements, but the generality of its data model permits effective use for any kind of spatial/temporal measurement data. STOQS consists of a PostgreSQL database and server-side Python/Django software; the client-side is jQuery JavaScript supporting AJAX requests to update a single page web application. The User Interface (UI) is optimized to provide a quick overview of data in spatial and temporal dimensions, as well as in parameter, platform, and data value space. A user may zoom into any feature of interest and select it, initiating a filter operation that updates the UI with an overview of all the data in the new filtered selection. When details are desired, radio buttons and checkboxes are selected to generate a number of different types of visualizations. These include color-filled temporal section and line plots, parameter-parameter plots, 2D map plots, and interactive 3D spatial visualizations. The Extensible 3D (X3D) standard and X3DOM JavaScript library provide the technology for presenting animated 3D data directly within the web browser. Most of the oceanographic measurements from the CCE (e.g. mooring mounted ADCP and CTD data) are easily visualized using established methods. However, unified integration and multiparameter display of several concurrently deployed sensors across a network of platforms is a challenge we hope to solve. Moreover, STOQS also allows display of data from a new instrument - the Benthic Event Detector (BED). The BED records 50Hz samples of orientation and acceleration when it moves. These data are converted to the CF-NetCDF format and then loaded into a STOQS database. Using the Spatial-3D view a user may interact with a virtual playback of BED motions, giving new insight into submarine canyon sediment density flows.

  5. Mof-Tree: A Spatial Access Method To Manipulate Multiple Overlapping Features.

    ERIC Educational Resources Information Center

    Manolopoulos, Yannis; Nardelli, Enrico; Papadopoulos, Apostolos; Proietti, Guido

    1997-01-01

    Investigates the manipulation of large sets of two-dimensional data representing multiple overlapping features, and presents a new access method, the MOF-tree. Analyzes storage requirements and time with respect to window query operations involving multiple features. Examines both the pointer-based and pointerless MOF-tree representations.…

  6. Can internet search queries be used for dengue fever surveillance in China?

    PubMed

    Guo, Pi; Wang, Li; Zhang, Yanhong; Luo, Ganfeng; Zhang, Yanting; Deng, Changyu; Zhang, Qin; Zhang, Qingying

    2017-10-01

    China experienced an unprecedented outbreak of dengue fever in 2014, and the number of cases reached the highest level over the past 25 years. Traditional sentinel surveillance systems of dengue fever in China have an obvious drawback that the average delay from receipt to dissemination of dengue case data is roughly 1-2 weeks. In order to exploit internet search queries to timely monitor dengue fever, we analyzed data of dengue incidence and Baidu search query from 31 provinces in mainland China during the period of January 2011 to December 2014. We found that there was a strong correlation between changes in people's online health-seeking behavior and dengue fever incidence. Our study represents the first attempt demonstrating a strong temporal and spatial correlation between internet search trends and dengue epidemics nationwide in China. The findings will help the government to strengthen the capacity of traditional surveillance systems for dengue fever. Copyright © 2017 The Author(s). Published by Elsevier Ltd.. All rights reserved.

  7. AQUAdexIM: highly efficient in-memory indexing and querying of astronomy time series images

    NASA Astrophysics Data System (ADS)

    Hong, Zhi; Yu, Ce; Wang, Jie; Xiao, Jian; Cui, Chenzhou; Sun, Jizhou

    2016-12-01

    Astronomy has always been, and will continue to be, a data-based science, and astronomers nowadays are faced with increasingly massive datasets, one key problem of which is to efficiently retrieve the desired cup of data from the ocean. AQUAdexIM, an innovative spatial indexing and querying method, performs highly efficient on-the-fly queries under users' request to search for Time Series Images from existing observation data on the server side and only return the desired FITS images to users, so users no longer need to download entire datasets to their local machines, which will only become more and more impractical as the data size keeps increasing. Moreover, AQUAdexIM manages to keep a very low storage space overhead and its specially designed in-memory index structure enables it to search for Time Series Images of a given area of the sky 10 times faster than using Redis, a state-of-the-art in-memory database.

  8. Integrating geo web services for a user driven exploratory analysis

    NASA Astrophysics Data System (ADS)

    Moncrieff, Simon; Turdukulov, Ulanbek; Gulland, Elizabeth-Kate

    2016-04-01

    In data exploration, several online data sources may need to be dynamically aggregated or summarised over spatial region, time interval, or set of attributes. With respect to thematic data, web services are mainly used to present results leading to a supplier driven service model limiting the exploration of the data. In this paper we propose a user need driven service model based on geo web processing services. The aim of the framework is to provide a method for the scalable and interactive access to various geographic data sources on the web. The architecture combines a data query, processing technique and visualisation methodology to rapidly integrate and visually summarise properties of a dataset. We illustrate the environment on a health related use case that derives Age Standardised Rate - a dynamic index that needs integration of the existing interoperable web services of demographic data in conjunction with standalone non-spatial secure database servers used in health research. Although the example is specific to the health field, the architecture and the proposed approach are relevant and applicable to other fields that require integration and visualisation of geo datasets from various web services and thus, we believe is generic in its approach.

  9. Indexing data cubes for content-based searches in radio astronomy

    NASA Astrophysics Data System (ADS)

    Araya, M.; Candia, G.; Gregorio, R.; Mendoza, M.; Solar, M.

    2016-01-01

    Methods for observing space have changed profoundly in the past few decades. The methods needed to detect and record astronomical objects have shifted from conventional observations in the optical range to more sophisticated methods which permit the detection of not only the shape of an object but also the velocity and frequency of emissions in the millimeter-scale wavelength range and the chemical substances from which they originate. The consolidation of radio astronomy through a range of global-scale projects such as the Very Long Baseline Array (VLBA) and the Atacama Large Millimeter/submillimeter Array (ALMA) reinforces the need to develop better methods of data processing that can automatically detect regions of interest (ROIs) within data cubes (position-position-velocity), index them and facilitate subsequent searches via methods based on queries using spatial coordinates and/or velocity ranges. In this article, we present the development of an automatic system for indexing ROIs in data cubes that is capable of automatically detecting and recording ROIs while reducing the necessary storage space. The system is able to process data cubes containing megabytes of data in fractions of a second without human supervision, thus allowing it to be incorporated into a production line for displaying objects in a virtual observatory. We conducted a set of comprehensive experiments to illustrate how our system works. As a result, an index of 3% of the input size was stored in a spatial database, representing a compression ratio equal to 33:1 over an input of 20.875 GB, achieving an index of 773 MB approximately. On the other hand, a single query can be evaluated over our system in a fraction of second, showing that the indexing step works as a shock-absorber of the computational time involved in data cube processing. The system forms part of the Chilean Virtual Observatory (ChiVO), an initiative which belongs to the International Virtual Observatory Alliance (IVOA) that seeks to provide the capability of content-based searches on data cubes to the astronomical community.

  10. Mashups over the Deep Web

    NASA Astrophysics Data System (ADS)

    Hornung, Thomas; Simon, Kai; Lausen, Georg

    Combining information from different Web sources often results in a tedious and repetitive process, e.g. even simple information requests might require to iterate over a result list of one Web query and use each single result as input for a subsequent query. One approach for this chained queries are data-centric mashups, which allow to visually model the data flow as a graph, where the nodes represent the data source and the edges the data flow.

  11. Executing SPARQL Queries over the Web of Linked Data

    NASA Astrophysics Data System (ADS)

    Hartig, Olaf; Bizer, Christian; Freytag, Johann-Christoph

    The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that is not addressed by traditional research on federated query processing. In this paper we present an approach to execute SPARQL queries over the Web of Linked Data. The main idea of our approach is to discover data that might be relevant for answering a query during the query execution itself. This discovery is driven by following RDF links between data sources based on URIs in the query and in partial results. The URIs are resolved over the HTTP protocol into RDF data which is continuously added to the queried dataset. This paper describes concepts and algorithms to implement our approach using an iterator-based pipeline. We introduce a formalization of the pipelining approach and show that classical iterators may cause blocking due to the latency of HTTP requests. To avoid blocking, we propose an extension of the iterator paradigm. The evaluation of our approach shows its strengths as well as the still existing challenges.

  12. A Natural Language Interface Concordant with a Knowledge Base.

    PubMed

    Han, Yong-Jin; Park, Seong-Bae; Park, Se-Young

    2016-01-01

    The discordance between expressions interpretable by a natural language interface (NLI) system and those answerable by a knowledge base is a critical problem in the field of NLIs. In order to solve this discordance problem, this paper proposes a method to translate natural language questions into formal queries that can be generated from a graph-based knowledge base. The proposed method considers a subgraph of a knowledge base as a formal query. Thus, all formal queries corresponding to a concept or a predicate in the knowledge base can be generated prior to query time and all possible natural language expressions corresponding to each formal query can also be collected in advance. A natural language expression has a one-to-one mapping with a formal query. Hence, a natural language question is translated into a formal query by matching the question with the most appropriate natural language expression. If the confidence of this matching is not sufficiently high the proposed method rejects the question and does not answer it. Multipredicate queries are processed by regarding them as a set of collected expressions. The experimental results show that the proposed method thoroughly handles answerable questions from the knowledge base and rejects unanswerable ones effectively.

  13. Applications of Singh-Rajput Mes in Recall Operations of Quantum Associative Memory for a Two- Qubit System

    NASA Astrophysics Data System (ADS)

    Singh, Manu Pratap; Rajput, B. S.

    2016-03-01

    Recall operations of quantum associative memory (QuAM) have been conducted separately through evolutionary as well as non-evolutionary processes in terms of unitary and non- unitary operators respectively by separately choosing our recently derived maximally entangled states (Singh-Rajput MES) and Bell's MES as memory states for various queries and it has been shown that in each case the choices of Singh-Rajput MES as valid memory states are much more suitable than those of Bell's MES. it has been demonstrated that in both the types of recall processes the first and the fourth states of Singh-Rajput MES are most suitable choices as memory states for the queries `11' and `00' respectively while none of the Bell's MES is a suitable choice as valid memory state in these recall processes. It has been demonstrated that all the four states of Singh-Rajput MES are suitable choice as valid memory states for the queries `1?', `?1', `?0' and `0?' while none of the Bell's MES is suitable choice as the valid memory state for these queries also.

  14. Finding the Patient's Voice Using Big Data: Analysis of Users' Health-Related Concerns in the ChaCha Question-and-Answer Service (2009-2012).

    PubMed

    Priest, Chad; Knopf, Amelia; Groves, Doyle; Carpenter, Janet S; Furrey, Christopher; Krishnan, Anand; Miller, Wendy R; Otte, Julie L; Palakal, Mathew; Wiehe, Sarah; Wilson, Jeffrey

    2016-03-09

    The development of effective health care and public health interventions requires a comprehensive understanding of the perceptions, concerns, and stated needs of health care consumers and the public at large. Big datasets from social media and question-and-answer services provide insight into the public's health concerns and priorities without the financial, temporal, and spatial encumbrances of more traditional community-engagement methods and may prove a useful starting point for public-engagement health research (infodemiology). The objective of our study was to describe user characteristics and health-related queries of the ChaCha question-and-answer platform, and discuss how these data may be used to better understand the perceptions, concerns, and stated needs of health care consumers and the public at large. We conducted a retrospective automated textual analysis of anonymous user-generated queries submitted to ChaCha between January 2009 and November 2012. A total of 2.004 billion queries were read, of which 3.50% (70,083,796/2,004,243,249) were missing 1 or more data fields, leaving 1.934 billion complete lines of data for these analyses. Males and females submitted roughly equal numbers of health queries, but content differed by sex. Questions from females predominantly focused on pregnancy, menstruation, and vaginal health. Questions from males predominantly focused on body image, drug use, and sexuality. Adolescents aged 12-19 years submitted more queries than any other age group. Their queries were largely centered on sexual and reproductive health, and pregnancy in particular. The private nature of the ChaCha service provided a perfect environment for maximum frankness among users, especially among adolescents posing sensitive health questions. Adolescents' sexual health queries reveal knowledge gaps with serious, lifelong consequences. The nature of questions to the service provides opportunities for rapid understanding of health concerns and may lead to development of more effective tailored interventions.

  15. In-database processing of a large collection of remote sensing data: applications and implementation

    NASA Astrophysics Data System (ADS)

    Kikhtenko, Vladimir; Mamash, Elena; Chubarov, Dmitri; Voronina, Polina

    2016-04-01

    Large archives of remote sensing data are now available to scientists, yet the need to work with individual satellite scenes or product files constrains studies that span a wide temporal range or spatial extent. The resources (storage capacity, computing power and network bandwidth) required for such studies are often beyond the capabilities of individual geoscientists. This problem has been tackled before in remote sensing research and inspired several information systems. Some of them such as NASA Giovanni [1] and Google Earth Engine have already proved their utility for science. Analysis tasks involving large volumes of numerical data are not unique to Earth Sciences. Recent advances in data science are enabled by the development of in-database processing engines that bring processing closer to storage, use declarative query languages to facilitate parallel scalability and provide high-level abstraction of the whole dataset. We build on the idea of bridging the gap between file archives containing remote sensing data and databases by integrating files into relational database as foreign data sources and performing analytical processing inside the database engine. Thereby higher level query language can efficiently address problems of arbitrary size: from accessing the data associated with a specific pixel or a grid cell to complex aggregation over spatial or temporal extents over a large number of individual data files. This approach was implemented using PostgreSQL for a Siberian regional archive of satellite data products holding hundreds of terabytes of measurements from multiple sensors and missions taken over a decade-long span. While preserving the original storage layout and therefore compatibility with existing applications the in-database processing engine provides a toolkit for provisioning remote sensing data in scientific workflows and applications. The use of SQL - a widely used higher level declarative query language - simplifies interoperability between desktop GIS, web applications and geographic web services and interactive scientific applications (MATLAB, IPython). The system is also automatically ingesting direct readout data from meteorological and research satellites in near-real time with distributed acquisition workflows managed by Taverna workflow engine [2]. The system has demonstrated its utility in performing non-trivial analytic processing such as the computation of the Robust Satellite Technique (RST) indices [3]. It had been useful in different tasks such as studying urban heat islands, analyzing patterns in the distribution of wildfire occurrences, detecting phenomena related to seismic and earthquake activity. Initial experience has highlighted several limitations of the proposed approach yet it has demonstrated ability to facilitate the use of large archives of remote sensing data by geoscientists. 1. J.G. Acker, G. Leptoukh, Online analysis enhances use of NASA Earth science data. EOS Trans. AGU, 2007, 88(2), P. 14-17. 2. D. Hull, K. Wolsfencroft, R. Stevens, C. Goble, M.R. Pocock, P. Li and T. Oinn, Taverna: a tool for building and running workflows of services. Nucleic Acids Research. 2006. V. 34. P. W729-W732. 3. V. Tramutoli, G. Di Bello, N. Pergola, S. Piscitelli, Robust satellite techniques for remote sensing of seismically active areas // Annals of Geophysics. 2001. no. 44(2). P. 295-312.

  16. Digital soil mapping for the support of delineation of Areas Facing Natural Constraints defined by common European biophysical criteria

    NASA Astrophysics Data System (ADS)

    Pásztor, László; Bakacsi, Zsófia; Laborczi, Annamária; Takács, Katalin; Szatmári, Gábor; Tóth, Tibor; Szabó, József

    2016-04-01

    One of the main objectives of the EU's Common Agricultural Policy is to encourage maintaining agricultural production in Areas Facing Natural Constraints (ANC) in order to sustain agricultural production and use natural resources, in such a way to secure both stable production and income to farmers and to protect the environment. ANC assignment has both ecological and severe economical aspects. Recently the delimitation of ANCs is suggested to be carried out by using common biophysical diagnostic criteria on low soil productivity and poor climate conditions all over Europe. The criterion system was elaborated and has been repeatedly upgraded by JRC. The operational implementation is under member state competence. This process requires application of available soil databases and proper thematic and spatial inference methods. In our paper we present the inferences applied for the latest identification and delineation of areas with low soil productivity in Hungary according to JRC biophysical criteria related to soil: limited soil drainage, texture and stoniness (coarse texture, heavy clay, vertic properties), shallow rooting depth, chemical properties (salinity, sodicity, low pH). The compilation of target specific maps were based on the available legacy and recently collected data. In the present work three different data sources were used. The most relevant available data were queried from the datasets for each mapped criterion for either direct application or for the compilation a suitable, synthetic (non-measured) parameter. In some cases the values of the target variable originated from only one, in other cases from more databases. The reference dataset used in the mapping process was set up after substantial statistical analysis and filtering. It consisted of the values of the target variable attributed to the finally selected georeferenced locations. For spatial inference regression kriging was applied. Accuracy assessment was carried out by Leave One Out Cross Validation (LOOCV). In some cases the DSM product directly provided the delineation result by simple querying, in other cases further interpretation of the map was necessary. As the result of our work not only spatial fulfilment of the European biophysical criteria was assessed and provided for decision makers, but unique digital soil map products were elaborated regionalizing specific soil features, which were never mapped before, even nationally with 1 ha spatial resolution. Acknowledgement: Our work was supported by the "European Fund for Agricultural and Rural Development: Europe investing in rural areas" with the support of the European Union and Hungary and by the Hungarian National Scientific Research Foundation (OTKA, Grant No. K105167).

  17. Content-Aware DataGuide with Incremental Index Update using Frequently Used Paths

    NASA Astrophysics Data System (ADS)

    Sharma, A. K.; Duhan, Neelam; Khattar, Priyanka

    2010-11-01

    Size of the WWW is increasing day by day. Due to the absence of structured data on the Web, it becomes very difficult for information retrieval tools to fully utilize the Web information. As a solution to this problem, XML pages come into play, which provide structural information to the users to some extent. Without efficient indexes, query processing can be quite inefficient due to an exhaustive traversal on XML data. In this paper an improved content-centric approach of Content-Aware DataGuide, which is an indexing technique for XML databases, is being proposed that uses frequently used paths from historical query logs to improve query performance. The index can be updated incrementally according to the changes in query workload and thus, the overhead of reconstruction can be minimized. Frequently used paths are extracted using any Sequential Pattern mining algorithm on subsequent queries in the query workload. After this, the data structures are incrementally updated. This indexing technique proves to be efficient as partial matching queries can be executed efficiently and users can now get the more relevant documents in results.

  18. Index Compression and Efficient Query Processing in Large Web Search Engines

    ERIC Educational Resources Information Center

    Ding, Shuai

    2013-01-01

    The inverted index is the main data structure used by all the major search engines. Search engines build an inverted index on their collection to speed up query processing. As the size of the web grows, the length of the inverted list structures, which can easily grow to hundreds of MBs or even GBs for common terms (roughly linear in the size of…

  19. A natural language query system for Hubble Space Telescope proposal selection

    NASA Technical Reports Server (NTRS)

    Hornick, Thomas; Cohen, William; Miller, Glenn

    1987-01-01

    The proposal selection process for the Hubble Space Telescope is assisted by a robust and easy to use query program (TACOS). The system parses an English subset language sentence regardless of the order of the keyword phases, allowing the user a greater flexibility than a standard command query language. Capabilities for macro and procedure definition are also integrated. The system was designed for flexibility in both use and maintenance. In addition, TACOS can be applied to any knowledge domain that can be expressed in terms of a single reaction. The system was implemented mostly in Common LISP. The TACOS design is described in detail, with particular attention given to the implementation methods of sentence processing.

  20. Accelerating Research Impact in a Learning Health Care System

    PubMed Central

    Elwy, A. Rani; Sales, Anne E.; Atkins, David

    2017-01-01

    Background: Since 1998, the Veterans Health Administration (VHA) Quality Enhancement Research Initiative (QUERI) has supported more rapid implementation of research into clinical practice. Objectives: With the passage of the Veterans Access, Choice and Accountability Act of 2014 (Choice Act), QUERI further evolved to support VHA’s transformation into a Learning Health Care System by aligning science with clinical priority goals based on a strategic planning process and alignment of funding priorities with updated VHA priority goals in response to the Choice Act. Design: QUERI updated its strategic goals in response to independent assessments mandated by the Choice Act that recommended VHA reduce variation in care by providing a clear path to implement best practices. Specifically, QUERI updated its application process to ensure its centers (Programs) focus on cross-cutting VHA priorities and specify roadmaps for implementation of research-informed practices across different settings. QUERI also increased funding for scientific evaluations of the Choice Act and other policies in response to Commission on Care recommendations. Results: QUERI’s national network of Programs deploys effective practices using implementation strategies across different settings. QUERI Choice Act evaluations informed the law’s further implementation, setting the stage for additional rigorous national evaluations of other VHA programs and policies including community provider networks. Conclusions: Grounded in implementation science and evidence-based policy, QUERI serves as an example of how to operationalize core components of a Learning Health Care System, notably through rigorous evaluation and scientific testing of implementation strategies to ultimately reduce variation in quality and improve overall population health. PMID:27997456

  1. Automatic Query Formulations in Information Retrieval.

    ERIC Educational Resources Information Center

    Salton, G.; And Others

    1983-01-01

    Introduces methods designed to reduce role of search intermediaries by generating Boolean search formulations automatically using term frequency considerations from natural language statements provided by system patrons. Experimental results are supplied and methods are described for applying automatic query formulation process in practice.…

  2. Toward a Cognitive Task Analysis for Biomedical Query Mediation

    PubMed Central

    Hruby, Gregory W.; Cimino, James J.; Patel, Vimla; Weng, Chunhua

    2014-01-01

    In many institutions, data analysts use a Biomedical Query Mediation (BQM) process to facilitate data access for medical researchers. However, understanding of the BQM process is limited in the literature. To bridge this gap, we performed the initial steps of a cognitive task analysis using 31 BQM instances conducted between one analyst and 22 researchers in one academic department. We identified five top-level tasks, i.e., clarify research statement, explain clinical process, identify related data elements, locate EHR data element, and end BQM with either a database query or unmet, infeasible information needs, and 10 sub-tasks. We evaluated the BQM task model with seven data analysts from different clinical research institutions. Evaluators found all the tasks completely or semi-valid. This study contributes initial knowledge towards the development of a generalizable cognitive task representation for BQM. PMID:25954589

  3. Toward a cognitive task analysis for biomedical query mediation.

    PubMed

    Hruby, Gregory W; Cimino, James J; Patel, Vimla; Weng, Chunhua

    2014-01-01

    In many institutions, data analysts use a Biomedical Query Mediation (BQM) process to facilitate data access for medical researchers. However, understanding of the BQM process is limited in the literature. To bridge this gap, we performed the initial steps of a cognitive task analysis using 31 BQM instances conducted between one analyst and 22 researchers in one academic department. We identified five top-level tasks, i.e., clarify research statement, explain clinical process, identify related data elements, locate EHR data element, and end BQM with either a database query or unmet, infeasible information needs, and 10 sub-tasks. We evaluated the BQM task model with seven data analysts from different clinical research institutions. Evaluators found all the tasks completely or semi-valid. This study contributes initial knowledge towards the development of a generalizable cognitive task representation for BQM.

  4. The EarthServer Geology Service: web coverage services for geosciences

    NASA Astrophysics Data System (ADS)

    Laxton, John; Sen, Marcus; Passmore, James

    2014-05-01

    The EarthServer FP7 project is implementing web coverage services using the OGC WCS and WCPS standards for a range of earth science domains: cryospheric; atmospheric; oceanographic; planetary; and geological. BGS is providing the geological service (http://earthserver.bgs.ac.uk/). Geoscience has used remote sensed data from satellites and planes for some considerable time, but other areas of geosciences are less familiar with the use of coverage data. This is rapidly changing with the development of new sensor networks and the move from geological maps to geological spatial models. The BGS geology service is designed initially to address two coverage data use cases and three levels of data access restriction. Databases of remote sensed data are typically very large and commonly held offline, making it time-consuming for users to assess and then download data. The service is designed to allow the spatial selection, editing and display of Landsat and aerial photographic imagery, including band selection and contrast stretching. This enables users to rapidly view data, assess is usefulness for their purposes, and then enhance and download it if it is suitable. At present the service contains six band Landsat 7 (Blue, Green, Red, NIR 1, NIR 2, MIR) and three band false colour aerial photography (NIR, green, blue), totalling around 1Tb. Increasingly 3D spatial models are being produced in place of traditional geological maps. Models make explicit spatial information implicit on maps and thus are seen as a better way of delivering geosciences information to non-geoscientists. However web delivery of models, including the provision of suitable visualisation clients, has proved more challenging than delivering maps. The EarthServer geology service is delivering 35 surfaces as coverages, comprising the modelled superficial deposits of the Glasgow area. These can be viewed using a 3D web client developed in the EarthServer project by Fraunhofer. As well as remote sensed imagery and 3D models, the geology service is also delivering DTM coverages which can be viewed in the 3D client in conjunction with both imagery and models. The service is accessible through a web GUI which allows the imagery to be viewed against a range of background maps and DTMs, and in the 3D client; spatial selection to be carried out graphically; the results of image enhancement to be displayed; and selected data to be downloaded. The GUI also provides access to the Glasgow model in the 3D client, as well as tutorial material. In the final year of the project it is intended to increase the volume of data to 20Tb and enhance the WCPS processing, including depth and thickness querying of 3D models. We have also investigated the use of GeoSciML, developed to describe and interchange the information on geological maps, to describe model surface coverages. EarthServer is developing a combined WCPS and xQuery query language, and we will investigate applying this to the GeoSciML described surfaces to answer questions such as 'find all units with a predominant sand lithology within 25m of the surface'.

  5. Structuring Legacy Pathology Reports by openEHR Archetypes to Enable Semantic Querying.

    PubMed

    Kropf, Stefan; Krücken, Peter; Mueller, Wolf; Denecke, Kerstin

    2017-05-18

    Clinical information is often stored as free text, e.g. in discharge summaries or pathology reports. These documents are semi-structured using section headers, numbered lists, items and classification strings. However, it is still challenging to retrieve relevant documents since keyword searches applied on complete unstructured documents result in many false positive retrieval results. We are concentrating on the processing of pathology reports as an example for unstructured clinical documents. The objective is to transform reports semi-automatically into an information structure that enables an improved access and retrieval of relevant data. The data is expected to be stored in a standardized, structured way to make it accessible for queries that are applied to specific sections of a document (section-sensitive queries) and for information reuse. Our processing pipeline comprises information modelling, section boundary detection and section-sensitive queries. For enabling a focused search in unstructured data, documents are automatically structured and transformed into a patient information model specified through openEHR archetypes. The resulting XML-based pathology electronic health records (PEHRs) are queried by XQuery and visualized by XSLT in HTML. Pathology reports (PRs) can be reliably structured into sections by a keyword-based approach. The information modelling using openEHR allows saving time in the modelling process since many archetypes can be reused. The resulting standardized, structured PEHRs allow accessing relevant data by retrieving data matching user queries. Mapping unstructured reports into a standardized information model is a practical solution for a better access to data. Archetype-based XML enables section-sensitive retrieval and visualisation by well-established XML techniques. Focussing the retrieval to particular sections has the potential of saving retrieval time and improving the accuracy of the retrieval.

  6. Extending the Query Language of a Data Warehouse for Patient Recruitment.

    PubMed

    Dietrich, Georg; Ertl, Maximilian; Fette, Georg; Kaspar, Mathias; Krebs, Jonathan; Mackenrodt, Daniel; Störk, Stefan; Puppe, Frank

    2017-01-01

    Patient recruitment for clinical trials is a laborious task, as many texts have to be screened. Usually, this work is done manually and takes a lot of time. We have developed a system that automates the screening process. Besides standard keyword queries, the query language supports extraction of numbers, time-spans and negations. In a feasibility study for patient recruitment from a stroke unit with 40 patients, we achieved encouraging extraction rates above 95% for numbers and negations and ca. 86% for time spans.

  7. Data management with a landslide inventory of the Franconian Alb (Germany) using a spatial database and GIS tools

    NASA Astrophysics Data System (ADS)

    Bemm, Stefan; Sandmeier, Christine; Wilde, Martina; Jaeger, Daniel; Schwindt, Daniel; Terhorst, Birgit

    2014-05-01

    The area of the Swabian-Franconian cuesta landscape (Southern Germany) is highly prone to landslides. This was apparent in the late spring of 2013, when numerous landslides occurred as a consequence of heavy and long-lasting rainfalls. The specific climatic situation caused numerous damages with serious impact on settlements and infrastructure. Knowledge on spatial distribution of landslides, processes and characteristics are important to evaluate the potential risk that can occur from mass movements in those areas. In the frame of two projects about 400 landslides were mapped and detailed data sets were compiled during years 2011 to 2014 at the Franconian Alb. The studies are related to the project "Slope stability and hazard zones in the northern Bavarian cuesta" (DFG, German Research Foundation) as well as to the LfU (The Bavarian Environment Agency) within the project "Georisks and climate change - hazard indication map Jura". The central goal of the present study is to create a spatial database for landslides. The database should contain all fundamental parameters to characterize the mass movements and should provide the potential for secure data storage and data management, as well as statistical evaluations. The spatial database was created with PostgreSQL, an object-relational database management system and PostGIS, a spatial database extender for PostgreSQL, which provides the possibility to store spatial and geographic objects and to connect to several GIS applications, like GRASS GIS, SAGA GIS, QGIS and GDAL, a geospatial library (Obe et al. 2011). Database access for querying, importing, and exporting spatial and non-spatial data is ensured by using GUI or non-GUI connections. The database allows the use of procedural languages for writing advanced functions in the R, Python or Perl programming languages. It is possible to work directly with the (spatial) data entirety of the database in R. The inventory of the database includes (amongst others), informations on location, landslide types and causes, geomorphological positions, geometries, hazards and damages, as well as assessments related to the activity of landslides. Furthermore, there are stored spatial objects, which represent the components of a landslide, in particular the scarps and the accumulation areas. Besides, waterways, map sheets, contour lines, detailed infrastructure data, digital elevation models, aspect and slope data are included. Examples of spatial queries to the database are intersections of raster and vector data for calculating values for slope gradients or aspects of landslide areas and for creating multiple, overlaying sections for the comparison of slopes, as well as distances to the infrastructure or to the next receiving drainage. Furthermore, getting informations on landslide magnitudes, distribution and clustering, as well as potential correlations concerning geomorphological or geological conditions. The data management concept in this study can be implemented for any academic, public or private use, because it is independent from any obligatory licenses. The created spatial database offers a platform for interdisciplinary research and socio-economic questions, as well as for landslide susceptibility and hazard indication mapping. Obe, R.O., Hsu, L.S. 2011. PostGIS in action. - pp 492, Manning Publications, Stamford

  8. LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions.

    PubMed

    Chen, Jinbo; Scholz, Uwe; Zhou, Ruonan; Lange, Matthias

    2018-03-01

    In order to access and filter content of life-science databases, full text search is a widely applied query interface. But its high flexibility and intuitiveness is paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspelling and support expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries by their query profile like research field, geographic location, co-authorship, affiliation etc. require user's registration and its public accessibility that contradict privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstruct possible linguistic contexts of a given keyword query. The context is referred from the text records that are stored in the databases that are going to be queried or extracted for a general purpose query suggestion from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the further computation of customized distributed word vectors. The latter are used to suggest alternative keyword queries. An evaluated of the query suggestion quality was done for plant science use cases. Locally present experts enable a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function which has been performed using ontology term similarities. LAILAPS-QSM mean information content similarity for 15 representative queries is 0.70, whereas 34% have a score above 0.80. In comparison, the information content similarity for human expert made query suggestions is 0.90. The software is either available as tool set to build and train dedicated query suggestion services or as already trained general purpose RESTful web service. The service uses open interfaces to be seamless embeddable into database frontends. The JAVA implementation uses highly optimized data structures and streamlined code to provide fast and scalable response for web service calls. The source code of LAILAPS-QSM is available under GNU General Public License version 2 in Bitbucket GIT repository: https://bitbucket.org/ipk_bit_team/bioescorte-suggestion.

  9. Computing quality scores and uncertainty for approximate pattern matching in geospatial semantic graphs

    DOE PAGES

    Stracuzzi, David John; Brost, Randolph C.; Phillips, Cynthia A.; ...

    2015-09-26

    Geospatial semantic graphs provide a robust foundation for representing and analyzing remote sensor data. In particular, they support a variety of pattern search operations that capture the spatial and temporal relationships among the objects and events in the data. However, in the presence of large data corpora, even a carefully constructed search query may return a large number of unintended matches. This work considers the problem of calculating a quality score for each match to the query, given that the underlying data are uncertain. As a result, we present a preliminary evaluation of three methods for determining both match qualitymore » scores and associated uncertainty bounds, illustrated in the context of an example based on overhead imagery data.« less

  10. A study on spatial decision support systems for HIV/AIDS prevention based on COM GIS technology

    NASA Astrophysics Data System (ADS)

    Yang, Kun; Luo, Huasong; Peng, Shungyun; Xu, Quanli

    2007-06-01

    Based on the deeply analysis of the current status and the existing problems of GIS technology applications in Epidemiology, this paper has proposed the method and process for establishing the spatial decision support systems of AIDS epidemic prevention by integrating the COM GIS, Spatial Database, GPS, Remote Sensing, and Communication technologies, as well as ASP and ActiveX software development technologies. One of the most important issues for constructing the spatial decision support systems of AIDS epidemic prevention is how to integrate the AIDS spreading models with GIS. The capabilities of GIS applications in the AIDS epidemic prevention have been described here in this paper firstly. Then some mature epidemic spreading models have also been discussed for extracting the computation parameters. Furthermore, a technical schema has been proposed for integrating the AIDS spreading models with GIS and relevant geospatial technologies, in which the GIS and model running platforms share a common spatial database and the computing results can be spatially visualized on Desktop or Web GIS clients. Finally, a complete solution for establishing the decision support systems of AIDS epidemic prevention has been offered in this paper based on the model integrating methods and ESRI COM GIS software packages. The general decision support systems are composed of data acquisition sub-systems, network communication sub-systems, model integrating sub-systems, AIDS epidemic information spatial database sub-systems, AIDS epidemic information querying and statistical analysis sub-systems, AIDS epidemic dynamic surveillance sub-systems, AIDS epidemic information spatial analysis and decision support sub-systems, as well as AIDS epidemic information publishing sub-systems based on Web GIS.

  11. What are we ‘tweeting’ about obesity? Mapping tweets with Topic Modeling and Geographic Information System

    PubMed Central

    Ghosh, Debarchana (Debs); Guha, Rajarshi

    2014-01-01

    Public health related tweets are difficult to identify in large conversational datasets like Twitter.com. Even more challenging is the visualization and analyses of the spatial patterns encoded in tweets. This study has the following objectives: How can topic modeling be used to identify relevant public health topics such as obesity on Twitter.com? What are the common obesity related themes? What is the spatial pattern of the themes? What are the research challenges of using large conversational datasets from social networking sites? Obesity is chosen as a test theme to demonstrate the effectiveness of topic modeling using Latent Dirichlet Allocation (LDA) and spatial analysis using Geographic Information System (GIS). The dataset is constructed from tweets (originating from the United States) extracted from Twitter.com on obesity-related queries. Examples of such queries are ‘food deserts’, ‘fast food’, and ‘childhood obesity’. The tweets are also georeferenced and time stamped. Three cohesive and meaningful themes such as ‘childhood obesity and schools’, ‘obesity prevention’, and ‘obesity and food habits’ are extracted from the LDA model. The GIS analysis of the extracted themes show distinct spatial pattern between rural and urban areas, northern and southern states, and between coasts and inland states. Further, relating the themes with ancillary datasets such as US census and locations of fast food restaurants based upon the location of the tweets in a GIS environment opened new avenues for spatial analyses and mapping. Therefore the techniques used in this study provide a possible toolset for computational social scientists in general and health researchers in specific to better understand health problems from large conversational datasets. PMID:25126022

  12. What are we 'tweeting' about obesity? Mapping tweets with Topic Modeling and Geographic Information System.

    PubMed

    Ghosh, Debarchana Debs; Guha, Rajarshi

    2013-01-01

    Public health related tweets are difficult to identify in large conversational datasets like Twitter.com. Even more challenging is the visualization and analyses of the spatial patterns encoded in tweets. This study has the following objectives: How can topic modeling be used to identify relevant public health topics such as obesity on Twitter.com? What are the common obesity related themes? What is the spatial pattern of the themes? What are the research challenges of using large conversational datasets from social networking sites? Obesity is chosen as a test theme to demonstrate the effectiveness of topic modeling using Latent Dirichlet Allocation (LDA) and spatial analysis using Geographic Information System (GIS). The dataset is constructed from tweets (originating from the United States) extracted from Twitter.com on obesity-related queries. Examples of such queries are 'food deserts', 'fast food', and 'childhood obesity'. The tweets are also georeferenced and time stamped. Three cohesive and meaningful themes such as 'childhood obesity and schools', 'obesity prevention', and 'obesity and food habits' are extracted from the LDA model. The GIS analysis of the extracted themes show distinct spatial pattern between rural and urban areas, northern and southern states, and between coasts and inland states. Further, relating the themes with ancillary datasets such as US census and locations of fast food restaurants based upon the location of the tweets in a GIS environment opened new avenues for spatial analyses and mapping. Therefore the techniques used in this study provide a possible toolset for computational social scientists in general and health researchers in specific to better understand health problems from large conversational datasets.

  13. Identifying spatially similar gene expression patterns in early stage fruit fly embryo images: binary feature versus invariant moment digital representations

    PubMed Central

    Gurunathan, Rajalakshmi; Van Emden, Bernard; Panchanathan, Sethuraman; Kumar, Sudhir

    2004-01-01

    Background Modern developmental biology relies heavily on the analysis of embryonic gene expression patterns. Investigators manually inspect hundreds or thousands of expression patterns to identify those that are spatially similar and to ultimately infer potential gene interactions. However, the rapid accumulation of gene expression pattern data over the last two decades, facilitated by high-throughput techniques, has produced a need for the development of efficient approaches for direct comparison of images, rather than their textual descriptions, to identify spatially similar expression patterns. Results The effectiveness of the Binary Feature Vector (BFV) and Invariant Moment Vector (IMV) based digital representations of the gene expression patterns in finding biologically meaningful patterns was compared for a small (226 images) and a large (1819 images) dataset. For each dataset, an ordered list of images, with respect to a query image, was generated to identify overlapping and similar gene expression patterns, in a manner comparable to what a developmental biologist might do. The results showed that the BFV representation consistently outperforms the IMV representation in finding biologically meaningful matches when spatial overlap of the gene expression pattern and the genes involved are considered. Furthermore, we explored the value of conducting image-content based searches in a dataset where individual expression components (or domains) of multi-domain expression patterns were also included separately. We found that this technique improves performance of both IMV and BFV based searches. Conclusions We conclude that the BFV representation consistently produces a more extensive and better list of biologically useful patterns than the IMV representation. The high quality of results obtained scales well as the search database becomes larger, which encourages efforts to build automated image query and retrieval systems for spatial gene expression patterns. PMID:15603586

  14. BioFed: federated query processing over life sciences linked open data.

    PubMed

    Hasnain, Ali; Mehmood, Qaiser; Sana E Zainab, Syeda; Saleem, Muhammad; Warren, Claude; Zehra, Durre; Decker, Stefan; Rebholz-Schuhmann, Dietrich

    2017-03-15

    Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but this still requires solutions to query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality have to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state of the art solution FedX and forms an important benchmark for the life science domain. The efficient cataloguing approach of the federated query processing system 'BioFed', the triple pattern wise source selection and the semantic source normalisation forms the core to our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Last but not least, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., Dugbank, Sider). BioFed is a solution for a single-point-of-access for a large number of SPARQL endpoints providing life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. BioFed fully supports SPARQL 1.1 and gives access to the endpoint's availability based on the EndpointData graph. Our evaluation of BioFed against FedX is based on 20 heterogeneous federated SPARQL queries and shows competitive execution performance in comparison to FedX, which can be attributed to the provision of provenance information for the source selection. Developing and testing federated query engines for life sciences data is still a challenging task. According to our findings, it is advantageous to optimise the source selection. The cataloguing of SPARQL endpoints, including type and property indexing, leads to efficient querying of data resources over the Web of Data. This could even be further improved through the use of ontologies, e.g., for abstract normalisation of query terms.

  15. Query by example video based on fuzzy c-means initialized by fixed clustering center

    NASA Astrophysics Data System (ADS)

    Hou, Sujuan; Zhou, Shangbo; Siddique, Muhammad Abubakar

    2012-04-01

    Currently, the high complexity of video contents has posed the following major challenges for fast retrieval: (1) efficient similarity measurements, and (2) efficient indexing on the compact representations. A video-retrieval strategy based on fuzzy c-means (FCM) is presented for querying by example. Initially, the query video is segmented and represented by a set of shots, each shot can be represented by a key frame, and then we used video processing techniques to find visual cues to represent the key frame. Next, because the FCM algorithm is sensitive to the initializations, here we initialized the cluster center by the shots of query video so that users could achieve appropriate convergence. After an FCM cluster was initialized by the query video, each shot of query video was considered a benchmark point in the aforesaid cluster, and each shot in the database possessed a class label. The similarity between the shots in the database with the same class label and benchmark point can be transformed into the distance between them. Finally, the similarity between the query video and the video in database was transformed into the number of similar shots. Our experimental results demonstrated the performance of this proposed approach.

  16. Representation and alignment of sung queries for music information retrieval

    NASA Astrophysics Data System (ADS)

    Adams, Norman H.; Wakefield, Gregory H.

    2005-09-01

    The pursuit of robust and rapid query-by-humming systems, which search melodic databases using sung queries, is a common theme in music information retrieval. The retrieval aspect of this database problem has received considerable attention, whereas the front-end processing of sung queries and the data structure to represent melodies has been based on musical intuition and historical momentum. The present work explores three time series representations for sung queries: a sequence of notes, a ``smooth'' pitch contour, and a sequence of pitch histograms. The performance of the three representations is compared using a collection of naturally sung queries. It is found that the most robust performance is achieved by the representation with highest dimension, the smooth pitch contour, but that this representation presents a formidable computational burden. For all three representations, it is necessary to align the query and target in order to achieve robust performance. The computational cost of the alignment is quadratic, hence it is necessary to keep the dimension small for rapid retrieval. Accordingly, iterative deepening is employed to achieve both robust performance and rapid retrieval. Finally, the conventional iterative framework is expanded to adapt the alignment constraints based on previous iterations, further expediting retrieval without degrading performance.

  17. The Use of Dynamic Segment Scoring for Language-Independent Question Answering

    DTIC Science & Technology

    2001-01-01

    initial window with one sentence is compared to scores corre- his/PRONOUN brother/ CONSANGUINITY like/SIMILARITY his/PRONOUN call/NOMENCLATURE he/PRONOUN...the query processing mod- ule. Using the differences between index numbers to specify phys- ical distance relationships among query keywords, we can

  18. Data Processing on Database Management Systems with Fuzzy Query

    NASA Astrophysics Data System (ADS)

    Şimşek, Irfan; Topuz, Vedat

    In this study, a fuzzy query tool (SQLf) for non-fuzzy database management systems was developed. In addition, samples of fuzzy queries were made by using real data with the tool developed in this study. Performance of SQLf was tested with the data about the Marmara University students' food grant. The food grant data were collected in MySQL database by using a form which had been filled on the web. The students filled a form on the web to describe their social and economical conditions for the food grant request. This form consists of questions which have fuzzy and crisp answers. The main purpose of this fuzzy query is to determine the students who deserve the grant. The SQLf easily found the eligible students for the grant through predefined fuzzy values. The fuzzy query tool (SQLf) could be used easily with other database system like ORACLE and SQL server.

  19. Automatic Extraction of Destinations, Origins and Route Parts from Human Generated Route Directions

    NASA Astrophysics Data System (ADS)

    Zhang, Xiao; Mitra, Prasenjit; Klippel, Alexander; Maceachren, Alan

    Researchers from the cognitive and spatial sciences are studying text descriptions of movement patterns in order to examine how humans communicate and understand spatial information. In particular, route directions offer a rich source of information on how cognitive systems conceptualize movement patterns by segmenting them into meaningful parts. Route directions are composed using a plethora of cognitive spatial organization principles: changing levels of granularity, hierarchical organization, incorporation of cognitively and perceptually salient elements, and so forth. Identifying such information in text documents automatically is crucial for enabling machine-understanding of human spatial language. The benefits are: a) creating opportunities for large-scale studies of human linguistic behavior; b) extracting and georeferencing salient entities (landmarks) that are used by human route direction providers; c) developing methods to translate route directions to sketches and maps; and d) enabling queries on large corpora of crawled/analyzed movement data. In this paper, we introduce our approach and implementations that bring us closer to the goal of automatically processing linguistic route directions. We report on research directed at one part of the larger problem, that is, extracting the three most critical parts of route directions and movement patterns in general: origin, destination, and route parts. We use machine-learning based algorithms to extract these parts of routes, including, for example, destination names and types. We prove the effectiveness of our approach in several experiments using hand-tagged corpora.

  20. A Query Language for Handling Big Observation Data Sets in the Sensor Web

    NASA Astrophysics Data System (ADS)

    Autermann, Christian; Stasch, Christoph; Jirka, Simon; Koppe, Roland

    2017-04-01

    The Sensor Web provides a framework for the standardized Web-based sharing of environmental observations and sensor metadata. While the issue of varying data formats and protocols is addressed by these standards, the fast growing size of observational data is imposing new challenges for the application of these standards. Most solutions for handling big observational datasets currently focus on remote sensing applications, while big in-situ datasets relying on vector features still lack a solid approach. Conventional Sensor Web technologies may not be adequate, as the sheer size of the data transmitted and the amount of metadata accumulated may render traditional OGC Sensor Observation Services (SOS) unusable. Besides novel approaches to store and process observation data in place, e.g. by harnessing big data technologies from mainstream IT, the access layer has to be amended to utilize and integrate these large observational data archives into applications and to enable analysis. For this, an extension to the SOS will be discussed that establishes a query language to dynamically process and filter observations at storage level, similar to the OGC Web Coverage Service (WCS) and it's Web Coverage Processing Service (WCPS) extension. This will enable applications to request e.g. spatial or temporal aggregated data sets in a resolution it is able to display or it requires. The approach will be developed and implemented in cooperation with the The Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research whose catalogue of data compromises marine observations of physical, chemical and biological phenomena from a wide variety of sensors, including mobile (like research vessels, aircrafts or underwater vehicles) and stationary (like buoys or research stations). Observations are made with a high temporal resolution and the resulting time series may span multiple decades.

  1. Mercury Toolset for Spatiotemporal Metadata

    NASA Technical Reports Server (NTRS)

    Wilson, Bruce E.; Palanisamy, Giri; Devarakonda, Ranjeet; Rhyne, B. Timothy; Lindsley, Chris; Green, James

    2010-01-01

    Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily) harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.

  2. Mercury Toolset for Spatiotemporal Metadata

    NASA Astrophysics Data System (ADS)

    Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce; Rhyne, B. Timothy; Lindsley, Chris

    2010-06-01

    Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily)harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.

  3. Route Generation for a Synthetic Character (BOT) Using a Partial or Incomplete Knowledge Route Generation Algorithm in UT2004 Virtual Environment

    NASA Technical Reports Server (NTRS)

    Hanold, Gregg T.; Hanold, David T.

    2010-01-01

    This paper presents a new Route Generation Algorithm that accurately and realistically represents human route planning and navigation for Military Operations in Urban Terrain (MOUT). The accuracy of this algorithm in representing human behavior is measured using the Unreal Tournament(Trademark) 2004 (UT2004) Game Engine to provide the simulation environment in which the differences between the routes taken by the human player and those of a Synthetic Agent (BOT) executing the A-star algorithm and the new Route Generation Algorithm can be compared. The new Route Generation Algorithm computes the BOT route based on partial or incomplete knowledge received from the UT2004 game engine during game play. To allow BOT navigation to occur continuously throughout the game play with incomplete knowledge of the terrain, a spatial network model of the UT2004 MOUT terrain is captured and stored in an Oracle 11 9 Spatial Data Object (SOO). The SOO allows a partial data query to be executed to generate continuous route updates based on the terrain knowledge, and stored dynamic BOT, Player and environmental parameters returned by the query. The partial data query permits the dynamic adjustment of the planned routes by the Route Generation Algorithm based on the current state of the environment during a simulation. The dynamic nature of this algorithm more accurately allows the BOT to mimic the routes taken by the human executing under the same conditions thereby improving the realism of the BOT in a MOUT simulation environment.

  4. Spatial cyberinfrastructures, ontologies, and the humanities.

    PubMed

    Sieber, Renee E; Wellen, Christopher C; Jin, Yuan

    2011-04-05

    We report on research into building a cyberinfrastructure for Chinese biographical and geographic data. Our cyberinfrastructure contains (i) the McGill-Harvard-Yenching Library Ming Qing Women's Writings database (MQWW), the only online database on historical Chinese women's writings, (ii) the China Biographical Database, the authority for Chinese historical people, and (iii) the China Historical Geographical Information System, one of the first historical geographic information systems. Key to this integration is that linked databases retain separate identities as bases of knowledge, while they possess sufficient semantic interoperability to allow for multidatabase concepts and to support cross-database queries on an ad hoc basis. Computational ontologies create underlying semantics for database access. This paper focuses on the spatial component in a humanities cyberinfrastructure, which includes issues of conflicting data, heterogeneous data models, disambiguation, and geographic scale. First, we describe the methodology for integrating the databases. Then we detail the system architecture, which includes a tier of ontologies and schema. We describe the user interface and applications that allow for cross-database queries. For instance, users should be able to analyze the data, examine hypotheses on spatial and temporal relationships, and generate historical maps with datasets from MQWW for research, teaching, and publication on Chinese women writers, their familial relations, publishing venues, and the literary and social communities. Last, we discuss the social side of cyberinfrastructure development, as people are considered to be as critical as the technical components for its success.

  5. On describing human white matter anatomy: the white matter query language.

    PubMed

    Wassermann, Demian; Makris, Nikos; Rathi, Yogesh; Shenton, Martha; Kikinis, Ron; Kubicki, Marek; Westin, Carl-Fredrik

    2013-01-01

    The main contribution of this work is the careful syntactical definition of major white matter tracts in the human brain based on a neuroanatomist's expert knowledge. We present a technique to formally describe white matter tracts and to automatically extract them from diffusion MRI data. The framework is based on a novel query language with a near-to-English textual syntax. This query language allows us to construct a dictionary of anatomical definitions describing white matter tracts. The definitions include adjacent gray and white matter regions, and rules for spatial relations. This enables automated coherent labeling of white matter anatomy across subjects. We use our method to encode anatomical knowledge in human white matter describing 10 association and 8 projection tracts per hemisphere and 7 commissural tracts. The technique is shown to be comparable in accuracy to manual labeling. We present results applying this framework to create a white matter atlas from 77 healthy subjects, and we use this atlas in a proof-of-concept study to detect tract changes specific to schizophrenia.

  6. Combining semantic technologies with a content-based image retrieval system - Preliminary considerations

    NASA Astrophysics Data System (ADS)

    Chmiel, P.; Ganzha, M.; Jaworska, T.; Paprzycki, M.

    2017-10-01

    Nowadays, as a part of systematic growth of volume, and variety, of information that can be found on the Internet, we observe also dramatic increase in sizes of available image collections. There are many ways to help users browsing / selecting images of interest. One of popular approaches are Content-Based Image Retrieval (CBIR) systems, which allow users to search for images that match their interests, expressed in the form of images (query by example). However, we believe that image search and retrieval could take advantage of semantic technologies. We have decided to test this hypothesis. Specifically, on the basis of knowledge captured in the CBIR, we have developed a domain ontology of residential real estate (detached houses, in particular). This allows us to semantically represent each image (and its constitutive architectural elements) represented within the CBIR. The proposed ontology was extended to capture not only the elements resulting from image segmentation, but also "spatial relations" between them. As a result, a new approach to querying the image database (semantic querying) has materialized, thus extending capabilities of the developed system.

  7. Pharmit: interactive exploration of chemical space.

    PubMed

    Sunseri, Jocelyn; Koes, David Ryan

    2016-07-08

    Pharmit (http://pharmit.csb.pitt.edu) provides an online, interactive environment for the virtual screening of large compound databases using pharmacophores, molecular shape and energy minimization. Users can import, create and edit virtual screening queries in an interactive browser-based interface. Queries are specified in terms of a pharmacophore, a spatial arrangement of the essential features of an interaction, and molecular shape. Search results can be further ranked and filtered using energy minimization. In addition to a number of pre-built databases of popular compound libraries, users may submit their own compound libraries for screening. Pharmit uses state-of-the-art sub-linear algorithms to provide interactive screening of millions of compounds. Queries typically take a few seconds to a few minutes depending on their complexity. This allows users to iteratively refine their search during a single session. The easy access to large chemical datasets provided by Pharmit simplifies and accelerates structure-based drug design. Pharmit is available under a dual BSD/GPL open-source license. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Nonmaterialized Relations and the Support of Information Retrieval Applications by Relational Database Systems.

    ERIC Educational Resources Information Center

    Lynch, Clifford A.

    1991-01-01

    Describes several aspects of the problem of supporting information retrieval system query requirements in the relational database management system (RDBMS) environment and proposes an extension to query processing called nonmaterialized relations. User interactions with information retrieval systems are discussed, and nonmaterialized relations are…

  9. Multi-INT Complex Event Processing using Approximate, Incremental Graph Pattern Search

    DTIC Science & Technology

    2012-06-01

    graph pattern search and SPARQL queries . Total execution time for 10 executions each of 5 random pattern searches in synthetic data sets...01/11 1000 10000 100000 RDF triples Time (secs) 10 20 Graph pattern algorithm SPARQL queries Initial Performance Comparisons 09/18/11 2011 Thrust Area

  10. Hybrid Filtering in Semantic Query Processing

    ERIC Educational Resources Information Center

    Jeong, Hanjo

    2011-01-01

    This dissertation presents a hybrid filtering method and a case-based reasoning framework for enhancing the effectiveness of Web search. Web search may not reflect user needs, intent, context, and preferences, because today's keyword-based search is lacking semantic information to capture the user's context and intent in posing the search query.…

  11. Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS.

    PubMed

    Yu, Hwanjo; Kim, Taehoon; Oh, Jinoh; Ko, Ilhwan; Kim, Sungchul; Han, Wook-Shin

    2010-04-16

    Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function. However, the process of learning and ranking is usually done offline without integrated with the keyword queries, and the users have to provide a large amount of training documents to get a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and a multi-level relevance feedback in real time on PubMed. RefMed supports a multi-level relevance feedback by using the RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates the RankSVM into RDBMS to support both keyword queries and the multi-level relevance feedback in real time; the tight coupling of the RankSVM and DBMS substantially improves the processing time. An efficient parameter selection method for the RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves a high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. RefMed is the first multi-level relevance feedback system for PubMed, which achieves a high accuracy with less feedback. It effectively learns an accurate relevance function from the user's feedback and efficiently processes the function to return relevant articles in real time.

  12. Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS

    PubMed Central

    2010-01-01

    Background Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function. However, the process of learning and ranking is usually done offline without integrated with the keyword queries, and the users have to provide a large amount of training documents to get a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and a multi-level relevance feedback in real time on PubMed. Results RefMed supports a multi-level relevance feedback by using the RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates the RankSVM into RDBMS to support both keyword queries and the multi-level relevance feedback in real time; the tight coupling of the RankSVM and DBMS substantially improves the processing time. An efficient parameter selection method for the RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves a high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. Conclusions RefMed is the first multi-level relevance feedback system for PubMed, which achieves a high accuracy with less feedback. It effectively learns an accurate relevance function from the user’s feedback and efficiently processes the function to return relevant articles in real time. PMID:20406504

  13. Branch-Based Centralized Data Collection for Smart Grids Using Wireless Sensor Networks

    PubMed Central

    Kim, Kwangsoo; Jin, Seong-il

    2015-01-01

    A smart grid is one of the most important applications in smart cities. In a smart grid, a smart meter acts as a sensor node in a sensor network, and a central device collects power usage from every smart meter. This paper focuses on a centralized data collection problem of how to collect every power usage from every meter without collisions in an environment in which the time synchronization among smart meters is not guaranteed. To solve the problem, we divide a tree that a sensor network constructs into several branches. A conflict-free query schedule is generated based on the branches. Each power usage is collected according to the schedule. The proposed method has important features: shortening query processing time and avoiding collisions between a query and query responses. We evaluate this method using the ns-2 simulator. The experimental results show that this method can achieve both collision avoidance and fast query processing at the same time. The success rate of data collection at a sink node executing this method is 100%. Its running time is about 35 percent faster than that of the round-robin method, and its memory size is reduced to about 10% of that of the depth-first search method. PMID:26007734

  14. Branch-based centralized data collection for smart grids using wireless sensor networks.

    PubMed

    Kim, Kwangsoo; Jin, Seong-il

    2015-05-21

    A smart grid is one of the most important applications in smart cities. In a smart grid, a smart meter acts as a sensor node in a sensor network, and a central device collects power usage from every smart meter. This paper focuses on a centralized data collection problem of how to collect every power usage from every meter without collisions in an environment in which the time synchronization among smart meters is not guaranteed. To solve the problem, we divide a tree that a sensor network constructs into several branches. A conflict-free query schedule is generated based on the branches. Each power usage is collected according to the schedule. The proposed method has important features: shortening query processing time and avoiding collisions between a query and query responses. We evaluate this method using the ns-2 simulator. The experimental results show that this method can achieve both collision avoidance and fast query processing at the same time. The success rate of data collection at a sink node executing this method is 100%. Its running time is about 35 percent faster than that of the round-robin method, and its memory size is reduced to about 10% of that of the depth-first search method.

  15. Monotonically improving approximate answers to relational algebra queries

    NASA Technical Reports Server (NTRS)

    Smith, Kenneth P.; Liu, J. W. S.

    1989-01-01

    We present here a query processing method that produces approximate answers to queries posed in standard relational algebra. This method is monotone in the sense that the accuracy of the approximate result improves with the amount of time spent producing the result. This strategy enables us to trade the time to produce the result for the accuracy of the result. An approximate relational model that characterizes appromimate relations and a partial order for comparing them is developed. Relational operators which operate on and return approximate relations are defined.

  16. Private and Efficient Query Processing on Outsourced Genomic Databases.

    PubMed

    Ghasemi, Reza; Al Aziz, Md Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-09-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing genomic sequence is a time consuming and expensive process. Second, it requires large-scale computation and storage systems to process genomic sequences. Third, genomic databases are often owned by different organizations, and thus, not available for public usage. Cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases in a centralized cloud server to ease the access of their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 Single Nucleotide Polymorphisms (SNPs) in a database of 20 000 records takes around 100 and 150 s, respectively.

  17. Private and Efficient Query Processing on Outsourced Genomic Databases

    PubMed Central

    Ghasemi, Reza; Al Aziz, Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-01-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing genomic sequence is a time-consuming and expensive process. Second, it requires large-scale computation and storage systems to processes genomic sequences. Third, genomic databases are often owned by different organizations and thus not available for public usage. Cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases in a centralized cloud server to ease the access of their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 SNPs in a database of 20,000 records takes around 100 and 150 seconds, respectively. PMID:27834660

  18. A geospatial database model for the management of remote sensing datasets at multiple spectral, spatial, and temporal scales

    NASA Astrophysics Data System (ADS)

    Ifimov, Gabriela; Pigeau, Grace; Arroyo-Mora, J. Pablo; Soffer, Raymond; Leblanc, George

    2017-10-01

    In this study the development and implementation of a geospatial database model for the management of multiscale datasets encompassing airborne imagery and associated metadata is presented. To develop the multi-source geospatial database we have used a Relational Database Management System (RDBMS) on a Structure Query Language (SQL) server which was then integrated into ArcGIS and implemented as a geodatabase. The acquired datasets were compiled, standardized, and integrated into the RDBMS, where logical associations between different types of information were linked (e.g. location, date, and instrument). Airborne data, at different processing levels (digital numbers through geocorrected reflectance), were implemented in the geospatial database where the datasets are linked spatially and temporally. An example dataset consisting of airborne hyperspectral imagery, collected for inter and intra-annual vegetation characterization and detection of potential hydrocarbon seepage events over pipeline areas, is presented. Our work provides a model for the management of airborne imagery, which is a challenging aspect of data management in remote sensing, especially when large volumes of data are collected.

  19. AMRZone: A Runtime AMR Data Sharing Framework For Scientific Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Wenzhao; Tang, Houjun; Harenberg, Steven

    Frameworks that facilitate runtime data sharing across multiple applications are of great importance for scientific data analytics. Although existing frameworks work well over uniform mesh data, they can not effectively handle adaptive mesh refinement (AMR) data. Among the challenges to construct an AMR-capable framework include: (1) designing an architecture that facilitates online AMR data management; (2) achieving a load-balanced AMR data distribution for the data staging space at runtime; and (3) building an effective online index to support the unique spatial data retrieval requirements for AMR data. Towards addressing these challenges to support runtime AMR data sharing across scientific applications,more » we present the AMRZone framework. Experiments over real-world AMR datasets demonstrate AMRZone's effectiveness at achieving a balanced workload distribution, reading/writing large-scale datasets with thousands of parallel processes, and satisfying queries with spatial constraints. Moreover, AMRZone's performance and scalability are even comparable with existing state-of-the-art work when tested over uniform mesh data with up to 16384 cores; in the best case, our framework achieves a 46% performance improvement.« less

  20. The Semantic Retrieval of Spatial Data Service Based on Ontology in SIG

    NASA Astrophysics Data System (ADS)

    Sun, S.; Liu, D.; Li, G.; Yu, W.

    2011-08-01

    The research of SIG (Spatial Information Grid) mainly solves the problem of how to connect different computing resources, so that users can use all the resources in the Grid transparently and seamlessly. In SIG, spatial data service is described in some kinds of specifications, which use different meta-information of each kind of services. This kind of standardization cannot resolve the problem of semantic heterogeneity, which may limit user to obtain the required resources. This paper tries to solve two kinds of semantic heterogeneities (name heterogeneity and structure heterogeneity) in spatial data service retrieval based on ontology, and also, based on the hierarchical subsumption relationship among concept in ontology, the query words can be extended and more resource can be matched and found for user. These applications of ontology in spatial data resource retrieval can help to improve the capability of keyword matching, and find more related resources.

  1. Labeling RDF Graphs for Linear Time and Space Querying

    NASA Astrophysics Data System (ADS)

    Furche, Tim; Weinzierl, Antonius; Bry, François

    Indices and data structures for web querying have mostly considered tree shaped data, reflecting the view of XML documents as tree-shaped. However, for RDF (and when querying ID/IDREF constraints in XML) data is indisputably graph-shaped. In this chapter, we first study existing indexing and labeling schemes for RDF and other graph datawith focus on support for efficient adjacency and reachability queries. For XML, labeling schemes are an important part of the widespread adoption of XML, in particular for mapping XML to existing (relational) database technology. However, the existing indexing and labeling schemes for RDF (and graph data in general) sacrifice one of the most attractive properties of XML labeling schemes, the constant time (and per-node space) test for adjacency (child) and reachability (descendant). In the second part, we introduce the first labeling scheme for RDF data that retains this property and thus achieves linear time and space processing of acyclic RDF queries on a significantly larger class of graphs than previous approaches (which are mostly limited to tree-shaped data). Finally, we show how this labeling scheme can be applied to (acyclic) SPARQL queries to obtain an evaluation algorithm with time and space complexity linear in the number of resources in the queried RDF graph.

  2. Fast Inbound Top-K Query for Random Walk with Restart.

    PubMed

    Zhang, Chao; Jiang, Shan; Chen, Yucheng; Sun, Yidan; Han, Jiawei

    2015-09-01

    Random walk with restart (RWR) is widely recognized as one of the most important node proximity measures for graphs, as it captures the holistic graph structure and is robust to noise in the graph. In this paper, we study a novel query based on the RWR measure, called the inbound top-k (Ink) query. Given a query node q and a number k , the Ink query aims at retrieving k nodes in the graph that have the largest weighted RWR scores to q . Ink queries can be highly useful for various applications such as traffic scheduling, disease treatment, and targeted advertising. Nevertheless, none of the existing RWR computation techniques can accurately and efficiently process the Ink query in large graphs. We propose two algorithms, namely Squeeze and Ripple, both of which can accurately answer the Ink query in a fast and incremental manner. To identify the top- k nodes, Squeeze iteratively performs matrix-vector multiplication and estimates the lower and upper bounds for all the nodes in the graph. Ripple employs a more aggressive strategy by only estimating the RWR scores for the nodes falling in the vicinity of q , the nodes outside the vicinity do not need to be evaluated because their RWR scores are propagated from the boundary of the vicinity and thus upper bounded. Ripple incrementally expands the vicinity until the top- k result set can be obtained. Our extensive experiments on real-life graph data sets show that Ink queries can retrieve interesting results, and the proposed algorithms are orders of magnitude faster than state-of-the-art method.

  3. Recommender System for Learning SQL Using Hints

    ERIC Educational Resources Information Center

    Lavbic, Dejan; Matek, Tadej; Zrnec, Aljaž

    2017-01-01

    Today's software industry requires individuals who are proficient in as many programming languages as possible. Structured query language (SQL), as an adopted standard, is no exception, as it is the most widely used query language to retrieve and manipulate data. However, the process of learning SQL turns out to be challenging. The need for a…

  4. Exploration of Web Users' Search Interests through Automatic Subject Categorization of Query Terms.

    ERIC Educational Resources Information Center

    Pu, Hsiao-tieh; Yang, Chyan; Chuang, Shui-Lung

    2001-01-01

    Proposes a mechanism that carefully integrates human and machine efforts to explore Web users' search interests. The approach consists of a four-step process: extraction of core terms; construction of subject taxonomy; automatic subject categorization of query terms; and observation of users' search interests. Research findings are proved valuable…

  5. Web Searching: A Process-Oriented Experimental Study of Three Interactive Search Paradigms.

    ERIC Educational Resources Information Center

    Dennis, Simon; Bruza, Peter; McArthur, Robert

    2002-01-01

    Compares search effectiveness when using query-based Internet search via the Google search engine, directory-based search via Yahoo, and phrase-based query reformulation-assisted search via the Hyperindex browser by means of a controlled, user-based experimental study of undergraduates at the University of Queensland. Discusses cognitive load,…

  6. The Huaihe Basin Water Resource and Water Quality Management Platform Implemented with a Spatio-Temporal Data Model

    NASA Astrophysics Data System (ADS)

    Liu, Y.; Zhang, W.; Yan, C.

    2012-07-01

    Presently, planning and assessment in maintenance, renewal and decision-making for watershed hydrology, water resource management and water quality assessment are evolving toward complex, spatially explicit regional environmental assessments. These problems have to be addressed with object-oriented spatio-temporal data models that can restore, manage, query and visualize various historic and updated basic information concerning with watershed hydrology, water resource management and water quality as well as compute and evaluate the watershed environmental conditions so as to provide online forecasting to police-makers and relevant authorities for supporting decision-making. The extensive data requirements and the difficult task of building input parameter files, however, has long been an obstacle to use of such complex models timely and effectively by resource managers. Success depends on an integrated approach that brings together scientific, education and training advances made across many individual disciplines and modified to fit the needs of the individuals and groups who must write, implement, evaluate, and adjust their watershed management plans. The centre for Hydro-science Research, Nanjing University, in cooperation with the relevant watershed management authorities, has developed a WebGIS management platform to facilitate this complex process. Improve the management of watersheds over the Huaihe basin through the development, promotion and use of a web-based, user-friendly, geospatial watershed management data and decision support system (WMDDSS) involved many difficulties for the development of this complicated System. In terms of the spatial and temporal characteristics of historic and currently available information on meteorological, hydrological, geographical, environmental and other relevant disciplines, we designed an object-oriented spatiotemporal data model that combines spatial, attribute and temporal information to implement the management system. Using this system, we can update, query and analyze environmental information as well as manage historical data, and a visualization tool was provided to help the user interpret results so as to provide scientific support for decision-making. The utility of the system has been demonstrated its values by being used in watershed management and environmental assessments.

  7. Breaking the Curse of Cardinality on Bitmap Indexes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Kesheng; Wu, Kesheng; Stockinger, Kurt

    2008-04-04

    Bitmap indexes are known to be efficient for ad-hoc range queries that are common in data warehousing and scientific applications. However, they suffer from the curse of cardinality, that is, their efficiency deteriorates as attribute cardinalities increase. A number of strategies have been proposed, but none of them addresses the problem adequately. In this paper, we propose a novel binned bitmap index that greatly reduces the cost to answer queries, and therefore breaks the curse of cardinality. The key idea is to augment the binned index with an Order-preserving Bin-based Clustering (OrBiC) structure. This data structure significantly reduces the I/Omore » operations needed to resolve records that cannot be resolved with the bitmaps. To further improve the proposed index structure, we also present a strategy to create single-valued bins for frequent values. This strategy reduces index sizes and improves query processing speed. Overall, the binned indexes with OrBiC great improves the query processing speed, and are 3 - 25 times faster than the best available indexes for high-cardinality data.« less

  8. Automatic query formulations in information retrieval.

    PubMed

    Salton, G; Buckley, C; Fox, E A

    1983-07-01

    Modern information retrieval systems are designed to supply relevant information in response to requests received from the user population. In most retrieval environments the search requests consist of keywords, or index terms, interrelated by appropriate Boolean operators. Since it is difficult for untrained users to generate effective Boolean search requests, trained search intermediaries are normally used to translate original statements of user need into useful Boolean search formulations. Methods are introduced in this study which reduce the role of the search intermediaries by making it possible to generate Boolean search formulations completely automatically from natural language statements provided by the system patrons. Frequency considerations are used automatically to generate appropriate term combinations as well as Boolean connectives relating the terms. Methods are covered to produce automatic query formulations both in a standard Boolean logic system, as well as in an extended Boolean system in which the strict interpretation of the connectives is relaxed. Experimental results are supplied to evaluate the effectiveness of the automatic query formulation process, and methods are described for applying the automatic query formulation process in practice.

  9. A database de-identification framework to enable direct queries on medical data for secondary use.

    PubMed

    Erdal, B S; Liu, J; Ding, J; Chen, J; Marsh, C B; Kamal, J; Clymer, B D

    2012-01-01

    To qualify the use of patient clinical records as non-human-subject for research purpose, electronic medical record data must be de-identified so there is minimum risk to protected health information exposure. This study demonstrated a robust framework for structured data de-identification that can be applied to any relational data source that needs to be de-identified. Using a real world clinical data warehouse, a pilot implementation of limited subject areas were used to demonstrate and evaluate this new de-identification process. Query results and performances are compared between source and target system to validate data accuracy and usability. The combination of hashing, pseudonyms, and session dependent randomizer provides a rigorous de-identification framework to guard against 1) source identifier exposure; 2) internal data analyst manually linking to source identifiers; and 3) identifier cross-link among different researchers or multiple query sessions by the same researcher. In addition, a query rejection option is provided to refuse queries resulting in less than preset numbers of subjects and total records to prevent users from accidental subject identification due to low volume of data. This framework does not prevent subject re-identification based on prior knowledge and sequence of events. Also, it does not deal with medical free text de-identification, although text de-identification using natural language processing can be included due its modular design. We demonstrated a framework resulting in HIPAA Compliant databases that can be directly queried by researchers. This technique can be augmented to facilitate inter-institutional research data sharing through existing middleware such as caGrid.

  10. A Prototype Digital Library for 3D Collections: Tools To Capture, Model, Analyze, and Query Complex 3D Data.

    ERIC Educational Resources Information Center

    Rowe, Jeremy; Razdan, Anshuman

    The Partnership for Research in Spatial Modeling (PRISM) project at Arizona State University (ASU) developed modeling and analytic tools to respond to the limitations of two-dimensional (2D) data representations perceived by affiliated discipline scientists, and to take advantage of the enhanced capabilities of three-dimensional (3D) data that…

  11. Comparison of Programs Used for FIA Inventory Information Dissemination and Spatial Representation

    Treesearch

    Roger C. Lowe; Chris J. Cieszewski

    2005-01-01

    Six online applications developed for the interactive display of Forest Inventory and Analysis (FIA) data in which FIA database information and query results can be viewed as or selected from interactive geographic maps are compared. The programs evaluated are the U.S. Department of Agriculture Forest Service?s online systems; a SAS server-based mapping system...

  12. A High Speed Mobile Courier Data Access System That Processes Database Queries in Real-Time

    NASA Astrophysics Data System (ADS)

    Gatsheni, Barnabas Ndlovu; Mabizela, Zwelakhe

    A secure high-speed query processing mobile courier data access (MCDA) system for a Courier Company has been developed. This system uses the wireless networks in combination with wired networks for updating a live database at the courier centre in real-time by an offsite worker (the Courier). The system is protected by VPN based on IPsec. There is no system that we know of to date that performs the task for the courier as proposed in this paper.

  13. Application of Remote Sensing and GIS in Landfill (waste Disposal) Site Selection and Environmental Impacts Assessment around Mysore City, Karnataka, India

    NASA Astrophysics Data System (ADS)

    Basavarajappa, T. H.

    2012-07-01

    Landfill site selection is a complex process involving geological, hydrological, environmental and technical parameters as well as government regulations. As such, it requires the processing of a good amount of geospatial data. Landfill site selection techniques have been analyzed for identifying their suitability. Application of Geographic Information System (GIS) is suitable to find best locations for such installations which use multiple criteria analysis. The use of Artificial intelligence methods, such as expert systems, can also be very helpful in solid waste planning and management. The waste disposal and its pollution around major cities in Karnataka are important problems affecting the environment. The Mysore is one of the major cities in Karnataka. The landfill site selection is the best way to control of pollution from any region. The main aim is to develop geographic information system to study the Landuse/ Landcover, natural drainage system, water bodies, and extents of villages around Mysore city, transportation, topography, geomorphology, lithology, structures, vegetation and forest information for landfill site selection. GIS combines spatial data (maps, aerial photographs, and satellite images) with quantitative, qualitative, and descriptive information database, which can support a wide range of spatial queries. For the Site Selection of an industrial waste and normal daily urban waste of a city town or a village, combining GIS with Analytical Hierarchy Process (AHP) will be more appropriate. This method is innovative because it establishes general indices to quantify overall environmental impact as well as individual indices for specific environmental components (i.e. surface water, groundwater, atmosphere, soil and human health). Since this method requires processing large quantities of spatial data. To automate the processes of establishing composite evaluation criteria, performing multiple criteria analysis and carrying out spatial clustering a suitable methodology was developed. The feasibility of site selection in the study area based on different criteria was used to obtain the layered data by integrating Remote Sensing and GIS. This methodology is suitable for all practical applications in other cities, also.

  14. Generating and Executing Complex Natural Language Queries across Linked Data.

    PubMed

    Hamon, Thierry; Mougin, Fleur; Grabar, Natalia

    2015-01-01

    With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, SPARQL language, and interfaces in Natural Language question-answering provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions in SPARQL queries. We use Natural Language Processing tools, semantic resources, and the RDF triples description. The method is designed on 50 questions over 3 biomedical knowledge bases, and evaluated on 27 questions. It achieves 0.78 F-measure on the test set. The method for translating natural language questions into SPARQL queries is implemented as Perl module available at http://search.cpan.org/ thhamon/RDF-NLP-SPARQLQuery.

  15. Optimizability of OGC Standards Implementations - a Case Study

    NASA Astrophysics Data System (ADS)

    Misev, D.; Baumann, P.

    2012-04-01

    Why do we shop at Amazon? Because they have a unique offering that is nowhere else available? Certainly not. Rather, Amazon offers (i) simple, yet effective search; (ii) very simple payment; (iii) extremely rapid delivery. This is how scientific services will be distinguished in future: not for their data holding (there will be manifold choice), but for their service quality. We are facing the transition from data stewardship to service stewardship. One of the OGC standards which particularly enables flexible retrieval is the Web Coverage Processing Service (WCPS). It defines a high-level query language on large, multi-dimensional raster data, such as 1D timeseries, 2D EO imagery, 3D x/y/t image time series and x/y/z geophysical data, 4D x/y/z/t climate and ocean data. We have implemented WCPS based on an Array Database Management System, rasdaman, which is available in open source. In this demonstration, we study WCPS queries on 2D, 3D, and 4D data sets. Particular emphasis is placed on the computational load queries generate in such on-demand processing and filtering. We look at different techniques and their impact on performance, such as adaptive storage partitioning, query rewriting, and just-in-time compilation. Results show that there is significant potential for effective server-side optimization once a query language is sufficiently high-level and declarative.

  16. GIS-based analysis and modelling with empirical and remotely-sensed data on coastline advance and retreat

    NASA Astrophysics Data System (ADS)

    Ahmad, Sajid Rashid

    With the understanding that far more research remains to be done on the development and use of innovative and functional geospatial techniques and procedures to investigate coastline changes this thesis focussed on the integration of remote sensing, geographical information systems (GIS) and modelling techniques to provide meaningful insights on the spatial and temporal dynamics of coastline changes. One of the unique strengths of this research was the parameterization of the GIS with long-term empirical and remote sensing data. Annual empirical data from 1941--2007 were analyzed by the GIS, and then modelled with statistical techniques. Data were also extracted from Landsat TM and ETM+ images. The band ratio method was used to extract the coastlines. Topographic maps were also used to extract digital map data. All data incorporated into ArcGIS 9.2 were analyzed with various modules, including Spatial Analyst, 3D Analyst, and Triangulated Irregular Networks. The Digital Shoreline Analysis System was used to analyze and predict rates of coastline change. GIS results showed the spatial locations along the coast that will either advance or retreat over time. The linear regression results highlighted temporal changes which are likely to occur along the coastline. Box-Jenkins modelling procedures were utilized to determine statistical models which best described the time series (1941--2007) of coastline change data. After several iterations and goodness-of-fit tests, second-order spatial cyclic autoregressive models, first-order autoregressive models and autoregressive moving average models were identified as being appropriate for describing the deterministic and random processes operating in Guyana's coastal system. The models highlighted not only cyclical patterns in advance and retreat of the coastline, but also the existence of short and long-term memory processes. Long-term memory processes could be associated with mudshoal propagation and stabilization while short-term memory processes were indicative of transitory hydrodynamic and other processes. An innovative framework for a spatio-temporal information-based system (STIBS) was developed. STIBS incorporated diverse datasets within a GIS, dynamic computer-based simulation models, and a spatial information query and graphical subsystem. Tests of the STIBS proved that it could be used to simulate and visualize temporal variability in shifting morphological states of the coastline.

  17. Image-Based Airborne LiDAR Point Cloud Encoding for 3d Building Model Retrieval

    NASA Astrophysics Data System (ADS)

    Chen, Yi-Chen; Lin, Chao-Hung

    2016-06-01

    With the development of Web 2.0 and cyber city modeling, an increasing number of 3D models have been available on web-based model-sharing platforms with many applications such as navigation, urban planning, and virtual reality. Based on the concept of data reuse, a 3D model retrieval system is proposed to retrieve building models similar to a user-specified query. The basic idea behind this system is to reuse these existing 3D building models instead of reconstruction from point clouds. To efficiently retrieve models, the models in databases are compactly encoded by using a shape descriptor generally. However, most of the geometric descriptors in related works are applied to polygonal models. In this study, the input query of the model retrieval system is a point cloud acquired by Light Detection and Ranging (LiDAR) systems because of the efficient scene scanning and spatial information collection. Using Point clouds with sparse, noisy, and incomplete sampling as input queries is more difficult than that by using 3D models. Because that the building roof is more informative than other parts in the airborne LiDAR point cloud, an image-based approach is proposed to encode both point clouds from input queries and 3D models in databases. The main goal of data encoding is that the models in the database and input point clouds can be consistently encoded. Firstly, top-view depth images of buildings are generated to represent the geometry surface of a building roof. Secondly, geometric features are extracted from depth images based on height, edge and plane of building. Finally, descriptors can be extracted by spatial histograms and used in 3D model retrieval system. For data retrieval, the models are retrieved by matching the encoding coefficients of point clouds and building models. In experiments, a database including about 900,000 3D models collected from the Internet is used for evaluation of data retrieval. The results of the proposed method show a clear superiority over related methods.

  18. Semantic integration of information about orthologs and diseases: the OGO system.

    PubMed

    Miñarro-Gimenez, Jose Antonio; Egaña Aranguren, Mikel; Martínez Béjar, Rodrigo; Fernández-Breis, Jesualdo Tomás; Madrid, Marisa

    2011-12-01

    Semantic Web technologies like RDF and OWL are currently applied in life sciences to improve knowledge management by integrating disparate information. Many of the systems that perform such task, however, only offer a SPARQL query interface, which is difficult to use for life scientists. We present the OGO system, which consists of a knowledge base that integrates information of orthologous sequences and genetic diseases, providing an easy to use ontology-constrain driven query interface. Such interface allows the users to define SPARQL queries through a graphical process, therefore not requiring SPARQL expertise. Copyright © 2011 Elsevier Inc. All rights reserved.

  19. Advances in nowcasting influenza-like illness rates using search query logs

    NASA Astrophysics Data System (ADS)

    Lampos, Vasileios; Miller, Andrew C.; Crossan, Steve; Stefansen, Christian

    2015-08-01

    User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.

  20. Measuring Up: Implementing a Dental Quality Measure in the Electronic Health Record Context

    PubMed Central

    Bhardwaj, Aarti; Ramoni, Rachel; Kalenderian, Elsbeth; Neumann, Ana; Hebballi, Nutan B; White, Joel M; McClellan, Lyle; Walji, Muhammad F

    2015-01-01

    Background Quality improvement requires quality measures that are validly implementable. In this work, we assessed the feasibility and performance of an automated electronic Meaningful Use dental clinical quality measure (percentage of children who received fluoride varnish). Methods We defined how to implement the automated measure queries in a dental electronic health record (EHR). Within records identified through automated query, we manually reviewed a subsample to assess the performance of the query. Results The automated query found 71.0% of patients to have had fluoride varnish compared to 77.6% found using the manual chart review. The automated quality measure performance was 90.5% sensitivity, 90.8% specificity, 96.9% positive predictive value, and 75.2% negative predictive value. Conclusions Our findings support the feasibility of automated dental quality measure queries in the context of sufficient structured data. Information noted only in the free text rather than in structured data would require natural language processing approaches to effectively query. Practical Implications To participate in self-directed quality improvement, dental clinicians must embrace the accountability era. Commitment to quality will require enhanced documentation in order to support near-term automated calculation of quality measures. PMID:26562736

  1. Analytics-Driven Lossless Data Compression for Rapid In-situ Indexing, Storing, and Querying

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jenkins, John; Arkatkar, Isha; Lakshminarasimhan, Sriram

    2013-01-01

    The analysis of scientific simulations is highly data-intensive and is becoming an increasingly important challenge. Peta-scale data sets require the use of light-weight query-driven analysis methods, as opposed to heavy-weight schemes that optimize for speed at the expense of size. This paper is an attempt in the direction of query processing over losslessly compressed scientific data. We propose a co-designed double-precision compression and indexing methodology for range queries by performing unique-value-based binning on the most significant bytes of double precision data (sign, exponent, and most significant mantissa bits), and inverting the resulting metadata to produce an inverted index over amore » reduced data representation. Without the inverted index, our method matches or improves compression ratios over both general-purpose and floating-point compression utilities. The inverted index is light-weight, and the overall storage requirement for both reduced column and index is less than 135%, whereas existing DBMS technologies can require 200-400%. As a proof-of-concept, we evaluate univariate range queries that additionally return column values, a critical component of data analytics, against state-of-the-art bitmap indexing technology, showing multi-fold query performance improvements.« less

  2. Advances in nowcasting influenza-like illness rates using search query logs.

    PubMed

    Lampos, Vasileios; Miller, Andrew C; Crossan, Steve; Stefansen, Christian

    2015-08-03

    User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.

  3. Comparing NetCDF and SciDB on managing and querying 5D hydrologic dataset

    NASA Astrophysics Data System (ADS)

    Liu, Haicheng; Xiao, Xiao

    2016-11-01

    Efficiently extracting information from high dimensional hydro-meteorological modelling datasets requires smart solutions. Traditional methods are mostly based on files, which can be edited and accessed handily. But they have problems of efficiency due to contiguous storage structure. Others propose databases as an alternative for advantages such as native functionalities for manipulating multidimensional (MD) arrays, smart caching strategy and scalability. In this research, NetCDF file based solutions and the multidimensional array database management system (DBMS) SciDB applying chunked storage structure are benchmarked to determine the best solution for storing and querying 5D large hydrologic modelling dataset. The effect of data storage configurations including chunk size, dimension order and compression on query performance is explored. Results indicate that dimension order to organize storage of 5D data has significant influence on query performance if chunk size is very large. But the effect becomes insignificant when chunk size is properly set. Compression of SciDB mostly has negative influence on query performance. Caching is an advantage but may be influenced by execution of different query processes. On the whole, NetCDF solution without compression is in general more efficient than the SciDB DBMS.

  4. Shuttle-Data-Tape XML Translator

    NASA Technical Reports Server (NTRS)

    Barry, Matthew R.; Osborne, Richard N.

    2005-01-01

    JSDTImport is a computer program for translating native Shuttle Data Tape (SDT) files from American Standard Code for Information Interchange (ASCII) format into databases in other formats. JSDTImport solves the problem of organizing the SDT content, affording flexibility to enable users to choose how to store the information in a database to better support client and server applications. JSDTImport can be dynamically configured by use of a simple Extensible Markup Language (XML) file. JSDTImport uses this XML file to define how each record and field will be parsed, its layout and definition, and how the resulting database will be structured. JSDTImport also includes a client application programming interface (API) layer that provides abstraction for the data-querying process. The API enables a user to specify the search criteria to apply in gathering all the data relevant to a query. The API can be used to organize the SDT content and translate into a native XML database. The XML format is structured into efficient sections, enabling excellent query performance by use of the XPath query language. Optionally, the content can be translated into a Structured Query Language (SQL) database for fast, reliable SQL queries on standard database server computers.

  5. Scalable and responsive event processing in the cloud

    PubMed Central

    Suresh, Visalakshmi; Ezhilchelvan, Paul; Watson, Paul

    2013-01-01

    Event processing involves continuous evaluation of queries over streams of events. Response-time optimization is traditionally done over a fixed set of nodes and/or by using metrics measured at query-operator levels. Cloud computing makes it easy to acquire and release computing nodes as required. Leveraging this flexibility, we propose a novel, queueing-theory-based approach for meeting specified response-time targets against fluctuating event arrival rates by drawing only the necessary amount of computing resources from a cloud platform. In the proposed approach, the entire processing engine of a distinct query is modelled as an atomic unit for predicting response times. Several such units hosted on a single node are modelled as a multiple class M/G/1 system. These aspects eliminate intrusive, low-level performance measurements at run-time, and also offer portability and scalability. Using model-based predictions, cloud resources are efficiently used to meet response-time targets. The efficacy of the approach is demonstrated through cloud-based experiments. PMID:23230164

  6. RIMS: An Integrated Mapping and Analysis System with Applications to Earth Sciences and Hydrology

    NASA Astrophysics Data System (ADS)

    Proussevitch, A. A.; Glidden, S.; Shiklomanov, A. I.; Lammers, R. B.

    2011-12-01

    A web-based information and computational system for analysis of spatially distributed Earth system, climate, and hydrologic data have been developed. The System allows visualization, data exploration, querying, manipulation and arbitrary calculations with any loaded gridded or vector polygon dataset. The system's acronym, RIMS, stands for its core functionality as a Rapid Integrated Mapping System. The system can be deployed for a Global scale projects as well as for regional hydrology and climatology studies. In particular, the Water Systems Analysis Group of the University of New Hampshire developed the global and regional (Northern Eurasia, pan-Arctic) versions of the system with different map projections and specific data. The system has demonstrated its potential for applications in other fields of Earth sciences and education. The key Web server/client components of the framework include (a) a visualization engine built on Open Source libraries (GDAL, PROJ.4, etc.) that are utilized in a MapServer; (b) multi-level data querying tools built on XML server-client communication protocols that allow downloading map data on-the-fly to a client web browser; and (c) data manipulation and grid cell level calculation tools that mimic desktop GIS software functionality via a web interface. Server side data management of the system is designed around a simple database of dataset metadata facilitating mounting of new data to the system and maintaining existing data in an easy manner. RIMS contains "built-in" river network data that allows for query of upstream areas on-demand which can be used for spatial data aggregation and analysis of sub-basin areas. RIMS is an ongoing effort and currently being used to serve a number of websites hosting a suite of hydrologic, environmental and other GIS data.

  7. The Ned IIS project - forest ecosystem management

    Treesearch

    W. Potter; D. Nute; J. Wang; F. Maier; Michael Twery; H. Michael Rauscher; P. Knopp; S. Thomasma; M. Dass; H. Uchiyama

    2002-01-01

    For many years we have held to the notion that an Intelligent Information System (IIS) is composed of a unified knowledge base, database, and model base. The main idea behind this notion is the transparent processing of user queries. The system is responsible for "deciding" which information sources to access in order to fulfil a query regardless of whether...

  8. The Effectiveness of Stemming for Natural-Language Access to Slovene Textual Data.

    ERIC Educational Resources Information Center

    Popovic, Mirko; Willett, Peter

    1992-01-01

    Reports on the use of stemming for Slovene language documents and queries in free-text retrieval systems and demonstrates that an appropriate stemming algorithm results in an increase in retrieval effectiveness when compared with nonstemming processing. A comparison is made with stemming of English versions of the same documents and queries. (24…

  9. Finding Relevant Data in a Sea of Languages

    DTIC Science & Technology

    2016-04-26

    full machine-translated text , unbiased word clouds , query-biased word clouds , and query-biased sentence...and information retrieval to automate language processing tasks so that the limited number of linguists available for analyzing text and spoken...the crime (stock market). The Cross-LAnguage Search Engine (CLASE) has already preprocessed the documents, extracting text to identify the language

  10. A distributed query execution engine of big attributed graphs.

    PubMed

    Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif

    2016-01-01

    A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the graph vertices and edges need to be associated with descriptive attributes. Such type of graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations including reachability, pattern matching and shortest path where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation in addition to the performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine of G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes while the graph data are maintained in a relational store which is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries which are pushed to the underlying relational stores while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and the scalability of DG-SPARQL on querying massive attributed graph datasets in addition to its ability to outperform the performance of Apache Giraph, a popular distributed graph processing system, by orders of magnitudes.

  11. Implementing and evaluating a regional strategy to improve testing rates in VA patients at risk for HIV, utilizing the QUERI process as a guiding framework: QUERI Series.

    PubMed

    Goetz, Matthew B; Bowman, Candice; Hoang, Tuyen; Anaya, Henry; Osborn, Teresa; Gifford, Allen L; Asch, Steven M

    2008-03-19

    We describe how we used the framework of the U.S. Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) to develop a program to improve rates of diagnostic testing for the Human Immunodeficiency Virus (HIV). This venture was prompted by the observation by the CDC that 25% of HIV-infected patients do not know their diagnosis - a point of substantial importance to the VA, which is the largest provider of HIV care in the United States. Following the QUERI steps (or process), we evaluated: 1) whether undiagnosed HIV infection is a high-risk, high-volume clinical issue within the VA, 2) whether there are evidence-based recommendations for HIV testing, 3) whether there are gaps in the performance of VA HIV testing, and 4) the barriers and facilitators to improving current practice in the VA.Based on our findings, we developed and initiated a QUERI step 4/phase 1 pilot project using the precepts of the Chronic Care Model. Our improvement strategy relies upon electronic clinical reminders to provide decision support; audit/feedback as a clinical information system, and appropriate changes in delivery system design. These activities are complemented by academic detailing and social marketing interventions to achieve provider activation. Our preliminary formative evaluation indicates the need to ensure leadership and team buy-in, address facility-specific barriers, refine the reminder, and address factors that contribute to inter-clinic variances in HIV testing rates. Preliminary unadjusted data from the first seven months of our program show 3-5 fold increases in the proportion of at-risk patients who are offered HIV testing at the VA sites (stations) where the pilot project has been undertaken; no change was seen at control stations. This project demonstrates the early success of the application of the QUERI process to the development of a program to improve HIV testing rates. Preliminary unadjusted results show that the coordinated use of audit/feedback, provider activation, and organizational change can increase HIV testing rates for at-risk patients. We are refining our program prior to extending our work to a small-scale, multi-site evaluation (QUERI step 4/phase 2). We also plan to evaluate the durability/sustainability of the intervention effect, the costs of HIV testing, and the number of newly identified HIV-infected patients. Ultimately, we will evaluate this program in other geographically dispersed stations (QUERI step 4/phases 3 and 4).

  12. Implementing and evaluating a regional strategy to improve testing rates in VA patients at risk for HIV, utilizing the QUERI process as a guiding framework: QUERI Series

    PubMed Central

    Goetz, Matthew B; Bowman, Candice; Hoang, Tuyen; Anaya, Henry; Osborn, Teresa; Gifford, Allen L; Asch, Steven M

    2008-01-01

    Background We describe how we used the framework of the U.S. Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) to develop a program to improve rates of diagnostic testing for the Human Immunodeficiency Virus (HIV). This venture was prompted by the observation by the CDC that 25% of HIV-infected patients do not know their diagnosis – a point of substantial importance to the VA, which is the largest provider of HIV care in the United States. Methods Following the QUERI steps (or process), we evaluated: 1) whether undiagnosed HIV infection is a high-risk, high-volume clinical issue within the VA, 2) whether there are evidence-based recommendations for HIV testing, 3) whether there are gaps in the performance of VA HIV testing, and 4) the barriers and facilitators to improving current practice in the VA. Based on our findings, we developed and initiated a QUERI step 4/phase 1 pilot project using the precepts of the Chronic Care Model. Our improvement strategy relies upon electronic clinical reminders to provide decision support; audit/feedback as a clinical information system, and appropriate changes in delivery system design. These activities are complemented by academic detailing and social marketing interventions to achieve provider activation. Results Our preliminary formative evaluation indicates the need to ensure leadership and team buy-in, address facility-specific barriers, refine the reminder, and address factors that contribute to inter-clinic variances in HIV testing rates. Preliminary unadjusted data from the first seven months of our program show 3–5 fold increases in the proportion of at-risk patients who are offered HIV testing at the VA sites (stations) where the pilot project has been undertaken; no change was seen at control stations. Discussion This project demonstrates the early success of the application of the QUERI process to the development of a program to improve HIV testing rates. Preliminary unadjusted results show that the coordinated use of audit/feedback, provider activation, and organizational change can increase HIV testing rates for at-risk patients. We are refining our program prior to extending our work to a small-scale, multi-site evaluation (QUERI step 4/phase 2). We also plan to evaluate the durability/sustainability of the intervention effect, the costs of HIV testing, and the number of newly identified HIV-infected patients. Ultimately, we will evaluate this program in other geographically dispersed stations (QUERI step 4/phases 3 and 4). PMID:18353185

  13. GO2PUB: Querying PubMed with semantic expansion of gene ontology terms

    PubMed Central

    2012-01-01

    Background With the development of high throughput methods of gene analyses, there is a growing need for mining tools to retrieve relevant articles in PubMed. As PubMed grows, literature searches become more complex and time-consuming. Automated search tools with good precision and recall are necessary. We developed GO2PUB to automatically enrich PubMed queries with gene names, symbols and synonyms annotated by a GO term of interest or one of its descendants. Results GO2PUB enriches PubMed queries based on selected GO terms and keywords. It processes the result and displays the PMID, title, authors, abstract and bibliographic references of the articles. Gene names, symbols and synonyms that have been generated as extra keywords from the GO terms are also highlighted. GO2PUB is based on a semantic expansion of PubMed queries using the semantic inheritance between terms through the GO graph. Two experts manually assessed the relevance of GO2PUB, GoPubMed and PubMed on three queries about lipid metabolism. Experts’ agreement was high (kappa = 0.88). GO2PUB returned 69% of the relevant articles, GoPubMed: 40% and PubMed: 29%. GO2PUB and GoPubMed have 17% of their results in common, corresponding to 24% of the total number of relevant results. 70% of the articles returned by more than one tool were relevant. 36% of the relevant articles were returned only by GO2PUB, 17% only by GoPubMed and 14% only by PubMed. For determining whether these results can be generalized, we generated twenty queries based on random GO terms with a granularity similar to those of the first three queries and compared the proportions of GO2PUB and GoPubMed results. These were respectively of 77% and 40% for the first queries, and of 70% and 38% for the random queries. The two experts also assessed the relevance of seven of the twenty queries (the three related to lipid metabolism and four related to other domains). Expert agreement was high (0.93 and 0.8). GO2PUB and GoPubMed performances were similar to those of the first queries. Conclusions We demonstrated that the use of genes annotated by either GO terms of interest or a descendant of these GO terms yields some relevant articles ignored by other tools. The comparison of GO2PUB, based on semantic expansion, with GoPubMed, based on text mining techniques, showed that both tools are complementary. The analysis of the randomly-generated queries suggests that the results obtained about lipid metabolism can be generalized to other biological processes. GO2PUB is available at http://go2pub.genouest.org. PMID:22958570

  14. Spatial cyberinfrastructures, ontologies, and the humanities

    PubMed Central

    Sieber, Renee E.; Wellen, Christopher C.; Jin, Yuan

    2011-01-01

    We report on research into building a cyberinfrastructure for Chinese biographical and geographic data. Our cyberinfrastructure contains (i) the McGill-Harvard-Yenching Library Ming Qing Women's Writings database (MQWW), the only online database on historical Chinese women's writings, (ii) the China Biographical Database, the authority for Chinese historical people, and (iii) the China Historical Geographical Information System, one of the first historical geographic information systems. Key to this integration is that linked databases retain separate identities as bases of knowledge, while they possess sufficient semantic interoperability to allow for multidatabase concepts and to support cross-database queries on an ad hoc basis. Computational ontologies create underlying semantics for database access. This paper focuses on the spatial component in a humanities cyberinfrastructure, which includes issues of conflicting data, heterogeneous data models, disambiguation, and geographic scale. First, we describe the methodology for integrating the databases. Then we detail the system architecture, which includes a tier of ontologies and schema. We describe the user interface and applications that allow for cross-database queries. For instance, users should be able to analyze the data, examine hypotheses on spatial and temporal relationships, and generate historical maps with datasets from MQWW for research, teaching, and publication on Chinese women writers, their familial relations, publishing venues, and the literary and social communities. Last, we discuss the social side of cyberinfrastructure development, as people are considered to be as critical as the technical components for its success. PMID:21444819

  15. Geologic database for digital geology of California, Nevada, and Utah: an application of the North American Data Model

    USGS Publications Warehouse

    Bedford, David R.; Ludington, Steve; Nutt, Constance M.; Stone, Paul A.; Miller, David M.; Miller, Robert J.; Wagner, David L.; Saucedo, George J.

    2003-01-01

    The USGS is creating an integrated national database for digital state geologic maps that includes stratigraphic, age, and lithologic information. The majority of the conterminous 48 states have digital geologic base maps available, often at scales of 1:500,000. This product is a prototype, and is intended to demonstrate the types of derivative maps that will be possible with the national integrated database. This database permits the creation of a number of types of maps via simple or sophisticated queries, maps that may be useful in a number of areas, including mineral-resource assessment, environmental assessment, and regional tectonic evolution. This database is distributed with three main parts: a Microsoft Access 2000 database containing geologic map attribute data, an Arc/Info (Environmental Systems Research Institute, Redlands, California) Export format file containing points representing designation of stratigraphic regions for the Geologic Map of Utah, and an ArcView 3.2 (Environmental Systems Research Institute, Redlands, California) project containing scripts and dialogs for performing a series of generalization and mineral resource queries. IMPORTANT NOTE: Spatial data for the respective stage geologic maps is not distributed with this report. The digital state geologic maps for the states involved in this report are separate products, and two of them are produced by individual state agencies, which may be legally and/or financially responsible for this data. However, the spatial datasets for maps discussed in this report are available to the public. Questions regarding the distribution, sale, and use of individual state geologic maps should be sent to the respective state agency. We do provide suggestions for obtaining and formatting the spatial data to make it compatible with data in this report. See section ‘Obtaining and Formatting Spatial Data’ in the PDF version of the report.

  16. Classification of Automated Search Traffic

    NASA Astrophysics Data System (ADS)

    Buehrer, Greg; Stokes, Jack W.; Chellapilla, Kumar; Platt, John C.

    As web search providers seek to improve both relevance and response times, they are challenged by the ever-increasing tax of automated search query traffic. Third party systems interact with search engines for a variety of reasons, such as monitoring a web site’s rank, augmenting online games, or possibly to maliciously alter click-through rates. In this paper, we investigate automated traffic (sometimes referred to as bot traffic) in the query stream of a large search engine provider. We define automated traffic as any search query not generated by a human in real time. We first provide examples of different categories of query logs generated by automated means. We then develop many different features that distinguish between queries generated by people searching for information, and those generated by automated processes. We categorize these features into two classes, either an interpretation of the physical model of human interactions, or as behavioral patterns of automated interactions. Using the these detection features, we next classify the query stream using multiple binary classifiers. In addition, a multiclass classifier is then developed to identify subclasses of both normal and automated traffic. An active learning algorithm is used to suggest which user sessions to label to improve the accuracy of the multiclass classifier, while also seeking to discover new classes of automated traffic. Performance analysis are then provided. Finally, the multiclass classifier is used to predict the subclass distribution for the search query stream.

  17. Situation awareness acquired from monitoring process plants - the Process Overview concept and measure.

    PubMed

    Lau, Nathan; Jamieson, Greg A; Skraaning, Gyrd

    2016-07-01

    We introduce Process Overview, a situation awareness characterisation of the knowledge derived from monitoring process plants. Process Overview is based on observational studies of process control work in the literature. The characterisation is applied to develop a query-based measure called the Process Overview Measure. The goal of the measure is to improve coupling between situation and awareness according to process plant properties and operator cognitive work. A companion article presents the empirical evaluation of the Process Overview Measure in a realistic process control setting. The Process Overview Measure demonstrated sensitivity and validity by revealing significant effects of experimental manipulations that corroborated with other empirical results. The measure also demonstrated adequate inter-rater reliability and practicality for measuring SA based on data collected by process experts. Practitioner Summary: The Process Overview Measure is a query-based measure for assessing operator situation awareness from monitoring process plants in representative settings.

  18. Guided Iterative Substructure Search (GI-SSS) - A New Trick for an Old Dog.

    PubMed

    Weskamp, Nils

    2016-07-01

    Substructure search (SSS) is a fundamental technique supported by various chemical information systems. Many users apply it in an iterative manner: they modify their queries to shape the composition of the retrieved hit sets according to their needs. We propose and evaluate two heuristic extensions of SSS aimed at simplifying these iterative query modifications by collecting additional information during query processing and visualizing this information in an intuitive way. This gives the user a convenient feedback on how certain changes to the query would affect the retrieved hit set and reduces the number of trial-and-error cycles needed to generate an optimal search result. The proposed heuristics are simple, yet surprisingly effective and can be easily added to existing SSS implementations. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. DREAM: Classification scheme for dialog acts in clinical research query mediation.

    PubMed

    Hoxha, Julia; Chandar, Praveen; He, Zhe; Cimino, James; Hanauer, David; Weng, Chunhua

    2016-02-01

    Clinical data access involves complex but opaque communication between medical researchers and query analysts. Understanding such communication is indispensable for designing intelligent human-machine dialog systems that automate query formulation. This study investigates email communication and proposes a novel scheme for classifying dialog acts in clinical research query mediation. We analyzed 315 email messages exchanged in the communication for 20 data requests obtained from three institutions. The messages were segmented into 1333 utterance units. Through a rigorous process, we developed a classification scheme and applied it for dialog act annotation of the extracted utterances. Evaluation results with high inter-annotator agreement demonstrate the reliability of this scheme. This dataset is used to contribute preliminary understanding of dialog acts distribution and conversation flow in this dialog space. Copyright © 2015 Elsevier Inc. All rights reserved.

  20. Device-independent quantum private query

    NASA Astrophysics Data System (ADS)

    Maitra, Arpita; Paul, Goutam; Roy, Sarbani

    2017-04-01

    In quantum private query (QPQ), a client obtains values corresponding to his or her query only, and nothing else from the server, and the server does not get any information about the queries. V. Giovannetti et al. [Phys. Rev. Lett. 100, 230502 (2008)], 10.1103/PhysRevLett.100.230502 gave the first QPQ protocol and since then quite a few variants and extensions have been proposed. However, none of the existing protocols are device independent; i.e., all of them assume implicitly that the entangled states supplied to the client and the server are of a certain form. In this work, we exploit the idea of a local CHSH game and connect it with the scheme of Y. G. Yang et al. [Quantum Info. Process. 13, 805 (2014)], 10.1007/s11128-013-0692-8 to present the concept of a device-independent QPQ protocol.

  1. Seismic Search Engine: A distributed database for mining large scale seismic data

    NASA Astrophysics Data System (ADS)

    Liu, Y.; Vaidya, S.; Kuzma, H. A.

    2009-12-01

    The International Monitoring System (IMS) of the CTBTO collects terabytes worth of seismic measurements from many receiver stations situated around the earth with the goal of detecting underground nuclear testing events and distinguishing them from other benign, but more common events such as earthquakes and mine blasts. The International Data Center (IDC) processes and analyzes these measurements, as they are collected by the IMS, to summarize event detections in daily bulletins. Thereafter, the data measurements are archived into a large format database. Our proposed Seismic Search Engine (SSE) will facilitate a framework for data exploration of the seismic database as well as the development of seismic data mining algorithms. Analogous to GenBank, the annotated genetic sequence database maintained by NIH, through SSE, we intend to provide public access to seismic data and a set of processing and analysis tools, along with community-generated annotations and statistical models to help interpret the data. SSE will implement queries as user-defined functions composed from standard tools and models. Each query is compiled and executed over the database internally before reporting results back to the user. Since queries are expressed with standard tools and models, users can easily reproduce published results within this framework for peer-review and making metric comparisons. As an illustration, an example query is “what are the best receiver stations in East Asia for detecting events in the Middle East?” Evaluating this query involves listing all receiver stations in East Asia, characterizing known seismic events in that region, and constructing a profile for each receiver station to determine how effective its measurements are at predicting each event. The results of this query can be used to help prioritize how data is collected, identify defective instruments, and guide future sensor placements.

  2. Selecting materialized views using random algorithm

    NASA Astrophysics Data System (ADS)

    Zhou, Lijuan; Hao, Zhongxiao; Liu, Chi

    2007-04-01

    The data warehouse is a repository of information collected from multiple possibly heterogeneous autonomous distributed databases. The information stored at the data warehouse is in form of views referred to as materialized views. The selection of the materialized views is one of the most important decisions in designing a data warehouse. Materialized views are stored in the data warehouse for the purpose of efficiently implementing on-line analytical processing queries. The first issue for the user to consider is query response time. So in this paper, we develop algorithms to select a set of views to materialize in data warehouse in order to minimize the total view maintenance cost under the constraint of a given query response time. We call it query_cost view_ selection problem. First, cost graph and cost model of query_cost view_ selection problem are presented. Second, the methods for selecting materialized views by using random algorithms are presented. The genetic algorithm is applied to the materialized views selection problem. But with the development of genetic process, the legal solution produced become more and more difficult, so a lot of solutions are eliminated and producing time of the solutions is lengthened in genetic algorithm. Therefore, improved algorithm has been presented in this paper, which is the combination of simulated annealing algorithm and genetic algorithm for the purpose of solving the query cost view selection problem. Finally, in order to test the function and efficiency of our algorithms experiment simulation is adopted. The experiments show that the given methods can provide near-optimal solutions in limited time and works better in practical cases. Randomized algorithms will become invaluable tools for data warehouse evolution.

  3. Footprint Representation of Planetary Remote Sensing Data

    NASA Astrophysics Data System (ADS)

    Walter, S. H. G.; Gasselt, S. V.; Michael, G.; Neukum, G.

    The geometric outline of remote sensing image data, the so called footprint, can be represented as a number of coordinate tuples. These polygons are associated with according attribute information such as orbit name, ground- and image resolution, solar longitude and illumination conditions to generate a powerful base for classification of planetary experiment data. Speed, handling and extended capabilites are the reasons for using geodatabases to store and access these data types. Techniques for such a spatial database of footprint data are demonstrated using the Relational Database Management System (RDBMS) PostgreSQL, spatially enabled by the PostGIS extension. Exemplary, footprints of the HRSC and OMEGA instruments, both onboard ESA's Mars Express Orbiter, are generated and connected to attribute information. The aim is to provide high-resolution footprints of the OMEGA instrument to the science community for the first time and make them available for web-based mapping applications like the "Planetary Interactive GIS-on-the-Web Analyzable Database" (PIG- WAD), produced by the USGS. Map overlays with HRSC or other instruments like MOC and THEMIS (footprint maps are already available for these instruments and can be integrated into the database) allow on-the-fly intersection and comparison as well as extended statistics of the data. Footprint polygons are generated one by one using standard software provided by the instrument teams. Attribute data is calculated and stored together with the geometric information. In the case of HRSC, the coordinates of the footprints are already available in the VICAR label of each image file. Using the VICAR RTL and PostgreSQL's libpq C library they are loaded into the database using the Well-Known Text (WKT) notation by the Open Geospatial Consortium, Inc. (OGC). For the OMEGA instrument, image data is read using IDL routines developed and distributed by the OMEGA team. Image outlines are exported together with relevant attribute data to the industry standard Shapefile format. These files are translated to a Structured Query Language (SQL) command sequence suitable for insertion into the PostGIS/PostgrSQL database using the shp2pgsql data loader provided by the PostGIS software. PostgreSQL's advanced features such as geometry types, rules, operators and functions allow complex spatial queries and on-the-fly processing of data on DBMS level e.g. generalisation of the outlines. Processing done by the DBMS, visualisation via GIS systems and utilisation for web-based applications like mapservers will be demonstrated.

  4. New Capabilities in the Astrophysics Multispectral Archive Search Engine

    NASA Astrophysics Data System (ADS)

    Cheung, C. Y.; Kelley, S.; Roussopoulos, N.

    The Astrophysics Multispectral Archive Search Engine (AMASE) uses object-oriented database techniques to provide a uniform multi-mission and multi-spectral interface to search for data in the distributed archives. We describe our experience of porting AMASE from Illustra object-relational DBMS to the Informix Universal Data Server. New capabilities and utilities have been developed, including a spatial datablade that supports Nearest Neighbor queries.

  5. The PANTHER User Experience

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Coram, Jamie L.; Morrow, James D.; Perkins, David Nikolaus

    2015-09-01

    This document describes the PANTHER R&D Application, a proof-of-concept user interface application developed under the PANTHER Grand Challenge LDRD. The purpose of the application is to explore interaction models for graph analytics, drive algorithmic improvements from an end-user point of view, and support demonstration of PANTHER technologies to potential customers. The R&D Application implements a graph-centric interaction model that exposes analysts to the algorithms contained within the GeoGraphy graph analytics library. Users define geospatial-temporal semantic graph queries by constructing search templates based on nodes, edges, and the constraints among them. Users then analyze the results of the queries using bothmore » geo-spatial and temporal visualizations. Development of this application has made user experience an explicit driver for project and algorithmic level decisions that will affect how analysts one day make use of PANTHER technologies.« less

  6. Mining Claim Activity on Federal Land for the Period 1976 through 2003

    USGS Publications Warehouse

    Causey, J. Douglas

    2005-01-01

    Previous reports on mining claim records provided information and statistics (number of claims) using data from the U.S. Bureau of Land Management's (BLM) Mining Claim Recordation System. Since that time, BLM converted their mining claim data to the Legacy Repost 2000 system (LR2000). This report describes a process to extract similar statistical data about mining claims from LR2000 data using different software and procedures than were used in the earlier work. A major difference between this process and the previous work is that every section that has a mining claim record is assigned a value. This is done by proportioning a claim between each section in which it is recorded. Also, the mining claim data in this report includes all BLM records, not just the western states. LR2000 mining claim database tables for the United States were provided by BLM in text format and imported into a Microsoft? Access2000 database in January, 2004. Data from two tables in the BLM LR2000 database were summarized through a series of database queries to determine a number that represents active mining claims in each Public Land Survey (PLS) section for each of the years from 1976 to 2002. For most of the area, spatial databases are also provided. The spatial databases are only configured to work with the statistics provided in the non-spatial data files. They are suitable for geographic information system (GIS)-based regional assessments at a scale of 1:100,000 or smaller (for example, 1:250,000).

  7. Federated queries of clinical data repositories: the sum of the parts does not equal the whole

    PubMed Central

    Weber, Griffin M

    2013-01-01

    Background and objective In 2008 we developed a shared health research information network (SHRINE), which for the first time enabled research queries across the full patient populations of four Boston hospitals. It uses a federated architecture, where each hospital returns only the aggregate count of the number of patients who match a query. This allows hospitals to retain control over their local databases and comply with federal and state privacy laws. However, because patients may receive care from multiple hospitals, the result of a federated query might differ from what the result would be if the query were run against a single central repository. This paper describes the situations when this happens and presents a technique for correcting these errors. Methods We use a one-time process of identifying which patients have data in multiple repositories by comparing one-way hash values of patient demographics. This enables us to partition the local databases such that all patients within a given partition have data at the same subset of hospitals. Federated queries are then run separately on each partition independently, and the combined results are presented to the user. Results Using theoretical bounds and simulated hospital networks, we demonstrate that once the partitions are made, SHRINE can produce more precise estimates of the number of patients matching a query. Conclusions Uncertainty in the overlap of patient populations across hospitals limits the effectiveness of SHRINE and other federated query tools. Our technique reduces this uncertainty while retaining an aggregate federated architecture. PMID:23349080

  8. 41. DISCOVERY, SEARCH, AND COMMUNICATION OF TEXTUAL KNOWLEDGE RESOURCES IN DISTRIBUTED SYSTEMS a. Discovering and Utilizing Knowledge Sources for Metasearch Knowledge Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zamora, Antonio

    Advanced Natural Language Processing Tools for Web Information Retrieval, Content Analysis, and Synthesis. The goal of this SBIR was to implement and evaluate several advanced Natural Language Processing (NLP) tools and techniques to enhance the precision and relevance of search results by analyzing and augmenting search queries and by helping to organize the search output obtained from heterogeneous databases and web pages containing textual information of interest to DOE and the scientific-technical user communities in general. The SBIR investigated 1) the incorporation of spelling checkers in search applications, 2) identification of significant phrases and concepts using a combination of linguisticmore » and statistical techniques, and 3) enhancement of the query interface and search retrieval results through the use of semantic resources, such as thesauri. A search program with a flexible query interface was developed to search reference databases with the objective of enhancing search results from web queries or queries of specialized search systems such as DOE's Information Bridge. The DOE ETDE/INIS Joint Thesaurus was processed to create a searchable database. Term frequencies and term co-occurrences were used to enhance the web information retrieval by providing algorithmically-derived objective criteria to organize relevant documents into clusters containing significant terms. A thesaurus provides an authoritative overview and classification of a field of knowledge. By organizing the results of a search using the thesaurus terminology, the output is more meaningful than when the results are just organized based on the terms that co-occur in the retrieved documents, some of which may not be significant. An attempt was made to take advantage of the hierarchy provided by broader and narrower terms, as well as other field-specific information in the thesauri. The search program uses linguistic morphological routines to find relevant entries regardless of whether terms are stored in singular or plural form. Implementation of additional inflectional morphology processes for verbs can enhance retrieval further, but this has to be balanced by the possibility of broadening the results too much. In addition to the DOE energy thesaurus, other sources of specialized organized knowledge such as the Medical Subject Headings (MeSH), the Unified Medical Language System (UMLS), and Wikipedia were investigated. The supporting role of the NLP thesaurus search program was enhanced by incorporating spelling aid and a part-of-speech tagger to cope with misspellings in the queries and to determine the grammatical roles of the query words and identify nouns for special processing. To improve precision, multiple modes of searching were implemented including Boolean operators, and field-specific searches. Programs to convert a thesaurus or reference file into searchable support files can be deployed easily, and the resulting files are immediately searchable to produce relevance-ranked results with builtin spelling aid, morphological processing, and advanced search logic. Demonstration systems were built for several databases, including the DOE energy thesaurus.« less

  9. KA-SB: from data integration to large scale reasoning

    PubMed Central

    Roldán-García, María del Mar; Navas-Delgado, Ismael; Kerzazi, Amine; Chniber, Othmane; Molina-Castro, Joaquín; Aldana-Montes, José F

    2009-01-01

    Background The analysis of information in the biological domain is usually focused on the analysis of data from single on-line data sources. Unfortunately, studying a biological process requires having access to disperse, heterogeneous, autonomous data sources. In this context, an analysis of the information is not possible without the integration of such data. Methods KA-SB is a querying and analysis system for final users based on combining a data integration solution with a reasoner. Thus, the tool has been created with a process divided into two steps: 1) KOMF, the Khaos Ontology-based Mediator Framework, is used to retrieve information from heterogeneous and distributed databases; 2) the integrated information is crystallized in a (persistent and high performance) reasoner (DBOWL). This information could be further analyzed later (by means of querying and reasoning). Results In this paper we present a novel system that combines the use of a mediation system with the reasoning capabilities of a large scale reasoner to provide a way of finding new knowledge and of analyzing the integrated information from different databases, which is retrieved as a set of ontology instances. This tool uses a graphical query interface to build user queries easily, which shows a graphical representation of the ontology and allows users o build queries by clicking on the ontology concepts. Conclusion These kinds of systems (based on KOMF) will provide users with very large amounts of information (interpreted as ontology instances once retrieved), which cannot be managed using traditional main memory-based reasoners. We propose a process for creating persistent and scalable knowledgebases from sets of OWL instances obtained by integrating heterogeneous data sources with KOMF. This process has been applied to develop a demo tool , which uses the BioPax Level 3 ontology as the integration schema, and integrates UNIPROT, KEGG, CHEBI, BRENDA and SABIORK databases. PMID:19796402

  10. Bio-TDS: bioscience query tool discovery system.

    PubMed

    Gnimpieba, Etienne Z; VanDiermen, Menno S; Gustafson, Shayla M; Conn, Bill; Lushbough, Carol M

    2017-01-04

    Bioinformatics and computational biology play a critical role in bioscience and biomedical research. As researchers design their experimental projects, one major challenge is to find the most relevant bioinformatics toolkits that will lead to new knowledge discovery from their data. The Bio-TDS (Bioscience Query Tool Discovery Systems, http://biotds.org/) has been developed to assist researchers in retrieving the most applicable analytic tools by allowing them to formulate their questions as free text. The Bio-TDS is a flexible retrieval system that affords users from multiple bioscience domains (e.g. genomic, proteomic, bio-imaging) the ability to query over 12 000 analytic tool descriptions integrated from well-established, community repositories. One of the primary components of the Bio-TDS is the ontology and natural language processing workflow for annotation, curation, query processing, and evaluation. The Bio-TDS's scientific impact was evaluated using sample questions posed by researchers retrieved from Biostars, a site focusing on BIOLOGICAL DATA ANALYSIS: The Bio-TDS was compared to five similar bioscience analytic tool retrieval systems with the Bio-TDS outperforming the others in terms of relevance and completeness. The Bio-TDS offers researchers the capacity to associate their bioscience question with the most relevant computational toolsets required for the data analysis in their knowledge discovery process. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. ArrayBridge: Interweaving declarative array processing with high-performance computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xing, Haoyuan; Floratos, Sofoklis; Blanas, Spyros

    Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and parallelize imperative HPC kernels even for the most mundane queries. This impedance mismatch has been partly attributed to the cumbersome data loading process; in response, the database community has proposed in situ mechanisms to access data in scientific file formats. Scientists, however, desire more than a passive access method that reads arrays from files. This paper describes ArrayBridge, a bi-directional array view mechanism for scientific file formats, that aimsmore » to make declarative array manipulations interoperable with imperative file-centric analyses. Our prototype implementation of ArrayBridge uses HDF5 as the underlying array storage library and seamlessly integrates into the SciDB open-source array database system. In addition to fast querying over external array objects, ArrayBridge produces arrays in the HDF5 file format just as easily as it can read from it. ArrayBridge also supports time travel queries from imperative kernels through the unmodified HDF5 API, and automatically deduplicates between array versions for space efficiency. Our extensive performance evaluation in NERSC, a large-scale scientific computing facility, shows that ArrayBridge exhibits statistically indistinguishable performance and I/O scalability to the native SciDB storage engine.« less

  12. An Intelligent Information System for forest management: NED/FVS integration

    Treesearch

    J. Wang; W.D. Potter; D. Nute; F. Maier; H. Michael Rauscher; M.J. Twery; S. Thomasma; P. Knopp

    2002-01-01

    An Intelligent Information System (IIS) is viewed as composed of a unified knowledge base, database, and model base. This allows an IIS to provide responses to user queries regardless of whether the query process involves a data retrieval, an inference, a computational method, a problem solving module, or some combination of these. NED-2 is a full-featured intelligent...

  13. Design of a Low-Cost Adaptive Question Answering System for Closed Domain Factoid Queries

    ERIC Educational Resources Information Center

    Toh, Huey Ling

    2010-01-01

    Closed domain question answering (QA) systems achieve precision and recall at the cost of complex language processing techniques to parse the answer corpus. We propose a "query-based" model for indexing answers in a closed domain factoid QA system. Further, we use a phrase term inference method for improving the ranking order of related questions.…

  14. QATT: a Natural Language Interface for QPE. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    White, Douglas Robert-Graham

    1989-01-01

    QATT, a natural language interface developed for the Qualitative Process Engine (QPE) system is presented. The major goal was to evaluate the use of a preexisting natural language understanding system designed to be tailored for query processing in multiple domains of application. The other goal of QATT is to provide a comfortable environment in which to query envisionments in order to gain insight into the qualitative behavior of physical systems. It is shown that the use of the preexisting system made possible the development of a reasonably useful interface in a few months.

  15. Optimizing Maintenance of Constraint-Based Database Caches

    NASA Astrophysics Data System (ADS)

    Klein, Joachim; Braun, Susanne

    Caching data reduces user-perceived latency and often enhances availability in case of server crashes or network failures. DB caching aims at local processing of declarative queries in a DBMS-managed cache close to the application. Query evaluation must produce the same results as if done at the remote database backend, which implies that all data records needed to process such a query must be present and controlled by the cache, i. e., to achieve “predicate-specific” loading and unloading of such record sets. Hence, cache maintenance must be based on cache constraints such that “predicate completeness” of the caching units currently present can be guaranteed at any point in time. We explore how cache groups can be maintained to provide the data currently needed. Moreover, we design and optimize loading and unloading algorithms for sets of records keeping the caching units complete, before we empirically identify the costs involved in cache maintenance.

  16. Semantics Enabled Queries in EuroGEOSS: a Discovery Augmentation Approach

    NASA Astrophysics Data System (ADS)

    Santoro, M.; Mazzetti, P.; Fugazza, C.; Nativi, S.; Craglia, M.

    2010-12-01

    One of the main challenges in Earth Science Informatics is to build interoperability frameworks which allow users to discover, evaluate, and use information from different scientific domains. This needs to address multidisciplinary interoperability challenges concerning both technological and scientific aspects. From the technological point of view, it is necessary to provide a set of special interoperability arrangement in order to develop flexible frameworks that allow a variety of loosely-coupled services to interact with each other. From a scientific point of view, it is necessary to document clearly the theoretical and methodological assumptions underpinning applications in different scientific domains, and develop cross-domain ontologies to facilitate interdisciplinary dialogue and understanding. In this presentation we discuss a brokering approach that extends the traditional Service Oriented Architecture (SOA) adopted by most Spatial Data Infrastructures (SDIs) to provide the necessary special interoperability arrangements. In the EC-funded EuroGEOSS (A European approach to GEOSS) project, we distinguish among three possible functional brokering components: discovery, access and semantics brokers. This presentation focuses on the semantics broker, the Discovery Augmentation Component (DAC), which was specifically developed to address the three thematic areas covered by the EuroGEOSS project: biodiversity, forestry and drought. The EuroGEOSS DAC federates both semantics (e.g. SKOS repositories) and ISO-compliant geospatial catalog services. The DAC can be queried using common geospatial constraints (i.e. what, where, when, etc.). Two different augmented discovery styles are supported: a) automatic query expansion; b) user assisted query expansion. In the first case, the main discovery steps are: i. the query keywords (the what constraint) are “expanded” with related concepts/terms retrieved from the set of federated semantic services. A default expansion regards the multilinguality relationship; ii. The resulting queries are submitted to the federated catalog services; iii. The DAC performs a “smart” aggregation of the queries results and provides them back to the client. In the second case, the main discovery steps are: i. the user browses the federated semantic repositories and selects the concepts/terms-of-interest; ii. The DAC creates the set of geospatial queries based on the selected concepts/terms and submits them to the federated catalog services; iii. The DAC performs a “smart” aggregation of the queries results and provides them back to the client. A Graphical User Interface (GUI) was also developed for testing and interacting with the DAC. The entire brokering framework is deployed in the context of EuroGEOSS infrastructure and it is used in a couple of GEOSS AIP-3 use scenarios: the “e-Habitat Use Scenario” for the Biodiversity and Climate Change topic, and the “Comprehensive Drought Index Use Scenario” for Water/Drought topic

  17. Improving 3d Spatial Queries Search: Newfangled Technique of Space Filling Curves in 3d City Modeling

    NASA Astrophysics Data System (ADS)

    Uznir, U.; Anton, F.; Suhaibah, A.; Rahman, A. A.; Mioc, D.

    2013-09-01

    The advantages of three dimensional (3D) city models can be seen in various applications including photogrammetry, urban and regional planning, computer games, etc.. They expand the visualization and analysis capabilities of Geographic Information Systems on cities, and they can be developed using web standards. However, these 3D city models consume much more storage compared to two dimensional (2D) spatial data. They involve extra geometrical and topological information together with semantic data. Without a proper spatial data clustering method and its corresponding spatial data access method, retrieving portions of and especially searching these 3D city models, will not be done optimally. Even though current developments are based on an open data model allotted by the Open Geospatial Consortium (OGC) called CityGML, its XML-based structure makes it challenging to cluster the 3D urban objects. In this research, we propose an opponent data constellation technique of space-filling curves (3D Hilbert curves) for 3D city model data representation. Unlike previous methods, that try to project 3D or n-dimensional data down to 2D or 3D using Principal Component Analysis (PCA) or Hilbert mappings, in this research, we extend the Hilbert space-filling curve to one higher dimension for 3D city model data implementations. The query performance was tested using a CityGML dataset of 1,000 building blocks and the results are presented in this paper. The advantages of implementing space-filling curves in 3D city modeling will improve data retrieval time by means of optimized 3D adjacency, nearest neighbor information and 3D indexing. The Hilbert mapping, which maps a subinterval of the [0, 1] interval to the corresponding portion of the d-dimensional Hilbert's curve, preserves the Lebesgue measure and is Lipschitz continuous. Depending on the applications, several alternatives are possible in order to cluster spatial data together in the third dimension compared to its clustering in 2D.

  18. Unsupervised Spatial Event Detection in Targeted Domains with Applications to Civil Unrest Modeling

    PubMed Central

    Zhao, Liang; Chen, Feng; Dai, Jing; Hua, Ting; Lu, Chang-Tien; Ramakrishnan, Naren

    2014-01-01

    Twitter has become a popular data source as a surrogate for monitoring and detecting events. Targeted domains such as crime, election, and social unrest require the creation of algorithms capable of detecting events pertinent to these domains. Due to the unstructured language, short-length messages, dynamics, and heterogeneity typical of Twitter data streams, it is technically difficult and labor-intensive to develop and maintain supervised learning systems. We present a novel unsupervised approach for detecting spatial events in targeted domains and illustrate this approach using one specific domain, viz. civil unrest modeling. Given a targeted domain, we propose a dynamic query expansion algorithm to iteratively expand domain-related terms, and generate a tweet homogeneous graph. An anomaly identification method is utilized to detect spatial events over this graph by jointly maximizing local modularity and spatial scan statistics. Extensive experiments conducted in 10 Latin American countries demonstrate the effectiveness of the proposed approach. PMID:25350136

  19. StarView: The object oriented design of the ST DADS user interface

    NASA Technical Reports Server (NTRS)

    Williams, J. D.; Pollizzi, J. A.

    1992-01-01

    StarView is the user interface being developed for the Hubble Space Telescope Data Archive and Distribution Service (ST DADS). ST DADS is the data archive for HST observations and a relational database catalog describing the archived data. Users will use StarView to query the catalog and select appropriate datasets for study. StarView sends requests for archived datasets to ST DADS which processes the requests and returns the database to the user. StarView is designed to be a powerful and extensible user interface. Unique features include an internal relational database to navigate query results, a form definition language that will work with both CRT and X interfaces, a data definition language that will allow StarView to work with any relational database, and the ability to generate adhoc queries without requiring the user to understand the structure of the ST DADS catalog. Ultimately, StarView will allow the user to refine queries in the local database for improved performance and merge in data from external sources for correlation with other query results. The user will be able to create a query from single or multiple forms, merging the selected attributes into a single query. Arbitrary selection of attributes for querying is supported. The user will be able to select how query results are viewed. A standard form or table-row format may be used. Navigation capabilities are provided to aid the user in viewing query results. Object oriented analysis and design techniques were used in the design of StarView to support the mechanisms and concepts required to implement these features. One such mechanism is the Model-View-Controller (MVC) paradigm. The MVC allows the user to have multiple views of the underlying database, while providing a consistent mechanism for interaction regardless of the view. This approach supports both CRT and X interfaces while providing a common mode of user interaction. Another powerful abstraction is the concept of a Query Model. This concept allows a single query to be built form a single or multiple forms before it is submitted to ST DADS. Supporting this concept is the adhoc query generator which allows the user to select and qualify an indeterminate number attributes from the database. The user does not need any knowledge of how the joins across various tables are to be resolved. The adhoc generator calculates the joins automatically and generates the correct SQL query.

  20. Effective Filtering of Query Results on Updated User Behavioral Profiles in Web Mining

    PubMed Central

    Sadesh, S.; Suganthe, R. C.

    2015-01-01

    Web with tremendous volume of information retrieves result for user related queries. With the rapid growth of web page recommendation, results retrieved based on data mining techniques did not offer higher performance filtering rate because relationships between user profile and queries were not analyzed in an extensive manner. At the same time, existing user profile based prediction in web data mining is not exhaustive in producing personalized result rate. To improve the query result rate on dynamics of user behavior over time, Hamilton Filtered Regime Switching User Query Probability (HFRS-UQP) framework is proposed. HFRS-UQP framework is split into two processes, where filtering and switching are carried out. The data mining based filtering in our research work uses the Hamilton Filtering framework to filter user result based on personalized information on automatic updated profiles through search engine. Maximized result is fetched, that is, filtered out with respect to user behavior profiles. The switching performs accurate filtering updated profiles using regime switching. The updating in profile change (i.e., switches) regime in HFRS-UQP framework identifies the second- and higher-order association of query result on the updated profiles. Experiment is conducted on factors such as personalized information search retrieval rate, filtering efficiency, and precision ratio. PMID:26221626

  1. Open Data, Jupyter Notebooks and Geospatial Data Standards Combined - Opening up large volumes of marine and climate data to other communities

    NASA Astrophysics Data System (ADS)

    Clements, O.; Siemen, S.; Wagemann, J.

    2017-12-01

    The EU-funded Earthserver-2 project aims to offer on-demand access to large volumes of environmental data (Earth Observation, Marine, Climate data and Planetary data) via the interface standard Web Coverage Service defined by the Open Geospatial Consortium. Providing access to data via OGC web services (e.g. WCS and WMS) has the potential to open up services to a wider audience, especially to users outside the respective communities. Especially WCS 2.0 with its processing extension Web Coverage Processing Service (WCPS) is highly beneficial to make large volumes accessible to non-expert communities. Users do not have to deal with custom community data formats, such as GRIB for the meteorological community, but can directly access the data in a format they are more familiar with, such as NetCDF, JSON or CSV. Data requests can further directly be integrated into custom processing routines and users are not required to download Gigabytes of data anymore. WCS supports trim (reduction of data extent) and slice (reduction of data dimension) operations on multi-dimensional data, providing users a very flexible on-demand access to the data. WCPS allows the user to craft queries to run on the data using a text-based query language, similar to SQL. These queries can be very powerful, e.g. condensing a three-dimensional data cube into its two-dimensional mean. However, the more processing-intensive the more complex the query. As part of the EarthServer-2 project, we developed a python library that helps users to generate complex WCPS queries with Python, a programming language they are more familiar with. The interactive presentation aims to give practical examples how users can benefit from two specific WCS services from the Marine and Climate community. Use-cases from the two communities will show different approaches to take advantage of a Web Coverage (Processing) Service. The entire content is available with Jupyter Notebooks, as they prove to be a highly beneficial tool to generate reproducible workflows for environmental data analysis.

  2. Optimizing Interactive Development of Data-Intensive Applications

    PubMed Central

    Interlandi, Matteo; Tetali, Sai Deep; Gulzar, Muhammad Ali; Noor, Joseph; Condie, Tyson; Kim, Miryung; Millstein, Todd

    2017-01-01

    Modern Data-Intensive Scalable Computing (DISC) systems are designed to process data through batch jobs that execute programs (e.g., queries) compiled from a high-level language. These programs are often developed interactively by posing ad-hoc queries over the base data until a desired result is generated. We observe that there can be significant overlap in the structure of these queries used to derive the final program. Yet, each successive execution of a slightly modified query is performed anew, which can significantly increase the development cycle. Vega is an Apache Spark framework that we have implemented for optimizing a series of similar Spark programs, likely originating from a development or exploratory data analysis session. Spark developers (e.g., data scientists) can leverage Vega to significantly reduce the amount of time it takes to re-execute a modified Spark program, reducing the overall time to market for their Big Data applications. PMID:28405637

  3. Modeling relief.

    PubMed

    Sumner, Walton; Xu, Jin Zhong; Roussel, Guy; Hagen, Michael D

    2007-10-11

    The American Board of Family Medicine deployed virtual patient simulations in 2004 to evaluate Diplomates' diagnostic and management skills. A previously reported dynamic process generates general symptom histories from time series data representing baseline values and reactions to medications. The simulator also must answer queries about details such as palliation and provocation. These responses often describe some recurring pattern, such as, "this medicine relieves my symptoms in a few minutes." The simulator can provide a detail stored as text, or it can evaluate a reference to a second query object. The second query object can generate details using a single Bayesian network to evaluate the effect of each drug in a virtual patient's medication list. A new medication option may not require redesign of the second query object if its implementation is consistent with related drugs. We expect this mechanism to maintain realistic responses to detail questions in complex simulations.

  4. Using a data base management system for modelling SSME test history data

    NASA Technical Reports Server (NTRS)

    Abernethy, K.

    1985-01-01

    The usefulness of a data base management system (DBMS) for modelling historical test data for the complete series of static test firings for the Space Shuttle Main Engine (SSME) was assessed. From an analysis of user data base query requirements, it became clear that a relational DMBS which included a relationally complete query language would permit a model satisfying the query requirements. Representative models and sample queries are discussed. A list of environment-particular evaluation criteria for the desired DBMS was constructed; these criteria include requirements in the areas of user-interface complexity, program independence, flexibility, modifiability, and output capability. The evaluation process included the construction of several prototype data bases for user assessement. The systems studied, representing the three major DBMS conceptual models, were: MIRADS, a hierarchical system; DMS-1100, a CODASYL-based network system; ORACLE, a relational system; and DATATRIEVE, a relational-type system.

  5. ClaRNA: a classifier of contacts in RNA 3D structures based on a comparative analysis of various classification schemes

    PubMed Central

    Waleń, Tomasz; Chojnowski, Grzegorz; Gierski, Przemysław; Bujnicki, Janusz M.

    2014-01-01

    The understanding of folding and function of RNA molecules depends on the identification and classification of interactions between ribonucleotide residues. We developed a new method named ClaRNA for computational classification of contacts in RNA 3D structures. Unique features of the program are the ability to identify imperfect contacts and to process coarse-grained models. Each doublet of spatially close ribonucleotide residues in a query structure is compared to clusters of reference doublets obtained by analysis of a large number of experimentally determined RNA structures, and assigned a score that describes its similarity to one or more known types of contacts, including pairing, stacking, base–phosphate and base–ribose interactions. The accuracy of ClaRNA is 0.997 for canonical base pairs, 0.983 for non-canonical pairs and 0.961 for stacking interactions. The generalized squared correlation coefficient (GC2) for ClaRNA is 0.969 for canonical base pairs, 0.638 for non-canonical pairs and 0.824 for stacking interactions. The classifier can be easily extended to include new types of spatial relationships between pairs or larger assemblies of nucleotide residues. ClaRNA is freely available via a web server that includes an extensive set of tools for processing and visualizing structural information about RNA molecules. PMID:25159614

  6. Towards computational improvement of DNA database indexing and short DNA query searching.

    PubMed

    Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska

    2014-09-03

    In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach, identifying all query hits in the database, without having to examine all entries in the indexed data structure, limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported, if the database is searched against a query shorter than [Formula: see text] nucleotides, such that [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution of this drawback is also presented.

  7. Python Winding Itself Around Datacubes: How to Access Massive Multi-Dimensional Arrays in a Pythonic Way

    NASA Astrophysics Data System (ADS)

    Merticariu, Vlad; Misev, Dimitar; Baumann, Peter

    2017-04-01

    While python has developed into the lingua franca in Data Science there is often a paradigm break when accessing specialized tools. In particular for one of the core data categories in science and engineering, massive multi-dimensional arrays, out-of-memory solutions typically employ their own, different models. We discuss this situation on the example of the scalable open-source array engine, rasdaman ("raster data manager") which offers access to and processing of Petascale multi-dimensional arrays through an SQL-style array query language, rasql. Such queries are executed in the server on a storage engine utilizing adaptive array partitioning and based on a processing engine implementing a "tile streaming" paradigm to allow processing of arrays massively larger than server RAM. The rasdaman QL has acted as blueprint for forthcoming ISO Array SQL and the Open Geospatial Consortium (OGC) geo analytics language, Web Coverage Processing Service, adopted in 2008. Not surprisingly, rasdaman is OGC and INSPIRE Reference Implementation for their "Big Earth Data" standards suite. Recently, rasdaman has been augmented with a python interface which allows to transparently interact with the database (credits go to Siddharth Shukla's Master Thesis at Jacobs University). Programmers do not need to know the rasdaman query language, as the operators are silently transformed, through lazy evaluation, into queries. Arrays delivered are likewise automatically transformed into their python representation. In the talk, the rasdaman concept will be illustrated with the help of large-scale real-life examples of operational satellite image and weather data services, and sample python code.

  8. Performance Modeling in CUDA Streams - A Means for High-Throughput Data Processing.

    PubMed

    Li, Hao; Yu, Di; Kumar, Anand; Tu, Yi-Cheng

    2014-10-01

    Push-based database management system (DBMS) is a new type of data processing software that streams large volume of data to concurrent query operators. The high data rate of such systems requires large computing power provided by the query engine. In our previous work, we built a push-based DBMS named G-SDMS to harness the unrivaled computational capabilities of modern GPUs. A major design goal of G-SDMS is to support concurrent processing of heterogenous query processing operations and enable resource allocation among such operations. Understanding the performance of operations as a result of resource consumption is thus a premise in the design of G-SDMS. With NVIDIA's CUDA framework as the system implementation platform, we present our recent work on performance modeling of CUDA kernels running concurrently under a runtime mechanism named CUDA stream . Specifically, we explore the connection between performance and resource occupancy of compute-bound kernels and develop a model that can predict the performance of such kernels. Furthermore, we provide an in-depth anatomy of the CUDA stream mechanism and summarize the main kernel scheduling disciplines in it. Our models and derived scheduling disciplines are verified by extensive experiments using synthetic and real-world CUDA kernels.

  9. Spatial-temporal analysis of building surface temperatures in Hung Hom

    NASA Astrophysics Data System (ADS)

    Zeng, Ying; Shen, Yueqian

    2015-12-01

    This thesis presents a study on spatial-temporal analysis of building surface temperatures in Hung Hom. Observations were collected from Aug 2013 to Oct 2013 at a 30-min interval, using iButton sensors (N=20) covering twelve locations in Hung Hom. And thermal images were captured in PolyU from 05 Aug 2013 to 06 Aug 2013. A linear regression model of iButton and thermal records is established to calibrate temperature data. A 3D modeling system is developed based on Visual Studio 2010 development platform, using ArcEngine10.0 component, Microsoft Access 2010 database and C# programming language. The system realizes processing data, spatial analysis, compound query and 3D face temperature rendering and so on. After statistical analyses, building face azimuths are found to have a statistically significant relationship with sun azimuths at peak time. And seasonal building temperature changing also corresponds to the sun angle and sun azimuth variations. Building materials are found to have a significant effect on building surface temperatures. Buildings with lower albedo materials tend to have higher temperatures and larger thermal conductivity material have significant diurnal variations. For the geographical locations, the peripheral faces of campus have higher temperatures than the inner faces during day time and buildings located at the southeast are cooler than the western. Furthermore, human activity is found to have a strong relationship with building surface temperatures through weekday and weekend comparison.

  10. Spatial Databases for CalVO Volcanoes: Current Status and Future Directions

    NASA Astrophysics Data System (ADS)

    Ramsey, D. W.

    2013-12-01

    The U.S. Geological Survey (USGS) California Volcano Observatory (CalVO) aims to advance scientific understanding of volcanic processes and to lessen harmful impacts of volcanic activity in California and Nevada. Within CalVO's area of responsibility, ten volcanoes or volcanic centers have been identified by a national volcanic threat assessment in support of developing the U.S. National Volcano Early Warning System (NVEWS) as posing moderate, high, or very high threats to surrounding communities based on their recent eruptive histories and their proximity to vulnerable people, property, and infrastructure. To better understand the extent of potential hazards at these and other volcanoes and volcanic centers, the USGS Volcano Science Center (VSC) is continually compiling spatial databases of volcano information, including: geologic mapping, hazards assessment maps, locations of geochemical and geochronological samples, and the distribution of volcanic vents. This digital mapping effort has been ongoing for over 15 years and early databases are being converted to match recent datasets compiled with new data models designed for use in: 1) generating hazard zones, 2) evaluating risk to population and infrastructure, 3) numerical hazard modeling, and 4) display and query on the CalVO as well as other VSC and USGS websites. In these capacities, spatial databases of CalVO volcanoes and their derivative map products provide an integrated and readily accessible framework of VSC hazards science to colleagues, emergency managers, and the general public.

  11. A distributed air index based on maximum boundary rectangle over grid-cells for wireless non-flat spatial data broadcast.

    PubMed

    Im, Seokjin; Choi, JinTak

    2014-06-17

    In the pervasive computing environment using smart devices equipped with various sensors, a wireless data broadcasting system for spatial data items is a natural way to efficiently provide a location dependent information service, regardless of the number of clients. A non-flat wireless broadcast system can support the clients in accessing quickly their preferred data items by disseminating the preferred data items more frequently than regular data on the wireless channel. To efficiently support the processing of spatial window queries in a non-flat wireless data broadcasting system, we propose a distributed air index based on a maximum boundary rectangle (MaxBR) over grid-cells (abbreviated DAIM), which uses MaxBRs for filtering out hot data items on the wireless channel. Unlike the existing index that repeats regular data items in close proximity to hot items at same frequency as hot data items in a broadcast cycle, DAIM makes it possible to repeat only hot data items in a cycle and reduces the length of the broadcast cycle. Consequently, DAIM helps the clients access the desired items quickly, improves the access time, and reduces energy consumption. In addition, a MaxBR helps the clients decide whether they have to access regular data items or not. Simulation studies show the proposed DAIM outperforms existing schemes with respect to the access time and energy consumption.

  12. Development of Intelligent Unmanned Systems

    DTIC Science & Technology

    2011-05-01

    statistical analysis on the terrain map. The data points are stored in the corresponding cells of the Traversability Grid as a linked list of 3-D Cartesian...allowed for multiple configurations of specified data to be as flexible as possible. For example, when an object is being created the knowledge store ...library was also used for querying and storing spatial data . It provided many geometric abstractions necessary such as overlap and intersects

  13. Image subregion querying using color correlograms

    DOEpatents

    Huang, Jing; Kumar, Shanmugasundaram Ravi; Mitra, Mandar; Zhu, Wei-Jing

    2002-01-01

    A color correlogram (10) is a representation expressing the spatial correlation of color and distance between pixels in a stored image. The color correlogram (10) may be used to distinguish objects in an image as well as between images in a plurality of images. By intersecting a color correlogram of an image object with correlograms of images to be searched, those images which contain the objects are identified by the intersection correlogram.

  14. PRIVACYGRID: Supporting Anonymous Location Queries in Mobile Environments

    DTIC Science & Technology

    2007-01-01

    continued price reduction of location tracking de- vices, location - based services (LBSs) are widely recognized as an important feature of the future computing... location - based services can operate completely anonymously, such as “when I pass a gas station, alert me with the unit price of the gas”. Others can...Anonymous Usage of Location - Based Services Through Spatial and Tempo- ral Cloaking. In Proceedings of the International Con- ference on Mobile

  15. Mining and Querying Multimedia Data

    DTIC Science & Technology

    2011-09-29

    able to capture more subtle spatial variations such as repetitiveness. Local feature descriptors such as SIFT [74] and SURF [12] have also been widely...empirically set to s = 90%, r = 50%, K = 20, where small variations lead to little perturbation of the output. The pseudo-code of the algorithm is...by constructing a three-layer graph based on clustering outputs, and executing a slight variation of random walk with restart algorithm. It provided

  16. GeoCrystal: graphic-interactive access to geodata archives

    NASA Astrophysics Data System (ADS)

    Goebel, Stefan; Haist, Joerg; Jasnoch, Uwe

    2002-03-01

    Recently there is spent a lot of effort to establish information systems and global infrastructures enabling both data suppliers and users to describe (-> eCommerce, metadata) as well as to find appropriate data. Examples for this are metadata information systems, online-shops or portals for geodata. The main disadvantages of existing approaches are insufficient methods and mechanisms leading users to (e.g. spatial) data archives. This affects aspects concerning usability and personalization in general as well as visual feedback techniques in the different steps of the information retrieval process. Several approaches aim at the improvement of graphical user interfaces by using intuitive metaphors, but only some of them offer 3D interfaces in the form of information landscapes or geographic result scenes in the context of information systems for geodata. This paper presents GeoCrystal, which basic idea is to adopt Venn diagrams to compose complex queries and to visualize search results in a 3D information and navigation space for geodata. These concepts are enhanced with spatial metaphors and 3D information landscapes (library for geodata) wherein users can specify searches for appropriate geodata and are enabled to graphic-interactively communicate with search results (book metaphor).

  17. A data model and database for high-resolution pathology analytical image informatics.

    PubMed

    Wang, Fusheng; Kong, Jun; Cooper, Lee; Pan, Tony; Kurc, Tahsin; Chen, Wenjin; Sharma, Ashish; Niedermayr, Cristobal; Oh, Tae W; Brat, Daniel; Farris, Alton B; Foran, David J; Saltz, Joel

    2011-01-01

    The systematic analysis of imaged pathology specimens often results in a vast amount of morphological information at both the cellular and sub-cellular scales. While microscopy scanners and computerized analysis are capable of capturing and analyzing data rapidly, microscopy image data remain underutilized in research and clinical settings. One major obstacle which tends to reduce wider adoption of these new technologies throughout the clinical and scientific communities is the challenge of managing, querying, and integrating the vast amounts of data resulting from the analysis of large digital pathology datasets. This paper presents a data model, which addresses these challenges, and demonstrates its implementation in a relational database system. This paper describes a data model, referred to as Pathology Analytic Imaging Standards (PAIS), and a database implementation, which are designed to support the data management and query requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines on whole-slide images and tissue microarrays (TMAs). (1) Development of a data model capable of efficiently representing and storing virtual slide related image, annotation, markup, and feature information. (2) Development of a database, based on the data model, capable of supporting queries for data retrieval based on analysis and image metadata, queries for comparison of results from different analyses, and spatial queries on segmented regions, features, and classified objects. The work described in this paper is motivated by the challenges associated with characterization of micro-scale features for comparative and correlative analyses involving whole-slides tissue images and TMAs. Technologies for digitizing tissues have advanced significantly in the past decade. Slide scanners are capable of producing high-magnification, high-resolution images from whole slides and TMAs within several minutes. Hence, it is becoming increasingly feasible for basic, clinical, and translational research studies to produce thousands of whole-slide images. Systematic analysis of these large datasets requires efficient data management support for representing and indexing results from hundreds of interrelated analyses generating very large volumes of quantifications such as shape and texture and of classifications of the quantified features. We have designed a data model and a database to address the data management requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines. The data model represents virtual slide related image, annotation, markup and feature information. The database supports a wide range of metadata and spatial queries on images, annotations, markups, and features. We currently have three databases running on a Dell PowerEdge T410 server with CentOS 5.5 Linux operating system. The database server is IBM DB2 Enterprise Edition 9.7.2. The set of databases consists of 1) a TMA database containing image analysis results from 4740 cases of breast cancer, with 641 MB storage size; 2) an algorithm validation database, which stores markups and annotations from two segmentation algorithms and two parameter sets on 18 selected slides, with 66 GB storage size; and 3) an in silico brain tumor study database comprising results from 307 TCGA slides, with 365 GB storage size. The latter two databases also contain human-generated annotations and markups for regions and nuclei. Modeling and managing pathology image analysis results in a database provide immediate benefits on the value and usability of data in a research study. The database provides powerful query capabilities, which are otherwise difficult or cumbersome to support by other approaches such as programming languages. Standardized, semantic annotated data representation and interfaces also make it possible to more efficiently share image data and analysis results.

  18. Research and development of web oriented remote sensing image publication system based on Servlet technique

    NASA Astrophysics Data System (ADS)

    Juanle, Wang; Shuang, Li; Yunqiang, Zhu

    2005-10-01

    According to the requirements of China National Scientific Data Sharing Program (NSDSP), the research and development of web oriented RS Image Publication System (RSIPS) is based on Java Servlet technique. The designing of RSIPS framework is composed of 3 tiers, which is Presentation Tier, Application Service Tier and Data Resource Tier. Presentation Tier provides user interface for data query, review and download. For the convenience of users, visual spatial query interface is included. Served as a middle tier, Application Service Tier controls all actions between users and databases. Data Resources Tier stores RS images in file and relationship databases. RSIPS is developed with cross platform programming based on Java Servlet tools, which is one of advanced techniques in J2EE architecture. RSIPS's prototype has been developed and applied in the geosciences clearinghouse practice which is among the experiment units of NSDSP in China.

  19. A natural language interface plug-in for cooperative query answering in biological databases.

    PubMed

    Jamil, Hasan M

    2012-06-11

    One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural language like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schema with an aim to providing cooperative responses to queries tailored to users' interpretations. Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from underlying database schema. The plug-in introduced in this paper is generic and facilitates connecting user selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.

  20. EmptyHeaded: A Relational Engine for Graph Processing

    PubMed Central

    Aberger, Christopher R.; Tu, Susan; Olukotun, Kunle; Ré, Christopher

    2016-01-01

    There are two types of high-performance graph processing engines: low- and high-level engines. Low-level engines (Galois, PowerGraph, Snap) provide optimized data structures and computation models but require users to write low-level imperative code, hence ensuring that efficiency is the burden of the user. In high-level engines, users write in query languages like datalog (SociaLite) or SQL (Grail). High-level engines are easier to use but are orders of magnitude slower than the low-level graph engines. We present EmptyHeaded, a high-level engine that supports a rich datalog-like query language and achieves performance comparable to that of low-level engines. At the core of EmptyHeaded’s design is a new class of join algorithms that satisfy strong theoretical guarantees but have thus far not achieved performance comparable to that of specialized graph processing engines. To achieve high performance, EmptyHeaded introduces a new join engine architecture, including a novel query optimizer and data layouts that leverage single-instruction multiple data (SIMD) parallelism. With this architecture, EmptyHeaded outperforms high-level approaches by up to three orders of magnitude on graph pattern queries, PageRank, and Single-Source Shortest Paths (SSSP) and is an order of magnitude faster than many low-level baselines. We validate that EmptyHeaded competes with the best-of-breed low-level engine (Galois), achieving comparable performance on PageRank and at most 3× worse performance on SSSP. PMID:28077912

  1. Small numbers, disclosure risk, security, and reliability issues in Web-based data query systems.

    PubMed

    Rudolph, Barbara A; Shah, Gulzar H; Love, Denise

    2006-01-01

    This article describes the process for developing consensus guidelines and tools for releasing public health data via the Web and highlights approaches leading agencies have taken to balance disclosure risk with public dissemination of reliable health statistics. An agency's choice of statistical methods for improving the reliability of released data for Web-based query systems is based upon a number of factors, including query system design (dynamic analysis vs preaggregated data and tables), population size, cell size, data use, and how data will be supplied to users. The article also describes those efforts that are necessary to reduce the risk of disclosure of an individual's protected health information.

  2. VIEWCACHE: An incremental pointer-based access method for autonomous interoperable databases

    NASA Technical Reports Server (NTRS)

    Roussopoulos, N.; Sellis, Timos

    1993-01-01

    One of the biggest problems facing NASA today is to provide scientists efficient access to a large number of distributed databases. Our pointer-based incremental data base access method, VIEWCACHE, provides such an interface for accessing distributed datasets and directories. VIEWCACHE allows database browsing and search performing inter-database cross-referencing with no actual data movement between database sites. This organization and processing is especially suitable for managing Astrophysics databases which are physically distributed all over the world. Once the search is complete, the set of collected pointers pointing to the desired data are cached. VIEWCACHE includes spatial access methods for accessing image datasets, which provide much easier query formulation by referring directly to the image and very efficient search for objects contained within a two-dimensional window. We will develop and optimize a VIEWCACHE External Gateway Access to database management systems to facilitate database search.

  3. VIEWCACHE: An incremental pointer-based access method for autonomous interoperable databases

    NASA Technical Reports Server (NTRS)

    Roussopoulos, N.; Sellis, Timos

    1992-01-01

    One of biggest problems facing NASA today is to provide scientists efficient access to a large number of distributed databases. Our pointer-based incremental database access method, VIEWCACHE, provides such an interface for accessing distributed data sets and directories. VIEWCACHE allows database browsing and search performing inter-database cross-referencing with no actual data movement between database sites. This organization and processing is especially suitable for managing Astrophysics databases which are physically distributed all over the world. Once the search is complete, the set of collected pointers pointing to the desired data are cached. VIEWCACHE includes spatial access methods for accessing image data sets, which provide much easier query formulation by referring directly to the image and very efficient search for objects contained within a two-dimensional window. We will develop and optimize a VIEWCACHE External Gateway Access to database management systems to facilitate distributed database search.

  4. Research on sudden environmental pollution public service platform construction based on WebGIS

    NASA Astrophysics Data System (ADS)

    Bi, T. P.; Gao, D. Y.; Zhong, X. Y.

    2016-08-01

    In order to actualize the social sharing and service of the emergency-response information for sudden pollution accidents, the public can share the risk source information service, dangerous goods control technology service and so on, The SQL Server and ArcSDE software are used to establish a spatial database to restore all kinds of information including risk sources, hazardous chemicals and handling methods in case of accidents. Combined with Chinese atmospheric environmental assessment standards, the SCREEN3 atmospheric dispersion model and one-dimensional liquid diffusion model are established to realize the query of related information and the display of the diffusion effect under B/S structure. Based on the WebGIS technology, C#.Net language is used to develop the sudden environmental pollution public service platform. As a result, the public service platform can make risk assessments and provide the best emergency processing services.

  5. Managing data from multiple disciplines, scales, and sites to support synthesis and modeling

    USGS Publications Warehouse

    Olson, R. J.; Briggs, J. M.; Porter, J.H.; Mah, Grant R.; Stafford, S.G.

    1999-01-01

    The synthesis and modeling of ecological processes at multiple spatial and temporal scales involves bringing together and sharing data from numerous sources. This article describes a data and information system model that facilitates assembling, managing, and sharing diverse data from multiple disciplines, scales, and sites to support integrated ecological studies. Cross-site scientific-domain working groups coordinate the development of data associated with their particular scientific working group, including decisions about data requirements, data to be compiled, data formats, derived data products, and schedules across the sites. The Web-based data and information system consists of nodes for each working group plus a central node that provides data access, project information, data query, and other functionality. The approach incorporates scientists and computer experts in the working groups and provides incentives for individuals to submit documented data to the data and information system.

  6. Array Processing in the Cloud: the rasdaman Approach

    NASA Astrophysics Data System (ADS)

    Merticariu, Vlad; Dumitru, Alex

    2015-04-01

    The multi-dimensional array data model is gaining more and more attention when dealing with Big Data challenges in a variety of domains such as climate simulations, geographic information systems, medical imaging or astronomical observations. Solutions provided by classical Big Data tools such as Key-Value Stores and MapReduce, as well as traditional relational databases, proved to be limited in domains associated with multi-dimensional data. This problem has been addressed by the field of array databases, in which systems provide database services for raster data, without imposing limitations on the number of dimensions that a dataset can have. Examples of datasets commonly handled by array databases include 1-dimensional sensor data, 2-D satellite imagery, 3-D x/y/t image time series as well as x/y/z geophysical voxel data, and 4-D x/y/z/t weather data. And this can grow as large as simulations of the whole universe when it comes to astrophysics. rasdaman is a well established array database, which implements many optimizations for dealing with large data volumes and operation complexity. Among those, the latest one is intra-query parallelization support: a network of machines collaborate for answering a single array database query, by dividing it into independent sub-queries sent to different servers. This enables massive processing speed-ups, which promise solutions to research challenges on multi-Petabyte data cubes. There are several correlated factors which influence the speedup that intra-query parallelisation brings: the number of servers, the capabilities of each server, the quality of the network, the availability of the data to the server that needs it in order to compute the result and many more. In the effort of adapting the engine to cloud processing patterns, two main components have been identified: one that handles communication and gathers information about the arrays sitting on every server, and a processing unit responsible with dividing work among available nodes and executing operations on local data. The federation daemon collects and stores statistics from the other network nodes and provides real time updates about local changes. Information exchanged includes available datasets, CPU load and memory usage per host. The processing component is represented by the rasdaman server. Using information from the federation daemon it breaks queries into subqueries to be executed on peer nodes, ships them, and assembles the intermediate results. Thus, we define a rasdaman network node as a pair of a federation daemon and a rasdaman server. Any node can receive a query and will subsequently act as this query's dispatcher, so all peers are at the same level and there is no single point of failure. Should a node become inaccessible then the peers will recognize this and will not any longer consider this peer for distribution. Conversely, a peer at any time can join the network. To assess the feasibility of our approach, we deployed a rasdaman network in the Amazon Elastic Cloud environment on 1001 nodes, and observed that this feature can greatly increase the performance and scalability of the system, offering a large throughput of processed data.

  7. SeqWare Query Engine: storing and searching sequence data in the cloud.

    PubMed

    O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F

    2010-12-21

    Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets.

  8. SeqWare Query Engine: storing and searching sequence data in the cloud

    PubMed Central

    2010-01-01

    Background Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. Results In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets. PMID:21210981

  9. D Partition-Based Clustering for Supply Chain Data Management

    NASA Astrophysics Data System (ADS)

    Suhaibah, A.; Uznir, U.; Anton, F.; Mioc, D.; Rahman, A. A.

    2015-10-01

    Supply Chain Management (SCM) is the management of the products and goods flow from its origin point to point of consumption. During the process of SCM, information and dataset gathered for this application is massive and complex. This is due to its several processes such as procurement, product development and commercialization, physical distribution, outsourcing and partnerships. For a practical application, SCM datasets need to be managed and maintained to serve a better service to its three main categories; distributor, customer and supplier. To manage these datasets, a structure of data constellation is used to accommodate the data into the spatial database. However, the situation in geospatial database creates few problems, for example the performance of the database deteriorate especially during the query operation. We strongly believe that a more practical hierarchical tree structure is required for efficient process of SCM. Besides that, three-dimensional approach is required for the management of SCM datasets since it involve with the multi-level location such as shop lots and residential apartments. 3D R-Tree has been increasingly used for 3D geospatial database management due to its simplicity and extendibility. However, it suffers from serious overlaps between nodes. In this paper, we proposed a partition-based clustering for the construction of a hierarchical tree structure. Several datasets are tested using the proposed method and the percentage of the overlapping nodes and volume coverage are computed and compared with the original 3D R-Tree and other practical approaches. The experiments demonstrated in this paper substantiated that the hierarchical structure of the proposed partitionbased clustering is capable of preserving minimal overlap and coverage. The query performance was tested using 300,000 points of a SCM dataset and the results are presented in this paper. This paper also discusses the outlook of the structure for future reference.

  10. Data Warehousing at the Marine Corps Institute

    DTIC Science & Technology

    2003-09-01

    applications exists for several reasons. It allows for data to be extracted from many sources, by “cleaned”, and stored into one large data facility ...exists. Key individuals at MCI, or the so called “knowledge workers” will be educated , and try to brainstorm possible data relationships that can...They include querying and reporting, On-Line Analytical Processing (OLAP) and statistical analysis, and data mining. 1. Queries and Reports The

  11. Comment on "flexible protocol for quantum private query based on B92 protocol"

    NASA Astrophysics Data System (ADS)

    Chang, Yan; Zhang, Shi-Bin; Zhu, Jing-Min

    2017-03-01

    In a recent paper (Quantum Inf Process 13:805-813, 2014), a flexible quantum private query (QPQ) protocol based on B92 protocol is presented. Here we point out that the B92-based QPQ protocol is insecure in database security when the channel has loss, that is, the user (Alice) will know more records in Bob's database compared with she has bought.

  12. A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Choudhury, Sutanay; Holder, Larry; Chin, George

    2015-02-02

    Cyber security is one of the most significant technical challenges in current times. Detecting adversarial activities, prevention of theft of intellectual properties and customer data is a high priority for corporations and government agencies around the world. Cyber defenders need to analyze massive-scale, high-resolution network flows to identify, categorize, and mitigate attacks involving net- works spanning institutional and national boundaries. Many of the cyber attacks can be described as subgraph patterns, with promi- nent examples being insider infiltrations (path queries), denial of service (parallel paths) and malicious spreads (tree queries). This motivates us to explore subgraph matching on streaming graphsmore » in a continuous setting. The novelty of our work lies in using the subgraph distributional statistics collected from the streaming graph to determine the query processing strategy. We introduce a “Lazy Search" algorithm where the search strategy is decided on a vertex-to-vertex basis depending on the likelihood of a match in the vertex neighborhood. We also propose a metric named “Relative Selectivity" that is used to se- lect between different query processing strategies. Our experiments performed on real online news, network traffic stream and a syn- thetic social network benchmark demonstrate 10-100x speedups over selectivity agnostic approaches.« less

  13. A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Choudhury, Sutanay; Holder, Larry; Chin, George

    2015-05-27

    Cyber security is one of the most significant technical challenges in current times. Detecting adversarial activities, prevention of theft of intellectual properties and customer data is a high priority for corporations and government agencies around the world. Cyber defenders need to analyze massive-scale, high-resolution network flows to identify, categorize, and mitigate attacks involving networks spanning institutional and national boundaries. Many of the cyber attacks can be described as subgraph patterns, with prominent examples being insider infiltrations (path queries), denial of service (parallel paths) and malicious spreads (tree queries). This motivates us to explore subgraph matching on streaming graphs in amore » continuous setting. The novelty of our work lies in using the subgraph distributional statistics collected from the streaming graph to determine the query processing strategy. We introduce a ``Lazy Search" algorithm where the search strategy is decided on a vertex-to-vertex basis depending on the likelihood of a match in the vertex neighborhood. We also propose a metric named ``Relative Selectivity" that is used to select between different query processing strategies. Our experiments performed on real online news, network traffic stream and a synthetic social network benchmark demonstrate 10-100x speedups over non-incremental, selectivity agnostic approaches.« less

  14. Evaluation of Potential LSST Spatial Indexing Strategies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nikolaev, S; Abdulla, G; Matzke, R

    2006-10-13

    The LSST requirement for producing alerts in near real-time, and the fact that generating an alert depends on knowing the history of light variations for a given sky position, both imply that the clustering information for all detections is available at any time during the survey. Therefore, any data structure describing clustering of detections in LSST needs to be continuously updated, even as new detections are arriving from the pipeline. We call this use case ''incremental clustering'', to reflect this continuous updating of clustering information. This document describes the evaluation results for several potential LSST incremental clustering strategies, using: (1)more » Neighbors table and zone optimization to store spatial clusters (a.k.a. Jim Grey's, or SDSS algorithm); (2) MySQL built-in R-tree implementation; (3) an external spatial index library which supports a query interface.« less

  15. Ontology-Driven Provenance Management in eScience: An Application in Parasite Research

    NASA Astrophysics Data System (ADS)

    Sahoo, Satya S.; Weatherly, D. Brent; Mutharaju, Raghava; Anantharam, Pramod; Sheth, Amit; Tarleton, Rick L.

    Provenance, from the French word "provenir", describes the lineage or history of a data entity. Provenance is critical information in scientific applications to verify experiment process, validate data quality and associate trust values with scientific results. Current industrial scale eScience projects require an end-to-end provenance management infrastructure. This infrastructure needs to be underpinned by formal semantics to enable analysis of large scale provenance information by software applications. Further, effective analysis of provenance information requires well-defined query mechanisms to support complex queries over large datasets. This paper introduces an ontology-driven provenance management infrastructure for biology experiment data, as part of the Semantic Problem Solving Environment (SPSE) for Trypanosoma cruzi (T.cruzi). This provenance infrastructure, called T.cruzi Provenance Management System (PMS), is underpinned by (a) a domain-specific provenance ontology called Parasite Experiment ontology, (b) specialized query operators for provenance analysis, and (c) a provenance query engine. The query engine uses a novel optimization technique based on materialized views called materialized provenance views (MPV) to scale with increasing data size and query complexity. This comprehensive ontology-driven provenance infrastructure not only allows effective tracking and management of ongoing experiments in the Tarleton Research Group at the Center for Tropical and Emerging Global Diseases (CTEGD), but also enables researchers to retrieve the complete provenance information of scientific results for publication in literature.

  16. Melody Alignment and Similarity Metric for Content-Based Music Retrieval

    NASA Astrophysics Data System (ADS)

    Zhu, Yongwei; Kankanhalli, Mohan S.

    2003-01-01

    Music query-by-humming has attracted much research interest recently. It is a challenging problem since the hummed query inevitably contains much variation and inaccuracy. Furthermore, the similarity computation between the query tune and the reference melody is not easy due to the difficulty in ensuring proper alignment. This is because the query tune can be rendered at an unknown speed and it is usually an arbitrary subsequence of the target reference melody. Many of the previous methods, which adopt note segmentation and string matching, suffer drastically from the errors in the note segmentation, which affects retrieval accuracy and efficiency. Some methods solve the alignment issue by controlling the speed of the articulation of queries, which is inconvenient because it forces users to hum along a metronome. Some other techniques introduce arbitrary rescaling in time but this is computationally very inefficient. In this paper, we introduce a melody alignment technique, which addresses the robustness and efficiency issues. We also present a new melody similarity metric, which is performed directly on melody contours of the query data. This approach cleanly separates the alignment and similarity measurement in the search process. We show how to robustly and efficiently align the query melody with the reference melodies and how to measure the similarity subsequently. We have carried out extensive experiments. Our melody alignment method can reduce the matching candidate to 1.7% with 95% correct alignment rate. The overall retrieval system achieved 80% recall in the top 10 rank list. The results demonstrate the robustness and effectiveness the proposed methods.

  17. Agile Datacube Analytics (not just) for the Earth Sciences

    NASA Astrophysics Data System (ADS)

    Misev, Dimitar; Merticariu, Vlad; Baumann, Peter

    2017-04-01

    Metadata are considered small, smart, and queryable; data, on the other hand, are known as big, clumsy, hard to analyze. Consequently, gridded data - such as images, image timeseries, and climate datacubes - are managed separately from the metadata, and with different, restricted retrieval capabilities. One reason for this silo approach is that databases, while good at tables, XML hierarchies, RDF graphs, etc., traditionally do not support multi-dimensional arrays well. This gap is being closed by Array Databases which extend the SQL paradigm of "any query, anytime" to NoSQL arrays. They introduce semantically rich modelling combined with declarative, high-level query languages on n-D arrays. On Server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. This way, they offer new vistas in flexibility, scalability, performance, and data integration. In this respect, the forthcoming ISO SQL extension MDA ("Multi-dimensional Arrays") will be a game changer in Big Data Analytics. We introduce concepts and opportunities through the example of rasdaman ("raster data manager") which in fact has pioneered the field of Array Databases and forms the blueprint for ISO SQL/MDA and further Big Data standards, such as OGC WCPS for querying spatio-temporal Earth datacubes. With operational installations exceeding 140 TB queries have been split across more than one thousand cloud nodes, using CPUs as well as GPUs. Installations can easily be mashed up securely, enabling large-scale location-transparent query processing in federations. Federation queries have been demonstrated live at EGU 2016 spanning Europe and Australia in the context of the intercontinental EarthServer initiative, visualized through NASA WorldWind.

  18. Agile Datacube Analytics (not just) for the Earth Sciences

    NASA Astrophysics Data System (ADS)

    Baumann, P.

    2016-12-01

    Metadata are considered small, smart, and queryable; data, on the other hand, are known as big, clumsy, hard to analyze. Consequently, gridded data - such as images, image timeseries, and climate datacubes - are managed separately from the metadata, and with different, restricted retrieval capabilities. One reason for this silo approach is that databases, while good at tables, XML hierarchies, RDF graphs, etc., traditionally do not support multi-dimensional arrays well.This gap is being closed by Array Databases which extend the SQL paradigm of "any query, anytime" to NoSQL arrays. They introduce semantically rich modelling combined with declarative, high-level query languages on n-D arrays. On Server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. This way, they offer new vistas in flexibility, scalability, performance, and data integration. In this respect, the forthcoming ISO SQL extension MDA ("Multi-dimensional Arrays") will be a game changer in Big Data Analytics.We introduce concepts and opportunities through the example of rasdaman ("raster data manager") which in fact has pioneered the field of Array Databases and forms the blueprint for ISO SQL/MDA and further Big Data standards, such as OGC WCPS for querying spatio-temporal Earth datacubes. With operational installations exceeding 140 TB queries have been split across more than one thousand cloud nodes, using CPUs as well as GPUs. Installations can easily be mashed up securely, enabling large-scale location-transparent query processing in federations. Federation queries have been demonstrated live at EGU 2016 spanning Europe and Australia in the context of the intercontinental EarthServer initiative, visualized through NASA WorldWind.

  19. Data Management and Site-Visit Monitoring of the Multi-Center Registry in the Korean Neonatal Network.

    PubMed

    Choi, Chang Won; Park, Moon Sung

    2015-10-01

    The Korean Neonatal Network (KNN), a nationwide prospective registry of very-low-birth-weight (VLBW, < 1,500 g at birth) infants, was launched in April 2013. Data management (DM) and site-visit monitoring (SVM) were crucial in ensuring the quality of the data collected from 55 participating hospitals across the country on 116 clinical variables. We describe the processes and results of DM and SVM performed during the establishment stage of the registry. The DM procedure included automated proof checks, electronic data validation, query creation, query resolution, and revalidation of the corrected data. SVM included SVM team organization, identification of unregistered cases, source document verification, and post-visit report production. By March 31, 2015, 4,063 VLBW infants were registered and 1,693 queries were produced. Of these, 1,629 queries were resolved and 64 queries remain unresolved. By November 28, 2014, 52 participating hospitals were visited, with 136 site-visits completed since April 2013. Each participating hospital was visited biannually. DM and SVM were performed to ensure the quality of the data collected for the KNN registry. Our experience with DM and SVM can be applied for similar multi-center registries with large numbers of participating centers.

  20. Asynchronous Data Retrieval from an Object-Oriented Database

    NASA Astrophysics Data System (ADS)

    Gilbert, Jonathan P.; Bic, Lubomir

    We present an object-oriented semantic database model which, similar to other object-oriented systems, combines the virtues of four concepts: the functional data model, a property inheritance hierarchy, abstract data types and message-driven computation. The main emphasis is on the last of these four concepts. We describe generic procedures that permit queries to be processed in a purely message-driven manner. A database is represented as a network of nodes and directed arcs, in which each node is a logical processing element, capable of communicating with other nodes by exchanging messages. This eliminates the need for shared memory and for centralized control during query processing. Hence, the model is suitable for implementation on a multiprocessor computer architecture, consisting of large numbers of loosely coupled processing elements.

  1. TopFed: TCGA tailored federated query processing and linking to LOD.

    PubMed

    Saleem, Muhammad; Padmanabhuni, Shanmukha S; Ngomo, Axel-Cyrille Ngonga; Iqbal, Aftab; Almeida, Jonas S; Decker, Stefan; Deus, Helena F

    2014-01-01

    The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to catalogue genetic mutations responsible for cancer using genome analysis techniques. One of the aims of this project is to create a comprehensive and open repository of cancer related molecular analysis, to be exploited by bioinformaticians towards advancing cancer knowledge. However, devising bioinformatics applications to analyse such large dataset is still challenging, as it often requires downloading large archives and parsing the relevant text files. Therefore, it is making it difficult to enable virtual data integration in order to collect the critical co-variates necessary for analysis. We address these issues by transforming the TCGA data into the Semantic Web standard Resource Description Format (RDF), link it to relevant datasets in the Linked Open Data (LOD) cloud and further propose an efficient data distribution strategy to host the resulting 20.4 billion triples data via several SPARQL endpoints. Having the TCGA data distributed across multiple SPARQL endpoints, we enable biomedical scientists to query and retrieve information from these SPARQL endpoints by proposing a TCGA tailored federated SPARQL query processing engine named TopFed. We compare TopFed with a well established federation engine FedX in terms of source selection and query execution time by using 10 different federated SPARQL queries with varying requirements. Our evaluation results show that TopFed selects on average less than half of the sources (with 100% recall) with query execution time equal to one third to that of FedX. With TopFed, we aim to offer biomedical scientists a single-point-of-access through which distributed TCGA data can be accessed in unison. We believe the proposed system can greatly help researchers in the biomedical domain to carry out their research effectively with TCGA as the amount and diversity of data exceeds the ability of local resources to handle its retrieval and parsing.

  2. ClimateSpark: An In-memory Distributed Computing Framework for Big Climate Data Analytics

    NASA Astrophysics Data System (ADS)

    Hu, F.; Yang, C. P.; Duffy, D.; Schnase, J. L.; Li, Z.

    2016-12-01

    Massive array-based climate data is being generated from global surveillance systems and model simulations. They are widely used to analyze the environment problems, such as climate changes, natural hazards, and public health. However, knowing the underlying information from these big climate datasets is challenging due to both data- and computing- intensive issues in data processing and analyzing. To tackle the challenges, this paper proposes ClimateSpark, an in-memory distributed computing framework to support big climate data processing. In ClimateSpark, the spatiotemporal index is developed to enable Apache Spark to treat the array-based climate data (e.g. netCDF4, HDF4) as native formats, which are stored in Hadoop Distributed File System (HDFS) without any preprocessing. Based on the index, the spatiotemporal query services are provided to retrieve dataset according to a defined geospatial and temporal bounding box. The data subsets will be read out, and a data partition strategy will be applied to equally split the queried data to each computing node, and store them in memory as climateRDDs for processing. By leveraging Spark SQL and User Defined Function (UDFs), the climate data analysis operations can be conducted by the intuitive SQL language. ClimateSpark is evaluated by two use cases using the NASA Modern-Era Retrospective Analysis for Research and Applications (MERRA) climate reanalysis dataset. One use case is to conduct the spatiotemporal query and visualize the subset results in animation; the other one is to compare different climate model outputs using Taylor-diagram service. Experimental results show that ClimateSpark can significantly accelerate data query and processing, and enable the complex analysis services served in the SQL-style fashion.

  3. A multi-site cognitive task analysis for biomedical query mediation.

    PubMed

    Hruby, Gregory W; Rasmussen, Luke V; Hanauer, David; Patel, Vimla L; Cimino, James J; Weng, Chunhua

    2016-09-01

    To apply cognitive task analyses of the Biomedical query mediation (BQM) processes for EHR data retrieval at multiple sites towards the development of a generic BQM process model. We conducted semi-structured interviews with eleven data analysts from five academic institutions and one government agency, and performed cognitive task analyses on their BQM processes. A coding schema was developed through iterative refinement and used to annotate the interview transcripts. The annotated dataset was used to reconstruct and verify each BQM process and to develop a harmonized BQM process model. A survey was conducted to evaluate the face and content validity of this harmonized model. The harmonized process model is hierarchical, encompassing tasks, activities, and steps. The face validity evaluation concluded the model to be representative of the BQM process. In the content validity evaluation, out of the 27 tasks for BQM, 19 meet the threshold for semi-valid, including 3 fully valid: "Identify potential index phenotype," "If needed, request EHR database access rights," and "Perform query and present output to medical researcher", and 8 are invalid. We aligned the goals of the tasks within the BQM model with the five components of the reference interview. The similarity between the process of BQM and the reference interview is promising and suggests the BQM tasks are powerful for eliciting implicit information needs. We contribute a BQM process model based on a multi-site study. This model promises to inform the standardization of the BQM process towards improved communication efficiency and accuracy. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  4. A Multi-Site Cognitive Task Analysis for Biomedical Query Mediation

    PubMed Central

    Hruby, Gregory W.; Rasmussen, Luke V.; Hanauer, David; Patel, Vimla; Cimino, James J.; Weng, Chunhua

    2016-01-01

    Objective To apply cognitive task analyses of the Biomedical query mediation (BQM) processes for EHR data retrieval at multiple sites towards the development of a generic BQM process model. Materials and Methods We conducted semi-structured interviews with eleven data analysts from five academic institutions and one government agency, and performed cognitive task analyses on their BQM processes. A coding schema was developed through iterative refinement and used to annotate the interview transcripts. The annotated dataset was used to reconstruct and verify each BQM process and to develop a harmonized BQM process model. A survey was conducted to evaluate the face and content validity of this harmonized model. Results The harmonized process model is hierarchical, encompassing tasks, activities, and steps. The face validity evaluation concluded the model to be representative of the BQM process. In the content validity evaluation, out of the 27 tasks for BQM, 19 meet the threshold for semi-valid, including 3 fully valid: “Identify potential index phenotype,” “If needed, request EHR database access rights,” and “Perform query and present output to medical researcher”, and 8 are invalid. Discussion We aligned the goals of the tasks within the BQM model with the five components of the reference interview. The similarity between the process of BQM and the reference interview is promising and suggests the BQM tasks are powerful for eliciting implicit information needs. Conclusions We contribute a BQM process model based on a multi-site study. This model promises to inform the standardization of the BQM process towards improved communication efficiency and accuracy. PMID:27435950

  5. Database architectures for Space Telescope Science Institute

    NASA Astrophysics Data System (ADS)

    Lubow, Stephen

    1993-08-01

    At STScI nearly all large applications require database support. A general purpose architecture has been developed and is in use that relies upon an extended client-server paradigm. Processing is in general distributed across three processes, each of which generally resides on its own processor. Database queries are evaluated on one such process, called the DBMS server. The DBMS server software is provided by a database vendor. The application issues database queries and is called the application client. This client uses a set of generic DBMS application programming calls through our STDB/NET programming interface. Intermediate between the application client and the DBMS server is the STDB/NET server. This server accepts generic query requests from the application and converts them into the specific requirements of the DBMS server. In addition, it accepts query results from the DBMS server and passes them back to the application. Typically the STDB/NET server is local to the DBMS server, while the application client may be remote. The STDB/NET server provides additional capabilities such as database deadlock restart and performance monitoring. This architecture is currently in use for some major STScI applications, including the ground support system. We are currently investigating means of providing ad hoc query support to users through the above architecture. Such support is critical for providing flexible user interface capabilities. The Universal Relation advocated by Ullman, Kernighan, and others appears to be promising. In this approach, the user sees the entire database as a single table, thereby freeing the user from needing to understand the detailed schema. A software layer provides the translation between the user and detailed schema views of the database. However, many subtle issues arise in making this transformation. We are currently exploring this scheme for use in the Hubble Space Telescope user interface to the data archive system (DADS).

  6. Ordered Backward XPath Axis Processing against XML Streams

    NASA Astrophysics Data System (ADS)

    Nizar M., Abdul; Kumar, P. Sreenivasa

    Processing of backward XPath axes against XML streams is challenging for two reasons: (i) Data is not cached for future access. (ii) Query contains steps specifying navigation to the data that already passed by. While there are some attempts to process parent and ancestor axes, there are very few proposals to process ordered backward axes namely, preceding and preceding-sibling. For ordered backward axis processing, the algorithm, in addition to overcoming the limitations on data availability, has to take care of ordering constraints imposed by these axes. In this paper, we show how backward ordered axes can be effectively represented using forward constraints. We then discuss an algorithm for XML stream processing of XPath expressions containing ordered backward axes. The algorithm uses a layered cache structure to systematically accumulate query results. Our experiments show that the new algorithm gains remarkable speed up over the existing algorithm without compromising on bufferspace requirement.

  7. Cross-Dataset Analysis and Visualization Driven by Expressive Web Services

    NASA Astrophysics Data System (ADS)

    Alexandru Dumitru, Mircea; Catalin Merticariu, Vlad

    2015-04-01

    The deluge of data that is hitting us every day from satellite and airborne sensors is changing the workflow of environmental data analysts and modelers. Web geo-services play now a fundamental role, and are no longer needed to preliminary download and store the data, but rather they interact in real-time with GIS applications. Due to the very large amount of data that is curated and made available by web services, it is crucial to deploy smart solutions for optimizing network bandwidth, reducing duplication of data and moving the processing closer to the data. In this context we have created a visualization application for analysis and cross-comparison of aerosol optical thickness datasets. The application aims to help researchers identify and visualize discrepancies between datasets coming from various sources, having different spatial and time resolutions. It also acts as a proof of concept for integration of OGC Web Services under a user-friendly interface that provides beautiful visualizations of the explored data. The tool was built on top of the World Wind engine, a Java based virtual globe built by NASA and the open source community. For data retrieval and processing we exploited the OGC Web Coverage Service potential: the most exciting aspect being its processing extension, a.k.a. the OGC Web Coverage Processing Service (WCPS) standard. A WCPS-compliant service allows a client to execute a processing query on any coverage offered by the server. By exploiting a full grammar, several different kinds of information can be retrieved from one or more datasets together: scalar condensers, cross-sectional profiles, comparison maps and plots, etc. This combination of technology made the application versatile and portable. As the processing is done on the server-side, we ensured that the minimal amount of data is transferred and that the processing is done on a fully-capable server, leaving the client hardware resources to be used for rendering the visualization. The application offers a set of features to visualize and cross-compare the datasets. Users can select a region of interest in space and time on which an aerosol map layer is plotted. Hovmoeller time-latitude and time-longitude profiles can be displayed by selecting orthogonal cross-sections on the globe. Statistics about the selected dataset are also displayed in different text and plot formats. The datasets can also be cross-compared either by using the delta map tool or the merged map tool. For more advanced users, a WCPS query console is also offered allowing users to process their data with ad-hoc queries and then choose how to display the results. Overall, the user has a rich set of tools that can be used to visualize and cross-compare the aerosol datasets. With our application we have shown how the NASA WorldWind framework can be used to display results processed efficiently - and entirely - on the server side using the expressiveness of the OGC WCPS web-service. The application serves not only as a proof of concept of a new paradigm in working with large geospatial data but also as an useful tool for environmental data analysts.

  8. Distributed Sensing and Processing Adaptive Collaboration Environment (D-SPACE)

    DTIC Science & Technology

    2014-07-01

    to the query graph, or subgraph permutations with the same mismatch cost (often the case for homogeneous and/or symmetrical data/query). To avoid...decisions are generated in a bottom-up manner using the metric of entropy at the cluster level (Figure 9c). Using the definition of belief messages...for a cluster and a set of data nodes in this cluster , we compute the entropy for forward and backward messages as (,) = −∑ (

  9. Towards a light-weight query engine for accessing health sensor data in a fall prevention system.

    PubMed

    Kreiner, Karl; Gossy, Christian; Drobics, Mario

    2014-01-01

    Connecting various sensors in sensor networks has become popular during the last decade. An important aspect next to storing and creating data is information access by domain experts, such as researchers, caretakers and physicians. In this work we present the design and prototypic implementation of a light-weight query engine using natural language processing for accessing health-related sensor data in a fall prevention system.

  10. Performance Modeling in CUDA Streams - A Means for High-Throughput Data Processing

    PubMed Central

    Li, Hao; Yu, Di; Kumar, Anand; Tu, Yi-Cheng

    2015-01-01

    Push-based database management system (DBMS) is a new type of data processing software that streams large volume of data to concurrent query operators. The high data rate of such systems requires large computing power provided by the query engine. In our previous work, we built a push-based DBMS named G-SDMS to harness the unrivaled computational capabilities of modern GPUs. A major design goal of G-SDMS is to support concurrent processing of heterogenous query processing operations and enable resource allocation among such operations. Understanding the performance of operations as a result of resource consumption is thus a premise in the design of G-SDMS. With NVIDIA’s CUDA framework as the system implementation platform, we present our recent work on performance modeling of CUDA kernels running concurrently under a runtime mechanism named CUDA stream. Specifically, we explore the connection between performance and resource occupancy of compute-bound kernels and develop a model that can predict the performance of such kernels. Furthermore, we provide an in-depth anatomy of the CUDA stream mechanism and summarize the main kernel scheduling disciplines in it. Our models and derived scheduling disciplines are verified by extensive experiments using synthetic and real-world CUDA kernels. PMID:26566545

  11. A Random Walk Approach to Query Informative Constraints for Clustering.

    PubMed

    Abin, Ahmad Ali

    2017-08-09

    This paper presents a random walk approach to the problem of querying informative constraints for clustering. The proposed method is based on the properties of the commute time, that is the expected time taken for a random walk to travel between two nodes and return, on the adjacency graph of data. Commute time has the nice property of that, the more short paths connect two given nodes in a graph, the more similar those nodes are. Since computing the commute time takes the Laplacian eigenspectrum into account, we use this property in a recursive fashion to query informative constraints for clustering. At each recursion, the proposed method constructs the adjacency graph of data and utilizes the spectral properties of the commute time matrix to bipartition the adjacency graph. Thereafter, the proposed method benefits from the commute times distance on graph to query informative constraints between partitions. This process iterates for each partition until the stop condition becomes true. Experiments on real-world data show the efficiency of the proposed method for constraints selection.

  12. Hierarchical data security in a Query-By-Example interface for a shared database.

    PubMed

    Taylor, Merwyn

    2002-06-01

    Whenever a shared database resource, containing critical patient data, is created, protecting the contents of the database is a high priority goal. This goal can be achieved by developing a Query-By-Example (QBE) interface, designed to access a shared database, and embedding within the QBE a hierarchical security module that limits access to the data. The security module ensures that researchers working in one clinic do not get access to data from another clinic. The security can be based on a flexible taxonomy structure that allows ordinary users to access data from individual clinics and super users to access data from all clinics. All researchers submit queries through the same interface and the security module processes the taxonomy and user identifiers to limit access. Using this system, two different users with different access rights can submit the same query and get different results thus reducing the need to create different interfaces for different clinics and access rights.

  13. Common hyperspectral image database design

    NASA Astrophysics Data System (ADS)

    Tian, Lixun; Liao, Ningfang; Chai, Ali

    2009-11-01

    This paper is to introduce Common hyperspectral image database with a demand-oriented Database design method (CHIDB), which comprehensively set ground-based spectra, standardized hyperspectral cube, spectral analysis together to meet some applications. The paper presents an integrated approach to retrieving spectral and spatial patterns from remotely sensed imagery using state-of-the-art data mining and advanced database technologies, some data mining ideas and functions were associated into CHIDB to make it more suitable to serve in agriculture, geological and environmental areas. A broad range of data from multiple regions of the electromagnetic spectrum is supported, including ultraviolet, visible, near-infrared, thermal infrared, and fluorescence. CHIDB is based on dotnet framework and designed by MVC architecture including five main functional modules: Data importer/exporter, Image/spectrum Viewer, Data Processor, Parameter Extractor, and On-line Analyzer. The original data were all stored in SQL server2008 for efficient search, query and update, and some advance Spectral image data Processing technology are used such as Parallel processing in C#; Finally an application case is presented in agricultural disease detecting area.

  14. An XML-Based Manipulation and Query Language for Rule-Based Information

    NASA Astrophysics Data System (ADS)

    Mansour, Essam; Höpfner, Hagen

    Rules are utilized to assist in the monitoring process that is required in activities, such as disease management and customer relationship management. These rules are specified according to the application best practices. Most of research efforts emphasize on the specification and execution of these rules. Few research efforts focus on managing these rules as one object that has a management life-cycle. This paper presents our manipulation and query language that is developed to facilitate the maintenance of this object during its life-cycle and to query the information contained in this object. This language is based on an XML-based model. Furthermore, we evaluate the model and language using a prototype system applied to a clinical case study.

  15. System, method and apparatus for generating phrases from a database

    NASA Technical Reports Server (NTRS)

    McGreevy, Michael W. (Inventor)

    2004-01-01

    A phrase generation is a method of generating sequences of terms, such as phrases, that may occur within a database of subsets containing sequences of terms, such as text. A database is provided and a relational model of the database is created. A query is then input. The query includes a term or a sequence of terms or multiple individual terms or multiple sequences of terms or combinations thereof. Next, several sequences of terms that are contextually related to the query are assembled from contextual relations in the model of the database. The sequences of terms are then sorted and output. Phrase generation can also be an iterative process used to produce sequences of terms from a relational model of a database.

  16. A geodata warehouse: Using denormalisation techniques as a tool for delivering spatially enabled integrated geological information to geologists

    NASA Astrophysics Data System (ADS)

    Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham

    2016-11-01

    New requirements to understand geological properties in three dimensions have led to the development of PropBase, a data structure and delivery tools to deliver this. At the BGS, relational database management systems (RDBMS) has facilitated effective data management using normalised subject-based database designs with business rules in a centralised, vocabulary controlled, architecture. These have delivered effective data storage in a secure environment. However, isolated subject-oriented designs prevented efficient cross-domain querying of datasets. Additionally, the tools provided often did not enable effective data discovery as they struggled to resolve the complex underlying normalised structures providing poor data access speeds. Users developed bespoke access tools to structures they did not fully understand sometimes delivering them incorrect results. Therefore, BGS has developed PropBase, a generic denormalised data structure within an RDBMS to store property data, to facilitate rapid and standardised data discovery and access, incorporating 2D and 3D physical and chemical property data, with associated metadata. This includes scripts to populate and synchronise the layer with its data sources through structured input and transcription standards. A core component of the architecture includes, an optimised query object, to deliver geoscience information from a structure equivalent to a data warehouse. This enables optimised query performance to deliver data in multiple standardised formats using a web discovery tool. Semantic interoperability is enforced through vocabularies combined from all data sources facilitating searching of related terms. PropBase holds 28.1 million spatially enabled property data points from 10 source databases incorporating over 50 property data types with a vocabulary set that includes 557 property terms. By enabling property data searches across multiple databases PropBase has facilitated new scientific research, previously considered impractical. PropBase is easily extended to incorporate 4D data (time series) and is providing a baseline for new "big data" monitoring projects.

  17. PRIDE: new developments and new datasets.

    PubMed

    Jones, Philip; Côté, Richard G; Cho, Sang Yun; Klie, Sebastian; Martens, Lennart; Quinn, Antony F; Thorneycroft, David; Hermjakob, Henning

    2008-01-01

    The PRIDE (http://www.ebi.ac.uk/pride) database of protein and peptide identifications was previously described in the NAR Database Special Edition in 2006. Since this publication, the volume of public data in the PRIDE relational database has increased by more than an order of magnitude. Several significant public datasets have been added, including identifications and processed mass spectra generated by the HUPO Brain Proteome Project and the HUPO Liver Proteome Project. The PRIDE software development team has made several significant changes and additions to the user interface and tool set associated with PRIDE. The focus of these changes has been to facilitate the submission process and to improve the mechanisms by which PRIDE can be queried. The PRIDE team has developed a Microsoft Excel workbook that allows the required data to be collated in a series of relatively simple spreadsheets, with automatic generation of PRIDE XML at the end of the process. The ability to query PRIDE has been augmented by the addition of a BioMart interface allowing complex queries to be constructed. Collaboration with groups outside the EBI has been fruitful in extending PRIDE, including an approach to encode iTRAQ quantitative data in PRIDE XML.

  18. Paradise: A Parallel Information System for EOSDIS

    NASA Technical Reports Server (NTRS)

    DeWitt, David

    1996-01-01

    The Paradise project was begun-in 1993 in order to explore the application of the parallel and object-oriented database system technology developed as a part of the Gamma, Exodus. and Shore projects to the design and development of a scaleable, geo-spatial database system for storing both massive spatial and satellite image data sets. Paradise is based on an object-relational data model. In addition to the standard attribute types such as integers, floats, strings and time, Paradise also provides a set of and multimedia data types, designed to facilitate the storage and querying of complex spatial and multimedia data sets. An individual tuple can contain any combination of this rich set of data types. For example, in the EOSDIS context, a tuple might mix terrain and map data for an area along with the latest satellite weather photo of the area. The use of a geo-spatial metaphor simplifies the task of fusing disparate forms of data from multiple data sources including text, image, map, and video data sets.

  19. StreamQRE: Modular Specification and Efficient Evaluation of Quantitative Queries over Streaming Data.

    PubMed

    Mamouras, Konstantinos; Raghothaman, Mukund; Alur, Rajeev; Ives, Zachary G; Khanna, Sanjeev

    2017-06-01

    Real-time decision making in emerging IoT applications typically relies on computing quantitative summaries of large data streams in an efficient and incremental manner. To simplify the task of programming the desired logic, we propose StreamQRE, which provides natural and high-level constructs for processing streaming data. Our language has a novel integration of linguistic constructs from two distinct programming paradigms: streaming extensions of relational query languages and quantitative extensions of regular expressions. The former allows the programmer to employ relational constructs to partition the input data by keys and to integrate data streams from different sources, while the latter can be used to exploit the logical hierarchy in the input stream for modular specifications. We first present the core language with a small set of combinators, formal semantics, and a decidable type system. We then show how to express a number of common patterns with illustrative examples. Our compilation algorithm translates the high-level query into a streaming algorithm with precise complexity bounds on per-item processing time and total memory footprint. We also show how to integrate approximation algorithms into our framework. We report on an implementation in Java, and evaluate it with respect to existing high-performance engines for processing streaming data. Our experimental evaluation shows that (1) StreamQRE allows more natural and succinct specification of queries compared to existing frameworks, (2) the throughput of our implementation is higher than comparable systems (for example, two-to-four times greater than RxJava), and (3) the approximation algorithms supported by our implementation can lead to substantial memory savings.

  20. StreamQRE: Modular Specification and Efficient Evaluation of Quantitative Queries over Streaming Data*

    PubMed Central

    Mamouras, Konstantinos; Raghothaman, Mukund; Alur, Rajeev; Ives, Zachary G.; Khanna, Sanjeev

    2017-01-01

    Real-time decision making in emerging IoT applications typically relies on computing quantitative summaries of large data streams in an efficient and incremental manner. To simplify the task of programming the desired logic, we propose StreamQRE, which provides natural and high-level constructs for processing streaming data. Our language has a novel integration of linguistic constructs from two distinct programming paradigms: streaming extensions of relational query languages and quantitative extensions of regular expressions. The former allows the programmer to employ relational constructs to partition the input data by keys and to integrate data streams from different sources, while the latter can be used to exploit the logical hierarchy in the input stream for modular specifications. We first present the core language with a small set of combinators, formal semantics, and a decidable type system. We then show how to express a number of common patterns with illustrative examples. Our compilation algorithm translates the high-level query into a streaming algorithm with precise complexity bounds on per-item processing time and total memory footprint. We also show how to integrate approximation algorithms into our framework. We report on an implementation in Java, and evaluate it with respect to existing high-performance engines for processing streaming data. Our experimental evaluation shows that (1) StreamQRE allows more natural and succinct specification of queries compared to existing frameworks, (2) the throughput of our implementation is higher than comparable systems (for example, two-to-four times greater than RxJava), and (3) the approximation algorithms supported by our implementation can lead to substantial memory savings. PMID:29151821

  1. Measuring up: Implementing a dental quality measure in the electronic health record context.

    PubMed

    Bhardwaj, Aarti; Ramoni, Rachel; Kalenderian, Elsbeth; Neumann, Ana; Hebballi, Nutan B; White, Joel M; McClellan, Lyle; Walji, Muhammad F

    2016-01-01

    Quality improvement requires using quality measures that can be implemented in a valid manner. Using guidelines set forth by the Meaningful Use portion of the Health Information Technology for Economic and Clinical Health Act, the authors assessed the feasibility and performance of an automated electronic Meaningful Use dental clinical quality measure to determine the percentage of children who received fluoride varnish. The authors defined how to implement the automated measure queries in a dental electronic health record. Within records identified through automated query, the authors manually reviewed a subsample to assess the performance of the query. The automated query results revealed that 71.0% of patients had fluoride varnish compared with the manual chart review results that indicated 77.6% of patients had fluoride varnish. The automated quality measure performance results indicated 90.5% sensitivity, 90.8% specificity, 96.9% positive predictive value, and 75.2% negative predictive value. The authors' findings support the feasibility of using automated dental quality measure queries in the context of sufficient structured data. Information noted only in free text rather than in structured data would require using natural language processing approaches to effectively query electronic health records. To participate in self-directed quality improvement, dental clinicians must embrace the accountability era. Commitment to quality will require enhanced documentation to support near-term automated calculation of quality measures. Copyright © 2016 American Dental Association. Published by Elsevier Inc. All rights reserved.

  2. Adding intelligence to scientific data management

    NASA Technical Reports Server (NTRS)

    Campbell, William J.; Short, Nicholas M., Jr.; Treinish, Lloyd A.

    1989-01-01

    NASA plans to solve some of the problems of handling large-scale scientific data bases by turning to artificial intelligence (AI) are discussed. The growth of the information glut and the ways that AI can help alleviate the resulting problems are reviewed. The employment of the Intelligent User Interface prototype, where the user will generate his own natural language query with the assistance of the system, is examined. Spatial data management, scientific data visualization, and data fusion are discussed.

  3. Asking better questions: How presentation formats influence information search.

    PubMed

    Wu, Charley M; Meder, Björn; Filimon, Flavia; Nelson, Jonathan D

    2017-08-01

    While the influence of presentation formats have been widely studied in Bayesian reasoning tasks, we present the first systematic investigation of how presentation formats influence information search decisions. Four experiments were conducted across different probabilistic environments, where subjects (N = 2,858) chose between 2 possible search queries, each with binary probabilistic outcomes, with the goal of maximizing classification accuracy. We studied 14 different numerical and visual formats for presenting information about the search environment, constructed across 6 design features that have been prominently related to improvements in Bayesian reasoning accuracy (natural frequencies, posteriors, complement, spatial extent, countability, and part-to-whole information). The posterior variants of the icon array and bar graph formats led to the highest proportion of correct responses, and were substantially better than the standard probability format. Results suggest that presenting information in terms of posterior probabilities and visualizing natural frequencies using spatial extent (a perceptual feature) were especially helpful in guiding search decisions, although environments with a mixture of probabilistic and certain outcomes were challenging across all formats. Subjects who made more accurate probability judgments did not perform better on the search task, suggesting that simple decision heuristics may be used to make search decisions without explicitly applying Bayesian inference to compute probabilities. We propose a new take-the-difference (TTD) heuristic that identifies the accuracy-maximizing query without explicit computation of posterior probabilities. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  4. A Rule-Based Spatial Reasoning Approach for OpenStreetMap Data Quality Enrichment; Case Study of Routing and Navigation

    PubMed Central

    2017-01-01

    Finding relevant geospatial information is increasingly critical because of the growing volume of geospatial data available within the emerging “Big Data” era. Users are expecting that the availability of massive datasets will create more opportunities to uncover hidden information and answer more complex queries. This is especially the case with routing and navigation services where the ability to retrieve points of interest and landmarks make the routing service personalized, precise, and relevant. In this paper, we propose a new geospatial information approach that enables the retrieval of implicit information, i.e., geospatial entities that do not exist explicitly in the available source. We present an information broker that uses a rule-based spatial reasoning algorithm to detect topological relations. The information broker is embedded into a framework where annotations and mappings between OpenStreetMap data attributes and external resources, such as taxonomies, support the enrichment of queries to improve the ability of the system to retrieve information. Our method is tested with two case studies that leads to enriching the completeness of OpenStreetMap data with footway crossing points-of-interests as well as building entrances for routing and navigation purposes. It is concluded that the proposed approach can uncover implicit entities and contribute to extract required information from the existing datasets. PMID:29088125

  5. Shark: SQL and Analytics with Cost-Based Query Optimization on Coarse-Grained Distributed Memory

    DTIC Science & Technology

    2014-01-13

    RDBMS and contains a database (often MySQL or Derby) with a namespace for tables, table metadata and partition information. Table data is stored in an...serialization/deserialization) Java interface implementations with corresponding object inspectors. The Hive driver controls the processing of queries, coordinat...native API, RDD operations are invoked through a functional interface similar to DryadLINQ [32] in Scala, Java or Python. For example, the Scala code for

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    IRIS is a search tool plug-in that is used to implement latent topic feedback for enhancing text navigation. It accepts a list of returned documents from an information retrieval wywtem that is generated from keyword search queries. Data is pulled directly from a topic information database and processed by IRIS to determine the most prominent and relevant topics, along with topic-ngrams, associated with the list of returned documents. User selected topics are then used to expand the query and presumabley refine the search results.

  7. Aligning HST Images to Gaia: A Faster Mosaicking Workflow

    NASA Astrophysics Data System (ADS)

    Bajaj, V.

    2017-11-01

    We present a fully programmatic workflow for aligning HST images using the high-quality astrometry provided by Gaia Data Release 1. Code provided in a Jupyter Notebook works through this procedure, including parsing the data to determine the query area parameters, querying Gaia for the coordinate catalog, and using the catalog with TweakReg as reference catalog. This workflow greatly simplifies the normally time-consuming process of aligning HST images, especially those taken as part of mosaics.

  8. Geoscience information integration and visualization research of Shandong Province, China based on ArcGIS engine

    NASA Astrophysics Data System (ADS)

    Xu, Mingzhu; Gao, Zhiqiang; Ning, Jicai

    2014-10-01

    To improve the access efficiency of geoscience data, efficient data model and storage solutions should be used. Geoscience data is usually classified by format or coordinate system in existing storage solutions. When data is large, it is not conducive to search the geographic features. In this study, a geographical information integration system of Shandong province, China was developed based on the technology of ArcGIS Engine, .NET, and SQL Server. It uses Geodatabase spatial data model and ArcSDE to organize and store spatial and attribute data and establishes geoscience database of Shangdong. Seven function modules were designed: map browse, database and subject management, layer control, map query, spatial analysis and map symbolization. The system's characteristics of can be browsed and managed by geoscience subjects make the system convenient for geographic researchers and decision-making departments to use the data.

  9. Study of Earthquake Disaster Prediction System of Langfang city Based on GIS

    NASA Astrophysics Data System (ADS)

    Huang, Meng; Zhang, Dian; Li, Pan; Zhang, YunHui; Zhang, RuoFei

    2017-07-01

    In this paper, according to the status of China’s need to improve the ability of earthquake disaster prevention, this paper puts forward the implementation plan of earthquake disaster prediction system of Langfang city based on GIS. Based on the GIS spatial database, coordinate transformation technology, GIS spatial analysis technology and PHP development technology, the seismic damage factor algorithm is used to predict the damage of the city under different intensity earthquake disaster conditions. The earthquake disaster prediction system of Langfang city is based on the B / S system architecture. Degree and spatial distribution and two-dimensional visualization display, comprehensive query analysis and efficient auxiliary decision-making function to determine the weak earthquake in the city and rapid warning. The system has realized the transformation of the city’s earthquake disaster reduction work from static planning to dynamic management, and improved the city’s earthquake and disaster prevention capability.

  10. Intraoperative virtual brain counseling

    NASA Astrophysics Data System (ADS)

    Jiang, Zhaowei; Grosky, William I.; Zamorano, Lucia J.; Muzik, Otto; Diaz, Fernando

    1997-06-01

    Our objective is to offer online real-tim e intelligent guidance to the neurosurgeon. Different from traditional image-guidance technologies that offer intra-operative visualization of medical images or atlas images, virtual brain counseling goes one step further. It can distinguish related brain structures and provide information about them intra-operatively. Virtual brain counseling is the foundation for surgical planing optimization and on-line surgical reference. It can provide a warning system that alerts the neurosurgeon if the chosen trajectory will pass through eloquent brain areas. In order to fulfill this objective, tracking techniques are involved for intra- operativity. Most importantly, a 3D virtual brian environment, different from traditional 3D digitized atlases, is an object-oriented model of the brain that stores information about different brain structures together with their elated information. An object-oriented hierarchical hyper-voxel space (HHVS) is introduced to integrate anatomical and functional structures. Spatial queries based on position of interest, line segment of interest, and volume of interest are introduced in this paper. The virtual brain environment is integrated with existing surgical pre-planning and intra-operative tracking systems to provide information for planning optimization and on-line surgical guidance. The neurosurgeon is alerted automatically if the planned treatment affects any critical structures. Architectures such as HHVS and algorithms, such as spatial querying, normalizing, and warping are presented in the paper. A prototype has shown that the virtual brain is intuitive in its hierarchical 3D appearance. It also showed that HHVS, as the key structure for virtual brain counseling, efficiently integrates multi-scale brain structures based on their spatial relationships.This is a promising development for optimization of treatment plans and online surgical intelligent guidance.

  11. Computing health quality measures using Informatics for Integrating Biology and the Bedside.

    PubMed

    Klann, Jeffrey G; Murphy, Shawn N

    2013-04-19

    The Health Quality Measures Format (HQMF) is a Health Level 7 (HL7) standard for expressing computable Clinical Quality Measures (CQMs). Creating tools to process HQMF queries in clinical databases will become increasingly important as the United States moves forward with its Health Information Technology Strategic Plan to Stages 2 and 3 of the Meaningful Use incentive program (MU2 and MU3). Informatics for Integrating Biology and the Bedside (i2b2) is one of the analytical databases used as part of the Office of the National Coordinator (ONC)'s Query Health platform to move toward this goal. Our goal is to integrate i2b2 with the Query Health HQMF architecture, to prepare for other HQMF use-cases (such as MU2 and MU3), and to articulate the functional overlap between i2b2 and HQMF. Therefore, we analyze the structure of HQMF, and then we apply this understanding to HQMF computation on the i2b2 clinical analytical database platform. Specifically, we develop a translator between two query languages, HQMF and i2b2, so that the i2b2 platform can compute HQMF queries. We use the HQMF structure of queries for aggregate reporting, which define clinical data elements and the temporal and logical relationships between them. We use the i2b2 XML format, which allows flexible querying of a complex clinical data repository in an easy-to-understand domain-specific language. The translator can represent nearly any i2b2-XML query as HQMF and execute in i2b2 nearly any HQMF query expressible in i2b2-XML. This translator is part of the freely available reference implementation of the QueryHealth initiative. We analyze limitations of the conversion and find it covers many, but not all, of the complex temporal and logical operators required by quality measures. HQMF is an expressive language for defining quality measures, and it will be important to understand and implement for CQM computation, in both meaningful use and population health. However, its current form might allow complexity that is intractable for current database systems (both in terms of implementation and computation). Our translator, which supports the subset of HQMF currently expressible in i2b2-XML, may represent the beginnings of a practical compromise. It is being pilot-tested in two Query Health demonstration projects, and it can be further expanded to balance computational tractability with the advanced features needed by measure developers.

  12. Computing Health Quality Measures Using Informatics for Integrating Biology and the Bedside

    PubMed Central

    Murphy, Shawn N

    2013-01-01

    Background The Health Quality Measures Format (HQMF) is a Health Level 7 (HL7) standard for expressing computable Clinical Quality Measures (CQMs). Creating tools to process HQMF queries in clinical databases will become increasingly important as the United States moves forward with its Health Information Technology Strategic Plan to Stages 2 and 3 of the Meaningful Use incentive program (MU2 and MU3). Informatics for Integrating Biology and the Bedside (i2b2) is one of the analytical databases used as part of the Office of the National Coordinator (ONC)’s Query Health platform to move toward this goal. Objective Our goal is to integrate i2b2 with the Query Health HQMF architecture, to prepare for other HQMF use-cases (such as MU2 and MU3), and to articulate the functional overlap between i2b2 and HQMF. Therefore, we analyze the structure of HQMF, and then we apply this understanding to HQMF computation on the i2b2 clinical analytical database platform. Specifically, we develop a translator between two query languages, HQMF and i2b2, so that the i2b2 platform can compute HQMF queries. Methods We use the HQMF structure of queries for aggregate reporting, which define clinical data elements and the temporal and logical relationships between them. We use the i2b2 XML format, which allows flexible querying of a complex clinical data repository in an easy-to-understand domain-specific language. Results The translator can represent nearly any i2b2-XML query as HQMF and execute in i2b2 nearly any HQMF query expressible in i2b2-XML. This translator is part of the freely available reference implementation of the QueryHealth initiative. We analyze limitations of the conversion and find it covers many, but not all, of the complex temporal and logical operators required by quality measures. Conclusions HQMF is an expressive language for defining quality measures, and it will be important to understand and implement for CQM computation, in both meaningful use and population health. However, its current form might allow complexity that is intractable for current database systems (both in terms of implementation and computation). Our translator, which supports the subset of HQMF currently expressible in i2b2-XML, may represent the beginnings of a practical compromise. It is being pilot-tested in two Query Health demonstration projects, and it can be further expanded to balance computational tractability with the advanced features needed by measure developers. PMID:23603227

  13. Survey of Event Processing

    DTIC Science & Technology

    2007-12-01

    1 A Brief History of Event Processing... history of event processing. The Applications section defines several application domains and use cases for event processing technology. Event...subscription” and “subscription language” will be used where some will often use “(continuous) query” or “query language.” A Brief History of

  14. Empirical evaluation of the Process Overview Measure for assessing situation awareness in process plants.

    PubMed

    Lau, Nathan; Jamieson, Greg A; Skraaning, Gyrd

    2016-03-01

    The Process Overview Measure is a query-based measure developed to assess operator situation awareness (SA) from monitoring process plants. A companion paper describes how the measure has been developed according to process plant properties and operator cognitive work. The Process Overview Measure demonstrated practicality, sensitivity, validity and reliability in two full-scope simulator experiments investigating dramatically different operational concepts. Practicality was assessed based on qualitative feedback of participants and researchers. The Process Overview Measure demonstrated sensitivity and validity by revealing significant effects of experimental manipulations that corroborated with other empirical results. The measure also demonstrated adequate inter-rater reliability and practicality for measuring SA in full-scope simulator settings based on data collected on process experts. Thus, full-scope simulator studies can employ the Process Overview Measure to reveal the impact of new control room technology and operational concepts on monitoring process plants. Practitioner Summary: The Process Overview Measure is a query-based measure that demonstrated practicality, sensitivity, validity and reliability for assessing operator situation awareness (SA) from monitoring process plants in representative settings.

  15. A prototype feature system for feature retrieval using relationships

    USGS Publications Warehouse

    Choi, J.; Usery, E.L.

    2009-01-01

    Using a feature data model, geographic phenomena can be represented effectively by integrating space, theme, and time. This paper extends and implements a feature data model that supports query and visualization of geographic features using their non-spatial and temporal relationships. A prototype feature-oriented geographic information system (FOGIS) is then developed and storage of features named Feature Database is designed. Buildings from the U.S. Marine Corps Base, Camp Lejeune, North Carolina and subways in Chicago, Illinois are used to test the developed system. The results of the applications show the strength of the feature data model and the developed system 'FOGIS' when they utilize non-spatial and temporal relationships in order to retrieve and visualize individual features.

  16. The Voronoi spatio-temporal data structure

    NASA Astrophysics Data System (ADS)

    Mioc, Darka

    2002-04-01

    Current GIS models cannot integrate the temporal dimension of spatial data easily. Indeed, current GISs do not support incremental (local) addition and deletion of spatial objects, and they can not support the temporal evolution of spatial data. Spatio-temporal facilities would be very useful in many GIS applications: harvesting and forest planning, cadastre, urban and regional planning, and emergency planning. The spatio-temporal model that can overcome these problems is based on a topological model---the Voronoi data structure. Voronoi diagrams are irregular tessellations of space, that adapt to spatial objects and therefore they are a synthesis of raster and vector spatial data models. The main advantage of the Voronoi data structure is its local and sequential map updates, which allows us to automatically record each event and performed map updates within the system. These map updates are executed through map construction commands that are composed of atomic actions (geometric algorithms for addition, deletion, and motion of spatial objects) on the dynamic Voronoi data structure. The formalization of map commands led to the development of a spatial language comprising a set of atomic operations or constructs on spatial primitives (points and lines), powerful enough to define the complex operations. This resulted in a new formal model for spatio-temporal change representation, where each update is uniquely characterized by the numbers of newly created and inactivated Voronoi regions. This is used for the extension of the model towards the hierarchical Voronoi data structure. In this model, spatio-temporal changes induced by map updates are preserved in a hierarchical data structure that combines events and corresponding changes in topology. This hierarchical Voronoi data structure has an implicit time ordering of events visible through changes in topology, and it is equivalent to an event structure that can support temporal data without precise temporal information. This formal model of spatio-temporal change representation is currently applied to retroactive map updates and visualization of map evolution. It offers new possibilities in the domains of temporal GIS, transaction processing, spatio-temporal queries, spatio-temporal analysis, map animation and map visualization.

  17. Integration of remote sensing and geographic information systems for Great Lakes water quality monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lathrop, R.G. Jr.

    1988-01-01

    The utility of three operational satellite remote sensing systems, namely, the Landsat Thematic Mapper (TM), the SPOT High Resolution Visible (HRV) sensors and the NOAA Advanced Very High Resolution Radiometer (AVHRR), were evaluated as a means of estimating water quality and surface temperature. Empirical calibration through linear regression techniques was used to relate near-simultaneously acquired satellite radiance/reflectance data and water quality observations obtained in Green Bay and the nearshore waters of Lake Michigan. Four dates of TM and one date each of SPOT and AVHRR imagery/surface reference data were acquired and analyzed. Highly significant relationships were identified between the TMmore » and SPOT data and secchi disk depth, nephelometric turbidity, chlorophyll a, total suspended solids (TSS), absorbance, and surface temperature (TM only). The AVHRR data were not analyzed independently but were used for comparison with the TM data. Calibrated water quality image maps were input to a PC-based raster GIS package, EPPL7. Pattern interpretation and spatial analysis techniques were used to document the circulation dynamics and model mixing processes in Green Bay. A GIS facilitates the retrieval, query and spatial analysis of mapped information and provides the framework for an integrated operational monitoring system for the Great Lakes.« less

  18. Reflections on organizational issues in developing, implementing, and maintaining state Web-based data query systems.

    PubMed

    Love, Denise; Shah, Gulzar H

    2006-01-01

    Emerging technologies, such as Web-based data query systems (WDQSs), provide opportunities for state and local agencies to systematically organize and disseminate data to broad audiences and streamline the data distribution process. Despite the progress in WDQSs' implementation, led by agencies considered the "early adopters," there are still agencies left behind. This article explores the organizational issues and barriers to development of WDQSs in public health agencies and highlights factors facilitating the implementation of WDQSs.

  19. Lyceum: A Multi-Protocol Digital Library Gateway

    NASA Technical Reports Server (NTRS)

    Maa, Ming-Hokng; Nelson, Michael L.; Esler, Sandra L.

    1997-01-01

    Lyceum is a prototype scalable query gateway that provides a logically central interface to multi-protocol and physically distributed, digital libraries of scientific and technical information. Lyceum processes queries to multiple syntactically distinct search engines used by various distributed information servers from a single logically central interface without modification of the remote search engines. A working prototype (http://www.larc.nasa.gov/lyceum/) demonstrates the capabilities, potentials, and advantages of this type of meta-search engine by providing access to over 50 servers covering over 20 disciplines.

  20. Towards a Simple and Efficient Web Search Framework

    DTIC Science & Technology

    2014-11-01

    any useful information about the various aspects of a topic. For example, for the query “ raspberry pi ”, it covers topics such as “what is raspberry pi ...topics generated by the LDA topic model for query ” raspberry pi ”. One simple explanation is that web texts are too noisy and unfocused for the LDA process...making a rasp- berry pi ”. However, the topics generated based on the 10 top ranked documents do not make much sense to us in terms of their keywords

  1. The white matter query language: a novel approach for describing human white matter anatomy

    PubMed Central

    Makris, Nikos; Rathi, Yogesh; Shenton, Martha; Kikinis, Ron; Kubicki, Marek; Westin, Carl-Fredrik

    2016-01-01

    We have developed a novel method to describe human white matter anatomy using an approach that is both intuitive and simple to use, and which automatically extracts white matter tracts from diffusion MRI volumes. Further, our method simplifies the quantification and statistical analysis of white matter tracts on large diffusion MRI databases. This work reflects the careful syntactical definition of major white matter fiber tracts in the human brain based on a neuroanatomist’s expert knowledge. The framework is based on a novel query language with a near-to-English textual syntax. This query language makes it possible to construct a dictionary of anatomical definitions that describe white matter tracts. The definitions include adjacent gray and white matter regions, and rules for spatial relations. This novel method makes it possible to automatically label white matter anatomy across subjects. After describing this method, we provide an example of its implementation where we encode anatomical knowledge in human white matter for ten association and 15 projection tracts per hemisphere, along with seven commissural tracts. Importantly, this novel method is comparable in accuracy to manual labeling. Finally, we present results applying this method to create a white matter atlas from 77 healthy subjects, and we use this atlas in a small proof-of-concept study to detect changes in association tracts that characterize schizophrenia. PMID:26754839

  2. The white matter query language: a novel approach for describing human white matter anatomy.

    PubMed

    Wassermann, Demian; Makris, Nikos; Rathi, Yogesh; Shenton, Martha; Kikinis, Ron; Kubicki, Marek; Westin, Carl-Fredrik

    2016-12-01

    We have developed a novel method to describe human white matter anatomy using an approach that is both intuitive and simple to use, and which automatically extracts white matter tracts from diffusion MRI volumes. Further, our method simplifies the quantification and statistical analysis of white matter tracts on large diffusion MRI databases. This work reflects the careful syntactical definition of major white matter fiber tracts in the human brain based on a neuroanatomist's expert knowledge. The framework is based on a novel query language with a near-to-English textual syntax. This query language makes it possible to construct a dictionary of anatomical definitions that describe white matter tracts. The definitions include adjacent gray and white matter regions, and rules for spatial relations. This novel method makes it possible to automatically label white matter anatomy across subjects. After describing this method, we provide an example of its implementation where we encode anatomical knowledge in human white matter for ten association and 15 projection tracts per hemisphere, along with seven commissural tracts. Importantly, this novel method is comparable in accuracy to manual labeling. Finally, we present results applying this method to create a white matter atlas from 77 healthy subjects, and we use this atlas in a small proof-of-concept study to detect changes in association tracts that characterize schizophrenia.

  3. Nanocubes for real-time exploration of spatiotemporal datasets.

    PubMed

    Lins, Lauro; Klosowski, James T; Scheidegger, Carlos

    2013-12-01

    Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop's main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.

  4. FoldMiner and LOCK 2: protein structure comparison and motif discovery on the web.

    PubMed

    Shapiro, Jessica; Brutlag, Douglas

    2004-07-01

    The FoldMiner web server (http://foldminer.stanford.edu/) provides remote access to methods for protein structure alignment and unsupervised motif discovery. FoldMiner is unique among such algorithms in that it improves both the motif definition and the sensitivity of a structural similarity search by combining the search and motif discovery methods and using information from each process to enhance the other. In a typical run, a query structure is aligned to all structures in one of several databases of single domain targets in order to identify its structural neighbors and to discover a motif that is the basis for the similarity among the query and statistically significant targets. This process is fully automated, but options for manual refinement of the results are available as well. The server uses the Chime plugin and customized controls to allow for visualization of the motif and of structural superpositions. In addition, we provide an interface to the LOCK 2 algorithm for rapid alignments of a query structure to smaller numbers of user-specified targets.

  5. Approach to Privacy-Preserve Data in Two-Tiered Wireless Sensor Network Based on Linear System and Histogram

    NASA Astrophysics Data System (ADS)

    Dang, Van H.; Wohlgemuth, Sven; Yoshiura, Hiroshi; Nguyen, Thuc D.; Echizen, Isao

    Wireless sensor network (WSN) has been one of key technologies for the future with broad applications from the military to everyday life [1,2,3,4,5]. There are two kinds of WSN model models with sensors for sensing data and a sink for receiving and processing queries from users; and models with special additional nodes capable of storing large amounts of data from sensors and processing queries from the sink. Among the latter type, a two-tiered model [6,7] has been widely adopted because of its storage and energy saving benefits for weak sensors, as proved by the advent of commercial storage node products such as Stargate [8] and RISE. However, by concentrating storage in certain nodes, this model becomes more vulnerable to attack. Our novel technique, called zip-histogram, contributes to solving the problems of previous studies [6,7] by protecting the stored data's confidentiality and integrity (including data from the sensor and queries from the sink) against attackers who might target storage nodes in two-tiered WSNs.

  6. Preliminary Results on Uncertainty Quantification for Pattern Analytics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stracuzzi, David John; Brost, Randolph; Chen, Maximillian Gene

    2015-09-01

    This report summarizes preliminary research into uncertainty quantification for pattern ana- lytics within the context of the Pattern Analytics to Support High-Performance Exploitation and Reasoning (PANTHER) project. The primary focus of PANTHER was to make large quantities of remote sensing data searchable by analysts. The work described in this re- port adds nuance to both the initial data preparation steps and the search process. Search queries are transformed from does the specified pattern exist in the data? to how certain is the system that the returned results match the query? We show example results for both data processing and search,more » and discuss a number of possible improvements for each.« less

  7. A Fine-Grained and Privacy-Preserving Query Scheme for Fog Computing-Enhanced Location-Based Service

    PubMed Central

    Yin, Fan; Tang, Xiaohu

    2017-01-01

    Location-based services (LBS), as one of the most popular location-awareness applications, has been further developed to achieve low-latency with the assistance of fog computing. However, privacy issues remain a research challenge in the context of fog computing. Therefore, in this paper, we present a fine-grained and privacy-preserving query scheme for fog computing-enhanced location-based services, hereafter referred to as FGPQ. In particular, mobile users can obtain the fine-grained searching result satisfying not only the given spatial range but also the searching content. Detailed privacy analysis shows that our proposed scheme indeed achieves the privacy preservation for the LBS provider and mobile users. In addition, extensive performance analyses and experiments demonstrate that the FGPQ scheme can significantly reduce computational and communication overheads and ensure the low-latency, which outperforms existing state-of-the art schemes. Hence, our proposed scheme is more suitable for real-time LBS searching. PMID:28696395

  8. A Fine-Grained and Privacy-Preserving Query Scheme for Fog Computing-Enhanced Location-Based Service.

    PubMed

    Yang, Xue; Yin, Fan; Tang, Xiaohu

    2017-07-11

    Location-based services (LBS), as one of the most popular location-awareness applications, has been further developed to achieve low-latency with the assistance of fog computing. However, privacy issues remain a research challenge in the context of fog computing. Therefore, in this paper, we present a fine-grained and privacy-preserving query scheme for fog computing-enhanced location-based services, hereafter referred to as FGPQ. In particular, mobile users can obtain the fine-grained searching result satisfying not only the given spatial range but also the searching content. Detailed privacy analysis shows that our proposed scheme indeed achieves the privacy preservation for the LBS provider and mobile users. In addition, extensive performance analyses and experiments demonstrate that the FGPQ scheme can significantly reduce computational and communication overheads and ensure the low-latency, which outperforms existing state-of-the art schemes. Hence, our proposed scheme is more suitable for real-time LBS searching.

  9. Cyberinfrastructure for the digital brain: spatial standards for integrating rodent brain atlases

    PubMed Central

    Zaslavsky, Ilya; Baldock, Richard A.; Boline, Jyl

    2014-01-01

    Biomedical research entails capture and analysis of massive data volumes and new discoveries arise from data-integration and mining. This is only possible if data can be mapped onto a common framework such as the genome for genomic data. In neuroscience, the framework is intrinsically spatial and based on a number of paper atlases. This cannot meet today's data-intensive analysis and integration challenges. A scalable and extensible software infrastructure that is standards based but open for novel data and resources, is required for integrating information such as signal distributions, gene-expression, neuronal connectivity, electrophysiology, anatomy, and developmental processes. Therefore, the International Neuroinformatics Coordinating Facility (INCF) initiated the development of a spatial framework for neuroscience data integration with an associated Digital Atlasing Infrastructure (DAI). A prototype implementation of this infrastructure for the rodent brain is reported here. The infrastructure is based on a collection of reference spaces to which data is mapped at the required resolution, such as the Waxholm Space (WHS), a 3D reconstruction of the brain generated using high-resolution, multi-channel microMRI. The core standards of the digital atlasing service-oriented infrastructure include Waxholm Markup Language (WaxML): XML schema expressing a uniform information model for key elements such as coordinate systems, transformations, points of interest (POI)s, labels, and annotations; and Atlas Web Services: interfaces for querying and updating atlas data. The services return WaxML-encoded documents with information about capabilities, spatial reference systems (SRSs) and structures, and execute coordinate transformations and POI-based requests. Key elements of INCF-DAI cyberinfrastructure have been prototyped for both mouse and rat brain atlas sources, including the Allen Mouse Brain Atlas, UCSD Cell-Centered Database, and Edinburgh Mouse Atlas Project. PMID:25309417

  10. Cyberinfrastructure for the digital brain: spatial standards for integrating rodent brain atlases.

    PubMed

    Zaslavsky, Ilya; Baldock, Richard A; Boline, Jyl

    2014-01-01

    Biomedical research entails capture and analysis of massive data volumes and new discoveries arise from data-integration and mining. This is only possible if data can be mapped onto a common framework such as the genome for genomic data. In neuroscience, the framework is intrinsically spatial and based on a number of paper atlases. This cannot meet today's data-intensive analysis and integration challenges. A scalable and extensible software infrastructure that is standards based but open for novel data and resources, is required for integrating information such as signal distributions, gene-expression, neuronal connectivity, electrophysiology, anatomy, and developmental processes. Therefore, the International Neuroinformatics Coordinating Facility (INCF) initiated the development of a spatial framework for neuroscience data integration with an associated Digital Atlasing Infrastructure (DAI). A prototype implementation of this infrastructure for the rodent brain is reported here. The infrastructure is based on a collection of reference spaces to which data is mapped at the required resolution, such as the Waxholm Space (WHS), a 3D reconstruction of the brain generated using high-resolution, multi-channel microMRI. The core standards of the digital atlasing service-oriented infrastructure include Waxholm Markup Language (WaxML): XML schema expressing a uniform information model for key elements such as coordinate systems, transformations, points of interest (POI)s, labels, and annotations; and Atlas Web Services: interfaces for querying and updating atlas data. The services return WaxML-encoded documents with information about capabilities, spatial reference systems (SRSs) and structures, and execute coordinate transformations and POI-based requests. Key elements of INCF-DAI cyberinfrastructure have been prototyped for both mouse and rat brain atlas sources, including the Allen Mouse Brain Atlas, UCSD Cell-Centered Database, and Edinburgh Mouse Atlas Project.

  11. A Semantic Approach for Geospatial Information Extraction from Unstructured Documents

    NASA Astrophysics Data System (ADS)

    Sallaberry, Christian; Gaio, Mauro; Lesbegueries, Julien; Loustau, Pierre

    Local cultural heritage document collections are characterized by their content, which is strongly attached to a territory and its land history (i.e., geographical references). Our contribution aims at making the content retrieval process more efficient whenever a query includes geographic criteria. We propose a core model for a formal representation of geographic information. It takes into account characteristics of different modes of expression, such as written language, captures of drawings, maps, photographs, etc. We have developed a prototype that fully implements geographic information extraction (IE) and geographic information retrieval (IR) processes. All PIV prototype processing resources are designed as Web Services. We propose a geographic IE process based on semantic treatment as a supplement to classical IE approaches. We implement geographic IR by using intersection computing algorithms that seek out any intersection between formal geocoded representations of geographic information in a user query and similar representations in document collection indexes.

  12. Implementation of the common phrase index method on the phrase query for information retrieval

    NASA Astrophysics Data System (ADS)

    Fatmawati, Triyah; Zaman, Badrus; Werdiningsih, Indah

    2017-08-01

    As the development of technology, the process of finding information on the news text is easy, because the text of the news is not only distributed in print media, such as newspapers, but also in electronic media that can be accessed using the search engine. In the process of finding relevant documents on the search engine, a phrase often used as a query. The number of words that make up the phrase query and their position obviously affect the relevance of the document produced. As a result, the accuracy of the information obtained will be affected. Based on the outlined problem, the purpose of this research was to analyze the implementation of the common phrase index method on information retrieval. This research will be conducted in English news text and implemented on a prototype to determine the relevance level of the documents produced. The system is built with the stages of pre-processing, indexing, term weighting calculation, and cosine similarity calculation. Then the system will display the document search results in a sequence, based on the cosine similarity. Furthermore, system testing will be conducted using 100 documents and 20 queries. That result is then used for the evaluation stage. First, determine the relevant documents using kappa statistic calculation. Second, determine the system success rate using precision, recall, and F-measure calculation. In this research, the result of kappa statistic calculation was 0.71, so that the relevant documents are eligible for the system evaluation. Then the calculation of precision, recall, and F-measure produces precision of 0.37, recall of 0.50, and F-measure of 0.43. From this result can be said that the success rate of the system to produce relevant documents is low.

  13. Design of FastQuery: How to Generalize Indexing and Querying System for Scientific Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Jerry; Wu, Kesheng

    2011-04-18

    Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies such as FastBit are critical for facilitating interactive exploration of large datasets. These technologies rely on adding auxiliary information to existing datasets to accelerate query processing. To use these indices, we need to match the relational data model used by the indexing systems with the array data model used by most scientific data, and to provide an efficient input and output layer for reading and writing the indices. In this work, we present a flexible design that can be easily applied to most scientific datamore » formats. We demonstrate this flexibility by applying it to two of the most commonly used scientific data formats, HDF5 and NetCDF. We present two case studies using simulation data from the particle accelerator and climate simulation communities. To demonstrate the effectiveness of the new design, we also present a detailed performance study using both synthetic and real scientific workloads.« less

  14. Petaminer: Using ROOT for efficient data storage in MySQL database

    NASA Astrophysics Data System (ADS)

    Cranshaw, J.; Malon, D.; Vaniachine, A.; Fine, V.; Lauret, J.; Hamill, P.

    2010-04-01

    High Energy and Nuclear Physics (HENP) experiments store Petabytes of event data and Terabytes of calibration data in ROOT files. The Petaminer project is developing a custom MySQL storage engine to enable the MySQL query processor to directly access experimental data stored in ROOT files. Our project is addressing the problem of efficient navigation to PetaBytes of HENP experimental data described with event-level TAG metadata, which is required by data intensive physics communities such as the LHC and RHIC experiments. Physicists need to be able to compose a metadata query and rapidly retrieve the set of matching events, where improved efficiency will facilitate the discovery process by permitting rapid iterations of data evaluation and retrieval. Our custom MySQL storage engine enables the MySQL query processor to directly access TAG data stored in ROOT TTrees. As ROOT TTrees are column-oriented, reading them directly provides improved performance over traditional row-oriented TAG databases. Leveraging the flexible and powerful SQL query language to access data stored in ROOT TTrees, the Petaminer approach enables rich MySQL index-building capabilities for further performance optimization.

  15. Semantics-Based Intelligent Indexing and Retrieval of Digital Images - A Case Study

    NASA Astrophysics Data System (ADS)

    Osman, Taha; Thakker, Dhavalkumar; Schaefer, Gerald

    The proliferation of digital media has led to a huge interest in classifying and indexing media objects for generic search and usage. In particular, we are witnessing colossal growth in digital image repositories that are difficult to navigate using free-text search mechanisms, which often return inaccurate matches as they typically rely on statistical analysis of query keyword recurrence in the image annotation or surrounding text. In this chapter we present a semantically enabled image annotation and retrieval engine that is designed to satisfy the requirements of commercial image collections market in terms of both accuracy and efficiency of the retrieval process. Our search engine relies on methodically structured ontologies for image annotation, thus allowing for more intelligent reasoning about the image content and subsequently obtaining a more accurate set of results and a richer set of alternatives matchmaking the original query. We also show how our well-analysed and designed domain ontology contributes to the implicit expansion of user queries as well as presenting our initial thoughts on exploiting lexical databases for explicit semantic-based query expansion.

  16. Using discordance to improve classification in narrative clinical databases: an application to community-acquired pneumonia.

    PubMed

    Hripcsak, George; Knirsch, Charles; Zhou, Li; Wilcox, Adam; Melton, Genevieve B

    2007-03-01

    Data mining in electronic medical records may facilitate clinical research, but much of the structured data may be miscoded, incomplete, or non-specific. The exploitation of narrative data using natural language processing may help, although nesting, varying granularity, and repetition remain challenges. In a study of community-acquired pneumonia using electronic records, these issues led to poor classification. Limiting queries to accurate, complete records led to vastly reduced, possibly biased samples. We exploited knowledge latent in the electronic records to improve classification. A similarity metric was used to cluster cases. We defined discordance as the degree to which cases within a cluster give different answers for some query that addresses a classification task of interest. Cases with higher discordance are more likely to be incorrectly classified, and can be reviewed manually to adjust the classification, improve the query, or estimate the likely accuracy of the query. In a study of pneumonia--in which the ICD9-CM coding was found to be very poor--the discordance measure was statistically significantly correlated with classification correctness (.45; 95% CI .15-.62).

  17. Profile-IQ: Web-based data query system for local health department infrastructure and activities.

    PubMed

    Shah, Gulzar H; Leep, Carolyn J; Alexander, Dayna

    2014-01-01

    To demonstrate the use of National Association of County & City Health Officials' Profile-IQ, a Web-based data query system, and how policy makers, researchers, the general public, and public health professionals can use the system to generate descriptive statistics on local health departments. This article is a descriptive account of an important health informatics tool based on information from the project charter for Profile-IQ and the authors' experience and knowledge in design and use of this query system. Profile-IQ is a Web-based data query system that is based on open-source software: MySQL 5.5, Google Web Toolkit 2.2.0, Apache Commons Math library, Google Chart API, and Tomcat 6.0 Web server deployed on an Amazon EC2 server. It supports dynamic queries of National Profile of Local Health Departments data on local health department finances, workforce, and activities. Profile-IQ's customizable queries provide a variety of statistics not available in published reports and support the growing information needs of users who do not wish to work directly with data files for lack of staff skills or time, or to avoid a data use agreement. Profile-IQ also meets the growing demand of public health practitioners and policy makers for data to support quality improvement, community health assessment, and other processes associated with voluntary public health accreditation. It represents a step forward in the recent health informatics movement of data liberation and use of open source information technology solutions to promote public health.

  18. Retrieval of diagnostic and treatment studies for clinical use through PubMed and PubMed's Clinical Queries filters.

    PubMed

    Lokker, Cynthia; Haynes, R Brian; Wilczynski, Nancy L; McKibbon, K Ann; Walter, Stephen D

    2011-01-01

    Clinical Queries filters were developed to improve the retrieval of high-quality studies in searches on clinical matters. The study objective was to determine the yield of relevant citations and physician satisfaction while searching for diagnostic and treatment studies using the Clinical Queries page of PubMed compared with searching PubMed without these filters. Forty practicing physicians, presented with standardized treatment and diagnosis questions and one question of their choosing, entered search terms which were processed in a random, blinded fashion through PubMed alone and PubMed Clinical Queries. Participants rated search retrievals for applicability to the question at hand and satisfaction. For treatment, the primary outcome of retrieval of relevant articles was not significantly different between the groups, but a higher proportion of articles from the Clinical Queries searches met methodologic criteria (p=0.049), and more articles were published in core internal medicine journals (p=0.056). For diagnosis, the filtered results returned more relevant articles (p=0.031) and fewer irrelevant articles (overall retrieval less, p=0.023); participants needed to screen fewer articles before arriving at the first relevant citation (p<0.05). Relevance was also influenced by content terms used by participants in searching. Participants varied greatly in their search performance. Clinical Queries filtered searches returned more high-quality studies, though the retrieval of relevant articles was only statistically different between the groups for diagnosis questions. Retrieving clinically important research studies from Medline is a challenging task for physicians. Methodological search filters can improve search retrieval.

  19. Using STOQS and stoqstoolbox for in situ Measurement Data Access in Matlab

    NASA Astrophysics Data System (ADS)

    López-Castejón, F.; Schlining, B.; McCann, M. P.

    2012-12-01

    This poster presents the stoqstoolbox, an extension to Matlab that simplifies the loading of in situ measurement data directly from STOQS databases. STOQS (Spatial Temporal Oceanographic Query System) is a geospatial database tool designed to provide efficient access to data following the CF-NetCDF Discrete Samples Geometries convention. Data are loaded from CF-NetCDF files into a STOQS database where indexes are created on depth, spatial coordinates and other parameters, e.g. platform type. STOQS provides consistent, simple and efficient methods to query for data. For example, we can request all measurements with a standard_name of sea_water_temperature between two times and from between two depths. Data access is simpler because the data are retrieved by parameter irrespective of platform or mission file names. Access is more efficient because data are retrieved via the index on depth and only the requested data are retrieved from the database and transferred into the Matlab workspace. Applications in the stoqstoolbox query the STOQS database via an HTTP REST application programming interface; they follow the Data Access Object pattern, enabling highly customizable query construction. Data are loaded into Matlab structures that clearly indicate latitude, longitude, depth, measurement data value, and platform name. The stoqstoolbox is designed to be used in concert with other tools, such as nctoolbox, which can load data from any OPeNDAP data source. With these two toolboxes a user can easily work with in situ and other gridded data, such as from numerical models and remote sensing platforms. In order to show the capability of stoqstoolbox we will show an example of model validation using data collected during the May-June 2012 field experiment conducted by the Monterey Bay Aquarium Research Institute (MBARI) in Monterey Bay, California. The data are available from the STOQS server at http://odss.mbari.org/canon/stoqs_may2012/query/. Over 14 million data points of 18 parameters from 6 platforms measured over a 3-week period are available on this server. The model used for comparison is the Regional Ocean Modeling System developed by Jet Propulsion Laboratory for the Monterey Bay. The model output are loaded into Matlab using nctoolbox from the JPL server at http://ourocean.jpl.nasa.gov:8080/thredds/dodsC/MBNowcast. Model validation with in situ measurements can be difficult because of different file formats and because data may be spread across individual data systems for each platform. With stoqstoolbox the researcher must know only the URL of the STOQS server and the OPeNDAP URL of the model output. With selected depth and time constraints a user's Matlab program searches for all in situ measurements available for the same time, depth and variable of the model. STOQS and stoqstoolbox are open source software projects supported by MBARI and the David and Lucile Packard foundation. For more information please see http://code.google.com/p/stoqs.

  20. EarthServer: a Summary of Achievements in Technology, Services, and Standards

    NASA Astrophysics Data System (ADS)

    Baumann, Peter

    2015-04-01

    Big Data in the Earth sciences, the Tera- to Exabyte archives, mostly are made up from coverage data, according to ISO and OGC defined as the digital representation of some space-time varying phenomenon. Common examples include 1-D sensor timeseries, 2-D remote sensing imagery, 3D x/y/t image timese ries and x/y/z geology data, and 4-D x/y/z/t atmosphere and ocean data. Analytics on such data requires on-demand processing of sometimes significant complexity, such as getting the Fourier transform of satellite images. As network bandwidth limits prohibit transfer of such Big Data it is indispensable to devise protocols allowing clients to task flexible and fast processing on the server. The transatlantic EarthServer initiative, running from 2011 through 2014, has united 11 partners to establish Big Earth Data Analytics. A key ingredient has been flexibility for users to ask whatever they want, not impeded and complicated by system internals. The EarthServer answer to this is to use high-level, standards-based query languages which unify data and metadata search in a simple, yet powerful way. A second key ingredient is scalability. Without any doubt, scalability ultimately can only be achieved through parallelization. In the past, parallelizing cod e has been done at compile time and usually with manual intervention. The EarthServer approach is to perform a samentic-based dynamic distribution of queries fragments based on networks optimization and further criteria. The EarthServer platform is comprised by rasdaman, the pioneer and leading Array DBMS built for any-size multi-dimensional raster data being extended with support for irregular grids and general meshes; in-situ retrieval (evaluation of database queries on existing archive structures, avoiding data import and, hence, duplication); the aforementioned distributed query processing. Additionally, Web clients for multi-dimensional data visualization are being established. Client/server interfaces are strictly based on OGC and W3C standards, in particular the Web Coverage Processing Service (WCPS) which defines a high-level coverage query language. Reviewers have attested EarthServer that "With no doubt the project has been shaping the Big Earth Data landscape through the standardization activities within OGC, ISO and beyond". We present the project approach, its outcomes and impact on standardization and Big Data technology, and vistas for the future.

  1. Remembering the Important Things: Semantic Importance in Stream Reasoning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yan, Rui; Greaves, Mark T.; Smith, William P.

    Reasoning and querying over data streams rely on the abil- ity to deliver a sequence of stream snapshots to the processing algo- rithms. These snapshots are typically provided using windows as views into streams and associated window management strategies. Generally, the goal of any window management strategy is to preserve the most im- portant data in the current window and preferentially evict the rest, so that the retained data can continue to be exploited. A simple timestamp- based strategy is rst-in-rst-out (FIFO), in which items are replaced in strict order of arrival. All timestamp-based strategies implicitly assume that a temporalmore » ordering reliably re ects importance to the processing task at hand, and thus that window management using timestamps will maximize the ability of the processing algorithms to deliver accurate interpretations of the stream. In this work, we explore a general no- tion of semantic importance that can be used for window management for streams of RDF data using semantically-aware processing algorithms like deduction or semantic query. Semantic importance exploits the infor- mation carried in RDF and surrounding ontologies for ranking window data in terms of its likely contribution to the processing algorithms. We explore the general semantic categories of query contribution, prove- nance, and trustworthiness, as well as the contribution of domain-specic ontologies. We describe how these categories behave using several con- crete examples. Finally, we consider how a stream window management strategy based on semantic importance could improve overall processing performance, especially as available window sizes decrease.« less

  2. Query Auto-Completion Based on Word2vec Semantic Similarity

    NASA Astrophysics Data System (ADS)

    Shao, Taihua; Chen, Honghui; Chen, Wanyu

    2018-04-01

    Query auto-completion (QAC) is the first step of information retrieval, which helps users formulate the entire query after inputting only a few prefixes. Regarding the models of QAC, the traditional method ignores the contribution from the semantic relevance between queries. However, similar queries always express extremely similar search intention. In this paper, we propose a hybrid model FS-QAC based on query semantic similarity as well as the query frequency. We choose word2vec method to measure the semantic similarity between intended queries and pre-submitted queries. By combining both features, our experiments show that FS-QAC model improves the performance when predicting the user’s query intention and helping formulate the right query. Our experimental results show that the optimal hybrid model contributes to a 7.54% improvement in terms of MRR against a state-of-the-art baseline using the public AOL query logs.

  3. Enhanced Approximate Nearest Neighbor via Local Area Focused Search.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gonzales, Antonio; Blazier, Nicholas Paul

    Approximate Nearest Neighbor (ANN) algorithms are increasingly important in machine learning, data mining, and image processing applications. There is a large family of space- partitioning ANN algorithms, such as randomized KD-Trees, that work well in practice but are limited by an exponential increase in similarity comparisons required to optimize recall. Additionally, they only support a small set of similarity metrics. We present Local Area Fo- cused Search (LAFS), a method that enhances the way queries are performed using an existing ANN index. Instead of a single query, LAFS performs a number of smaller (fewer similarity comparisons) queries and focuses onmore » a local neighborhood which is refined as candidates are identified. We show that our technique improves performance on several well known datasets and is easily extended to general similarity metrics using kernel projection techniques.« less

  4. An organizational framework and strategic implementation for system-level change to enhance research-based practice: QUERI Series

    PubMed Central

    Stetler, Cheryl B; McQueen, Lynn; Demakis, John; Mittman, Brian S

    2008-01-01

    Background The continuing gap between available evidence and current practice in health care reinforces the need for more effective solutions, in particular related to organizational context. Considerable advances have been made within the U.S. Veterans Health Administration (VA) in systematically implementing evidence into practice. These advances have been achieved through a system-level program focused on collaboration and partnerships among policy makers, clinicians, and researchers. The Quality Enhancement Research Initiative (QUERI) was created to generate research-driven initiatives that directly enhance health care quality within the VA and, simultaneously, contribute to the field of implementation science. This paradigm-shifting effort provided a natural laboratory for exploring organizational change processes. This article describes the underlying change framework and implementation strategy used to operationalize QUERI. Strategic approach to organizational change QUERI used an evidence-based organizational framework focused on three contextual elements: 1) cultural norms and values, in this case related to the role of health services researchers in evidence-based quality improvement; 2) capacity, in this case among researchers and key partners to engage in implementation research; 3) and supportive infrastructures to reinforce expectations for change and to sustain new behaviors as part of the norm. As part of a QUERI Series in Implementation Science, this article describes the framework's application in an innovative integration of health services research, policy, and clinical care delivery. Conclusion QUERI's experience and success provide a case study in organizational change. It demonstrates that progress requires a strategic, systems-based effort. QUERI's evidence-based initiative involved a deliberate cultural shift, requiring ongoing commitment in multiple forms and at multiple levels. VA's commitment to QUERI came in the form of visionary leadership, targeted allocation of resources, infrastructure refinements, innovative peer review and study methods, and direct involvement of key stakeholders. Stakeholders included both those providing and managing clinical care, as well as those producing relevant evidence within the health care system. The organizational framework and related implementation interventions used to achieve contextual change resulted in engaged investigators and enhanced uptake of research knowledge. QUERI's approach and progress provide working hypotheses for others pursuing similar system-wide efforts to routinely achieve evidence-based care. PMID:18510750

  5. SPLICE: A program to assemble partial query solutions from three-dimensional database searches into novel ligands

    NASA Astrophysics Data System (ADS)

    Ho, Chris M. W.; Marshall, Garland R.

    1993-12-01

    SPLICE is a program that processes partial query solutions retrieved from 3D, structural databases to generate novel, aggregate ligands. It is designed to interface with the database searching program FOUNDATION, which retrieves fragments containing any combination of a user-specified minimum number of matching query elements. SPLICE eliminates aspects of structures that are physically incapable of binding within the active site. Then, a systematic rule-based procedure is performed upon the remaining fragments to ensure receptor complementarity. All modifications are automated and remain transparent to the user. Ligands are then assembled by linking components into composite structures through overlapping bonds. As a control experiment, FOUNDATION and SPLICE were used to reconstruct a know HIV-1 protease inhibitor after it had been fragmented, reoriented, and added to a sham database of fifty different small molecules. To illustrate the capabilities of this program, a 3D search query containing the pharmacophoric elements of an aspartic proteinase-inhibitor crystal complex was searched using FOUNDATION against a subset of the Cambridge Structural Database. One hundred thirty-one compounds were retrieved, each containing any combination of at least four query elements. Compounds were automatically screened and edited for receptor complementarity. Numerous combinations of fragments were discovered that could be linked to form novel structures, containing a greater number of pharmacophoric elements than any single retrieved fragment.

  6. Improved Information Retrieval Performance on SQL Database Using Data Adapter

    NASA Astrophysics Data System (ADS)

    Husni, M.; Djanali, S.; Ciptaningtyas, H. T.; Wicaksana, I. G. N. A.

    2018-02-01

    The NoSQL databases, short for Not Only SQL, are increasingly being used as the number of big data applications increases. Most systems still use relational databases (RDBs), but as the number of data increases each year, the system handles big data with NoSQL databases to analyze and access data more quickly. NoSQL emerged as a result of the exponential growth of the internet and the development of web applications. The query syntax in the NoSQL database differs from the SQL database, therefore requiring code changes in the application. Data adapter allow applications to not change their SQL query syntax. Data adapters provide methods that can synchronize SQL databases with NotSQL databases. In addition, the data adapter provides an interface which is application can access to run SQL queries. Hence, this research applied data adapter system to synchronize data between MySQL database and Apache HBase using direct access query approach, where system allows application to accept query while synchronization process in progress. From the test performed using data adapter, the results obtained that the data adapter can synchronize between SQL databases, MySQL, and NoSQL database, Apache HBase. This system spends the percentage of memory resources in the range of 40% to 60%, and the percentage of processor moving from 10% to 90%. In addition, from this system also obtained the performance of database NoSQL better than SQL database.

  7. An Application Programming Interface for Synthetic Snowflake Particle Structure and Scattering Data

    NASA Technical Reports Server (NTRS)

    Lammers, Matthew; Kuo, Kwo-Sen

    2017-01-01

    The work by Kuo and colleagues on growing synthetic snowflakes and calculating their single-scattering properties has demonstrated great potential to improve the retrievals of snowfall. To grant colleagues flexible and targeted access to their large collection of sizes and shapes at fifteen (15) microwave frequencies, we have developed a web-based Application Programming Interface (API) integrated with NASA Goddard's Precipitation Processing System (PPS) Group. It is our hope that the API will enable convenient programmatic utilization of the database. To help users better understand the API's capabilities, we have developed an interactive web interface called the OpenSSP API Query Builder, which implements an intuitive system of mechanisms for selecting shapes, sizes, and frequencies to generate queries, with which the API can then extract and return data from the database. The Query Builder also allows for the specification of normalized particle size distributions by setting pertinent parameters, with which the API can also return mean geometric and scattering properties for each size bin. Additionally, the Query Builder interface enables downloading of raw scattering and particle structure data packages. This presentation will describe some of the challenges and successes associated with developing such an API. Examples of its usage will be shown both through downloading output and pulling it into a spreadsheet, as well as querying the API programmatically and working with the output in code.

  8. EquiX-A Search and Query Language for XML.

    ERIC Educational Resources Information Center

    Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander

    2002-01-01

    Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)

  9. [Human body meridian spatial decision support system for clinical treatment and teaching of acupuncture and moxibustion].

    PubMed

    Wu, Dehua

    2016-01-01

    The spatial position and distribution of human body meridian are expressed limitedly in the decision support system (DSS) of acupuncture and moxibustion at present, which leads to the failure to give the effective quantitative analysis on the spatial range and the difficulty for the decision-maker to provide a realistic spatial decision environment. Focusing on the limit spatial expression in DSS of acupuncture and moxibustion, it was proposed that on the basis of the geographic information system, in association of DSS technology, the design idea was developed on the human body meridian spatial DSS. With the 4-layer service-oriented architecture adopted, the data center integrated development platform was taken as the system development environment. The hierarchical organization was done for the spatial data of human body meridian via the directory tree. The structured query language (SQL) server was used to achieve the unified management of spatial data and attribute data. The technologies of architecture, configuration and plug-in development model were integrated to achieve the data inquiry, buffer analysis and program evaluation of the human body meridian spatial DSS. The research results show that the human body meridian spatial DSS could reflect realistically the spatial characteristics of the spatial position and distribution of human body meridian and met the constantly changeable demand of users. It has the powerful spatial analysis function and assists with the scientific decision in clinical treatment and teaching of acupuncture and moxibustion. It is the new attempt to the informatization research of human body meridian.

  10. An efficient compression scheme for bitmap indices

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Kesheng; Otoo, Ekow J.; Shoshani, Arie

    2004-04-13

    When using an out-of-core indexing method to answer a query, it is generally assumed that the I/O cost dominates the overall query response time. Because of this, most research on indexing methods concentrate on reducing the sizes of indices. For bitmap indices, compression has been used for this purpose. However, in most cases, operations on these compressed bitmaps, mostly bitwise logical operations such as AND, OR, and NOT, spend more time in CPU than in I/O. To speedup these operations, a number of specialized bitmap compression schemes have been developed; the best known of which is the byte-aligned bitmap codemore » (BBC). They are usually faster in performing logical operations than the general purpose compression schemes, but, the time spent in CPU still dominates the total query response time. To reduce the query response time, we designed a CPU-friendly scheme named the word-aligned hybrid (WAH) code. In this paper, we prove that the sizes of WAH compressed bitmap indices are about two words per row for large range of attributes. This size is smaller than typical sizes of commonly used indices, such as a B-tree. Therefore, WAH compressed indices are not only appropriate for low cardinality attributes but also for high cardinality attributes.In the worst case, the time to operate on compressed bitmaps is proportional to the total size of the bitmaps involved. The total size of the bitmaps required to answer a query on one attribute is proportional to the number of hits. These indicate that WAH compressed bitmap indices are optimal. To verify their effectiveness, we generated bitmap indices for four different datasets and measured the response time of many range queries. Tests confirm that sizes of compressed bitmap indices are indeed smaller than B-tree indices, and query processing with WAH compressed indices is much faster than with BBC compressed indices, projection indices and B-tree indices. In addition, we also verified that the average query response time is proportional to the index size. This indicates that the compressed bitmap indices are efficient for very large datasets.« less

  11. The OrbitOutlook Data Archive

    NASA Astrophysics Data System (ADS)

    Czajkowski, M.; Shilliday, A.; LoFaso, N.; Dipon, A.; Van Brackle, D.

    2016-09-01

    In this paper, we describe and depict the Defense Advanced Research Projects Agency (DARPA)'s OrbitOutlook Data Archive (OODA) architecture. OODA is the infrastructure that DARPA's OrbitOutlook program has developed to integrate diverse data from various academic, commercial, government, and amateur space situational awareness (SSA) telescopes. At the heart of the OODA system is its world model - a distributed data store built to quickly query big data quantities of information spread out across multiple processing nodes and data centers. The world model applies a multi-index approach where each index is a distinct view on the data. This allows for analysts and analytics (algorithms) to access information through queries with a variety of terms that may be of interest to them. Our indices include: a structured global-graph view of knowledge, a keyword search of data content, an object-characteristic range search, and a geospatial-temporal orientation of spatially located data. In addition, the world model applies a federated approach by connecting to existing databases and integrating them into one single interface as a "one-stop shopping place" to access SSA information. In addition to the world model, OODA provides a processing platform for various analysts to explore and analytics to execute upon this data. Analytic algorithms can use OODA to take raw data and build information from it. They can store these products back into the world model, allowing analysts to gain situational awareness with this information. Analysts in turn would help decision makers use this knowledge to address a wide range of SSA problems. OODA is designed to make it easy for software developers who build graphical user interfaces (GUIs) and algorithms to quickly get started with working with this data. This is done through a multi-language software development kit that includes multiple application program interfaces (APIs) and a data model with SSA concepts and terms such as: space observation, observable, measurable, metadata, track, space object, catalog, expectation, and maneuver.

  12. GenoQuery: a new querying module for functional annotation in a genomic warehouse

    PubMed Central

    Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine

    2008-01-01

    Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improving functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731

  13. XAssist: A System for the Automation of X-ray Astrophysics Analysis

    NASA Astrophysics Data System (ADS)

    Ptak, A.

    2004-08-01

    XAssist is a NASA AISR-funded project for the automation of X-ray astrophysics. It is capable of data reprocessing, source detection, and preliminary spatial, temporal and spectral analysis for each source with sufficient counts. The bulk of the system is written in Python, which in turn drives underlying software (CIAO for Chandra data, etc.). Future work will include a GUI (mainly for beginners and status monitoring) and the exposure of at least some functionality as web services. The latter will help XAssist to eventually become part of the VO, making advanced queries possible, such as determining the X-ray fluxes of counterparts to HST or SDSS sources (including the use of unpublished X-ray data), and add the ability of ``on-the-fly'' X-ray processing. Pipelines are running on Chandra and XMM-Newton observations of galaxies to demonstrate XAssist's capabilities, and the results are available online (in real time) at http://www.xassist.org. XAssist itself as well as various associated projects are available for download.

  14. Spatial aspects of building and population exposure data and their implications for global earthquake exposure modeling

    USGS Publications Warehouse

    Dell’Acqua, F.; Gamba, P.; Jaiswal, K.

    2012-01-01

    This paper discusses spatial aspects of the global exposure dataset and mapping needs for earthquake risk assessment. We discuss this in the context of development of a Global Exposure Database for the Global Earthquake Model (GED4GEM), which requires compilation of a multi-scale inventory of assets at risk, for example, buildings, populations, and economic exposure. After defining the relevant spatial and geographic scales of interest, different procedures are proposed to disaggregate coarse-resolution data, to map them, and if necessary to infer missing data by using proxies. We discuss the advantages and limitations of these methodologies and detail the potentials of utilizing remote-sensing data. The latter is used especially to homogenize an existing coarser dataset and, where possible, replace it with detailed information extracted from remote sensing using the built-up indicators for different environments. Present research shows that the spatial aspects of earthquake risk computation are tightly connected with the availability of datasets of the resolution necessary for producing sufficiently detailed exposure. The global exposure database designed by the GED4GEM project is able to manage datasets and queries of multiple spatial scales.

  15. EMAGE mouse embryo spatial gene expression database: 2010 update

    PubMed Central

    Richardson, Lorna; Venkataraman, Shanmugasundaram; Stevenson, Peter; Yang, Yiya; Burton, Nicholas; Rao, Jianguo; Fisher, Malcolm; Baldock, Richard A.; Davidson, Duncan R.; Christiansen, Jeffrey H.

    2010-01-01

    EMAGE (http://www.emouseatlas.org/emage) is a freely available online database of in situ gene expression patterns in the developing mouse embryo. Gene expression domains from raw images are extracted and integrated spatially into a set of standard 3D virtual mouse embryos at different stages of development, which allows data interrogation by spatial methods. An anatomy ontology is also used to describe sites of expression, which allows data to be queried using text-based methods. Here, we describe recent enhancements to EMAGE including: the release of a completely re-designed website, which offers integration of many different search functions in HTML web pages, improved user feedback and the ability to find similar expression patterns at the click of a button; back-end refactoring from an object oriented to relational architecture, allowing associated SQL access; and the provision of further access by standard formatted URLs and a Java API. We have also increased data coverage by sourcing from a greater selection of journals and developed automated methods for spatial data annotation that are being applied to spatially incorporate the genome-wide (∼19 000 gene) ‘EURExpress’ dataset into EMAGE. PMID:19767607

  16. SPARK: Adapting Keyword Query to Semantic Search

    NASA Astrophysics Data System (ADS)

    Zhou, Qi; Wang, Chong; Xiong, Miao; Wang, Haofen; Yu, Yong

    Semantic search promises to provide more accurate result than present-day keyword search. However, progress with semantic search has been delayed due to the complexity of its query languages. In this paper, we explore a novel approach of adapting keywords to querying the semantic web: the approach automatically translates keyword queries into formal logic queries so that end users can use familiar keywords to perform semantic search. A prototype system named 'SPARK' has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In the experiment, SPARK achieved an encouraging translation result.

  17. The agent-based spatial information semantic grid

    NASA Astrophysics Data System (ADS)

    Cui, Wei; Zhu, YaQiong; Zhou, Yong; Li, Deren

    2006-10-01

    Analyzing the characteristic of multi-Agent and geographic Ontology, The concept of the Agent-based Spatial Information Semantic Grid (ASISG) is defined and the architecture of the ASISG is advanced. ASISG is composed with Multi-Agents and geographic Ontology. The Multi-Agent Systems are composed with User Agents, General Ontology Agent, Geo-Agents, Broker Agents, Resource Agents, Spatial Data Analysis Agents, Spatial Data Access Agents, Task Execution Agent and Monitor Agent. The architecture of ASISG have three layers, they are the fabric layer, the grid management layer and the application layer. The fabric layer what is composed with Data Access Agent, Resource Agent and Geo-Agent encapsulates the data of spatial information system so that exhibits a conceptual interface for the Grid management layer. The Grid management layer, which is composed with General Ontology Agent, Task Execution Agent and Monitor Agent and Data Analysis Agent, used a hybrid method to manage all resources that were registered in a General Ontology Agent that is described by a General Ontology System. The hybrid method is assembled by resource dissemination and resource discovery. The resource dissemination push resource from Local Ontology Agent to General Ontology Agent and the resource discovery pull resource from the General Ontology Agent to Local Ontology Agents. The Local Ontology Agent is derived from special domain and describes the semantic information of local GIS. The nature of the Local Ontology Agents can be filtrated to construct a virtual organization what could provides a global scheme. The virtual organization lightens the burdens of guests because they need not search information site by site manually. The application layer what is composed with User Agent, Geo-Agent and Task Execution Agent can apply a corresponding interface to a domain user. The functions that ASISG should provide are: 1) It integrates different spatial information systems on the semantic The Grid management layer establishes a virtual environment that integrates seamlessly all GIS notes. 2) When the resource management system searches data on different spatial information systems, it transfers the meaning of different Local Ontology Agents rather than access data directly. So the ability of search and query can be said to be on the semantic level. 3) The data access procedure is transparent to guests, that is, they could access the information from remote site as current disk because the General Ontology Agent could automatically link data by the Data Agents that link the Ontology concept to GIS data. 4) The capability of processing massive spatial data. Storing, accessing and managing massive spatial data from TB to PB; efficiently analyzing and processing spatial data to produce model, information and knowledge; and providing 3D and multimedia visualization services. 5) The capability of high performance computing and processing on spatial information. Solving spatial problems with high precision, high quality, and on a large scale; and process spatial information in real time or on time, with high-speed and high efficiency. 6) The capability of sharing spatial resources. The distributed heterogeneous spatial information resources are Shared and realizing integrated and inter-operated on semantic level, so as to make best use of spatial information resources,such as computing resources, storage devices, spatial data (integrating from GIS, RS and GPS), spatial applications and services, GIS platforms, 7) The capability of integrating legacy GIS system. A ASISG can not only be used to construct new advanced spatial application systems, but also integrate legacy GIS system, so as to keep extensibility and inheritance and guarantee investment of users. 8) The capability of collaboration. Large-scale spatial information applications and services always involve different departments in different geographic places, so remote and uniform services are needed. 9) The capability of supporting integration of heterogeneous systems. Large-scale spatial information systems are always synthetically applications, so ASISG should provide interoperation and consistency through adopting open and applied technology standards. 10) The capability of adapting dynamic changes. Business requirements, application patterns, management strategies, and IT products always change endlessly for any departments, so ASISG should be self-adaptive. Two examples are provided in this paper, those examples provide a detailed way on how you design your semantic grid based on Multi-Agent systems and Ontology. In conclusion, the semantic grid of spatial information system could improve the ability of the integration and interoperability of spatial information grid.

  18. Searching for rare diseases in PubMed: a blind comparison of Orphanet expert query and query based on terminological knowledge.

    PubMed

    Griffon, N; Schuers, M; Dhombres, F; Merabti, T; Kerdelhué, G; Rollin, L; Darmoni, S J

    2016-08-02

    Despite international initiatives like Orphanet, it remains difficult to find up-to-date information about rare diseases. The aim of this study is to propose an exhaustive set of queries for PubMed based on terminological knowledge and to evaluate it versus the queries based on expertise provided by the most frequently used resource in Europe: Orphanet. Four rare disease terminologies (MeSH, OMIM, HPO and HRDO) were manually mapped to each other permitting the automatic creation of expended terminological queries for rare diseases. For 30 rare diseases, 30 citations retrieved by Orphanet expert query and/or query based on terminological knowledge were assessed for relevance by two independent reviewers unaware of the query's origin. An adjudication procedure was used to resolve any discrepancy. Precision, relative recall and F-measure were all computed. For each Orphanet rare disease (n = 8982), there was a corresponding terminological query, in contrast with only 2284 queries provided by Orphanet. Only 553 citations were evaluated due to queries with 0 or only a few hits. There were no significant differences between the Orpha query and terminological query in terms of precision, respectively 0.61 vs 0.52 (p = 0.13). Nevertheless, terminological queries retrieved more citations more often than Orpha queries (0.57 vs. 0.33; p = 0.01). Interestingly, Orpha queries seemed to retrieve older citations than terminological queries (p < 0.0001). The terminological queries proposed in this study are now currently available for all rare diseases. They may be a useful tool for both precision or recall oriented literature search.

  19. Representation and Integration of Scientific Information

    NASA Technical Reports Server (NTRS)

    1998-01-01

    The objective of this Joint Research Interchange with NASA-Ames was to investigate how the Tsimmis technology could be used to represent and integrate scientific information. The main goal of the Tsimmis project is to allow a decision maker to find information of interest from such sources, fuse it, and process it (e.g., summarize it, visualize it, discover trends). Another important goal is the easy incorporation of new sources, as well the ability to deal with sources whose structure or services evolve. During the Interchange we had research meetings approximately every month or two. The funds provided by NASA supported work that lead to the following two papers: Fusion Queries over Internet Databases; Efficient Query Subscription Processing in a Multicast Environment.

  20. Informatics Resources to Support Health Care Quality Improvement in the Veterans Health Administration

    PubMed Central

    Hynes, Denise M.; Perrin, Ruth A.; Rappaport, Steven; Stevens, Joanne M.; Demakis, John G.

    2004-01-01

    Information systems are increasingly important for measuring and improving health care quality. A number of integrated health care delivery systems use advanced information systems and integrated decision support to carry out quality assurance activities, but none as large as the Veterans Health Administration (VHA). The VHA's Quality Enhancement Research Initiative (QUERI) is a large-scale, multidisciplinary quality improvement initiative designed to ensure excellence in all areas where VHA provides health care services, including inpatient, outpatient, and long-term care settings. In this paper, we describe the role of information systems in the VHA QUERI process, highlight the major information systems critical to this quality improvement process, and discuss issues associated with the use of these systems. PMID:15187063

  1. A rapid prototyping/artificial intelligence approach to space station-era information management and access

    NASA Technical Reports Server (NTRS)

    Carnahan, Richard S., Jr.; Corey, Stephen M.; Snow, John B.

    1989-01-01

    Applications of rapid prototyping and Artificial Intelligence techniques to problems associated with Space Station-era information management systems are described. In particular, the work is centered on issues related to: (1) intelligent man-machine interfaces applied to scientific data user support, and (2) the requirement that intelligent information management systems (IIMS) be able to efficiently process metadata updates concerning types of data handled. The advanced IIMS represents functional capabilities driven almost entirely by the needs of potential users. Space Station-era scientific data projected to be generated is likely to be significantly greater than data currently processed and analyzed. Information about scientific data must be presented clearly, concisely, and with support features to allow users at all levels of expertise efficient and cost-effective data access. Additionally, mechanisms for allowing more efficient IIMS metadata update processes must be addressed. The work reported covers the following IIMS design aspects: IIMS data and metadata modeling, including the automatic updating of IIMS-contained metadata, IIMS user-system interface considerations, including significant problems associated with remote access, user profiles, and on-line tutorial capabilities, and development of an IIMS query and browse facility, including the capability to deal with spatial information. A working prototype has been developed and is being enhanced.

  2. An advanced web query interface for biological databases

    PubMed Central

    Latendresse, Mario; Karp, Peter D.

    2010-01-01

    Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715

  3. Partitioning medical image databases for content-based queries on a Grid.

    PubMed

    Montagnat, J; Breton, V; E Magnin, I

    2005-01-01

    In this paper we study the impact of executing a medical image database query application on the grid. For lowering the total computation time, the image database is partitioned into subsets to be processed on different grid nodes. A theoretical model of the application complexity and estimates of the grid execution overhead are used to efficiently partition the database. We show results demonstrating that smart partitioning of the database can lead to significant improvements in terms of total computation time. Grids are promising for content-based image retrieval in medical databases.

  4. Large-Scale Image Analytics Using Deep Learning

    NASA Astrophysics Data System (ADS)

    Ganguly, S.; Nemani, R. R.; Basu, S.; Mukhopadhyay, S.; Michaelis, A.; Votava, P.

    2014-12-01

    High resolution land cover classification maps are needed to increase the accuracy of current Land ecosystem and climate model outputs. Limited studies are in place that demonstrates the state-of-the-art in deriving very high resolution (VHR) land cover products. In addition, most methods heavily rely on commercial softwares that are difficult to scale given the region of study (e.g. continents to globe). Complexities in present approaches relate to (a) scalability of the algorithm, (b) large image data processing (compute and memory intensive), (c) computational cost, (d) massively parallel architecture, and (e) machine learning automation. In addition, VHR satellite datasets are of the order of terabytes and features extracted from these datasets are of the order of petabytes. In our present study, we have acquired the National Agricultural Imaging Program (NAIP) dataset for the Continental United States at a spatial resolution of 1-m. This data comes as image tiles (a total of quarter million image scenes with ~60 million pixels) and has a total size of ~100 terabytes for a single acquisition. Features extracted from the entire dataset would amount to ~8-10 petabytes. In our proposed approach, we have implemented a novel semi-automated machine learning algorithm rooted on the principles of "deep learning" to delineate the percentage of tree cover. In order to perform image analytics in such a granular system, it is mandatory to devise an intelligent archiving and query system for image retrieval, file structuring, metadata processing and filtering of all available image scenes. Using the Open NASA Earth Exchange (NEX) initiative, which is a partnership with Amazon Web Services (AWS), we have developed an end-to-end architecture for designing the database and the deep belief network (following the distbelief computing model) to solve a grand challenge of scaling this process across quarter million NAIP tiles that cover the entire Continental United States. The AWS core components that we use to solve this problem are DynamoDB along with S3 for database query and storage, ElastiCache shared memory architecture for image segmentation, Elastic Map Reduce (EMR) for image feature extraction, and the memory optimized Elastic Cloud Compute (EC2) for the learning algorithm.

  5. Executor Framework for DIRAC

    NASA Astrophysics Data System (ADS)

    Casajus Ramo, A.; Graciani Diaz, R.

    2012-12-01

    DIRAC framework for distributed computing has been designed as a group of collaborating components, agents and servers, with persistent database back-end. Components communicate with each other using DISET, an in-house protocol that provides Remote Procedure Call (RPC) and file transfer capabilities. This approach has provided DIRAC with a modular and stable design by enforcing stable interfaces across releases. But it made complicated to scale further with commodity hardware. To further scale DIRAC, components needed to send more queries between them. Using RPC to do so requires a lot of processing power just to handle the secure handshake required to establish the connection. DISET now provides a way to keep stable connections and send and receive queries between components. Only one handshake is required to send and receive any number of queries. Using this new communication mechanism DIRAC now provides a new type of component called Executor. Executors process any task (such as resolving the input data of a job) sent to them by a task dispatcher. This task dispatcher takes care of persisting the state of the tasks to the storage backend and distributing them among all the Executors based on the requirements of each task. In case of a high load, several Executors can be started to process the extra load and stop them once the tasks have been processed. This new approach of handling tasks in DIRAC makes Executors easy to replace and replicate, thus enabling DIRAC to further scale beyond the current approach based on polling agents.

  6. Applying Semantic Web Concepts to Support Net-Centric Warfare Using the Tactical Assessment Markup Language (TAML)

    DTIC Science & Technology

    2006-06-01

    SPARQL SPARQL Protocol and RDF Query Language SQL Structured Query Language SUMO Suggested Upper Merged Ontology SW... Query optimization algorithms are implemented in the Pellet reasoner in order to ensure querying a knowledge base is efficient . These algorithms...memory as a treelike structure in order for the data to be queried . XML Query (XQuery) is the standard language used when querying XML

  7. Preliminary Surficial Geologic Map of the Mesquite Lake 30' X 60' Quadrangle, California and Nevada

    USGS Publications Warehouse

    Schmidt, Kevin M.; McMackin, Matthew

    2006-01-01

    The Quaternary surficial geologic map of the Mesquite Lake, California-Nevada 30'X60' quadrangle depicts deposit age and geomorphic processes of erosion and deposition, as identified by a composite of remote sensing investigations, laboratory analyses, and field work, in the arid to semi-arid Mojave Desert area, straddling the California-Nevada border. Mapping was motivated by the need to address pressing scientific and social issues such as understanding and predicting the effects of climate and associated hydrologic changes, human impacts on landscapes, ecosystem function, and natural hazards at a regional scale. As the map area lies just to the south of Las Vegas, Nevada, a rapidly expanding urban center, land use pressures and the need for additional construction materials are forecasted for the region. The map contains information on the temporal and spatial patterns of surface processes and hazards that can be used to model specific landscape applications. Key features of the geologic map include: (1) spatially extensive Holocene alluvial deposits that compose the bulk of Quaternary units (~25%), (2) remote sensing and field studies that identified fault scarps or queried faults in the Kingston Wash area, Shadow Mountains, southern Pahrump Valley, Bird Spring Range, Lucy Gray Mountains and Piute Valley, (3) a lineament indicative of potential fault offset is located in Mesquite Valley, (4) active eolian dunes and sand ramps located on the east side of Mesquite, Ivanpah, and Hidden Valleys adjacent to playas, (4) groundwater discharge deposits in southern Pahrump Valley, Spring Mountains, and Lucy Gray Mountains and (5) debris-flow deposits spanning almost the entire Quaternary period in age.

  8. Open source GIS based tools to improve hydrochemical water resources management in EU H2020 FREEWAT platform

    NASA Astrophysics Data System (ADS)

    Criollo, Rotman; Velasco, Violeta; Vázquez-Suñé, Enric; Nardi, Albert; Marazuela, Miguel A.; Rossetto, Rudy; Borsi, Iacopo; Foglia, Laura; Cannata, Massimiliano; De Filippis, Giovanna

    2017-04-01

    Due to the general increase of water scarcity (Steduto et al., 2012), water quantity and quality must be well known to ensure a proper access to water resources in compliance with local and regional directives. This circumstance can be supported by tools which facilitate process of data management and its analysis. Such analyses have to provide research/professionals, policy makers and users with the ability to improve the management of the water resources with standard regulatory guidelines. Compliance with the established standard regulatory guidelines (with a special focus on requirement deriving from the GWD) should have an effective monitoring, evaluation, and interpretation of a large number of physical and chemical parameters. These amounts of datasets have to be assessed and interpreted: (i) integrating data from different sources and gathered with different data access techniques and formats; (ii) managing data with varying temporal and spatial extent; (iii) integrating groundwater quality information with other relevant information such as further hydrogeological data (Velasco et al., 2014) and pre-processing these data generally for the realization of groundwater models. In this context, the Hydrochemical Analysis Tools, akvaGIS Tools, has been implemented within the H2020 FREEWAT project; which aims to manage water resources by modelling water resource management in an open source GIS platform (QGIS desktop). The main goal of AkvaGIS Tools is to improve water quality analysis through different capabilities to improve the case study conceptual model managing all data related into its geospatial database (implemented in Spatialite) and a set of tools for improving the harmonization, integration, standardization, visualization and interpretation of the hydrochemical data. To achieve that, different commands cover a wide range of methodologies for querying, interpreting, and comparing groundwater quality data and facilitate the pre-processing analysis for being used in the realization of groundwater modelling. They include, ionic balance calculations, chemical time-series analysis, correlation of chemical parameters, and calculation of various common hydrochemical diagrams (Salinity, Schöeller-Berkaloff, Piper, and Stiff), among others. Furthermore, it allows the generation of maps of the spatial distributions of parameters and diagrams and thematic maps for the parameters measured and classified in the queried area. References: Rossetto R., Borsi I., Schifani C., Bonari E., Mogorovich P., Primicerio M. (2013). SID&GRID: Integrating hydrological modeling in GIS environment. Rendiconti Online Societa Geologica Italiana, Vol. 24, 282-283 Steduto, P., Faurès, J.M., Hoogeveen, J., Winpenny, J.T., Burke, J.J. (2012). Coping with water scarcity: an action framework for agriculture and food security. ISSN 1020-1203 ; 38 Velasco, V., Tubau, I., Vázquez-Suñé, E., Gogu, R., Gaitanaru, D., Alcaraz, M., Sanchez-Vila, X. (2014). GIS-based hydrogeochemical analysis tools (QUIMET). Computers & Geosciences, 70, 164-180.

  9. A study of medical and health queries to web search engines.

    PubMed

    Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk

    2004-03-01

    This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i). comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii). comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii). medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i). a small percentage of web queries are medical or health related, (ii). the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii). over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggests some implications for the use of the general web search engines when seeking medical/health information.

  10. Monitoring Moving Queries inside a Safe Region

    PubMed Central

    Al-Khalidi, Haidar; Taniar, David; Alamri, Sultan

    2014-01-01

    With mobile moving range queries, there is a need to recalculate the relevant surrounding objects of interest whenever the query moves. Therefore, monitoring the moving query is very costly. The safe region is one method that has been proposed to minimise the communication and computation cost of continuously monitoring a moving range query. Inside the safe region the set of objects of interest to the query do not change; thus there is no need to update the query while it is inside its safe region. However, when the query leaves its safe region the mobile device has to reevaluate the query, necessitating communication with the server. Knowing when and where the mobile device will leave a safe region is widely known as a difficult problem. To solve this problem, we propose a novel method to monitor the position of the query over time using a linear function based on the direction of the query obtained by periodic monitoring of its position. Periodic monitoring ensures that the query is aware of its location all the time. This method reduces the costs associated with communications in client-server architecture. Computational results show that our method is successful in handling moving query patterns. PMID:24696652

  11. Text Information Extraction System (TIES) | Informatics Technology for Cancer Research (ITCR)

    Cancer.gov

    TIES is a service based software system for acquiring, deidentifying, and processing clinical text reports using natural language processing, and also for querying, sharing and using this data to foster tissue and image based research, within and between institutions.

  12. RDF-GL: A SPARQL-Based Graphical Query Language for RDF

    NASA Astrophysics Data System (ADS)

    Hogenboom, Frederik; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay

    This chapter presents RDF-GL, a graphical query language (GQL) for RDF. The GQL is based on the textual query language SPARQL and mainly focuses on SPARQL SELECT queries. The advantage of a GQL over textual query languages is that complexity is hidden through the use of graphical symbols. RDF-GL is supported by a Java-based editor, SPARQLinG, which is presented as well. The editor does not only allow for RDF-GL query creation, but also converts RDF-GL queries to SPARQL queries and is able to subsequently execute these. Experiments show that using the GQL in combination with the editor makes RDF querying more accessible for end users.

  13. Representation and Management of the Knowledge of Brittle Deformation in Shear Zones Using Microstructural Data From the SAFOD Core Samples

    NASA Astrophysics Data System (ADS)

    Babaie, H. A.; Broda, C. M.; Kumar, A.; Hadizadeh, J.

    2010-12-01

    Web access to data that represent knowledge acquired by investigators studying the microstructures in the core samples of the SAFOD (San Andreas Observatory at Depth) project can help scientists efficiently integrate and share knowledge, query the data, and update the knowledge base on the Web. To achieve this, we have used OWL (Web Ontology Language) to build the brittle deformation ontology for the microstructures observed in the SAFOD core samples, by explicitly formalizing the knowledge about deformational processes, geological objects undergoing deformation, and the underlying mechanical and environmental conditions in brittle shear zones. The developed Web-based ‘SAFOD Brittle Microstructure and Mechanics Knowledge base’ (SAFOD BM2KB), which instantiates this ontology and is available at http://codd.cs.gsu.edu:9999/safod/index.jsp, will host and serve data that pertains to spatial objects, such as microstructure, gouge, fault, and SEM image, acquired by the SAFOD investigators through the studies of the SAFOD core samples. Deformation in shear zones involves complex brittle and ductile processes that alter, create, and/or destroy a wide variety of one- to three-dimensional, multi-scale spatial entities such as rocks and their constituent minerals and structure. These processes occur through a series of sub-processes that happen in different time intervals, and affect the spatial objects at granular to regional scales within shear zones. The processes bring about qualitative change to the spatial entities over time intervals that start and end with events. Processes, such as mylonitization and cataclastic flow, change the spatial location, distribution, dimension, size, shape, and orientation of some objects through translation, rotation and strain. These processes may also result in newly formed entities, such as a new mineral, gouge, vein, or fault, during one or more phases of deformation. Deformation processes may also destroy entities, such as a mineral, fossil, or original structure. Laboratory investigations by the SAFOD scientists result in ever-increasing volumes of complex data related to different tectonic processes, deformed rocks, and structures. These data are often published in the tables of scientific articles or are stored in personal Excel worksheets or, in rare cases, in a network community database. It is extremely hard to integrate autonomously built databases distributed on the Web because of their heterogeneous schemas. As a closed world model, databases can only store and serve a finite set of static data that are known to be true. They cannot represent knowledge in a constantly changing, open world. In contrast, integration of scientific data and presentation of their underlying knowledge can be achieved through the use of Semantic Web technologies. These technologies are capable of handling an infinite supply of known and yet to be known facts due to their open world model. The inference rules of OWL and its underlying RDFS and RDF semantic languages allow formal and explicit specification of the theories and knowledge of a particular domain such as brittle deformation in shear zone.

  14. Protecting personal data in epidemiological research: DataSHIELD and UK law.

    PubMed

    Wallace, Susan E; Gaye, Amadou; Shoush, Osama; Burton, Paul R

    2014-01-01

    Data from individual collections, such as biobanks and cohort studies, are now being shared in order to create combined datasets which can be queried to ask complex scientific questions. But this sharing must be done with due regard for data protection principles. DataSHIELD is a new technology that queries nonaggregated, individual-level data in situ but returns query data in an anonymous format. This raises questions of the ability of DataSHIELD to adequately protect participant confidentiality. An ethico-legal analysis was conducted that examined each step of the DataSHIELD process from the perspective of UK case law, regulations, and guidance. DataSHIELD reaches agreed UK standards of protection for the sharing of biomedical data. All direct processing of personal data is conducted within the protected environment of the contributing study; participating studies have scientific, ethics, and data access approvals in place prior to the analysis; studies are clear that their consents conform with this use of data, and participants are informed that anonymisation for further disclosure will take place. DataSHIELD can provide a flexible means of interrogating data while protecting the participants' confidentiality in accordance with applicable legislation and guidance. © 2014 S. Karger AG, Basel.

  15. Mining Genotype-Phenotype Associations from Public Knowledge Sources via Semantic Web Querying.

    PubMed

    Kiefer, Richard C; Freimuth, Robert R; Chute, Christopher G; Pathak, Jyotishman

    2013-01-01

    Gene Wiki Plus (GeneWiki+) and the Online Mendelian Inheritance in Man (OMIM) are publicly available resources for sharing information about disease-gene and gene-SNP associations in humans. While immensely useful to the scientific community, both resources are manually curated, thereby making the data entry and publication process time-consuming, and to some degree, error-prone. To this end, this study investigates Semantic Web technologies to validate existing and potentially discover new genotype-phenotype associations in GWP and OMIM. In particular, we demonstrate the applicability of SPARQL queries for identifying associations not explicitly stated for commonly occurring chronic diseases in GWP and OMIM, and report our preliminary findings for coverage, completeness, and validity of the associations. Our results highlight the benefits of Semantic Web querying technology to validate existing disease-gene associations as well as identify novel associations although further evaluation and analysis is required before such information can be applied and used effectively.

  16. Bottom-Up Evaluation of Twig Join Pattern Queries in XML Document Databases

    NASA Astrophysics Data System (ADS)

    Chen, Yangjun

    Since the extensible markup language XML emerged as a new standard for information representation and exchange on the Internet, the problem of storing, indexing, and querying XML documents has been among the major issues of database research. In this paper, we study the twig pattern matching and discuss a new algorithm for processing ordered twig pattern queries. The time complexity of the algorithmis bounded by O(|D|·|Q| + |T|·leaf Q ) and its space overhead is by O(leaf T ·leaf Q ), where T stands for a document tree, Q for a twig pattern and D is a largest data stream associated with a node q of Q, which contains the database nodes that match the node predicate at q. leaf T (leaf Q ) represents the number of the leaf nodes of T (resp. Q). In addition, the algorithm can be adapted to an indexing environment with XB-trees being used.

  17. Flexible querying of Web data to simulate bacterial growth in food.

    PubMed

    Buche, Patrice; Couvert, Olivier; Dibie-Barthélemy, Juliette; Hignette, Gaëlle; Mettler, Eric; Soler, Lydie

    2011-06-01

    A preliminary step in microbial risk assessment in foods is the gathering of experimental data. In the framework of the Sym'Previus project, we have designed a complete data integration system opened on the Web which allows a local database to be complemented by data extracted from the Web and annotated using a domain ontology. We focus on the Web data tables as they contain, in general, a synthesis of data published in the documents. We propose in this paper a flexible querying system using the domain ontology to scan simultaneously local and Web data, this in order to feed the predictive modeling tools available on the Sym'Previus platform. Special attention is paid on the way fuzzy annotations associated with Web data are taken into account in the querying process, which is an important and original contribution of the proposed system. Copyright © 2010 Elsevier Ltd. All rights reserved.

  18. Parallel multi-join query optimization algorithm for distributed sensor network in the internet of things

    NASA Astrophysics Data System (ADS)

    Zheng, Yan

    2015-03-01

    Internet of things (IoT), focusing on providing users with information exchange and intelligent control, attracts a lot of attention of researchers from all over the world since the beginning of this century. IoT is consisted of large scale of sensor nodes and data processing units, and the most important features of IoT can be illustrated as energy confinement, efficient communication and high redundancy. With the sensor nodes increment, the communication efficiency and the available communication band width become bottle necks. Many research work is based on the instance which the number of joins is less. However, it is not proper to the increasing multi-join query in whole internet of things. To improve the communication efficiency between parallel units in the distributed sensor network, this paper proposed parallel query optimization algorithm based on distribution attributes cost graph. The storage information relations and the network communication cost are considered in this algorithm, and an optimized information changing rule is established. The experimental result shows that the algorithm has good performance, and it would effectively use the resource of each node in the distributed sensor network. Therefore, executive efficiency of multi-join query between different nodes could be improved.

  19. Foundations of a query and simulation system for the modeling of biochemical and biological processes.

    PubMed

    Antoniotti, M; Park, F; Policriti, A; Ugel, N; Mishra, B

    2003-01-01

    The analysis of large amounts of data, produced as (numerical) traces of in vivo, in vitro and in silico experiments, has become a central activity for many biologists and biochemists. Recent advances in the mathematical modeling and computation of biochemical systems have moreover increased the prominence of in silico experiments; such experiments typically involve the simulation of sets of Differential Algebraic Equations (DAE), e.g., Generalized Mass Action systems (GMA) and S-systems. In this paper we reason about the necessary theoretical and pragmatic foundations for a query and simulation system capable of analyzing large amounts of such trace data. To this end, we propose to combine in a novel way several well-known tools from numerical analysis (approximation theory), temporal logic and verification, and visualization. The result is a preliminary prototype system: simpathica/xssys. When dealing with simulation data simpathica/xssys exploits the special structure of the underlying DAE, and reduces the search space in an efficient way so as to facilitate any queries about the traces. The proposed system is designed to give the user possibility to systematically analyze and simultaneously query different possible timed evolutions of the modeled system.

  20. Using string alignment in a query-by-humming system for real world applications

    NASA Astrophysics Data System (ADS)

    Sailer, Christian

    2005-09-01

    Though query by humming (i.e., retrieving music or information about music by singing a characteristic melody) has been a popular research topic during the past decade, few approaches have reached a level of usefulness beyond mere scientific interest. One of the main problems is the inherent contradiction between error tolerance and dicriminative power in conventional melody matching algorithms that rely on a melody contour approach to handle intonation or transcription errors. Adopting the string matching/alignment techniques from bioinformatics to melody sequences allows to directly assess the similarity between two melodies. This method takes an MPEG-7 compliant melody sequence (i.e., a list of note intervals and length ratios) as query and evaluates the steps necessary to transform it into the reference sequence. By introducing a musically founded cost-of-replace function and an adequate post processing, this method yields a measure for melodic similarity. Thus it is possible to construct a query by humming system that can properly discriminate between thousands of melodies and still be sufficiently error tolerant to be used by untrained singers. The robustness has been verified in extensive tests and real world applications.

  1. GenoMetric Query Language: a novel approach to large-scale genomic data management.

    PubMed

    Masseroli, Marco; Pinoli, Pietro; Venco, Francesco; Kaitoua, Abdulrahman; Jalili, Vahid; Palluzzi, Fernando; Muller, Heiko; Ceri, Stefano

    2015-06-15

    Improvement of sequencing technologies and data processing pipelines is rapidly providing sequencing data, with associated high-level features, of many individual genomes in multiple biological and clinical conditions. They allow for data-driven genomic, transcriptomic and epigenomic characterizations, but require state-of-the-art 'big data' computing strategies, with abstraction levels beyond available tool capabilities. We propose a high-level, declarative GenoMetric Query Language (GMQL) and a toolkit for its use. GMQL operates downstream of raw data preprocessing pipelines and supports queries over thousands of heterogeneous datasets and samples; as such it is key to genomic 'big data' analysis. GMQL leverages a simple data model that provides both abstractions of genomic region data and associated experimental, biological and clinical metadata and interoperability between many data formats. Based on Hadoop framework and Apache Pig platform, GMQL ensures high scalability, expressivity, flexibility and simplicity of use, as demonstrated by several biological query examples on ENCODE and TCGA datasets. The GMQL toolkit is freely available for non-commercial use at http://www.bioinformatics.deib.polimi.it/GMQL/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  2. elevatr: Access Elevation Data from Various APIs | Science ...

    EPA Pesticide Factsheets

    Several web services are available that provide access to elevation data. This package provides access to several of those services and returns elevation data either as a SpatialPointsDataFrame from point elevation services or as a raster object from raster elevation services. Currently, the package supports access to the Mapzen Elevation Service, Mapzen Terrain Service, and the USGS Elevation Point Query Service. The R language for statistical computing is increasingly used for spatial data analysis . This R package, elevatr, is in response to this and provides access to elevation data from various sources directly in R. The impact of `elevatr` is that it will 1) facilitate spatial analysis in R by providing access to foundational dataset for many types of analyses (e.g. hydrology, limnology) 2) open up a new set of users and uses for APIs widely used outside of R, and 3) provide an excellent example federal open source development as promoted by the Federal Source Code Policy (https://sourcecode.cio.gov/).

  3. Field Ground Truthing Data Collector - a Mobile Toolkit for Image Analysis and Processing

    NASA Astrophysics Data System (ADS)

    Meng, X.

    2012-07-01

    Field Ground Truthing Data Collector is one of the four key components of the NASA funded ICCaRS project, being developed in Southeast Michigan. The ICCaRS ground truthing toolkit entertains comprehensive functions: 1) Field functions, including determining locations through GPS, gathering and geo-referencing visual data, laying out ground control points for AEROKAT flights, measuring the flight distance and height, and entering observations of land cover (and use) and health conditions of ecosystems and environments in the vicinity of the flight field; 2) Server synchronization functions, such as, downloading study-area maps, aerial photos and satellite images, uploading and synchronizing field-collected data with the distributed databases, calling the geospatial web services on the server side to conduct spatial querying, image analysis and processing, and receiving the processed results in field for near-real-time validation; and 3) Social network communication functions for direct technical assistance and pedagogical support, e.g., having video-conference calls in field with the supporting educators, scientists, and technologists, participating in Webinars, or engaging discussions with other-learning portals. This customized software package is being built on Apple iPhone/iPad and Google Maps/Earth. The technical infrastructures, data models, coupling methods between distributed geospatial data processing and field data collector tools, remote communication interfaces, coding schema, and functional flow charts will be illustrated and explained at the presentation. A pilot case study will be also demonstrated.

  4. Cumulative query method for influenza surveillance using search engine data.

    PubMed

    Seo, Dong-Woo; Jo, Min-Woo; Sohn, Chang Hwan; Shin, Soo-Yong; Lee, JaeHo; Yu, Maengsoo; Kim, Won Young; Lim, Kyoung Soo; Lee, Sang-Il

    2014-12-16

    Internet search queries have become an important data source in syndromic surveillance system. However, there is currently no syndromic surveillance system using Internet search query data in South Korea. The objective of this study was to examine correlations between our cumulative query method and national influenza surveillance data. Our study was based on the local search engine, Daum (approximately 25% market share), and influenza-like illness (ILI) data from the Korea Centers for Disease Control and Prevention. A quota sampling survey was conducted with 200 participants to obtain popular queries. We divided the study period into two sets: Set 1 (the 2009/10 epidemiological year for development set 1 and 2010/11 for validation set 1) and Set 2 (2010/11 for development Set 2 and 2011/12 for validation Set 2). Pearson's correlation coefficients were calculated between the Daum data and the ILI data for the development set. We selected the combined queries for which the correlation coefficients were .7 or higher and listed them in descending order. Then, we created a cumulative query method n representing the number of cumulative combined queries in descending order of the correlation coefficient. In validation set 1, 13 cumulative query methods were applied, and 8 had higher correlation coefficients (min=.916, max=.943) than that of the highest single combined query. Further, 11 of 13 cumulative query methods had an r value of ≥.7, but 4 of 13 combined queries had an r value of ≥.7. In validation set 2, 8 of 15 cumulative query methods showed higher correlation coefficients (min=.975, max=.987) than that of the highest single combined query. All 15 cumulative query methods had an r value of ≥.7, but 6 of 15 combined queries had an r value of ≥.7. Cumulative query method showed relatively higher correlation with national influenza surveillance data than combined queries in the development and validation set.

  5. A Query Integrator and Manager for the Query Web

    PubMed Central

    Brinkley, James F.; Detwiler, Landon T.

    2012-01-01

    We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. PMID:22531831

  6. Optimizing SIEM Throughput on the Cloud Using Parallelization.

    PubMed

    Alam, Masoom; Ihsan, Asif; Khan, Muazzam A; Javaid, Qaisar; Khan, Abid; Manzoor, Jawad; Akhundzada, Adnan; Khan, Muhammad Khurram; Farooq, Sajid

    2016-01-01

    Processing large amounts of data in real time for identifying security issues pose several performance challenges, especially when hardware infrastructure is limited. Managed Security Service Providers (MSSP), mostly hosting their applications on the Cloud, receive events at a very high rate that varies from a few hundred to a couple of thousand events per second (EPS). It is critical to process this data efficiently, so that attacks could be identified quickly and necessary response could be initiated. This paper evaluates the performance of a security framework OSTROM built on the Esper complex event processing (CEP) engine under a parallel and non-parallel computational framework. We explain three architectures under which Esper can be used to process events. We investigated the effect on throughput, memory and CPU usage in each configuration setting. The results indicate that the performance of the engine is limited by the number of events coming in rather than the queries being processed. The architecture where 1/4th of the total events are submitted to each instance and all the queries are processed by all the units shows best results in terms of throughput, memory and CPU usage.

  7. Automatic Processing of Current Affairs Queries

    ERIC Educational Resources Information Center

    Salton, G.

    1973-01-01

    The SMART system is used for the analysis, search and retrieval of news stories appearing in Time'' magazine. A comparison is made between the automatic text processing methods incorporated into the SMART system and a manual search using the classified index to Time.'' (14 references) (Author)

  8. A novel methodology for querying web images

    NASA Astrophysics Data System (ADS)

    Prabhakara, Rashmi; Lee, Ching Cheng

    2005-01-01

    Ever since the advent of Internet, there has been an immense growth in the amount of image data that is available on the World Wide Web. With such a magnitude of image availability, an efficient and effective image retrieval system is required to make use of this information. This research presents an effective image matching and indexing technique that improvises on existing integrated image retrieval methods. The proposed technique follows a two-phase approach, integrating query by topic and query by example specification methods. The first phase consists of topic-based image retrieval using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. It consists of a focused crawler that not only provides for the user to enter the keyword for the topic-based search but also, the scope in which the user wants to find the images. The second phase uses the query by example specification to perform a low-level content-based image match for the retrieval of smaller and relatively closer results of the example image. Information related to the image feature is automatically extracted from the query image by the image processing system. A technique that is not computationally intensive based on color feature is used to perform content-based matching of images. The main goal is to develop a functional image search and indexing system and to demonstrate that better retrieval results can be achieved with this proposed hybrid search technique.

  9. A novel methodology for querying web images

    NASA Astrophysics Data System (ADS)

    Prabhakara, Rashmi; Lee, Ching Cheng

    2004-12-01

    Ever since the advent of Internet, there has been an immense growth in the amount of image data that is available on the World Wide Web. With such a magnitude of image availability, an efficient and effective image retrieval system is required to make use of this information. This research presents an effective image matching and indexing technique that improvises on existing integrated image retrieval methods. The proposed technique follows a two-phase approach, integrating query by topic and query by example specification methods. The first phase consists of topic-based image retrieval using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. It consists of a focused crawler that not only provides for the user to enter the keyword for the topic-based search but also, the scope in which the user wants to find the images. The second phase uses the query by example specification to perform a low-level content-based image match for the retrieval of smaller and relatively closer results of the example image. Information related to the image feature is automatically extracted from the query image by the image processing system. A technique that is not computationally intensive based on color feature is used to perform content-based matching of images. The main goal is to develop a functional image search and indexing system and to demonstrate that better retrieval results can be achieved with this proposed hybrid search technique.

  10. Improving integrative searching of systems chemical biology data using semantic annotation.

    PubMed

    Chen, Bin; Ding, Ying; Wild, David J

    2012-03-08

    Systems chemical biology and chemogenomics are considered critical, integrative disciplines in modern biomedical research, but require data mining of large, integrated, heterogeneous datasets from chemistry and biology. We previously developed an RDF-based resource called Chem2Bio2RDF that enabled querying of such data using the SPARQL query language. Whilst this work has proved useful in its own right as one of the first major resources in these disciplines, its utility could be greatly improved by the application of an ontology for annotation of the nodes and edges in the RDF graph, enabling a much richer range of semantic queries to be issued. We developed a generalized chemogenomics and systems chemical biology OWL ontology called Chem2Bio2OWL that describes the semantics of chemical compounds, drugs, protein targets, pathways, genes, diseases and side-effects, and the relationships between them. The ontology also includes data provenance. We used it to annotate our Chem2Bio2RDF dataset, making it a rich semantic resource. Through a series of scientific case studies we demonstrate how this (i) simplifies the process of building SPARQL queries, (ii) enables useful new kinds of queries on the data and (iii) makes possible intelligent reasoning and semantic graph mining in chemogenomics and systems chemical biology. Chem2Bio2OWL is available at http://chem2bio2rdf.org/owl. The document is available at http://chem2bio2owl.wikispaces.com.

  11. Meeting medical terminology needs--the Ontology-Enhanced Medical Concept Mapper.

    PubMed

    Leroy, G; Chen, H

    2001-12-01

    This paper describes the development and testing of the Medical Concept Mapper, a tool designed to facilitate access to online medical information sources by providing users with appropriate medical search terms for their personal queries. Our system is valuable for patients whose knowledge of medical vocabularies is inadequate to find the desired information, and for medical experts who search for information outside their field of expertise. The Medical Concept Mapper maps synonyms and semantically related concepts to a user's query. The system is unique because it integrates our natural language processing tool, i.e., the Arizona (AZ) Noun Phraser, with human-created ontologies, the Unified Medical Language System (UMLS) and WordNet, and our computer generated Concept Space, into one system. Our unique contribution results from combining the UMLS Semantic Net with Concept Space in our deep semantic parsing (DSP) algorithm. This algorithm establishes a medical query context based on the UMLS Semantic Net, which allows Concept Space terms to be filtered so as to isolate related terms relevant to the query. We performed two user studies in which Medical Concept Mapper terms were compared against human experts' terms. We conclude that the AZ Noun Phraser is well suited to extract medical phrases from user queries, that WordNet is not well suited to provide strictly medical synonyms, that the UMLS Metathesaurus is well suited to provide medical synonyms, and that Concept Space is well suited to provide related medical terms, especially when these terms are limited by our DSP algorithm.

  12. Querying archetype-based EHRs by search ontology-based XPath engineering.

    PubMed

    Kropf, Stefan; Uciteli, Alexandr; Schierle, Katrin; Krücken, Peter; Denecke, Kerstin; Herre, Heinrich

    2018-05-11

    Legacy data and new structured data can be stored in a standardized format as XML-based EHRs on XML databases. Querying documents on these databases is crucial for answering research questions. Instead of using free text searches, that lead to false positive results, the precision can be increased by constraining the search to certain parts of documents. A search ontology-based specification of queries on XML documents defines search concepts and relates them to parts in the XML document structure. Such query specification method is practically introduced and evaluated by applying concrete research questions formulated in natural language on a data collection for information retrieval purposes. The search is performed by search ontology-based XPath engineering that reuses ontologies and XML-related W3C standards. The key result is that the specification of research questions can be supported by the usage of search ontology-based XPath engineering. A deeper recognition of entities and a semantic understanding of the content is necessary for a further improvement of precision and recall. Key limitation is that the application of the introduced process requires skills in ontology and software development. In future, the time consuming ontology development could be overcome by implementing a new clinical role: the clinical ontologist. The introduced Search Ontology XML extension connects Search Terms to certain parts in XML documents and enables an ontology-based definition of queries. Search ontology-based XPath engineering can support research question answering by the specification of complex XPath expressions without deep syntax knowledge about XPaths.

  13. Selecting the Best Mobile Information Service with Natural Language User Input

    NASA Astrophysics Data System (ADS)

    Feng, Qiangze; Qi, Hongwei; Fukushima, Toshikazu

    Information services accessed via mobile phones provide information directly relevant to subscribers’ daily lives and are an area of dynamic market growth worldwide. Although many information services are currently offered by mobile operators, many of the existing solutions require a unique gateway for each service, and it is inconvenient for users to have to remember a large number of such gateways. Furthermore, the Short Message Service (SMS) is very popular in China and Chinese users would prefer to access these services in natural language via SMS. This chapter describes a Natural Language Based Service Selection System (NL3S) for use with a large number of mobile information services. The system can accept user queries in natural language and navigate it to the required service. Since it is difficult for existing methods to achieve high accuracy and high coverage and anticipate which other services a user might want to query, the NL3S is developed based on a Multi-service Ontology (MO) and Multi-service Query Language (MQL). The MO and MQL provide semantic and linguistic knowledge, respectively, to facilitate service selection for a user query and to provide adaptive service recommendations. Experiments show that the NL3S can achieve 75-95% accuracies and 85-95% satisfactions for processing various styles of natural language queries. A trial involving navigation of 30 different mobile services shows that the NL3S can provide a viable commercial solution for mobile operators.

  14. VIGOR: Interactive Visual Exploration of Graph Query Results.

    PubMed

    Pienta, Robert; Hohman, Fred; Endert, Alex; Tamersoy, Acar; Roundy, Kevin; Gates, Chris; Navathe, Shamkant; Chau, Duen Horng

    2018-01-01

    Finding patterns in graphs has become a vital challenge in many domains from biological systems, network security, to finance (e.g., finding money laundering rings of bankers and business owners). While there is significant interest in graph databases and querying techniques, less research has focused on helping analysts make sense of underlying patterns within a group of subgraph results. Visualizing graph query results is challenging, requiring effective summarization of a large number of subgraphs, each having potentially shared node-values, rich node features, and flexible structure across queries. We present VIGOR, a novel interactive visual analytics system, for exploring and making sense of query results. VIGOR uses multiple coordinated views, leveraging different data representations and organizations to streamline analysts sensemaking process. VIGOR contributes: (1) an exemplar-based interaction technique, where an analyst starts with a specific result and relaxes constraints to find other similar results or starts with only the structure (i.e., without node value constraints), and adds constraints to narrow in on specific results; and (2) a novel feature-aware subgraph result summarization. Through a collaboration with Symantec, we demonstrate how VIGOR helps tackle real-world problems through the discovery of security blindspots in a cybersecurity dataset with over 11,000 incidents. We also evaluate VIGOR with a within-subjects study, demonstrating VIGOR's ease of use over a leading graph database management system, and its ability to help analysts understand their results at higher speed and make fewer errors.

  15. Large Survey Database: A Distributed Framework for Storage and Analysis of Large Datasets

    NASA Astrophysics Data System (ADS)

    Juric, Mario

    2011-01-01

    The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching and querying of large survey catalogs (>10^9 rows, >1 TB). The primary driver behind its development is the analysis of Pan-STARRS PS1 data. It is specifically optimized for fast queries and parallel sweeps of positionally and temporally indexed datasets. It transparently scales to more than >10^2 nodes, and can be made to function in "shared nothing" architectures. An LSD database consists of a set of vertically and horizontally partitioned tables, physically stored as compressed HDF5 files. Vertically, we partition the tables into groups of related columns ('column groups'), storing together logically related data (e.g., astrometry, photometry). Horizontally, the tables are partitioned into partially overlapping ``cells'' by position in space (lon, lat) and time (t). This organization allows for fast lookups based on spatial and temporal coordinates, as well as data and task distribution. The design was inspired by the success of Google BigTable (Chang et al., 2006). Our programming model is a pipelined extension of MapReduce (Dean and Ghemawat, 2004). An SQL-like query language is used to access data. For complex tasks, map-reduce ``kernels'' that operate on query results on a per-cell basis can be written, with the framework taking care of scheduling and execution. The combination leverages users' familiarity with SQL, while offering a fully distributed computing environment. LSD adds little overhead compared to direct Python file I/O. In tests, we sweeped through 1.1 Grows of PanSTARRS+SDSS data (220GB) less than 15 minutes on a dual CPU machine. In a cluster environment, we achieved bandwidths of 17Gbits/sec (I/O limited). Based on current experience, we believe LSD should scale to be useful for analysis and storage of LSST-scale datasets. It can be downloaded from http://mwscience.net/lsd.

  16. Using Generalized Annotated Programs to Solve Social Network Diffusion Optimization Problems

    DTIC Science & Technology

    2013-01-01

    as follows: —Let kall be the k value for the SNDOP-ALL query and for each SNDOP query i, let ki be the k for that query. For each query i, set ki... kall − 1. —Number each element of vi ∈ V such that gI(vi) and V C(vi) are true. For the ith SNDOP query, let vi be the corresponding element of V —Let...vertices of S. PROOF. We set up |V | SNDOP-queries as follows: —Let kall be the k value for the SNDOP-ALL query and and for each SNDOP-query i, let ki be

  17. A web-based data-querying tool based on ontology-driven methodology and flowchart-based model.

    PubMed

    Ping, Xiao-Ou; Chung, Yufang; Tseng, Yi-Ju; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei

    2013-10-08

    Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, "degree of liver damage," "degree of liver damage when applying a mutually exclusive setting," and "treatments for liver cancer") was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks.

  18. Towards Big Earth Data Analytics: The EarthServer Approach

    NASA Astrophysics Data System (ADS)

    Baumann, Peter

    2013-04-01

    Big Data in the Earth sciences, the Tera- to Exabyte archives, mostly are made up from coverage data whereby the term "coverage", according to ISO and OGC, is defined as the digital representation of some space-time varying phenomenon. Common examples include 1-D sensor timeseries, 2-D remote sensing imagery, 3D x/y/t image timeseries and x/y/z geology data, and 4-D x/y/z/t atmosphere and ocean data. Analytics on such data requires on-demand processing of sometimes significant complexity, such as getting the Fourier transform of satellite images. As network bandwidth limits prohibit transfer of such Big Data it is indispensable to devise protocols allowing clients to task flexible and fast processing on the server. The EarthServer initiative, funded by EU FP7 eInfrastructures, unites 11 partners from computer and earth sciences to establish Big Earth Data Analytics. One key ingredient is flexibility for users to ask what they want, not impeded and complicated by system internals. The EarthServer answer to this is to use high-level query languages; these have proven tremendously successful on tabular and XML data, and we extend them with a central geo data structure, multi-dimensional arrays. A second key ingredient is scalability. Without any doubt, scalability ultimately can only be achieved through parallelization. In the past, parallelizing code has been done at compile time and usually with manual intervention. The EarthServer approach is to perform a samentic-based dynamic distribution of queries fragments based on networks optimization and further criteria. The EarthServer platform is comprised by rasdaman, an Array DBMS enabling efficient storage and retrieval of any-size, any-type multi-dimensional raster data. In the project, rasdaman is being extended with several functionality and scalability features, including: support for irregular grids and general meshes; in-situ retrieval (evaluation of database queries on existing archive structures, avoiding data import and, hence, duplication); the aforementioned distributed query processing. Additionally, Web clients for multi-dimensional data visualization are being established. Client/server interfaces are strictly based on OGC and W3C standards, in particular the Web Coverage Processing Service (WCPS) which defines a high-level raster query language. We present the EarthServer project with its vision and approaches, relate it to the current state of standardization, and demonstrate it by way of large-scale data centers and their services using rasdaman.

  19. GSHR-Tree: a spatial index tree based on dynamic spatial slot and hash table in grid environments

    NASA Astrophysics Data System (ADS)

    Chen, Zhanlong; Wu, Xin-cai; Wu, Liang

    2008-12-01

    Computation Grids enable the coordinated sharing of large-scale distributed heterogeneous computing resources that can be used to solve computationally intensive problems in science, engineering, and commerce. Grid spatial applications are made possible by high-speed networks and a new generation of Grid middleware that resides between networks and traditional GIS applications. The integration of the multi-sources and heterogeneous spatial information and the management of the distributed spatial resources and the sharing and cooperative of the spatial data and Grid services are the key problems to resolve in the development of the Grid GIS. The performance of the spatial index mechanism is the key technology of the Grid GIS and spatial database affects the holistic performance of the GIS in Grid Environments. In order to improve the efficiency of parallel processing of a spatial mass data under the distributed parallel computing grid environment, this paper presents a new grid slot hash parallel spatial index GSHR-Tree structure established in the parallel spatial indexing mechanism. Based on the hash table and dynamic spatial slot, this paper has improved the structure of the classical parallel R tree index. The GSHR-Tree index makes full use of the good qualities of R-Tree and hash data structure. This paper has constructed a new parallel spatial index that can meet the needs of parallel grid computing about the magnanimous spatial data in the distributed network. This arithmetic splits space in to multi-slots by multiplying and reverting and maps these slots to sites in distributed and parallel system. Each sites constructs the spatial objects in its spatial slot into an R tree. On the basis of this tree structure, the index data was distributed among multiple nodes in the grid networks by using large node R-tree method. The unbalance during process can be quickly adjusted by means of a dynamical adjusting algorithm. This tree structure has considered the distributed operation, reduplication operation transfer operation of spatial index in the grid environment. The design of GSHR-Tree has ensured the performance of the load balance in the parallel computation. This tree structure is fit for the parallel process of the spatial information in the distributed network environments. Instead of spatial object's recursive comparison where original R tree has been used, the algorithm builds the spatial index by applying binary code operation in which computer runs more efficiently, and extended dynamic hash code for bit comparison. In GSHR-Tree, a new server is assigned to the network whenever a split of a full node is required. We describe a more flexible allocation protocol which copes with a temporary shortage of storage resources. It uses a distributed balanced binary spatial tree that scales with insertions to potentially any number of storage servers through splits of the overloaded ones. The application manipulates the GSHR-Tree structure from a node in the grid environment. The node addresses the tree through its image that the splits can make outdated. This may generate addressing errors, solved by the forwarding among the servers. In this paper, a spatial index data distribution algorithm that limits the number of servers has been proposed. We improve the storage utilization at the cost of additional messages. The structure of GSHR-Tree is believed that the scheme of this grid spatial index should fit the needs of new applications using endlessly larger sets of spatial data. Our proposal constitutes a flexible storage allocation method for a distributed spatial index. The insertion policy can be tuned dynamically to cope with periods of storage shortage. In such cases storage balancing should be favored for better space utilization, at the price of extra message exchanges between servers. This structure makes a compromise in the updating of the duplicated index and the transformation of the spatial index data. Meeting the needs of the grid computing, GSHRTree has a flexible structure in order to satisfy new needs in the future. The GSHR-Tree provides the R-tree capabilities for large spatial datasets stored over interconnected servers. The analysis, including the experiments, confirmed the efficiency of our design choices. The scheme should fit the needs of new applications of spatial data, using endlessly larger datasets. Using the system response time of the parallel processing of spatial scope query algorithm as the performance evaluation factor, According to the result of the simulated the experiments, GSHR-Tree is performed to prove the reasonable design and the high performance of the indexing structure that the paper presented.

  20. Toward An Unstructured Mesh Database

    NASA Astrophysics Data System (ADS)

    Rezaei Mahdiraji, Alireza; Baumann, Peter Peter

    2014-05-01

    Unstructured meshes are used in several application domains such as earth sciences (e.g., seismology), medicine, oceanography, cli- mate modeling, GIS as approximate representations of physical objects. Meshes subdivide a domain into smaller geometric elements (called cells) which are glued together by incidence relationships. The subdivision of a domain allows computational manipulation of complicated physical structures. For instance, seismologists model earthquakes using elastic wave propagation solvers on hexahedral meshes. The hexahedral con- tains several hundred millions of grid points and millions of hexahedral cells. Each vertex node in the hexahedrals stores a multitude of data fields. To run simulation on such meshes, one needs to iterate over all the cells, iterate over incident cells to a given cell, retrieve coordinates of cells, assign data values to cells, etc. Although meshes are used in many application domains, to the best of our knowledge there is no database vendor that support unstructured mesh features. Currently, the main tool for querying and manipulating unstructured meshes are mesh libraries, e.g., CGAL and GRAL. Mesh li- braries are dedicated libraries which includes mesh algorithms and can be run on mesh representations. The libraries do not scale with dataset size, do not have declarative query language, and need deep C++ knowledge for query implementations. Furthermore, due to high coupling between the implementations and input file structure, the implementations are less reusable and costly to maintain. A dedicated mesh database offers the following advantages: 1) declarative querying, 2) ease of maintenance, 3) hiding mesh storage structure from applications, and 4) transparent query optimization. To design a mesh database, the first challenge is to define a suitable generic data model for unstructured meshes. We proposed ImG-Complexes data model as a generic topological mesh data model which extends incidence graph model to multi-incidence relationships. We instrument ImG model with sets of optional and application-specific constraints which can be used to check validity of meshes for a specific class of object such as manifold, pseudo-manifold, and simplicial manifold. We conducted experiments to measure the performance of the graph database solution in processing mesh queries and compare it with GrAL mesh library and PostgreSQL database on synthetic and real mesh datasets. The experiments show that each system perform well on specific types of mesh queries, e.g., graph databases perform well on global path-intensive queries. In the future, we investigate database operations for the ImG model and design a mesh query language.

  1. Generating and Visualizing Climate Indices using Google Earth Engine

    NASA Astrophysics Data System (ADS)

    Erickson, T. A.; Guentchev, G.; Rood, R. B.

    2017-12-01

    Climate change is expected to have largest impacts on regional and local scales. Relevant and credible climate information is needed to support the planning and adaptation efforts in our communities. The volume of climate projections of temperature and precipitation is steadily increasing, as datasets are being generated on finer spatial and temporal grids with an increasing number of ensembles to characterize uncertainty. Despite advancements in tools for querying and retrieving subsets of these large, multi-dimensional datasets, ease of access remains a barrier for many existing and potential users who want to derive useful information from these data, particularly for those outside of the climate modelling research community. Climate indices, that can be derived from daily temperature and precipitation data, such as annual number of frost days or growing season length, can provide useful information to practitioners and stakeholders. For this work the NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP) dataset was loaded into Google Earth Engine, a cloud-based geospatial processing platform. Algorithms that use the Earth Engine API to generate several climate indices were written. The indices were chosen from the set developed by the joint CCl/CLIVAR/JCOMM Expert Team on Climate Change Detection and Indices (ETCCDI). Simple user interfaces were created that allow users to query, produce maps and graphs of the indices, as well as download results for additional analyses. These browser-based interfaces could allow users in low-bandwidth environments to access climate information. This research shows that calculating climate indices from global downscaled climate projection datasets and sharing them widely using cloud computing technologies is feasible. Further development will focus on exposing the climate indices to existing applications via the Earth Engine API, and building custom user interfaces for presenting climate indices to a diverse set of user groups.

  2. Comparative Analysis of Online Health Queries Originating From Personal Computers and Smart Devices on a Consumer Health Information Portal

    PubMed Central

    Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen

    2014-01-01

    Background The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. Objective The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Methods Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic’s consumer health information website. We performed analyses on “Queries with considering repetition counts (QwR)” and “Queries without considering repetition counts (QwoR)”. The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Results Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are “Symptoms” (1 in 3 search queries), “Causes”, and “Treatments & Drugs”. The distribution of search queries for different health categories differs with the device used for the search. Health queries tend to be longer and more specific than general search queries. Health queries from SDs are longer and have slightly fewer spelling mistakes than those from PCs. Users specify words related to women and children more often than that of men and any other age group. Most of the health queries are formulated using keywords; the second-most common are wh- and yes/no questions. Users ask more health questions using SDs than PCs. Almost all health queries have at least one noun and health queries from SDs are more descriptive than those from PCs. Conclusions This study is a large-scale comparative analysis of health search queries to understand the effects of device type (PCs vs SDs) used on OHISB. The study indicates that the device used for online health information search plays an important role in shaping how health information searches by consumers and patients are executed. PMID:25000537

  3. Comparative analysis of online health queries originating from personal computers and smart devices on a consumer health information portal.

    PubMed

    Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen; Pathak, Jyotishman

    2014-07-04

    The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic's consumer health information website. We performed analyses on "Queries with considering repetition counts (QwR)" and "Queries without considering repetition counts (QwoR)". The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are "Symptoms" (1 in 3 search queries), "Causes", and "Treatments & Drugs". The distribution of search queries for different health categories differs with the device used for the search. Health queries tend to be longer and more specific than general search queries. Health queries from SDs are longer and have slightly fewer spelling mistakes than those from PCs. Users specify words related to women and children more often than that of men and any other age group. Most of the health queries are formulated using keywords; the second-most common are wh- and yes/no questions. Users ask more health questions using SDs than PCs. Almost all health queries have at least one noun and health queries from SDs are more descriptive than those from PCs. This study is a large-scale comparative analysis of health search queries to understand the effects of device type (PCs vs. SDs) used on OHISB. The study indicates that the device used for online health information search plays an important role in shaping how health information searches by consumers and patients are executed.

  4. A Spatially-Registered, Massively Parallelised Data Structure for Interacting with Large, Integrated Geodatasets

    NASA Astrophysics Data System (ADS)

    Irving, D. H.; Rasheed, M.; O'Doherty, N.

    2010-12-01

    The efficient storage, retrieval and interactive use of subsurface data present great challenges in geodata management. Data volumes are typically massive, complex and poorly indexed with inadequate metadata. Derived geomodels and interpretations are often tightly bound in application-centric and proprietary formats; open standards for long-term stewardship are poorly developed. Consequently current data storage is a combination of: complex Logical Data Models (LDMs) based on file storage formats; 2D GIS tree-based indexing of spatial data; and translations of serialised memory-based storage techniques into disk-based storage. Whilst adequate for working at the mesoscale over a short timeframes, these approaches all possess technical and operational shortcomings: data model complexity; anisotropy of access; scalability to large and complex datasets; and weak implementation and integration of metadata. High performance hardware such as parallelised storage and Relational Database Management System (RDBMS) have long been exploited in many solutions but the underlying data structure must provide commensurate efficiencies to allow multi-user, multi-application and near-realtime data interaction. We present an open Spatially-Registered Data Structure (SRDS) built on Massively Parallel Processing (MPP) database architecture implemented by a ANSI SQL 2008 compliant RDBMS. We propose a LDM comprising a 3D Earth model that is decomposed such that each increasing Level of Detail (LoD) is achieved by recursively halving the bin size until it is less than the error in each spatial dimension for that data point. The value of an attribute at that point is stored as a property of that point and at that LoD. It is key to the numerical efficiency of the SRDS that it is under-pinned by a power-of-two relationship thus precluding the need for computationally intensive floating point arithmetic. Our approach employed a tightly clustered MPP array with small clusters of storage, processors and memory communicating over a high-speed network inter-connect. This is a shared-nothing architecture where resources are managed within each cluster unlike most other RDBMSs. Data are accessed on this architecture by their primary index values which utilises the hashing algorithm for point-to-point access. The hashing algorithm’s main role is the efficient distribution of data across the clusters based on the primary index. In this study we used 3D seismic volumes, 2D seismic profiles and borehole logs to demonstrate application in both (x,y,TWT) and (x,y,z)-space. In the SRDS the primary index is a composite column index of (x,y) to avoid invoking time-consuming full table scans as is the case in tree-based systems. This means that data access is isotropic. A query for data in a specified spatial range permits retrieval recursively by point-to-point queries within each nested LoD yielding true linear performance up to the Petabyte scale with hardware scaling presenting the primary limiting factor. Our architecture and LDM promotes: realtime interaction with massive data volumes; streaming of result sets and server-rendered 2D/3D imagery; rigorous workflow control and auditing; and in-database algorithms run directly against data as a HPC cloud service.

  5. The Localized Discovery and Recovery for Query Packet Losses in Wireless Sensor Networks with Distributed Detector Clusters

    PubMed Central

    Teng, Rui; Leibnitz, Kenji; Miura, Ryu

    2013-01-01

    An essential application of wireless sensor networks is to successfully respond to user queries. Query packet losses occur in the query dissemination due to wireless communication problems such as interference, multipath fading, packet collisions, etc. The losses of query messages at sensor nodes result in the failure of sensor nodes reporting the requested data. Hence, the reliable and successful dissemination of query messages to sensor nodes is a non-trivial problem. The target of this paper is to enable highly successful query delivery to sensor nodes by localized and energy-efficient discovery, and recovery of query losses. We adopt local and collective cooperation among sensor nodes to increase the success rate of distributed discoveries and recoveries. To enable the scalability in the operations of discoveries and recoveries, we employ a distributed name resolution mechanism at each sensor node to allow sensor nodes to self-detect the correlated queries and query losses, and then efficiently locally respond to the query losses. We prove that the collective discovery of query losses has a high impact on the success of query dissemination and reveal that scalability can be achieved by using the proposed approach. We further study the novel features of the cooperation and competition in the collective recovery at PHY and MAC layers, and show that the appropriate number of detectors can achieve optimal successful recovery rate. We evaluate the proposed approach with both mathematical analyses and computer simulations. The proposed approach enables a high rate of successful delivery of query messages and it results in short route lengths to recover from query losses. The proposed approach is scalable and operates in a fully distributed manner. PMID:23748172

  6. Partial automation of database processing of simulation outputs from L-systems models of plant morphogenesis.

    PubMed

    Chen, Yi- Ping Phoebe; Hanan, Jim

    2002-01-01

    Models of plant architecture allow us to explore how genotype environment interactions effect the development of plant phenotypes. Such models generate masses of data organised in complex hierarchies. This paper presents a generic system for creating and automatically populating a relational database from data generated by the widely used L-system approach to modelling plant morphogenesis. Techniques from compiler technology are applied to generate attributes (new fields) in the database, to simplify query development for the recursively-structured branching relationship. Use of biological terminology in an interactive query builder contributes towards making the system biologist-friendly.

  7. Data Parallel Bin-Based Indexing for Answering Queries on Multi-Core Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gosink, Luke; Wu, Kesheng; Bethel, E. Wes

    2009-06-02

    The multi-core trend in CPUs and general purpose graphics processing units (GPUs) offers new opportunities for the database community. The increase of cores at exponential rates is likely to affect virtually every server and client in the coming decade, and presents database management systems with a huge, compelling disruption that will radically change how processing is done. This paper presents a new parallel indexing data structure for answering queries that takes full advantage of the increasing thread-level parallelism emerging in multi-core architectures. In our approach, our Data Parallel Bin-based Index Strategy (DP-BIS) first bins the base data, and then partitionsmore » and stores the values in each bin as a separate, bin-based data cluster. In answering a query, the procedures for examining the bin numbers and the bin-based data clusters offer the maximum possible level of concurrency; each record is evaluated by a single thread and all threads are processed simultaneously in parallel. We implement and demonstrate the effectiveness of DP-BIS on two multi-core architectures: a multi-core CPU and a GPU. The concurrency afforded by DP-BIS allows us to fully utilize the thread-level parallelism provided by each architecture--for example, our GPU-based DP-BIS implementation simultaneously evaluates over 12,000 records with an equivalent number of concurrently executing threads. In comparing DP-BIS's performance across these architectures, we show that the GPU-based DP-BIS implementation requires significantly less computation time to answer a query than the CPU-based implementation. We also demonstrate in our analysis that DP-BIS provides better overall performance than the commonly utilized CPU and GPU-based projection index. Finally, due to data encoding, we show that DP-BIS accesses significantly smaller amounts of data than index strategies that operate solely on a column's base data; this smaller data footprint is critical for parallel processors that possess limited memory resources (e.g., GPUs).« less

  8. CGDM: collaborative genomic data model for molecular profiling data using NoSQL.

    PubMed

    Wang, Shicai; Mares, Mihaela A; Guo, Yi-Ke

    2016-12-01

    High-throughput molecular profiling has greatly improved patient stratification and mechanistic understanding of diseases. With the increasing amount of data used in translational medicine studies in recent years, there is a need to improve the performance of data warehouses in terms of data retrieval and statistical processing. Both relational and Key Value models have been used for managing molecular profiling data. Key Value models such as SeqWare have been shown to be particularly advantageous in terms of query processing speed for large datasets. However, more improvement can be achieved, particularly through better indexing techniques of the Key Value models, taking advantage of the types of queries which are specific for the high-throughput molecular profiling data. In this article, we introduce a Collaborative Genomic Data Model (CGDM), aimed at significantly increasing the query processing speed for the main classes of queries on genomic databases. CGDM creates three Collaborative Global Clustering Index Tables (CGCITs) to solve the velocity and variety issues at the cost of limited extra volume. Several benchmarking experiments were carried out, comparing CGDM implemented on HBase to the traditional SQL data model (TDM) implemented on both HBase and MySQL Cluster, using large publicly available molecular profiling datasets taken from NCBI and HapMap. In the microarray case, CGDM on HBase performed up to 246 times faster than TDM on HBase and 7 times faster than TDM on MySQL Cluster. In single nucleotide polymorphism case, CGDM on HBase outperformed TDM on HBase by up to 351 times and TDM on MySQL Cluster by up to 9 times. The CGDM source code is available at https://github.com/evanswang/CGDM. y.guo@imperial.ac.uk. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  9. Design of a graphical user interface for an intelligent multimedia information system for radiology research

    NASA Astrophysics Data System (ADS)

    Taira, Ricky K.; Wong, Clement; Johnson, David; Bhushan, Vikas; Rivera, Monica; Huang, Lu J.; Aberle, Denise R.; Cardenas, Alfonso F.; Chu, Wesley W.

    1995-05-01

    With the increase in the volume and distribution of images and text available in PACS and medical electronic health-care environments it becomes increasingly important to maintain indexes that summarize the content of these multi-media documents. Such indices are necessary to quickly locate relevant patient cases for research, patient management, and teaching. The goal of this project is to develop an intelligent document retrieval system that allows researchers to request for patient cases based on document content. Thus we wish to retrieve patient cases from electronic information archives that could include a combined specification of patient demographics, low level radiologic findings (size, shape, number), intermediate-level radiologic findings (e.g., atelectasis, infiltrates, etc.) and/or high-level pathology constraints (e.g., well-differentiated small cell carcinoma). The cases could be distributed among multiple heterogeneous databases such as PACS, RIS, and HIS. Content- based retrieval systems go beyond the capabilities of simple key-word or string-based retrieval matching systems. These systems require a knowledge base to comprehend the generality/specificity of a concept (thus knowing the subclasses or related concepts to a given concept) and knowledge of the various string representations for each concept (i.e., synonyms, lexical variants, etc.). We have previously reported on a data integration mediation layer that allows transparent access to multiple heterogeneous distributed medical databases (HIS, RIS, and PACS). The data access layer of our architecture currently has limited query processing capabilities. Given a patient hospital identification number, the access mediation layer collects all documents in RIS and HIS and returns this information to a specified workstation location. In this paper we report on our efforts to extend the query processing capabilities of the system by creation of custom query interfaces, an intelligent query processing engine, and a document-content index that can be generated automatically (i.e., no manual authoring or changes to the normal clinical protocols).

  10. The BioPrompt-box: an ontology-based clustering tool for searching in biological databases.

    PubMed

    Corsi, Claudio; Ferragina, Paolo; Marangoni, Roberto

    2007-03-08

    High-throughput molecular biology provides new data at an incredible rate, so that the increase in the size of biological databanks is enormous and very rapid. This scenario generates severe problems not only at indexing time, where suitable algorithmic techniques for data indexing and retrieval are required, but also at query time, since a user query may produce such a large set of results that their browsing and "understanding" becomes humanly impractical. This problem is well known to the Web community, where a new generation of Web search engines is being developed, like Vivisimo. These tools organize on-the-fly the results of a user query in a hierarchy of labeled folders that ease their browsing and knowledge extraction. We investigate this approach on biological data, and propose the so called The BioPrompt-boxsoftware system which deploys ontology-driven clustering strategies for making the searching process of biologists more efficient and effective. The BioPrompt-box (Bpb) defines a document as a biological sequence plus its associated meta-data taken from the underneath databank--like references to ontologies or to external databanks, and plain texts as comments of researchers and (title, abstracts or even body of) papers. Bpboffers several tools to customize the search and the clustering process over its indexed documents. The user can search a set of keywords within a specific field of the document schema, or can execute Blastto find documents relative to homologue sequences. In both cases the search task returns a set of documents (hits) which constitute the answer to the user query. Since the number of hits may be large, Bpbclusters them into groups of homogenous content, organized as a hierarchy of labeled clusters. The user can actually choose among several ontology-based hierarchical clustering strategies, each offering a different "view" of the returned hits. Bpbcomputes these views by exploiting the meta-data present within the retrieved documents such as the references to Gene Ontology, the taxonomy lineage, the organism and the keywords. Of course, the approach is flexible enough to leave room for future additions of other meta-information. The ultimate goal of the clustering process is to provide the user with several different readings of the (maybe numerous) query results and show possible hidden correlations among them, thus improving their browsing and understanding. Bpb is a powerful search engine that makes it very easy to perform complex queries over the indexed databanks (currently only UNIPROT is considered). The ontology-based clustering approach is efficient and effective, and could thus be applied successfully to larger databanks, like GenBank or EMBL.

  11. The BioPrompt-box: an ontology-based clustering tool for searching in biological databases

    PubMed Central

    Corsi, Claudio; Ferragina, Paolo; Marangoni, Roberto

    2007-01-01

    Background High-throughput molecular biology provides new data at an incredible rate, so that the increase in the size of biological databanks is enormous and very rapid. This scenario generates severe problems not only at indexing time, where suitable algorithmic techniques for data indexing and retrieval are required, but also at query time, since a user query may produce such a large set of results that their browsing and "understanding" becomes humanly impractical. This problem is well known to the Web community, where a new generation of Web search engines is being developed, like Vivisimo. These tools organize on-the-fly the results of a user query in a hierarchy of labeled folders that ease their browsing and knowledge extraction. We investigate this approach on biological data, and propose the so called The BioPrompt-boxsoftware system which deploys ontology-driven clustering strategies for making the searching process of biologists more efficient and effective. Results The BioPrompt-box (Bpb) defines a document as a biological sequence plus its associated meta-data taken from the underneath databank – like references to ontologies or to external databanks, and plain texts as comments of researchers and (title, abstracts or even body of) papers. Bpboffers several tools to customize the search and the clustering process over its indexed documents. The user can search a set of keywords within a specific field of the document schema, or can execute Blastto find documents relative to homologue sequences. In both cases the search task returns a set of documents (hits) which constitute the answer to the user query. Since the number of hits may be large, Bpbclusters them into groups of homogenous content, organized as a hierarchy of labeled clusters. The user can actually choose among several ontology-based hierarchical clustering strategies, each offering a different "view" of the returned hits. Bpbcomputes these views by exploiting the meta-data present within the retrieved documents such as the references to Gene Ontology, the taxonomy lineage, the organism and the keywords. Of course, the approach is flexible enough to leave room for future additions of other meta-information. The ultimate goal of the clustering process is to provide the user with several different readings of the (maybe numerous) query results and show possible hidden correlations among them, thus improving their browsing and understanding. Conclusion Bpb is a powerful search engine that makes it very easy to perform complex queries over the indexed databanks (currently only UNIPROT is considered). The ontology-based clustering approach is efficient and effective, and could thus be applied successfully to larger databanks, like GenBank or EMBL. PMID:17430575

  12. VISAGE: Interactive Visual Graph Querying.

    PubMed

    Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

    2016-06-01

    Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete , an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with "wildcard" nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE's ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries.

  13. VISAGE: Interactive Visual Graph Querying

    PubMed Central

    Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

    2017-01-01

    Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with “wildcard” nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE’s ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries. PMID:28553670

  14. A Visual Interface for Querying Heterogeneous Phylogenetic Databases.

    PubMed

    Jamil, Hasan M

    2017-01-01

    Despite the recent growth in the number of phylogenetic databases, access to these wealth of resources remain largely tool or form-based interface driven. It is our thesis that the flexibility afforded by declarative query languages may offer the opportunity to access these repositories in a better way, and to use such a language to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way, and is used as the storage model for our PhyloBase database for which PhyQL serves as the query language. We have implemented a visual interface for the end users to pose PhyQL queries using visual icons, and drag and drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in PhyQL buffer allows secondary querying on the computed results making it a truly powerful querying architecture.

  15. Which factors predict the time spent answering queries to a drug information centre?

    PubMed Central

    Reppe, Linda A.; Spigset, Olav

    2010-01-01

    Objective To develop a model based upon factors able to predict the time spent answering drug-related queries to Norwegian drug information centres (DICs). Setting and method Drug-related queries received at 5 DICs in Norway from March to May 2007 were randomly assigned to 20 employees until each of them had answered a minimum of five queries. The employees reported the number of drugs involved, the type of literature search performed, and whether the queries were considered judgmental or not, using a specifically developed scoring system. Main outcome measures The scores of these three factors were added together to define a workload score for each query. Workload and its individual factors were subsequently related to the measured time spent answering the queries by simple or multiple linear regression analyses. Results Ninety-six query/answer pairs were analyzed. Workload significantly predicted the time spent answering the queries (adjusted R2 = 0.22, P < 0.001). Literature search was the individual factor best predicting the time spent answering the queries (adjusted R2 = 0.17, P < 0.001), and this variable also contributed the most in the multiple regression analyses. Conclusion The most important workload factor predicting the time spent handling the queries in this study was the type of literature search that had to be performed. The categorisation of queries as judgmental or not, also affected the time spent answering the queries. The number of drugs involved did not significantly influence the time spent answering drug information queries. PMID:20922480

  16. Personalized query suggestion based on user behavior

    NASA Astrophysics Data System (ADS)

    Chen, Wanyu; Hao, Zepeng; Shao, Taihua; Chen, Honghui

    Query suggestions help users refine their queries after they input an initial query. Previous work mainly concentrated on similarity-based and context-based query suggestion approaches. However, models that focus on adapting to a specific user (personalization) can help to improve the probability of the user being satisfied. In this paper, we propose a personalized query suggestion model based on users’ search behavior (UB model), where we inject relevance between queries and users’ search behavior into a basic probabilistic model. For the relevance between queries, we consider their semantical similarity and co-occurrence which indicates the behavior information from other users in web search. Regarding the current user’s preference to a query, we combine the user’s short-term and long-term search behavior in a linear fashion and deal with the data sparse problem with Bayesian probabilistic matrix factorization (BPMF). In particular, we also investigate the impact of different personalization strategies (the combination of the user’s short-term and long-term search behavior) on the performance of query suggestion reranking. We quantify the improvement of our proposed UB model against a state-of-the-art baseline using the public AOL query logs and show that it beats the baseline in terms of metrics used in query suggestion reranking. The experimental results show that: (i) for personalized ranking, users’ behavioral information helps to improve query suggestion effectiveness; and (ii) given a query, merging information inferred from the short-term and long-term search behavior of a particular user can result in a better performance than both plain approaches.

  17. Connecting Provenance with Semantic Descriptions in the NASA Earth Exchange (NEX)

    NASA Astrophysics Data System (ADS)

    Votava, P.; Michaelis, A.; Nemani, R. R.

    2012-12-01

    NASA Earth Exchange (NEX) is a data, modeling and knowledge collaboratory that houses NASA satellite data, climate data and ancillary data where a focused community may come together to share modeling and analysis codes, scientific results, knowledge and expertise on a centralized platform. Some of the main goals of NEX are transparency and repeatability and to that extent we have been adding components that enable tracking of provenance of both scientific processes and datasets produced by these processes. As scientific processes become more complex, they are often developed collaboratively and it becomes increasingly important for the research team to be able to track the development of the process and the datasets that are produced along the way. Additionally, we want to be able to link the processes and the datasets developed on NEX to an existing information and knowledge, so that the users can query and compare the provenance of any dataset or process with regard to the component-specific attributes such as data quality, geographic location, related publications, user comments and annotations etc. We have developed several ontologies that describe datasets and workflow components available on NEX using the OWL ontology language as well as a simple ontology that provides linking mechanism to the collected provenance information. The provenance is captured in two ways - we utilize existing provenance infrastructure of VisTrails, which is used as a workflow engine on NEX, and we extend the captured provenance using the PROV data model expressed through the PROV-O ontology. We do this in order to link and query the provenance easier in the context of the existing NEX information and knowledge. The captured provenance graph is processed and stored using RDFlib with MySQL backend that can be queried using either RDFLib or SPARQL. As a concrete example, we show how this information is captured during anomaly detection process in large satellite datasets.

  18. Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.

    PubMed

    Woo, Hyekyung; Cho, Youngtae; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

    2016-07-04

    As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data.

  19. Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea

    PubMed Central

    Woo, Hyekyung; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

    2016-01-01

    Background As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. Objective In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Methods Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. Results In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). Conclusions These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data. PMID:27377323

  20. Lost in translation? A multilingual Query Builder improves the quality of PubMed queries: a randomised controlled trial.

    PubMed

    Schuers, Matthieu; Joulakian, Mher; Kerdelhué, Gaetan; Segas, Léa; Grosjean, Julien; Darmoni, Stéfan J; Griffon, Nicolas

    2017-07-03

    MEDLINE is the most widely used medical bibliographic database in the world. Most of its citations are in English and this can be an obstacle for some researchers to access the information the database contains. We created a multilingual query builder to facilitate access to the PubMed subset using a language other than English. The aim of our study was to assess the impact of this multilingual query builder on the quality of PubMed queries for non-native English speaking physicians and medical researchers. A randomised controlled study was conducted among French speaking general practice residents. We designed a multi-lingual query builder to facilitate information retrieval, based on available MeSH translations and providing users with both an interface and a controlled vocabulary in their own language. Participating residents were randomly allocated either the French or the English version of the query builder. They were asked to translate 12 short medical questions into MeSH queries. The main outcome was the quality of the query. Two librarians blind to the arm independently evaluated each query, using a modified published classification that differentiated eight types of errors. Twenty residents used the French version of the query builder and 22 used the English version. 492 queries were analysed. There were significantly more perfect queries in the French group vs. the English group (respectively 37.9% vs. 17.9%; p < 0.01). It took significantly more time for the members of the English group than the members of the French group to build each query, respectively 194 sec vs. 128 sec; p < 0.01. This multi-lingual query builder is an effective tool to improve the quality of PubMed queries in particular for researchers whose first language is not English.

  1. Development of the Instructional Model by Integrating Information Literacy in the Class Learning and Teaching Processes

    ERIC Educational Resources Information Center

    Maitaouthong, Therdsak; Tuamsuk, Kulthida; Techamanee, Yupin

    2011-01-01

    This study was aimed at developing an instructional model by integrating information literacy in the instructional process of general education courses at an undergraduate level. The research query, "What is the teaching methodology that integrates information literacy in the instructional process of general education courses at an undergraduate…

  2. A Web-Based Data-Querying Tool Based on Ontology-Driven Methodology and Flowchart-Based Model

    PubMed Central

    Ping, Xiao-Ou; Chung, Yufang; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei

    2013-01-01

    Background Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. Objective The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. Methods The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. Results In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, “degree of liver damage,” “degree of liver damage when applying a mutually exclusive setting,” and “treatments for liver cancer”) was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution time. Conclusions The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks. PMID:25600078

  3. Mining Longitudinal Web Queries: Trends and Patterns.

    ERIC Educational Resources Information Center

    Wang, Peiling; Berry, Michael W.; Yang, Yiheng

    2003-01-01

    Analyzed user queries submitted to an academic Web site during a four-year period, using a relational database, to examine users' query behavior, to identify problems they encounter, and to develop techniques for optimizing query analysis and mining. Linguistic analyses focus on query structures, lexicon, and word associations using statistical…

  4. WATCHMAN: A Data Warehouse Intelligent Cache Manager

    NASA Technical Reports Server (NTRS)

    Scheuermann, Peter; Shim, Junho; Vingralek, Radek

    1996-01-01

    Data warehouses store large volumes of data which are used frequently by decision support applications. Such applications involve complex queries. Query performance in such an environment is critical because decision support applications often require interactive query response time. Because data warehouses are updated infrequently, it becomes possible to improve query performance by caching sets retrieved by queries in addition to query execution plans. In this paper we report on the design of an intelligent cache manager for sets retrieved by queries called WATCHMAN, which is particularly well suited for data warehousing environment. Our cache manager employs two novel, complementary algorithms for cache replacement and for cache admission. WATCHMAN aims at minimizing query response time and its cache replacement policy swaps out entire retrieved sets of queries instead of individual pages. The cache replacement and admission algorithms make use of a profit metric, which considers for each retrieved set its average rate of reference, its size, and execution cost of the associated query. We report on a performance evaluation based on the TPC-D and Set Query benchmarks. These experiments show that WATCHMAN achieves a substantial performance improvement in a decision support environment when compared to a traditional LRU replacement algorithm.

  5. PIBAS FedSPARQL: a web-based platform for integration and exploration of bioinformatics datasets.

    PubMed

    Djokic-Petrovic, Marija; Cvjetkovic, Vladimir; Yang, Jeremy; Zivanovic, Marko; Wild, David J

    2017-09-20

    There are a huge variety of data sources relevant to chemical, biological and pharmacological research, but these data sources are highly siloed and cannot be queried together in a straightforward way. Semantic technologies offer the ability to create links and mappings across datasets and manage them as a single, linked network so that searching can be carried out across datasets, independently of the source. We have developed an application called PIBAS FedSPARQL that uses semantic technologies to allow researchers to carry out such searching across a vast array of data sources. PIBAS FedSPARQL is a web-based query builder and result set visualizer of bioinformatics data. As an advanced feature, our system can detect similar data items identified by different Uniform Resource Identifiers (URIs), using a text-mining algorithm based on the processing of named entities to be used in Vector Space Model and Cosine Similarity Measures. According to our knowledge, PIBAS FedSPARQL was unique among the systems that we found in that it allows detecting of similar data items. As a query builder, our system allows researchers to intuitively construct and run Federated SPARQL queries across multiple data sources, including global initiatives, such as Bio2RDF, Chem2Bio2RDF, EMBL-EBI, and one local initiative called CPCTAS, as well as additional user-specified data source. From the input topic, subtopic, template and keyword, a corresponding initial Federated SPARQL query is created and executed. Based on the data obtained, end users have the ability to choose the most appropriate data sources in their area of interest and exploit their Resource Description Framework (RDF) structure, which allows users to select certain properties of data to enhance query results. The developed system is flexible and allows intuitive creation and execution of queries for an extensive range of bioinformatics topics. Also, the novel "similar data items detection" algorithm can be particularly useful for suggesting new data sources and cost optimization for new experiments. PIBAS FedSPARQL can be expanded with new topics, subtopics and templates on demand, rendering information retrieval more robust.

  6. CampusGIS of the University of Cologne: a tool for orientation, navigation, and management

    NASA Astrophysics Data System (ADS)

    Baaser, U.; Gnyp, M. L.; Hennig, S.; Hoffmeister, D.; Köhn, N.; Laudien, R.; Bareth, G.

    2006-10-01

    The working group for GIS and Remote Sensing at the Department of Geography at the University of Cologne has established a WebGIS called CampusGIS of the University of Cologne. The overall task of the CampusGIS is the connection of several existing databases at the University of Cologne with spatial data. These existing databases comprise data about staff, buildings, rooms, lectures, and general infrastructure like bus stops etc. These information were yet not linked to their spatial relation. Therefore, a GIS-based method is developed to link all the different databases to spatial entities. Due to the philosophy of the CampusGIS, an online-GUI is programmed which enables users to search for staff, buildings, or institutions. The query results are linked to the GIS database which allows the visualization of the spatial location of the searched entity. This system was established in 2005 and is operational since early 2006. In this contribution, the focus is on further developments. First results of (i) including routing services in, (ii) programming GUIs for mobile devices for, and (iii) including infrastructure management tools in the CampusGIS are presented. Consequently, the CampusGIS is not only available for spatial information retrieval and orientation. It also serves for on-campus navigation and administrative management.

  7. Integration and management of massive remote-sensing data based on GeoSOT subdivision model

    NASA Astrophysics Data System (ADS)

    Li, Shuang; Cheng, Chengqi; Chen, Bo; Meng, Li

    2016-07-01

    Owing to the rapid development of earth observation technology, the volume of spatial information is growing rapidly; therefore, improving query retrieval speed from large, rich data sources for remote-sensing data management systems is quite urgent. A global subdivision model, geographic coordinate subdivision grid with one-dimension integer coding on 2n-tree, which we propose as a solution, has been used in data management organizations. However, because a spatial object may cover several grids, ample data redundancy will occur when data are stored in relational databases. To solve this redundancy problem, we first combined the subdivision model with the spatial array database containing the inverted index. We proposed an improved approach for integrating and managing massive remote-sensing data. By adding a spatial code column in an array format in a database, spatial information in remote-sensing metadata can be stored and logically subdivided. We implemented our method in a Kingbase Enterprise Server database system and compared the results with the Oracle platform by simulating worldwide image data. Experimental results showed that our approach performed better than Oracle in terms of data integration and time and space efficiency. Our approach also offers an efficient storage management system for existing storage centers and management systems.

  8. Fuzzy Relational Databases: Representational Issues and Reduction Using Similarity Measures.

    ERIC Educational Resources Information Center

    Prade, Henri; Testemale, Claudette

    1987-01-01

    Compares and expands upon two approaches to dealing with fuzzy relational databases. The proposed similarity measure is based on a fuzzy Hausdorff distance and estimates the mismatch between two possibility distributions using a reduction process. The consequences of the reduction process on query evaluation are studied. (Author/EM)

  9. KARL: A Knowledge-Assisted Retrieval Language. Presentation visuals. M.S. Thesis Final Report, 1 Jul. 1985 - 31 Dec. 1987

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Triantafyllopoulos, Spiros

    1985-01-01

    A collection of presentation visuals associated with the companion report entitled KARL: A Knowledge-Assisted Retrieval Language, is presented. Information is given on data retrieval, natural language database front ends, generic design objectives, processing capababilities and the query processing cycle.

  10. Visualization of Earth and Space Science Data at JPL's Science Data Processing Systems Section

    NASA Technical Reports Server (NTRS)

    Green, William B.

    1996-01-01

    This presentation will provide an overview of systems in use at NASA's Jet Propulsion Laboratory for processing data returned by space exploration and earth observations spacecraft. Graphical and visualization techniques used to query and retrieve data from large scientific data bases will be described.

  11. NLPIR: A Theoretical Framework for Applying Natural Language Processing to Information Retrieval.

    ERIC Educational Resources Information Center

    Zhou, Lina; Zhang, Dongsong

    2003-01-01

    Proposes a theoretical framework called NLPIR that integrates natural language processing (NLP) into information retrieval (IR) based on the assumption that there exists representation distance between queries and documents. Discusses problems in traditional keyword-based IR, including relevance, and describes some existing NLP techniques.…

  12. Implementation and evaluation of a hypercube-based method for spatiotemporal exploration and analysis

    NASA Astrophysics Data System (ADS)

    Marchand, Pierre; Brisebois, Alexandre; Bédard, Yvan; Edwards, Geoffrey

    This paper presents the results obtained with a new type of spatiotemporal topological dimension implemented within a hypercube, i.e., within a multidimensional database (MDDB) structure formed by the conjunction of several thematic, spatial and temporal dimensions. Our goal is to support efficient SpatioTemporal Exploration and Analysis (STEA) in the context of Automatic Position Reporting System (APRS), the worldwide amateur radio system for position report transmission. Mobile APRS stations are equipped with GPS navigation systems to provide real-time positioning reports. Previous research about the multidimensional approach has proved good potential for spatiotemporal exploration and analysis despite a lack of explicit topological operators (spatial, temporal and spatiotemporal). Our project implemented such operators through a hierarchy of operators that are applied to pairs of instances of objects. At the top of the hierarchy, users can use simple operators such as "same place", "same time" or "same time, same place". As they drill down into the hierarchy, more detailed topological operators are made available such as "adjacent immediately after", "touch during" or more detailed operators. This hierarchy is structured according to four levels of granularity based on cognitive models, generalized relationships and formal models of topological relationships. In this paper, we also describe the generic approach which allows efficient STEA within the multidimensional approach. Finally, we demonstrate that such an implementation offers query run times which permit to maintain a "train-of-thought" during exploration and analysis operations as they are compatible with Newell's cognitive band (query runtime<10 s) (Newell, A., 1990. Unified theories of cognition. Harvard University Press, Cambridge MA, 549 p.).

  13. Assisting Consumer Health Information Retrieval with Query Recommendations

    PubMed Central

    Zeng, Qing T.; Crowell, Jonathan; Plovnick, Robert M.; Kim, Eunjung; Ngo, Long; Dibble, Emily

    2006-01-01

    Objective: Health information retrieval (HIR) on the Internet has become an important practice for millions of people, many of whom have problems forming effective queries. We have developed and evaluated a tool to assist people in health-related query formation. Design: We developed the Health Information Query Assistant (HIQuA) system. The system suggests alternative/additional query terms related to the user's initial query that can be used as building blocks to construct a better, more specific query. The recommended terms are selected according to their semantic distance from the original query, which is calculated on the basis of concept co-occurrences in medical literature and log data as well as semantic relations in medical vocabularies. Measurements: An evaluation of the HIQuA system was conducted and a total of 213 subjects participated in the study. The subjects were randomized into 2 groups. One group was given query recommendations and the other was not. Each subject performed HIR for both a predefined and a self-defined task. Results: The study showed that providing HIQuA recommendations resulted in statistically significantly higher rates of successful queries (odds ratio = 1.66, 95% confidence interval = 1.16–2.38), although no statistically significant impact on user satisfaction or the users' ability to accomplish the predefined retrieval task was found. Conclusion: Providing semantic-distance-based query recommendations can help consumers with query formation during HIR. PMID:16221944

  14. PAQ: Persistent Adaptive Query Middleware for Dynamic Environments

    NASA Astrophysics Data System (ADS)

    Rajamani, Vasanth; Julien, Christine; Payton, Jamie; Roman, Gruia-Catalin

    Pervasive computing applications often entail continuous monitoring tasks, issuing persistent queries that return continuously updated views of the operational environment. We present PAQ, a middleware that supports applications' needs by approximating a persistent query as a sequence of one-time queries. PAQ introduces an integration strategy abstraction that allows composition of one-time query responses into streams representing sophisticated spatio-temporal phenomena of interest. A distinguishing feature of our middleware is the realization that the suitability of a persistent query's result is a function of the application's tolerance for accuracy weighed against the associated overhead costs. In PAQ, programmers can specify an inquiry strategy that dictates how information is gathered. Since network dynamics impact the suitability of a particular inquiry strategy, PAQ associates an introspection strategy with a persistent query, that evaluates the quality of the query's results. The result of introspection can trigger application-defined adaptation strategies that alter the nature of the query. PAQ's simple API makes developing adaptive querying systems easily realizable. We present the key abstractions, describe their implementations, and demonstrate the middleware's usefulness through application examples and evaluation.

  15. The CMS DBS query language

    NASA Astrophysics Data System (ADS)

    Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee

    2010-04-01

    The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to underlying database. We will describe the design of the query system, provide details of the language components and overview of how this component fits into the overall data discovery system architecture.

  16. Querying Co-regulated Genes on Diverse Gene Expression Datasets Via Biclustering.

    PubMed

    Deveci, Mehmet; Küçüktunç, Onur; Eren, Kemal; Bozdağ, Doruk; Kaya, Kamer; Çatalyürek, Ümit V

    2016-01-01

    Rapid development and increasing popularity of gene expression microarrays have resulted in a number of studies on the discovery of co-regulated genes. One important way of discovering such co-regulations is the query-based search since gene co-expressions may indicate a shared role in a biological process. Although there exist promising query-driven search methods adapting clustering, they fail to capture many genes that function in the same biological pathway because microarray datasets are fraught with spurious samples or samples of diverse origin, or the pathways might be regulated under only a subset of samples. On the other hand, a class of clustering algorithms known as biclustering algorithms which simultaneously cluster both the items and their features are useful while analyzing gene expression data, or any data in which items are related in only a subset of their samples. This means that genes need not be related in all samples to be clustered together. Because many genes only interact under specific circumstances, biclustering may recover the relationships that traditional clustering algorithms can easily miss. In this chapter, we briefly summarize the literature using biclustering for querying co-regulated genes. Then we present a novel biclustering approach and evaluate its performance by a thorough experimental analysis.

  17. Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration

    PubMed Central

    Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun

    2017-01-01

    Linked Data (LD) aims to achieve interconnected data by representing entities using Unified Resource Identifiers (URIs), and sharing information using Resource Description Frameworks (RDFs) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. PMID:27733503

  18. EpiGeNet: A Graph Database of Interdependencies Between Genetic and Epigenetic Events in Colorectal Cancer.

    PubMed

    Balaur, Irina; Saqi, Mansoor; Barat, Ana; Lysenko, Artem; Mazein, Alexander; Rawlings, Christopher J; Ruskin, Heather J; Auffray, Charles

    2017-10-01

    The development of colorectal cancer (CRC)-the third most common cancer type-has been associated with deregulations of cellular mechanisms stimulated by both genetic and epigenetic events. StatEpigen is a manually curated and annotated database, containing information on interdependencies between genetic and epigenetic signals, and specialized currently for CRC research. Although StatEpigen provides a well-developed graphical user interface for information retrieval, advanced queries involving associations between multiple concepts can benefit from more detailed graph representation of the integrated data. This can be achieved by using a graph database (NoSQL) approach. Data were extracted from StatEpigen and imported to our newly developed EpiGeNet, a graph database for storage and querying of conditional relationships between molecular (genetic and epigenetic) events observed at different stages of colorectal oncogenesis. We illustrate the enhanced capability of EpiGeNet for exploration of different queries related to colorectal tumor progression; specifically, we demonstrate the query process for (i) stage-specific molecular events, (ii) most frequently observed genetic and epigenetic interdependencies in colon adenoma, and (iii) paths connecting key genes reported in CRC and associated events. The EpiGeNet framework offers improved capability for management and visualization of data on molecular events specific to CRC initiation and progression.

  19. Omicseq: a web-based search engine for exploring omics datasets

    PubMed Central

    Sun, Xiaobo; Pittard, William S.; Xu, Tianlei; Chen, Li; Zwick, Michael E.; Jiang, Xiaoqian; Wang, Fusheng

    2017-01-01

    Abstract The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve ‘findability’ of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. PMID:28402462

  20. Human motion retrieval from hand-drawn sketch.

    PubMed

    Chao, Min-Wen; Lin, Chao-Hung; Assa, Jackie; Lee, Tong-Yee

    2012-05-01

    The rapid growth of motion capture data increases the importance of motion retrieval. The majority of the existing motion retrieval approaches are based on a labor-intensive step in which the user browses and selects a desired query motion clip from the large motion clip database. In this work, a novel sketching interface for defining the query is presented. This simple approach allows users to define the required motion by sketching several motion strokes over a drawn character, which requires less effort and extends the users’ expressiveness. To support the real-time interface, a specialized encoding of the motions and the hand-drawn query is required. Here, we introduce a novel hierarchical encoding scheme based on a set of orthonormal spherical harmonic (SH) basis functions, which provides a compact representation, and avoids the CPU/processing intensive stage of temporal alignment used by previous solutions. Experimental results show that the proposed approach can well retrieve the motions, and is capable of retrieve logically and numerically similar motions, which is superior to previous approaches. The user study shows that the proposed system can be a useful tool to input motion query if the users are familiar with it. Finally, an application of generating a 3D animation from a hand-drawn comics strip is demonstrated.

Top