Sample records for continuous range queries

  1. Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments

    PubMed Central

    Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

    2015-01-01

    A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods. PMID:26393613

  2. Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments.

    PubMed

    Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

    2015-09-18

    A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods.

  3. Monitoring Moving Queries inside a Safe Region

    PubMed Central

    Al-Khalidi, Haidar; Taniar, David; Alamri, Sultan

    2014-01-01

    With mobile moving range queries, there is a need to recalculate the relevant surrounding objects of interest whenever the query moves. Therefore, monitoring the moving query is very costly. The safe region is one method that has been proposed to minimise the communication and computation cost of continuously monitoring a moving range query. Inside the safe region the set of objects of interest to the query do not change; thus there is no need to update the query while it is inside its safe region. However, when the query leaves its safe region the mobile device has to reevaluate the query, necessitating communication with the server. Knowing when and where the mobile device will leave a safe region is widely known as a difficult problem. To solve this problem, we propose a novel method to monitor the position of the query over time using a linear function based on the direction of the query obtained by periodic monitoring of its position. Periodic monitoring ensures that the query is aware of its location all the time. This method reduces the costs associated with communications in client-server architecture. Computational results show that our method is successful in handling moving query patterns. PMID:24696652
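
    The exit prediction described above can be made concrete with a little geometry. The following Python sketch assumes a circular safe region and a linear motion model estimated from the two most recent position samples; the function name and parameters are illustrative, not the paper's interface.

      import math

      def predict_safe_region_exit(p_prev, t_prev, p_curr, t_curr, center, radius):
          """Estimate when a linearly moving query point will leave a circular safe region.

          p_prev/p_curr are (x, y) positions sampled at t_prev/t_curr, standing in for
          the periodic position monitoring described in the abstract.
          """
          dt = t_curr - t_prev
          vx = (p_curr[0] - p_prev[0]) / dt           # velocity from the two samples
          vy = (p_curr[1] - p_prev[1]) / dt
          dx = p_curr[0] - center[0]                  # position relative to the centre
          dy = p_curr[1] - center[1]
          # Solve |d + v*s|^2 = r^2 for the smallest s >= 0 (s = time after t_curr).
          a = vx * vx + vy * vy
          b = 2.0 * (dx * vx + dy * vy)
          c = dx * dx + dy * dy - radius * radius
          if a == 0.0:
              return None                             # stationary: never leaves under this model
          disc = b * b - 4.0 * a * c
          if disc < 0.0:
              return None                             # trajectory never crosses the boundary
          s = (-b + math.sqrt(disc)) / (2.0 * a)
          return t_curr + s if s >= 0.0 else None

      # Query at (1, 1) moving along +x; safe region of radius 5 around the origin.
      print(predict_safe_region_exit((0.0, 1.0), 0.0, (1.0, 1.0), 1.0, (0.0, 0.0), 5.0))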

  4. CSRQ: Communication-Efficient Secure Range Queries in Two-Tiered Sensor Networks

    PubMed Central

    Dai, Hua; Ye, Qingqun; Yang, Geng; Xu, Jia; He, Ruiliang

    2016-01-01

    In recent years, we have seen many applications of secure query in two-tiered wireless sensor networks. Storage nodes are responsible for storing data from nearby sensor nodes and answering queries from Sink. It is critical to protect data security from a compromised storage node. In this paper, the Communication-efficient Secure Range Query (CSRQ)—a privacy and integrity preserving range query protocol—is proposed to prevent attackers from gaining information of both data collected by sensor nodes and queries issued by Sink. To preserve privacy and integrity, in addition to employing the encoding mechanisms, a novel data structure called encrypted constraint chain is proposed, which embeds the information of integrity verification. Sink can use this encrypted constraint chain to verify the query result. The performance evaluation shows that CSRQ has lower communication cost than the current range query protocols. PMID:26907293

  5. A Novel Evaluation of World No Tobacco Day in Latin America

    PubMed Central

    Althouse, Benjamin M; Allem, Jon-Patrick; Ford, Daniel E; Ribisl, Kurt M; Cohen, Joanna E

    2012-01-01

    Background: World No Tobacco Day (WNTD), commemorated annually on May 31, aims to inform the public about tobacco harms. Because tobacco control surveillance is usually annualized, the effectiveness of WNTD remains unexplored into its 25th year. Objective: To explore the potential of digital surveillance (infoveillance) to evaluate the impacts of WNTD on population awareness of and interest in cessation. Methods: Health-related news stories and Internet search queries were aggregated to form a continuous and real-time data stream. We monitored daily news coverage of and Internet search queries for cessation in seven Latin American nations from 2006 to 2011. Results: Cessation news coverage peaked around WNTD, typically increasing 71% (95% confidence interval [CI] 61–81), ranging from 61% in Mexico to 83% in Venezuela. Queries indicative of cessation interest peaked on WNTD, increasing 40% (95% CI 32–48), ranging from 24% in Colombia to 84% in Venezuela. A doubling in cessation news coverage was associated with approximately a 50% increase in cessation queries. To gain a practical perspective, we compared WNTD-related activity with New Year’s Day and several cigarette excise tax increases in Mexico. Cessation queries around WNTD were typically greater than on New Year’s Day and approximated the effect of a 2.8% (95% CI –0.8 to 6.3) increase in cigarette excise taxes. Conclusions: This novel evaluation suggests WNTD had a significant impact on popular awareness (media trends) and individual interest (query trends) in smoking cessation. Because WNTD is constantly evolving, our work is also a model for real-time surveillance and potential improvement in WNTD and similar initiatives. PMID:22634568

  6. Query-based learning for aerospace applications.

    PubMed

    Saad, E W; Choi, J J; Vian, J L; Wunsch, D C Ii

    2003-01-01

    Models of real-world applications often include a large number of parameters with a wide dynamic range, which contributes to the difficulties of neural network training. Creating the training data set for such applications becomes costly, if not impossible. In order to overcome the challenge, one can employ an active learning technique known as query-based learning (QBL) to add performance-critical data to the training set during the learning phase, thereby efficiently improving the overall learning/generalization. The performance-critical data can be obtained using an inverse mapping called network inversion (discrete network inversion and continuous network inversion) followed by oracle query. This paper investigates the use of both inversion techniques for QBL, and introduces an original heuristic to select the inversion target values for the continuous network inversion method. Efficiency and generalization were further enhanced by employing node decoupled extended Kalman filter (NDEKF) training and a causality index (CI) as a means to reduce the input search dimensionality. The benefits of the overall QBL approach are experimentally demonstrated in two aerospace applications: a classification problem with large input space and a control distribution problem.

  7. PAQ: Persistent Adaptive Query Middleware for Dynamic Environments

    NASA Astrophysics Data System (ADS)

    Rajamani, Vasanth; Julien, Christine; Payton, Jamie; Roman, Gruia-Catalin

    Pervasive computing applications often entail continuous monitoring tasks, issuing persistent queries that return continuously updated views of the operational environment. We present PAQ, a middleware that supports applications' needs by approximating a persistent query as a sequence of one-time queries. PAQ introduces an integration strategy abstraction that allows composition of one-time query responses into streams representing sophisticated spatio-temporal phenomena of interest. A distinguishing feature of our middleware is the realization that the suitability of a persistent query's result is a function of the application's tolerance for accuracy weighed against the associated overhead costs. In PAQ, programmers can specify an inquiry strategy that dictates how information is gathered. Since network dynamics impact the suitability of a particular inquiry strategy, PAQ associates an introspection strategy with a persistent query that evaluates the quality of the query's results. The result of introspection can trigger application-defined adaptation strategies that alter the nature of the query. PAQ's simple API makes developing adaptive querying systems straightforward. We present the key abstractions, describe their implementations, and demonstrate the middleware's usefulness through application examples and evaluation.

  8. Query-Time Optimization Techniques for Structured Queries in Information Retrieval

    ERIC Educational Resources Information Center

    Cartright, Marc-Allen

    2013-01-01

    The use of information retrieval (IR) systems is evolving towards larger, more complicated queries. Both the IR industrial and research communities have generated significant evidence indicating that in order to continue improving retrieval effectiveness, increases in retrieval model complexity may be unavoidable. From an operational perspective,…

  9. In-context query reformulation for failing SPARQL queries

    NASA Astrophysics Data System (ADS)

    Viswanathan, Amar; Michaelis, James R.; Cassidy, Taylor; de Mel, Geeth; Hendler, James

    2017-05-01

    Knowledge bases for decision support systems are growing increasingly complex, through continued advances in data ingest and management approaches. However, humans do not possess the cognitive capabilities to retain a bird's-eye view of such knowledge bases, and may end up issuing unsatisfiable queries to such systems. This work focuses on the implementation of a query reformulation approach for graph-based knowledge bases, specifically designed to support the Resource Description Framework (RDF). The reformulation approach presented is instance- and schema-aware. Thus, in contrast to relaxation techniques found in the state of the art, the presented approach produces in-context query reformulation.

  10. Collusion-aware privacy-preserving range query in tiered wireless sensor networks.

    PubMed

    Zhang, Xiaoying; Dong, Lei; Peng, Hui; Chen, Hong; Zhao, Suyun; Li, Cuiping

    2014-12-11

    Wireless sensor networks (WSNs) are indispensable building blocks for the Internet of Things (IoT). With the development of WSNs, privacy issues have drawn more attention. Existing work on the privacy-preserving range query mainly focuses on privacy preservation and integrity verification in two-tiered WSNs in the case of compromised master nodes, but neglects the damage of node collusion. In this paper, we propose a series of collusion-aware privacy-preserving range query protocols in two-tiered WSNs. To the best of our knowledge, this paper is the first to consider collusion attacks for a range query in tiered WSNs while fulfilling the preservation of privacy and integrity. To preserve the privacy of data and queries, we propose a novel encoding scheme to conceal sensitive information. To preserve the integrity of the results, we present a verification scheme using the correlation among data. In addition, two schemes are further presented to improve result accuracy and reduce communication cost. Finally, theoretical analysis and experimental results confirm the efficiency, accuracy and privacy of our proposals.

  11. Collusion-Aware Privacy-Preserving Range Query in Tiered Wireless Sensor Networks†

    PubMed Central

    Zhang, Xiaoying; Dong, Lei; Peng, Hui; Chen, Hong; Zhao, Suyun; Li, Cuiping

    2014-01-01

    Wireless sensor networks (WSNs) are indispensable building blocks for the Internet of Things (IoT). With the development of WSNs, privacy issues have drawn more attention. Existing work on the privacy-preserving range query mainly focuses on privacy preservation and integrity verification in two-tiered WSNs in the case of compromised master nodes, but neglects the damage of node collusion. In this paper, we propose a series of collusion-aware privacy-preserving range query protocols in two-tiered WSNs. To the best of our knowledge, this paper is the first to consider collusion attacks for a range query in tiered WSNs while fulfilling the preservation of privacy and integrity. To preserve the privacy of data and queries, we propose a novel encoding scheme to conceal sensitive information. To preserve the integrity of the results, we present a verification scheme using the correlation among data. In addition, two schemes are further presented to improve result accuracy and reduce communication cost. Finally, theoretical analysis and experimental results confirm the efficiency, accuracy and privacy of our proposals. PMID:25615731

  12. VPipe: Virtual Pipelining for Scheduling of DAG Stream Query Plans

    NASA Astrophysics Data System (ADS)

    Wang, Song; Gupta, Chetan; Mehta, Abhay

    There are data streams all around us that can be harnessed for tremendous business and personal advantage. For an enterprise-level stream processing system such as CHAOS [1] (Continuous, Heterogeneous Analytic Over Streams), handling of complex query plans with resource constraints is challenging. While several scheduling strategies exist for stream processing, efficient scheduling of complex DAG query plans is still largely unsolved. In this paper, we propose a novel execution scheme for scheduling complex directed acyclic graph (DAG) query plans with meta-data enriched stream tuples. Our solution, called Virtual Pipelined Chain (or VPipe Chain for short), effectively extends the "Chain" pipelining scheduling approach to complex DAG query plans.

  13. Addressing Security Challenges in Pervasive Computing Applications

    DTIC Science & Technology

    2010-10-10

    Personalized Privacy for Location-Based Services", Transactions on Data Privacy, 2(1), 2009. 22. Indrakshi Ray, Indrajit Ray and Sudip Chakraborty, "An...Dewri, Indrakshi Ray, Indrajit Ray and Darrell Whitley, "Query m-Invariance: Preventing Query Disclosures in Continuous Location-Based Services", Proceedings...location information is used to provide better services. Often such applications need continuous location-based services (LBS) where the mobile object must

  14. Blind Seer: A Scalable Private DBMS

    DTIC Science & Technology

    2014-05-01

    searchable index terms per DB row, in time comparable to (insecure) MySQL (many practical queries can be privately executed with work 1.2-3 times slower...than MySQL, although some queries are costlier). We support a rich query set, including searching on arbitrary boolean formulas on keywords and ranges...

  15. Walter User’s Manual (Version 1.0).

    DTIC Science & Technology

    1987-09-01

    queries and/or commands. 1.2 - How Walter Uses the Screen As shown in Figure 1-1, Walter divides the screen of your terminal into five separate areas...our attention to queries and how to submit them to the database. 1.3.1 - Submitting Queries A query is an expression consisting of words, parentheses...dates, but also with ranges of dates, such as "oct 15 : nov 15". Walter recognizes three kinds of dates: * Specific dates of the form [date <month> <day

  16. An advanced web query interface for biological databases

    PubMed Central

    Latendresse, Mario; Karp, Peter D.

    2010-01-01

    Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715

  17. A Query Integrator and Manager for the Query Web

    PubMed Central

    Brinkley, James F.; Detwiler, Landon T.

    2012-01-01

    We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. PMID:22531831

  18. Analytics-Driven Lossless Data Compression for Rapid In-situ Indexing, Storing, and Querying

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jenkins, John; Arkatkar, Isha; Lakshminarasimhan, Sriram

    2013-01-01

    The analysis of scientific simulations is highly data-intensive and is becoming an increasingly important challenge. Peta-scale data sets require the use of light-weight query-driven analysis methods, as opposed to heavy-weight schemes that optimize for speed at the expense of size. This paper is an attempt in the direction of query processing over losslessly compressed scientific data. We propose a co-designed double-precision compression and indexing methodology for range queries by performing unique-value-based binning on the most significant bytes of double precision data (sign, exponent, and most significant mantissa bits), and inverting the resulting metadata to produce an inverted index over a reduced data representation. Without the inverted index, our method matches or improves compression ratios over both general-purpose and floating-point compression utilities. The inverted index is light-weight, and the overall storage requirement for both reduced column and index is less than 135%, whereas existing DBMS technologies can require 200-400%. As a proof-of-concept, we evaluate univariate range queries that additionally return column values, a critical component of data analytics, against state-of-the-art bitmap indexing technology, showing multi-fold query performance improvements.
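
    To make the binning idea tangible, here is a small Python sketch (not the paper's implementation): double-precision values are binned by their most significant bytes, an inverted index maps each bin to row ids, and a range query touches only candidate bins before verifying against the raw values. It assumes non-negative values, since the IEEE-754 sign bit would otherwise invert the byte ordering.

      import struct
      from collections import defaultdict

      def top_bytes(value, k=2):
          """Most significant k bytes of the IEEE-754 big-endian encoding of a double."""
          return struct.pack(">d", value)[:k]

      def build_inverted_index(column, k=2):
          index = defaultdict(list)               # bin key -> row ids
          for row_id, v in enumerate(column):
              index[top_bytes(v, k)].append(row_id)
          return index

      def range_query(column, index, lo, hi, k=2):
          """Return row ids with lo <= value <= hi, touching only candidate bins.
          Assumes non-negative values so byte order matches numeric order."""
          hits = []
          lo_key, hi_key = top_bytes(lo, k), top_bytes(hi, k)
          for key, rows in index.items():
              if lo_key <= key <= hi_key:         # candidate bin; verify against raw values
                  hits.extend(r for r in rows if lo <= column[r] <= hi)
          return sorted(hits)

      data = [0.5, 1.25, 3.75, 2.0, 10.0, 7.5]
      idx = build_inverted_index(data)
      print(range_query(data, idx, 1.0, 8.0))     # -> [1, 2, 3, 5]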

  19. Executing SPARQL Queries over the Web of Linked Data

    NASA Astrophysics Data System (ADS)

    Hartig, Olaf; Bizer, Christian; Freytag, Johann-Christoph

    The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that is not addressed by traditional research on federated query processing. In this paper we present an approach to execute SPARQL queries over the Web of Linked Data. The main idea of our approach is to discover data that might be relevant for answering a query during the query execution itself. This discovery is driven by following RDF links between data sources based on URIs in the query and in partial results. The URIs are resolved over the HTTP protocol into RDF data which is continuously added to the queried dataset. This paper describes concepts and algorithms to implement our approach using an iterator-based pipeline. We introduce a formalization of the pipelining approach and show that classical iterators may cause blocking due to the latency of HTTP requests. To avoid blocking, we propose an extension of the iterator paradigm. The evaluation of our approach shows its strengths as well as the still existing challenges.

  20. The Binding Database: data management and interface design.

    PubMed

    Chen, Xi; Lin, Yuhmei; Liu, Ming; Gilson, Michael K

    2002-01-01

    The large and growing body of experimental data on biomolecular binding is of enormous value in developing a deeper understanding of molecular biology, in developing new therapeutics, and in various molecular design applications. However, most of these data are found only in the published literature and are therefore difficult to access and use. No existing public database has focused on measured binding affinities and has provided query capabilities that include chemical structure and sequence homology searches. We have created Binding DataBase (BindingDB), a public, web-accessible database of measured binding affinities. BindingDB is based upon a relational data specification for describing binding measurements via Isothermal Titration Calorimetry (ITC) and enzyme inhibition. A corresponding XML Document Type Definition (DTD) is used to create and parse intermediate files during the on-line deposition process and will also be used for data interchange, including collection of data from other sources. The on-line query interface, which is constructed with Java Servlet technology, supports standard SQL queries as well as searches for molecules by chemical structure and sequence homology. The on-line deposition interface uses Java Server Pages and JavaBean objects to generate dynamic HTML and to store intermediate results. The resulting data resource provides a range of functionality with brisk response-times, and lends itself well to continued development and enhancement.

  1. Producing approximate answers to database queries

    NASA Technical Reports Server (NTRS)

    Vrbsky, Susan V.; Liu, Jane W. S.

    1993-01-01

    We have designed and implemented a query processor, called APPROXIMATE, that makes approximate answers available if part of the database is unavailable or if there is not enough time to produce an exact answer. The accuracy of the approximate answers produced improves monotonically with the amount of data retrieved to produce the result. The exact answer is produced if all of the needed data are available and query processing is allowed to continue until completion. The monotone query processing algorithm of APPROXIMATE works within the standard relational algebra framework and can be implemented on a relational database system with little change to the relational architecture. We describe here the approximation semantics of APPROXIMATE that serves as the basis for meaningful approximations of both set-valued and single-valued queries. We show how APPROXIMATE is implemented to make effective use of semantic information, provided by an object-oriented view of the database, and describe the additional overhead required by APPROXIMATE.
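
    The monotonicity property can be illustrated with a toy set-valued selection, assuming the relation is split into fragments of which only some are reachable before a deadline; this is an illustrative sketch, not APPROXIMATE's actual interface.

      def approximate_select(fragments, predicate, deadline_fragments):
          """Monotone approximation of a set-valued selection.

          fragments: list of record lists (e.g. horizontal partitions of a relation),
          only some of which may be processed before the deadline.
          """
          answer = set()
          for processed, fragment in enumerate(fragments, start=1):
              answer |= {r for r in fragment if predicate(r)}   # the answer only grows
              if processed >= deadline_fragments:
                  break         # stop early: 'answer' is a certain subset of the exact result
          return answer

      relation = [[("a", 3), ("b", 7)], [("c", 9)], [("d", 1), ("e", 8)]]
      print(approximate_select(relation, lambda r: r[1] > 5, deadline_fragments=2))  # partial answer
      print(approximate_select(relation, lambda r: r[1] > 5, deadline_fragments=3))  # exact answer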

  2. The role of economics in the QUERI program: QUERI Series

    PubMed Central

    Smith, Mark W; Barnett, Paul G

    2008-01-01

    Background The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. Methods We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Results Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Conclusion Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics. PMID:18430199

  3. The role of economics in the QUERI program: QUERI Series.

    PubMed

    Smith, Mark W; Barnett, Paul G

    2008-04-22

    The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics.

  4. A Novel Two-Tier Cooperative Caching Mechanism for the Optimization of Multi-Attribute Periodic Queries in Wireless Sensor Networks

    PubMed Central

    Zhou, ZhangBing; Zhao, Deng; Shu, Lei; Tsang, Kim-Fung

    2015-01-01

    Wireless sensor networks, serving as an important interface between physical environments and computational systems, have been used extensively for supporting domain applications, where multiple-attribute sensory data are queried from the network continuously and periodically. Usually, certain sensory data may not vary significantly within a certain time duration for certain applications. In this setting, sensory data gathered at a certain time slot can be used for answering concurrent queries and may be reused for answering the forthcoming queries when the variation of these data is within a certain threshold. To address this challenge, a popularity-based cooperative caching mechanism is proposed in this article, where the popularity of sensory data is calculated according to the queries issued in recent time slots. This popularity reflects the likelihood that sensory data will be of interest to the forthcoming queries. Generally, sensory data with the highest popularity are cached at the sink node, while sensory data that may not be of interest to the forthcoming queries are cached in the head nodes of divided grid cells. Leveraging these cooperatively cached sensory data, queries are answered through composing these two-tier cached data. Experimental evaluation shows that this approach can reduce the network communication cost significantly and increase the network capability. PMID:26131665
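
    A minimal sketch of the popularity-driven placement, assuming popularity is simply how often an attribute appeared in recent query time slots and that the hottest attributes go to the sink while the rest go to grid-cell head nodes; the capacity threshold and names are illustrative, not the paper's protocol.

      from collections import Counter

      def plan_cache(recent_queries, sink_capacity):
          """Split sensed attributes between the sink cache and grid-head caches
          by popularity, i.e. how often they appeared in recent query time slots."""
          popularity = Counter(attr for q in recent_queries for attr in q)
          ranked = [attr for attr, _ in popularity.most_common()]
          sink_cache = set(ranked[:sink_capacity])       # hottest data cached at the sink
          grid_cache = set(ranked[sink_capacity:])       # the rest cached at grid-cell heads
          return sink_cache, grid_cache

      recent = [("temp", "humidity"), ("temp",), ("light", "temp"), ("humidity",)]
      print(plan_cache(recent, sink_capacity=1))   # ({'temp'}, {'humidity', 'light'})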

  5. a Novel Approach of Indexing and Retrieving Spatial Polygons for Efficient Spatial Region Queries

    NASA Astrophysics Data System (ADS)

    Zhao, J. H.; Wang, X. Z.; Wang, F. Y.; Shen, Z. H.; Zhou, Y. C.; Wang, Y. L.

    2017-10-01

    Spatial region queries are more and more widely used in web-based applications. Mechanisms to provide efficient query processing over geospatial data are essential. However, due to the massive geospatial data volume, heavy geometric computation, and high access concurrency, it is difficult to get a response in real time. Spatial indexes are usually used in this situation. In this paper, based on the k-d tree, we introduce a distributed KD-Tree (DKD-Tree) suitable for polygon data, and a two-step query algorithm. The spatial index construction is recursive and iterative, and the query is an in-memory process. Both the index and query methods can be processed in parallel, and are implemented based on HDFS, Spark and Redis. Experiments on a large volume of remote sensing image metadata have been carried out, and the advantages of our method are investigated by comparing with spatial region queries executed on PostgreSQL and PostGIS. Results show that our approach not only greatly improves the efficiency of spatial region queries, but also has good scalability. Moreover, the two-step spatial range query algorithm can also save cluster resources to support a large number of concurrent queries. Therefore, this method is very useful when building large geographic information systems.
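
    The in-memory k-d-tree range search underlying the two-step query can be sketched in a few lines of Python; the distributed construction on HDFS/Spark/Redis is out of scope here, and the code indexes point payloads (e.g. polygon centroids and ids) purely for illustration.

      class KDNode:
          def __init__(self, point, payload, left=None, right=None):
              self.point, self.payload, self.left, self.right = point, payload, left, right

      def build_kdtree(items, depth=0):
          """items: list of ((x, y), payload) pairs."""
          if not items:
              return None
          axis = depth % 2
          items = sorted(items, key=lambda it: it[0][axis])
          mid = len(items) // 2
          return KDNode(items[mid][0], items[mid][1],
                        build_kdtree(items[:mid], depth + 1),
                        build_kdtree(items[mid + 1:], depth + 1))

      def range_search(node, rect, depth=0, hits=None):
          """Collect payloads whose point lies in rect = (xmin, ymin, xmax, ymax)."""
          if hits is None:
              hits = []
          if node is None:
              return hits
          x, y = node.point
          if rect[0] <= x <= rect[2] and rect[1] <= y <= rect[3]:
              hits.append(node.payload)
          axis = depth % 2
          lo, hi = rect[axis], rect[axis + 2]
          if node.point[axis] >= lo:            # left subtree may still intersect the range
              range_search(node.left, rect, depth + 1, hits)
          if node.point[axis] <= hi:            # right subtree may still intersect the range
              range_search(node.right, rect, depth + 1, hits)
          return hits

      tree = build_kdtree([((1, 1), "A"), ((4, 2), "B"), ((2, 5), "C"), ((6, 6), "D")])
      print(range_search(tree, (0, 0, 4, 5)))   # payloads falling inside the query rectangle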

  6. ASSET Queries: A Set-Oriented and Column-Wise Approach to Modern OLAP

    NASA Astrophysics Data System (ADS)

    Chatziantoniou, Damianos; Sotiropoulos, Yannis

    Modern data analysis has given birth to numerous grouping constructs and programming paradigms, way beyond the traditional group by. Applications such as data warehousing, web log analysis, streams monitoring and social networks understanding necessitated the use of data cubes, grouping variables, windows and MapReduce. In this paper we review the associated set (ASSET) concept and discuss its applicability in both continuous and traditional data settings. Given a set of values B, an associated set over B is just a collection of annotated data multisets, one for each b ∈ B. The goal is to efficiently compute aggregates over these data sets. An ASSET query consists of repeated definitions of associated sets and aggregates of these, possibly correlated, resembling a spreadsheet document. We review systems implementing ASSET queries both in continuous and persistent contexts and argue for associated sets' analytical abilities and optimization opportunities.
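
    In plain Python terms, an associated set over a base set B is one bucket of matching rows per b in B, each reduced by an aggregate; the sketch below only conveys the concept and is not the ASSET engine or its syntax.

      from collections import defaultdict

      def asset_query(base_values, rows, match, aggregate):
          """For each b in B (base_values), build the associated multiset of rows
          selected by match(b, row) and reduce it with aggregate."""
          associated = defaultdict(list)
          for row in rows:
              for b in base_values:
                  if match(b, row):
                      associated[b].append(row)
          return {b: aggregate(associated[b]) for b in base_values}

      # Example: per customer, total amount of their orders (a group-by-like ASSET query).
      orders = [("ann", 10), ("bob", 5), ("ann", 7), ("cal", 3)]
      result = asset_query(
          base_values=["ann", "bob", "cal"],
          rows=orders,
          match=lambda b, row: row[0] == b,
          aggregate=lambda rs: sum(amount for _, amount in rs),
      )
      print(result)   # {'ann': 17, 'bob': 5, 'cal': 3}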

  7. Real-time community detection in full social networks on a laptop

    PubMed Central

    Chamberlain, Benjamin Paul; Levy-Kramer, Josh; Humby, Clive

    2018-01-01

    For a broad range of research and practical applications it is important to understand the allegiances, communities and structure of key players in society. One promising direction towards extracting this information is to exploit the rich relational data in digital social networks (the social graph). As global social networks (e.g., Facebook and Twitter) are very large, most approaches make use of distributed computing systems for this purpose. Distributing graph processing requires solving many difficult engineering problems, which has led some researchers to look at single-machine solutions that are faster and easier to maintain. In this article, we present an approach for analyzing full social networks on a standard laptop, allowing for interactive exploration of the communities in the locality of a set of user-specified query vertices. The key idea is that the aggregate actions of large numbers of users can be compressed into a data structure that encapsulates the edge weights between vertices in a derived graph. Local communities can be constructed by selecting vertices that are connected to the query vertices with high edge weights in the derived graph. This compression is robust to noise and allows for interactive queries of local communities in real-time, which we define to be less than the average human reaction time of 0.25s. We achieve single-machine real-time performance by compressing the neighborhood of each vertex using minhash signatures and facilitate rapid queries through Locality Sensitive Hashing. These techniques reduce query times from hours using industrial desktop machines operating on the full graph to milliseconds on standard laptops. Our method allows exploration of strongly associated regions (i.e., communities) of large graphs in real-time on a laptop. It has been deployed in software that is actively used by social network analysts and offers another channel for media owners to monetize their data, helping them to continue to provide free services that are valued by billions of people globally. PMID:29342158
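
    The core compression step can be sketched as follows: each vertex's neighbourhood is reduced to a minhash signature, and the fraction of matching signature slots estimates the neighbourhood overlap (the edge weight in the derived graph). The real system adds Locality Sensitive Hashing so a query vertex need not be compared against every other vertex; the hash choice and sizes here are illustrative.

      import hashlib

      def minhash_signature(neighbours, num_hashes=64):
          """Compress a vertex neighbourhood (a set of ids) into a fixed-size signature."""
          sig = []
          for i in range(num_hashes):
              sig.append(min(
                  int(hashlib.sha1(f"{i}:{n}".encode()).hexdigest(), 16)
                  for n in neighbours
              ))
          return sig

      def estimated_jaccard(sig_a, sig_b):
          """Fraction of matching signature slots approximates neighbourhood overlap."""
          return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

      graph = {                       # toy follower graph: vertex -> neighbourhood
          "u1": {"a", "b", "c", "d"},
          "u2": {"b", "c", "d", "e"},
          "u3": {"x", "y", "z"},
      }
      sigs = {v: minhash_signature(ns) for v, ns in graph.items()}
      # Vertices strongly associated with the query vertex u1 form its local community.
      print(sorted(((estimated_jaccard(sigs["u1"], sigs[v]), v) for v in ("u2", "u3")), reverse=True))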

  8. Achieve Location Privacy-Preserving Range Query in Vehicular Sensing

    PubMed Central

    Kong, Qinglei; Lu, Rongxing; Ma, Maode; Bao, Haiyong

    2017-01-01

    Modern vehicles are equipped with a plethora of on-board sensors and large on-board storage, which enables them to gather and store various local-relevant data. However, the wide application of vehicular sensing has its own challenges, among which location-privacy preservation and data query accuracy are two critical problems. In this paper, we propose a novel range query scheme, which helps the data requester to accurately retrieve the sensed data from the distributive on-board storage in vehicular ad hoc networks (VANETs) with location privacy preservation. The proposed scheme exploits structured scalars to denote the locations of data requesters and vehicles, and achieves the privacy-preserving location matching with the homomorphic Paillier cryptosystem technique. Detailed security analysis shows that the proposed range query scheme can successfully preserve the location privacy of the involved data requesters and vehicles, and protect the confidentiality of the sensed data. In addition, performance evaluations are conducted to show the efficiency of the proposed scheme, in terms of computation delay and communication overhead. Specifically, the computation delay and communication overhead are not dependent on the length of the scalar, and they are only proportional to the number of vehicles. PMID:28786943
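
    The additive homomorphism that makes the private matching possible can be shown with a toy Paillier instance (tiny, insecure parameters; for illustration only, and not the paper's full matching protocol): the server multiplies ciphertexts, and the product decrypts to the sum of the hidden scalars.

      import math, random

      # Toy Paillier keypair (tiny primes; illustration only, not secure).
      p, q = 293, 433
      n = p * q
      n_sq = n * n
      lam = math.lcm(p - 1, q - 1)
      g = n + 1
      mu = pow(lam, -1, n)            # valid because g = n + 1

      def encrypt(m):
          r = random.randrange(1, n)
          while math.gcd(r, n) != 1:
              r = random.randrange(1, n)
          return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

      def decrypt(c):
          return ((pow(c, lam, n_sq) - 1) // n) * mu % n

      # Additive homomorphism: the product of ciphertexts decrypts to the sum of plaintexts,
      # so a server can combine encrypted location scalars without learning them.
      loc_requester, loc_vehicle = 123, 456
      combined = (encrypt(loc_requester) * encrypt(loc_vehicle)) % n_sq
      print(decrypt(combined))        # 579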

  9. Achieve Location Privacy-Preserving Range Query in Vehicular Sensing.

    PubMed

    Kong, Qinglei; Lu, Rongxing; Ma, Maode; Bao, Haiyong

    2017-08-08

    Modern vehicles are equipped with a plethora of on-board sensors and large on-board storage, which enables them to gather and store various local-relevant data. However, the wide application of vehicular sensing has its own challenges, among which location-privacy preservation and data query accuracy are two critical problems. In this paper, we propose a novel range query scheme, which helps the data requester to accurately retrieve the sensed data from the distributive on-board storage in vehicular ad hoc networks (VANETs) with location privacy preservation. The proposed scheme exploits structured scalars to denote the locations of data requesters and vehicles, and achieves the privacy-preserving location matching with the homomorphic Paillier cryptosystem technique. Detailed security analysis shows that the proposed range query scheme can successfully preserve the location privacy of the involved data requesters and vehicles, and protect the confidentiality of the sensed data. In addition, performance evaluations are conducted to show the efficiency of the proposed scheme, in terms of computation delay and communication overhead. Specifically, the computation delay and communication overhead are not dependent on the length of the scalar, and they are only proportional to the number of vehicles.

  10. SkyQuery - A Prototype Distributed Query and Cross-Matching Web Service for the Virtual Observatory

    NASA Astrophysics Data System (ADS)

    Thakar, A. R.; Budavari, T.; Malik, T.; Szalay, A. S.; Fekete, G.; Nieto-Santisteban, M.; Haridas, V.; Gray, J.

    2002-12-01

    We have developed a prototype distributed query and cross-matching service for the VO community, called SkyQuery, which is implemented with hierarchical Web Services. SkyQuery enables astronomers to run combined queries on existing distributed heterogeneous astronomy archives. SkyQuery provides a simple, user-friendly interface to run distributed queries over the federation of registered astronomical archives in the VO. The SkyQuery client connects to the portal Web Service, which farms the query out to the individual archives, which are also Web Services called SkyNodes. The cross-matching algorithm is run recursively on each SkyNode. Each archive is a relational DBMS with an HTM index for fast spatial lookups. The results of the distributed query are returned as an XML DataSet that is automatically rendered by the client. SkyQuery also returns the image cutout corresponding to the query result. SkyQuery finds not only matches between the various catalogs, but also dropouts - objects that exist in some of the catalogs but not in others. This is often as important as finding matches. We demonstrate the utility of SkyQuery with a brown-dwarf search between SDSS and 2MASS, and a search for radio-quiet quasars in SDSS, 2MASS and FIRST. The importance of a service like SkyQuery for the worldwide astronomical community cannot be overstated: data on the same objects in various archives is mapped in different wavelength ranges and looks very different due to different errors, instrument sensitivities and other peculiarities of each archive. Our cross-matching algorithm performs a fuzzy spatial join across multiple catalogs. This type of cross-matching is currently often done by eye, one object at a time. A static cross-identification table for a set of archives would become obsolete by the time it was built - the exponential growth of astronomical data means that a dynamic cross-identification mechanism like SkyQuery is the only viable option. SkyQuery was funded by a grant from the NASA AISR program.
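
    The fuzzy spatial join at the heart of the cross-match can be sketched naively in memory: pair each object in one catalog with its nearest neighbour in the other if their angular separation is within a tolerance, and report the rest as dropouts. Real SkyNodes do this with HTM-indexed SQL rather than an O(|A|*|B|) scan; the catalogs and tolerance below are made up.

      import math

      def ang_sep_deg(ra1, dec1, ra2, dec2):
          """Angular separation in degrees between two sky positions (RA/Dec in degrees)."""
          ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
          cos_d = (math.sin(dec1) * math.sin(dec2)
                   + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
          return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))

      def cross_match(catalog_a, catalog_b, tol_arcsec=2.0):
          """Naive fuzzy join; returns (matches, dropouts_from_a)."""
          tol = tol_arcsec / 3600.0
          matches, dropouts = [], []
          for ia, (ra_a, dec_a) in catalog_a.items():
              best = min(catalog_b.items(),
                         key=lambda kv: ang_sep_deg(ra_a, dec_a, kv[1][0], kv[1][1]))
              if ang_sep_deg(ra_a, dec_a, best[1][0], best[1][1]) <= tol:
                  matches.append((ia, best[0]))
              else:
                  dropouts.append(ia)        # object present in A but missing from B
          return matches, dropouts

      sdss = {"s1": (10.0000, 20.0000), "s2": (11.0000, 21.0000)}
      twomass = {"t1": (10.0002, 20.0002)}
      print(cross_match(sdss, twomass))   # ([('s1', 't1')], ['s2'])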

  11. Privacy-Preserving Location-Based Services

    ERIC Educational Resources Information Center

    Chow, Chi Yin

    2010-01-01

    Location-based services (LBS for short) providers require users' current locations to answer their location-based queries, e.g., range and nearest-neighbor queries. Revealing personal location information to potentially untrusted service providers could create privacy risks for users. To this end, our objective is to design a privacy-preserving…

  12. Analysis of Information Needs of Users of MEDLINEplus, 2002 – 2003

    PubMed Central

    Scott-Wright, Alicia; Crowell, Jon; Zeng, Qing; Bates, David W.; Greenes, Robert

    2006-01-01

    We analyzed query logs from use of MEDLINEplus to answer the questions: Are consumers’ health information needs stable over time? and To what extent do users’ queries change over time? To determine log stability, we assessed an Overlap Rate (OR) defined as the number of unique queries common to two adjacent months divided by the total number of unique queries in those months. All exactly matching queries were considered as one unique query. We measured ORs for the top 10 and 100 unique queries of a month and compared these to ORs for the following month. Over ten months, users submitted 12,234,737 queries; only 2,179,571 (17.8%) were unique and these had a mean word count of 2.73 (S.D., 0.24); of the 137 unique queries whose exactly matching search term(s) were used at least 5000 times, 121 (88.3%) consisted of only one word. We could predict with 95% confidence that the monthly OR for the top 100 unique queries would lie between 67% and 87% when compared with the top 100 from the previous month. The mean month-to-month OR for top 10 queries was 62% (S.D., 20%), indicating significant variability; the lowest OR, 33% between the top 10 in March and April, was likely due to “new” interest in information about SARS pneumonia in April 2003. Consumers’ health information needs are relatively stable and the 100 most common unique queries are about 77% the same from month to month. Website sponsors should provide a broad range of information about a relatively stable number of topics. Analyses of log similarity may identify media-induced, cyclical, or seasonal changes in areas of consumer interest. PMID:17238431
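
    A worked example makes the Overlap Rate definition concrete; the code below reads "total number of unique queries in those months" as the size of the union of the two months' unique queries, which is an interpretation of the abstract's wording rather than the paper's exact formula.

      def overlap_rate(month_a_queries, month_b_queries):
          """Unique queries common to two adjacent months divided by the total
          number of unique queries in those months (taken here as the union)."""
          a, b = set(month_a_queries), set(month_b_queries)
          return len(a & b) / len(a | b)

      march = ["flu", "diabetes", "sars", "asthma"]
      april = ["flu", "diabetes", "sars", "allergies"]
      print(f"{overlap_rate(march, april):.0%}")   # 60%: 3 shared out of 5 unique queries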

  13. Big Data and Dysmenorrhea: What Questions Do Women and Men Ask About Menstrual Pain?

    PubMed

    Chen, Chen X; Groves, Doyle; Miller, Wendy R; Carpenter, Janet S

    2018-04-30

    Menstrual pain is highly prevalent among women of reproductive age. As the general public increasingly obtains health information online, Big Data from online platforms provide novel sources to understand the public's perspectives and information needs about menstrual pain. The study's purpose was to describe salient queries about dysmenorrhea using Big Data from a question and answer platform. We performed text-mining of 1.9 billion queries from ChaCha, a United States-based question and answer platform. Dysmenorrhea-related queries were identified by using keyword searching. Each relevant query was split into token words (i.e., meaningful words or phrases) and stop words (i.e., not meaningful functional words). Word Adjacency Graph (WAG) modeling was used to detect clusters of queries and visualize the range of dysmenorrhea-related topics. We constructed two WAG models respectively from queries by women of reproductive age and by men. Salient themes were identified through inspecting clusters of WAG models. We identified two subsets of queries: Subset 1 contained 507,327 queries from women aged 13-50 years. Subset 2 contained 113,888 queries from men aged 13 or above. WAG modeling revealed topic clusters for each subset. Between female and male subsets, topic clusters overlapped on dysmenorrhea symptoms and management. Among female queries, there were distinctive topics on approaching menstrual pain at school and menstrual pain-related conditions; while among male queries, there was a distinctive cluster of queries on menstrual pain from the male perspective. Big Data mining of the ChaCha® question and answer service revealed a series of information needs among women and men on menstrual pain. Findings may be useful in structuring the content and informing the delivery platform for educational interventions.

  14. A Layered Searchable Encryption Scheme with Functional Components Independent of Encryption Methods

    PubMed Central

    Luo, Guangchun; Qin, Ke

    2014-01-01

    Searchable encryption technique enables the users to securely store and search their documents over the remote semitrusted server, which is especially suitable for protecting sensitive data in the cloud. However, various settings (based on symmetric or asymmetric encryption) and functionalities (ranked keyword query, range query, phrase query, etc.) are often realized by different methods with different searchable structures that are generally not compatible with each other, which limits the scope of application and hinders the functional extensions. We prove that asymmetric searchable structure could be converted to symmetric structure, and functions could be modeled separately apart from the core searchable structure. Based on this observation, we propose a layered searchable encryption (LSE) scheme, which provides compatibility, flexibility, and security for various settings and functionalities. In this scheme, the outputs of the core searchable component based on either symmetric or asymmetric setting are converted to some uniform mappings, which are then transmitted to loosely coupled functional components to further filter the results. In such a way, all functional components could directly support both symmetric and asymmetric settings. Based on LSE, we propose two representative and novel constructions for ranked keyword query (previously only available in symmetric scheme) and range query (previously only available in asymmetric scheme). PMID:24719565

  15. A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Choudhury, Sutanay; Holder, Larry; Chin, George

    2015-02-02

    Cyber security is one of the most significant technical challenges in current times. Detecting adversarial activities, prevention of theft of intellectual properties and customer data is a high priority for corporations and government agencies around the world. Cyber defenders need to analyze massive-scale, high-resolution network flows to identify, categorize, and mitigate attacks involving networks spanning institutional and national boundaries. Many of the cyber attacks can be described as subgraph patterns, with prominent examples being insider infiltrations (path queries), denial of service (parallel paths) and malicious spreads (tree queries). This motivates us to explore subgraph matching on streaming graphs in a continuous setting. The novelty of our work lies in using the subgraph distributional statistics collected from the streaming graph to determine the query processing strategy. We introduce a "Lazy Search" algorithm where the search strategy is decided on a vertex-to-vertex basis depending on the likelihood of a match in the vertex neighborhood. We also propose a metric named "Relative Selectivity" that is used to select between different query processing strategies. Our experiments performed on real online news, network traffic stream and a synthetic social network benchmark demonstrate 10-100x speedups over selectivity agnostic approaches.
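
    The decision the selectivity metric drives can be caricatured in a few lines: estimate how many edges around a vertex could match the next query edge, and expand partial matches eagerly only where that number is small. The metric and threshold below are toy stand-ins, not the paper's "Relative Selectivity" definition or the Lazy Search algorithm itself.

      def expected_local_matches(neighbourhood_label_counts, label):
          """Toy selectivity estimate: how many edges around this vertex carry the
          label the query pattern asks for next (fewer = more selective)."""
          return neighbourhood_label_counts.get(label, 0)

      def choose_strategy(expected_matches, threshold=5):
          # Expand partial matches eagerly only where few local matches are expected;
          # otherwise stay lazy at this vertex and wait for more evidence.
          return "eager-expand" if expected_matches <= threshold else "lazy-search"

      neighbourhood = {"login": 300, "admin_grant": 1}   # edge labels seen around one vertex
      for label in ("login", "admin_grant"):
          m = expected_local_matches(neighbourhood, label)
          print(label, m, choose_strategy(m))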

  16. A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Choudhury, Sutanay; Holder, Larry; Chin, George

    2015-05-27

    Cyber security is one of the most significant technical challenges in current times. Detecting adversarial activities, prevention of theft of intellectual properties and customer data is a high priority for corporations and government agencies around the world. Cyber defenders need to analyze massive-scale, high-resolution network flows to identify, categorize, and mitigate attacks involving networks spanning institutional and national boundaries. Many of the cyber attacks can be described as subgraph patterns, with prominent examples being insider infiltrations (path queries), denial of service (parallel paths) and malicious spreads (tree queries). This motivates us to explore subgraph matching on streaming graphs in a continuous setting. The novelty of our work lies in using the subgraph distributional statistics collected from the streaming graph to determine the query processing strategy. We introduce a "Lazy Search" algorithm where the search strategy is decided on a vertex-to-vertex basis depending on the likelihood of a match in the vertex neighborhood. We also propose a metric named "Relative Selectivity" that is used to select between different query processing strategies. Our experiments performed on real online news, network traffic stream and a synthetic social network benchmark demonstrate 10-100x speedups over non-incremental, selectivity agnostic approaches.

  17. Adaptive search in mobile peer-to-peer databases

    NASA Technical Reports Server (NTRS)

    Wolfson, Ouri (Inventor); Xu, Bo (Inventor)

    2010-01-01

    Information is stored in a plurality of mobile peers. The peers communicate in a peer to peer fashion, using a short-range wireless network. Occasionally, a peer initiates a search for information in the peer to peer network by issuing a query. Queries and pieces of information, called reports, are transmitted among peers that are within a transmission range. For each search additional peers are utilized, wherein these additional peers search and relay information on behalf of the originator of the search.

  18. Cyclone: java-based querying and computing with Pathway/Genome databases.

    PubMed

    Le Fèvre, François; Smidtas, Serge; Schächter, Vincent

    2007-05-15

    Cyclone aims at facilitating the use of BioCyc, a collection of Pathway/Genome Databases (PGDBs). Cyclone provides a fully extensible Java Object API to analyze and visualize these data. Cyclone can read and write PGDBs, and can write its own data in the CycloneML format. This format is automatically generated from the BioCyc ontology by Cyclone itself, ensuring continued compatibility. Cyclone objects can also be stored in a relational database CycloneDB. Queries can be written in SQL, and in an intuitive and concise object-oriented query language, Hibernate Query Language (HQL). In addition, Cyclone interfaces easily with Java software including the Eclipse IDE for HQL edition, the Jung API for graph algorithms or Cytoscape for graph visualization. Cyclone is freely available under an open source license at: http://sourceforge.net/projects/nemo-cyclone. For download and installation instructions, tutorials, use cases and examples, see http://nemo-cyclone.sourceforge.net.

  19. Performance of Point and Range Queries for In-memory Databases using Radix Trees on GPUs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alam, Maksudul; Yoginath, Srikanth B; Perumalla, Kalyan S

    In in-memory database systems augmented by hardware accelerators, accelerating the index searching operations can greatly increase the runtime performance of database queries. Recently, adaptive radix trees (ART) have been shown to provide very fast index search implementation on the CPU. Here, we focus on an accelerator-based implementation of ART. We present a detailed performance study of our GPU-based adaptive radix tree (GRT) implementation over a variety of key distributions, synthetic benchmarks, and actual keys from music and book data sets. The performance is also compared with other index-searching schemes on the GPU. GRT on modern GPUs achieves some of the highest rates of index searches reported in the literature. For point queries, a throughput of up to 106 million and 130 million lookups per second is achieved for sparse and dense keys, respectively. For range queries, GRT yields 600 million and 1000 million lookups per second for sparse and dense keys, respectively, on a large dataset of 64 million 32-bit keys.
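
    On the CPU side, the index structure can be approximated by a byte-wise trie over fixed-width big-endian keys, which supports both point lookups and in-order range scans; the GPU batching and the adaptive node layouts of ART/GRT are deliberately omitted, so this is a simplified illustration only.

      class RadixTrie:
          """Byte-wise trie over fixed-width big-endian keys (a simplified,
          non-adaptive stand-in for ART/GRT)."""

          def __init__(self, key_bytes=4):
              self.key_bytes = key_bytes
              self.root = {}

          def _bytes(self, key):
              return key.to_bytes(self.key_bytes, "big")

          def insert(self, key, value):
              node = self.root
              for b in self._bytes(key):
                  node = node.setdefault(b, {})
              node["value"] = value

          def lookup(self, key):                      # point query
              node = self.root
              for b in self._bytes(key):
                  node = node.get(b)
                  if node is None:
                      return None
              return node.get("value")

          def range(self, lo, hi):                    # range query via in-order walk
              out = []
              def walk(node, prefix):
                  if "value" in node:
                      key = int.from_bytes(prefix, "big")
                      if lo <= key <= hi:
                          out.append((key, node["value"]))
                  for b in sorted(k for k in node if k != "value"):
                      walk(node[b], prefix + bytes([b]))
              walk(self.root, b"")
              return out

      t = RadixTrie()
      for k in (5, 42, 43, 1000, 70000):
          t.insert(k, f"row-{k}")
      print(t.lookup(42))          # row-42
      print(t.range(40, 2000))     # [(42, 'row-42'), (43, 'row-43'), (1000, 'row-1000')]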

  20. Menopause and big data: Word Adjacency Graph modeling of menopause-related ChaCha data.

    PubMed

    Carpenter, Janet S; Groves, Doyle; Chen, Chen X; Otte, Julie L; Miller, Wendy R

    2017-07-01

    To detect and visualize salient queries about menopause using Big Data from ChaCha. We used Word Adjacency Graph (WAG) modeling to detect clusters and visualize the range of menopause-related topics and their mutual proximity. The subset of relevant queries was fully modeled. We split each query into token words (ie, meaningful words and phrases) and removed stopwords (ie, not meaningful functional words). The remaining words were considered in sequence to build summary tables of words and two and three-word phrases. Phrases occurring at least 10 times were used to build a network graph model that was iteratively refined by observing and removing clusters of unrelated content. We identified two menopause-related subsets of queries by searching for questions containing menopause and menopause-related terms (eg, climacteric, hot flashes, night sweats, hormone replacement). The first contained 263,363 queries from individuals aged 13 and older and the second contained 5,892 queries from women aged 40 to 62 years. In the first set, we identified 12 topic clusters: 6 relevant to menopause and 6 less relevant. In the second set, we identified 15 topic clusters: 11 relevant to menopause and 4 less relevant. Queries about hormones were pervasive within both WAG models. Many of the queries reflected low literacy levels and/or feelings of embarrassment. We modeled menopause-related queries posed by ChaCha users between 2009 and 2012. ChaCha data may be used on its own or in combination with other Big Data sources to identify patient-driven educational needs and create patient-centered interventions.
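
    The tokenize-then-count-adjacent-pairs step of WAG modeling can be sketched directly; the stopword list, regular expression and frequency threshold below are illustrative choices, and the iterative graph clustering and refinement described in the abstract are not shown.

      import re
      from collections import Counter

      STOPWORDS = {"the", "a", "an", "is", "are", "do", "does", "what", "my", "i", "to", "of"}

      def word_adjacency_edges(queries, min_count=2):
          """Count adjacent token pairs across queries after stopword removal;
          frequent pairs become weighted edges of the Word Adjacency Graph."""
          edges = Counter()
          for q in queries:
              tokens = [w for w in re.findall(r"[a-z']+", q.lower()) if w not in STOPWORDS]
              edges.update(zip(tokens, tokens[1:]))
          return {pair: c for pair, c in edges.items() if c >= min_count}

      queries = [
          "what are hot flashes",
          "are hot flashes a sign of menopause",
          "hot flashes at night",
          "hormone replacement therapy risks",
      ]
      print(word_adjacency_edges(queries))   # e.g. {('hot', 'flashes'): 3}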

  1. An efficient compression scheme for bitmap indices

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Kesheng; Otoo, Ekow J.; Shoshani, Arie

    2004-04-13

    When using an out-of-core indexing method to answer a query, it is generally assumed that the I/O cost dominates the overall query response time. Because of this, most research on indexing methods concentrates on reducing the sizes of indices. For bitmap indices, compression has been used for this purpose. However, in most cases, operations on these compressed bitmaps, mostly bitwise logical operations such as AND, OR, and NOT, spend more time in CPU than in I/O. To speed up these operations, a number of specialized bitmap compression schemes have been developed; the best known of which is the byte-aligned bitmap code (BBC). They are usually faster in performing logical operations than the general-purpose compression schemes, but the time spent in CPU still dominates the total query response time. To reduce the query response time, we designed a CPU-friendly scheme named the word-aligned hybrid (WAH) code. In this paper, we prove that the sizes of WAH compressed bitmap indices are about two words per row for a large range of attributes. This size is smaller than typical sizes of commonly used indices, such as a B-tree. Therefore, WAH compressed indices are not only appropriate for low cardinality attributes but also for high cardinality attributes. In the worst case, the time to operate on compressed bitmaps is proportional to the total size of the bitmaps involved. The total size of the bitmaps required to answer a query on one attribute is proportional to the number of hits. These indicate that WAH compressed bitmap indices are optimal. To verify their effectiveness, we generated bitmap indices for four different datasets and measured the response time of many range queries. Tests confirm that sizes of compressed bitmap indices are indeed smaller than B-tree indices, and query processing with WAH compressed indices is much faster than with BBC compressed indices, projection indices and B-tree indices. In addition, we also verified that the average query response time is proportional to the index size. This indicates that the compressed bitmap indices are efficient for very large datasets.
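
    A simplified word-aligned hybrid encoder/decoder shows how the scheme keeps bitmaps both small and word-aligned: a 32-bit literal word carries 31 bitmap bits, while a fill word encodes a run of identical 31-bit groups, so bitwise logical operations can work one word at a time. This is a sketch of the idea, not FastBit's implementation (it omits, for example, run-length overflow handling).

      LITERAL_BITS = 31

      def wah_encode(bits):
          """Encode a list of 0/1 bits into 32-bit WAH-style words.
          Literal word: MSB 0, then 31 payload bits.
          Fill word:    MSB 1, next bit = fill value, low 30 bits = run length in groups."""
          padded = bits + [0] * (-len(bits) % LITERAL_BITS)
          groups = [padded[i:i + LITERAL_BITS] for i in range(0, len(padded), LITERAL_BITS)]
          words = []
          for g in groups:
              value = int("".join(map(str, g)), 2)
              if value in (0, (1 << LITERAL_BITS) - 1):          # all-0 or all-1 group
                  fill_bit = 1 if value else 0
                  if words and (words[-1] >> 31) == 1 and ((words[-1] >> 30) & 1) == fill_bit:
                      words[-1] += 1                              # extend the previous fill run
                      continue
                  words.append((1 << 31) | (fill_bit << 30) | 1)  # new fill word, run = 1
              else:
                  words.append(value)                             # literal word
          return words

      def wah_decode(words, nbits):
          bits = []
          for w in words:
              if w >> 31:                                         # fill word
                  fill_bit, run = (w >> 30) & 1, w & ((1 << 30) - 1)
                  bits.extend([fill_bit] * (LITERAL_BITS * run))
              else:                                               # literal word
                  bits.extend(int(b) for b in format(w, f"0{LITERAL_BITS}b"))
          return bits[:nbits]

      bitmap = [0] * 200 + [1, 0, 1] + [1] * 120
      encoded = wah_encode(bitmap)
      assert wah_decode(encoded, len(bitmap)) == bitmap
      print(len(bitmap), "bits ->", len(encoded), "32-bit words")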

  2. Integrating Syntax, Semantics, and Discourse DARPA Natural Language Understanding Program. Volume 3. Papers

    DTIC Science & Technology

    1989-09-30

    This procedure used PUNDIT's Selectional Pattern Query and Response (SPQR) component [Lang 1988]. SPQR continues the analysis of the ISR while the parsing of the sentence is allowed to proceed; the paper presents SPQR as a means of acquiring domain knowledge (UNISYS, P.O. Box 517, Paoli, PA 19301).

  3. Benchmarking distributed data warehouse solutions for storing genomic variant information

    PubMed Central

    Wiewiórka, Marek S.; Wysakowicz, Dawid P.; Okoniewski, Michał J.

    2017-01-01

    Abstract Genomic-based personalized medicine encompasses storing, analysing and interpreting genomic variants as its central issues. At a time when thousands of patients' sequenced exomes and genomes are becoming available, there is a growing need for efficient database storage and querying. The answer could be the application of modern distributed storage systems and query engines. However, the application of large genomic variant databases to this problem has not yet been sufficiently explored in the literature. To investigate the effectiveness of modern columnar storage [column-oriented Database Management System (DBMS)] and query engines, we have developed a prototypic genomic variant data warehouse, populated with a large volume of generated genomic variant and phenotypic data. Next, we have benchmarked the performance of a number of combinations of distributed storages and query engines on a set of SQL queries that address biological questions essential for both research and medical applications. In addition, a non-distributed, analytical database (MonetDB) has been used as a baseline. Comparison of query execution times confirms that distributed data warehousing solutions outperform classic relational DBMSs. Moreover, pre-aggregation and further denormalization of data, which reduce the number of distributed join operations, significantly improve query performance by several orders of magnitude. Most of the distributed back-ends offer good performance for complex analytical queries, while the Optimized Row Columnar (ORC) format paired with Presto and Parquet with Spark 2 query engines provide, on average, the lowest execution times. Apache Kudu, on the other hand, is the only solution that guarantees sub-second performance for simple genome range queries returning a small subset of data, where low-latency response is expected, while still offering decent performance for running analytical queries. In summary, research and clinical applications that require the storage and analysis of variants from thousands of samples can benefit from the scalability and performance of distributed data warehouse solutions. Database URL: https://github.com/ZSI-Bio/variantsdwh PMID:29220442
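
    As an illustration of the kind of genome range query benchmarked above, the sketch below composes a SQL statement against a hypothetical `variants` table (the `chrom`, `pos`, `sample_id` and `genotype` columns are assumptions, not the paper's schema) and runs it through a generic DB-API connection such as one provided by a Presto/Trino or Spark SQL driver.

    ```python
    def fetch_variants(conn, chrom, start, end):
        """Simple genome range query: all variants on `chrom` with start <= pos <= end.
        `conn` is any PEP 249 (DB-API 2.0) connection; note that the parameter
        placeholder style (%s vs ?) depends on the driver in use."""
        sql = """
            SELECT chrom, pos, sample_id, genotype
            FROM variants
            WHERE chrom = %s AND pos BETWEEN %s AND %s
        """
        cur = conn.cursor()
        cur.execute(sql, (chrom, start, end))
        return cur.fetchall()

    # usage (hypothetical): rows = fetch_variants(conn, "chr17", 41_196_312, 41_277_500)
    ```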

  4. FastBit Reference Manual

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Kesheng

    2007-08-02

    An index in a database system is a data structure that utilizes redundant information about the base data to speed up common searching and retrieval operations. Most commonly used indexes are variants of B-trees, such as B+-tree and B*-tree. FastBit implements a set of alternative indexes called compressed bitmap indexes. Compared with B-tree variants, these indexes provide very efficient searching and retrieval operations by sacrificing the efficiency of updating the indexes after the modification of an individual record. In addition to the well-known strengths of bitmap indexes, FastBit has a special strength stemming from the bitmap compression scheme used. The compression method is called the Word-Aligned Hybrid (WAH) code. It reduces the bitmap indexes to reasonable sizes and at the same time allows very efficient bitwise logical operations directly on the compressed bitmaps. Compared with the well-known compression methods such as LZ77 and Byte-aligned Bitmap code (BBC), WAH sacrifices some space efficiency for a significant improvement in operational efficiency. Since the bitwise logical operations are the most important operations needed to answer queries, using WAH compression has been shown to answer queries significantly faster than using other compression schemes. Theoretical analyses showed that WAH compressed bitmap indexes are optimal for one-dimensional range queries. Only the most efficient indexing schemes such as B+-tree and B*-tree have this optimality property. However, bitmap indexes are superior because they can efficiently answer multi-dimensional range queries by combining the answers to one-dimensional queries.
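
    A simplified sketch of the word-aligned idea behind WAH, under the usual 32-bit convention of 31-bit literal groups and fill words for runs of identical groups; FastBit's actual on-disk encoding packs these into machine words and is more involved than this toy version.

    ```python
    def wah_encode(bits, word_size=32):
        """Toy word-aligned run-length encoder: groups of (word_size - 1) bits become
        literal words; consecutive all-zero or all-one groups collapse into fill words."""
        group = word_size - 1
        padded = bits + [0] * (-len(bits) % group)        # pad to a whole number of groups
        words = []
        for i in range(0, len(padded), group):
            chunk = padded[i:i + group]
            value = int("".join(map(str, chunk)), 2)
            fill_bit = 1 if value == (1 << group) - 1 else 0 if value == 0 else None
            if fill_bit is not None and words and words[-1][:2] == ("fill", fill_bit):
                words[-1] = ("fill", fill_bit, words[-1][2] + 1)   # extend the current run
            elif fill_bit is not None:
                words.append(("fill", fill_bit, 1))
            else:
                words.append(("literal", value))
        return words

    print(wah_encode([1] * 62 + [0, 1, 0] + [0] * 28))
    # [('fill', 1, 2), ('literal', 536870912)]
    ```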

  5. Using web search query data to monitor dengue epidemics: a new model for neglected tropical disease surveillance.

    PubMed

    Chan, Emily H; Sahai, Vikram; Conrad, Corrie; Brownstein, John S

    2011-05-01

    A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003-2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to assist with traditional dengue surveillance.
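
    A minimal sketch of the kind of univariate linear fit described above, assuming two aligned weekly time series (the numbers below are made up); the study's models also handled query selection, spike removal, and hold-out cross-validation, which are omitted here.

    ```python
    import numpy as np

    # hypothetical aligned weekly series: Google query fraction and official case counts
    query_fraction = np.array([0.8, 1.1, 1.9, 2.7, 3.5, 2.9, 1.6, 1.0])
    case_counts    = np.array([120, 180, 310, 450, 600, 470, 260, 150])

    slope, intercept = np.polyfit(query_fraction, case_counts, deg=1)   # univariate linear model
    predicted = slope * query_fraction + intercept
    validation_corr = np.corrcoef(predicted, case_counts)[0, 1]
    print(f"fit: cases ~ {slope:.1f} * queries + {intercept:.1f}, r = {validation_corr:.2f}")
    ```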

  6. Quality Management Tools: Facilitating Clinical Research Data Integrity by Utilizing Specialized Reports with Electronic Case Report Forms

    PubMed Central

    Trocky, NM; Fontinha, M

    2005-01-01

    Data collected throughout the course of a clinical research trial must be reviewed for accuracy and completeness continually. The Oracle Clinical® (OC) data management application utilized to capture clinical data facilitates data integrity through pre-programmed validations, edit and range checks, and discrepancy management modules. These functions alone were not enough. Coupled with the use of specially created reports in Oracle Discoverer® and Integrated Review™, both ad-hoc query and reporting tools, research staff have enhanced their ability to clean, analyze and report more accurate data captured within and among electronic case report forms (eCRFs) by individual study or across multiple studies. PMID:16779428

  7. Knowledge and Theme Discovery across Very Large Biological Data Sets Using Distributed Queries: A Prototype Combining Unstructured and Structured Data

    PubMed Central

    Repetski, Stephen; Venkataraman, Girish; Che, Anney; Luke, Brian T.; Girard, F. Pascal; Stephens, Robert M.

    2013-01-01

    As the discipline of biomedical science continues to apply new technologies capable of producing unprecedented volumes of noisy and complex biological data, it has become evident that available methods for deriving meaningful information from such data are simply not keeping pace. In order to achieve useful results, researchers require methods that consolidate, store and query combinations of structured and unstructured data sets efficiently and effectively. As we move towards personalized medicine, the need to combine unstructured data, such as medical literature, with large amounts of highly structured and high-throughput data such as human variation or expression data from very large cohorts, is especially urgent. For our study, we investigated a likely biomedical query using the Hadoop framework. We ran queries using native MapReduce tools we developed as well as other open source and proprietary tools. Our results suggest that the available technologies within the Big Data domain can reduce the time and effort needed to utilize and apply distributed queries over large datasets in practical clinical applications in the life sciences domain. The methodologies and technologies discussed in this paper set the stage for a more detailed evaluation that investigates how various data structures and data models are best mapped to the proper computational framework. PMID:24312478
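
    To make the distributed-query idea concrete, here is a tiny in-memory imitation of the map/reduce pattern the prototype builds on: a mapper emits key-value pairs per record and a reducer aggregates per key. The record fields (`gene`, `variant_id`) are hypothetical, and a real Hadoop job would distribute the same two functions across nodes rather than run them in one process.

    ```python
    from collections import defaultdict

    records = [  # hypothetical structured rows to be combined with unstructured annotations
        {"gene": "TP53", "variant_id": "v1"},
        {"gene": "TP53", "variant_id": "v2"},
        {"gene": "EGFR", "variant_id": "v3"},
    ]

    def mapper(record):
        yield record["gene"], 1               # emit one count per variant observed in a gene

    def reducer(key, values):
        return key, sum(values)               # aggregate counts per gene

    # "shuffle" phase: group mapper output by key, then reduce
    groups = defaultdict(list)
    for rec in records:
        for k, v in mapper(rec):
            groups[k].append(v)
    print(dict(reducer(k, vs) for k, vs in groups.items()))   # {'TP53': 2, 'EGFR': 1}
    ```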

  8. Knowledge and theme discovery across very large biological data sets using distributed queries: a prototype combining unstructured and structured data.

    PubMed

    Mudunuri, Uma S; Khouja, Mohamad; Repetski, Stephen; Venkataraman, Girish; Che, Anney; Luke, Brian T; Girard, F Pascal; Stephens, Robert M

    2013-01-01

    As the discipline of biomedical science continues to apply new technologies capable of producing unprecedented volumes of noisy and complex biological data, it has become evident that available methods for deriving meaningful information from such data are simply not keeping pace. In order to achieve useful results, researchers require methods that consolidate, store and query combinations of structured and unstructured data sets efficiently and effectively. As we move towards personalized medicine, the need to combine unstructured data, such as medical literature, with large amounts of highly structured and high-throughput data such as human variation or expression data from very large cohorts, is especially urgent. For our study, we investigated a likely biomedical query using the Hadoop framework. We ran queries using native MapReduce tools we developed as well as other open source and proprietary tools. Our results suggest that the available technologies within the Big Data domain can reduce the time and effort needed to utilize and apply distributed queries over large datasets in practical clinical applications in the life sciences domain. The methodologies and technologies discussed in this paper set the stage for a more detailed evaluation that investigates how various data structures and data models are best mapped to the proper computational framework.

  9. Fast and Exact Continuous Collision Detection with Bernstein Sign Classification

    PubMed Central

    Tang, Min; Tong, Ruofeng; Wang, Zhendong; Manocha, Dinesh

    2014-01-01

    We present fast algorithms to perform accurate CCD queries between triangulated models. Our formulation uses properties of the Bernstein basis and Bézier curves and reduces the problem to evaluating signs of polynomials. We present a geometrically exact CCD algorithm based on the exact geometric computation paradigm to perform reliable Boolean collision queries. Our algorithm is more than an order of magnitude faster than prior exact algorithms. We evaluate its performance for cloth and FEM simulations on CPUs and GPUs, and highlight the benefits. PMID:25568589
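
    The sign-classification idea can be sketched for a univariate polynomial on [0, 1]: after converting to the Bernstein basis, a coefficient vector that is entirely positive (or entirely negative) certifies the sign of the polynomial on the whole interval, because Bernstein basis functions are nonnegative and sum to one. This shows only the basic principle; the paper applies it to the specific multivariate CCD polynomials with exact arithmetic and subdivision.

    ```python
    from math import comb

    def bernstein_coeffs(a):
        """Convert monomial coefficients a[k] of p(t) = sum a[k] t**k (degree n = len(a) - 1)
        to Bernstein coefficients on [0, 1]."""
        n = len(a) - 1
        return [sum(comb(j, k) / comb(n, k) * a[k] for k in range(j + 1)) for j in range(n + 1)]

    def sign_on_unit_interval(a):
        """Conservative sign test: p(t) on [0, 1] lies between the smallest and
        largest Bernstein coefficient."""
        b = bernstein_coeffs(a)
        if min(b) > 0:
            return +1
        if max(b) < 0:
            return -1
        return 0   # inconclusive; exact methods subdivide the interval

    print(sign_on_unit_interval([1, 0, 1]))   # +1, since t**2 + 1 > 0 on [0, 1]
    ```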

  10. Web search queries can predict stock market volumes.

    PubMed

    Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar

    2012-01-01

    We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enables us to also investigate user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.
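
    A minimal sketch of the lag analysis behind the claim that query volumes anticipate trading peaks, assuming two aligned daily series (synthetic here): correlate query volume at day t-lag with trading volume at day t and compare lags.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    queries = rng.poisson(100, 300).astype(float)                    # synthetic daily query volume
    volumes = np.roll(queries, 1) * 50 + rng.normal(0, 100, 300)     # trading volume echoes queries one day later

    def lagged_corr(q, v, lag):
        """Correlation between query volume at day t - lag and trading volume at day t."""
        return np.corrcoef(q[:-lag], v[lag:])[0, 1] if lag else np.corrcoef(q, v)[0, 1]

    for lag in range(3):
        print(lag, round(lagged_corr(queries, volumes, lag), 2))     # the lag-1 correlation dominates
    ```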

  11. Web Search Queries Can Predict Stock Market Volumes

    PubMed Central

    Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar

    2012-01-01

    We live in a computerized and networked society where many of our actions leave a digital trace and affect other people’s actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enables us to also investigate user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www. PMID:22829871

  12. Brief Report: Autistic Features in Children and Adolescents with Gender Dysphoria.

    PubMed

    Skagerberg, Elin; Di Ceglie, Domenico; Carmichael, Polly

    2015-08-01

    This paper looks at the association between gender dysphoria (GD), scores on the Social Responsiveness Scale (SRS), and reported diagnoses of autism spectrum disorder (ASD). Parents of 166 young people presenting with GD (Mean age = 14.26, SD = 2.68) completed the SRS. Information concerning an ASD diagnosis was also extracted from the patient files. 45.8% fell within the normal range on the SRS and of those 2.8% had an ASD diagnosis. 27.1% fell within the mild/moderate range and of those 15.6% had an ASD diagnosis and 6.7% an ASD query. 27.1% fell within the severe range and of those 24.4% had an ASD diagnosis and 26.7% an ASD query. No difference was found in autistic features between the natal females and males.

  13. BioMon: A Google Earth Based Continuous Biomass Monitoring System (Demo Paper)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vatsavai, Raju

    2009-01-01

    We demonstrate a novel Google Earth-based visualization system for continuous monitoring of biomass at regional and global scales. This system is integrated with a back-end spatiotemporal data mining system that continuously detects changes using high temporal resolution MODIS images. In addition to the visualization, we demonstrate novel query features of the system that provide insights into the current conditions of the landscape.

  14. Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance

    PubMed Central

    Chan, Emily H.; Sahai, Vikram; Conrad, Corrie; Brownstein, John S.

    2011-01-01

    Background A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Methodology/Principal Findings Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003–2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Conclusions/Significance Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to assist with traditional dengue surveillance. PMID:21647308

  15. An alternative database approach for management of SNOMED CT and improved patient data queries.

    PubMed

    Campbell, W Scott; Pedersen, Jay; McClay, James C; Rao, Praveen; Bastola, Dhundy; Campbell, James R

    2015-10-01

    SNOMED CT is the international lingua franca of terminologies for human health. Based on Description Logics (DL), the terminology enables data queries that incorporate inferences between data elements, as well as those relationships that are explicitly stated. However, the ontologic and polyhierarchical nature of the SNOMED CT concept model makes it difficult to implement in its entirety within electronic health record systems that largely employ object-oriented or relational database architectures. The result is a reduction of data richness, limitations of query capability and increased systems overhead. The hypothesis of this research was that a graph database (graph DB) architecture using SNOMED CT as the basis for the data model, and subsequently modeling patient data upon the semantic core of SNOMED CT, could exploit the full value of the terminology to enrich and support advanced querying of patient data sets. The hypothesis was tested by instantiating a graph DB with the fully classified SNOMED CT concept model. The graph DB instance was tested for integrity by calculating the transitive closure table for the SNOMED CT hierarchy and comparing the results with transitive closure tables created using current, validated methods. The graph DB was then populated with 461,171 anonymized patient record fragments and over 2.1 million associated SNOMED CT clinical findings. Queries, including concept negation and disjunction, were then run against the graph database and an enterprise Oracle relational database (RDBMS) of the same patient data sets. The graph DB was then populated with laboratory data encoded using LOINC, as well as medication data encoded with RxNorm, and complex queries were performed using LOINC, RxNorm and SNOMED CT to identify uniquely described patient populations. A graph database instance was successfully created for two international releases of SNOMED CT and two US SNOMED CT editions. Transitive closure tables and descriptive statistics generated using the graph database were identical to those using validated methods. Patient queries produced identical patient count results to the Oracle RDBMS with comparable times. Database queries involving defining attributes of SNOMED CT concepts were possible with the graph DB. The same queries could not be directly performed with the Oracle RDBMS representation of the patient data and required the creation and use of external terminology services. Further, queries of undefined depth were successful in identifying unknown relationships between patient cohorts. The results of this study supported the hypothesis that a patient database built upon and around the semantic model of SNOMED CT was possible. The model supported queries that leveraged all aspects of the SNOMED CT logical model to produce clinically relevant query results. Logical disjunction and negation queries were possible using the data model, as well as queries that extended beyond the structural IS_A hierarchy of SNOMED CT to include queries that employed defining attribute-values of SNOMED CT concepts as search parameters. As medical terminologies, such as SNOMED CT, continue to expand, they will become more complex and model consistency will be more difficult to assure. Simultaneously, consumers of data will increasingly demand improvements to query functionality to accommodate additional granularity of clinical concepts without sacrificing speed. This new line of research provides an alternative approach to instantiating and querying patient data represented using advanced computable clinical terminologies. Copyright © 2015 Elsevier Inc. All rights reserved.
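
    The subsumption queries described above boil down to reachability in the IS_A graph. A toy sketch with networkx, using made-up concept identifiers rather than real SNOMED CT codes: ancestors of a concept are the nodes reachable along child-to-parent edges, and a cohort query keeps patients whose recorded finding is subsumed by the query concept.

    ```python
    import networkx as nx

    # child -> parent IS_A edges; identifiers are placeholders, not real SNOMED CT codes
    isa = nx.DiGraph([
        ("concept:viral_pneumonia", "concept:pneumonia"),
        ("concept:pneumonia", "concept:lung_disease"),
        ("concept:lung_disease", "concept:clinical_finding"),
    ])

    def ancestors(concept):
        """All concepts that subsume `concept` (transitive closure along IS_A)."""
        return nx.descendants(isa, concept)      # reachable nodes in the child->parent graph

    patients = {"p1": "concept:viral_pneumonia", "p2": "concept:lung_disease"}

    def cohort(query_concept):
        """Patients whose finding equals or is subsumed by `query_concept`."""
        return [p for p, c in patients.items() if c == query_concept or query_concept in ancestors(c)]

    print(cohort("concept:pneumonia"))   # ['p1'] -- viral pneumonia IS_A pneumonia
    ```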

  16. A quasi-current representation for information needs inspired by Two-State Vector Formalism

    NASA Astrophysics Data System (ADS)

    Wang, Panpan; Hou, Yuexian; Li, Jingfei; Zhang, Yazhou; Song, Dawei; Li, Wenjie

    2017-09-01

    Recently, a number of quantum theory (QT)-based information retrieval (IR) models have been proposed for modeling the session search task, in which users issue queries continuously to describe their evolving information needs (IN). However, the standard formalism of QT cannot provide a complete description of users' current IN, in the sense that it does not take 'future' information into consideration. Therefore, to obtain a more complete representation of users' IN, we construct a representation of the quasi-current IN inspired by the emerging Two-State Vector Formalism (TSVF). Motivated by the completeness of TSVF, a "two-state vector" derived from the 'future' (the current query) and the 'history' (the previous query) is employed to describe users' quasi-current IN in a more complete way. Extensive experiments are conducted on the session tracks of TREC 2013 & 2014, and show that our model outperforms a series of compared IR models.
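
    Purely as an illustration of the 'history plus future' intuition, and not the paper's TSVF-based model, one can blend term weights from the previous and current query into a single representation of the evolving need; the weights below are arbitrary placeholders.

    ```python
    from collections import Counter

    def combined_need(prev_query, curr_query, w_hist=0.4, w_future=0.6):
        """Blend term weights from the 'history' (previous query) and the
        'future' (current query) into one weighted term vector."""
        hist, future = Counter(prev_query.lower().split()), Counter(curr_query.lower().split())
        terms = set(hist) | set(future)
        return {t: w_hist * hist[t] + w_future * future[t] for t in terms}

    print(combined_need("cheap flights paris", "paris hotels near louvre"))
    ```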

  17. An incremental database access method for autonomous interoperable databases

    NASA Technical Reports Server (NTRS)

    Roussopoulos, Nicholas; Sellis, Timos

    1994-01-01

    We investigated a number of design and performance issues of interoperable database management systems (DBMS's). The major results of our investigation were obtained in the areas of client-server database architectures for heterogeneous DBMS's, incremental computation models, buffer management techniques, and query optimization. We finished a prototype of an advanced client-server workstation-based DBMS which allows access to multiple heterogeneous commercial DBMS's. Experiments and simulations were then run to compare its performance with the standard client-server architectures. The focus of this research was on adaptive optimization methods for heterogeneous database systems. Adaptive buffer management accounts for the random and object-oriented access methods for which no known characterization of the access patterns exists. Adaptive query optimization means that value distributions and selectivities, which play the most significant role in query plan evaluation, are continuously refined to reflect the actual values as opposed to static ones that are computed off-line. Query feedback is a concept that was first introduced to the literature by our group. We employed query feedback for both adaptive buffer management and for computing value distributions and selectivities. For adaptive buffer management, we use the page faults of prior executions to achieve more 'informed' management decisions. For the estimation of the distributions of the selectivities, we use curve-fitting techniques, such as least squares and splines, for regressing on these values.
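
    A minimal sketch of the query-feedback idea for selectivity estimation: observed (predicate value, selectivity) pairs from prior executions are fit by least squares, and the fitted curve replaces a static estimate. The feedback points below are made up, and the spline regression mentioned in the abstract is omitted.

    ```python
    import numpy as np

    # query feedback collected at run time: (predicate constant, observed selectivity)
    feedback = [(10, 0.05), (20, 0.12), (40, 0.33), (80, 0.71), (100, 0.88)]
    x, y = map(np.array, zip(*feedback))

    coeffs = np.polyfit(x, y, deg=2)                              # least-squares polynomial regression
    estimate = float(np.clip(np.polyval(coeffs, 60), 0.0, 1.0))   # refined estimate for constant 60
    print(f"estimated selectivity at 60: {estimate:.2f}")
    ```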

  18. Virtual Observatory Interfaces to the Chandra Data Archive

    NASA Astrophysics Data System (ADS)

    Tibbetts, M.; Harbo, P.; Van Stone, D.; Zografou, P.

    2014-05-01

    The Chandra Data Archive (CDA) plays a central role in the operation of the Chandra X-ray Center (CXC) by providing access to Chandra data. Proprietary interfaces have been the backbone of the CDA throughout the Chandra mission. While these interfaces continue to provide the depth and breadth of mission specific access Chandra users expect, the CXC has been adding Virtual Observatory (VO) interfaces to the Chandra proposal catalog and observation catalog. VO interfaces provide standards-based access to Chandra data through simple positional queries or more complex queries using the Astronomical Data Query Language. Recent development at the CDA has generalized our existing VO services to create a suite of services that can be configured to provide VO interfaces to any dataset. This approach uses a thin web service layer for the individual VO interfaces, a middle-tier query component which is shared among the VO interfaces for parsing, scheduling, and executing queries, and existing web services for file and data access. The CXC VO services provide Simple Cone Search (SCS), Simple Image Access (SIA), and Table Access Protocol (TAP) implementations for both the Chandra proposal and observation catalogs within the existing archive architecture. Our work with the Chandra proposal and observation catalogs, as well as additional datasets beyond the CDA, illustrates how we can provide configurable VO services to extend core archive functionality.
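
    As an example of the kind of positional query these services accept, a Simple Cone Search is just an HTTP GET with RA, DEC and SR parameters (in degrees) that returns a VOTable. The endpoint URL below is a placeholder, not the actual CXC service address.

    ```python
    import requests

    # hypothetical SCS endpoint; the real CXC service URL will differ
    url = "https://cda.example.org/scs/obscat"
    params = {"RA": 83.633, "DEC": 22.014, "SR": 0.2}   # cone centre and radius in degrees

    resp = requests.get(url, params=params, timeout=30)
    resp.raise_for_status()
    votable_xml = resp.text   # SCS responses are VOTable XML documents
    ```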

  19. SeqWare Query Engine: storing and searching sequence data in the cloud.

    PubMed

    O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F

    2010-12-21

    Since the introduction of next-generation DNA sequencers, the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever-increasing demands. In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets.
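
    The genomic range queries served by an HBase-style backend reduce to a scan over a sorted row-key space. Here is a toy in-memory sketch of that pattern; the key layout ("chrom:zero-padded position") and the positions are assumptions for illustration, not SeqWare's actual schema.

    ```python
    import bisect

    # row keys shaped "chrom:position", zero-padded so lexicographic order matches genomic order
    rows = sorted(f"chr7:{pos:09d}" for pos in (55_086_725, 55_249_071, 55_259_515, 140_453_136))

    def range_scan(keys, chrom, start, end):
        """Return all row keys for variants on `chrom` with start <= position <= end."""
        lo = bisect.bisect_left(keys, f"{chrom}:{start:09d}")
        hi = bisect.bisect_right(keys, f"{chrom}:{end:09d}")
        return keys[lo:hi]

    print(range_scan(rows, "chr7", 55_000_000, 55_300_000))   # the three keys in the 55 Mb region
    ```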

  20. SeqWare Query Engine: storing and searching sequence data in the cloud

    PubMed Central

    2010-01-01

    Background Since the introduction of next-generation DNA sequencers, the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever-increasing demands. Results In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets. PMID:21210981

  1. Evaluation of the Feasibility of Screening Patients for Early Signs of Lung Carcinoma in Web Search Logs.

    PubMed

    White, Ryen W; Horvitz, Eric

    2017-03-01

    We built a statistical model that predicts the appearance of strong evidence of a lung carcinoma diagnosis via analysis of large-scale anonymized logs of web search queries from millions of people across the United States. The aim was to evaluate the feasibility of screening patients at risk of lung carcinoma via analysis of signals from online search activity. We identified people who issued special queries that provide strong evidence of a recent diagnosis of lung carcinoma. We then considered patterns of symptoms expressed as searches about concerning symptoms over several months prior to the appearance of the landmark web queries. We built statistical classifiers that predict the future appearance of landmark queries based on the search log signals. This was a retrospective log analysis of the online activity of millions of web searchers seeking health-related information online. Of web searchers who queried for symptoms related to lung carcinoma, some (n = 5443 of 4 813 985) later issued queries that provide strong evidence of recent clinical diagnosis of lung carcinoma and are regarded as positive cases in our analysis. Additional evidence on the reliability of these queries as representing clinical diagnoses is based on the significant increase in follow-on searches for treatments and medications for these searchers and on the correlation between lung carcinoma incidence rates and our log-based statistics. The remaining symptom searchers (n = 4 808 542) are regarded as negative cases. We measured the performance of the statistical model for early detection from online search behavior, for different lead times, different sets of signals, and different cohorts of searchers stratified by potential risk. The statistical classifier predicting the future appearance of landmark web queries based on search log signals identified searchers who later input queries consistent with a lung carcinoma diagnosis, with a true-positive rate ranging from 3% to 57% for false-positive rates ranging from 0.00001 to 0.001, respectively. The methods can be used to identify people at highest risk up to a year in advance of the inferred diagnosis time. The 5 factors associated with the highest relative risk (RR) were evidence of family history (RR = 7.548; 95% CI, 3.937-14.470), age (RR = 3.558; 95% CI, 3.357-3.772), radon (RR = 2.529; 95% CI, 1.137-5.624), primary location (RR = 2.463; 95% CI, 1.364-4.446), and occupation (RR = 1.969; 95% CI, 1.143-3.391). Evidence of smoking (RR = 1.646; 95% CI, 1.032-2.260) was important but not top-ranked, which was due to the difficulty of identifying smoking history from search terms. Pattern recognition based on data drawn from large-scale web search queries holds promise for identifying risk factors and frames new directions for the early detection of lung carcinoma.
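
    The relative-risk figures quoted above follow the standard 2x2 calculation; below is a small sketch with made-up counts (not the study's data) showing the RR and its 95% confidence interval.

    ```python
    from math import exp, log, sqrt

    def relative_risk(a, b, c, d):
        """RR of the exposed group (a cases, b non-cases) vs the unexposed group
        (c cases, d non-cases), with a 95% CI from the usual log-RR standard error."""
        rr = (a / (a + b)) / (c / (c + d))
        se = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
        return rr, exp(log(rr) - 1.96 * se), exp(log(rr) + 1.96 * se)

    print(relative_risk(40, 460, 100, 9400))   # hypothetical counts -> RR about 7.6 (5.3-10.9)
    ```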

  2. The Geodetic Seamless Archive Centers Service Layer: A System Architecture for Federating Geodesy Data Repositories

    NASA Astrophysics Data System (ADS)

    McWhirter, J.; Boler, F. M.; Bock, Y.; Jamason, P.; Squibb, M. B.; Noll, C. E.; Blewitt, G.; Kreemer, C. W.

    2010-12-01

    Three geodesy Archive Centers, Scripps Orbit and Permanent Array Center (SOPAC), NASA's Crustal Dynamics Data Information System (CDDIS) and UNAVCO are engaged in a joint effort to define and develop a common Web Service Application Programming Interface (API) for accessing geodetic data holdings. This effort is funded by the NASA ROSES ACCESS Program to modernize the original GPS Seamless Archive Centers (GSAC) technology which was developed in the 1990s. A new web service interface, the GSAC-WS, is being developed to provide uniform and expanded mechanisms through which users can access our data repositories. In total, our respective archives hold tens of millions of files and contain a rich collection of site/station metadata. Though we serve similar user communities, we currently provide a range of different access methods, query services and metadata formats. This leads to a lack of consistency in the user's experience and a duplication of engineering efforts. The GSAC-WS API and its reference implementation in an underlying Java-based GSAC Service Layer (GSL) support metadata and data queries into site/station oriented data archives. The general nature of this API makes it applicable to a broad range of data systems. The overall goals of this project include providing consistent and rich query interfaces for end users and client programs, developing enabling technology that helps third-party repositories build these web service capabilities, and enabling data queries across a collection of federated GSAC-WS enabled repositories. A fundamental challenge faced in this project is to provide a common suite of query services across a heterogeneous collection of data while enabling each repository to expose its specific metadata holdings. To address this challenge we are developing a "capabilities"-based service where a repository can describe its specific query and metadata capabilities. Furthermore, the architecture of the GSL is based on a model-view paradigm that decouples the underlying data model semantics from particular representations of the data model. This will allow GSAC-WS enabled repositories to evolve their service offerings to incorporate new metadata definition formats (e.g., ISO-19115, FGDC, JSON, etc.) and new techniques for accessing their holdings. Building on the core GSAC-WS implementations, the project is also developing a federated/distributed query service. This service will seamlessly integrate with the GSAC Service Layer and will support data and metadata queries across a collection of federated GSAC repositories.
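
    A sketch of what a client-side call against such a site/station search service might look like: an HTTP GET with bounding-box and output-format parameters. The endpoint URL and every parameter name below are assumptions for illustration, not the published GSAC-WS API.

    ```python
    import requests

    # hypothetical GSAC-style site-search endpoint and parameter names
    url = "https://gsac.example.org/gsacws/gsacapi/site/search"
    params = {
        "output": "site.json",       # requested metadata representation
        "site.name": "P1*",          # wildcard station-name filter
        "bbox.north": 34.5, "bbox.south": 32.0,
        "bbox.east": -115.0, "bbox.west": -118.0,
    }
    resp = requests.get(url, params=params, timeout=30)
    resp.raise_for_status()
    sites = resp.json()
    ```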

  3. BP Spill Sampling and Monitoring Data

    EPA Pesticide Factsheets

    This dataset analyzes waste from the British Petroleum Deepwater Horizon Rig Explosion Emergency Response, providing the opportunity to query data sets by metadata criteria and find resulting raw datasets in CSV format. The data query tool allows users to download EPA's air, water and sediment sampling and monitoring data that has been collected in response to the BP oil spill. All sampling and monitoring data that has been collected to date is available for download as raw structured data. The query tool enables CSV file creation to be refined based on the following search criteria: date range (between April 28, 2010 and September 29, 2010); location by zip, city, or county; media (solid waste, weathered oil, air, surface water, liquid waste, tar, sediment, water); substance categories (based on media selection) and substances (based on substance category selection).

  4. Real-Time Earthquake Monitoring with Spatio-Temporal Fields

    NASA Astrophysics Data System (ADS)

    Whittier, J. C.; Nittel, S.; Subasinghe, I.

    2017-10-01

    With live streaming sensors and sensor networks, increasingly large numbers of individual sensors are deployed in physical space. Sensor data streams are a fundamentally novel mechanism to deliver observations to information systems. They enable us to represent spatio-temporal continuous phenomena such as radiation accidents, toxic plumes, or earthquakes almost as instantaneously as they happen in the real world. Sensor data streams discretely sample an earthquake, while the earthquake is continuous over space and time. Programmers attempting to analyze earthquake activity and scope must write tedious application code to integrate potentially very large sets of asynchronously sampled, concurrent streams. In previous work, we proposed the field stream data model (Liang et al., 2016) for data stream engines. Abstracting the stream of an individual sensor as a temporal field, the field represents the Earth's movement at the sensor position as continuous. This simplifies analysis across many sensors significantly. In this paper, we undertake a feasibility study of using the field stream model and the open source Data Stream Engine (DSE) Apache Spark (Apache Spark, 2017) to implement real-time earthquake event detection with a subset of the 250 GPS sensor data streams of the Southern California Integrated GPS Network (SCIGN). The field-based real-time stream queries compute maximum displacement values over the latest query window of each stream, and relate spatially neighboring streams to identify earthquake events and their extent. Further, we correlated the detected events with a USGS earthquake event feed. The query results are visualized in real time.
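
    A minimal, non-distributed sketch of the windowed field query described above: keep the last few seconds of samples per GPS stream and report the maximum displacement in that window. The sensor IDs and window length are placeholders; the paper runs the equivalent logic inside Apache Spark rather than in a single process.

    ```python
    from collections import defaultdict, deque

    WINDOW = 10.0                    # query window length in seconds (placeholder)
    buffers = defaultdict(deque)     # sensor id -> recent (timestamp, displacement) samples

    def ingest(sensor_id, t, displacement):
        """Append a sample, evict anything older than WINDOW seconds, and
        return the maximum displacement currently in the window."""
        buf = buffers[sensor_id]
        buf.append((t, displacement))
        while buf and buf[0][0] < t - WINDOW:
            buf.popleft()
        return max(d for _, d in buf)

    print(ingest("SCIGN-001", 0.0, 0.002))
    print(ingest("SCIGN-001", 4.0, 0.031))   # window max rises to 0.031
    print(ingest("SCIGN-001", 15.0, 0.004))  # earlier samples have expired
    ```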

  5. Providing Web Interfaces to the NSF EarthScope USArray Transportable Array

    NASA Astrophysics Data System (ADS)

    Vernon, Frank; Newman, Robert; Lindquist, Kent

    2010-05-01

    Since April 2004 the EarthScope USArray seismic network has grown to over 850 broadband stations that stream multi-channel data in near real-time to the Array Network Facility in San Diego. Providing secure, yet open, access to real-time and archived data for a broad range of audiences is best served by a series of platform agnostic low-latency web-based applications. We present a framework of tools that mediate between the world wide web and Boulder Real Time Technologies Antelope Environmental Monitoring System data acquisition and archival software. These tools provide comprehensive information to audiences ranging from network operators and geoscience researchers, to funding agencies and the general public. This ranges from network-wide to station-specific metadata, state-of-health metrics, event detection rates, archival data and dynamic report generation over a station's two year life span. Leveraging open source web-site development frameworks for both the server side (Perl, Python and PHP) and client-side (Flickr, Google Maps/Earth and jQuery) facilitates the development of a robust extensible architecture that can be tailored on a per-user basis, with rapid prototyping and development that adheres to web-standards. Typical seismic data warehouses allow online users to query and download data collected from regional networks, without the scientist directly visually assessing data coverage and/or quality. Using a suite of web-based protocols, we have recently developed an online seismic waveform interface that directly queries and displays data from a relational database through a web-browser. Using the Python interface to Datascope and the Python-based Twisted network package on the server side, and the jQuery Javascript framework on the client side to send and receive asynchronous waveform queries, we display broadband seismic data using the HTML Canvas element that is globally accessible by anyone using a modern web-browser. We are currently creating additional interface tools to create a rich-client interface for accessing and displaying seismic data that can be deployed to any system running the Antelope Real Time System. The software is freely available from the Antelope contributed code Git repository (http://www.antelopeusersgroup.org).

  6. Predicting Drug Recalls From Internet Search Engine Queries.

    PubMed

    Yom-Tov, Elad

    2017-01-01

    Batches of pharmaceuticals are sometimes recalled from the market when a safety issue or a defect is detected in specific production runs of a drug. Such problems are usually detected when patients or healthcare providers report abnormalities to medical authorities. Here, we test the hypothesis that defective production lots can be detected earlier by monitoring queries to Internet search engines. We extracted queries from the USA to the Bing search engine that mentioned one of 5195 pharmaceutical drugs during 2015, together with all recall notifications issued by the Food and Drug Administration (FDA) during that year. By using attributes that quantify the change in query volume at the state level, we attempted to predict whether a recall of a specific drug would be ordered by the FDA in a time horizon ranging from 1 to 40 days in the future. Our results show that future drug recalls can indeed be identified with an AUC of 0.791 and a lift at 5% of approximately 6 when predicting a recall occurring one day ahead. This performance degrades as predictions are made further ahead. The most indicative attributes for prediction are sudden spikes in query volume about a specific medicine in each state. Recalls of prescription drugs and those estimated to be of medium risk are more likely to be identified using search query data. These findings suggest that aggregated Internet search engine data can be used to provide early warning of faulty batches of medicines.
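
    For reference, the two reported metrics can be computed from ranked predictions as follows; the labels and scores here are synthetic, and `lift_at` follows the common definition (precision in the top fraction divided by the base rate), which is an assumption about the paper's exact formula.

    ```python
    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(1)
    y_true = rng.binomial(1, 0.05, 2000)                                          # 1 = recall ordered in the horizon
    scores = y_true * rng.normal(0.7, 0.2, 2000) + rng.normal(0.3, 0.2, 2000)     # synthetic model output

    def lift_at(y, s, frac=0.05):
        """Event rate in the top `frac` of ranked predictions divided by the base rate."""
        top = y[np.argsort(s)[::-1]][: max(1, int(len(y) * frac))]
        return top.mean() / y.mean()

    print(round(roc_auc_score(y_true, scores), 3), round(lift_at(y_true, scores), 2))
    ```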

  7. Searching PubMed for studies on bacteremia, bloodstream infection, septicemia, or whatever the best term is: a note of caution.

    PubMed

    Søgaard, Mette; Andersen, Jens P; Schønheyder, Henrik C

    2012-04-01

    There is inconsistency in the terminology used to describe bacteremia. To demonstrate the impact on information retrieval, we compared the yield of articles from PubMed MEDLINE using the terms "bacteremia," "bloodstream infection," and "septicemia." We searched for articles published between 1966 and 2009, and depicted the relationships among queries graphically. To examine the content of the retrieved articles, we extracted all Medical Subject Headings (MeSH) terms and compared topic similarity using a cosine measure. The recovered articles differed greatly by term, and only 53 articles were captured by all terms. Of the articles retrieved by the "bacteremia" query, 21,438 (84.1%) were not captured when searching for "bloodstream infection" or "septicemia." Likewise, only 2,243 of the 11,796 articles recovered by a free-text query for "bloodstream infection" were retrieved by the "bacteremia" query (19%). When "bloodstream infection" was entered as a phrase, 46.1% of the records overlapped with the "bacteremia" query. Similarity measures ranged from 0.52 to 0.78 and were lowest for "bloodstream infection" as a phrase compared with "septicemia." Inconsistent terminology has a major impact on the yield of queries. Agreement on terminology should be sought and promoted by scientific journals. An immediate solution is to add "bloodstream infection" as an entry term for bacteremia in the MeSH vocabulary. Copyright © 2012 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Mosby, Inc. All rights reserved.
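
    The cosine measure used above to compare topic content can be sketched directly on MeSH term counts; the term lists below are invented examples, not the study's data.

    ```python
    from collections import Counter
    from math import sqrt

    def cosine(terms_a, terms_b):
        """Cosine similarity between two bags of MeSH terms."""
        a, b = Counter(terms_a), Counter(terms_b)
        dot = sum(a[t] * b[t] for t in a)
        norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    print(cosine(["Bacteremia", "Sepsis", "Humans"], ["Sepsis", "Humans", "Cross Infection"]))  # about 0.67
    ```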

  8. A journey to Semantic Web query federation in the life sciences.

    PubMed

    Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian

    2009-10-01

    As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario. We have identified both the strengths and weaknesses of these technologies. While Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URI's), the proliferation of semantically-equivalent URI's hinders large scale data integration. Our work helps direct research and tool development, which will be of benefit to this community.
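
    A sketch of a federated SPARQL query in the spirit described above, using the SPARQLWrapper library and the standard SPARQL 1.1 SERVICE keyword; both endpoint URLs and the ontology terms are placeholders, not the actual BioRDF knowledge bases or the FeDeRate tool.

    ```python
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://example.org/sparql")          # placeholder primary endpoint
    sparql.setQuery("""
        SELECT ?receptor ?label WHERE {
          ?receptor a <http://example.org/neuro#Receptor> .
          SERVICE <https://other.example.org/sparql> {
            ?receptor <http://www.w3.org/2000/01/rdf-schema#label> ?label .
          }
        } LIMIT 10
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    for row in results["results"]["bindings"]:
        print(row["receptor"]["value"], row["label"]["value"])
    ```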

  9. A journey to Semantic Web query federation in the life sciences

    PubMed Central

    Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian

    2009-01-01

    Background As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. Methods and results We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. Conclusion We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario. We have identified both the strengths and weaknesses of these technologies. While Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URI's), the proliferation of semantically-equivalent URI's hinders large scale data integration. Our work helps direct research and tool development, which will be of benefit to this community. PMID:19796394

  10. Random and Directed Walk-Based Top-k Queries in Wireless Sensor Networks

    PubMed Central

    Fu, Jun-Song; Liu, Yun

    2015-01-01

    In wireless sensor networks, filter-based top-k query approaches are the state-of-the-art solutions and have been extensively researched in the literature, however, they are very sensitive to the network parameters, including the size of the network, dynamics of the sensors’ readings and declines in the overall range of all the readings. In this work, a random walk-based top-k query approach called RWTQ and a directed walk-based top-k query approach called DWTQ are proposed. At the beginning of a top-k query, one or several tokens are sent to the specific node(s) in the network by the base station. Then, each token walks in the network independently to record and process the readings in a random or directed way. A strategy of choosing the “right” way in DWTQ is carefully designed for the token(s) to arrive at the high-value regions as soon as possible. When designing the walking strategy for DWTQ, the spatial correlations of the readings are also considered. Theoretical analysis and simulation results indicate that RWTQ and DWTQ both are very robust against these parameters discussed previously. In addition, DWTQ outperforms TAG, FILA and EXTOK in transmission cost, energy consumption and network lifetime. PMID:26016914
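
    A toy sketch of the random-walk token idea behind RWTQ: a single token walks the connectivity graph and keeps the k largest readings it has seen. The graph, readings and step budget are synthetic, and DWTQ's directed-walk and spatial-correlation heuristics are not modeled.

    ```python
    import heapq
    import random

    graph = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}     # node -> neighbours
    readings = {1: 17.2, 2: 25.9, 3: 12.4, 4: 31.0, 5: 28.3}            # node -> sensed value

    def random_walk_topk(start, k=3, steps=50):
        """The token records the k largest readings among the nodes it visits."""
        node, seen, top = start, set(), []
        for _ in range(steps):
            if node not in seen:
                seen.add(node)
                heapq.heappush(top, (readings[node], node))
                if len(top) > k:
                    heapq.heappop(top)           # drop the smallest, keep the k largest
            node = random.choice(graph[node])
        return sorted((v for v, _ in top), reverse=True)

    print(random_walk_topk(1))   # likely [31.0, 28.3, 25.9] once the walk covers the network
    ```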

  11. An organizational framework and strategic implementation for system-level change to enhance research-based practice: QUERI Series

    PubMed Central

    Stetler, Cheryl B; McQueen, Lynn; Demakis, John; Mittman, Brian S

    2008-01-01

    Background The continuing gap between available evidence and current practice in health care reinforces the need for more effective solutions, in particular related to organizational context. Considerable advances have been made within the U.S. Veterans Health Administration (VA) in systematically implementing evidence into practice. These advances have been achieved through a system-level program focused on collaboration and partnerships among policy makers, clinicians, and researchers. The Quality Enhancement Research Initiative (QUERI) was created to generate research-driven initiatives that directly enhance health care quality within the VA and, simultaneously, contribute to the field of implementation science. This paradigm-shifting effort provided a natural laboratory for exploring organizational change processes. This article describes the underlying change framework and implementation strategy used to operationalize QUERI. Strategic approach to organizational change QUERI used an evidence-based organizational framework focused on three contextual elements: 1) cultural norms and values, in this case related to the role of health services researchers in evidence-based quality improvement; 2) capacity, in this case among researchers and key partners to engage in implementation research; and 3) supportive infrastructures to reinforce expectations for change and to sustain new behaviors as part of the norm. As part of a QUERI Series in Implementation Science, this article describes the framework's application in an innovative integration of health services research, policy, and clinical care delivery. Conclusion QUERI's experience and success provide a case study in organizational change. It demonstrates that progress requires a strategic, systems-based effort. QUERI's evidence-based initiative involved a deliberate cultural shift, requiring ongoing commitment in multiple forms and at multiple levels. VA's commitment to QUERI came in the form of visionary leadership, targeted allocation of resources, infrastructure refinements, innovative peer review and study methods, and direct involvement of key stakeholders. Stakeholders included both those providing and managing clinical care, as well as those producing relevant evidence within the health care system. The organizational framework and related implementation interventions used to achieve contextual change resulted in engaged investigators and enhanced uptake of research knowledge. QUERI's approach and progress provide working hypotheses for others pursuing similar system-wide efforts to routinely achieve evidence-based care. PMID:18510750

  12. Space environment data storage and access: lessons learned and recommendations for the future

    NASA Astrophysics Data System (ADS)

    Evans, Hugh; Heynderickx, Daniel

    2012-07-01

    With the ever increasing volume of space environment data available at present and planned for the near future, the demands on data storage and access methods are increasing as well. In addition, continued access to historical, archived data remains crucial. On the basis of many years of experience, the authors identify the following issues as important for continued and efficient handling of datasets now and in the future: The huge data volumes currently or very soon available from a number of space missions will limit direct Internet download access to even relatively short epoch ranges of data. Therefore, data providers should establish or extend standardised data (post-) processing services so that only data query results need to be downloaded. Although a single standardised data format will in all likelihood remain a utopia, data providers should at least include extensive metadata with their data products, according to established standards and practices (e.g. ISTP, SPASE). Standardisation of (sets of) metadata greatly facilitates data mining and querying. The use of SQL database storage should be considered instead of, or in parallel with, classic storage of data files. The use of SQL does away with having to handle file parsing and processing, while at the same time standard access protocols can be used to (remotely) connect to such data repositories. Many data holdings are still lacking in extensive descriptions of data provenance (e.g. instrument description), content and format. Unfortunately, detailed data information is usually rejected by scientific and technical journals. Re-processing of historical archived datasets into modern formats, making them easily available and usable, is urgently required, as knowledge is being lost. A global data directory has still not been achieved; policy makers should enforce stricter rules for "broadcasting" dataset information.
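
    As a rough illustration of the SQL-storage recommendation, the sketch below (hypothetical schema and column names) stores a space-environment time series together with its metadata in relational tables, so that a client downloads only the rows returned by an epoch-range query rather than whole data files.

    ```python
    import sqlite3

    # Minimal sketch with a hypothetical schema: keep space-environment time
    # series in SQL so that clients receive only the rows matching an
    # epoch-range query instead of downloading entire data files.
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE flux (
                        mission TEXT,
                        epoch   TEXT,   -- ISO 8601 timestamp
                        channel TEXT,
                        value   REAL)""")
    conn.execute("""CREATE TABLE metadata (
                        mission TEXT,
                        key     TEXT,   -- e.g. instrument description, units
                        value   TEXT)""")
    conn.executemany("INSERT INTO flux VALUES (?, ?, ?, ?)",
                     [("DEMO", "2012-07-01T00:00:00", "e1", 1.2e4),
                      ("DEMO", "2012-07-01T01:00:00", "e1", 1.5e4)])

    # Server-side processing: only the result of the epoch-range query is returned.
    rows = conn.execute("""SELECT epoch, value FROM flux
                           WHERE mission = ? AND epoch BETWEEN ? AND ?""",
                        ("DEMO", "2012-07-01T00:00:00", "2012-07-01T12:00:00")).fetchall()
    print(rows)
    ```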

  13. Breaking the Curse of Cardinality on Bitmap Indexes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Kesheng; Wu, Kesheng; Stockinger, Kurt

    2008-04-04

    Bitmap indexes are known to be efficient for ad-hoc range queries that are common in data warehousing and scientific applications. However, they suffer from the curse of cardinality, that is, their efficiency deteriorates as attribute cardinalities increase. A number of strategies have been proposed, but none of them addresses the problem adequately. In this paper, we propose a novel binned bitmap index that greatly reduces the cost to answer queries, and therefore breaks the curse of cardinality. The key idea is to augment the binned index with an Order-preserving Bin-based Clustering (OrBiC) structure. This data structure significantly reduces the I/O operations needed to resolve records that cannot be resolved with the bitmaps. To further improve the proposed index structure, we also present a strategy to create single-valued bins for frequent values. This strategy reduces index sizes and improves query processing speed. Overall, the binned indexes with OrBiC greatly improve query processing speed, and are 3-25 times faster than the best available indexes for high-cardinality data.
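
    To make the binning idea concrete, here is a minimal sketch of a binned bitmap index for range queries, assuming in-memory Python lists as bitmaps: fully covered bins are answered from the bitmaps alone, while boundary bins trigger a candidate check against the raw values, which is exactly the extra I/O that the OrBiC structure is designed to reduce. This is illustrative only and does not implement OrBiC or the single-valued-bin strategy.

    ```python
    from bisect import bisect_right

    class BinnedBitmapIndex:
        """Minimal sketch of a binned bitmap index (illustrative only; not the
        paper's OrBiC structure). Each bin has one bitmap; a range query answers
        fully covered bins from the bitmaps alone and performs a candidate check
        against the raw values only for the boundary bins."""

        def __init__(self, values, edges):
            # edges define bins [edges[b], edges[b+1]); all values must fall inside
            self.values = values
            self.edges = edges
            self.bitmaps = [[False] * len(values) for _ in range(len(edges) - 1)]
            for i, v in enumerate(values):
                self.bitmaps[bisect_right(edges, v) - 1][i] = True

        def range_query(self, lo, hi):
            """Return the row ids i with lo <= values[i] < hi."""
            first = max(bisect_right(self.edges, lo) - 1, 0)
            last = min(bisect_right(self.edges, hi) - 1, len(self.bitmaps) - 1)
            hits = []
            for b in range(first, last + 1):
                boundary = b in (first, last)   # edge bins need the raw values
                for i, bit in enumerate(self.bitmaps[b]):
                    if bit and (not boundary or lo <= self.values[i] < hi):
                        hits.append(i)
            return sorted(hits)

    idx = BinnedBitmapIndex([3.1, 7.4, 12.9, 5.0, 9.6], edges=[0, 5, 10, 15])
    print(idx.range_query(4.0, 10.0))   # rows whose value lies in [4.0, 10.0)
    ```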

  14. BP Spill Sampling and Monitoring Data April-September 2010 - Data Download Tool

    EPA Pesticide Factsheets

    This dataset analyzes waste from the British Petroleum Deepwater Horizon Rig Explosion Emergency Response, providing the opportunity to query data sets by metadata criteria and find the resulting raw datasets in CSV format. The data query tool allows users to download air, water and sediment sampling and monitoring data that has been collected in response to the BP oil spill. All sampling and monitoring data collected to date is available for download as raw structured data. The query tool enables CSV file creation to be refined based on the following search criteria: date range (between April 28, 2010 and September 29, 2010); location by zip, city, or county; media (solid waste, weathered oil, air, surface water, liquid waste, tar, sediment, water); substance categories (based on media selection); and substances (based on substance category selection).

  15. Scalable and responsive event processing in the cloud

    PubMed Central

    Suresh, Visalakshmi; Ezhilchelvan, Paul; Watson, Paul

    2013-01-01

    Event processing involves continuous evaluation of queries over streams of events. Response-time optimization is traditionally done over a fixed set of nodes and/or by using metrics measured at query-operator levels. Cloud computing makes it easy to acquire and release computing nodes as required. Leveraging this flexibility, we propose a novel, queueing-theory-based approach for meeting specified response-time targets against fluctuating event arrival rates by drawing only the necessary amount of computing resources from a cloud platform. In the proposed approach, the entire processing engine of a distinct query is modelled as an atomic unit for predicting response times. Several such units hosted on a single node are modelled as a multiple class M/G/1 system. These aspects eliminate intrusive, low-level performance measurements at run-time, and also offer portability and scalability. Using model-based predictions, cloud resources are efficiently used to meet response-time targets. The efficacy of the approach is demonstrated through cloud-based experiments. PMID:23230164
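
    A simplified, single-class version of the queueing model can be sketched with the Pollaczek-Khinchine formula, where the mean response time is the mean service time plus the mean waiting time λE[S²]/(2(1-ρ)). The numbers and the even split of the event stream across nodes below are hypothetical; the paper itself models each node as a multiple-class M/G/1 system.

    ```python
    def mg1_response_time(arrival_rate, mean_service, second_moment_service):
        """Mean response time of a single-class M/G/1 queue via the
        Pollaczek-Khinchine formula (a simplified stand-in for the paper's
        multiple-class model)."""
        rho = arrival_rate * mean_service            # utilisation
        if rho >= 1.0:
            return float("inf")                      # unstable: queue grows without bound
        wait = arrival_rate * second_moment_service / (2.0 * (1.0 - rho))
        return mean_service + wait

    def nodes_needed(total_rate, mean_service, second_moment_service, target, max_nodes=64):
        """Smallest number of nodes (events split evenly) whose predicted
        response time meets the target; an illustrative provisioning rule."""
        for n in range(1, max_nodes + 1):
            if mg1_response_time(total_rate / n, mean_service, second_moment_service) <= target:
                return n
        return None

    # hypothetical numbers: 900 events/s in total, 5 ms mean service time,
    # E[S^2] = 5e-5 s^2, and a 20 ms response-time target
    print(nodes_needed(900.0, 0.005, 5e-5, target=0.020))
    ```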

  16. AQUAdexIM: highly efficient in-memory indexing and querying of astronomy time series images

    NASA Astrophysics Data System (ADS)

    Hong, Zhi; Yu, Ce; Wang, Jie; Xiao, Jian; Cui, Chenzhou; Sun, Jizhou

    2016-12-01

    Astronomy has always been, and will continue to be, a data-based science, and astronomers nowadays are faced with increasingly massive datasets, one key problem of which is to efficiently retrieve the desired cup of data from the ocean. AQUAdexIM, an innovative spatial indexing and querying method, performs highly efficient on-the-fly queries on the server side, at the user's request, to search for Time Series Images within existing observation data, and returns only the desired FITS images to users. Users therefore no longer need to download entire datasets to their local machines, which will only become more impractical as data sizes keep increasing. Moreover, AQUAdexIM maintains a very low storage space overhead, and its specially designed in-memory index structure enables it to search for Time Series Images of a given area of the sky 10 times faster than using Redis, a state-of-the-art in-memory database.

  17. Privacy-Aware Location Database Service for Granular Queries

    NASA Astrophysics Data System (ADS)

    Kiyomoto, Shinsaku; Martin, Keith M.; Fukushima, Kazuhide

    Future mobile markets are expected to increasingly embrace location-based services. This paper presents a new system architecture for location-based services, which consists of a location database and distributed location anonymizers. The service is privacy-aware in the sense that the location database always maintains a degree of anonymity. The location database service permits three different levels of query and can thus be used to implement a wide range of location-based services. Furthermore, the architecture is scalable and employs simple functions that are similar to those found in general database systems.

  18. Real-time fiber selection using the Wii remote

    NASA Astrophysics Data System (ADS)

    Klein, Jan; Scholl, Mike; Köhn, Alexander; Hahn, Horst K.

    2010-02-01

    In the last few years, fiber tracking tools have become popular in clinical contexts, e.g., for pre- and intraoperative neurosurgical planning. The efficient, intuitive, and reproducible selection of fiber bundles still constitutes one of the main issues. In this paper, we present a framework for real-time selection of axonal fiber bundles using a Wii remote control, a wireless controller for Nintendo's gaming console. It enables the user to select fiber bundles without any other input devices. To achieve smooth interaction, we propose a novel space-partitioning data structure for efficient 3D range queries in a data set consisting of precomputed fibers. The data structure, which is adapted to the special geometry of fiber tracts, allows for queries that are many times faster compared with previous state-of-the-art approaches. In order to reliably extract fibers for further processing, e.g., for quantification purposes or comparisons with preoperatively tracked fibers, we developed an expectation-maximization clustering algorithm that can refine the range queries. Our initial experiments have shown that white matter fiber bundles can be reliably selected within a few seconds using the Wii remote, which was placed in a sterile plastic bag to simulate usage under surgical conditions.
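
    The paper's data structure is tailored to fiber geometry, but the underlying operation, a 3D range query over precomputed fiber points, can be illustrated with a generic k-d tree. The sketch below uses SciPy's cKDTree over randomly generated points (all data hypothetical) and returns the ids of fibers having at least one point inside a spherical selection region.

    ```python
    import numpy as np
    from scipy.spatial import cKDTree

    # Illustrative only: a generic k-d tree over precomputed fiber points,
    # standing in for the paper's fiber-geometry-specific structure.
    rng = np.random.default_rng(0)
    n_fibers, points_per_fiber = 200, 50
    points = rng.uniform(0, 100, size=(n_fibers * points_per_fiber, 3))  # (x, y, z) in mm
    fiber_id = np.repeat(np.arange(n_fibers), points_per_fiber)           # point -> fiber

    tree = cKDTree(points)

    def select_fibers(center, radius):
        """Return ids of fibers having at least one point inside the spherical
        range query defined by the pointer position."""
        hit_points = tree.query_ball_point(center, r=radius)
        return np.unique(fiber_id[hit_points])

    print(select_fibers(center=[50.0, 50.0, 50.0], radius=5.0))
    ```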

  19. Omicseq: a web-based search engine for exploring omics datasets

    PubMed Central

    Sun, Xiaobo; Pittard, William S.; Xu, Tianlei; Chen, Li; Zwick, Michael E.; Jiang, Xiaoqian; Wang, Fusheng

    2017-01-01

    The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long-standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve ‘findability’ of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. PMID:28402462

  20. Study of Mobile GIS Application on the Field of GPR in the Road Disease Detection

    NASA Astrophysics Data System (ADS)

    Liao, Q.; Yang, F.

    2013-12-01

    Based on the reflection of pulsed electromagnetic waves, ground penetrating radar (GPR) can measure the depth of pavement layers and reveal different hidden dangers underground. GPR is now widely used in road engineering, with constantly improving ability to detect and diagnose road diseases. The accumulated road disease data of a region, a city, or an even wider area forms a very informative database, so a more convenient way to query the data intuitively is needed. As the mobile internet develops, applications on mobile terminal devices play an increasingly important role in information platforms. Mobile GIS, with the smartphone as its terminal, is supported by the mobile Internet and uses GPS or base stations for positioning. In this article, an LBS application for road disease information, based on the Android platform, using a client/server (C/S) pattern, and integrating the Baidu Map API with database technology, is discussed. In testing, it could display and query real-time and historical road disease data, as well as data classifications, on a phone intuitively and easily. Because of the positioning capability and high portability of smartphones, on-site investigation of road diseases becomes easier. The system still needs further improvement; in particular, as mobile phone performance improves, analysis functions for the disease data could be added, forming a more widely applicable service system.

  1. Distributed Data Service for Data Management in Internet of Things Middleware.

    PubMed

    Cruz Huacarpuma, Ruben; de Sousa Junior, Rafael Timoteo; de Holanda, Maristela Terto; de Oliveira Albuquerque, Robson; García Villalba, Luis Javier; Kim, Tai-Hoon

    2017-04-27

    The development of the Internet of Things (IoT) is closely related to a considerable increase in the number and variety of devices connected to the Internet. Sensors have become a regular component of our environment, as well as smart phones and other devices that continuously collect data about our lives even without our intervention. With such connected devices, a broad range of applications has been developed and deployed, including those dealing with massive volumes of data. In this paper, we introduce a Distributed Data Service (DDS) to collect and process data for IoT environments. One central goal of this DDS is to enable multiple and distinct IoT middleware systems to share common data services from a loosely-coupled provider. In this context, we propose a new specification of functionalities for a DDS and the conception of the corresponding techniques for collecting, filtering and storing data conveniently and efficiently in this environment. Another contribution is a data aggregation component that is proposed to support efficient real-time data querying. To validate its data collecting and querying functionalities and performance, the proposed DDS is evaluated in two case studies regarding a simulated smart home system, the first case devoted to evaluating data collection and aggregation when the DDS is interacting with the UIoT middleware, and the second aimed at comparing the DDS data collection with this same functionality implemented within the Kaa middleware.

  2. FTree query construction for virtual screening: a statistical analysis.

    PubMed

    Gerlach, Christof; Broughton, Howard; Zaliani, Andrea

    2008-02-01

    FTrees (FT) is a known chemoinformatic tool able to condense molecular descriptions into a graph object and to search for actives in large databases using graph similarity. The query graph is classically derived from a known active molecule, or a set of actives, for which a similar compound has to be found. Recently, FT similarity has been extended to fragment space, widening its capabilities. If a user were able to build a knowledge-based FT query from information other than a known active structure, the similarity search could be combined with other, normally separate, fields like de-novo design or pharmacophore searches. With this aim in mind, we performed a comprehensive analysis of several databases in terms of FT description and provide a basic statistical analysis of the FT spaces so far at hand. Vendors' catalogue collections and MDDR as a source of potential or known "actives", respectively, have been used. With the results reported herein, a set of ranges, mean values and standard deviations for several query parameters are presented in order to set a reference guide for the users. Applications on how to use this information in FT query building are also provided, using a newly built 3D-pharmacophore from 57 5HT-1F agonists and a published one which was used for virtual screening for tRNA-guanine transglycosylase (TGT) inhibitors.

  3. Improved Information Retrieval Performance on SQL Database Using Data Adapter

    NASA Astrophysics Data System (ADS)

    Husni, M.; Djanali, S.; Ciptaningtyas, H. T.; Wicaksana, I. G. N. A.

    2018-02-01

    The NoSQL databases, short for Not Only SQL, are increasingly being used as the number of big data applications increases. Most systems still use relational databases (RDBs), but as the amount of data increases each year, systems handle big data with NoSQL databases to analyze and access data more quickly. NoSQL emerged as a result of the exponential growth of the internet and the development of web applications. The query syntax in a NoSQL database differs from that of a SQL database, therefore requiring code changes in the application. A data adapter allows applications to keep their SQL query syntax unchanged. Data adapters provide methods that can synchronize SQL databases with NoSQL databases. In addition, the data adapter provides an interface through which applications can run SQL queries. Hence, this research applied a data adapter system to synchronize data between a MySQL database and Apache HBase using a direct access query approach, where the system allows the application to accept queries while the synchronization process is in progress. From the tests performed using the data adapter, the results show that the data adapter can synchronize the SQL database, MySQL, with the NoSQL database, Apache HBase. The system's memory usage ranged from 40% to 60%, and its processor usage varied from 10% to 90%. In addition, the NoSQL database showed better performance than the SQL database.
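
    A minimal sketch of the direct-access-query idea is shown below: the application keeps issuing plain SQL, the adapter executes it against the relational store and queues write statements for background mirroring into a NoSQL store, and queries continue to be accepted while synchronization is in progress. All class and table names are hypothetical, sqlite3 stands in for MySQL and a plain dict for Apache HBase; this is not the implementation described in the paper.

    ```python
    import queue
    import sqlite3
    import threading

    class DataAdapter:
        """Sketch of a data adapter: applications keep their SQL syntax, reads and
        writes go to the relational store (sqlite3 stands in for MySQL), and writes
        are queued for background synchronization to a NoSQL store (a dict stands
        in for Apache HBase). Hypothetical; not the paper's implementation."""

        def __init__(self):
            self.sql = sqlite3.connect(":memory:", check_same_thread=False)
            self.sql.execute("CREATE TABLE sensor (id INTEGER PRIMARY KEY, value REAL)")
            self.nosql = {}                      # stand-in for an HBase table
            self.pending = queue.Queue()         # writes awaiting synchronization
            threading.Thread(target=self._sync_worker, daemon=True).start()

        def execute(self, statement, params=()):
            """Direct-access query: accepted even while synchronization is running."""
            cur = self.sql.execute(statement, params)
            self.sql.commit()
            if statement.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
                self.pending.put((statement, params))
            return cur.fetchall()

        def _sync_worker(self):
            while True:
                statement, params = self.pending.get()
                if statement.lstrip().upper().startswith("INSERT"):
                    row_id, value = params
                    self.nosql[row_id] = {"value": value}   # mirror the write (sketch mirrors INSERTs only)
                self.pending.task_done()

    adapter = DataAdapter()
    adapter.execute("INSERT INTO sensor VALUES (?, ?)", (1, 23.4))
    print(adapter.execute("SELECT * FROM sensor WHERE value > ?", (10.0,)))
    adapter.pending.join()                       # wait for mirroring to finish
    print(adapter.nosql)
    ```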

  4. FTree query construction for virtual screening: a statistical analysis

    NASA Astrophysics Data System (ADS)

    Gerlach, Christof; Broughton, Howard; Zaliani, Andrea

    2008-02-01

    FTrees (FT) is a known chemoinformatic tool able to condense molecular descriptions into a graph object and to search for actives in large databases using graph similarity. The query graph is classically derived from a known active molecule, or a set of actives, for which a similar compound has to be found. Recently, FT similarity has been extended to fragment space, widening its capabilities. If a user were able to build a knowledge-based FT query from information other than a known active structure, the similarity search could be combined with other, normally separate, fields like de-novo design or pharmacophore searches. With this aim in mind, we performed a comprehensive analysis of several databases in terms of FT description and provide a basic statistical analysis of the FT spaces so far at hand. Vendors' catalogue collections and MDDR as a source of potential or known "actives", respectively, have been used. With the results reported herein, a set of ranges, mean values and standard deviations for several query parameters are presented in order to set a reference guide for the users. Applications on how to use this information in FT query building are also provided, using a newly built 3D-pharmacophore from 57 5HT-1F agonists and a published one which was used for virtual screening for tRNA-guanine transglycosylase (TGT) inhibitors.

  5. ChronoMedIt--a computational quality audit framework for better management of patients with chronic conditions.

    PubMed

    Mabotuwana, Thusitha; Warren, Jim

    2010-02-01

    Quality audit and feedback to general practice is an important aspect of successful chronic disease management. However, due to the complex temporal relationships associated with the nature of chronic illness, formulating clinically relevant queries within the context of a specific evaluation period is difficult. We abstracted requirements from a set of previously developed criteria to develop a generic criteria model that can be used to formulate queries related to chronic condition management. We implemented and verified the framework, ChronoMedIt, to execute clinical queries within the scope of the criteria model. Our criteria model consists of four broad classes of audit criteria - lapse in indicated therapy, no measurement recording, time to achieve target, and measurement contraindicating therapy. Using these criteria classes as a guide, ChronoMedIt has been implemented as an extensible framework. ChronoMedIt can produce criteria reports and has an integrated prescription and measurement timeline visualisation tool. We illustrate the use of the framework by identifying patients on suboptimal therapy based on a range of pre-determined audit criteria using production electronic medical record data from two general medical practices for 607 and 679 patients with hypertension. As the most prominent result, we find that 59% (out of 607) and 34% (out of 679) of patients with hypertension had at least one episode of a >30-day lapse in their antihypertensive therapy over a 12-month evaluation period. ChronoMedIt can reliably execute a wide range of clinically useful queries to identify patients whose chronic condition management can be improved.
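
    The "lapse in indicated therapy" criterion class can be illustrated with a small, self-contained check: given dispensing records (date and days supplied), a patient is flagged if medication coverage has any gap longer than 30 days inside the 12-month evaluation window. The data model and function below are hypothetical and much simpler than ChronoMedIt's criteria model.

    ```python
    from datetime import date, timedelta

    def has_therapy_lapse(dispensings, window_start, window_end, max_gap_days=30):
        """Illustrative version of the 'lapse in indicated therapy' criterion:
        True if medication coverage has any gap longer than max_gap_days inside
        the evaluation window. dispensings = [(dispense_date, days_supply)].
        Hypothetical data model; not the ChronoMedIt implementation."""
        covered_until = window_start
        for dispensed, days_supply in sorted(dispensings):
            start = max(dispensed, window_start)
            if start > window_end:
                break
            if (start - covered_until).days > max_gap_days:
                return True
            covered_until = max(covered_until, dispensed + timedelta(days=days_supply))
        return (window_end - covered_until).days > max_gap_days

    # hypothetical antihypertensive dispensings over a 12-month window
    dispensings = [(date(2009, 1, 10), 30), (date(2009, 3, 25), 30), (date(2009, 5, 1), 90)]
    print(has_therapy_lapse(dispensings, date(2009, 1, 1), date(2009, 12, 31)))  # True: gap > 30 days
    ```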

  6. WOVOdat Design Document: The Schema, Table Descriptions, and Create Table Statements for the Database of Worldwide Volcanic Unrest (WOVOdat Version 1.0)

    USGS Publications Warehouse

    Venezky, Dina Y.; Newhall, Christopher G.

    2007-01-01

    WOVOdat Overview During periods of volcanic unrest, the ability to forecast near future activity has been a primary concern for human populations living near volcanoes. Our ability to forecast future activity and mitigate hazards is based on knowledge of previous activity at the volcano exhibiting unrest and knowledge of previous activity at similar volcanoes. A small set of experts with past experience are often involved in forecasting. We need to both preserve the knowledge the experts use and continue to investigate volcanic data to make better forecasts. Advances in instrumentation, networking, and data storage technologies have greatly increased our ability to collect volcanic data and share observations with our colleagues. The wealth of data creates numerous opportunities for gaining a better understanding of magmatic conditions and processes, if the data can be easily accessed for comparison. To allow for comparison of volcanic unrest data, we are creating a central database called WOVOdat. WOVOdat will contain a subset of time-series and geo-referenced data from each WOVO observatory in common and easily accessible formats. WOVOdat is being created for volcano experts in charge of forecasting volcanic activity, scientists investigating volcanic processes, and the public. The types of queries each of these groups might ask range from, 'What volcanoes were active in November of 2002?' and 'What are the relationships between tectonic earthquakes and volcanic processes?' to complex analyses of volcanic unrest to determine what future activity might occur. A new structure for storing and accessing our data was needed to examine processes across a wide range of volcanologic conditions. WOVOdat provides this new structure using relationships to connect the data parameters such that searches can be created for analogs of unrest. The subset of data that will fill WOVOdat will continue to be collected by the observatories, who will remain the primary archives of raw and detailed data on individual episodes of unrest. MySQL, an Open Source database, was chosen as the WOVOdat database for its integration with common web languages. The question of where the data will be stored and how the disparate data sets will be integrated will not be discussed in detail here. The focus of this document is to explain the data types, formats, and table organization chosen for WOVOdat 1.0. It was written for database administrators, data loaders, query writers, and anyone who monitors volcanoes. We begin with an overview of several challenges faced and solutions used in creating the WOVOdat schema. Specifics are then given for the parameters and table organization. After each table organization section, basic create table statements are included for viewing the database field formats. In the next stage of the project, scripts will be needed for data conversion, entry, and cleansing. Views will also need to be created once the data have been loaded and the basic queries are better known. Many questions and opportunities remain. We look forward to the growth and continual improvement in efficiency of the system. We hope WOVOdat will improve our understanding of magmatic systems and help mitigate future volcanic hazards.

  7. Machine Translation-Supported Cross-Language Information Retrieval for a Consumer Health Resource

    PubMed Central

    Rosemblat, Graciela; Gemoets, Darren; Browne, Allen C.; Tse, Tony

    2003-01-01

    The U.S. National Institutes of Health, through its National Library of Medicine, developed ClinicalTrials.gov to provide the public with easy access to information on clinical trials on a wide range of conditions or diseases. Only English language information retrieval is currently supported. Given the growing number of Spanish speakers in the U.S. and their increasing use of the Web, we anticipate a significant increase in Spanish-speaking users. This study compares the effectiveness of two common cross-language information retrieval methods using machine translation, query translation versus document translation, using a subset of genuine user queries from ClinicalTrials.gov. Preliminary results conducted with the ClinicalTrials.gov search engine show that in our environment, query translation is statistically significantly better than document translation. We discuss possible reasons for this result and we conclude with suggestions for future work. PMID:14728236

  8. Targeted Data Extraction of the MS/MS Spectra Generated by Data-independent Acquisition: A New Concept for Consistent and Accurate Proteome Analysis*

    PubMed Central

    Gillet, Ludovic C.; Navarro, Pedro; Tate, Stephen; Röst, Hannes; Selevsek, Nathalie; Reiter, Lukas; Bonner, Ron; Aebersold, Ruedi

    2012-01-01

    Most proteomic studies use liquid chromatography coupled to tandem mass spectrometry to identify and quantify the peptides generated by the proteolysis of a biological sample. However, with the current methods it remains challenging to rapidly, consistently, reproducibly, accurately, and sensitively detect and quantify large fractions of proteomes across multiple samples. Here we present a new strategy that systematically queries sample sets for the presence and quantity of essentially any protein of interest. It consists of using the information available in fragment ion spectral libraries to mine the complete fragment ion maps generated using a data-independent acquisition method. For this study, the data were acquired on a fast, high resolution quadrupole-quadrupole time-of-flight (TOF) instrument by repeatedly cycling through 32 consecutive 25-Da precursor isolation windows (swaths). This SWATH MS acquisition setup generates, in a single sample injection, time-resolved fragment ion spectra for all the analytes detectable within the 400–1200 m/z precursor range and the user-defined retention time window. We show that suitable combinations of fragment ions extracted from these data sets are sufficiently specific to confidently identify query peptides over a dynamic range of 4 orders of magnitude, even if the precursors of the queried peptides are not detectable in the survey scans. We also show that queried peptides are quantified with a consistency and accuracy comparable with that of selected reaction monitoring, the gold standard proteomic quantification method. Moreover, targeted data extraction enables ad libitum quantification refinement and dynamic extension of protein probing by iterative re-mining of the once-and-forever acquired data sets. This combination of unbiased, broad range precursor ion fragmentation and targeted data extraction alleviates most constraints of present proteomic methods and should be equally applicable to the comprehensive analysis of other classes of analytes, beyond proteomics. PMID:22261725

  9. Graphical modeling and query language for hospitals.

    PubMed

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    So far there has been little evidence that implementation of health information technologies (HIT) is leading to health care cost savings. One of the reasons for this lack of impact by HIT likely lies in the complexity of business process ownership in hospitals. The goal of our research is to develop a business model-based method for hospital use which would allow doctors to retrieve ad-hoc information directly from various hospital databases. We have developed a special domain-specific process modelling language called MedMod. Formally, we define the MedMod language as a profile on UML class diagrams, but we also demonstrate it on examples, where we explain the semantics of all its elements informally. Moreover, we have developed the Process Query Language (PQL), which is based on the MedMod process definition language. The purpose of PQL is to allow a doctor to query (filter) runtime data of the hospital's processes described using MedMod. The MedMod language tries to overcome deficiencies in existing process modeling languages by allowing the loosely defined sequence of steps to be performed in the clinical process to be specified. The main advantages of PQL lie in two areas - usability and efficiency: 1) the view on data through the "glasses" of a familiar process; 2) the simple and easy-to-perceive means of setting filtering conditions, which require no more expertise than using spreadsheet applications; 3) the dynamic response to each step in the construction of the complete query, which shortens the learning curve greatly and reduces the error rate; and 4) the selected means of filtering and data retrieval, which allow queries to be executed in O(n) time with respect to the size of the dataset. We plan to continue this project with three further steps. First, we are planning to develop user-friendly graphical editors for the MedMod process modeling and query languages. The second step is to evaluate the usability of the proposed language and tool with physicians from several hospitals in Latvia, working with real data from these hospitals. Our third step is to develop an efficient implementation of the query language.

  10. Survey of Event Processing

    DTIC Science & Technology

    2007-12-01

    This survey opens with a brief history of event processing, and its Applications section defines several application domains and use cases for event processing technology. Throughout the report, the terms “subscription” and “subscription language” are used where some would use “(continuous) query” or “query language.”

  11. Integrated databanks access and sequence/structure analysis services at the PBIL.

    PubMed

    Perrière, Guy; Combet, Christophe; Penel, Simon; Blanchet, Christophe; Thioulouse, Jean; Geourjon, Christophe; Grassot, Julien; Charavay, Céline; Gouy, Manolo; Duret, Laurent; Deléage, Gilbert

    2003-07-01

    The World Wide Web server of the PBIL (Pôle Bioinformatique Lyonnais) provides on-line access to sequence databanks and to many tools for nucleic acid and protein sequence analysis. The server allows querying of nucleotide sequence banks in the EMBL and GenBank formats and of protein sequence banks in the SWISS-PROT and PIR formats. The query engine on which our databank access is based is the ACNUC system. It makes it possible to build complex queries to access functional zones of biological interest and to retrieve large sequence sets. Of special interest are the unique features provided by this system for querying the databanks of gene families developed at the PBIL. The server also provides access to a wide range of sequence analysis methods: similarity search programs, multiple alignments, protein structure prediction and multivariate statistics. A distinctive feature of this server is the integration of these two aspects: sequence retrieval and sequence analysis. Indeed, thanks to the introduction of re-usable lists, it is possible to perform treatments on large sets of data. The PBIL server can be reached at: http://pbil.univ-lyon1.fr.

  12. Towards ontology-driven navigation of the lipid bibliosphere

    PubMed Central

    Baker, Christopher JO; Kanagasabai, Rajaraman; Ang, Wee Tiong; Veeramani, Anitha; Low, Hong-Sang; Wenk, Markus R

    2008-01-01

    Background: The indexing of scientific literature and content is a relevant and contemporary requirement within life science information systems. Navigating information available in legacy formats continues to be a challenge both in enterprise and academic domains. The emergence of semantic web technologies and their fusion with artificial intelligence techniques has provided a new toolkit with which to address these data integration challenges. In the emerging field of lipidomics such navigation challenges are barriers to the translation of scientific results into actionable knowledge, critical to the treatment of diseases such as Alzheimer's syndrome, Mycobacterium infections and cancer. Results: We present a literature-driven workflow involving document delivery and natural language processing steps generating tagged sentences containing lipid, protein and disease names, which are instantiated into a custom-designed lipid ontology. We describe the design challenges in capturing lipid nomenclature, the mandate of the ontology and its role as a query model in the navigation of the lipid bibliosphere. We illustrate the extent of the description logic-based A-box query capability provided by the instantiated ontology using a graphical query composer to query sentences describing lipid-protein and lipid-disease correlations. Conclusion: As scientists accept the need to readjust the manner in which we search for information and derive knowledge we illustrate a system that can constrain the literature explosion and knowledge navigation problems. Specifically we have focussed on solving this challenge for lipidomics researchers who have to deal with the lack of standardized vocabulary, differing classification schemes, and a wide array of synonyms before being able to derive scientific insights. The use of the OWL-DL variant of the Web Ontology Language (OWL) and description logic reasoning is pivotal in this regard, providing the lipid scientist with advanced query access to the results of text mining algorithms instantiated into the ontology. The visual query paradigm assists in the adoption of this technology. PMID:18315858

  13. Towards ontology-driven navigation of the lipid bibliosphere.

    PubMed

    Baker, Christopher Jo; Kanagasabai, Rajaraman; Ang, Wee Tiong; Veeramani, Anitha; Low, Hong-Sang; Wenk, Markus R

    2008-01-01

    The indexing of scientific literature and content is a relevant and contemporary requirement within life science information systems. Navigating information available in legacy formats continues to be a challenge both in enterprise and academic domains. The emergence of semantic web technologies and their fusion with artificial intelligence techniques has provided a new toolkit with which to address these data integration challenges. In the emerging field of lipidomics such navigation challenges are barriers to the translation of scientific results into actionable knowledge, critical to the treatment of diseases such as Alzheimer's syndrome, Mycobacterium infections and cancer. We present a literature-driven workflow involving document delivery and natural language processing steps generating tagged sentences containing lipid, protein and disease names, which are instantiated into a custom-designed lipid ontology. We describe the design challenges in capturing lipid nomenclature, the mandate of the ontology and its role as a query model in the navigation of the lipid bibliosphere. We illustrate the extent of the description logic-based A-box query capability provided by the instantiated ontology using a graphical query composer to query sentences describing lipid-protein and lipid-disease correlations. As scientists accept the need to readjust the manner in which we search for information and derive knowledge we illustrate a system that can constrain the literature explosion and knowledge navigation problems. Specifically we have focussed on solving this challenge for lipidomics researchers who have to deal with the lack of standardized vocabulary, differing classification schemes, and a wide array of synonyms before being able to derive scientific insights. The use of the OWL-DL variant of the Web Ontology Language (OWL) and description logic reasoning is pivotal in this regard, providing the lipid scientist with advanced query access to the results of text mining algorithms instantiated into the ontology. The visual query paradigm assists in the adoption of this technology.

  14. Language-concordant automated telephone queries to assess medication adherence in a diverse population: a cross-sectional analysis of convergent validity with pharmacy claims.

    PubMed

    Ratanawongsa, Neda; Quan, Judy; Handley, Margaret A; Sarkar, Urmimala; Schillinger, Dean

    2018-04-06

    Clinicians have difficulty accurately assessing medication non-adherence within chronic disease care settings. Health information technology (HIT) could offer novel tools to assess medication adherence in diverse populations outside of usual health care settings. In a multilingual urban safety net population, we examined the validity of assessing adherence using automated telephone self-management (ATSM) queries, when compared with non-adherence using continuous medication gap (CMG) on pharmacy claims. We hypothesized that patients reporting greater days of missed pills to ATSM queries would have higher rates of non-adherence as measured by CMG, and that ATSM adherence assessments would perform as well as structured interview assessments. As part of an ATSM-facilitated diabetes self-management program, low-income health plan members typed numeric responses to rotating weekly ATSM queries: "In the last 7 days, how many days did you MISS taking your …" diabetes, blood pressure, or cholesterol pill. Research assistants asked similar questions in computer-assisted structured telephone interviews. We measured continuous medication gap (CMG) by claims over 12 preceding months. To evaluate convergent validity, we compared rates of optimal adherence (CMG ≤ 20%) across respondents reporting 0, 1, and ≥ 2 missed pill days on ATSM and on structured interview. Among 210 participants, 46% had limited health literacy, 57% spoke Cantonese, and 19% Spanish. ATSM respondents reported ≥1 missed day for diabetes (33%), blood pressure (19%), and cholesterol (36%) pills. Interview respondents reported ≥1 missed day for diabetes (28%), blood pressure (21%), and cholesterol (26%) pills. Optimal adherence rates by CMG were lower among ATSM respondents reporting more missed days for blood pressure (p = 0.02) and cholesterol (p < 0.01); by interview, differences were significant for cholesterol (p = 0.01). Language-concordant ATSM demonstrated modest potential for assessing adherence. Studies should evaluate HIT assessments of medication beliefs and concerns in diverse populations. NCT00683020 , registered May 21, 2008.

  15. Location Prediction Based on Transition Probability Matrices Constructing from Sequential Rules for Spatial-Temporal K-Anonymity Dataset

    PubMed Central

    Liu, Zhao; Zhu, Yunhong; Wu, Chenxue

    2016-01-01

    Spatial-temporal k-anonymity has become a mainstream approach among techniques for protection of users’ privacy in location-based services (LBS) applications, and has been applied to several variants such as LBS snapshot queries and continuous queries. Analyzing large-scale spatial-temporal anonymity sets may benefit several LBS applications. In this paper, we propose two location prediction methods based on transition probability matrices constructed from sequential rules for spatial-temporal k-anonymity datasets. First, we define single-step sequential rules mined from sequential spatial-temporal k-anonymity datasets generated from continuous LBS queries for multiple users. We then construct transition probability matrices from the mined single-step sequential rules, and normalize the transition probabilities in the transition matrices. Next, we regard a mobility model for an LBS requester as a stationary stochastic process and compute the n-step transition probability matrices by raising the normalized transition probability matrices to the power n. Furthermore, we propose two location prediction methods: rough prediction and accurate prediction. The former obtains the probabilities of arriving at target locations along simple paths that include only current locations, target locations and transition steps. By iteratively combining the probabilities for simple paths with n steps and the probabilities for detailed paths with n-1 steps, the latter method calculates transition probabilities for detailed paths with n steps from current locations to target locations. Finally, we conduct extensive experiments, and the correctness and flexibility of our proposed algorithm have been verified. PMID:27508502
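
    The matrix-construction step can be sketched compactly: count single-step transitions observed in the anonymized location sequences, row-normalize the counts into a transition probability matrix, and raise it to the power n for n-step probabilities, which corresponds to the rough-prediction flavour described above. The sequences and region labels below are hypothetical.

    ```python
    import numpy as np

    def transition_matrix(sequences, n_locations):
        """Estimate a single-step transition probability matrix from observed
        location sequences (row-normalized transition counts)."""
        counts = np.zeros((n_locations, n_locations))
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                counts[a, b] += 1
        row_sums = counts.sum(axis=1, keepdims=True)
        return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

    # hypothetical anonymized location sequences over 4 regions (0..3)
    sequences = [[0, 1, 2, 3], [0, 1, 3], [1, 2, 3, 0], [2, 3, 0, 1]]
    P = transition_matrix(sequences, n_locations=4)

    # n-step transition probabilities, treating the mobility model as a
    # stationary Markov chain (the rough-prediction flavour)
    n = 2
    P_n = np.linalg.matrix_power(P, n)
    print("P(current=0 -> target=3 in", n, "steps) =", round(P_n[0, 3], 3))
    ```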

  16. EnsMart: A Generic System for Fast and Flexible Access to Biological Data

    PubMed Central

    Kasprzyk, Arek; Keefe, Damian; Smedley, Damian; London, Darin; Spooner, William; Melsopp, Craig; Hammond, Martin; Rocca-Serra, Philippe; Cox, Tony; Birney, Ewan

    2004-01-01

    The EnsMart system (www.ensembl.org/EnsMart) provides a generic data warehousing solution for fast and flexible querying of large biological data sets and integration with third-party data and tools. The system consists of a query-optimized database and interactive, user-friendly interfaces. EnsMart has been applied to Ensembl, where it extends its genomic browser capabilities, facilitating rapid retrieval of customized data sets. A wide variety of complex queries, on various types of annotations, for numerous species are supported. These can be applied to many research problems, ranging from SNP selection for candidate gene screening, through cross-species evolutionary comparisons, to microarray annotation. Users can group and refine biological data according to many criteria, including cross-species analyses, disease links, sequence variations, and expression patterns. Both tabulated list data and biological sequence output can be generated dynamically, in HTML, text, Microsoft Excel, and compressed formats. A wide range of sequence types, such as cDNA, peptides, coding regions, UTRs, and exons, with additional upstream and downstream regions, can be retrieved. The EnsMart database can be accessed via a public Web site, or through a Java application suite. Both implementations and the database are freely available for local installation, and can be extended or adapted to 'non-Ensembl' data sets. PMID:14707178

  17. Omicseq: a web-based search engine for exploring omics datasets.

    PubMed

    Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S

    2017-07-03

    The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long-standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. GEMINI: a computationally-efficient search engine for large gene expression datasets.

    PubMed

    DeFreitas, Timothy; Saddiki, Hachem; Flaherty, Patrick

    2016-02-24

    Low-cost DNA sequencing allows organizations to accumulate massive amounts of genomic data and use that data to answer a diverse range of research questions. Presently, users must search for relevant genomic data using a keyword, accession number, or meta-data tag. However, in this search paradigm the form of the query - a text-based string - is mismatched with the form of the target - a genomic profile. To improve access to massive genomic data resources, we have developed a fast search engine, GEMINI, that uses a genomic profile as a query to search for similar genomic profiles. GEMINI implements a nearest-neighbor search algorithm using a vantage-point tree to store a database of n profiles and in certain circumstances achieves an O(log n) expected query time in the limit. We tested GEMINI on breast and ovarian cancer gene expression data from The Cancer Genome Atlas project and show that it achieves a query time that scales as the logarithm of the number of records in practice on genomic data. In a database with 10^5 samples, GEMINI identifies the nearest neighbor in 0.05 sec compared to a brute force search time of 0.6 sec. GEMINI is a fast search engine that uses a query genomic profile to search for similar profiles in a very large genomic database. It enables users to identify similar profiles independent of sample label, data origin or other meta-data information.
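
    The metric-tree idea behind GEMINI can be illustrated with a related structure; the sketch below uses scikit-learn's BallTree (not the paper's vantage-point tree implementation) to find the stored profile nearest to a query that is itself a genomic profile rather than a text keyword. The data are randomly generated and purely hypothetical.

    ```python
    import numpy as np
    from sklearn.neighbors import BallTree

    # Illustrative metric-tree search in the same spirit as GEMINI's
    # vantage-point tree: the query is a profile, not a keyword.
    rng = np.random.default_rng(1)
    profiles = rng.normal(size=(100_000, 50))     # hypothetical expression profiles
    tree = BallTree(profiles, metric="euclidean")

    # query with a slightly perturbed copy of a stored profile
    query_profile = profiles[123] + rng.normal(scale=0.01, size=50)
    dist, idx = tree.query(query_profile.reshape(1, -1), k=1)
    print("nearest profile:", idx[0, 0], "distance:", round(float(dist[0, 0]), 4))
    ```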

  19. Improving integrative searching of systems chemical biology data using semantic annotation.

    PubMed

    Chen, Bin; Ding, Ying; Wild, David J

    2012-03-08

    Systems chemical biology and chemogenomics are considered critical, integrative disciplines in modern biomedical research, but require data mining of large, integrated, heterogeneous datasets from chemistry and biology. We previously developed an RDF-based resource called Chem2Bio2RDF that enabled querying of such data using the SPARQL query language. Whilst this work has proved useful in its own right as one of the first major resources in these disciplines, its utility could be greatly improved by the application of an ontology for annotation of the nodes and edges in the RDF graph, enabling a much richer range of semantic queries to be issued. We developed a generalized chemogenomics and systems chemical biology OWL ontology called Chem2Bio2OWL that describes the semantics of chemical compounds, drugs, protein targets, pathways, genes, diseases and side-effects, and the relationships between them. The ontology also includes data provenance. We used it to annotate our Chem2Bio2RDF dataset, making it a rich semantic resource. Through a series of scientific case studies we demonstrate how this (i) simplifies the process of building SPARQL queries, (ii) enables useful new kinds of queries on the data and (iii) makes possible intelligent reasoning and semantic graph mining in chemogenomics and systems chemical biology. Chem2Bio2OWL is available at http://chem2bio2rdf.org/owl. The document is available at http://chem2bio2owl.wikispaces.com.

  20. Modeling and query the uncertainty of network constrained moving objects based on RFID data

    NASA Astrophysics Data System (ADS)

    Han, Liang; Xie, Kunqing; Ma, Xiujun; Song, Guojie

    2007-06-01

    The management of network constrained moving objects is more and more practical, especially in intelligent transportation systems. In the past, the location information of moving objects on a network was collected by GPS, which is costly and raises problems of frequent updates and privacy. RFID (Radio Frequency IDentification) devices are used more and more widely to collect location information. They are cheaper, require fewer updates, and interfere less with privacy. They detect the id of an object and the time when the moving object passed a node of the network. They do not detect the object's exact movement inside an edge, which leads to a problem of uncertainty. How to model and query the uncertainty of network constrained moving objects based on RFID data therefore becomes a research issue. In this paper, a model is proposed to describe the uncertainty of network constrained moving objects. A two-level index is presented to provide efficient access to the network and the movement data. The processing of imprecise time-slice queries and spatio-temporal range queries is studied in this paper. The processing includes four steps: spatial filter, spatial refinement, temporal filter and probability calculation. Finally, some experiments are done based on simulated data. In the experiments the performance of the index is studied. The precision and recall of the result set are defined, and how the query arguments affect the precision and recall of the result set is also discussed.

  1. Distributed Data Service for Data Management in Internet of Things Middleware

    PubMed Central

    Cruz Huacarpuma, Ruben; de Sousa Junior, Rafael Timoteo; de Holanda, Maristela Terto; de Oliveira Albuquerque, Robson; García Villalba, Luis Javier; Kim, Tai-Hoon

    2017-01-01

    The development of the Internet of Things (IoT) is closely related to a considerable increase in the number and variety of devices connected to the Internet. Sensors have become a regular component of our environment, as well as smart phones and other devices that continuously collect data about our lives even without our intervention. With such connected devices, a broad range of applications has been developed and deployed, including those dealing with massive volumes of data. In this paper, we introduce a Distributed Data Service (DDS) to collect and process data for IoT environments. One central goal of this DDS is to enable multiple and distinct IoT middleware systems to share common data services from a loosely-coupled provider. In this context, we propose a new specification of functionalities for a DDS and the conception of the corresponding techniques for collecting, filtering and storing data conveniently and efficiently in this environment. Another contribution is a data aggregation component that is proposed to support efficient real-time data querying. To validate its data collecting and querying functionalities and performance, the proposed DDS is evaluated in two case studies regarding a simulated smart home system, the first case devoted to evaluating data collection and aggregation when the DDS is interacting with the UIoT middleware, and the second aimed at comparing the DDS data collection with this same functionality implemented within the Kaa middleware. PMID:28448469

  2. Query by Browsing: An Alternative Hypertext Information Retrieval Method

    PubMed Central

    Frisse, Mark E.; Cousins, Steve B.

    1989-01-01

    In this paper we discuss our efforts to develop programs which enhance the ability to navigate through large medical hypertext systems. Our approach organizes hypertext index terms into a belief network and uses reader feedback to update the degree of belief in the index terms' utility to a query. We begin by describing various possible configurations for indexes to hypertext. We then describe how belief network calculations can be applied to these indexes. After a brief discussion of early results using manuscripts from a medical handbook, we close with an analysis of our approach's applicability to a wider range of hypertext information retrieval problems.

  3. Semantic Web Applications and Tools for the Life Sciences: SWAT4LS 2010

    PubMed Central

    2012-01-01

    As Semantic Web technologies mature and new releases of key elements, such as SPARQL 1.1 and OWL 2.0, become available, the Life Sciences continue to push the boundaries of these technologies with ever more sophisticated tools and applications. Unsurprisingly, therefore, interest in the SWAT4LS (Semantic Web Applications and Tools for the Life Sciences) activities has remained high, as was evident during the third international SWAT4LS workshop held in Berlin in December 2010. Contributors to this workshop were invited to submit extended versions of their papers, the best of which are now made available in the special supplement of BMC Bioinformatics. The papers reflect the wide range of work in this area, covering the storage and querying of Life Sciences data in RDF triple stores, tools for the development of biomedical ontologies, and the semantics-based integration of Life Sciences as well as clinical data. PMID:22373274

  4. Semantic Web applications and tools for the life sciences: SWAT4LS 2010.

    PubMed

    Burger, Albert; Paschke, Adrian; Romano, Paolo; Marshall, M Scott; Splendiani, Andrea

    2012-01-25

    As Semantic Web technologies mature and new releases of key elements, such as SPARQL 1.1 and OWL 2.0, become available, the Life Sciences continue to push the boundaries of these technologies with ever more sophisticated tools and applications. Unsurprisingly, therefore, interest in the SWAT4LS (Semantic Web Applications and Tools for the Life Sciences) activities has remained high, as was evident during the third international SWAT4LS workshop held in Berlin in December 2010. Contributors to this workshop were invited to submit extended versions of their papers, the best of which are now made available in the special supplement of BMC Bioinformatics. The papers reflect the wide range of work in this area, covering the storage and querying of Life Sciences data in RDF triple stores, tools for the development of biomedical ontologies, and the semantics-based integration of Life Sciences as well as clinical data.

  5. U.S. Adults with Agricultural Experience Report More Genetic Engineering Familiarity than Those Without

    ERIC Educational Resources Information Center

    Stofer, Kathryn A.; Schiebel, Tracee M.

    2017-01-01

    Researchers and pollsters still debate the acceptance of genetic engineering technology among U.S. adults, and continue to assess their knowledge as part of this research. While decision-making may not rely entirely on knowledge, querying opinions and perceptions rely on public understanding of genetic engineering terms. Experience with…

  6. Word Production Inconsistency of Singaporean-English-Speaking Adolescents with Down Syndrome

    ERIC Educational Resources Information Center

    Wong, Betty; Brebner, Chris; McCormack, Paul; Butcher, Andy

    2015-01-01

    Background: The nature of speech disorders in individuals with Down Syndrome (DS) remains controversial despite various explanations put forth in the literature to account for the observed speech profiles. A high level of word production inconsistency in children with DS has led researchers to query whether the inconsistency continues into…

  7. Foundations for Streaming Model Transformations by Complex Event Processing.

    PubMed

    Dávid, István; Ráth, István; Varró, Dániel

    2018-01-01

    Streaming model transformations represent a novel class of transformations to manipulate models whose elements are continuously produced or modified in high volume and with rapid rate of change. Executing streaming transformations requires efficient techniques to recognize activated transformation rules over a live model and a potentially infinite stream of events. In this paper, we propose foundations of streaming model transformations by innovatively integrating incremental model query, complex event processing (CEP) and reactive (event-driven) transformation techniques. Complex event processing allows to identify relevant patterns and sequences of events over an event stream. Our approach enables event streams to include model change events which are automatically and continuously populated by incremental model queries. Furthermore, a reactive rule engine carries out transformations on identified complex event patterns. We provide an integrated domain-specific language with precise semantics for capturing complex event patterns and streaming transformations together with an execution engine, all of which is now part of the Viatra reactive transformation framework. We demonstrate the feasibility of our approach with two case studies: one in an advanced model engineering workflow; and one in the context of on-the-fly gesture recognition.
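
    The complex-event-processing ingredient can be illustrated with a tiny pattern matcher that reports every occurrence of one model-change event type followed by another within a bounded time window; a reactive transformation would then fire on each reported match. The event type names below are hypothetical and this is not the VIATRA pattern language.

    ```python
    from collections import deque

    def detect_sequence(stream, first_type, second_type, window):
        """Tiny complex-event-processing sketch: report a match whenever an event
        of first_type is followed by an event of second_type within `window` time
        units. Hypothetical event types; not the VIATRA pattern language."""
        pending = deque()          # timestamps of unmatched first_type events
        matches = []
        for timestamp, event_type in stream:
            while pending and timestamp - pending[0] > window:
                pending.popleft()                  # expire stale partial matches
            if event_type == first_type:
                pending.append(timestamp)
            elif event_type == second_type and pending:
                matches.append((pending.popleft(), timestamp))
        return matches

    # hypothetical stream of model-change events: (timestamp, type)
    stream = [(1, "ELEMENT_CREATED"), (2, "REFERENCE_SET"),
              (9, "ELEMENT_CREATED"), (20, "REFERENCE_SET")]
    print(detect_sequence(stream, "ELEMENT_CREATED", "REFERENCE_SET", window=5))  # [(1, 2)]
    ```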

  8. A Big Spatial Data Processing Framework Applying to National Geographic Conditions Monitoring

    NASA Astrophysics Data System (ADS)

    Xiao, F.

    2018-04-01

    In this paper, a novel framework for spatial data processing is proposed, which applies to the National Geographic Conditions Monitoring project of China. It comprises 4 layers: spatial data storage, spatial RDDs, spatial operations, and a spatial query language. The spatial data storage layer uses HDFS to store large volumes of spatial vector/raster data in the distributed cluster. The spatial RDDs are the abstract logical datasets of spatial data types, and can be transferred to the Spark cluster to run Spark transformations and actions. The spatial operations layer is a series of processing operations on spatial RDDs, such as range query, k nearest neighbor and spatial join. The spatial query language is a user-friendly interface which provides people unfamiliar with Spark with a convenient way to invoke the spatial operations. Compared with other spatial frameworks, this one is distinguished by bringing together a comprehensive set of technologies for big spatial data processing. Extensive experiments on real datasets show that the framework achieves better performance than traditional processing methods.
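
    As an illustration of the "spatial RDD plus range query" layer described above, the following hedged sketch runs a rectangular range query over an RDD of points, assuming a local PySpark installation; the coordinates and the query window are made up, and this is not the paper's framework.

        from pyspark import SparkContext

        sc = SparkContext("local[*]", "spatial-range-query-sketch")

        # An RDD of (lon, lat) points standing in for a spatial RDD of vector features.
        points = sc.parallelize([(116.39, 39.91), (121.47, 31.23), (113.26, 23.13)])

        # Range query: keep only the points inside a rectangular query window.
        xmin, ymin, xmax, ymax = 110.0, 20.0, 120.0, 40.0
        hits = points.filter(lambda p: xmin <= p[0] <= xmax and ymin <= p[1] <= ymax)

        print(hits.collect())
        sc.stop()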

  9. UCSC genome browser: deep support for molecular biomedical research.

    PubMed

    Mangan, Mary E; Williams, Jennifer M; Lathe, Scott M; Karolchik, Donna; Lathe, Warren C

    2008-01-01

    The volume and complexity of genomic sequence data, and the additional experimental data required for annotation of the genomic context, pose a major challenge for display and access for biomedical researchers. Genome browsers organize this data and make it available in various ways to extract useful information to advance research projects. The UCSC Genome Browser is one of these resources. The official sequence data for a given species forms the framework to display many other types of data such as expression, variation, cross-species comparisons, and more. Visual representations of the data are available for exploration. Data can be queried with sequences. Complex database queries are also easily achieved with the Table Browser interface. Associated tools permit additional query types or access to additional data sources such as images of in situ localizations. Support for solving researchers' issues is provided through active discussion mailing lists and updated training materials. The UCSC Genome Browser provides a source of deep support for a wide range of biomedical molecular research (http://genome.ucsc.edu).

  10. Route Generation for a Synthetic Character (BOT) Using a Partial or Incomplete Knowledge Route Generation Algorithm in UT2004 Virtual Environment

    NASA Technical Reports Server (NTRS)

    Hanold, Gregg T.; Hanold, David T.

    2010-01-01

    This paper presents a new Route Generation Algorithm that accurately and realistically represents human route planning and navigation for Military Operations in Urban Terrain (MOUT). The accuracy of this algorithm in representing human behavior is measured using the Unreal Tournament(Trademark) 2004 (UT2004) Game Engine to provide the simulation environment in which the differences between the routes taken by the human player and those of a Synthetic Agent (BOT) executing the A-star algorithm and the new Route Generation Algorithm can be compared. The new Route Generation Algorithm computes the BOT route based on partial or incomplete knowledge received from the UT2004 game engine during game play. To allow BOT navigation to occur continuously throughout the game play with incomplete knowledge of the terrain, a spatial network model of the UT2004 MOUT terrain is captured and stored in an Oracle 11g Spatial Data Object (SDO). The SDO allows a partial data query to be executed to generate continuous route updates based on the terrain knowledge, and stored dynamic BOT, Player and environmental parameters returned by the query. The partial data query permits the dynamic adjustment of the planned routes by the Route Generation Algorithm based on the current state of the environment during a simulation. The dynamic nature of this algorithm more accurately allows the BOT to mimic the routes taken by the human executing under the same conditions, thereby improving the realism of the BOT in a MOUT simulation environment.

  11. PIBAS FedSPARQL: a web-based platform for integration and exploration of bioinformatics datasets.

    PubMed

    Djokic-Petrovic, Marija; Cvjetkovic, Vladimir; Yang, Jeremy; Zivanovic, Marko; Wild, David J

    2017-09-20

    There is a huge variety of data sources relevant to chemical, biological and pharmacological research, but these data sources are highly siloed and cannot be queried together in a straightforward way. Semantic technologies offer the ability to create links and mappings across datasets and manage them as a single, linked network so that searching can be carried out across datasets, independently of the source. We have developed an application called PIBAS FedSPARQL that uses semantic technologies to allow researchers to carry out such searching across a vast array of data sources. PIBAS FedSPARQL is a web-based query builder and result set visualizer of bioinformatics data. As an advanced feature, our system can detect similar data items identified by different Uniform Resource Identifiers (URIs), using a text-mining algorithm based on the processing of named entities in a vector space model with cosine similarity measures. To our knowledge, PIBAS FedSPARQL was unique among the systems that we found in that it allows the detection of similar data items. As a query builder, our system allows researchers to intuitively construct and run Federated SPARQL queries across multiple data sources, including global initiatives, such as Bio2RDF, Chem2Bio2RDF, EMBL-EBI, and one local initiative called CPCTAS, as well as additional user-specified data sources. From the input topic, subtopic, template and keyword, a corresponding initial Federated SPARQL query is created and executed. Based on the data obtained, end users have the ability to choose the most appropriate data sources in their area of interest and exploit their Resource Description Framework (RDF) structure, which allows users to select certain properties of data to enhance query results. The developed system is flexible and allows intuitive creation and execution of queries for an extensive range of bioinformatics topics. Also, the novel "similar data items detection" algorithm can be particularly useful for suggesting new data sources and cost optimization for new experiments. PIBAS FedSPARQL can be expanded with new topics, subtopics and templates on demand, rendering information retrieval more robust.
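
    The following hedged sketch shows the general shape of a federated SPARQL 1.1 query of the kind such a query builder might generate, issued with the SPARQLWrapper Python library; the endpoint URLs, prefixes and triple patterns are placeholders, not the actual schemas of Bio2RDF, Chem2Bio2RDF or CPCTAS.

        from SPARQLWrapper import SPARQLWrapper, JSON

        LOCAL_ENDPOINT = "http://example.org/local/sparql"      # hypothetical endpoint
        REMOTE_ENDPOINT = "http://example.org/bio2rdf/sparql"   # hypothetical endpoint

        sparql = SPARQLWrapper(LOCAL_ENDPOINT)
        sparql.setQuery("""
            PREFIX ex: <http://example.org/vocab/>
            SELECT ?compound ?target WHERE {
                ?compound ex:testedIn ?assay .      # pattern answered by the local source
                SERVICE <%s> {                      # pattern answered by the remote source
                    ?compound ex:interactsWith ?target .
                }
            }
            LIMIT 10
        """ % REMOTE_ENDPOINT)
        sparql.setReturnFormat(JSON)

        results = sparql.query().convert()
        for row in results["results"]["bindings"]:
            print(row["compound"]["value"], row["target"]["value"])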

  12. Query Auto-Completion Based on Word2vec Semantic Similarity

    NASA Astrophysics Data System (ADS)

    Shao, Taihua; Chen, Honghui; Chen, Wanyu

    2018-04-01

    Query auto-completion (QAC) is the first step of information retrieval, which helps users formulate the entire query after inputting only a few prefix characters. Regarding models of QAC, the traditional method ignores the contribution of the semantic relevance between queries. However, similar queries always express extremely similar search intention. In this paper, we propose a hybrid model FS-QAC based on query semantic similarity as well as query frequency. We choose the word2vec method to measure the semantic similarity between intended queries and pre-submitted queries. By combining both features, our experiments show that the FS-QAC model improves performance when predicting the user’s query intention and helping formulate the right query. Our experimental results show that the optimal hybrid model contributes to a 7.54% improvement in terms of MRR against a state-of-the-art baseline using the public AOL query logs.
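
    A minimal sketch of the hybrid idea, combining query-log frequency with word2vec semantic similarity via the gensim library, is shown below; the tiny corpus, the candidate list and the linear weighting are illustrative assumptions and do not reproduce the paper's FS-QAC formula.

        from gensim.models import Word2Vec

        # Toy query log used to train a tiny word2vec model (illustrative only).
        log = [["heart", "attack", "symptoms"],
               ["heart", "disease", "treatment"],
               ["heart", "rate", "monitor"]]
        model = Word2Vec(log, vector_size=50, min_count=1, seed=1)

        freq = {"heart attack symptoms": 120, "heart disease treatment": 80,
                "heart rate monitor": 40}
        max_freq = max(freq.values())

        def score(prefix_word, candidate, lam=0.5):
            """Hybrid score: lam * normalized frequency + (1 - lam) * semantic similarity."""
            last_word = candidate.split()[1]        # crude proxy for the candidate's semantics
            sim = model.wv.similarity(prefix_word, last_word)
            return lam * freq[candidate] / max_freq + (1 - lam) * sim

        for cand in sorted(freq, key=lambda c: score("heart", c), reverse=True):
            print(cand, round(score("heart", cand), 3))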

  13. EquiX-A Search and Query Language for XML.

    ERIC Educational Resources Information Center

    Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander

    2002-01-01

    Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)

  14. The EarthServer Federation: State, Role, and Contribution to GEOSS

    NASA Astrophysics Data System (ADS)

    Merticariu, Vlad; Baumann, Peter

    2016-04-01

    The intercontinental EarthServer initiative has established a European datacube platform with proven scalability: known databases exceed 100 TB, and single queries have been split across more than 1,000 cloud nodes. Since its service interface is rigorously based on the OGC "Big Geo Data" standards, Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS), a series of clients can dock into the services, ranging from open-source OpenLayers and QGIS through open-source NASA WorldWind to proprietary ESRI ArcGIS. Datacube fusion in a "mix and match" style is supported by the platform technology, the rasdaman Array Database System, which transparently federates queries so that users simply approach any node of the federation to access any data item, internally optimized for minimal data transfer. Notably, rasdaman is part of GEOSS GCI. NASA is contributing its Web WorldWind virtual globe for user-friendly data extraction, navigation, and analysis. Integrated datacube / metadata queries are contributed by CITE. Current federation members include ESA (managed by MEEO sr.l.), Plymouth Marine Laboratory (PML), the European Centre for Medium-Range Weather Forecast (ECMWF), Australia's National Computational Infrastructure, and Jacobs University (adding in Planetary Science). Further data centers have expressed interest in joining. We present the EarthServer approach, discuss its underlying technology, and illustrate the contribution this datacube platform can make to GEOSS.

  15. Spatial and symbolic queries for 3D image data

    NASA Astrophysics Data System (ADS)

    Benson, Daniel C.; Zick, Gregory L.

    1992-04-01

    We present a query system for an object-oriented biomedical imaging database containing 3-D anatomical structures and their corresponding 2-D images. The graphical interface facilitates the formation of spatial queries, nonspatial or symbolic queries, and combined spatial/symbolic queries. A query editor is used for the creation and manipulation of 3-D query objects as volumes, surfaces, lines, and points. Symbolic predicates are formulated through a combination of text fields and multiple choice selections. Query results, which may include images, image contents, composite objects, graphics, and alphanumeric data, are displayed in multiple views. Objects returned by the query may be selected directly within the views for further inspection or modification, or for use as query objects in subsequent queries. Our image database query system provides visual feedback and manipulation of spatial query objects, multiple views of volume data, and the ability to combine spatial and symbolic queries. The system allows for incremental enhancement of existing objects and the addition of new objects and spatial relationships. The query system is designed for databases containing symbolic and spatial data. This paper discusses its application to data acquired in biomedical 3-D image reconstruction, but it is applicable to other areas such as CAD/CAM, geographical information systems, and computer vision.

  16. GenoQuery: a new querying module for functional annotation in a genomic warehouse

    PubMed Central

    Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine

    2008-01-01

    Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improve functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731

  17. SPARK: Adapting Keyword Query to Semantic Search

    NASA Astrophysics Data System (ADS)

    Zhou, Qi; Wang, Chong; Xiong, Miao; Wang, Haofen; Yu, Yong

    Semantic search promises to provide more accurate results than present-day keyword search. However, progress with semantic search has been delayed due to the complexity of its query languages. In this paper, we explore a novel approach to adapting keywords to querying the semantic web: the approach automatically translates keyword queries into formal logic queries so that end users can use familiar keywords to perform semantic search. A prototype system named 'SPARK' has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In the experiment, SPARK achieved an encouraging translation result.

  18. Searching for rare diseases in PubMed: a blind comparison of Orphanet expert query and query based on terminological knowledge.

    PubMed

    Griffon, N; Schuers, M; Dhombres, F; Merabti, T; Kerdelhué, G; Rollin, L; Darmoni, S J

    2016-08-02

    Despite international initiatives like Orphanet, it remains difficult to find up-to-date information about rare diseases. The aim of this study is to propose an exhaustive set of queries for PubMed based on terminological knowledge and to evaluate it versus the queries based on expertise provided by the most frequently used resource in Europe: Orphanet. Four rare disease terminologies (MeSH, OMIM, HPO and HRDO) were manually mapped to each other permitting the automatic creation of expanded terminological queries for rare diseases. For 30 rare diseases, 30 citations retrieved by Orphanet expert query and/or query based on terminological knowledge were assessed for relevance by two independent reviewers unaware of the query's origin. An adjudication procedure was used to resolve any discrepancy. Precision, relative recall and F-measure were all computed. For each Orphanet rare disease (n = 8982), there was a corresponding terminological query, in contrast with only 2284 queries provided by Orphanet. Only 553 citations were evaluated due to queries with 0 or only a few hits. There were no significant differences between the Orpha query and terminological query in terms of precision, respectively 0.61 vs 0.52 (p = 0.13). Nevertheless, terminological queries retrieved more citations more often than Orpha queries (0.57 vs. 0.33; p = 0.01). Interestingly, Orpha queries seemed to retrieve older citations than terminological queries (p < 0.0001). The terminological queries proposed in this study are now available for all rare diseases. They may be a useful tool for both precision- and recall-oriented literature searches.
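
    For reference, the following small Python sketch shows how precision, relative recall and F-measure for a query can be computed from pooled relevance judgments; the citation identifiers and judgments are made up.

        def precision_recall_f(retrieved, relevant_pool):
            """Precision, relative recall and F-measure against a pooled relevant set."""
            relevant_retrieved = retrieved & relevant_pool
            precision = len(relevant_retrieved) / len(retrieved) if retrieved else 0.0
            recall = len(relevant_retrieved) / len(relevant_pool) if relevant_pool else 0.0
            f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
            return precision, recall, f

        orpha_hits = {"pmid1", "pmid2", "pmid5"}            # citations retrieved by one query
        termino_hits = {"pmid2", "pmid3", "pmid4", "pmid6"}
        relevant = {"pmid2", "pmid3", "pmid5"}              # adjudicated relevant citations

        print("Orphanet query:       P=%.2f R=%.2f F=%.2f" % precision_recall_f(orpha_hits, relevant))
        print("Terminological query: P=%.2f R=%.2f F=%.2f" % precision_recall_f(termino_hits, relevant))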

  19. Visual exploration of big spatio-temporal urban data: a study of New York City taxi trips.

    PubMed

    Ferreira, Nivan; Poco, Jorge; Vo, Huy T; Freire, Juliana; Silva, Cláudio T

    2013-12-01

    As increasing volumes of urban data are captured and become available, new opportunities arise for data-driven analysis that can lead to improvements in the lives of citizens through evidence-based decision making and policies. In this paper, we focus on a particularly important urban data set: taxi trips. Taxis are valuable sensors, and information associated with taxi trips can provide unprecedented insight into many different aspects of city life, from economic activity and human behavior to mobility patterns. But analyzing these data presents many challenges. The data are complex, containing geographical and temporal components in addition to multiple variables associated with each trip. Consequently, it is hard to specify exploratory queries and to perform comparative analyses (e.g., compare different regions over time). This problem is compounded by the size of the data: there are on average 500,000 taxi trips each day in NYC. We propose a new model that allows users to visually query taxi trips. Besides standard analytics queries, the model supports origin-destination queries that enable the study of mobility across the city. We show that this model is able to express a wide range of spatio-temporal queries, and it is also flexible in that not only can queries be composed but also different aggregations and visual representations can be applied, allowing users to explore and compare results. We have built a scalable system that implements this model which supports interactive response times; makes use of an adaptive level-of-detail rendering strategy to generate clutter-free visualization for large results; and shows hidden details to the users in a summary through the use of overlay heat maps. We present a series of case studies motivated by traffic engineers and economists that show how our model and system enable domain experts to perform tasks that were previously unattainable for them.

  20. Seasonality in seeking mental health information on Google.

    PubMed

    Ayers, John W; Althouse, Benjamin M; Allem, Jon-Patrick; Rosenquist, J Niels; Ford, Daniel E

    2013-05-01

    Population mental health surveillance is an important challenge limited by resource constraints, long time lags in data collection, and stigma. One promising approach to bridge similar gaps elsewhere has been the use of passively generated digital data. This article assesses the viability of aggregate Internet search queries for real-time monitoring of several mental health problems, specifically in regard to seasonal patterns of seeking out mental health information. All Google mental health queries were monitored in the U.S. and Australia from 2006 to 2010. Additionally, queries were subdivided among those including the terms ADHD (attention deficit-hyperactivity disorder); anxiety; bipolar; depression; anorexia or bulimia (eating disorders); OCD (obsessive-compulsive disorder); schizophrenia; and suicide. A wavelet phase analysis was used to isolate seasonal components in the trends, and based on this model, the mean search volume in winter was compared with that in summer, as performed in 2012. All mental health queries followed seasonal patterns with winter peaks and summer troughs amounting to a 14% (95% CI=11%, 16%) difference in volume for the U.S. and 11% (95% CI=7%, 15%) for Australia. These patterns also were evident for all specific subcategories of illness or problem. For instance, seasonal differences ranged from 7% (95% CI=5%, 10%) for anxiety (followed by OCD, bipolar, depression, suicide, ADHD, schizophrenia) to 37% (95% CI=31%, 44%) for eating disorder queries in the U.S. Several nonclinical motivators for query seasonality (such as media trends or academic interest) were explored and rejected. Information seeking on Google across all major mental illnesses and/or problems followed seasonal patterns similar to those found for seasonal affective disorder. These are the first data published on patterns of seasonality in information seeking encompassing all the major mental illnesses, notable also because they likely would have gone undetected using traditional surveillance. Copyright © 2013. Published by Elsevier Inc.
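
    A simplified sketch of the winter-versus-summer comparison (not the wavelet phase analysis used in the study), run on a synthetic monthly query-volume series, is shown below.

        import numpy as np

        months = np.arange(48)                                   # four years of monthly data
        rng = np.random.default_rng(0)
        # Synthetic series with a winter peak (month 0 taken as January) plus noise.
        volume = 100 + 10 * np.cos(2 * np.pi * months / 12) + rng.normal(0, 2, months.size)

        winter = volume[np.isin(months % 12, [11, 0, 1])]        # Dec, Jan, Feb
        summer = volume[np.isin(months % 12, [5, 6, 7])]         # Jun, Jul, Aug

        difference_pct = 100 * (winter.mean() - summer.mean()) / summer.mean()
        print(f"winter search volume exceeds summer volume by {difference_pct:.1f}%")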

  1. SPARQL Query Re-writing Using Partonomy Based Transformation Rules

    NASA Astrophysics Data System (ADS)

    Jain, Prateek; Yeh, Peter Z.; Verma, Kunal; Henson, Cory A.; Sheth, Amit P.

    Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. For querying ontologies containing spatial information, the precise relationships between spatial entities have to be specified in the basic graph pattern of a SPARQL query, which can result in long and complex queries. We present a novel approach to help users intuitively write SPARQL queries to query spatial data, rather than relying on knowledge of the ontology structure. Our framework re-writes queries, using transformation rules to exploit part-whole relations between geographical entities to address the mismatches between query constraints and the knowledge base. Our experiments were performed on completely third party datasets and queries. Evaluations were performed on the Geonames dataset using questions from the National Geographic Bee serialized into SPARQL, and on the British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in retrieval of results and ease in writing queries.
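
    The following toy sketch (not the authors' rule engine) illustrates the partonomy-based re-writing idea: a direct containment constraint is relaxed into a SPARQL 1.1 property path over part-of relations. The prefixes and predicates are placeholders.

        ORIGINAL_QUERY = """
        PREFIX geo: <http://example.org/geo#>
        SELECT ?city WHERE {
            ?city geo:locatedIn geo:Europe .
        }
        """

        def rewrite_with_partonomy(query: str) -> str:
            # Rule: "X locatedIn Region" is relaxed to "X locatedIn Y . Y partOf* Region"
            return query.replace(
                "?city geo:locatedIn geo:Europe .",
                "?city geo:locatedIn ?area . ?area geo:partOf* geo:Europe .",
            )

        print(rewrite_with_partonomy(ORIGINAL_QUERY))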

  2. Targeting and Structuring Information Resource Use: A Path toward Informed Clinical Decisions

    ERIC Educational Resources Information Center

    Mangrulkar, Rajesh S.

    2004-01-01

    A core skill for all physicians to master is that of information manager. Despite a rapidly expanding set of electronic and print-based information resources, clinicians continue to answer their clinical queries predominantly through informal or formal consultation. Even as new tools are brought to market, the majority of them present information…

  3. Applying Semantic Web Concepts to Support Net-Centric Warfare Using the Tactical Assessment Markup Language (TAML)

    DTIC Science & Technology

    2006-06-01

    SPARQL SPARQL Protocol and RDF Query Language SQL Structured Query Language SUMO Suggested Upper Merged Ontology SW... Query optimization algorithms are implemented in the Pellet reasoner in order to ensure querying a knowledge base is efficient. These algorithms...memory as a treelike structure in order for the data to be queried. XML Query (XQuery) is the standard language used when querying XML

  4. Implementation of Quantum Private Queries Using Nuclear Magnetic Resonance

    NASA Astrophysics Data System (ADS)

    Wang, Chuan; Hao, Liang; Zhao, Lian-Jie

    2011-08-01

    We present a modified protocol for the realization of a quantum private query process on a classical database. Using a one-qubit query and a CNOT operation, the query process can be realized in a two-mode database. In the query process, the data privacy is preserved as the sender would not reveal any information about the database besides her query information, and the database provider cannot retain any information about the query. We implement the quantum private query protocol in a nuclear magnetic resonance system. The density matrices of the memory registers are constructed.

  5. A study of medical and health queries to web search engines.

    PubMed

    Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk

    2004-03-01

    This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i) comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii) comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii) medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i) a small percentage of web queries are medical or health related, (ii) the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii) over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggest some implications for the use of the general web search engines when seeking medical/health information.

  6. TOWARD AN ACCURATE ANALYSIS OF RANGE QUERIES ON SPATIAL DATA. (R825195)

    EPA Science Inventory

    The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Concl...

  7. RDF-GL: A SPARQL-Based Graphical Query Language for RDF

    NASA Astrophysics Data System (ADS)

    Hogenboom, Frederik; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay

    This chapter presents RDF-GL, a graphical query language (GQL) for RDF. The GQL is based on the textual query language SPARQL and mainly focuses on SPARQL SELECT queries. The advantage of a GQL over textual query languages is that complexity is hidden through the use of graphical symbols. RDF-GL is supported by a Java-based editor, SPARQLinG, which is presented as well. The editor not only allows for RDF-GL query creation, but also converts RDF-GL queries to SPARQL queries and can subsequently execute them. Experiments show that using the GQL in combination with the editor makes RDF querying more accessible for end users.

  8. Machine Learning Approaches in Cardiovascular Imaging.

    PubMed

    Henglin, Mir; Stein, Gillian; Hushcha, Pavel V; Snoek, Jasper; Wiltschko, Alexander B; Cheng, Susan

    2017-10-01

    Cardiovascular imaging technologies continue to increase in their capacity to capture and store large quantities of data. Modern computational methods, developed in the field of machine learning, offer new approaches to leveraging the growing volume of imaging data available for analyses. Machine learning methods can now address data-related problems ranging from simple analytic queries of existing measurement data to the more complex challenges involved in analyzing raw images. To date, machine learning has been used in 2 broad and highly interconnected areas: automation of tasks that might otherwise be performed by a human and generation of clinically important new knowledge. Most cardiovascular imaging studies have focused on task-oriented problems, but more studies involving algorithms aimed at generating new clinical insights are emerging. Continued expansion in the size and dimensionality of cardiovascular imaging databases is driving strong interest in applying powerful deep learning methods, in particular, to analyze these data. Overall, the most effective approaches will require an investment in the resources needed to appropriately prepare such large data sets for analyses. Notwithstanding current technical and logistical challenges, machine learning and especially deep learning methods have much to offer and will substantially impact the future practice and science of cardiovascular imaging. © 2017 American Heart Association, Inc.

  9. Outcomes of short-gap sensory nerve injuries reconstructed with processed nerve allografts from a multicenter registry study.

    PubMed

    Rinker, Brian D; Ingari, John V; Greenberg, Jeffrey A; Thayer, Wesley P; Safa, Bauback; Buncke, Gregory M

    2015-06-01

    Short-gap digital nerve injuries are a common surgical problem, but the optimal treatment modality is unknown. A multicenter database was queried and analyzed to determine the outcomes of nerve gap reconstructions between 5 and 15 mm with processed nerve allograft. The current RANGER registry is designed to continuously monitor and compile injury, repair, safety, and outcomes data. Centers followed their own standard of care for treatment and follow-up. The database was queried for digital nerve injuries with a gap between 5 and 15 mm reporting sufficient follow-up data to complete outcomes analysis. Available quantitative outcome measures were reviewed and reported. Meaningful recovery was defined by the Medical Research Council Classification (MRCC) scale at S3-S4 for sensory function. Sufficient follow-up data were available for 24 subjects (37 repairs) in the prescribed gap range. Mean age was 43 years (range, 23-81). Mean gap was 11 ± 3 (5-15) mm. Time to repair was 13 ± 42 (0-215) days. There were 25 lacerations, 8 avulsion/amputations, 2 gunshots, 1 crush injury, and 1 injury of unknown mechanism. Meaningful recovery, defined as S3-S4 on the MRCC scales, was reported in 92% of repairs. Sensory recovery of S3+ or S4 was observed in 84% of repairs. Static 2PD was 7.1 ± 2.9 mm (n = 19). Return to light touch was observed in 23 out of 32 repairs reporting Semmes-Weinstein monofilament outcomes (SWMF). There were no reported nerve adverse events. Sensory outcomes for processed nerve allografts were equivalent to historical controls for nerve autograft and exceeded those of conduit. Processed nerve allografts provide an effective solution for short-gap digital nerve reconstructions.

  10. An overview of the National Earthquake Information Center acquisition software system, Edge/Continuous Waveform Buffer

    USGS Publications Warehouse

    Patton, John M.; Ketchum, David C.; Guy, Michelle R.

    2015-11-02

    This document provides an overview of the capabilities, design, and use cases of the data acquisition and archiving subsystem at the U.S. Geological Survey National Earthquake Information Center. The Edge and Continuous Waveform Buffer software supports the National Earthquake Information Center’s worldwide earthquake monitoring mission in direct station data acquisition, data import, short- and long-term data archiving, data distribution, query services, and playback, among other capabilities. The software design and architecture can be configured to support acquisition and (or) archiving use cases. The software continues to be developed in order to expand the acquisition, storage, and distribution capabilities.

  11. Addressing the Challenges of Multi-Domain Data Integration with the SemantEco Framework

    NASA Astrophysics Data System (ADS)

    Patton, E. W.; Seyed, P.; McGuinness, D. L.

    2013-12-01

    Data integration across multiple domains will continue to be a challenge with the proliferation of big data in the sciences. Data origination issues and how data are manipulated are critical to enable scientists to understand and consume disparate datasets as research becomes more multidisciplinary. We present the SemantEco framework as an exemplar for designing an integrative portal for data discovery, exploration, and interpretation that uses best practice W3C Recommendations. We use the Resource Description Framework (RDF) with extensible ontologies described in the Web Ontology Language (OWL) to provide graph-based data representation. Furthermore, SemantEco ingests data via the software package csv2rdf4lod, which generates data provenance using the W3C provenance recommendation (PROV). Our presentation will discuss benefits and challenges of semantic integration, their effect on runtime performance, and how the SemantEco framework assisted in identifying performance issues and improved query performance across multiple domains by an order of magnitude. SemantEco benefits from a semantic approach that provides an 'open world', which allows data to incrementally change just as it does in the real world. SemantEco modules may load new ontologies and data using the W3C's SPARQL Protocol and RDF Query Language via HTTP. Modules may also provide user interface elements for applications and query capabilities to support new use cases. Modules can associate with domains, which are first-class objects in SemantEco. This enables SemantEco to perform integration and reasoning both within and across domains on module-provided data. The SemantEco framework has been used to construct a web portal for environmental and ecological data. The portal includes water and air quality data from the U.S. Geological Survey (USGS) and Environmental Protection Agency (EPA) and species observation counts for birds and fish from the Avian Knowledge Network and the Santa Barbara Long Term Ecological Research, respectively. We provide regulation ontologies using OWL2 datatype facets to detect out-of-range measurements for environmental standards set by the EPA, i.a. Users adjust queries using module-defined facets and a map presents the resulting measurement sites. Custom icons identify sites that violate regulations, making them easy to locate. Selecting a site gives the option of charting spatially proximate data from different domains over time. Our portal currently provides 1.6 billion triples of scientific data in RDF. We segment data by ZIP code and reasoning over 2157 measurements with our EPA regulation ontology that contains 131 regulations takes 2.5 seconds on a 2.4 GHz Intel Core 2 Quad with 8 GB of RAM. SemantEco's modular design and reasoning capabilities make it an exemplar for building multidisciplinary data integration tools that provide data access to scientists and the general population alike. Its provenance tracking provides accountability and its reasoning services can assist users in interpreting data. Future work includes support for geographical queries using the Open Geospatial Consortium's GeoSPARQL standard.
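
    As a simplified illustration of flagging out-of-range measurements (not SemantEco's OWL2-facet reasoning), the following sketch uses the rdflib library and a SPARQL FILTER; the namespace, property names and the 4.0 threshold are invented for illustration.

        from rdflib import Graph, Literal, Namespace

        EX = Namespace("http://example.org/water#")
        g = Graph()
        g.add((EX.site1, EX.arsenicValue, Literal(2.5)))
        g.add((EX.site2, EX.arsenicValue, Literal(6.2)))

        violations = g.query("""
            PREFIX ex: <http://example.org/water#>
            SELECT ?site ?value WHERE {
                ?site ex:arsenicValue ?value .
                FILTER (?value > 4.0)           # hypothetical regulation threshold
            }
        """)
        for site, value in violations:
            print(f"{site} violates the limit with value {value}")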

  12. Cumulative query method for influenza surveillance using search engine data.

    PubMed

    Seo, Dong-Woo; Jo, Min-Woo; Sohn, Chang Hwan; Shin, Soo-Yong; Lee, JaeHo; Yu, Maengsoo; Kim, Won Young; Lim, Kyoung Soo; Lee, Sang-Il

    2014-12-16

    Internet search queries have become an important data source in syndromic surveillance systems. However, there is currently no syndromic surveillance system using Internet search query data in South Korea. The objective of this study was to examine correlations between our cumulative query method and national influenza surveillance data. Our study was based on the local search engine, Daum (approximately 25% market share), and influenza-like illness (ILI) data from the Korea Centers for Disease Control and Prevention. A quota sampling survey was conducted with 200 participants to obtain popular queries. We divided the study period into two sets: Set 1 (the 2009/10 epidemiological year for development set 1 and 2010/11 for validation set 1) and Set 2 (2010/11 for development Set 2 and 2011/12 for validation Set 2). Pearson's correlation coefficients were calculated between the Daum data and the ILI data for the development set. We selected the combined queries for which the correlation coefficients were .7 or higher and listed them in descending order. Then, we created cumulative query methods, with n representing the number of combined queries accumulated in descending order of the correlation coefficient. In validation set 1, 13 cumulative query methods were applied, and 8 had higher correlation coefficients (min=.916, max=.943) than that of the highest single combined query. Further, 11 of 13 cumulative query methods had an r value of ≥.7, but 4 of 13 combined queries had an r value of ≥.7. In validation set 2, 8 of 15 cumulative query methods showed higher correlation coefficients (min=.975, max=.987) than that of the highest single combined query. All 15 cumulative query methods had an r value of ≥.7, but 6 of 15 combined queries had an r value of ≥.7. The cumulative query method showed relatively higher correlation with national influenza surveillance data than single combined queries in the development and validation sets.
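
    An illustrative sketch of the cumulative query idea on synthetic data (not the Daum/KCDC data): candidate query series are ranked by Pearson correlation with the ILI series, and cumulative sums of the top-n series are then evaluated.

        import numpy as np

        rng = np.random.default_rng(0)
        weeks = 52
        ili = 10 + 5 * np.sin(np.linspace(0, 2 * np.pi, weeks))          # reference ILI series
        queries = {f"q{i}": ili + rng.normal(0, i + 1, weeks) for i in range(5)}

        def pearson(a, b):
            return float(np.corrcoef(a, b)[0, 1])

        # Rank candidate queries by correlation with the ILI series, best first.
        ranked = sorted(queries, key=lambda q: pearson(queries[q], ili), reverse=True)

        # Cumulative query method n: sum of the n best-correlated series.
        for n in range(1, len(ranked) + 1):
            cumulative = sum(queries[q] for q in ranked[:n])
            print(n, ranked[:n], round(pearson(cumulative, ili), 3))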

  13. Retrieving high-resolution images over the Internet from an anatomical image database

    NASA Astrophysics Data System (ADS)

    Strupp-Adams, Annette; Henderson, Earl

    1999-12-01

    The Visible Human Data set is an important contribution to the national collection of anatomical images. To enhance the availability of these images, the National Library of Medicine has supported the design and development of a prototype object-oriented image database which imports, stores, and distributes high resolution anatomical images in both pixel and voxel formats. One of the key database modules is its client-server Internet interface. This Web interface provides a query engine with retrieval access to high-resolution anatomical images that range in size from 100KB for browser viewable rendered images, to 1GB for anatomical structures in voxel file formats. The Web query and retrieval client-server system is composed of applet GUIs, servlets, and RMI application modules which communicate with each other to allow users to query for specific anatomical structures, and retrieve image data as well as associated anatomical images from the database. Selected images can be downloaded individually as single files via HTTP or downloaded in batch-mode over the Internet to the user's machine through an applet that uses Netscape's Object Signing mechanism. The image database uses ObjectDesign's object-oriented DBMS, ObjectStore, which has a Java interface. The query and retrieval system has been tested with a Java-CDE window system, and on the x86 architecture using Windows NT 4.0. This paper describes the Java applet client search engine that queries the database; the Java client module that enables users to view anatomical images online; the Java application server interface to the database which organizes data returned to the user, and its distribution engine that allows users to download image files individually and/or in batch-mode.

  14. Analysis of Patent Databases Using VxInsight

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    BOYACK,KEVIN W.; WYLIE,BRIAN N.; DAVIDSON,GEORGE S.

    2000-12-12

    We present the application of a new knowledge visualization tool, VxInsight, to the mapping and analysis of patent databases. Patent data are mined and placed in a database, relationships between the patents are identified, primarily using the citation and classification structures, then the patents are clustered using a proprietary force-directed placement algorithm. Related patents cluster together to produce a 3-D landscape view of the tens of thousands of patents. The user can navigate the landscape by zooming into or out of regions of interest. Querying the underlying database places a colored marker on each patent matching the query. Automatically generated labels, showing landscape content, update continually upon zooming. Optionally, citation links between patents may be shown on the landscape. The combination of these features enables powerful analyses of patent databases.

  15. Interaction and Communication of Agents in Networks and Language Complexity Estimates

    NASA Technical Reports Server (NTRS)

    Smid, Jan; Obitko, Marek; Fisher, David; Truszkowski, Walt

    2004-01-01

    Knowledge acquisition and sharing are arguably the most critical activities of communicating agents. We report on our ongoing project featuring knowledge acquisition and sharing among communicating agents embedded in a network. The applications we target range from hardware robots to virtual entities such as internet agents. Agent experiments can be simulated using a convenient simulation language. We analyzed the complexity of communicating agent simulations using Java and Easel. Scenarios we have studied are listed below. The communication among agents can range from declarative queries to sub-natural language queries. 1) A set of agents monitoring an object are asked to build activity profiles based on exchanging elementary observations; 2) A set of car drivers form a line, where every car is following its predecessor. An unsafe distance can create a strong wave in the line. Individual agents are asked to incorporate and apply directions on how to avoid the wave. 3) A set of micro-vehicles form a grid and are asked to propagate information and concepts to a central server.

  16. Objective and automated protocols for the evaluation of biomedical search engines using No Title Evaluation protocols.

    PubMed

    Campagne, Fabien

    2008-02-29

    The evaluation of information retrieval techniques has traditionally relied on human judges to determine which documents are relevant to a query and which are not. This protocol is used in the Text Retrieval Evaluation Conference (TREC), organized annually for the past 15 years, to support the unbiased evaluation of novel information retrieval approaches. The TREC Genomics Track has recently been introduced to measure the performance of information retrieval for biomedical applications. We describe two protocols for evaluating biomedical information retrieval techniques without human relevance judgments. We call these protocols No Title Evaluation (NT Evaluation). The first protocol measures performance for focused searches, where only one relevant document exists for each query. The second protocol measures performance for queries expected to have potentially many relevant documents per query (high-recall searches). Both protocols take advantage of the clear separation of titles and abstracts found in Medline. We compare the performance obtained with these evaluation protocols to results obtained by reusing the relevance judgments produced in the 2004 and 2005 TREC Genomics Track and observe significant correlations between performance rankings generated by our approach and TREC. Spearman's correlation coefficients in the range of 0.79-0.92 are observed comparing bpref measured with NT Evaluation or with TREC evaluations. For comparison, coefficients in the range 0.86-0.94 can be observed when evaluating the same set of methods with data from two independent TREC Genomics Track evaluations. We discuss the advantages of NT Evaluation over the TRels and the data fusion evaluation protocols introduced recently. Our results suggest that the NT Evaluation protocols described here could be used to optimize some search engine parameters before human evaluation. Further research is needed to determine if NT Evaluation or variants of these protocols can fully substitute for human evaluations.
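
    For reference, a Spearman rank correlation between two performance rankings, of the kind reported above, can be computed as in the following sketch; the bpref scores below are made up.

        from scipy.stats import spearmanr

        # bpref of the same six retrieval methods under two evaluation protocols.
        nt_eval = [0.42, 0.38, 0.55, 0.61, 0.29, 0.47]
        trec_eval = [0.40, 0.35, 0.58, 0.60, 0.31, 0.44]

        rho, p_value = spearmanr(nt_eval, trec_eval)
        print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")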

  17. Using Generalized Annotated Programs to Solve Social Network Diffusion Optimization Problems

    DTIC Science & Technology

    2013-01-01

    as follows: —Let kall be the k value for the SNDOP-ALL query and for each SNDOP query i, let ki be the k for that query. For each query i, set ki... kall − 1. —Number each element of vi ∈ V such that gI(vi) and V C(vi) are true. For the ith SNDOP query, let vi be the corresponding element of V —Let...vertices of S. PROOF. We set up |V | SNDOP-queries as follows: —Let kall be the k value for the SNDOP-ALL query and and for each SNDOP-query i, let ki be

  18. A web-based data-querying tool based on ontology-driven methodology and flowchart-based model.

    PubMed

    Ping, Xiao-Ou; Chung, Yufang; Tseng, Yi-Ju; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei

    2013-10-08

    Because of the increased adoption rate of electronic medical record (EMR) systems, health care records have been accumulating at an increasing rate in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the following three considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data were then employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, "degree of liver damage," "degree of liver damage when applying a mutually exclusive setting," and "treatments for liver cancer") was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution times. The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks.

  19. Comparative Analysis of Online Health Queries Originating From Personal Computers and Smart Devices on a Consumer Health Information Portal

    PubMed Central

    Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen

    2014-01-01

    Background: The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. Objective: The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Methods: Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic’s consumer health information website. We performed analyses on “Queries with considering repetition counts (QwR)” and “Queries without considering repetition counts (QwoR)”. The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Results: Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are “Symptoms” (1 in 3 search queries), “Causes”, and “Treatments & Drugs”. The distribution of search queries for different health categories differs with the device used for the search. Health queries tend to be longer and more specific than general search queries. Health queries from SDs are longer and have slightly fewer spelling mistakes than those from PCs. Users specify words related to women and children more often than that of men and any other age group. Most of the health queries are formulated using keywords; the second-most common are wh- and yes/no questions. Users ask more health questions using SDs than PCs. Almost all health queries have at least one noun and health queries from SDs are more descriptive than those from PCs. Conclusions: This study is a large-scale comparative analysis of health search queries to understand the effects of device type (PCs vs SDs) used on OHISB. The study indicates that the device used for online health information search plays an important role in shaping how health information searches by consumers and patients are executed. PMID:25000537

  20. Comparative analysis of online health queries originating from personal computers and smart devices on a consumer health information portal.

    PubMed

    Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen; Pathak, Jyotishman

    2014-07-04

    The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic's consumer health information website. We performed analyses on "Queries with considering repetition counts (QwR)" and "Queries without considering repetition counts (QwoR)". The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are "Symptoms" (1 in 3 search queries), "Causes", and "Treatments & Drugs". The distribution of search queries for different health categories differs with the device used for the search. Health queries tend to be longer and more specific than general search queries. Health queries from SDs are longer and have slightly fewer spelling mistakes than those from PCs. Users specify words related to women and children more often than that of men and any other age group. Most of the health queries are formulated using keywords; the second-most common are wh- and yes/no questions. Users ask more health questions using SDs than PCs. Almost all health queries have at least one noun and health queries from SDs are more descriptive than those from PCs. This study is a large-scale comparative analysis of health search queries to understand the effects of device type (PCs vs. SDs) used on OHISB. The study indicates that the device used for online health information search plays an important role in shaping how health information searches by consumers and patients are executed.

  1. Development and evaluation of a biomedical search engine using a predicate-based vector space model.

    PubMed

    Kwak, Myungjae; Leroy, Gondy; Martinez, Jesse D; Harwell, Jeffrey

    2013-10-01

    Although biomedical information available in articles and patents is increasing exponentially, we continue to rely on the same information retrieval methods and use very few keywords to search millions of documents. We are developing a fundamentally different approach for finding much more precise and complete information with a single query using predicates instead of keywords for both query and document representation. Predicates are triples that are more complex data structures than keywords and contain more structured information. To make optimal use of them, we developed a new predicate-based vector space model and query-document similarity function with adjusted tf-idf and boost function. Using a test bed of 107,367 PubMed abstracts, we evaluated the first essential function: retrieving information. Cancer researchers provided 20 realistic queries, for which the top 15 abstracts were retrieved using a predicate-based (new) and keyword-based (baseline) approach. Each abstract was evaluated, double-blind, by cancer researchers on a 0-5 point scale to calculate precision (0 versus higher) and relevance (0-5 score). Precision was significantly higher (p<.001) for the predicate-based (80%) than for the keyword-based (71%) approach. Relevance was almost doubled with the predicate-based approach: 2.1 versus 1.6 without rank order adjustment (p<.001) and 1.34 versus 0.98 with rank order adjustment (p<.001) for the predicate- versus keyword-based approach, respectively. Predicates can support more precise searching than keywords, laying the foundation for rich and sophisticated information search. Copyright © 2013 Elsevier Inc. All rights reserved.
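
    A minimal sketch of a predicate-based vector space model is given below: predicate triples, rather than keywords, serve as the dimensions, weighted here with plain tf-idf and compared by cosine similarity. The triples are invented, and the paper's adjusted tf-idf and boost function are not reproduced.

        import math
        from collections import Counter

        docs = {
            "doc1": [("p53", "regulates", "apoptosis"), ("p53", "binds", "mdm2")],
            "doc2": [("ras", "activates", "mapk"), ("mdm2", "inhibits", "p53")],
        }
        query = [("p53", "regulates", "apoptosis")]

        def tfidf_vector(triples, corpus):
            """Weight each predicate triple by its term frequency times a smoothed idf."""
            tf = Counter(triples)
            n_docs = len(corpus)
            weights = {}
            for t, count in tf.items():
                df = sum(t in d for d in corpus.values())   # document frequency of the triple
                weights[t] = count * (math.log((1 + n_docs) / (1 + df)) + 1.0)
            return weights

        def cosine(u, v):
            dot = sum(w * v.get(t, 0.0) for t, w in u.items())
            norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
            return dot / norm if norm else 0.0

        doc_vectors = {name: tfidf_vector(triples, docs) for name, triples in docs.items()}
        query_vector = tfidf_vector(query, docs)
        for name in sorted(doc_vectors, key=lambda n: cosine(query_vector, doc_vectors[n]), reverse=True):
            print(name, round(cosine(query_vector, doc_vectors[name]), 3))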

  2. The Localized Discovery and Recovery for Query Packet Losses in Wireless Sensor Networks with Distributed Detector Clusters

    PubMed Central

    Teng, Rui; Leibnitz, Kenji; Miura, Ryu

    2013-01-01

    An essential application of wireless sensor networks is to successfully respond to user queries. Query packet losses occur during query dissemination due to wireless communication problems such as interference, multipath fading, packet collisions, etc. The losses of query messages at sensor nodes result in the failure of sensor nodes to report the requested data. Hence, the reliable and successful dissemination of query messages to sensor nodes is a non-trivial problem. The target of this paper is to enable highly successful query delivery to sensor nodes by localized and energy-efficient discovery and recovery of query losses. We adopt local and collective cooperation among sensor nodes to increase the success rate of distributed discoveries and recoveries. To enable scalability in the operations of discoveries and recoveries, we employ a distributed name resolution mechanism at each sensor node to allow sensor nodes to self-detect the correlated queries and query losses, and then respond to the query losses locally and efficiently. We prove that the collective discovery of query losses has a high impact on the success of query dissemination and reveal that scalability can be achieved by using the proposed approach. We further study the novel features of the cooperation and competition in the collective recovery at PHY and MAC layers, and show that the appropriate number of detectors can achieve an optimal successful recovery rate. We evaluate the proposed approach with both mathematical analyses and computer simulations. The proposed approach enables a high rate of successful delivery of query messages and it results in short route lengths to recover from query losses. The proposed approach is scalable and operates in a fully distributed manner. PMID:23748172

  3. EarthServer - an FP7 project to enable the web delivery and analysis of 3D/4D models

    NASA Astrophysics Data System (ADS)

    Laxton, John; Sen, Marcus; Passmore, James

    2013-04-01

    EarthServer aims at open access and ad-hoc analytics on big Earth Science data, based on the OGC geoservice standards Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS). The WCS model defines "coverages" as a unifying paradigm for multi-dimensional raster data, point clouds, meshes, etc., thereby addressing a wide range of Earth Science data including 3D/4D models. WCPS allows declarative SQL-style queries on coverages. The project is developing a pilot implementing these standards, and will also investigate the use of GeoSciML to describe coverages. Integration of WCPS with XQuery will in turn allow coverages to be queried in combination with their metadata and GeoSciML description. The unified service will support navigation, extraction, aggregation, and ad-hoc analysis on coverage data from SQL. Clients will range from mobile devices to high-end immersive virtual reality, and will enable 3D model visualisation using web browser technology coupled with developing web standards. EarthServer is establishing open-source client and server technology intended to be scalable to Petabyte/Exabyte volumes, based on distributed processing, supercomputing, and cloud virtualization. The implementation is based on the existing rasdaman server technology. Services using rasdaman technology are being installed to serve the atmospheric, oceanographic, geological, cryospheric, planetary and general earth observation communities. The geology service (http://earthserver.bgs.ac.uk/) is being provided by BGS and at present includes satellite imagery, superficial thickness data, onshore DTMs and 3D models for the Glasgow area. It is intended to extend the available data sets to include 3D voxel models. Use of the WCPS standard allows queries to be constructed against single or multiple coverages. For example, on a single coverage, data can be selected for a particular area or for a particular range of pixel values. Queries on multiple surfaces can be constructed to calculate, for example, the thickness between two surfaces in a 3D model or the depth from the ground surface to the top of a particular geologic unit. In the first version of the service, a simple interface showing some example queries has been implemented to demonstrate the potential of the technologies. The project aims to develop the services in light of user feedback, in terms of the data available, the functionality and the interface. User feedback on the services guides the software and standards development aspects of the project, leading to enhanced versions of the software which will be implemented in upgraded versions of the services during the lifetime of the project.
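
    WCPS queries are plain text expressions shipped to a coverage server. The sketch below shows roughly how a client might submit one over HTTP; the endpoint URL, the coverage names, and the assumption that the server exposes the WCS ProcessCoverages request are illustrative, not the actual EarthServer/BGS service configuration.

```python
import requests

# Rough sketch of submitting a WCPS query over HTTP. The endpoint path,
# request parameter, and coverage identifiers are illustrative assumptions.
ENDPOINT = "https://example.org/rasdaman/ows"  # hypothetical WCPS endpoint

# Thickness between two gridded surfaces, in the spirit of the example in the
# abstract: subtract one coverage from another over a bounding box.
wcps_query = """
for top in (surface_top), base in (surface_base)
return encode(
    (top - base)[Lat(55.8:55.9), Long(-4.3:-4.2)],
    "image/tiff")
"""

response = requests.get(ENDPOINT, params={
    "service": "WCS",
    "version": "2.0.1",
    "request": "ProcessCoverages",   # WCPS extension request, assumed supported
    "query": wcps_query,
})
response.raise_for_status()
with open("thickness.tiff", "wb") as f:
    f.write(response.content)
```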

  4. Ontological Approach to Military Knowledge Modeling and Management

    DTIC Science & Technology

    2004-03-01

    …federated search mechanism has to reformulate user queries (expressed using the ontology) in the query languages of the different sources (e.g. SQL) … ontologies as a common terminology; unified query to perform federated search; query processing: ontology mapping to sources to reformulate queries

  5. Research on Extension of Sparql Ontology Query Language Considering the Computation of Indoor Spatial Relations

    NASA Astrophysics Data System (ADS)

    Li, C.; Zhu, X.; Guo, W.; Liu, Y.; Huang, H.

    2015-05-01

    A method suitable for complex indoor semantic queries, taking the computation of indoor spatial relations into account, is provided according to the characteristics of indoor space. This paper designs an ontology model describing the space-related information of humans, events and indoor space objects (e.g. storeys and rooms) as well as their relations, to support indoor semantic queries. The ontology concepts are used in the IndoorSPARQL query language, which extends SPARQL syntax for representing and querying indoor space. Four specific primitives for indoor queries, "Adjacent", "Opposite", "Vertical" and "Contain", are defined as query functions in IndoorSPARQL to support quantitative spatial computations. A method is also proposed to analyse the query language. Finally, this paper adopts the method to realize indoor semantic queries over the study area by constructing the ontology model for the study building. The experimental results show that the proposed method can effectively support complex indoor semantic queries.

  6. VISAGE: Interactive Visual Graph Querying.

    PubMed

    Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

    2016-06-01

    Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with "wildcard" nodes of any type), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE's ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries.

  7. VISAGE: Interactive Visual Graph Querying

    PubMed Central

    Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

    2017-01-01

    Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with “wildcard” nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE’s ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries. PMID:28553670

  8. A Visual Interface for Querying Heterogeneous Phylogenetic Databases.

    PubMed

    Jamil, Hasan M

    2017-01-01

    Despite the recent growth in the number of phylogenetic databases, access to this wealth of resources remains largely driven by tools or form-based interfaces. It is our thesis that the flexibility afforded by declarative query languages may offer a better way to access these repositories, and that such a language can be used to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way, and is used as the storage model for our PhyloBase database, for which PhyQL serves as the query language. We have implemented a visual interface for end users to pose PhyQL queries using visual icons and drag-and-drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in the PhyQL buffer allow secondary querying on the computed results, making it a truly powerful querying architecture.

  9. An evaluation of the NCI Physician Data Query (PDQ) dissemination in Europe.

    PubMed

    Arrigo, C

    1997-06-01

    The Physician Data Query (PDQ) is a computerised medical database aimed at communicating cancer-related information through computer technology. PDQ has been distributed in Europe since 1988 and in the U.S. since 1985. A study aimed at evaluating whether PDQ had reached a wide medical community was conducted among European PDQ distributors in 1992 as part of an evaluation of PDQ use and impact. Results indicate that PDQ promotion was limited and that several countries were underserved. A preference was expressed for on-line systems. It was found that PDQ access was limited by the requirement of special equipment, by subscription costs and by language barriers. Subscribers were found mainly in the countries with the highest number of distributors. Among a total of 12,205 reported subscribers, only 20% were physicians. The proportion of physicians subscribing to the system was therefore low (range 4-39.7/10,000). Altogether 453 searches (range 0-213) and 84.5 h of connection time (range 0-52.5) were reported per month for Europe. It is obvious that additional investigation is needed to further evaluate the impact of PDQ on clinical practice.

  10. Which factors predict the time spent answering queries to a drug information centre?

    PubMed Central

    Reppe, Linda A.; Spigset, Olav

    2010-01-01

    Objective To develop a model based upon factors able to predict the time spent answering drug-related queries to Norwegian drug information centres (DICs). Setting and method Drug-related queries received at 5 DICs in Norway from March to May 2007 were randomly assigned to 20 employees until each of them had answered a minimum of five queries. The employees reported the number of drugs involved, the type of literature search performed, and whether the queries were considered judgmental or not, using a specifically developed scoring system. Main outcome measures The scores of these three factors were added together to define a workload score for each query. Workload and its individual factors were subsequently related to the measured time spent answering the queries by simple or multiple linear regression analyses. Results Ninety-six query/answer pairs were analyzed. Workload significantly predicted the time spent answering the queries (adjusted R2 = 0.22, P < 0.001). Literature search was the individual factor best predicting the time spent answering the queries (adjusted R2 = 0.17, P < 0.001), and this variable also contributed the most in the multiple regression analyses. Conclusion The most important workload factor predicting the time spent handling the queries in this study was the type of literature search that had to be performed. The categorisation of queries as judgmental or not also affected the time spent answering the queries. The number of drugs involved did not significantly influence the time spent answering drug information queries. PMID:20922480
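
    A rough sketch of the workload idea follows: three factor scores are summed into a workload score, and the time spent is regressed on it with ordinary least squares. The factor scoring values and the data points below are invented for illustration; they are not the study's scoring system or measurements.

```python
# Minimal sketch: each query gets three factor scores (number of drugs,
# type of literature search, judgemental or not), the workload is their sum,
# and time spent is regressed on workload with ordinary least squares.

def workload(n_drugs_score, search_score, judgemental_score):
    return n_drugs_score + search_score + judgemental_score

# (n_drugs score, search score, judgemental score, minutes spent) -- invented
records = [(1, 1, 1, 20), (1, 3, 1, 45), (2, 2, 0, 35),
           (2, 4, 1, 80), (1, 2, 0, 30), (3, 3, 1, 70)]
xs = [workload(a, b, c) for a, b, c, _ in records]
ys = [minutes for *_, minutes in records]

# Single-predictor least squares, done by hand to keep the sketch dependency-free.
n = len(records)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
print(f"predicted minutes for workload 6: {intercept + slope * 6:.1f}")
```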

  11. Personalized query suggestion based on user behavior

    NASA Astrophysics Data System (ADS)

    Chen, Wanyu; Hao, Zepeng; Shao, Taihua; Chen, Honghui

    Query suggestions help users refine their queries after they input an initial query. Previous work mainly concentrated on similarity-based and context-based query suggestion approaches. However, models that adapt to a specific user (personalization) can help to improve the probability of the user being satisfied. In this paper, we propose a personalized query suggestion model based on users' search behavior (UB model), where we inject relevance between queries and users' search behavior into a basic probabilistic model. For the relevance between queries, we consider their semantic similarity and co-occurrence, which reflects behavioral information from other users in web search. Regarding the current user's preference for a query, we combine the user's short-term and long-term search behavior in a linear fashion and deal with the data sparsity problem using Bayesian probabilistic matrix factorization (BPMF). In particular, we also investigate the impact of different personalization strategies (combinations of the user's short-term and long-term search behavior) on the performance of query suggestion reranking. We quantify the improvement of our proposed UB model against a state-of-the-art baseline using the public AOL query logs and show that it beats the baseline in terms of the metrics used in query suggestion reranking. The experimental results show that: (i) for personalized ranking, users' behavioral information helps to improve query suggestion effectiveness; and (ii) given a query, merging information inferred from the short-term and long-term search behavior of a particular user can result in better performance than either plain approach.
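
    A minimal sketch of the re-ranking step is shown below, assuming a candidate list with a base query-query relevance score and separate short-term and long-term preference scores for the current user; the mixing weight and all scores are illustrative, and the underlying probabilistic model and BPMF step are not reproduced.

```python
# Minimal sketch of re-ranking query suggestions with a user-preference term
# that linearly mixes short-term and long-term behaviour. The scores, the
# mixing weight lam, and the candidate list are illustrative.

def personalised_score(base, short_pref, long_pref, lam=0.6):
    """base: query-query relevance; short/long_pref: user preference scores."""
    return base * (lam * short_pref + (1 - lam) * long_pref)

candidates = {
    # suggestion: (base relevance, short-term pref, long-term pref)
    "flu symptoms":        (0.8, 0.9, 0.2),
    "flu vaccine near me": (0.7, 0.1, 0.8),
    "influenza history":   (0.5, 0.2, 0.3),
}
reranked = sorted(candidates,
                  key=lambda q: -personalised_score(*candidates[q]))
print(reranked)
```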

  12. Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.

    PubMed

    Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

    2013-08-01

    Support for high performance queries on large volumes of spatial data is becoming increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, the development of high resolution imaging technologies, and contributions from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, the customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and its high scalability on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries, and as an integrated software package in Hive.

  13. Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce

    PubMed Central

    Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

    2013-01-01

    Support for high performance queries on large volumes of spatial data is becoming increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, the development of high resolution imaging technologies, and contributions from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS – a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, the customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and its high scalability on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries, and as an integrated software package in Hive. PMID:24187650

  14. Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.

    PubMed

    Woo, Hyekyung; Cho, Youngtae; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

    2016-07-04

    As suggested as early as 2006, logs of queries submitted to search engines seeking information could be a source for detecting emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting the queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by the least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of the optimal features for the final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recently observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data.
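
    A minimal sketch of the two-stage modelling idea (Lasso for query selection, then SVR on the selected query volumes) might look as follows; the synthetic data stands in for weekly query volumes and incidence rates, and the hyperparameters are illustrative rather than those used in the study.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.svm import SVR

# Minimal sketch: Lasso picks a sparse subset of search queries whose weekly
# volumes are predictive, then SVR is fit on the selected queries to predict
# influenza incidence. The random data is a synthetic placeholder.
rng = np.random.default_rng(0)
n_weeks, n_queries = 156, 146
X = rng.random((n_weeks, n_queries))                   # weekly query volumes
y = X[:, :5].sum(axis=1) + 0.1 * rng.random(n_weeks)   # toy incidence signal

lasso = Lasso(alpha=0.01).fit(X, y)
selected = np.flatnonzero(lasso.coef_ != 0)            # queries kept by Lasso
print(f"{selected.size} queries selected")

svr = SVR(kernel="rbf", C=10.0).fit(X[:, selected], y)
pred = svr.predict(X[:, selected])
print("correlation with observed incidence:", np.corrcoef(pred, y)[0, 1])
```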

  15. Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea

    PubMed Central

    Woo, Hyekyung; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

    2016-01-01

    Background As suggested as early as 2006, logs of queries submitted to search engines seeking information could be a source for detecting emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting the queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. Objective In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Methods Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by the least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. Results In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of the optimal features for the final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recently observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). Conclusions These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data. PMID:27377323

  16. Lost in translation? A multilingual Query Builder improves the quality of PubMed queries: a randomised controlled trial.

    PubMed

    Schuers, Matthieu; Joulakian, Mher; Kerdelhué, Gaetan; Segas, Léa; Grosjean, Julien; Darmoni, Stéfan J; Griffon, Nicolas

    2017-07-03

    MEDLINE is the most widely used medical bibliographic database in the world. Most of its citations are in English, and this can be an obstacle for some researchers seeking to access the information the database contains. We created a multilingual query builder to facilitate access to the PubMed subset using a language other than English. The aim of our study was to assess the impact of this multilingual query builder on the quality of PubMed queries for non-native English speaking physicians and medical researchers. A randomised controlled study was conducted among French speaking general practice residents. We designed a multi-lingual query builder to facilitate information retrieval, based on available MeSH translations and providing users with both an interface and a controlled vocabulary in their own language. Participating residents were randomly allocated either the French or the English version of the query builder. They were asked to translate 12 short medical questions into MeSH queries. The main outcome was the quality of the query. Two librarians, blind to the study arm, independently evaluated each query using a modified published classification that differentiated eight types of errors. Twenty residents used the French version of the query builder and 22 used the English version. 492 queries were analysed. There were significantly more perfect queries in the French group than in the English group (37.9% vs. 17.9%, respectively; p < 0.01). Members of the English group took significantly more time to build each query than members of the French group (194 s vs. 128 s; p < 0.01). This multi-lingual query builder is an effective tool for improving the quality of PubMed queries, in particular for researchers whose first language is not English.

  17. A Web-Based Data-Querying Tool Based on Ontology-Driven Methodology and Flowchart-Based Model

    PubMed Central

    Ping, Xiao-Ou; Chung, Yufang; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei

    2013-01-01

    Background Because of the increased adoption rate of electronic medical record (EMR) systems, an increasing number of health care records has been accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving knowledge from such large volumes of clinical data. Objective The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. Methods The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, a clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data were then employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in experiments with varying numbers of patients. Results In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, “degree of liver damage,” “degree of liver damage when applying a mutually exclusive setting,” and “treatments for liver cancer”) was 100% for all four experiments (10 patients, 100 patients, 1000 patients, and 10,000 patients). Among the three measured query phases, (1) structured query language operations, (2) criteria verification, and (3) other, the first two had the longest execution times. Conclusions The ontology-driven FBDQM-based approach enriched the capabilities of the data-querying system. The adoption of the GLIF3.5 increased the potential for interoperability, shareability, and reusability of the query tasks. PMID:25600078

  18. Mining Longitudinal Web Queries: Trends and Patterns.

    ERIC Educational Resources Information Center

    Wang, Peiling; Berry, Michael W.; Yang, Yiheng

    2003-01-01

    Analyzed user queries submitted to an academic Web site during a four-year period, using a relational database, to examine users' query behavior, to identify problems they encounter, and to develop techniques for optimizing query analysis and mining. Linguistic analyses focus on query structures, lexicon, and word associations using statistical…

  19. Optimizing a Query by Transformation and Expansion.

    PubMed

    Glocker, Katrin; Knurr, Alexander; Dieter, Julia; Dominick, Friederike; Forche, Melanie; Koch, Christian; Pascoe Pérez, Analie; Roth, Benjamin; Ückert, Frank

    2017-01-01

    In the biomedical sector, not only is the amount of information produced and uploaded to the web enormous, but so is the number of sources where these data can be found. Clinicians and researchers spend huge amounts of time trying to access this information and to filter out the most important answers to a given question. As the formulation of these queries is crucial, automated query expansion is an effective tool for optimizing a query and receiving the best possible results. In this paper we introduce the concept of a workflow for the optimization of queries in the medical and biological sector using a series of tools for expansion and transformation of the query. After the definition of attributes by the user, the query string is compared to previous queries in order to add semantically co-occurring terms to the query. Additionally, the query is enlarged by the inclusion of synonyms. Translation into database-specific ontologies ensures the optimal query formulation for the chosen database(s). As this process can be performed in various databases at once, the results are ranked and normalized in order to achieve a comparable list of answers to a question.
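
    A minimal sketch of such an expansion workflow is given below; the query-log, synonym, and vocabulary dictionaries are illustrative stand-ins for real co-occurrence statistics, thesauri, and database-specific ontologies.

```python
# Minimal sketch of the expansion workflow described above: add co-occurring
# terms mined from previous queries, add synonyms, then map terms onto a
# database-specific vocabulary. All dictionaries are illustrative.

PREVIOUS_QUERIES = [
    {"braf", "melanoma", "vemurafenib"},
    {"braf", "melanoma", "resistance"},
]
SYNONYMS = {"melanoma": {"malignant melanoma"}}
DB_VOCABULARY = {"pubmed": {"braf": "BRAF[Gene]", "melanoma": "Melanoma[MeSH]"}}

def cooccurring_terms(query_terms, history, min_overlap=1):
    extra = set()
    for past in history:
        if len(query_terms & past) >= min_overlap:
            extra |= past - query_terms
    return extra

def expand(query_terms, target_db):
    terms = set(query_terms)
    terms |= cooccurring_terms(terms, PREVIOUS_QUERIES)   # co-occurrence step
    for t in list(terms):
        terms |= SYNONYMS.get(t, set())                   # synonym step
    mapping = DB_VOCABULARY.get(target_db, {})
    return sorted(mapping.get(t, t) for t in terms)       # translation step

print(expand({"braf", "melanoma"}, "pubmed"))
```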

  20. WATCHMAN: A Data Warehouse Intelligent Cache Manager

    NASA Technical Reports Server (NTRS)

    Scheuermann, Peter; Shim, Junho; Vingralek, Radek

    1996-01-01

    Data warehouses store large volumes of data which are used frequently by decision support applications. Such applications involve complex queries. Query performance in such an environment is critical because decision support applications often require interactive query response times. Because data warehouses are updated infrequently, it becomes possible to improve query performance by caching sets retrieved by queries in addition to query execution plans. In this paper we report on the design of WATCHMAN, an intelligent cache manager for sets retrieved by queries, which is particularly well suited for data warehousing environments. Our cache manager employs two novel, complementary algorithms for cache replacement and cache admission. WATCHMAN aims at minimizing query response time, and its cache replacement policy swaps out entire retrieved sets of queries instead of individual pages. The cache replacement and admission algorithms make use of a profit metric, which considers for each retrieved set its average rate of reference, its size, and the execution cost of the associated query. We report on a performance evaluation based on the TPC-D and Set Query benchmarks. These experiments show that WATCHMAN achieves a substantial performance improvement in a decision support environment when compared to a traditional LRU replacement algorithm.
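
    The flavour of a profit-based cache for retrieved sets can be sketched as follows, with profit combining reference rate, re-execution cost, and result size; this is an illustrative simplification, not the published WATCHMAN replacement and admission algorithms.

```python
import time

# Minimal sketch of a profit-based cache for retrieved sets: profit combines
# average reference rate, the cost of re-executing the query, and the size of
# the retrieved set. Illustrative simplification only.

class SetCache:
    def __init__(self, capacity):
        self.capacity = capacity          # maximum total rows kept
        self.used = 0
        self.entries = {}                 # query text -> entry dict

    def _profit(self, e):
        rate = e["refs"] / max(time.time() - e["since"], 1e-9)
        return rate * e["cost"] / max(e["size"], 1)

    def get(self, query):
        e = self.entries.get(query)
        if e:
            e["refs"] += 1
            return e["rows"]
        return None

    def admit(self, query, rows, exec_cost):
        size = len(rows)
        if size > self.capacity:
            return                        # never cache sets larger than the cache
        # Evict lowest-profit entries until the new retrieved set fits.
        while self.used + size > self.capacity and self.entries:
            victim = min(self.entries, key=lambda q: self._profit(self.entries[q]))
            self.used -= self.entries.pop(victim)["size"]
        self.entries[query] = {"rows": rows, "size": size, "cost": exec_cost,
                               "refs": 1, "since": time.time()}
        self.used += size
```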

  1. Temporal Representation in Semantic Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Levandoski, J J; Abdulla, G M

    2007-08-07

    A wide range of knowledge discovery and analysis applications, from business to biological, makes use of semantic graphs when modeling relationships and concepts. Most of the semantic graphs used in these applications are assumed to be static pieces of information, meaning that the temporal evolution of concepts and relationships is not taken into account. Guided by the need for more advanced semantic graph queries involving temporal concepts, this paper surveys the existing work on temporal representations in semantic graphs.

  2. Assisting Consumer Health Information Retrieval with Query Recommendations

    PubMed Central

    Zeng, Qing T.; Crowell, Jonathan; Plovnick, Robert M.; Kim, Eunjung; Ngo, Long; Dibble, Emily

    2006-01-01

    Objective: Health information retrieval (HIR) on the Internet has become an important practice for millions of people, many of whom have problems forming effective queries. We have developed and evaluated a tool to assist people in health-related query formation. Design: We developed the Health Information Query Assistant (HIQuA) system. The system suggests alternative/additional query terms related to the user's initial query that can be used as building blocks to construct a better, more specific query. The recommended terms are selected according to their semantic distance from the original query, which is calculated on the basis of concept co-occurrences in medical literature and log data as well as semantic relations in medical vocabularies. Measurements: An evaluation of the HIQuA system was conducted and a total of 213 subjects participated in the study. The subjects were randomized into 2 groups. One group was given query recommendations and the other was not. Each subject performed HIR for both a predefined and a self-defined task. Results: The study showed that providing HIQuA recommendations resulted in statistically significantly higher rates of successful queries (odds ratio = 1.66, 95% confidence interval = 1.16–2.38), although no statistically significant impact on user satisfaction or the users' ability to accomplish the predefined retrieval task was found. Conclusion: Providing semantic-distance-based query recommendations can help consumers with query formation during HIR. PMID:16221944

  3. Design and Implementation of a Metadata-rich File System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ames, S; Gokhale, M B; Maltzahn, C

    2010-01-19

    Despite continual improvements in the performance and reliability of large scale file systems, the management of user-defined file system metadata has changed little in the past decade. The mismatch between the size and complexity of large scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and semantic metadata requires considerable effort to maintain consistency and can result in complex, slow, and inflexible system operation. To address these problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, user-defined attributes, and file relationships are all first class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS incorporates Quasar, an XPATH-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the de facto standard, the QFS prototype shows superior ingest performance and comparable query performance on user metadata-intensive operations and superior performance on normal file metadata operations.

  4. The CMS DBS query language

    NASA Astrophysics Data System (ADS)

    Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee

    2010-04-01

    The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use, without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to the underlying database. We describe the design of the query system, provide details of the language components and give an overview of how this component fits into the overall data discovery system architecture.
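
    The join-discovery idea can be sketched as a breadth-first search over a schema graph whose edges carry join conditions, from which a SQL statement is assembled. The table names, keys, and the "find ... where ..." style request in the example are invented for illustration and are not the actual DBS schema or grammar.

```python
from collections import deque

# Minimal sketch of hiding join conditions behind a query language: a schema
# graph records which tables join on which keys, a BFS finds a join path from
# the filtered table to the requested table, and the path becomes SQL.

SCHEMA = {
    ("dataset", "block"): "dataset.id = block.dataset_id",
    ("block", "file"): "block.id = file.block_id",
}
ADJ = {}
for (a, b), cond in SCHEMA.items():
    ADJ.setdefault(a, []).append((b, cond))
    ADJ.setdefault(b, []).append((a, cond))

def join_path(src, dst):
    """Breadth-first search over the schema graph; returns join conditions."""
    queue, seen = deque([(src, [])]), {src}
    while queue:
        table, conds = queue.popleft()
        if table == dst:
            return conds
        for nxt, cond in ADJ.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, conds + [cond]))
    raise ValueError("tables are not connected in the schema graph")

def build_sql(find_table, where_table, where_clause):
    conds = join_path(where_table, find_table)
    tables = sorted({find_table, where_table} | {t for c in conds for t in
                    (c.split(" = ")[0].split(".")[0], c.split(" = ")[1].split(".")[0])})
    joins = " AND ".join(conds + [where_clause])
    return f"SELECT {find_table}.* FROM {', '.join(tables)} WHERE {joins}"

# e.g. a user request in the spirit of "find file where dataset.name = '/A/B/C'"
print(build_sql("file", "dataset", "dataset.name = '/A/B/C'"))
```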

  5. Spatial aggregation query in dynamic geosensor networks

    NASA Astrophysics Data System (ADS)

    Yi, Baolin; Feng, Dayang; Xiao, Shisong; Zhao, Erdun

    2007-11-01

    Wireless sensor networks have been widely used for civilian and military applications, such as environmental monitoring and vehicle tracking. In many of these applications, research mainly aims at building sensor network based systems that deliver the sensed data to applications. However, existing work has seldom exploited spatial aggregation queries that take the dynamic characteristics of sensor networks into account. In this paper, we investigate how to process spatial aggregation queries over dynamic geosensor networks, where both the sink node and the sensor nodes are mobile, and we propose several novel improvements to the enabling techniques. The mobility of sensors makes existing routing protocols that rely on a fixed infrastructure or on neighborhood information infeasible. We present an improved location-based stateless implicit geographic forwarding (IGF) protocol for routing a query toward the area specified by the query window, and a diameter-based window aggregation query (DWAQ) algorithm for query propagation and data aggregation within the query window; finally, considering the changing location of the sink node, we present two schemes to forward the result to the sink node. Simulation results show that the proposed algorithms can improve query latency and query accuracy.

  6. Relational similarity-based model of data part 1: foundations and query systems

    NASA Astrophysics Data System (ADS)

    Belohlavek, Radim; Vychodil, Vilem

    2017-10-01

    We present a general rank-aware model of data which supports handling of similarity in relational databases. The model is based on the assumption that in many cases it is desirable to replace equalities on values in data tables by similarity relations expressing degrees to which the values are similar. In this context, we study various phenomena which emerge in the model, including similarity-based queries and similarity-based data dependencies. Central notion in our model is that of a ranked data table over domains with similarities which is our counterpart to the notion of relation on relation scheme from the classical relational model. Compared to other approaches which cover related problems, we do not propose a similarity-based or ranking module on top of the classical relational model. Instead, we generalize the very core of the model by replacing the classical, two-valued logic upon which the classical model is built by a more general logic involving a scale of truth degrees that, in addition to the classical truth degrees 0 and 1, contains intermediate truth degrees. While the classical truth degrees 0 and 1 represent nonequality and equality of values, and subsequently mismatch and match of queries, the intermediate truth degrees in the new model represent similarity of values and partial match of queries. Moreover, the truth functions of many-valued logical connectives in the new model serve to aggregate degrees of similarity. The presented approach is conceptually clean, logically sound, and retains most properties of the classical model while enabling us to employ new types of queries and data dependencies. Most importantly, similarity is not handled in an ad hoc way or by putting a "similarity module" atop the classical model in our approach. Rather, it is consistently viewed as a notion that generalizes and replaces equality in the very core of the relational model. We present fundamentals of the formal model and two equivalent query systems which are analogues of the classical relational algebra and domain relational calculus with range declarations. In the sequel to this paper, we deal with similarity-based dependencies.
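
    A minimal sketch of a similarity-based selection over a ranked data table is given below, assuming illustrative domains, similarity functions, and the minimum as the aggregating t-norm; it only conveys the flavour of the model, not its formal apparatus.

```python
# Minimal sketch of a ranked similarity-based selection: domains carry
# similarity functions returning degrees in [0, 1], equality in a selection is
# replaced by similarity, and degrees for several conditions are aggregated
# with a t-norm (here the minimum). Domains, data, and similarities are invented.

def sim_price(a, b, scale=50_000):
    return max(0.0, 1.0 - abs(a - b) / scale)

def sim_city(a, b):
    NEARBY = {frozenset({"Brno", "Vienna"}): 0.6}
    return 1.0 if a == b else NEARBY.get(frozenset({a, b}), 0.0)

HOUSES = [  # (city, price)
    ("Brno", 200_000),
    ("Vienna", 230_000),
    ("Prague", 205_000),
]

def similarity_select(rows, city, price, t_norm=min):
    ranked = [(t_norm(sim_city(r[0], city), sim_price(r[1], price)), r)
              for r in rows]
    return sorted(((d, r) for d, r in ranked if d > 0.0), reverse=True)

# "Houses in (something similar to) Brno costing (about) 210000"
for degree, row in similarity_select(HOUSES, "Brno", 210_000):
    print(f"{degree:.2f}  {row}")
```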

  7. Analysis of queries sent to PubMed at the point of care: Observation of search behaviour in a medical teaching hospital

    PubMed Central

    Hoogendam, Arjen; Stalenhoef, Anton FH; Robbé, Pieter F de Vries; Overbeke, A John PM

    2008-01-01

    Background The use of PubMed to answer daily medical care questions is limited because it is challenging to retrieve a small set of relevant articles and time is restricted. Knowing what aspects of queries are likely to retrieve relevant articles can increase the effectiveness of PubMed searches. The objectives of our study were to identify queries that are likely to retrieve relevant articles by relating PubMed search techniques and tools to the number of articles retrieved and the selection of articles for further reading. Methods This was a prospective observational study of queries regarding patient-related problems sent to PubMed by residents and internists in internal medicine working in an Academic Medical Centre. We analyzed queries, search results, query tools (MeSH, Limits, wildcards, operators), and the selection of abstracts and full text for further reading, using a portal that mimics PubMed. Results PubMed was used to solve 1121 patient-related problems, resulting in 3205 distinct queries. Abstracts were viewed in 999 (31%) of these queries, and in 126 (39%) of 321 queries using query tools. The average term count per query was 2.5. Abstracts were selected in more than 40% of queries using four or five terms, increasing to 63% if the use of four or five terms yielded 2–161 articles. Conclusion Queries sent to PubMed by physicians at our hospital during daily medical care contain fewer than three terms. Queries using four to five terms, retrieving fewer than 161 article titles, are most likely to result in abstract viewing. PubMed search tools are used infrequently by our population and are less effective than the use of four or five terms. Methods to facilitate the formulation of precise queries, using more relevant terms, should be the focus of education and research. PMID:18816391

  8. Catalogue of HI PArameters (CHIPA)

    NASA Astrophysics Data System (ADS)

    Saponara, J.; Benaglia, P.; Koribalski, B.; Andruchow, I.

    2015-08-01

    The Catalogue of HI Parameters of galaxies (CHIPA) is the natural continuation of the compilation by M.C. Martin in 1998. CHIPA provides the most important parameters of nearby galaxies derived from observations of the neutral hydrogen line. The catalogue contains information on 1400 galaxies across the sky and of different morphological types. Parameters such as the optical diameter of the galaxy, the blue magnitude, the distance, the morphological type and the HI extension are listed, among others. Maps of the HI distribution, velocity and velocity dispersion can also be displayed in some cases. The main objective of this catalogue is to facilitate bibliographic queries through searches in a database accessible from the internet that will be available in 2015 (the website is under construction). The database was built using the open-source MySQL relational database management system (SQL: Structured Query Language), while the website was built with HTML (Hypertext Markup Language) and PHP (Hypertext Preprocessor).

  9. Evaluating Open-Source Full-Text Search Engines for Matching ICD-10 Codes.

    PubMed

    Jurcău, Daniel-Alexandru; Stoicu-Tivadar, Vasile

    2016-01-01

    This research presents the results of evaluating multiple free, open-source engines on matching ICD-10 diagnostic codes via full-text searches. The study investigates what it takes to get an accurate match when searching for a specific diagnostic code. For each code, the evaluation starts by extracting the words that make up its text and continues by building full-text search queries from combinations of these words. The queries are then run against all the ICD-10 codes until the code in question is returned as the match with the highest relative score. This method identifies the minimum number of words that must be provided in order for the search engines to choose the desired entry. The engines analyzed include a popular Java-based full-text search engine, a lightweight engine written in JavaScript which can even execute in the user's browser, and two popular open-source relational database management systems.
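
    The evaluation loop can be sketched as follows, with a toy word-overlap scorer standing in for the real full-text engines and a tiny, invented code table; the point is the search over word combinations of increasing size until the target code comes back as the best match.

```python
from itertools import combinations

# Minimal sketch of the evaluation loop: for a target ICD-10 description, try
# word combinations of increasing size as queries until the target code is the
# best match. The scorer and the code table are illustrative only.

CODES = {
    "J10.1": "influenza with other respiratory manifestations",
    "J11.1": "influenza virus not identified with respiratory manifestations",
    "K52.1": "toxic gastroenteritis and colitis",
}

def score(query_words, text):
    words = set(text.split())
    return len(set(query_words) & words) / len(words)

def best_match(query_words):
    return max(CODES, key=lambda c: score(query_words, CODES[c]))

def minimal_query(target_code):
    words = CODES[target_code].split()
    for size in range(1, len(words) + 1):
        for combo in combinations(words, size):
            if best_match(combo) == target_code:
                return combo          # smallest word set that wins the ranking
    return tuple(words)

print(minimal_query("J10.1"))
```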

  10. Logic Programming in LISP.

    DTIC Science & Technology

    1981-01-01

    …work in the area of artificial intelligence and those used in general program development into a… …logic programming with LISP for implementing intelligent data base query systems. Continued developments will allow for enhancements to be made to the…

  11. Federated queries of clinical data repositories: Scaling to a national network.

    PubMed

    Weber, Griffin M

    2015-06-01

    Federated networks of clinical research data repositories are rapidly growing in size from a handful of sites to true national networks with more than 100 hospitals. This study creates a conceptual framework for predicting how various properties of these systems will scale as they continue to expand. Starting with actual data from Harvard's four-site Shared Health Research Information Network (SHRINE), the framework is used to imagine a future 4000 site network, representing the majority of hospitals in the United States. From this it becomes clear that several common assumptions of small networks fail to scale to a national level, such as all sites being online at all times or containing data from the same date range. On the other hand, a large network enables researchers to select subsets of sites that are most appropriate for particular research questions. Developers of federated clinical data networks should be aware of how the properties of these networks change at different scales and design their software accordingly. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. The design and implementation of the immune epitope database and analysis resource

    PubMed Central

    Peters, Bjoern; Sidney, John; Bourne, Phil; Bui, Huynh-Hoa; Buus, Soeren; Doh, Grace; Fleri, Ward; Kronenberg, Mitch; Kubo, Ralph; Lund, Ole; Nemazee, David; Ponomarenko, Julia V.; Sathiamurthy, Muthu; Schoenberger, Stephen P.; Stewart, Scott; Surko, Pamela; Way, Scott; Wilson, Steve; Sette, Alessandro

    2016-01-01

    Epitopes are defined as parts of antigens interacting with receptors of the immune system. Knowledge about their intrinsic structure and how they affect the immune response is required to continue development of techniques that detect, monitor, and fight diseases. Their scientific importance is reflected in the vast amount of epitope-related information gathered, ranging from interactions between epitopes and major histocompatibility complex molecules determined by X-ray crystallography to clinical studies analyzing correlates of protection for epitope based vaccines. Our goal is to provide a central resource capable of capturing this information, allowing users to access and connect realms of knowledge that are currently separated and difficult to access. Here, we portray a new initiative, “The Immune Epitope Database and Analysis Resource.” We describe how we plan to capture, structure, and store this information, what query interfaces we will make available to the public, and what additional predictive and analytical tools we will provide. PMID:15895191

  13. The Biomolecular Crystallization Database Version 4: expanded content and new features.

    PubMed

    Tung, Michael; Gallagher, D Travis

    2009-01-01

    The Biological Macromolecular Crystallization Database (BMCD) has been a publicly available resource since 1988, providing a curated archive of information on crystal growth for proteins and other biological macromolecules. The BMCD content has recently been expanded to include 14 372 crystal entries. The resource continues to be freely available at http://xpdb.nist.gov:8060/BMCD4. In addition, the software has been adapted to support the Java-based Lucene query language, enabling detailed searching over specific parameters, and explicit search of parameter ranges is offered for five numeric variables. Extensive tools have been developed for import and handling of data from the RCSB Protein Data Bank. The updated BMCD is called version 4.02 or BMCD4. BMCD4 entries have been expanded to include macromolecule sequence, enabling more elaborate analysis of relations among protein properties, crystal-growth conditions and the geometric and diffraction properties of the crystals. The BMCD version 4.02 contains greatly expanded content and enhanced search capabilities to facilitate scientific analysis and design of crystal-growth strategies.

  14. LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions.

    PubMed

    Chen, Jinbo; Scholz, Uwe; Zhou, Ruonan; Lange, Matthias

    2018-03-01

    In order to access and filter the content of life-science databases, full-text search is a widely applied query interface. However, its high flexibility and intuitiveness are paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspellings and support the expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries from a user's query profile, such as research field, geographic location, co-authorship, or affiliation, require user registration and public accessibility of the profile, which conflicts with privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstructs possible linguistic contexts of a given keyword query. The context is inferred from the text records stored in the databases that are going to be queried, or, for general-purpose query suggestion, extracted from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the further computation of customized distributed word vectors, which are then used to suggest alternative keyword queries. An evaluation of the query suggestion quality was performed for plant science use cases. Locally available experts enabled a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function, which was performed using ontology term similarities. The LAILAPS-QSM mean information content similarity for 15 representative queries is 0.70, and 34% have a score above 0.80. In comparison, the information content similarity for query suggestions made by human experts is 0.90. The software is available either as a tool set to build and train dedicated query suggestion services or as an already trained general-purpose RESTful web service. The service uses open interfaces to be seamlessly embeddable into database frontends. The JAVA implementation uses highly optimized data structures and streamlined code to provide fast and scalable responses to web service calls. The source code of LAILAPS-QSM is available under the GNU General Public License version 2 in the Bitbucket GIT repository: https://bitbucket.org/ipk_bit_team/bioescorte-suggestion.
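
    The underlying suggestion mechanism can be sketched with off-the-shelf word embeddings: train distributed word vectors on the text records of the databases to be queried and propose the nearest terms as alternative keywords. The tiny corpus below is a placeholder, and this sketch is not the LAILAPS-QSM pipeline itself.

```python
from gensim.models import Word2Vec

# Minimal sketch: train word vectors on (pre-processed) database text records
# and suggest keywords that lie close to the query term in vector space.
corpus = [
    ["drought", "tolerance", "in", "barley", "cultivars"],
    ["drought", "stress", "response", "of", "hordeum", "vulgare"],
    ["salt", "stress", "tolerance", "in", "wheat"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=5,
                 min_count=1, epochs=200, seed=1)

def suggest(term, topn=3):
    if term not in model.wv:
        return []
    return [w for w, _ in model.wv.most_similar(term, topn=topn)]

print(suggest("drought"))   # nearest terms as alternative query keywords
```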

  15. Facilitating Cohort Discovery by Enhancing Ontology Exploration, Query Management and Query Sharing for Large Clinical Data Repositories.

    PubMed

    Tao, Shiqiang; Cui, Licong; Wu, Xi; Zhang, Guo-Qiang

    2017-01-01

    To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution.

  16. Facilitating Cohort Discovery by Enhancing Ontology Exploration, Query Management and Query Sharing for Large Clinical Data Repositories

    PubMed Central

    Tao, Shiqiang; Cui, Licong; Wu, Xi; Zhang, Guo-Qiang

    2017-01-01

    To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution. PMID:29854239

  17. Improve Performance of Data Warehouse by Query Cache

    NASA Astrophysics Data System (ADS)

    Gour, Vishal; Sarangdevot, S. S.; Sharma, Anand; Choudhary, Vinod

    2010-11-01

    The primary goal of a data warehouse is to free the information locked up in the operational database so that decision makers and business analysts can perform queries, analysis and planning regardless of data changes in the operational database. As the number of queries is large, there is in certain cases a reasonable probability that the same query is submitted by one or multiple users at different times. Each time a query is executed, all the data in the warehouse are analyzed to generate the result of that query. In this paper we study how using a query cache improves the performance of a data warehouse and examine the common problems faced by data warehouse administrators in minimizing response time and improving overall query efficiency, particularly when the data warehouse is updated at regular intervals.
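
    A minimal sketch of such a result cache is shown below: results are keyed by normalised query text and the cache is cleared whenever the warehouse is refreshed. The executor callable is a stand-in for the real warehouse engine.

```python
import hashlib

# Minimal sketch of a result cache in front of a data warehouse: results are
# keyed by the normalised query text and the whole cache is invalidated when
# the warehouse is refreshed. `execute_in_warehouse` is a stand-in.

class QueryCache:
    def __init__(self, execute_in_warehouse):
        self._execute = execute_in_warehouse
        self._results = {}

    @staticmethod
    def _key(sql):
        normalised = " ".join(sql.lower().split())   # collapse case/whitespace
        return hashlib.sha1(normalised.encode()).hexdigest()

    def query(self, sql):
        key = self._key(sql)
        if key not in self._results:                 # cache miss: hit the warehouse
            self._results[key] = self._execute(sql)
        return self._results[key]

    def invalidate(self):
        """Call after each periodic warehouse refresh."""
        self._results.clear()

# Toy usage with a fake executor.
cache = QueryCache(lambda sql: f"rows for: {sql}")
print(cache.query("SELECT region, SUM(sales) FROM facts GROUP BY region"))
print(cache.query("select region, sum(sales)   from facts group by region"))  # served from cache
cache.invalidate()
```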

  18. Complex analyses on clinical information systems using restricted natural language querying to resolve time-event dependencies.

    PubMed

    Safari, Leila; Patrick, Jon D

    2018-06-01

    This paper reports on a generic framework that provides clinicians with the ability to conduct complex analyses on elaborate research topics using cascaded queries to resolve internal time-event dependencies in the research questions, as an extension to the proposed Clinical Data Analytics Language (CliniDAL). A cascaded query model is proposed to resolve internal time-event dependencies in queries, which can have up to five levels of criteria, starting with a query to define the subjects to be admitted into a study, followed by a query to define the time span of the experiment. Three more cascaded queries can be required to define control groups, control variables and output variables, which together simulate a real scientific experiment. Depending on the complexity of the research questions, the cascaded query model has the flexibility to merge some lower-level queries for simple research questions or to add a nested query to each level to compose more complex queries. Three different scenarios (one of which contains two studies) are described and used to evaluate the proposed solution. CliniDAL's complex analysis solution enables complex queries with time-event dependencies to be answered in at most a few hours, a task that would manually take many days. An evaluation of the results of the research studies, based on a comparison between the CliniDAL and SQL solutions, reveals the high usability and efficiency of CliniDAL's solution. Copyright © 2018 Elsevier Inc. All rights reserved.

  19. Evaluation of Sub Query Performance in SQL Server

    NASA Astrophysics Data System (ADS)

    Oktavia, Tanty; Sujarwo, Surya

    2014-03-01

    The paper explores several sub-query methods used in a query and their impact on query performance. The study uses an experimental approach to evaluate the performance of each sub-query method combined with an indexing strategy. The sub-query methods consist of IN, EXISTS, relational operators, and relational operators combined with the TOP operator. The experiments show that using a relational operator combined with an indexing strategy in a sub-query gives greater performance than using the same method without an indexing strategy, and also than the other methods. In summary, for applications that emphasize the performance of retrieving data from a database, it is better to use a relational operator combined with an indexing strategy. This study was done on Microsoft SQL Server 2012.
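
    The shapes of the compared sub-query forms, together with a simple timing harness, are sketched below. For self-containment the sketch uses SQLite rather than Microsoft SQL Server 2012, so absolute timings do not transfer; the schema and data are illustrative.

```python
import sqlite3, time

# Self-contained sketch of comparing sub-query forms with a timing harness.
# SQLite is used only so the snippet runs anywhere; the paper's experiments
# were run on Microsoft SQL Server 2012.

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders(id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE customers(id INTEGER PRIMARY KEY, active INTEGER);
    CREATE INDEX idx_orders_customer ON orders(customer_id);  -- indexing strategy
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, i % 2) for i in range(5000)])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i % 5000) for i in range(50000)])

QUERIES = {
    "IN":     "SELECT COUNT(*) FROM orders WHERE customer_id IN "
              "(SELECT id FROM customers WHERE active = 1)",
    "EXISTS": "SELECT COUNT(*) FROM orders o WHERE EXISTS "
              "(SELECT 1 FROM customers c WHERE c.id = o.customer_id AND c.active = 1)",
}
for name, sql in QUERIES.items():
    start = time.perf_counter()
    count = conn.execute(sql).fetchone()[0]
    print(f"{name:6s} rows={count} time={time.perf_counter() - start:.4f}s")
```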

  20. Secure Skyline Queries on Cloud Platform.

    PubMed

    Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

    2017-04-01

    Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.
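
    The cryptographic machinery is the heart of the paper and is not reproduced here; the sketch below only illustrates, on plaintext tuples, the dominance test that the secure subroutine evaluates and the skyline it induces (smaller is assumed better in every attribute).

    ```python
    def dominates(p, q):
        """p dominates q if p is no worse in every dimension and strictly better
        in at least one (here 'better' means smaller, e.g. lower price, lower distance)."""
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

    def skyline(points):
        # A point belongs to the skyline iff no other point dominates it.
        return [p for p in points
                if not any(dominates(q, p) for q in points if q is not p)]

    if __name__ == "__main__":
        hotels = [(50, 8), (60, 2), (40, 9), (55, 3), (45, 10)]  # (price, distance)
        print(skyline(hotels))   # (45, 10) is dropped: (40, 9) dominates it
    ```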

  1. Distributed query plan generation using multiobjective genetic algorithm.

    PubMed

    Panicker, Shina; Kumar, T V Vijay

    2014-01-01

    A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability.

  2. Distributed Query Plan Generation Using Multiobjective Genetic Algorithm

    PubMed Central

    Panicker, Shina; Vijay Kumar, T. V.

    2014-01-01

    A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability. PMID:24963513
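
    A toy illustration of the biobjective formulation, with invented LPC and CC tables and an exhaustive scan of a tiny plan space instead of NSGA-II: it only shows what "simultaneously minimize total LPC and total CC" means as a Pareto front over site-assignment plans, the search a genetic algorithm would perform at realistic scales.

    ```python
    import itertools, random

    # Hypothetical cost tables (not from the paper): local processing cost of each
    # relation at each site, and pairwise site-to-site communication cost.
    RELATIONS, SITES = 4, 3
    random.seed(1)
    LPC_TABLE = [[random.randint(1, 9) for _ in range(SITES)] for _ in range(RELATIONS)]
    CC_TABLE = [[0 if i == j else random.randint(1, 5) for j in range(SITES)]
                for i in range(SITES)]

    def costs(plan):
        """plan[r] = site chosen for relation r; returns (total LPC, total CC)."""
        lpc = sum(LPC_TABLE[r][s] for r, s in enumerate(plan))
        cc = sum(CC_TABLE[plan[r]][plan[r + 1]] for r in range(len(plan) - 1))
        return lpc, cc

    def non_dominated(plans):
        # Keep the plans for which no other plan is at least as good in both
        # objectives and strictly better overall: the Pareto front.
        scored = [(p, costs(p)) for p in plans]
        return [(p, (l, c)) for p, (l, c) in scored
                if not any(l2 <= l and c2 <= c and (l2, c2) != (l, c)
                           for _, (l2, c2) in scored)]

    all_plans = list(itertools.product(range(SITES), repeat=RELATIONS))
    for plan, (lpc, cc) in non_dominated(all_plans):
        print(plan, "LPC =", lpc, "CC =", cc)
    ```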

  3. Towards Hybrid Online On-Demand Querying of Realtime Data with Stateful Complex Event Processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, Qunzhi; Simmhan, Yogesh; Prasanna, Viktor K.

    Emerging Big Data applications in areas like e-commerce and energy industry require both online and on-demand queries to be performed over vast and fast data arriving as streams. These present novel challenges to Big Data management systems. Complex Event Processing (CEP) is recognized as a high performance online query scheme which in particular deals with the velocity aspect of the 3-V’s of Big Data. However, traditional CEP systems do not consider data variety and lack the capability to embed ad hoc queries over the volume of data streams. In this paper, we propose H2O, a stateful complex event processing framework, to support hybrid online and on-demand queries over realtime data. We propose a semantically enriched event and query model to address data variety. A formal query algebra is developed to precisely capture the stateful and containment semantics of online and on-demand queries. We describe techniques to achieve the interactive query processing over realtime data featured by efficient online querying, dynamic stream data persistence and on-demand access. The system architecture is presented and the current implementation status reported.

  4. Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.

    PubMed

    Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

    2013-11-01

    The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes of data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS - a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.
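
    A much-simplified, single-machine sketch of the space-based partitioning idea: points are bucketed into grid cells (the map step), and a window query only touches the overlapping cells, each of which could in principle be handled by an independent MapReduce worker. The cell size and data are assumptions for illustration, not Hadoop-GIS itself.

    ```python
    from collections import defaultdict

    CELL = 10.0   # grid cell size used for space-based partitioning (assumed value)

    def cell_of(x, y):
        return (int(x // CELL), int(y // CELL))

    def partition(points):
        """Map step: bucket each point into the grid cell that contains it."""
        buckets = defaultdict(list)
        for p in points:
            buckets[cell_of(*p)].append(p)
        return buckets

    def range_query(buckets, xmin, ymin, xmax, ymax):
        """Only cells overlapping the query window are scanned, so each cell
        could be processed by an independent worker and the results merged."""
        hits = []
        for cx in range(int(xmin // CELL), int(xmax // CELL) + 1):
            for cy in range(int(ymin // CELL), int(ymax // CELL) + 1):
                for (x, y) in buckets.get((cx, cy), ()):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append((x, y))
        return hits

    if __name__ == "__main__":
        import random
        random.seed(0)
        pts = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(10_000)]
        buckets = partition(pts)
        print(len(range_query(buckets, 20, 20, 35, 35)))
    ```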

  5. Query Health: standards-based, cross-platform population health surveillance

    PubMed Central

    Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N

    2014-01-01

    Objective Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Materials and methods Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. Results We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. Discussions This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Conclusions Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites. PMID:24699371

  6. Query Health: standards-based, cross-platform population health surveillance.

    PubMed

    Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N

    2014-01-01

    Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  7. Using search engine query data to track pharmaceutical utilization: a study of statins.

    PubMed

    Schuster, Nathaniel M; Rogers, Mary A M; McMahon, Laurence F

    2010-08-01

    To examine temporal and geographic associations between Google queries for health information and healthcare utilization benchmarks. Retrospective longitudinal study. Using Google Trends and Google Insights for Search data, the search terms Lipitor (atorvastatin calcium; Pfizer, Ann Arbor, MI) and simvastatin were evaluated for change over time and for association with Lipitor revenues. The relationship between query data and community-based resource use per Medicare beneficiary was assessed for 35 US metropolitan areas. Google queries for Lipitor significantly decreased from January 2004 through June 2009 and queries for simvastatin significantly increased (P <.001 for both), particularly after Lipitor came off patent (P <.001 for change in slope). The mean number of Google queries for Lipitor correlated (r = 0.98) with the percentage change in Lipitor global revenues from 2004 to 2008 (P <.001). Query preference for Lipitor over simvastatin was positively associated (r = 0.40) with a community's use of Medicare services. For every 1% increase in utilization of Medicare services in a community, there was a 0.2-unit increase in the ratio of Lipitor queries to simvastatin queries in that community (P = .02). Specific search engine queries for medical information correlate with pharmaceutical revenue and with overall healthcare utilization in a community. This suggests that search query data can track community-wide characteristics in healthcare utilization and have the potential for informing payers and policy makers regarding trends in utilization.

  8. SPARQLGraph: a web-based platform for graphically querying biological Semantic Web databases.

    PubMed

    Schweiger, Dominik; Trajanoski, Zlatko; Pabinger, Stephan

    2014-08-15

    Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way. SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the just recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers. This new graphical way of creating queries for biological Semantic Web databases considerably facilitates usability as it removes the requirement of knowing specific query languages and database structures. The system is freely available at http://sparqlgraph.i-med.ac.at.
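
    A small sketch of the graph-to-query translation step such a tool performs, assuming the visual graph is available as a list of (subject, predicate, object) edges; the `up:` prefixes are illustrative, and a real generator would also emit the matching PREFIX declarations and send the query to an endpoint.

    ```python
    def graph_to_sparql(edges, select_vars):
        """edges: (subject, predicate, object) triples where strings starting with
        '?' are variables; full IRIs are wrapped in <>, prefixed names kept as-is."""
        def term(t):
            return t if t.startswith("?") else (f"<{t}>" if "://" in t else t)
        patterns = "\n  ".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in edges)
        return f"SELECT {' '.join(select_vars)}\nWHERE {{\n  {patterns}\n}}"

    # Hypothetical visual graph: protein -> organism -> scientific name.
    query = graph_to_sparql(
        [("?protein", "rdf:type", "up:Protein"),
         ("?protein", "up:organism", "?organism"),
         ("?organism", "up:scientificName", "?name")],
        ["?protein", "?name"],
    )
    print(query)
    ```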

  9. PhyreStorm: A Web Server for Fast Structural Searches Against the PDB.

    PubMed

    Mezulis, Stefans; Sternberg, Michael J E; Kelley, Lawrence A

    2016-02-22

    The identification of structurally similar proteins can provide a range of biological insights, and accordingly, the alignment of a query protein to a database of experimentally determined protein structures is a technique commonly used in the fields of structural and evolutionary biology. The PhyreStorm Web server has been designed to provide comprehensive, up-to-date and rapid structural comparisons against the Protein Data Bank (PDB) combined with a rich and intuitive user interface. It is intended that this facility will enable biologists inexpert in bioinformatics access to a powerful tool for exploring protein structure relationships beyond what can be achieved by sequence analysis alone. By partitioning the PDB into similar structures, PhyreStorm is able to quickly discard the majority of structures that cannot possibly align well to a query protein, reducing the number of alignments required by an order of magnitude. PhyreStorm is capable of finding 93±2% of all highly similar (TM-score>0.7) structures in the PDB for each query structure, usually in less than 60s. PhyreStorm is available at http://www.sbg.bio.ic.ac.uk/phyrestorm/. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.

  10. Exploring Diverse Data Sets and Developing New Theories and Ideas With Project Integration Architecture

    NASA Technical Reports Server (NTRS)

    Benyo, Theresa L.; Jones, William H.

    2005-01-01

    The development of new ideas is the essence of scientific research. This is frequently done by developing models of physical processes and comparing model predictions with results from experiments. With models becoming ever more complex and data acquisition systems becoming more powerful, the researcher is burdened with wading through data ranging in volume up to many terabytes and beyond. These data often come from multiple, heterogeneous sources and usually the methods for searching through them are at or near the manual level. In addition, current documentation methods are generally limited to researchers' pen-and-paper-style notebooks. Researchers may want to form constraint-based queries on a body of existing knowledge that is, itself, distributed over many different machines and environments and from the results of such queries then spawn additional queries, simulations, and data analyses in order to discover new insights into the problem being investigated. Currently, researchers are restricted to working within the boundaries of tools that are inefficient at probing current and legacy data to extend the knowledge of the problem at hand and reveal innovative and efficient solutions. A framework called the Project Integration Architecture is discussed that can address these desired functionalities.

  11. Improving accuracy for identifying related PubMed queries by an integrated approach.

    PubMed

    Lu, Zhiyong; Wilbur, W John

    2009-10-01

    PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users' search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments.

  12. Improving accuracy for identifying related PubMed queries by an integrated approach

    PubMed Central

    Lu, Zhiyong; Wilbur, W. John

    2009-01-01

    PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users’ search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1,539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1,396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments. PMID:19162232

  13. Multi-Bit Quantum Private Query

    NASA Astrophysics Data System (ADS)

    Shi, Wei-Xu; Liu, Xing-Tong; Wang, Jian; Tang, Chao-Jing

    2015-09-01

    Most of the existing Quantum Private Queries (QPQ) protocols provide only a single-bit query service and thus have to be repeated several times when more bits are retrieved. Wei et al.'s scheme for block queries requires a high-dimension quantum key distribution system to sustain it, which is still restricted to the laboratory. Here, based on Markus Jakobi et al.'s single-bit QPQ protocol, we propose a multi-bit quantum private query protocol, in which the user can get access to several bits within one single query. We also extend the proposed protocol to block queries, using a binary matrix to guard database security. Analysis in this paper shows that our protocol has better communication complexity and implementability, and can achieve a considerable level of security.

  14. Estimating Missing Features to Improve Multimedia Information Retrieval

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bagherjeiran, A; Love, N S; Kamath, C

    Retrieval in a multimedia database usually involves combining information from different modalities of data, such as text and images. However, all modalities of the data may not be available to form the query. The retrieval results from such a partial query are often less than satisfactory. In this paper, we present an approach to complete a partial query by estimating the missing features in the query. Our experiments with a database of images and their associated captions show that, with an initial text-only query, our completion method has similar performance to a full query with both image and text features. In addition, when we use relevance feedback, our approach outperforms the results obtained using a full query.

  15. A Spatiotemporal Aggregation Query Method Using Multi-Thread Parallel Technique Based on Regional Division

    NASA Astrophysics Data System (ADS)

    Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.

    2015-07-01

    Existing spatiotemporal databases support spatiotemporal aggregation queries over massive moving-object datasets. Due to the large amounts of data and single-thread processing, the query speed cannot meet application requirements. Moreover, query efficiency is more sensitive to spatial variation than to temporal variation. In this paper, we propose a spatiotemporal aggregation query method using a multi-thread parallel technique based on regional division and implement it on the server. Concretely, we divide the spatiotemporal domain into several spatiotemporal cubes, compute the spatiotemporal aggregation on all cubes using multi-thread parallel processing, and then integrate the query results. Testing and analysis on real datasets show that this method improves the query speed significantly.
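
    An illustrative Python version of the regional-division idea: records are first assigned to spatiotemporal cubes, each cube is aggregated by a worker from a thread pool, and the partial results are integrated. The cube sizes and the count/sum aggregate are assumptions; in CPython the real speed-up would come from processes or from per-cube work that releases the GIL.

    ```python
    from collections import defaultdict
    from concurrent.futures import ThreadPoolExecutor

    def split_into_cubes(records, x_step, t_step):
        """Regional division: assign each (x, y, t, value) record to a spatiotemporal cube."""
        cubes = defaultdict(list)
        for x, y, t, value in records:
            cubes[(int(x // x_step), int(y // x_step), int(t // t_step))].append(value)
        return cubes

    def aggregate_cube(values):
        return len(values), sum(values)           # count and sum inside one cube

    def parallel_aggregate(records, x_step=10.0, t_step=60.0, workers=4):
        cubes = split_into_cubes(records, x_step, t_step)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            partials = list(pool.map(aggregate_cube, cubes.values()))
        count = total = 0
        for c, s in partials:                     # integrate the per-cube results
            count += c
            total += s
        return count, total

    if __name__ == "__main__":
        import random
        random.seed(0)
        recs = [(random.uniform(0, 100), random.uniform(0, 100),
                 random.uniform(0, 3600), random.random()) for _ in range(50_000)]
        print(parallel_aggregate(recs))
    ```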

  16. Fixing Dataset Search

    NASA Technical Reports Server (NTRS)

    Lynnes, Chris

    2014-01-01

    Three current search engines are queried for ozone data at the GES DISC. The results range from sub-optimal to counter-intuitive. We propose a method to fix dataset search by implementing a robust relevancy ranking scheme. The relevancy ranking scheme is based on several heuristics culled from more than 20 years of helping users select datasets.

  17. A New Framework for Addressing Temporal Range Queries and Some Preliminary Results

    DTIC Science & Technology

    2003-01-29

    dominance reporting problem, which can be solved in O(log n + f(v)) time using the data structure of Makris and Tsakalidis [11], which we call the... SIGMOD International Conference on Management of Data, pages 426-435, 1991. [11] C. Makris and A. K. Tsakalidis. Algorithms for three-dimensional dominance searching in

  18. The Clinical Practice Library of Medicine (CPLM): An on-line biomedical computer library. System documentation

    NASA Technical Reports Server (NTRS)

    Grams, R. R.

    1982-01-01

    A system designed to access a large range of available medical textbook information in an online interactive fashion is described. A high level query type database manager, INQUIRE, is used. Operating instructions, system flow diagrams, database descriptions, text generation, and error messages are discussed. User information is provided.

  19. Integrating Engineering Data Systems for NASA Spaceflight Projects

    NASA Technical Reports Server (NTRS)

    Carvalho, Robert E.; Tollinger, Irene; Bell, David G.; Berrios, Daniel C.

    2012-01-01

    NASA has a large range of custom-built and commercial data systems to support spaceflight programs. Some of the systems are re-used by many programs and projects over time. Management and systems engineering processes require integration of data across many of these systems, a difficult problem given the widely diverse nature of system interfaces and data models. This paper describes an ongoing project to use a central data model with a web services architecture to support the integration and access of linked data across engineering functions for multiple NASA programs. The work involves the implementation of a web service-based middleware system called Data Aggregator to bring together data from a variety of systems to support space exploration. Data Aggregator includes a central data model registry for storing and managing links between the data in disparate systems. Initially developed for NASA's Constellation Program needs, Data Aggregator is currently being repurposed to support the International Space Station Program and new NASA projects with processes that involve significant aggregating and linking of data. This change in user needs led to development of a more streamlined data model registry for Data Aggregator in order to simplify adding new project application data as well as standardization of the Data Aggregator query syntax to facilitate cross-application querying by client applications. This paper documents the approach from a set of stand-alone engineering systems from which data are manually retrieved and integrated, to a web of engineering data systems from which the latest data are automatically retrieved and more quickly and accurately integrated. This paper includes the lessons learned through these efforts, including the design and development of a service-oriented architecture and the evolution of the data model registry approaches as the effort continues to evolve and adapt to support multiple NASA programs and priorities.

  20. A Framework for WWW Query Processing

    NASA Technical Reports Server (NTRS)

    Wu, Binghui Helen; Wharton, Stephen (Technical Monitor)

    2000-01-01

    Query processing is the most common operation in a DBMS. Sophisticated query processing has been mainly targeted at a single enterprise environment providing centralized control over data and metadata. Query submission by anonymous users on the web is different in that load balancing and DBMS access control become the key issues. This paper provides a solution by introducing a framework for WWW query processing. The success of this framework lies in the utilization of query optimization techniques and the ontological approach. This methodology has proved to be cost effective at the NASA Goddard Space Flight Center Distributed Active Archive Center (GDAAC).

  1. QBIC project: querying images by content, using color, texture, and shape

    NASA Astrophysics Data System (ADS)

    Niblack, Carlton W.; Barber, Ron; Equitz, Will; Flickner, Myron D.; Glasman, Eduardo H.; Petkovic, Dragutin; Yanker, Peter; Faloutsos, Christos; Taubin, Gabriel

    1993-04-01

    In the query by image content (QBIC) project we are studying methods to query large on-line image databases using the images' content as the basis of the queries. Examples of the content we use include color, texture, and shape of image objects and regions. Potential applications include medical (`Give me other images that contain a tumor with a texture like this one'), photo-journalism (`Give me images that have blue at the top and red at the bottom'), and many others in art, fashion, cataloging, retailing, and industry. Key issues include derivation and computation of attributes of images and objects that provide useful query functionality, retrieval methods based on similarity as opposed to exact match, query by image example or user drawn image, the user interfaces, query refinement and navigation, high dimensional database indexing, and automatic and semi-automatic database population. We currently have a prototype system written in X/Motif and C running on an RS/6000 that allows a variety of queries, and a test database of over 1000 images and 1000 objects populated from commercially available photo clip art images. In this paper we present the main algorithms for color texture, shape and sketch query that we use, show example query results, and discuss future directions.

  2. Does query expansion limit our learning? A comparison of social-based expansion to content-based expansion for medical queries on the internet.

    PubMed

    Pentoney, Christopher; Harwell, Jeff; Leroy, Gondy

    2014-01-01

    Searching for medical information online is a common activity. While it has been shown that forming good queries is difficult, Google's query suggestion tool, a type of query expansion, aims to facilitate query formation. However, it is unknown how this expansion, which is based on what others searched for, affects the information gathering of the online community. To measure the impact of social-based query expansion, this study compared it with content-based expansion, i.e., what is really in the text. We used 138,906 medical queries from the AOL User Session Collection and expanded them using Google's Autocomplete method (social-based) and the content of the Google Web Corpus (content-based). We evaluated the specificity and ambiguity of the expansion terms for trigram queries. We also looked at the impact on the actual results using domain diversity and expansion edit distance. Results showed that the social-based method provided more precise expansion terms as well as terms that were less ambiguous. Expanded queries do not differ significantly in diversity when expanded using the social-based method (6.72 different domains returned in the first ten results, on average) vs. content-based method (6.73 different domains, on average).

  3. Secure Skyline Queries on Cloud Platform

    PubMed Central

    Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

    2017-01-01

    Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions. PMID:28883710

  4. Heuristic query optimization for query multiple table and multiple clausa on mobile finance application

    NASA Astrophysics Data System (ADS)

    Indrayana, I. N. E.; P, N. M. Wirasyanti D.; Sudiartha, I. KG

    2018-01-01

    Mobile applications allow many users to access data without being limited by space and time. Over time the data population of such an application will increase, and data access time becomes a problem once the record count reaches tens of thousands to millions of records. The objective of this research is to maintain query execution performance for large record counts. One way to maintain data access time is to apply a query optimization method; the optimization used in this research is the heuristic query optimization method. The application built is a mobile-based financial application using a MySQL database with stored procedures. It is used by more than one business entity in a single database, which enables rapid data growth. Within the stored procedures, queries are optimized using the heuristic method. Query optimization is performed on a SELECT query that involves more than one table and multiple clauses. Evaluation is done by calculating the average access time of optimized and unoptimized queries, and access time is also measured as the data population in the database grows. The evaluation shows that execution time with heuristic query optimization is noticeably faster than execution time without query optimization.
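
    The paper does not spell out its rewrite rules, so the sketch below shows one classic heuristic of this family, selection pushdown, on in-memory rows: filtering the accounts of a single business entity before the join instead of after it, the kind of transformation that keeps execution time flat as the record count grows. Table sizes and field names are invented for illustration.

    ```python
    import random, time

    random.seed(0)
    accounts = [{"id": i, "entity_id": i % 50} for i in range(500)]
    transactions = [{"id": i, "account_id": random.randrange(500)} for i in range(10_000)]

    def join_then_filter(entity_id):
        # Unoptimised plan: build the whole join, then apply the selective predicate.
        joined = [(a, t) for a in accounts for t in transactions
                  if t["account_id"] == a["id"]]
        return [(a, t) for a, t in joined if a["entity_id"] == entity_id]

    def filter_then_join(entity_id):
        # Heuristic rewrite (selection pushdown): restrict accounts first, so the
        # expensive pairing step only sees rows that can contribute to the answer.
        wanted = {a["id"]: a for a in accounts if a["entity_id"] == entity_id}
        return [(wanted[t["account_id"]], t) for t in transactions
                if t["account_id"] in wanted]

    for fn in (join_then_filter, filter_then_join):
        t0 = time.perf_counter()
        rows = fn(entity_id=7)
        print(f"{fn.__name__}: {len(rows)} rows in {time.perf_counter() - t0:.3f}s")
    ```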

  5. Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce

    PubMed Central

    Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

    2016-01-01

    The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes of data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS – a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing. PMID:27617325

  6. Enhanced Freight Tracking System: Increased Visibility for the Future

    DTIC Science & Technology

    2009-11-01

    personnel queried the Enhanced Freight Tracking System (EFTS). The EFTS records showed that the items arrived at the port of a neighboring country two weeks... humanitarian missions to continue uninterrupted, potentially saving hundreds of lives. Because the EFTS provides precise shipment status data to mission... 33 billion. In addition to materiel coming from Department of Defense (DOD) stock, the U.S. Government (USG) is procuring from domestic and

  7. PRIVACYGRID: Supporting Anonymous Location Queries in Mobile Environments

    DTIC Science & Technology

    2007-01-01

    continued price reduction of location tracking devices, location-based services (LBSs) are widely recognized as an important feature of the future computing... location-based services can operate completely anonymously, such as “when I pass a gas station, alert me with the unit price of the gas”. Others can... Anonymous Usage of Location-Based Services Through Spatial and Temporal Cloaking. In Proceedings of the International Conference on Mobile

  8. Visual analytics for semantic queries of TerraSAR-X image content

    NASA Astrophysics Data System (ADS)

    Espinoza-Molina, Daniela; Alonso, Kevin; Datcu, Mihai

    2015-10-01

    With the continuous image product acquisition of satellite missions, the size of the image archives is considerably increasing every day as well as the variety and complexity of their content, surpassing the end-user capacity to analyse and exploit them. Advances in the image retrieval field have contributed to the development of tools for interactive exploration and extraction of the images from huge archives using different parameters like metadata, key-words, and basic image descriptors. Even though we count on more powerful tools for automated image retrieval and data analysis, we still face the problem of understanding and analyzing the results. Thus, a systematic computational analysis of these results is required in order to provide the end-user with a summary of the archive content in comprehensible terms. In this context, visual analytics combines automated analysis with interactive visualization and analysis techniques for an effective understanding, reasoning and decision making on the basis of very large and complex datasets. Moreover, several current research efforts focus on associating the content of the images with semantic definitions for describing the data in a format to be easily understood by the end-user. In this paper, we present our approach for computing visual analytics and semantically querying the TerraSAR-X archive. Our approach is mainly composed of four steps: 1) the generation of a data model that explains the information contained in a TerraSAR-X product. The model is formed by primitive descriptors and metadata entries, 2) the storage of this model in a database system, 3) the semantic definition of the image content based on machine learning algorithms and relevance feedback, and 4) querying the image archive using semantic descriptors as query parameters and computing the statistical analysis of the query results. The experimental results show that with the help of visual analytics and semantic definitions we are able to explain the image content using semantic terms and the relations between them, answering questions such as what is the percentage of urban area in a region? or what is the distribution of water bodies in a city?

  9. Know your market: use of online query tools to quantify trends in patient information-seeking behavior for varicose vein treatment.

    PubMed

    Harsha, Asheesh K; Schmitt, J Eric; Stavropoulos, S William

    2014-01-01

    To analyze Internet search data to characterize the temporal and geographic interest of Internet users in the United States in varicose vein treatment. From January 1, 2004, to September 1, 2012, the Google Trends tool was used to analyze query data for "varicose vein treatment" to identify individuals seeking treatment information for varicose veins. The term "varicose vein treatment" returned a search volume index (SVI), representing the search frequency relative to the total search volume during a specific time interval and region. Linear regression analysis and Kruskal-Wallis one-way analysis of variance were employed to characterize search results. Search traffic for varicose vein treatment increased by 520% over the 104-month study period. There was an annual mean increase of 28% (range, -18%-100%; standard deviation [SD], 35%), with a statistically significant linear increase in average yearly SVI over time (R(2) = 0.94, P < .0001). All years showed positive growth in mean SVI except for 2008 (18% decrease). There were statistically significant differences in SVI by month (Kruskal-Wallis, P < .0001) with significantly higher mean SVI compared with other months in May (190% increase; range, 26%-670%; SD, 15%) and June (209% increase; range, 35%-700%; SD, 20%). The southern United States showed significantly higher search traffic than all other regions (Tukey-Kramer, P < .00001). There have been significant increases in Internet search traffic related to varicose vein treatment in the past 8 years. Reflected in this trend is an annual peak in search traffic in the late spring months with an overall geographic bias toward southern states. Rigorous analysis of Internet search queries for medical procedures may prove useful to guide the efficient use of limited resources and marketing dollars. © 2013 The Society of Interventional Radiology Published by SIR All rights reserved.

  10. A high performance, ad-hoc, fuzzy query processing system for relational databases

    NASA Technical Reports Server (NTRS)

    Mansfield, William H., Jr.; Fleischman, Robert M.

    1992-01-01

    Database queries involving imprecise or fuzzy predicates are currently an evolving area of academic and industrial research. Such queries place severe stress on the indexing and I/O subsystems of conventional database environments since they involve the search of large numbers of records. The Datacycle architecture and research prototype is a database environment that uses filtering technology to perform an efficient, exhaustive search of an entire database. It has recently been modified to include fuzzy predicates in its query processing. The approach obviates the need for complex index structures, provides unlimited query throughput, permits the use of ad-hoc fuzzy membership functions, and provides a deterministic response time largely independent of query complexity and load. This paper describes the Datacycle prototype implementation of fuzzy queries and some recent performance results.
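
    Not the Datacycle implementation, just a small sketch of what an ad-hoc fuzzy predicate looks like: trapezoidal membership functions for "cheap" and "short", combined with the usual minimum for fuzzy AND, and a threshold that turns degrees of membership into a ranked result set. The attribute names and breakpoints are assumptions for illustration.

    ```python
    def trapezoid(a, b, c, d):
        """Membership function that rises from a to b, stays 1 until c, falls to d."""
        def mu(x):
            if x <= a or x >= d:
                return 0.0
            if b <= x <= c:
                return 1.0
            return (x - a) / (b - a) if x < b else (d - x) / (d - c)
        return mu

    cheap = trapezoid(0, 0, 150, 300)    # "cheap" fares: fully true below $150
    short = trapezoid(0, 0, 90, 180)     # "short" flights: fully true below 90 minutes

    flights = [("A", 120, 75), ("B", 260, 85), ("C", 180, 200), ("D", 90, 170)]

    def fuzzy_query(rows, threshold=0.2):
        scored = []
        for name, price, minutes in rows:
            degree = min(cheap(price), short(minutes))   # fuzzy AND = minimum
            if degree >= threshold:
                scored.append((name, round(degree, 2)))
        return sorted(scored, key=lambda r: -r[1])       # rank by membership degree

    print(fuzzy_query(flights))   # -> [('A', 1.0), ('B', 0.27)]
    ```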

  11. Systems and methods for an extensible business application framework

    NASA Technical Reports Server (NTRS)

    Bell, David G. (Inventor); Crawford, Michael (Inventor)

    2012-01-01

    Method and systems for editing data from a query result include requesting a query result using a unique collection identifier for a collection of individual files and a unique identifier for a configuration file that specifies a data structure for the query result. A query result is generated that contains a plurality of fields as specified by the configuration file, by combining each of the individual files associated with a unique identifier for a collection of individual files. The query result data is displayed with a plurality of labels as specified in the configuration file. Edits can be performed by querying a collection of individual files using the configuration file, editing a portion of the query result, and transmitting only the edited information for storage back into a data repository.

  12. Multiple Query Evaluation Based on an Enhanced Genetic Algorithm.

    ERIC Educational Resources Information Center

    Tamine, Lynda; Chrisment, Claude; Boughanem, Mohand

    2003-01-01

    Explains the use of genetic algorithms to combine results from multiple query evaluations to improve relevance in information retrieval. Discusses niching techniques, relevance feedback techniques, and evolution heuristics, and compares retrieval results obtained by both genetic multiple query evaluation and classical single query evaluation…

  13. Relational Algebra and SQL: Better Together

    ERIC Educational Resources Information Center

    McMaster, Kirby; Sambasivam, Samuel; Hadfield, Steven; Wolthuis, Stuart

    2013-01-01

    In this paper, we describe how database instructors can teach Relational Algebra and Structured Query Language together through programming. Students write query programs consisting of sequences of Relational Algebra operations vs. Structured Query Language SELECT statements. The query programs can then be run interactively, allowing students to…

  14. A Firefly Algorithm-based Approach for Pseudo-Relevance Feedback: Application to Medical Database.

    PubMed

    Khennak, Ilyes; Drias, Habiba

    2016-11-01

    The difficulty of disambiguating the sense of the incomplete and imprecise keywords that are extensively used in the search queries has caused the failure of search systems to retrieve the desired information. One of the most powerful and promising method to overcome this shortcoming and improve the performance of search engines is Query Expansion, whereby the user's original query is augmented by new keywords that best characterize the user's information needs and produce more useful query. In this paper, a new Firefly Algorithm-based approach is proposed to enhance the retrieval effectiveness of query expansion while maintaining low computational complexity. In contrast to the existing literature, the proposed approach uses a Firefly Algorithm to find the best expanded query among a set of expanded query candidates. Moreover, this new approach allows the determination of the length of the expanded query empirically. Experimental results on MEDLINE, the on-line medical information database, show that our proposed approach is more effective and efficient compared to the state-of-the-art.
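
    A toy version of the idea, with a made-up term-weight objective standing in for the retrieval feedback used in the paper: each firefly encodes a candidate expanded query as a bit vector over candidate terms, brightness rewards useful terms while penalising query length (so the expansion length emerges from the search), and dimmer fireflies drift toward the brightest one.

    ```python
    import random

    random.seed(42)
    CANDIDATES = ["myocardial", "infarction", "cardiac", "enzyme", "troponin", "risk"]
    # Toy relevance weights standing in for retrieval feedback (assumed, not from the paper).
    WEIGHT = {"myocardial": 0.9, "infarction": 0.8, "troponin": 0.7,
              "cardiac": 0.4, "enzyme": 0.2, "risk": 0.1}

    def brightness(bits):
        # Reward useful terms, penalise overly long expansions.
        score = sum(WEIGHT[t] for t, b in zip(CANDIDATES, bits) if b)
        return score - 0.15 * sum(bits)

    def move_towards(dim, brighter):
        # A dim firefly copies each bit of a brighter one with some probability,
        # plus an occasional random flip to keep exploring.
        new = [b if random.random() < 0.6 else d for d, b in zip(dim, brighter)]
        if random.random() < 0.2:
            i = random.randrange(len(new))
            new[i] ^= 1
        return new

    swarm = [[random.randint(0, 1) for _ in CANDIDATES] for _ in range(8)]
    for _ in range(30):
        swarm.sort(key=brightness, reverse=True)
        best = swarm[0]
        swarm = [best] + [move_towards(f, best) for f in swarm[1:]]

    chosen = [t for t, b in zip(CANDIDATES, swarm[0]) if b]
    print("expanded query terms:", chosen)
    ```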

  15. RiPPAS: A Ring-Based Privacy-Preserving Aggregation Scheme in Wireless Sensor Networks

    PubMed Central

    Zhang, Kejia; Han, Qilong; Cai, Zhipeng; Yin, Guisheng

    2017-01-01

    Recently, data privacy in wireless sensor networks (WSNs) has received increased attention. The characteristics of WSNs determine that users’ queries are mainly aggregation queries. In this paper, the problem of processing aggregation queries in WSNs with data privacy preservation is investigated. A Ring-based Privacy-Preserving Aggregation Scheme (RiPPAS) is proposed. RiPPAS adopts a ring structure to perform aggregation. It uses a pseudonym mechanism for anonymous communication and a homomorphic encryption technique to add noise to data that could otherwise easily be disclosed. RiPPAS can handle both sum() queries and min()/max() queries, while the existing privacy-preserving aggregation methods can only deal with sum() queries. For processing sum() queries, compared with the existing methods, RiPPAS has advantages in the aspects of privacy preservation and communication efficiency, which can be proved by theoretical analysis and simulation results. For processing min()/max() queries, RiPPAS provides effective privacy preservation and has low communication overhead. PMID:28178197

  16. Dynamic Querying of Mass-Storage RDF Data with Rule-Based Entailment Regimes

    NASA Astrophysics Data System (ADS)

    Ianni, Giovambattista; Krennwallner, Thomas; Martello, Alessandra; Polleres, Axel

    RDF Schema (RDFS) as a lightweight ontology language is gaining popularity and, consequently, tools for scalable RDFS inference and querying are needed. SPARQL has become recently a W3C standard for querying RDF data, but it mostly provides means for querying simple RDF graphs only, whereas querying with respect to RDFS or other entailment regimes is left outside the current specification. In this paper, we show that SPARQL faces certain unwanted ramifications when querying ontologies in conjunction with RDF datasets that comprise multiple named graphs, and we provide an extension for SPARQL that remedies these effects. Moreover, since RDFS inference has a close relationship with logic rules, we generalize our approach to select a custom ruleset for specifying inferences to be taken into account in a SPARQL query. We show that our extensions are technically feasible by providing benchmark results for RDFS querying in our prototype system GiaBATA, which uses Datalog coupled with a persistent Relational Database as a back-end for implementing SPARQL with dynamic rule-based inference. By employing different optimization techniques like magic set rewriting our system remains competitive with state-of-the-art RDFS querying systems.
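
    GiaBATA itself couples Datalog with a relational back-end; the fragment below only illustrates what rule-based RDFS entailment means, by forward-chaining two core rules (rdfs9 and rdfs11) over a handful of triples until a fixpoint is reached. The example triples are invented.

    ```python
    def rdfs_closure(triples):
        """Forward-chain two core RDFS rules until fixpoint:
           rdfs9 : (x rdf:type C) & (C rdfs:subClassOf D)        => (x rdf:type D)
           rdfs11: (C rdfs:subClassOf D) & (D rdfs:subClassOf E) => (C rdfs:subClassOf E)
        """
        facts = set(triples)
        changed = True
        while changed:
            changed = False
            new = set()
            for s, p, o in facts:
                if p == "rdfs:subClassOf":
                    for s2, p2, o2 in facts:
                        if p2 == "rdf:type" and o2 == s:
                            new.add((s2, "rdf:type", o))
                        if p2 == "rdfs:subClassOf" and s2 == o:
                            new.add((s, "rdfs:subClassOf", o2))
            if not new <= facts:
                facts |= new
                changed = True
        return facts

    data = {(":alice", "rdf:type", ":GradStudent"),
            (":GradStudent", "rdfs:subClassOf", ":Student"),
            (":Student", "rdfs:subClassOf", ":Person")}
    closure = rdfs_closure(data)
    print((":alice", "rdf:type", ":Person") in closure)   # True under RDFS entailment
    ```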

  17. Mining the SDSS SkyServer SQL queries log

    NASA Astrophysics Data System (ADS)

    Hirota, Vitor M.; Santos, Rafael; Raddick, Jordan; Thakar, Ani

    2016-05-01

    SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) astronomic catalog, provides a set of tools that allows data access for astronomers and scientific education. One of SkyServer data access interfaces allows users to enter ad-hoc SQL statements to query the catalog. SkyServer also presents some template queries that can be used as basis for more complex queries. This interface has logged over 330 million queries submitted since 2001. It is expected that analysis of this data can be used to investigate usage patterns, identify potential new classes of queries, find similar queries, etc. and to shed some light on how users interact with the Sloan Digital Sky Survey data and how scientists have adopted the new paradigm of e-Science, which could in turn lead to enhancements on the user interfaces and experience in general. In this paper we review some approaches to SQL query mining, apply the traditional techniques used in the literature and present lessons learned, namely, that the general text mining approach for feature extraction and clustering does not seem to be adequate for this type of data, and, most importantly, we find that this type of analysis can result in very different queries being clustered together.

  18. Applying Query Structuring in Cross-language Retrieval.

    ERIC Educational Resources Information Center

    Pirkola, Ari; Puolamaki, Deniz; Jarvelin, Kalervo

    2003-01-01

    Explores ways to apply query structuring in cross-language information retrieval. Tested were: English queries translated into Finnish using an electronic dictionary, and run in a Finnish newspaper databases; effects of compound-based structuring using a proximity operator for translation equivalents of query language compound components; and a…

  19. Querying and Ranking XML Documents.

    ERIC Educational Resources Information Center

    Schlieder, Torsten; Meuss, Holger

    2002-01-01

    Discussion of XML, information retrieval, precision, and recall focuses on a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Topics include a query model based on tree matching; structured queries and term-based ranking; and term frequency and…

  20. Advanced Query Formulation in Deductive Databases.

    ERIC Educational Resources Information Center

    Niemi, Timo; Jarvelin, Kalervo

    1992-01-01

    Discusses deductive databases and database management systems (DBMS) and introduces a framework for advanced query formulation for end users. Recursive processing is described, a sample extensional database is presented, query types are explained, and criteria for advanced query formulation from the end user's viewpoint are examined. (31…

  1. A Semantic Graph Query Language

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kaplan, I L

    2006-10-16

    Semantic graphs can be used to organize large amounts of information from a number of sources into one unified structure. A semantic query language provides a foundation for extracting information from the semantic graph. The graph query language described here provides a simple, powerful method for querying semantic graphs.

  2. Using Common Table Expressions to Build a Scalable Boolean Query Generator for Clinical Data Warehouses

    PubMed Central

    Harris, Daniel R.; Henderson, Darren W.; Kavuluru, Ramakanth; Stromberg, Arnold J.; Johnson, Todd R.

    2015-01-01

    We present a custom, Boolean query generator utilizing common-table expressions (CTEs) that is capable of scaling with big datasets. The generator maps user-defined Boolean queries, such as those interactively created in clinical-research and general-purpose healthcare tools, into SQL. We demonstrate the effectiveness of this generator by integrating our work into the Informatics for Integrating Biology and the Bedside (i2b2) query tool and show that it is capable of scaling. Our custom generator replaces and outperforms the default query generator found within the Clinical Research Chart (CRC) cell of i2b2. In our experiments, sixteen different types of i2b2 queries were identified by varying four constraints: date, frequency, exclusion criteria, and whether selected concepts occurred in the same encounter. We generated non-trivial, random Boolean queries based on these 16 types; the corresponding SQL queries produced by both generators were compared by execution times. The CTE-based solution significantly outperformed the default query generator and provided a much more consistent response time across all query types (M=2.03, SD=6.64 vs. M=75.82, SD=238.88 seconds). Without costly hardware upgrades, we provide a scalable solution based on CTEs with very promising empirical results centered on performance gains. The evaluation methodology used for this provides a means of profiling clinical data warehouse performance. PMID:25192572
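
    A greatly simplified sketch of the idea, not the i2b2 CRC generator itself: each panel of concept codes becomes its own common table expression over an assumed (patient_id, concept_cd) fact table, and the panels are combined with INTERSECT and EXCEPT to express the Boolean query. Table and column names are assumptions for illustration.

    ```python
    def boolean_query(panels):
        """panels: list of (concept_codes, include) groups that are ANDed together.
        Each panel becomes one CTE; included panels are INTERSECTed and excluded
        panels are removed with EXCEPT."""
        ctes, included, excluded = [], [], []
        for i, (codes, include) in enumerate(panels):
            name = f"panel_{i}"
            code_list = ", ".join(f"'{c}'" for c in codes)
            ctes.append(f"{name} AS (SELECT DISTINCT patient_id FROM observation_fact "
                        f"WHERE concept_cd IN ({code_list}))")
            (included if include else excluded).append(f"SELECT patient_id FROM {name}")
        body = "\nINTERSECT\n".join(included)
        for ex in excluded:
            body += f"\nEXCEPT\n{ex}"
        return "WITH " + ",\n     ".join(ctes) + "\n" + body

    sql = boolean_query([(["ICD10:E11", "ICD10:E11.9"], True),   # has a diabetes code
                         (["RXNORM:860975"], True),              # and a metformin order
                         (["ICD10:N18.6"], False)])              # but not end-stage renal disease
    print(sql)
    ```

    Keeping each panel as its own CTE is what lets the planner materialise and combine the patient sets once, which is the behaviour the measured speed-ups rely on.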

  3. A Hybrid Spatio-Temporal Data Indexing Method for Trajectory Databases

    PubMed Central

    Ke, Shengnan; Gong, Jun; Li, Songnian; Zhu, Qing; Liu, Xintao; Zhang, Yeting

    2014-01-01

    In recent years, there has been tremendous growth in the field of indoor and outdoor positioning sensors continuously producing huge volumes of trajectory data that has been used in many fields such as location-based services or location intelligence. Trajectory data is massively increased and semantically complicated, which poses a great challenge on spatio-temporal data indexing. This paper proposes a spatio-temporal data indexing method, named HBSTR-tree, which is a hybrid index structure comprising spatio-temporal R-tree, B*-tree and Hash table. To improve the index generation efficiency, rather than directly inserting trajectory points, we group consecutive trajectory points as nodes according to their spatio-temporal semantics and then insert them into spatio-temporal R-tree as leaf nodes. Hash table is used to manage the latest leaf nodes to reduce the frequency of insertion. A new spatio-temporal interval criterion and a new node-choosing sub-algorithm are also proposed to optimize spatio-temporal R-tree structures. In addition, a B*-tree sub-index of leaf nodes is built to query the trajectories of targeted objects efficiently. Furthermore, a database storage scheme based on a NoSQL-type DBMS is also proposed for the purpose of cloud storage. Experimental results prove that HBSTR-tree outperforms TB*-tree in some aspects such as generation efficiency, query performance and query type. PMID:25051028

  4. A hybrid spatio-temporal data indexing method for trajectory databases.

    PubMed

    Ke, Shengnan; Gong, Jun; Li, Songnian; Zhu, Qing; Liu, Xintao; Zhang, Yeting

    2014-07-21

    In recent years, there has been tremendous growth in the field of indoor and outdoor positioning sensors continuously producing huge volumes of trajectory data that has been used in many fields such as location-based services or location intelligence. Trajectory data is massively increased and semantically complicated, which poses a great challenge on spatio-temporal data indexing. This paper proposes a spatio-temporal data indexing method, named HBSTR-tree, which is a hybrid index structure comprising spatio-temporal R-tree, B*-tree and Hash table. To improve the index generation efficiency, rather than directly inserting trajectory points, we group consecutive trajectory points as nodes according to their spatio-temporal semantics and then insert them into spatio-temporal R-tree as leaf nodes. Hash table is used to manage the latest leaf nodes to reduce the frequency of insertion. A new spatio-temporal interval criterion and a new node-choosing sub-algorithm are also proposed to optimize spatio-temporal R-tree structures. In addition, a B*-tree sub-index of leaf nodes is built to query the trajectories of targeted objects efficiently. Furthermore, a database storage scheme based on a NoSQL-type DBMS is also proposed for the purpose of cloud storage. Experimental results prove that HBSTR-tree outperforms TB*-tree in some aspects such as generation efficiency, query performance and query type.
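
    A minimal sketch of the grouping step only, under invented thresholds: consecutive trajectory points of one object are packed into a node while they stay within spatio-temporal limits, and a dictionary stands in for the hash table that tracks the latest open leaf node per object. Real HBSTR-tree nodes carry more structure than this.

      # Minimal sketch of the grouping idea: pack consecutive trajectory points of one
      # object into a node while they stay within spatio-temporal thresholds. Names and
      # thresholds are illustrative; real HBSTR-tree leaf nodes carry more structure.
      from dataclasses import dataclass, field

      @dataclass
      class TrajNode:
          obj_id: str
          points: list = field(default_factory=list)   # (t, x, y) tuples

          def bbox(self):
              ts, xs, ys = zip(*self.points)
              return (min(ts), max(ts), min(xs), max(xs), min(ys), max(ys))

      def group_points(stream, max_dt=60.0, max_dxy=500.0):
          """Yield closed nodes; 'latest' plays the role of the hash table of open leaf nodes."""
          latest = {}
          for obj_id, t, x, y in stream:            # stream is time-ordered per object
              node = latest.get(obj_id)
              if node:
                  t0, _, x0, x1, y0, y1 = node.bbox()
                  if t - t0 > max_dt or max(x1, x) - min(x0, x) > max_dxy \
                                     or max(y1, y) - min(y0, y) > max_dxy:
                      yield node                     # node is full: hand it to the R-tree insert
                      node = None
              if node is None:
                  node = latest[obj_id] = TrajNode(obj_id)
              node.points.append((t, x, y))
          yield from latest.values()                 # flush open nodes at end of stream

      # Example: the third "bus7" point is far in time, so it closes the node and starts a new one.
      pts = [("bus7", 0, 0, 0), ("bus7", 10, 50, 40), ("bus7", 200, 60, 45), ("cab2", 5, 900, 900)]
      for n in group_points(pts):
          print(n.obj_id, len(n.points), n.bbox())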

  5. RCQ-GA: RDF Chain Query Optimization Using Genetic Algorithms

    NASA Astrophysics Data System (ADS)

    Hogenboom, Alexander; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay

    The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools. Fast query engines are needed for efficient querying of large amounts of data, usually represented using RDF. We focus on optimizing a special class of SPARQL queries, the so-called RDF chain queries. For this purpose, we devise a genetic algorithm called RCQ-GA that determines the order in which joins need to be performed for an efficient evaluation of RDF chain queries. The approach is benchmarked against a two-phase optimization algorithm, previously proposed in literature. The more complex a query is, the more RCQ-GA outperforms the benchmark in solution quality, execution time needed, and consistency of solution quality. When the algorithms are constrained by a time limit, the overall performance of RCQ-GA compared to the benchmark further improves.
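
    The sketch below shows a generic genetic algorithm over join orders for a chain query; the cardinalities, the flat selectivity constant, and the GA settings are toy assumptions, not the RCQ-GA encoding or fitness function from the paper.

      # Generic GA over join orders for a chain query; the cost model and GA settings
      # are toy assumptions, not the RCQ-GA design from the paper.
      import random

      CARD = [10_000, 500, 80_000, 1_200, 300]          # cardinalities of triple patterns p0..p4
      SEL = 0.001                                        # flat join selectivity (assumption)

      def cost(order):
          """Sum of intermediate-result sizes for joining patterns in the given order."""
          size, total = CARD[order[0]], 0
          for p in order[1:]:
              size = size * CARD[p] * SEL
              total += size
          return total

      def crossover(a, b):
          """Order-preserving crossover: keep a prefix of a, fill the rest in b's order."""
          cut = random.randint(1, len(a) - 1)
          head = a[:cut]
          return head + [p for p in b if p not in head]

      def mutate(order):
          i, j = random.sample(range(len(order)), 2)
          order[i], order[j] = order[j], order[i]

      def rcq_ga_sketch(n_patterns=5, pop_size=20, generations=50):
          pop = [random.sample(range(n_patterns), n_patterns) for _ in range(pop_size)]
          for _ in range(generations):
              pop.sort(key=cost)
              survivors = pop[: pop_size // 2]                       # elitist selection
              children = [crossover(*random.sample(survivors, 2)) for _ in survivors]
              for c in children:
                  if random.random() < 0.2:
                      mutate(c)
              pop = survivors + children
          return min(pop, key=cost)

      best = rcq_ga_sketch()
      print("best join order:", best, "estimated cost:", round(cost(best)))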

  6. Query Language for Location-Based Services: A Model Checking Approach

    NASA Astrophysics Data System (ADS)

    Hoareau, Christian; Satoh, Ichiro

    We present a model checking approach to the rationale, implementation, and applications of a query language for location-based services. Such query mechanisms are necessary so that users, objects, and/or services can effectively benefit from the location-awareness of their surrounding environment. The underlying data model is founded on a symbolic model of space organized in a tree structure. Once extended to a semantic model for modal logic, we regard location query processing as a model checking problem, and thus define location queries as hybrid logic-based formulas. Our approach is unique among existing research because it explores the connection between location models and query processing in ubiquitous computing systems, relies on a sound theoretical basis, and provides modal logic-based query mechanisms for expressive searches over a decentralized data structure. A prototype implementation is also presented and will be discussed.

  7. Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data.

    PubMed

    Aji, Ablimit; Wang, Fusheng; Saltz, Joel H

    2012-11-06

    Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amount of data with fast response, which is faced with two major challenges: the "big data" challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce.

  8. Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data

    PubMed Central

    Aji, Ablimit; Wang, Fusheng; Saltz, Joel H.

    2013-01-01

    Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image based computer aided diagnosis. One major requirement for this is effective querying of such enormous amount of data with fast response, which is faced with two major challenges: the “big data” challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for microanatomic objects. To reduce query response time, we propose cost based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce. PMID:24501719
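
    A single-process sketch of the partition idea behind a partition-merge spatial join: bucket one point set into grid tiles, probe the neighbouring tiles for the other set, then merge per-tile results. The tile size, the random data, and the distance-join predicate are assumptions; in the paper each tile would be handled by an independent MapReduce task with an on-demand index, which this sketch does not reproduce.

      # Sketch of the partition idea behind a partition-merge spatial join.
      from collections import defaultdict
      from itertools import product
      from math import dist
      import random

      def grid_key(p, cell):
          return (int(p[0] // cell), int(p[1] // cell))

      def distance_join(points_a, points_b, d):
          cell = d                                        # tile width = join distance
          tiles = defaultdict(list)                       # "map" phase: partition A
          for a in points_a:
              tiles[grid_key(a, cell)].append(a)

          pairs = []                                      # "reduce"/merge phase
          for b in points_b:
              gx, gy = grid_key(b, cell)
              for dx, dy in product((-1, 0, 1), repeat=2):    # probe the 3x3 neighbourhood
                  for a in tiles.get((gx + dx, gy + dy), ()):
                      if dist(a, b) <= d:
                          pairs.append((a, b))
          return pairs

      # Toy stand-ins for two sets of micro-anatomic objects.
      random.seed(0)
      nuclei = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(500)]
      vessels = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(500)]
      print(len(distance_join(nuclei, vessels, d=2.0)), "pairs within distance 2")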

  9. Query Expansion and Query Translation as Logical Inference.

    ERIC Educational Resources Information Center

    Nie, Jian-Yun

    2003-01-01

    Examines query expansion during query translation in cross language information retrieval and develops a general framework for inferential information retrieval in two particular contexts: using fuzzy logic and probability theory. Obtains evaluation formulas that are shown to strongly correspond to those used in other information retrieval models.…

  10. End-User Use of Data Base Query Language: Pros and Cons.

    ERIC Educational Resources Information Center

    Nicholes, Walter

    1988-01-01

    Man-machine interface, the concept of a computer "query," a review of database technology, and a description of the use of query languages at Brigham Young University are discussed. The pros and cons of end-user use of database query languages are explored. (Author/MLW)

  11. Information Retrieval Using UMLS-based Structured Queries

    PubMed Central

    Fagan, Lawrence M.; Berrios, Daniel C.; Chan, Albert; Cucina, Russell; Datta, Anupam; Shah, Maulik; Surendran, Sujith

    2001-01-01

    During the last three years, we have developed and described components of ELBook, a semantically based information-retrieval system [1-4]. Using these components, domain experts can specify a query model, indexers can use the query model to index documents, and end-users can search these documents for instances of indexed queries.

  12. A Relational Algebra Query Language for Programming Relational Databases

    ERIC Educational Resources Information Center

    McMaster, Kirby; Sambasivam, Samuel; Anderson, Nicole

    2011-01-01

    In this paper, we describe a Relational Algebra Query Language (RAQL) and Relational Algebra Query (RAQ) software product we have developed that allows database instructors to teach relational algebra through programming. Instead of defining query operations using mathematical notation (the approach commonly taken in database textbooks), students…

  13. Recommender engine for continuous-time quantum Monte Carlo methods

    NASA Astrophysics Data System (ADS)

    Huang, Li; Yang, Yi-feng; Wang, Lei

    2017-03-01

    Recommender systems play an essential role in the modern business world. They recommend favorable items such as books, movies, and search queries to users based on their past preferences. Applying similar ideas and techniques to Monte Carlo simulations of physical systems boosts their efficiency without sacrificing accuracy. Exploiting the quantum to classical mapping inherent in the continuous-time quantum Monte Carlo methods, we construct a classical molecular gas model to reproduce the quantum distributions. We then utilize powerful molecular simulation techniques to propose efficient quantum Monte Carlo updates. The recommender engine approach provides a general way to speed up the quantum impurity solvers.

  14. An Ensemble Approach for Expanding Queries

    DTIC Science & Technology

    2012-11-01

    [Table fragment: query-expansion terms with document counts and weights, e.g., pain^0.39, hospital^0.82, miscarriage^3.35, radiotherapy^3.28, hypoaldosteronism ...] ... negated query is the expansion of the original query with negation terms preceding each word. For example, the negated version of “miscarriage”^3.35 ... includes “no miscarriage”^3.35 and “not miscarriage”^3.35. If a document is the result of both original query and negated query, its score is

  15. Resource Needs and Pedagogical Value of Web Mapping for Spatial Thinking

    ERIC Educational Resources Information Center

    Manson, Steven; Shannon, Jerry; Eria, Sami; Kne, Len; Dyke, Kevin; Nelson, Sara; Batra, Lalit; Bonsal, Dudley; Kernik, Melinda; Immich, Jennifer; Matson, Laura

    2014-01-01

    Web mapping involves publishing and using maps via the Internet, and can range from presenting static maps to offering dynamic data querying and spatial analysis. Web mapping is seen as a promising way to support development of spatial thinking in the classroom but there are unanswered questions about how this promise plays out in reality. This…

  16. Testing the Usability of Interactive Visualizations for Complex Problem-Solving: Findings Related to Improving Interfaces and Help.

    ERIC Educational Resources Information Center

    Mirel, Barbara

    2001-01-01

    Conducts a scenario-based usability test with 10 data analysts using visual querying (visually analyzing data with interactive graphics). Details a range of difficulties found in visual selection that, at times, gave rise to inaccurate selections, invalid conclusions, and misguided decisions. Argues that support for visual selection must be built…

  17. Remnant old-growth longleaf pine (Pinus palustris Mill.) savannas and forests of the southeastern USA: Status and threats

    Treesearch

    J. Morgan Varner; John S. Kush

    2004-01-01

    Old-growth savannas and forests dominated by longleaf pine (Pinus palustris Mill.) are of great conservation and research interest. Comprehensive inventories of old-growth communities, however, are lacking for most of longleaf pine's natural range. We searched the literature, interviewed regional experts, queried email discussion lists and...

  18. Performing Cities: Engaging the High-Tech Flaneur

    ERIC Educational Resources Information Center

    Rodriguez, Karen; Rink, Bradley

    2011-01-01

    As international educators, the overwhelmingly urban context of study abroad requires not only a more intense querying of "the city" as a concept, but also more attention to how students could engage with and learn from the urban landscape. In this article, the authors begin with a theoretical flanerie through a range of ideas related to the city…

  19. An ontology-based comparative anatomy information system

    PubMed Central

    Travillian, Ravensara S.; Diatchka, Kremena; Judge, Tejinder K.; Wilamowska, Katarzyna; Shapiro, Linda G.

    2010-01-01

    Introduction This paper describes the design, implementation, and potential use of a comparative anatomy information system (CAIS) for querying on similarities and differences between homologous anatomical structures across species, the knowledge base it operates upon, the method it uses for determining the answers to the queries, and the user interface it employs to present the results. The relevant informatics contributions of our work include (1) the development and application of the structural difference method, a formalism for symbolically representing anatomical similarities and differences across species; (2) the design of the structure of a mapping between the anatomical models of two different species and its application to information about specific structures in humans, mice, and rats; and (3) the design of the internal syntax and semantics of the query language. These contributions provide the foundation for the development of a working system that allows users to submit queries about the similarities and differences between mouse, rat, and human anatomy; delivers result sets that describe those similarities and differences in symbolic terms; and serves as a prototype for the extension of the knowledge base to any number of species. Additionally, we expanded the domain knowledge by identifying medically relevant structural questions for the human, the mouse, and the rat, and made an initial foray into the validation of the application and its content by means of user questionnaires, software testing, and other feedback. Methods The anatomical structures of the species to be compared, as well as the mappings between species, are modeled on templates from the Foundational Model of Anatomy knowledge base, and compared using graph-matching techniques. A graphical user interface allows users to issue queries that retrieve information concerning similarities and differences between structures in the species being examined. Queries from diverse information sources, including domain experts, peer-reviewed articles, and reference books, have been used to test the system and to illustrate its potential use in comparative anatomy studies. Results 157 test queries were submitted to the CAIS system, and all of them were correctly answered. The interface was evaluated in terms of clarity and ease of use. This testing determined that the application works well, and is fairly intuitive to use, but users want to see more clarification of the meaning of the different types of possible queries. Some of the interface issues will naturally be resolved as we refine our conceptual model to deal with partial and complex homologies in the content. Conclusions The CAIS system and its associated methods are expected to be useful to biologists and translational medicine researchers. Possible applications range from supporting theoretical work in clarifying and modeling ontogenetic, physiological, pathological, and evolutionary transformations, to concrete techniques for improving the analysis of genotype–phenotype relationships among various animal models in support of a wide array of clinical and scientific initiatives. PMID:21146377

  20. EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hasan, S. M. Shamimul; Fox, Edward A.; Bisset, Keith

    Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models as well as the data produced by these models present unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy to use decision support and analytical environments. As a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK—a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks—aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated. A query bank is developed—the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using SPARQL Protocol and RDF Query Language (SPARQL) over the EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline: from model construction to simulation output. As a result, we show that the performance of benchmark queries varies significantly with respect to the choice of hardware underlying the database and resource description framework (RDF) engine.

  1. EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases

    DOE PAGES

    Hasan, S. M. Shamimul; Fox, Edward A.; Bisset, Keith; ...

    2017-11-06

    Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models as well as the data produced by these models present unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy to use decision support and analytical environments. As a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK—a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks—aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated. A query bank is developed—the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using SPARQL Protocol and RDF Query Language (SPARQL) over the EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline: from model construction to simulation output. As a result, we show that the performance of benchmark queries varies significantly with respect to the choice of hardware underlying the database and resource description framework (RDF) engine.
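
    To give a feel for what a query-bank entry might look like, the snippet below runs a small SPARQL query with rdflib over a toy graph. The epi: vocabulary, predicates, and data are invented for this sketch and are not EpiK's actual schema.

      # Illustrative only: the epi: vocabulary below is invented and is not EpiK's schema;
      # it just shows the flavour of a query-bank question ("infections per county in week 12").
      from rdflib import Graph, Literal, Namespace
      from rdflib.namespace import XSD

      EPI = Namespace("http://example.org/epik#")
      g = Graph()
      for county, week, cases in [("Montgomery", 12, 340), ("Fairfax", 12, 512), ("Fairfax", 13, 499)]:
          obs = EPI[f"obs_{county}_{week}"]
          g.add((obs, EPI.county, Literal(county)))
          g.add((obs, EPI.week, Literal(week, datatype=XSD.integer)))
          g.add((obs, EPI.newInfections, Literal(cases, datatype=XSD.integer)))

      QUERY = """
      PREFIX epi: <http://example.org/epik#>
      SELECT ?county ?cases WHERE {
        ?obs epi:county ?county ;
             epi:week 12 ;
             epi:newInfections ?cases .
      } ORDER BY DESC(?cases)
      """
      for row in g.query(QUERY):
          print(row.county, row.cases)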

  2. The Effect of Different Materials on the Accuracy of the HYDRA Optical-Fiber-Coupled Coherent Range/Pressure Measurement System and the Development of the Health Care Database System at Old Dominion University

    NASA Technical Reports Server (NTRS)

    Johnson, Kimberly D.

    1995-01-01

    The objective of the first project involving the HYDRA laser system was to determine what effects, if any, could be seen in the system's measurements when testing was done with objects composed of different materials. Ideally, we would have liked to see that the measurements all fell within the accepted 0.4 millimeter accuracy of the system. Unfortunately, our results were not as we had hoped, and there did appear to be some significant difference in the measurements made on objects composed of different materials. The second project is a continuing project at Old Dominion University. The ultimate goal is to develop a medical database that allows a doctor or hospital to keep medical records online. At the time, the system contained data for only one patient, whose medical data had been hard-coded to allow for a demonstration of the potential of the system. The short-term goal for this summer was to add additional patients to the system for testing, and to eliminate the hard-coding of data by creating a database where data could be stored and queried to produce the results seen in the current state.

  3. The Analysis of RDF Semantic Data Storage Optimization in Large Data Era

    NASA Astrophysics Data System (ADS)

    He, Dandan; Wang, Lijuan; Wang, Can

    2018-03-01

    With the continuous development of information and network technology in China, the Internet has entered the era of big data. To acquire information effectively in this era, existing RDF semantic data storage must be optimized so that diverse data can be queried efficiently. This paper discusses storage optimization for RDF semantic data in the big data era.

  4. A novel adaptive Cuckoo search for optimal query plan generation.

    PubMed

    Gomathi, Ramalingam; Sharmila, Dhandapani

    2014-01-01

    The day-by-day growth in the number of web pages has driven the development of semantic web technology. The World Wide Web Consortium (W3C) standard for storing semantic web data is the Resource Description Framework (RDF). To reduce the execution time of queries over large RDF graphs, evolving metaheuristic algorithms have become an alternative to traditional query optimization methods. This paper focuses on query optimization for semantic web data. An efficient algorithm called adaptive Cuckoo search (ACS) for querying large RDF graphs and generating optimal query plans is designed in this research. Experiments were conducted on different datasets with varying numbers of predicates. The experimental results show that the proposed approach yields significant improvements in query execution time. The efficiency of the algorithm is tested and the results are documented.

  5. Query-Based Outlier Detection in Heterogeneous Information Networks.

    PubMed

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-03-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user's search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks.

  6. Query-Based Outlier Detection in Heterogeneous Information Networks

    PubMed Central

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-01-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user’s search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks. PMID:27064397

  7. Querying and Extracting Timeline Information from Road Traffic Sensor Data

    PubMed Central

    Imawan, Ardi; Indikawati, Fitri Indra; Kwon, Joonho; Rao, Praveen

    2016-01-01

    The escalation of traffic congestion in urban cities has urged many countries to use intelligent transportation system (ITS) centers to collect historical traffic sensor data from multiple heterogeneous sources. By analyzing historical traffic data, we can obtain valuable insights into traffic behavior. Many existing applications have been proposed with limited analysis results because of the inability to cope with several types of analytical queries. In this paper, we propose the QET (querying and extracting timeline information) system—a novel analytical query processing method based on a timeline model for road traffic sensor data. To address query performance, we build a TQ-index (timeline query-index) that exploits spatio-temporal features of timeline modeling. We also propose an intuitive timeline visualization method to display congestion events obtained from specified query parameters. In addition, we demonstrate the benefit of our system through a performance evaluation using a Busan ITS dataset and a Seattle freeway dataset. PMID:27563900

  8. Policy Compliance of Queries for Private Information Retrieval

    DTIC Science & Technology

    2010-11-01

    SPARQL, unfortunately, is not in RDF and so we had to develop tools to translate SPARQL queries into RDF to be used by our policy compliance prototype...policy-assurance/sparql2n3.py) that accepts SPARQL queries and returns the translated query in our simplified ontology. An example of a translated

  9. Knowledge Query Language (KQL)

    DTIC Science & Technology

    2016-02-12

    Currently, queries for data ...retrieval from non-Structured Query Language (NoSQL) data stores are tightly coupled to the specific implementation of the data store implementation...independent of the storage content and format for querying NoSQL or relational data stores. This approach uses address expressions (or A-Expressions

  10. Fragger: a protein fragment picker for structural queries.

    PubMed

    Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J

    2017-01-01

    Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
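
    For readers unfamiliar with the ranking measure, the sketch below computes a Kabsch-superposition backbone RMSD between two equal-length fragments with numpy. It is a generic illustration, not Fragger's own implementation, distance measure, or file format.

      # Minimal Kabsch-superposition RMSD between two equal-length backbone fragments.
      import numpy as np

      def kabsch_rmsd(P, Q):
          """RMSD of coordinate arrays P, Q (n_atoms x 3) after optimal superposition."""
          P = P - P.mean(axis=0)
          Q = Q - Q.mean(axis=0)
          U, _, Vt = np.linalg.svd(P.T @ Q)
          d = np.sign(np.linalg.det(U @ Vt))           # avoid an improper rotation (reflection)
          R = U @ np.diag([1.0, 1.0, d]) @ Vt          # optimal rotation, applied to row vectors
          diff = P @ R - Q
          return float(np.sqrt((diff ** 2).sum() / len(P)))

      rng = np.random.default_rng(1)
      frag = rng.normal(size=(12, 3))                   # 12 backbone atoms (toy data)
      theta = 0.7                                       # a rotated + translated copy should give RMSD ~ 0
      rot = np.array([[np.cos(theta), -np.sin(theta), 0],
                      [np.sin(theta),  np.cos(theta), 0],
                      [0, 0, 1]])
      print(round(kabsch_rmsd(frag, frag @ rot.T + 5.0), 6))    # prints 0.0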

  11. Query by forms: User-oriented relational database retrieving system and its application in analysis of experiment data

    NASA Astrophysics Data System (ADS)

    Skotniczny, Zbigniew

    1989-12-01

    The Query by Forms (QbF) system is a user-oriented interactive tool for querying large relational databases with minimal query definition cost. The system was developed under the assumption that the user's time and effort in defining the needed queries is the most severe bottleneck. The system may be applied to any Rdb/VMS database system and is recommended for specific information systems of any project where end-user queries cannot be foreseen. The tool is dedicated to specialists in an application domain who have to analyze data maintained in a database from any needed point of view and who do not need to know commercial database languages. The paper presents the system developed as a compromise between functionality and usability. User-system communication via a menu-driven "tree-like" structure of screen forms, which produces a query definition and its execution, is discussed in detail. Output of query results (printed reports and graphics) is also discussed. Finally, the paper shows one application of QbF to a HERA project.

  12. Hybrid ontology for semantic information retrieval model using keyword matching indexing system.

    PubMed

    Uthayan, K R; Mala, G S Anandha

    2015-01-01

    Ontology is the process of growth and elucidation of concepts of an information domain being common for a group of users. Establishing ontology into information retrieval is a normal method to develop searching effects of relevant information users require. Keywords matching process with historical or information domain is significant in recent calculations for assisting the best match for specific input queries. This research presents a better querying mechanism for information retrieval which integrates the ontology queries with keyword search. The ontology-based query is changed into a primary order to predicate logic uncertainty which is used for routing the query to the appropriate servers. Matching algorithms represent an active area of research in computer science and artificial intelligence. In text matching, it is more dependable to study semantics model and query for conditions of semantic matching. This research develops the semantic matching results between input queries and information in ontology field. The contributed algorithm is a hybrid method that is based on matching extracted instances from the queries and information field. The queries and information domain is focused on semantic matching, to discover the best match and to progress the executive process. In conclusion, the hybrid ontology in the semantic web is sufficient to retrieve the documents when compared to a standard ontology.

  13. Hybrid Ontology for Semantic Information Retrieval Model Using Keyword Matching Indexing System

    PubMed Central

    Uthayan, K. R.; Anandha Mala, G. S.

    2015-01-01

    Ontology is the process of growth and elucidation of concepts of an information domain being common for a group of users. Establishing ontology into information retrieval is a normal method to develop searching effects of relevant information users require. Keywords matching process with historical or information domain is significant in recent calculations for assisting the best match for specific input queries. This research presents a better querying mechanism for information retrieval which integrates the ontology queries with keyword search. The ontology-based query is changed into a primary order to predicate logic uncertainty which is used for routing the query to the appropriate servers. Matching algorithms represent an active area of research in computer science and artificial intelligence. In text matching, it is more dependable to study semantics model and query for conditions of semantic matching. This research develops the semantic matching results between input queries and information in ontology field. The contributed algorithm is a hybrid method that is based on matching extracted instances from the queries and information field. The queries and information domain is focused on semantic matching, to discover the best match and to progress the executive process. In conclusion, the hybrid ontology in the semantic web is sufficient to retrieve the documents when compared to a standard ontology.

  14. Multidimensional indexing structure for use with linear optimization queries

    NASA Technical Reports Server (NTRS)

    Bergman, Lawrence David (Inventor); Castelli, Vittorio (Inventor); Chang, Yuan-Chi (Inventor); Li, Chung-Sheng (Inventor); Smith, John Richard (Inventor)

    2002-01-01

    Linear optimization queries, which usually arise in various decision support and resource planning applications, are queries that retrieve top N data records (where N is an integer greater than zero) which satisfy a specific optimization criterion. The optimization criterion is to either maximize or minimize a linear equation. The coefficients of the linear equation are given at query time. Methods and apparatus are disclosed for constructing, maintaining and utilizing a multidimensional indexing structure of database records to improve the execution speed of linear optimization queries. Database records with numerical attributes are organized into a number of layers and each layer represents a geometric structure called convex hull. Such linear optimization queries are processed by searching from the outer-most layer of this multi-layer indexing structure inwards. At least one record per layer will satisfy the query criterion and the number of layers needed to be searched depends on the spatial distribution of records, the query-issued linear coefficients, and N, the number of records to be returned. When N is small compared to the total size of the database, answering the query typically requires searching only a small fraction of all relevant records, resulting in a tremendous speedup as compared to linearly scanning the entire dataset.
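
    A sketch of the layered ("onion"-style) idea under stated assumptions: convex-hull layers are peeled once at index-build time with scipy, and a top-N query under a linear score is answered by scanning layers from the outside in. The data, the two-attribute example, and the parameters are illustrative, not the patented structure itself.

      # Layered index sketch: peel convex-hull layers, then scan from the outer layer inward.
      import numpy as np
      from scipy.spatial import ConvexHull, QhullError

      def build_layers(points):
          layers, remaining = [], np.arange(len(points))
          while len(remaining) > points.shape[1]:           # need more than dim points for a hull
              try:
                  hull = ConvexHull(points[remaining])
              except QhullError:                            # degenerate leftovers: one last layer
                  break
              layers.append(remaining[hull.vertices])
              remaining = np.delete(remaining, hull.vertices)
          if len(remaining):
              layers.append(remaining)
          return layers

      def top_n(points, layers, c, n):
          # Onion property: the top-n maximizers of c.x lie within the first n layers,
          # so scanning that many layers (fewer if the index is shallow) is sufficient.
          cand = np.concatenate(layers[:n])
          scores = points[cand] @ c
          return cand[np.argsort(scores)[::-1][:n]]

      rng = np.random.default_rng(0)
      pts = rng.random((2000, 2))                           # records with two numeric attributes
      layers = build_layers(pts)
      print("layers:", len(layers), "top-3 ids:", top_n(pts, layers, c=np.array([0.7, 0.3]), n=3))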

  15. Processing SPARQL queries with regular expressions in RDF databases

    PubMed Central

    2011-01-01

    Background As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users’ requests for extracting information from the RDF data as well as the lack of users’ knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. Results In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns. PMID:21489225

  16. Processing SPARQL queries with regular expressions in RDF databases.

    PubMed

    Lee, Jinsoo; Pham, Minh-Duc; Lee, Jihwan; Han, Wook-Shin; Cho, Hune; Yu, Hwanjo; Lee, Jeong-Hoon

    2011-03-29

    As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users' requests for extracting information from the RDF data as well as the lack of users' knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.
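
    A regex-constrained SPARQL query of the kind discussed above, run here with rdflib on a toy graph; the protein names and labels are invented purely for illustration, and this says nothing about the authors' optimization framework.

      # FILTER regex() in SPARQL, executed with rdflib over a three-triple toy graph.
      from rdflib import Graph, Literal, URIRef
      from rdflib.namespace import RDFS

      g = Graph()
      for i, name in enumerate(["MAP kinase 1", "Hexokinase-2", "Serum albumin"]):
          g.add((URIRef(f"http://example.org/protein/{i}"), RDFS.label, Literal(name)))

      QUERY = """
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      SELECT ?p ?name WHERE {
        ?p rdfs:label ?name .
        FILTER regex(?name, "kinase", "i")   # "i" = case-insensitive
      }
      """
      for p, name in g.query(QUERY):
          print(p, name)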

  17. Exploring performance issues for a clinical database organized using an entity-attribute-value representation.

    PubMed

    Chen, R S; Nadkarni, P; Marenco, L; Levin, F; Erdos, J; Miller, P L

    2000-01-01

    The entity-attribute-value representation with classes and relationships (EAV/CR) provides a flexible and simple database schema to store heterogeneous biomedical data. In certain circumstances, however, the EAV/CR model is known to retrieve data less efficiently than conventionally based database schemas. Our objective was to perform a pilot study that systematically quantifies performance differences for database queries directed at real-world microbiology data modeled with EAV/CR and conventional representations, and to explore the relative merits of different EAV/CR query implementation strategies. Clinical microbiology data obtained over a ten-year period were stored using both database models. Query execution times were compared for four clinically oriented attribute-centered and entity-centered queries operating under varying conditions of database size and system memory. The performance characteristics of three different EAV/CR query strategies were also examined. Performance was similar for entity-centered queries in the two database models. Performance in the EAV/CR model was approximately three to five times less efficient than its conventional counterpart for attribute-centered queries. The differences in query efficiency became slightly greater as database size increased, although they were reduced with the addition of system memory. The authors found that EAV/CR queries formulated using multiple, simple SQL statements executed in batch were more efficient than single, large SQL statements. This paper describes a pilot project to explore issues in and compare query performance for EAV/CR and conventional database representations. Although attribute-centered queries were less efficient in the EAV/CR model, these inefficiencies may be addressable, at least in part, by the use of more powerful hardware or more memory, or both.
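
    A toy EAV table in sqlite3 and one attribute-centered query, to make the trade-off concrete. The schema, attribute names, and data are invented for illustration and do not reproduce the paper's EAV/CR system or its microbiology model.

      # Toy EAV table and an attribute-centered query ("entities with a blood specimen
      # and an E. coli organism"): each attribute tested needs its own pass over the EAV
      # table, expressed here as a self-join, unlike a conventional wide table.
      import sqlite3

      con = sqlite3.connect(":memory:")
      con.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
      con.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
          ("pt1", "specimen", "blood"), ("pt1", "organism", "E. coli"),
          ("pt2", "specimen", "urine"), ("pt2", "organism", "E. coli"),
          ("pt3", "specimen", "blood"), ("pt3", "organism", "S. aureus"),
      ])

      rows = con.execute("""
          SELECT DISTINCT a.entity
          FROM eav a JOIN eav b ON a.entity = b.entity
          WHERE a.attribute = 'specimen' AND a.value = 'blood'
            AND b.attribute = 'organism' AND b.value = 'E. coli'
      """).fetchall()
      print(rows)        # [('pt1',)]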

  18. Correlation between National Influenza Surveillance Data and Search Queries from Mobile Devices and Desktops in South Korea

    PubMed Central

    Seo, Dong-Woo; Sohn, Chang Hwan; Kim, Sung-Hoon; Ryoo, Seung Mok; Lee, Yoon-Seon; Lee, Jae Ho; Kim, Won Young; Lim, Kyoung Soo

    2016-01-01

    Background Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that the mobile search volume surpasses the desktop search volume and mobile search patterns differ from desktop search patterns, the previous digital surveillance systems did not distinguish mobile and desktop search queries. The purpose of this study was to compare the performance of mobile and desktop search queries in terms of digital influenza surveillance. Methods and Results The study period was from September 6, 2010 through August 30, 2014, which consisted of four epidemiological years. Influenza-like illness (ILI) and virologic surveillance data from the Korea Centers for Disease Control and Prevention were used. A total of 210 combined queries from our previous survey work were used for this study. Mobile and desktop weekly search data were extracted from Naver, which is the largest search engine in Korea. Spearman’s correlation analysis was used to examine the correlation of the mobile and desktop data with ILI and virologic data in Korea. We also performed lag correlation analysis. We observed that the influenza surveillance performance of mobile search queries matched or exceeded that of desktop search queries over time. The mean correlation coefficients of mobile search queries and the number of queries with an r-value of ≥ 0.7 equaled or became greater than those of desktop searches over the four epidemiological years. A lag correlation analysis of up to two weeks showed similar trends. Conclusion Our study shows that mobile search queries for influenza surveillance have equaled or even become greater than desktop search queries over time. In the future development of influenza surveillance using search queries, the recognition of changing trend of mobile search data could be necessary. PMID:27391028

  19. Correlation between National Influenza Surveillance Data and Search Queries from Mobile Devices and Desktops in South Korea.

    PubMed

    Shin, Soo-Yong; Kim, Taerim; Seo, Dong-Woo; Sohn, Chang Hwan; Kim, Sung-Hoon; Ryoo, Seung Mok; Lee, Yoon-Seon; Lee, Jae Ho; Kim, Won Young; Lim, Kyoung Soo

    2016-01-01

    Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that the mobile search volume surpasses the desktop search volume and mobile search patterns differ from desktop search patterns, the previous digital surveillance systems did not distinguish mobile and desktop search queries. The purpose of this study was to compare the performance of mobile and desktop search queries in terms of digital influenza surveillance. The study period was from September 6, 2010 through August 30, 2014, which consisted of four epidemiological years. Influenza-like illness (ILI) and virologic surveillance data from the Korea Centers for Disease Control and Prevention were used. A total of 210 combined queries from our previous survey work were used for this study. Mobile and desktop weekly search data were extracted from Naver, which is the largest search engine in Korea. Spearman's correlation analysis was used to examine the correlation of the mobile and desktop data with ILI and virologic data in Korea. We also performed lag correlation analysis. We observed that the influenza surveillance performance of mobile search queries matched or exceeded that of desktop search queries over time. The mean correlation coefficients of mobile search queries and the number of queries with an r-value of ≥ 0.7 equaled or became greater than those of desktop searches over the four epidemiological years. A lag correlation analysis of up to two weeks showed similar trends. Our study shows that mobile search queries for influenza surveillance have equaled or even become greater than desktop search queries over time. In the future development of influenza surveillance using search queries, the recognition of changing trend of mobile search data could be necessary.
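
    A lag-correlation calculation of the kind described above, on simulated weekly series: the search-volume series is shifted by zero to two weeks against the ILI series and Spearman's rho is computed at each lag with scipy. The data here are synthetic, not the Korean surveillance or Naver data.

      # Lag correlation between a simulated query-volume series and a simulated ILI series.
      import numpy as np
      from scipy.stats import spearmanr

      rng = np.random.default_rng(7)
      weeks = 208                                   # four epidemiological years
      ili = np.clip(np.sin(np.linspace(0, 8 * np.pi, weeks)) + rng.normal(0, 0.3, weeks), 0, None)
      search = np.roll(ili, -1) + rng.normal(0, 0.2, weeks)   # queries lead ILI by ~1 week

      for lag in range(3):                          # lag = how many weeks queries lead ILI
          rho, p = spearmanr(search[: weeks - lag], ili[lag:])
          print(f"lag {lag} week(s): rho = {rho:.2f} (p = {p:.1e})")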

  20. Searching for cancer information on the internet: analyzing natural language search queries.

    PubMed

    Bader, Judith L; Theofanos, Mary Frances

    2003-12-11

    Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine result. To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared >or= 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized. Natural-language searching affords users the opportunity to fully express their information needs and can aid users naïve to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience.

  1. Searching for Cancer Information on the Internet: Analyzing Natural Language Search Queries

    PubMed Central

    Theofanos, Mary Frances

    2003-01-01

    Background Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine result. Objective To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. Methods The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Results Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized. Conclusions Natural-language searching affords users the opportunity to fully express their information needs and can aid users naïve to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience. PMID:14713659

  2. Intelligent Data Granulation on Load: Improving Infobright's Knowledge Grid

    NASA Astrophysics Data System (ADS)

    Ślęzak, Dominik; Kowalski, Marcin

    One of the major aspects of Infobright's relational database technology is the automatic decomposition of each data table into Rough Rows, each consisting of 64K original rows. Rough Rows are automatically annotated by Knowledge Nodes that represent compact information about the rows' values. Query performance depends on the quality of Knowledge Nodes, i.e., their efficiency in minimizing access to the compressed portions of data stored on disk, according to the specific query optimization procedures. We show how to implement the mechanism of organizing the incoming data into such Rough Rows that maximize the quality of the corresponding Knowledge Nodes. Given clear business-driven requirements, the implemented mechanism needs to be fully integrated with the data load process, causing no decrease in the data load speed. The performance gain resulting from better data organization is illustrated by some tests over our benchmark data. The differences between the proposed mechanism and some well-known procedures of database clustering or partitioning are discussed. The paper is a continuation of our patent application [22].
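
    A minimal sketch of Rough-Row-style pruning, assuming only per-pack min/max summaries: a column is split into fixed-size packs, a min/max "knowledge node" is kept per pack, and packs that cannot satisfy a range predicate are skipped. The pack size is shrunk from 64K to 8 for readability, and real Knowledge Nodes carry richer statistics than this.

      # Data-pack pruning with per-pack min/max summaries (pack size shrunk for the demo).
      PACK_SIZE = 8

      def build_knowledge_nodes(values):
          packs = [values[i:i + PACK_SIZE] for i in range(0, len(values), PACK_SIZE)]
          return packs, [(min(p), max(p)) for p in packs]

      def range_query(packs, nodes, lo, hi):
          hits, scanned = [], 0
          for pack, (pmin, pmax) in zip(packs, nodes):
              if pmax < lo or pmin > hi:        # summary proves no row in the pack can match,
                  continue                      #  so the compressed pack is never read
              scanned += 1
              hits.extend(v for v in pack if lo <= v <= hi)
          return hits, scanned

      values = list(range(100, 180)) + list(range(0, 20))      # clustered load order helps pruning
      packs, nodes = build_knowledge_nodes(values)
      hits, scanned = range_query(packs, nodes, lo=150, hi=160)
      print(f"matched {len(hits)} rows, scanned {scanned} of {len(packs)} packs")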

  3. Selected Topics from LVCSR Research for Asian Languages at Tokyo Tech

    NASA Astrophysics Data System (ADS)

    Furui, Sadaoki

    This paper presents our recent work in regard to building Large Vocabulary Continuous Speech Recognition (LVCSR) systems for the Thai, Indonesian, and Chinese languages. For Thai, since there is no word boundary in the written form, we have proposed a new method for automatically creating word-like units from a text corpus, and applied topic and speaking style adaptation to the language model to recognize spoken-style utterances. For Indonesian, we have applied proper noun-specific adaptation to acoustic modeling, and rule-based English-to-Indonesian phoneme mapping to solve the problem of large variation in proper noun and English word pronunciation in a spoken-query information retrieval system. In spoken Chinese, long organization names are frequently abbreviated, and abbreviated utterances cannot be recognized if the abbreviations are not included in the dictionary. We have proposed a new method for automatically generating Chinese abbreviations, and by expanding the vocabulary using the generated abbreviations, we have significantly improved the performance of spoken query-based search.

  4. Searching for Images: The Analysis of Users' Queries for Image Retrieval in American History.

    ERIC Educational Resources Information Center

    Choi, Youngok; Rasmussen, Edie M.

    2003-01-01

    Studied users' queries for visual information in American history to identify the image attributes important for retrieval and the characteristics of users' queries for digital images, based on queries from 38 faculty and graduate students. Results of pre- and post-test questionnaires and interviews suggest principle categories of search terms.…

  5. Searching and Filtering Tweets: CSIRO at the TREC 2012 Microblog Track

    DTIC Science & Technology

    2012-11-01

    stages. We first evaluate the effect of tweet corpus pre-processing in vanilla runs (no query expansion), and then assess the effect of query expansion...Effect of a vanilla run on D4 index (both real-time and non-real-time), and query expansion methods based on the submitted runs for two sets of queries

  6. Knowledge Query Language (KQL)

    DTIC Science & Technology

    2016-02-01

    Currently, queries for data ...retrieval from non-Structured Query Language (NoSQL) data stores are tightly coupled to the specific implementation of the data store, making...of the storage content and format for querying NoSQL or relational data stores. This approach uses address expressions (or A-Expressions) embedded in

  7. System, method and apparatus for conducting a keyterm search

    NASA Technical Reports Server (NTRS)

    McGreevy, Michael W. (Inventor)

    2004-01-01

    A keyterm search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more keyterms. Next, a gleaning model of the query is created. The gleaning model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.

  8. System, method and apparatus for conducting a phrase search

    NASA Technical Reports Server (NTRS)

    McGreevy, Michael W. (Inventor)

    2004-01-01

    A phrase search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more sequences of terms. Next, a relational model of the query is created. The relational model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.
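
    The two patents above follow the same broad pattern: build a relational model for each database subset, build one for the query, and rank subsets by model similarity. The sketch below is a hedged illustration of that pattern only; the modelling choice (counts of adjacent term pairs), the cosine comparison, and all names are illustrative assumptions, not the patented keyterm or phrase search methods.

```python
# Hedged sketch of the common pattern above: model each database subset and the
# query as a bag of term relations, then rank subsets by similarity to the query
# model. The modelling and scoring choices are illustrative only.
from collections import Counter
from math import sqrt


def relational_model(text: str) -> Counter:
    """Count ordered pairs of adjacent terms as a crude relational model."""
    terms = text.lower().split()
    return Counter(zip(terms, terms[1:]))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_subsets(subsets: dict, query: str):
    """Return subset identifiers ordered by relevance to the query."""
    q_model = relational_model(query)
    scores = {sid: cosine(relational_model(txt), q_model) for sid, txt in subsets.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    docs = {
        "r1": "engine fire warning light during climb",
        "r2": "landing gear failed to retract after takeoff",
        "r3": "fire warning light illuminated on engine two",
    }
    print(rank_subsets(docs, "engine fire warning"))
```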

  9. Targeted exploration and analysis of large cross-platform human transcriptomic compendia

    PubMed Central

    Zhu, Qian; Wong, Aaron K; Krishnan, Arjun; Aure, Miriam R; Tadych, Alicja; Zhang, Ran; Corney, David C; Greene, Casey S; Bongo, Lars A; Kristensen, Vessela N; Charikar, Moses; Li, Kai; Troyanskaya, Olga G.

    2016-01-01

    We present SEEK (http://seek.princeton.edu), a query-based search engine across very large transcriptomic data collections, including thousands of human data sets from almost 50 microarray and next-generation sequencing platforms. SEEK uses a novel query-level cross-validation-based algorithm to automatically prioritize data sets relevant to the query and a robust search approach to identify query-coregulated genes, pathways, and processes. SEEK provides cross-platform handling, multi-gene query search, iterative metadata-based search refinement, and extensive visualization-based analysis options. PMID:25581801

  10. Optimization of the Controlled Evaluation of Closed Relational Queries

    NASA Astrophysics Data System (ADS)

    Biskup, Joachim; Lochner, Jan-Hendrik; Sonntag, Sebastian

    For relational databases, controlled query evaluation is an effective inference control mechanism preserving confidentiality regarding a previously declared confidentiality policy. Implementations of controlled query evaluation usually lack efficiency due to costly theorem prover calls. Suitably constrained controlled query evaluation can be implemented efficiently, but is not flexible enough from the perspective of database users and security administrators. In this paper, we propose an optimized framework for controlled query evaluation in relational databases that is efficiently implementable on the one hand and relaxes the constraints of previous approaches on the other.

  11. Modelling the spatial distribution of Fasciola hepatica in bovines using decision tree, logistic regression and GIS query approaches for Brazil.

    PubMed

    Bennema, S C; Molento, M B; Scholte, R G; Carvalho, O S; Pritsch, I

    2017-11-01

    Fascioliasis is a condition caused by the trematode Fasciola hepatica. In this paper, the spatial distribution of F. hepatica in bovines in Brazil was modelled using a decision tree approach and a logistic regression, combined with a geographic information system (GIS) query. In the decision tree and the logistic model, isothermality had the strongest influence on disease prevalence. Also, the 50-year average precipitation in the warmest quarter of the year was included as a risk factor, having a negative influence on the parasite prevalence. The risk maps developed using both techniques showed a predicted higher prevalence mainly in the South of Brazil. The prediction performance seemed to be high, but both techniques failed to reach a high accuracy in predicting the medium and high prevalence classes across the entire country. The GIS query map, based on the range of isothermality, minimum temperature of the coldest month, precipitation of the warmest quarter of the year, altitude and the average daily land surface temperature, showed a possibility of presence of F. hepatica in a very large area. The risk maps produced using these methods can be used to focus activities of animal and public health programmes, even on non-evaluated F. hepatica areas.

  12. High Performance Analytics with the R3-Cache

    NASA Astrophysics Data System (ADS)

    Eavis, Todd; Sayeed, Ruhan

    Contemporary data warehouses now represent some of the world’s largest databases. As these systems grow in size and complexity, however, it becomes increasingly difficult for brute force query processing approaches to meet the performance demands of end users. Certainly, improved indexing and more selective view materialization are helpful in this regard. Nevertheless, with warehouses moving into the multi-terabyte range, it is clear that the minimization of external memory accesses must be a primary performance objective. In this paper, we describe the R3-cache, a natively multi-dimensional caching framework designed specifically to support sophisticated warehouse/OLAP environments. R3-cache is based upon an in-memory version of the R-tree that has been extended to support buffer pages rather than disk blocks. A key strength of the R3-cache is that it is able to utilize multi-dimensional fragments of previous query results so as to significantly minimize the frequency and scale of disk accesses. Moreover, the new caching model directly accommodates the standard relational storage model and provides mechanisms for pro-active updates that exploit the existence of query “hot spots”. The current prototype has been evaluated as a component of the Sidera DBMS, a “shared nothing” parallel OLAP server designed for multi-terabyte analytics. Experimental results demonstrate significant performance improvements relative to simpler alternatives.
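
    The fragment-reuse idea can be illustrated with a much simpler structure than the R3-cache itself: cache previous range-query results keyed by their multi-dimensional bounding boxes and, for a new query box, reuse any cached fragment it fully contains. The sketch below uses a linear scan and a deliberately simplified notion of the uncovered remainder; the in-memory R-tree over buffer pages and the pro-active update machinery of the actual system are not reproduced.

```python
# Hedged sketch of multi-dimensional fragment reuse: previous result sets are
# keyed by their d-dimensional bounding boxes, and a new query box reuses any
# cached fragment fully contained in it, so only the uncovered remainder needs
# a disk scan. Remainder handling is deliberately simplified.
from typing import Dict, List, Tuple

Box = Tuple[Tuple[float, float], ...]   # ((lo, hi), ...) per dimension


def contains(outer: Box, inner: Box) -> bool:
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))


class FragmentCache:
    def __init__(self) -> None:
        self._fragments: Dict[Box, list] = {}

    def insert(self, box: Box, rows: list) -> None:
        self._fragments[box] = rows

    def probe(self, query: Box) -> Tuple[List[list], List[Box]]:
        """Return (reusable cached row sets, boxes that still need a disk scan)."""
        hits = [rows for box, rows in self._fragments.items() if contains(query, box)]
        # Simplification: report the whole query box as "uncovered" whenever no
        # single cached fragment covers it completely.
        fully_covered = any(contains(box, query) for box in self._fragments)
        return hits, ([] if fully_covered else [query])


if __name__ == "__main__":
    cache = FragmentCache()
    cache.insert(((0, 10), (0, 5)), ["rows for x in [0,10], y in [0,5]"])
    hits, missing = cache.probe(((0, 20), (0, 5)))
    print(hits, missing)
```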

  13. Diurnal Variations of Depression-Related Health Information Seeking: Case Study in Finland Using Google Trends Data

    PubMed Central

    Kettunen, Jyrki; Eirola, Emil; Paakkonen, Heikki

    2018-01-01

    Background Some of the temporal variations and clock-like rhythms that govern several different health-related behaviors can be traced in near real-time with the help of search engine data. This is especially useful when studying phenomena where little or no traditional data exist. One specific area where traditional data are incomplete is the study of diurnal mood variations, or daily changes in individuals’ overall mood state in relation to depression-like symptoms. Objective The objective of this exploratory study was to analyze diurnal variations for interest in depression on the Web to discover hourly patterns of depression interest and help seeking. Methods Hourly query volume data for 6 depression-related queries in Finland were downloaded from Google Trends in March 2017. A continuous wavelet transform (CWT) was applied to the hourly data to focus on the diurnal variation. Longer term trends and noise were also eliminated from the data to extract the diurnal variation for each query term. An analysis of variance was conducted to determine the statistical differences between the distributions of each hour. Data were also trichotomized and analyzed in 3 time blocks to make comparisons between different time periods during the day. Results Search volumes for all depression-related query terms showed a unimodal regular pattern during the 24 hours of the day. All queries feature clear peaks during the nighttime hours around 11 PM to 4 AM and troughs between 5 AM and 10 PM. In the means of the CWT-reconstructed data, the differences in nighttime and daytime interest are evident, with a difference of 37.3 percentage points (pp) for the term “Depression,” 33.5 pp for “Masennustesti,” 30.6 pp for “Masennus,” 12.8 pp for “Depression test,” 12.0 pp for “Masennus testi,” and 11.8 pp for “Masennus oireet.” The trichotomization showed peaks in the first time block (00.00 AM-7.59 AM) for all 6 terms. The search volumes then decreased significantly during the second time block (8.00 AM-3.59 PM) for the terms “Masennus oireet” (P<.001), “Masennus” (P=.001), “Depression” (P=.005), and “Depression test” (P=.004). Higher search volumes for the terms “Masennus” (P=.14), “Masennustesti” (P=.07), and “Depression test” (P=.10) were present between the second and third time blocks. Conclusions Help seeking for depression has clear diurnal patterns, with significant rise in depression-related query volumes toward the evening and night. Thus, search engine query data support the notion of the evening-worse pattern in diurnal mood variation. Information on the timely nature of depression-related interest on an hourly level could improve the chances for early intervention, which is beneficial for positive health outcomes. PMID:29792291

  14. Diurnal Variations of Depression-Related Health Information Seeking: Case Study in Finland Using Google Trends Data.

    PubMed

    Tana, Jonas Christoffer; Kettunen, Jyrki; Eirola, Emil; Paakkonen, Heikki

    2018-05-23

    Some of the temporal variations and clock-like rhythms that govern several different health-related behaviors can be traced in near real-time with the help of search engine data. This is especially useful when studying phenomena where little or no traditional data exist. One specific area where traditional data are incomplete is the study of diurnal mood variations, or daily changes in individuals' overall mood state in relation to depression-like symptoms. The objective of this exploratory study was to analyze diurnal variations for interest in depression on the Web to discover hourly patterns of depression interest and help seeking. Hourly query volume data for 6 depression-related queries in Finland were downloaded from Google Trends in March 2017. A continuous wavelet transform (CWT) was applied to the hourly data to focus on the diurnal variation. Longer term trends and noise were also eliminated from the data to extract the diurnal variation for each query term. An analysis of variance was conducted to determine the statistical differences between the distributions of each hour. Data were also trichotomized and analyzed in 3 time blocks to make comparisons between different time periods during the day. Search volumes for all depression-related query terms showed a unimodal regular pattern during the 24 hours of the day. All queries feature clear peaks during the nighttime hours around 11 PM to 4 AM and troughs between 5 AM and 10 PM. In the means of the CWT-reconstructed data, the differences in nighttime and daytime interest are evident, with a difference of 37.3 percentage points (pp) for the term "Depression," 33.5 pp for "Masennustesti," 30.6 pp for "Masennus," 12.8 pp for "Depression test," 12.0 pp for "Masennus testi," and 11.8 pp for "Masennus oireet." The trichotomization showed peaks in the first time block (00.00 AM-7.59 AM) for all 6 terms. The search volumes then decreased significantly during the second time block (8.00 AM-3.59 PM) for the terms "Masennus oireet" (P<.001), "Masennus" (P=.001), "Depression" (P=.005), and "Depression test" (P=.004). Higher search volumes for the terms "Masennus" (P=.14), "Masennustesti" (P=.07), and "Depression test" (P=.10) were present between the second and third time blocks. Help seeking for depression has clear diurnal patterns, with significant rise in depression-related query volumes toward the evening and night. Thus, search engine query data support the notion of the evening-worse pattern in diurnal mood variation. Information on the timely nature of depression-related interest on an hourly level could improve the chances for early intervention, which is beneficial for positive health outcomes. ©Jonas Christoffer Tana, Jyrki Kettunen, Emil Eirola, Heikki Paakkonen. Originally published in JMIR Mental Health (http://mental.jmir.org), 23.05.2018.
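
    A hedged sketch of the time-block comparison described in the study above: hourly query volumes are split into three 8-hour blocks and compared with a one-way ANOVA. The wavelet-based (CWT) detrending step is omitted and the hourly series is synthetic, so all numbers are purely illustrative.

```python
# Hedged sketch of the trichotomized time-block comparison: hourly search
# volumes are split into three 8-hour blocks and compared with a one-way ANOVA.
# The CWT detrending step is omitted and the hourly series here is synthetic.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)

# Synthetic hourly volumes for 90 days with a nighttime peak around 1-4 AM.
hours = np.tile(np.arange(24), 90)
volume = 50 + 30 * np.cos(2 * np.pi * (hours - 2) / 24) + rng.normal(0, 5, hours.size)

blocks = {
    "00:00-07:59": volume[(hours >= 0) & (hours < 8)],
    "08:00-15:59": volume[(hours >= 8) & (hours < 16)],
    "16:00-23:59": volume[(hours >= 16) & (hours < 24)],
}

f_stat, p_value = f_oneway(*blocks.values())
for name, vals in blocks.items():
    print(f"{name}: mean volume {vals.mean():.1f}")
print(f"one-way ANOVA: F = {f_stat:.1f}, p = {p_value:.3g}")
```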

  15. An index-based algorithm for fast on-line query processing of latent semantic analysis

    PubMed Central

    Li, Pohan; Wang, Wei

    2017-01-01

    Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantics are similar to a query of keywords. Although LSA yields promising similarity results, the existing LSA algorithms involve many unnecessary operations in similarity computation and candidate checking during on-line query processing, which is expensive in terms of time cost and cannot efficiently respond to query requests, especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA, toward efficiently searching for the documents similar to a given query. We rewrite the similarity equation of LSA in terms of an intermediate value called partial similarity, which is stored in a purpose-built index called the partial index. To reduce the search space, we give an approximate form of the similarity equation, and then develop an efficient algorithm for building the partial index, which skips the partial similarities lower than a given threshold θ. Based on the partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between the query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes that correspond to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm. PMID:28520747

  16. Analyzing Medical Image Search Behavior: Semantics and Prediction of Query Results.

    PubMed

    De-Arteaga, Maria; Eggel, Ivan; Kahn, Charles E; Müller, Henning

    2015-10-01

    Log files of information retrieval systems that record user behavior have been used to improve the outcomes of retrieval systems, understand user behavior, and predict events. In this article, a log file of the ARRS GoldMiner search engine containing 222,005 consecutive queries is analyzed. Time stamps are available for each query, as well as masked IP addresses, which makes it possible to identify queries from the same person. This article describes the ways in which physicians (or Internet searchers interested in medical images) search and proposes potential improvements by suggesting query modifications. For example, many queries contain only a few terms and therefore are not specific; others contain spelling mistakes or non-medical terms that likely lead to poor or empty results. One of the goals of this report is to predict the number of results a query will have, since such a model allows search engines to automatically propose query modifications in order to avoid result lists that are empty or too large. This prediction is made based on characteristics of the query terms themselves. Prediction of empty results has an accuracy above 88%, and thus can be used to automatically modify the query to avoid empty result sets for a user. The semantic analysis and data of reformulations done by users in the past can aid the development of better search systems, particularly to improve results for novice users. Therefore, this paper gives important ideas to better understand how people search and how to use this knowledge to improve the performance of specialized medical search engines.

  17. An index-based algorithm for fast on-line query processing of latent semantic analysis.

    PubMed

    Zhang, Mingxi; Li, Pohan; Wang, Wei

    2017-01-01

    Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantics are similar to a query of keywords. Although LSA yields promising similarity results, the existing LSA algorithms involve many unnecessary operations in similarity computation and candidate checking during on-line query processing, which is expensive in terms of time cost and cannot efficiently respond to query requests, especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA, toward efficiently searching for the documents similar to a given query. We rewrite the similarity equation of LSA in terms of an intermediate value called partial similarity, which is stored in a purpose-built index called the partial index. To reduce the search space, we give an approximate form of the similarity equation, and then develop an efficient algorithm for building the partial index, which skips the partial similarities lower than a given threshold θ. Based on the partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between the query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes that correspond to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm.
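
    The partial-index idea in the ILSA abstract above can be sketched as follows: project the term-document matrix with a truncated SVD, precompute each term's per-document contribution to the similarity score, keep only contributions above a threshold θ in an inverted index, and at query time accumulate scores over the index lists of the query's non-zero terms. The dimensions, the threshold, and the toy corpus below are illustrative assumptions; this is a sketch of the idea, not the authors' implementation.

```python
# Hedged sketch of a partial index for LSA-style retrieval: per-term partial
# similarities above a threshold theta are kept in an inverted index, and query
# scores are accumulated only over the lists of the query's non-zero terms.
import numpy as np

docs = ["query processing in databases",
        "latent semantic analysis for document search",
        "semantic search with latent topics",
        "range query processing over moving objects"]
vocab = sorted({w for d in docs for w in d.split()})
tdm = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

k, theta = 3, 0.05
U, s, Vt = np.linalg.svd(tdm, full_matrices=False)
Uk, Sk, Vk = U[:, :k], np.diag(s[:k]), Vt[:k, :].T           # docs in LSA space

doc_vecs = Vk @ Sk                                           # n_docs x k
doc_norms = np.linalg.norm(doc_vecs, axis=1)

# partial_index[term] = [(doc id, contribution of term to that doc's score), ...]
partial_index = {}
for t, term in enumerate(vocab):
    contrib = (Uk[t] @ Sk) @ doc_vecs.T / np.maximum(doc_norms, 1e-12)
    entries = [(j, c) for j, c in enumerate(contrib) if c > theta]
    if entries:
        partial_index[term] = entries


def query(text):
    scores = {}
    for term in text.split():                                # non-zero query entries
        for j, c in partial_index.get(term, []):
            scores[j] = scores.get(j, 0.0) + c               # accumulate partials
    return sorted(scores.items(), key=lambda kv: -kv[1])


print(query("semantic search"))
```

    Dropping the query-norm factor does not change the ranking, which is why the accumulation of per-term partials alone suffices for ordering candidates in this sketch.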

  18. Bat-Inspired Algorithm Based Query Expansion for Medical Web Information Retrieval.

    PubMed

    Khennak, Ilyes; Drias, Habiba

    2017-02-01

    With the increasing amount of medical data available on the Web, looking for health information has become one of the most widely searched topics on the Internet. Patients and people of several backgrounds are now using Web search engines to acquire medical information, including information about a specific disease, medical treatment or professional advice. Nonetheless, due to a lack of medical knowledge, many laypeople have difficulties in forming appropriate queries to articulate their inquiries, which renders their search queries imprecise due to the use of unclear keywords. The use of these ambiguous and vague queries to describe the patients' needs has resulted in a failure of Web search engines to retrieve accurate and relevant information. One of the most natural and promising methods to overcome this drawback is Query Expansion. In this paper, an original approach based on the Bat Algorithm is proposed to improve the retrieval effectiveness of query expansion in the medical field. In contrast to the existing literature, the proposed approach uses the Bat Algorithm to find the best expanded query among a set of expanded query candidates, while maintaining low computational complexity. Moreover, this new approach allows the determination of the length of the expanded query empirically. Numerical results on MEDLINE, the on-line medical information database, show that the proposed approach is more effective and efficient compared to the baseline.
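
    The core selection problem above can be sketched without the full metaheuristic: sample candidate expanded queries and keep the one with the best retrieval fitness. The sketch below uses plain random search rather than the Bat Algorithm, and its fitness function is a stand-in for a real retrieval-effectiveness measure computed on a collection such as MEDLINE; the query, candidate terms, and scoring are all illustrative assumptions.

```python
# Hedged sketch of the selection problem behind metaheuristic query expansion:
# search the space of expanded queries for the one with the best fitness. The
# search here is plain random search, not the Bat Algorithm, and `fitness` is a
# placeholder for a real retrieval-effectiveness measure on a collection.
import random

random.seed(7)

QUERY = ["heart", "attack"]
CANDIDATE_TERMS = ["myocardial", "infarction", "cardiac", "symptoms", "risk", "chest"]


def fitness(expanded_query):
    """Placeholder score; a real system would run the query against a collection
    and return, e.g., mean average precision or precision@k."""
    relevant = {"myocardial", "infarction", "cardiac"}
    gain = len(relevant & set(expanded_query))
    return gain - 0.1 * len(expanded_query)        # mild penalty for long queries


def random_candidate(max_extra=3):
    extra = random.sample(CANDIDATE_TERMS, random.randint(0, max_extra))
    return QUERY + extra


best, best_score = QUERY, fitness(QUERY)
for _ in range(200):                               # budget of fitness evaluations
    cand = random_candidate()
    score = fitness(cand)
    if score > best_score:
        best, best_score = cand, score

print(best, round(best_score, 2))
```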

  19. Semantic querying of relational data for clinical intelligence: a semantic web services-based approach

    PubMed Central

    2013-01-01

    Background Clinical Intelligence, as a research and engineering discipline, is dedicated to the development of tools for data analysis for the purposes of clinical research, surveillance, and effective health care management. Self-service ad hoc querying of clinical data is one desirable type of functionality. Since most of the data are currently stored in relational or similar form, ad hoc querying is problematic as it requires specialised technical skills and the knowledge of particular data schemas. Results A possible solution is semantic querying where the user formulates queries in terms of domain ontologies that are much easier to navigate and comprehend than data schemas. In this article, we are exploring the possibility of using SADI Semantic Web services for semantic querying of clinical data. We have developed a prototype of a semantic querying infrastructure for the surveillance of, and research on, hospital-acquired infections. Conclusions Our results suggest that SADI can support ad-hoc, self-service, semantic queries of relational data in a Clinical Intelligence context. The use of SADI compares favourably with approaches based on declarative semantic mappings from data schemas to ontologies, such as query rewriting and RDFizing by materialisation, because it can easily cope with situations when (i) some computation is required to turn relational data into RDF or OWL, e.g., to implement temporal reasoning, or (ii) integration with external data sources is necessary. PMID:23497556

  20. Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records.

    PubMed

    Luo, Yuan; Szolovits, Peter

    2016-01-01

    In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing on narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem as the interval query problem, for which the optimal query/update time is in general logarithmic. We next perform a tight time complexity analysis on the basic interval tree query algorithm and show its nonoptimality when applied to a collection of 13 query types from Allen's interval algebra. We then study two closely related state-of-the-art interval query algorithms, proposed query reformulations, and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic time stabbing-max query time complexity and solves the stabbing-interval query tasks on all of Allen's relations in logarithmic time, attaining the theoretic lower bound. Updating time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions.

  1. Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records

    PubMed Central

    Luo, Yuan; Szolovits, Peter

    2016-01-01

    In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing on narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem as the interval query problem, for which the optimal query/update time is in general logarithmic. We next perform a tight time complexity analysis on the basic interval tree query algorithm and show its nonoptimality when applied to a collection of 13 query types from Allen’s interval algebra. We then study two closely related state-of-the-art interval query algorithms, proposed query reformulations, and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic time stabbing-max query time complexity and solves the stabbing-interval query tasks on all of Allen’s relations in logarithmic time, attaining the theoretic lower bound. Updating time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions. PMID:27478379
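
    The basic objects in the record above are intervals over character offsets. A hedged baseline sketch of two of the queries involved, a stabbing query and Allen's "during" relation, is shown below using a naive linear scan; the augmented interval-tree structures with logarithmic bounds described in the paper are not reproduced, and the annotations are illustrative.

```python
# Hedged sketch of stand-off annotations as intervals with two basic queries:
# a stabbing query (annotations covering a character position) and one Allen
# relation, "during". Naive linear scans for clarity, not the paper's
# logarithmic-time structures.
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    start: int      # inclusive character offset
    end: int        # exclusive character offset
    label: str


def stab(annotations: List[Annotation], pos: int) -> List[Annotation]:
    """All annotations whose span covers character offset `pos`."""
    return [a for a in annotations if a.start <= pos < a.end]


def during(annotations: List[Annotation], outer: Annotation) -> List[Annotation]:
    """Allen's 'during': annotations strictly contained in `outer`."""
    return [a for a in annotations
            if outer.start < a.start and a.end < outer.end]


if __name__ == "__main__":
    text = "patient denies chest pain on exertion"
    anns = [Annotation(15, 25, "symptom"),        # "chest pain"
            Annotation(15, 37, "finding"),        # "chest pain on exertion"
            Annotation(0, 7, "person")]           # "patient"
    print([a.label for a in stab(anns, 18)])
    print([a.label for a in during(anns, Annotation(8, 37, "clause"))])
```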

  2. A Natural Language Interface Concordant with a Knowledge Base.

    PubMed

    Han, Yong-Jin; Park, Seong-Bae; Park, Se-Young

    2016-01-01

    The discordance between expressions interpretable by a natural language interface (NLI) system and those answerable by a knowledge base is a critical problem in the field of NLIs. In order to solve this discordance problem, this paper proposes a method to translate natural language questions into formal queries that can be generated from a graph-based knowledge base. The proposed method considers a subgraph of a knowledge base as a formal query. Thus, all formal queries corresponding to a concept or a predicate in the knowledge base can be generated prior to query time and all possible natural language expressions corresponding to each formal query can also be collected in advance. A natural language expression has a one-to-one mapping with a formal query. Hence, a natural language question is translated into a formal query by matching the question with the most appropriate natural language expression. If the confidence of this matching is not sufficiently high the proposed method rejects the question and does not answer it. Multipredicate queries are processed by regarding them as a set of collected expressions. The experimental results show that the proposed method thoroughly handles answerable questions from the knowledge base and rejects unanswerable ones effectively.

  3. Saying What You're Looking For: Linguistics Meets Video Search.

    PubMed

    Barrett, Daniel Paul; Barbu, Andrei; Siddharth, N; Siskind, Jeffrey Mark

    2016-10-01

    We present an approach to searching large video corpora for clips which depict a natural-language query in the form of a sentence. Compositional semantics is used to encode subtle meaning differences lost in other approaches, such as the difference between two sentences which have identical words but entirely different meaning: The person rode the horse versus The horse rode the person. Given a sentential query and a natural-language parser, we produce a score indicating how well a video clip depicts that sentence for each clip in a corpus and return a ranked list of clips. Two fundamental problems are addressed simultaneously: detecting and tracking objects, and recognizing whether those tracks depict the query. Because both tracking and object detection are unreliable, our approach uses the sentential query to focus the tracker on the relevant participants and ensures that the resulting tracks are described by the sentential query. While most earlier work was limited to single-word queries which correspond to either verbs or nouns, we search for complex queries which contain multiple phrases, such as prepositional phrases, and modifiers, such as adverbs. We demonstrate this approach by searching for 2,627 naturally elicited sentential queries in 10 Hollywood movies.

  4. Context-Aware Online Commercial Intention Detection

    NASA Astrophysics Data System (ADS)

    Hu, Derek Hao; Shen, Dou; Sun, Jian-Tao; Yang, Qiang; Chen, Zheng

    With more and more commercial activities moving onto the Internet, people tend to purchase what they need through the Internet or conduct some online research before the actual transactions happen. For many Web users, their online commercial activities start from submitting a search query to search engines. Just like common Web search queries, the queries with commercial intention are usually very short. Recognizing the queries with commercial intention against the common queries will help search engines provide proper search results and advertisements, help Web users obtain the right information they desire and help the advertisers benefit from the potential transactions. However, the intentions behind a query vary a lot for users with different background and interest. The intentions can even be different for the same user when the query is issued in different contexts. In this paper, we present a new algorithm framework based on skip-chain conditional random field (SCCRF) for automatically classifying Web queries according to context-based online commercial intention. We analyze our algorithm performance both theoretically and empirically. Extensive experiments on several real search engine log datasets show that our algorithm improves the F1 score by more than 10% over previous algorithms for commercial intention detection.

  5. Seasonal trends in sleep-disordered breathing: evidence from Internet search engine query data.

    PubMed

    Ingram, David G; Matthews, Camilla K; Plante, David T

    2015-03-01

    The primary aim of the current study was to test the hypothesis that there is a seasonal component to snoring and obstructive sleep apnea (OSA) through the use of Google search engine query data. Internet search engine query data were retrieved from Google Trends from January 2006 to December 2012. Monthly normalized search volume was obtained over that 7-year period in the USA and Australia for the following search terms: "snoring" and "sleep apnea". Seasonal effects were investigated by fitting cosinor regression models. In addition, the search terms "snoring children" and "sleep apnea children" were evaluated to examine seasonal effects in pediatric populations. Statistically significant seasonal effects were found using cosinor analysis in both USA and Australia for "snoring" (p < 0.00001 for both countries). Similarly, seasonal patterns were observed for "sleep apnea" in the USA (p = 0.001); however, cosinor analysis was not significant for this search term in Australia (p = 0.13). Seasonal patterns for "snoring children" and "sleep apnea children" were observed in the USA (p = 0.002 and p < 0.00001, respectively), with insufficient search volume to examine these search terms in Australia. All searches peaked in the winter or early spring in both countries, with the magnitude of seasonal effect ranging from 5 to 50 %. Our findings indicate that there are significant seasonal trends for both snoring and sleep apnea internet search engine queries, with a peak in the winter and early spring. Further research is indicated to determine the mechanisms underlying these findings, whether they have clinical impact, and if they are associated with other comorbid medical conditions that have similar patterns of seasonal exacerbation.
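
    The cosinor regression mentioned above amounts to ordinary least squares on cosine and sine regressors at the annual period. The sketch below fits y(t) = M + A·cos(2πt/12) + B·sin(2πt/12) to a synthetic monthly series and reads off the amplitude and peak month; the data and noise level are illustrative assumptions, not Google Trends values.

```python
# Hedged sketch of cosinor regression on monthly search volumes:
#   y(t) = M + A*cos(2*pi*t/12) + B*sin(2*pi*t/12), fit by least squares.
# The monthly series here is synthetic with a winter peak.
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(84)                                   # 7 years of monthly data
true = 60 + 15 * np.cos(2 * np.pi * (t - 1) / 12)   # peak near month index 1
y = true + rng.normal(0, 4, t.size)

X = np.column_stack([np.ones_like(t, dtype=float),
                     np.cos(2 * np.pi * t / 12),
                     np.sin(2 * np.pi * t / 12)])
mesor, a, b = np.linalg.lstsq(X, y, rcond=None)[0]

amplitude = np.hypot(a, b)
acrophase = np.arctan2(b, a)                        # radians; peak at t = 12*phase/(2*pi)
peak_month = (12 * acrophase / (2 * np.pi)) % 12

print(f"MESOR {mesor:.1f}, amplitude {amplitude:.1f}, peak near month {peak_month:.1f}")
```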

  6. Raman Spectra of Crystalline Double Calcium Orthovanadates Ca10M(VO4)7 (M = Li, K, Na) and Their Interpretation Based on Deconvolution Into Voigt Profiles

    NASA Astrophysics Data System (ADS)

    Khodasevich, I. A.; Voitikov, S. V.; Orlovich, V. A.; Kosmyna, M. B.; Shekhovtsov, A. N.

    2016-09-01

    Unpolarized spontaneous Raman spectra of crystalline double calcium orthovanadates Ca10M(VO4)7 (M = Li, K, Na) in the range 150-1600 cm-1 were measured. Two vibrational bands with full-width at half-maximum (FWHM) of 37-50 cm-1 were found in the regions 150-500 and 700-1000 cm-1. The band shapes were approximated well by deconvolution into Voigt profiles. The band at 700-1000 cm-1 was stronger and deconvoluted into eight Voigt profiles. The frequencies of two strong lines were ~848 and ~862 cm-1 for Ca10Li(VO4)7; ~850 and ~866 cm-1 for Ca10Na(VO4)7; and ~844 and ~866 cm-1 for Ca10K(VO4)7. The Lorentzian width parameters of these lines in the Voigt profiles were ~5 times greater than those of the Gaussian width parameters. The FWHM of the Voigt profiles were ~18-42 cm-1. The two strongest lines had widths of 21-25 cm-1. The vibrational band at 300-500 cm-1 was ~5-6 times weaker than that at 700-1000 cm-1 and was deconvoluted into four lines with widths of 25-40 cm-1. The large FWHM of the Raman lines indicated that the crystal structures were disordered. These crystals could be of interest for Raman conversion of pico- and femtosecond laser pulses because of the intense vibrations with large FWHM in the Raman spectra.
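
    Deconvolution into Voigt profiles can be sketched with standard SciPy tools: generate a band from two overlapping Voigt components and refit it with scipy.optimize.curve_fit using scipy.special.voigt_profile. The component count, centers, widths, and amplitudes below are illustrative stand-ins loosely inspired by the reported ~848 and ~862 cm-1 lines, not the measured spectra or the authors' fitting procedure.

```python
# Hedged sketch of fitting a band as a sum of Voigt profiles. A synthetic
# two-component band is generated and refit; all parameters are illustrative.
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import voigt_profile


def two_voigt(x, a1, c1, s1, g1, a2, c2, s2, g2):
    """Sum of two Voigt profiles (amplitude, center, Gaussian sigma, Lorentzian gamma)."""
    return (a1 * voigt_profile(x - c1, s1, g1) +
            a2 * voigt_profile(x - c2, s2, g2))


rng = np.random.default_rng(0)
x = np.linspace(780, 930, 600)                       # Raman shift, cm-1
y_true = two_voigt(x, 120, 848, 2.0, 9.0, 150, 862, 2.0, 10.0)
y = y_true + rng.normal(0, 0.1, x.size)

p0 = (100, 845, 3, 5, 100, 865, 3, 5)                # rough initial guesses
popt, _ = curve_fit(two_voigt, x, y, p0=p0)

for i in (0, 1):
    a, c, s, g = popt[4 * i:4 * i + 4]
    print(f"component {i + 1}: center {c:.1f} cm-1, sigma {s:.1f}, gamma {g:.1f}")
```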

  7. Evaluation of Graph Pattern Matching Workloads in Graph Analysis Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hong, Seokyong; Lee, Sangkeun; Lim, Seung-Hwan

    2016-01-01

    Graph analysis has emerged as a powerful method for data scientists to represent, integrate, query, and explore heterogeneous data sources. As a result, graph data management and mining became a popular area of research, and led to the development of a plethora of systems in recent years. Unfortunately, the number of emerging graph analysis systems and the wide range of applications, coupled with a lack of apples-to-apples comparisons, make it difficult to understand the trade-offs between different systems and the graph operations for which they are designed. A fair comparison of these systems is a challenging task for the following reasons: multiple data models, non-standardized serialization formats, various query interfaces to users, and diverse environments they operate in. To address these key challenges, in this paper we present a new benchmark suite by extending the Lehigh University Benchmark (LUBM) to cover the most common capabilities of various graph analysis systems. We provide the design process of the benchmark, which generalizes the workflow for data scientists to conduct the desired graph analysis on different graph analysis systems. Equipped with this extended benchmark suite, we present a performance comparison of nine subgraph pattern retrieval operations over six graph analysis systems, namely NetworkX, Neo4j, Jena, Titan, GraphX, and uRiKA. Through the proposed benchmark suite, this study reveals both quantitative and qualitative findings in (1) implications in loading data into each system; (2) challenges in describing graph patterns for each query interface; and (3) different sensitivity of each system to query selectivity. We envision that this study will pave the road for: (i) data scientists to select the suitable graph analysis systems, and (ii) data management system designers to advance graph analysis systems.

  8. Incremental Query Rewriting with Resolution

    NASA Astrophysics Data System (ADS)

    Riazanov, Alexandre; Aragão, Marcelo A. T.

    We address the problem of semantic querying of relational databases (RDB) modulo knowledge bases using very expressive knowledge representation formalisms, such as full first-order logic or its various fragments. We propose to use a resolution-based first-order logic (FOL) reasoner for computing schematic answers to deductive queries, with the subsequent translation of these schematic answers to SQL queries which are evaluated using a conventional relational DBMS. We call our method incremental query rewriting, because an original semantic query is rewritten into a (potentially infinite) series of SQL queries. In this chapter, we outline the main idea of our technique - using abstractions of databases and constrained clauses for deriving schematic answers, and provide completeness and soundness proofs to justify the applicability of this technique to the case of resolution for FOL without equality. The proposed method can be directly used with regular RDBs, including legacy databases. Moreover, we propose it as a potential basis for an efficient Web-scale semantic search technology.

  9. Research on presentation and query service of geo-spatial data based on ontology

    NASA Astrophysics Data System (ADS)

    Li, Hong-wei; Li, Qin-chao; Cai, Chang

    2008-10-01

    The paper analyzes the deficiencies in the presentation and querying of geo-spatial data in current GIS, and discusses the advantages that ontology offers for the formalization of geo-spatial data and the presentation of semantic granularity. Taking a land-use classification system as an example, it constructs a domain ontology and describes it in OWL; it realizes grade-level and category presentation of land-use data, drawing on the idea of vertical and horizontal navigation, and then discusses ontology-based query modes for geo-spatial data, including queries based on types and grade levels, queries based on instances and spatial relations, and synthetic queries based on types and instances. These methods enrich the query modes of current GIS and represent a useful attempt. The paper points out that the key to ontology-based presentation and querying of spatial data is to construct a domain ontology that correctly reflects geo-concepts and their spatial relations and provides a fine formalized description of them.

  10. Spanish for Business: A Journey into Employability

    ERIC Educational Resources Information Center

    Lallana, Amparo; Pastor-González, Victoria

    2016-01-01

    As language lecturers, we believe that we equip our graduates with a range of key skills that give them an edge in the employment market. But, query final year students of a Business and Languages degree on the value of language learning for employability, and they are likely to mention a small number of functional abilities such as CV writing and…

  11. Development of XML Schema for Broadband Digital Seismograms and Data Center Portal

    NASA Astrophysics Data System (ADS)

    Takeuchi, N.; Tsuboi, S.; Ishihara, Y.; Nagao, H.; Yamagishi, Y.; Watanabe, T.; Yanaka, H.; Yamaji, H.

    2008-12-01

    There are a number of data centers around the globe where digital broadband seismograms are open to researchers. Those centers use their own user interfaces, and there is no standard for accessing and retrieving seismograms from different data centers through a unified interface. One of the emergent technologies to realize a unified user interface for different data centers is the concept of the WebService and the WebService portal. Here we have developed a prototype of a data center portal for digital broadband seismograms. This WebService portal uses WSDL (Web Services Description Language) to accommodate differences among the different data centers. By using WSDL, alteration and addition of data center user interfaces can be easily managed. This portal, called the NINJA Portal, assumes three WebServices: (1) a database Query service, (2) a Seismic event data request service, and (3) a Seismic continuous data request service. The current system supports both the station search of the database Query service and the seismic continuous data request service. The data centers supported by the NINJA portal will initially be the OHP data center at ERI and the Pacific21 data center at IFREE/JAMSTEC. We have developed a metadata standard for seismological data based on QuakeML for parametric data, which has been developed by ETH Zurich, and XML-SEED for waveform data, which was developed by IFREE/JAMSTEC. The prototype of the NINJA portal is now released through the IFREE web page (http://www.jamstec.go.jp/pacific21/).

  12. Model-based query language for analyzing clinical processes.

    PubMed

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    Nowadays, large databases of clinical process data exist in hospitals. However, these data are rarely used to their full extent. In order to perform queries on hospital processes, one must either choose from the predefined queries or develop queries using an MS Excel-type software system, which is not always a trivial task. In this paper we propose a new query language for analyzing clinical processes that is easily comprehensible to non-IT professionals as well. We develop this language based on a process modeling language which is also described in this paper. Prototypes of both languages have already been verified using real examples from hospitals.

  13. AQBE — QBE Style Queries for Archetyped Data

    NASA Astrophysics Data System (ADS)

    Sachdeva, Shelly; Yaginuma, Daigo; Chu, Wanming; Bhalla, Subhash

    Large-scale adoption of electronic healthcare applications requires semantic interoperability. Recent proposals put forward an advanced (multi-level) DBMS architecture for repository services for patients' health records. These also require query interfaces at multiple levels, including the level of semi-skilled users. In this regard, a high-level user interface for querying the new form of standardized Electronic Health Records system has been examined in this study. It proposes a step-by-step graphical query interface to allow semi-skilled users to write queries. Its aim is to decrease user effort and communication ambiguities, and increase user friendliness.

  14. The User Community and a Multi-Mission Data Project: Services, Experiences and Directions of the Space Physics Data Facility

    NASA Technical Reports Server (NTRS)

    Fung, Shing F.; Bilitza, D.; Candey, R.; Chimiak, R.; Cooper, John; Fung, Shing; Harris, B.; Johnson R.; King, J.; Kovalick, T.; hide

    2008-01-01

    From a user's perspective, the multi-mission data and orbit services of NASA's Space Physics Data Facility (SPDF) project offer a unique range of important data and services highly complementary to other services presently available or now evolving in the international heliophysics data environment. The VSP (Virtual Space Physics Observatory) service is an active portal to a wide range of distributed data sources. CDAWeb (Coordinated Data Analysis Web) enables plots, listings and file downloads for current data across the boundaries of missions and instrument types (and now including data from THEMIS and STEREO). SSCWeb, Helioweb and our 3D Animated Orbit Viewer (TIPSOD) provide position data and query logic for most missions currently important to heliophysics science. OMNIWeb, with its new extension to 1- and 5-minute resolution, provides interplanetary parameters at the Earth's bow shock as a unique value-added data product. SPDF also maintains NASA's CDF (Common Data Format) standard and a range of associated tools, including translation services. These capabilities are all now available through webservices-based APIs as well as through our direct user interfaces. In this paper, we will demonstrate the latest data and capabilities now supported in these multi-mission services, review the lessons we continue to learn about what science users need and value in this class of services, and discuss our current thinking on the future role and appropriate focus of the SPDF effort in the evolving and increasingly distributed heliophysics data environment.

  15. StarView: The object oriented design of the ST DADS user interface

    NASA Technical Reports Server (NTRS)

    Williams, J. D.; Pollizzi, J. A.

    1992-01-01

    StarView is the user interface being developed for the Hubble Space Telescope Data Archive and Distribution Service (ST DADS). ST DADS is the data archive for HST observations and a relational database catalog describing the archived data. Users will use StarView to query the catalog and select appropriate datasets for study. StarView sends requests for archived datasets to ST DADS, which processes the requests and returns the data to the user. StarView is designed to be a powerful and extensible user interface. Unique features include an internal relational database to navigate query results, a form definition language that will work with both CRT and X interfaces, a data definition language that will allow StarView to work with any relational database, and the ability to generate adhoc queries without requiring the user to understand the structure of the ST DADS catalog. Ultimately, StarView will allow the user to refine queries in the local database for improved performance and merge in data from external sources for correlation with other query results. The user will be able to create a query from single or multiple forms, merging the selected attributes into a single query. Arbitrary selection of attributes for querying is supported. The user will be able to select how query results are viewed. A standard form or table-row format may be used. Navigation capabilities are provided to aid the user in viewing query results. Object oriented analysis and design techniques were used in the design of StarView to support the mechanisms and concepts required to implement these features. One such mechanism is the Model-View-Controller (MVC) paradigm. The MVC allows the user to have multiple views of the underlying database, while providing a consistent mechanism for interaction regardless of the view. This approach supports both CRT and X interfaces while providing a common mode of user interaction. Another powerful abstraction is the concept of a Query Model. This concept allows a single query to be built from single or multiple forms before it is submitted to ST DADS. Supporting this concept is the adhoc query generator, which allows the user to select and qualify an indeterminate number of attributes from the database. The user does not need any knowledge of how the joins across various tables are to be resolved. The adhoc generator calculates the joins automatically and generates the correct SQL query.

  16. BioMart: a data federation framework for large collaborative projects.

    PubMed

    Zhang, Junjun; Haider, Syed; Baran, Joachim; Cros, Anthony; Guberman, Jonathan M; Hsu, Jack; Liang, Yong; Yao, Long; Kasprzyk, Arek

    2011-01-01

    BioMart is a freely available, open source, federated database system that provides a unified access to disparate, geographically distributed data sources. It is designed to be data agnostic and platform independent, such that existing databases can easily be incorporated into the BioMart framework. BioMart allows databases hosted on different servers to be presented seamlessly to users, facilitating collaborative projects between different research groups. BioMart contains several levels of query optimization to efficiently manage large data sets and offers a diverse selection of graphical user interfaces and application programming interfaces to ensure that queries can be performed in whatever manner is most convenient for the user. The software has now been adopted by a large number of different biological databases spanning a wide range of data types and providing a rich source of annotation available to bioinformaticians and biologists alike.

  17. Distributed Computation of the knn Graph for Large High-Dimensional Point Sets

    PubMed Central

    Plaku, Erion; Kavraki, Lydia E.

    2009-01-01

    High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As the research in the above-mentioned fields progressively addresses problems of unprecedented complexity, the demand for computing knn graphs based on arbitrary distance metrics and large high-dimensional data sets increases, exceeding resources available to a single machine. In this work we efficiently distribute the computation of knn graphs for clusters of processors with message passing. Extensions to our distributed framework include the computation of graphs based on other proximity queries, such as approximate knn or range queries. Our experiments show nearly linear speedup with over one hundred processors and indicate that similar speedup can be obtained with several hundred processors. PMID:19847318
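
    A hedged sketch of the knn-graph construction described above: the set of query points is partitioned into chunks and each chunk's k nearest neighbours are computed by a separate worker, here with local processes via concurrent.futures rather than the message-passing cluster setup of the paper. Distances are brute-force Euclidean, and k, the data sizes, and all names are illustrative assumptions.

```python
# Hedged sketch of partitioned knn-graph construction: the query workload is
# split into chunks processed by separate local workers. This approximates the
# paper's distributed setup only in spirit; distances are brute-force Euclidean.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

K = 5
rng = np.random.default_rng(0)
POINTS = rng.normal(size=(1000, 8))                  # 1000 points in 8 dimensions


def knn_for_chunk(index_range):
    """Compute the K nearest neighbours for a contiguous chunk of points."""
    lo, hi = index_range
    chunk = POINTS[lo:hi]
    # pairwise squared Euclidean distances, chunk vs. all points
    d2 = ((chunk[:, None, :] - POINTS[None, :, :]) ** 2).sum(-1)
    order = np.argsort(d2, axis=1)[:, 1:K + 1]        # drop self at position 0
    return {lo + i: order[i].tolist() for i in range(hi - lo)}


def build_knn_graph(n_workers=4):
    bounds = np.linspace(0, len(POINTS), n_workers + 1, dtype=int)
    ranges = list(zip(bounds[:-1], bounds[1:]))
    graph = {}
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        for part in pool.map(knn_for_chunk, ranges):
            graph.update(part)
    return graph


if __name__ == "__main__":
    g = build_knn_graph()
    print(len(g), g[0])
```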

  18. Diamond Eye: a distributed architecture for image data mining

    NASA Astrophysics Data System (ADS)

    Burl, Michael C.; Fowlkes, Charless; Roden, Joe; Stechert, Andre; Mukhtar, Saleem

    1999-02-01

    Diamond Eye is a distributed software architecture, which enables users (scientists) to analyze large image collections by interacting with one or more custom data mining servers via a Java applet interface. Each server is coupled with an object-oriented database and a computational engine, such as a network of high-performance workstations. The database provides persistent storage and supports querying of the 'mined' information. The computational engine provides parallel execution of expensive image processing, object recognition, and query-by-content operations. Key benefits of the Diamond Eye architecture are: (1) the design promotes trial evaluation of advanced data mining and machine learning techniques by potential new users (all that is required is to point a web browser to the appropriate URL), (2) software infrastructure that is common across a range of science mining applications is factored out and reused, and (3) the system facilitates closer collaborations between algorithm developers and domain experts.

  19. Querying Proofs

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen; Lueth, Christoph

    2012-01-01

    We motivate and introduce a query language PrQL designed for inspecting machine representations of proofs. PrQL natively supports hiproofs which express proof structure using hierarchical nested labelled trees. The core language presented in this paper is locally structured (first-order), with queries built using recursion and patterns over proof structure and rule names. We define the syntax and semantics of locally structured queries, demonstrate their power, and sketch some implementation experiments.

  20. Plant Proteomics and Peptidomics in Host-Pathogen Interactions: The Weapons Used by Each Side.

    PubMed

    Silva, Fabiana Aparecida Cavalcante; de Sousa Oliveira, Melquisedec; de Souza, Juliana Maria; Martins, Paulo Geovani Silva; Pestana-Calsa, Maria Clara; Junior, Tercilio Calsa

    2017-01-01

    Environmental biotic stress factors act continuously on plants, through multiple molecular interactions that eventually lead to the establishment and progress of symbiotic or pathogenic complex interactions. Proteins and peptides play noteworthy roles in such biological processes, usually being the main effectors from the initial recognition and elicitor functions through the subsequent transduction, gene regulation and physiological response activities. Ranging from specific regulators to direct antimicrobial agents, plant or pathogen proteins and peptides comprise the arsenal available to each side in this biological war, resulting from the genetic coding potential inherited by each one. Post-translational research tools have contributed a wealth of valuable information on how the plant proteome works to achieve, maintain and adjust plant immunity in order to properly cope with the challenging pathogen-derived proteomes. These key proteins and peptides have great biotechnological potential since they represent distinctive features of each pathogen group (fungi, bacteria, viruses and others) in response to the defense molecules of host plants.

  1. Interpersonal Conflict and Organizational Commitment Among Licensed Practical Nurses.

    PubMed

    Loes, Chad N; Tobin, Mary B

    The shortage of nursing professionals in the United States is unquestionable. This shortage, which is predicted to continue into the foreseeable future, is a particularly salient problem within the nursing profession. This is especially true for long-term care facility administrators who not only are faced with the challenge of increasing numbers of aging residents but also regularly struggle with turnover among more cost-effective nursing staff, such as licensed practical nurses (LPNs). The primary purpose of this study was to examine whether perceived interpersonal conflict influences organizational commitment among LPNs. To accomplish this, we analyzed responses from 1165 LPNs throughout a Midwestern state who were queried on their perceptions of interpersonal conflict and organizational commitment in their work settings. Considering a wide range of potential confounding influences such as age and years working as an LPN, for example, we found that higher perceived interpersonal conflict was associated with significantly lower levels of organizational commitment. The implications of these findings, along with recommendations for nurse administrators to reduce LPN turnover, are discussed in the article.

  2. The Database Query Support Processor (QSP)

    NASA Technical Reports Server (NTRS)

    1993-01-01

    The number and diversity of databases available to users continues to increase dramatically. Currently, the trend is towards decentralized, client-server architectures that (on the surface) are less expensive to acquire, operate, and maintain than information architectures based on centralized, monolithic mainframes. The database query support processor (QSP) effort evaluates the performance of a network level, heterogeneous database access capability. Air Force Materiel Command's Rome Laboratory has developed an approach, based on ANSI standard X3.138-1988, 'The Information Resource Dictionary System (IRDS)', to seamless access to heterogeneous databases based on extensions to data dictionary technology. To successfully query a decentralized information system, users must know what data are available from which source, or have the knowledge and system privileges necessary to find out this information. Privacy and security considerations prohibit free and open access to every information system in every network. Even in completely open systems, the time required to locate relevant data (in systems of any appreciable size) would be better spent analyzing the data, assuming the original question was not forgotten. Extensions to data dictionary technology have the potential to more fully automate the search for and retrieval of relevant data in a decentralized environment. Substantial amounts of time and money could be saved by not having to teach users what data resides in which systems and how to access each of those systems. Information describing data and how to get it could be removed from the application and placed in a dedicated repository where it belongs. The result is simplified applications that are less brittle and less expensive to build and maintain. Software technology providing the required functionality is off the shelf. The key difficulty is in defining the metadata required to support the process. The database query support processor effort will provide quantitative data on the amount of effort required to implement an extended data dictionary at the network level, add new systems, adapt to changing user needs, and provide sound estimates on operations and maintenance costs and savings.

  3. Effective Multi-Query Expansions: Collaborative Deep Networks for Robust Landmark Retrieval.

    PubMed

    Wang, Yang; Lin, Xuemin; Wu, Lin; Zhang, Wenjie

    2017-03-01

    Given a query photo issued by a user (q-user), the landmark retrieval is to return a set of photos with their landmarks similar to those of the query, while the existing studies on landmark retrieval focus on exploiting geometries of landmarks for similarity matches between candidate photos and a query photo. We observe that the same landmarks provided by different users over a social media community may convey different geometry information depending on the viewpoints and/or angles, and may, subsequently, yield very different results. In fact, dealing with the landmarks with low quality shapes caused by the photography of q-users is often nontrivial and has seldom been studied. In this paper, we propose a novel framework, namely, multi-query expansions, to retrieve semantically robust landmarks by two steps. First, we identify the top-k photos regarding the latent topics of a query landmark to construct a multi-query set so as to remedy its possible low quality shape. For this purpose, we significantly extend the techniques of Latent Dirichlet Allocation. Then, motivated by the typical collaborative filtering methods, we propose to learn collaborative deep network-based, semantic, nonlinear, high-level features over the latent factors of landmark photos as the training set, which is formed by matrix factorization over the collaborative user-photo matrix regarding the multi-query set. The learned deep network is further applied to generate the features for all the other photos, meanwhile resulting in a compact multi-query set within that space. Then, the final ranking scores are calculated over the high-level feature space between the multi-query set and all other photos, which are ranked to serve as the final ranking list of landmark retrieval. Extensive experiments are conducted on real-world social media data with landmark photos together with their user information to show the superior performance over the existing methods, especially our recently proposed multi-query based mid-level pattern representation method [1].

  4. CUFID-query: accurate network querying through random walk based network flow estimation.

    PubMed

    Jeong, Hyundoo; Qian, Xiaoning; Yoon, Byung-Jun

    2017-12-28

    Functional modules in biological networks consist of numerous biomolecules and their complicated interactions. Recent studies have shown that biomolecules in a functional module tend to have similar interaction patterns and that such modules are often conserved across biological networks of different species. As a result, such conserved functional modules can be identified through comparative analysis of biological networks. In this work, we propose a novel network querying algorithm based on the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) framework combined with an efficient seed-and-extension approach. The proposed algorithm, CUFID-query, can accurately detect conserved functional modules as small subnetworks in the target network that are expected to perform similar functions to the given query functional module. The CUFID framework was recently developed for probabilistic pairwise global comparison of biological networks, and it has been applied to pairwise global network alignment, where the framework was shown to yield accurate network alignment results. In the proposed CUFID-query algorithm, we adopt the CUFID framework and extend it for local network alignment, specifically to solve network querying problems. First, in the seed selection phase, the proposed method utilizes the CUFID framework to compare the query and the target networks and to predict the probabilistic node-to-node correspondence between the networks. Next, the algorithm selects and greedily extends the seed in the target network by iteratively adding nodes that have frequent interactions with other nodes in the seed network, in a way that the conductance of the extended network is maximally reduced. Finally, CUFID-query removes irrelevant nodes from the querying results based on the personalized PageRank vector for the induced network that includes the fully extended network and its neighboring nodes. Through extensive performance evaluation based on biological networks with known functional modules, we show that CUFID-query outperforms the existing state-of-the-art algorithms in terms of prediction accuracy and biological significance of the predictions.

  5. Querying graphs in protein-protein interactions networks using feedback vertex set.

    PubMed

    Blin, Guillaume; Sikora, Florian; Vialette, Stéphane

    2010-01-01

    Recent techniques are rapidly increasing the amount of knowledge we have on interactions between proteins. The interpretation of this new information depends on our ability to retrieve known substructures in the data, the protein-protein interaction (PPI) networks. From an algorithmic point of view, this is a hard task, since it often leads to NP-hard problems. To overcome this difficulty, many authors have provided tools for querying patterns with a restricted topology, i.e., paths or trees, in PPI networks. Such a restriction leads to the development of fixed-parameter tractable (FPT) algorithms, which can be practical for queries of restricted size. Unfortunately, Graph Homomorphism is a W[1]-hard problem, and hence no FPT algorithm can be found when patterns are general graphs. However, Dost et al. gave an algorithm (which is not implemented) to query graphs of bounded treewidth in PPI networks, the treewidth of the query being involved in the time complexity. In this paper, we propose another algorithm for querying patterns in the shape of graphs, also based on dynamic programming and the color-coding technique. To transform graph queries into trees without loss of information, we use a feedback vertex set coupled with a node duplication mechanism. Hence, our algorithm is FPT for querying graphs with a bounded feedback vertex set size. This gives an alternative to the treewidth parameter, which can be better or worse for a given query. We provide a Python implementation which allows us to validate our approach on real data. In particular, we retrieve some human queries in the shape of graphs in the fly PPI network.
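
    As a rough illustration of the feedback vertex set parameter this record relies on, the sketch below is a toy greedy heuristic in Python: it repeatedly locates a cycle and deletes a high-degree vertex from it until the graph is acyclic. It only shows one plausible way to obtain some (not minimum) feedback vertex set; it is not the authors' color-coding query algorithm, and the example graph is invented.

    ```python
    # Toy greedy heuristic for a feedback vertex set (FVS): repeatedly find a cycle
    # with DFS and remove its highest-degree vertex until the graph is acyclic.
    # Illustration of the parameter only, not the paper's query algorithm.

    def find_cycle(adj):
        """Return a list of vertices forming a cycle, or None if the graph is acyclic."""
        visited, parent = set(), {}
        for start in adj:
            if start in visited:
                continue
            stack = [(start, None)]
            while stack:
                node, par = stack.pop()
                if node in visited:
                    continue
                visited.add(node)
                parent[node] = par
                for nb in adj[node]:
                    if nb == par:
                        continue
                    if nb in visited:
                        # Back edge found: reconstruct the cycle nb -> ... -> node.
                        cycle, cur = [node], node
                        while cur != nb and parent[cur] is not None:
                            cur = parent[cur]
                            cycle.append(cur)
                        return cycle
                    stack.append((nb, node))
        return None

    def greedy_fvs(adj):
        """Return a (not necessarily minimum) feedback vertex set of an undirected graph."""
        adj = {v: set(nbs) for v, nbs in adj.items()}
        fvs = []
        while True:
            cycle = find_cycle(adj)
            if cycle is None:
                return fvs
            v = max(cycle, key=lambda u: len(adj[u]))  # break the cycle at a high-degree vertex
            fvs.append(v)
            for nb in adj.pop(v):
                adj[nb].discard(v)

    if __name__ == "__main__":
        # Two triangles joined by the edge 3-4; removing one vertex per triangle suffices.
        graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
        print(greedy_fvs(graph))  # e.g. [3, 4]
    ```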

  6. Development and empirical user-centered evaluation of semantically-based query recommendation for an electronic health record search engine.

    PubMed

    Hanauer, David A; Wu, Danny T Y; Yang, Lei; Mei, Qiaozhu; Murkowski-Steffy, Katherine B; Vydiswaran, V G Vinod; Zheng, Kai

    2017-03-01

    The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). The query recommendation algorithm utilizes MetaMap to identify medical concepts from search queries and indexed EHR documents. Synonym variants from UMLS are used to expand the concepts along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. The search engine's performance was rated consistently higher with the query recommendation feature turned on vs. off. The relevance of computer-recommended search terms was also rated high, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. Challenges persist for users to construct effective search queries when retrieving information from biomedical documents including those from EHRs. This study demonstrates that semantically-based query recommendation is a viable solution to addressing this challenge. Published by Elsevier Inc.

  7. Advanced Query and Data Mining Capabilities for MaROS

    NASA Technical Reports Server (NTRS)

    Wang, Paul; Wallick, Michael N.; Allard, Daniel A.; Gladden, Roy E.; Hy, Franklin H.

    2013-01-01

    The Mars Relay Operational Service (MaROS) comprises a number of tools to coordinate, plan, and visualize various aspects of the Mars Relay network. These tools span several levels, including a Web-based user interface, a back-end "ReSTlet" built in Java, and databases that store the data as they are received from the network. As part of MaROS, the innovators have developed and implemented a feature set that operates on several levels of the software architecture: an advanced querying capability, available through either the Web-based user interface or a back-end REST interface, that gives access to all of the data gathered from the network. This capability is not meant to replace the existing REST interface, but to augment it and expand the range of available data. The current REST interface provides specific data that the MaROS Web application uses to display and visualize information; however, the information it returns has typically been pre-processed into a subset of the entire repository, particularly the information that is of interest to the GUI (graphical user interface). The new, advanced query and data mining capabilities allow users to retrieve the raw data and/or to perform their own data processing. The query language used to access the repository is a restricted subset of the Structured Query Language (SQL) that can be built safely from the Web user interface, or entered as freeform SQL by a user. The results are returned in CSV (Comma Separated Values) format for easy export to third-party tools and applications that can be used for data mining or user-defined visualization and interpretation. This is the first time that a service is capable of providing access to all cross-project relay data from a single Web resource. Because MaROS contains data for a variety of missions from the Mars network, spanning both NASA and ESA, the software also establishes an access control list (ACL) on each data record in the database repository to enforce user access permissions through a multilayered approach.
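
    The following Python sketch illustrates the general pattern described above (a restricted, read-only SQL subset executed against a repository and returned as CSV), under assumptions of our own: it is not the MaROS interface, and the table name relay_passes, the validation rules, and the use of SQLite are hypothetical stand-ins.

    ```python
    # Minimal sketch: accept a restricted read-only SQL query, execute it, return CSV.
    # Not the MaROS API; table and validation rules are illustrative only.
    import csv
    import io
    import re
    import sqlite3

    FORBIDDEN = re.compile(r"\b(insert|update|delete|drop|alter|attach|pragma)\b", re.I)

    def run_restricted_query(conn: sqlite3.Connection, sql: str) -> str:
        """Execute a single read-only SELECT statement and return its result as CSV text."""
        stripped = sql.strip().rstrip(";")
        if not stripped.lower().startswith("select") or FORBIDDEN.search(stripped) or ";" in stripped:
            raise ValueError("only single SELECT statements are allowed")
        cur = conn.execute(stripped)
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow([d[0] for d in cur.description])  # header row from column names
        writer.writerows(cur.fetchall())
        return out.getvalue()

    if __name__ == "__main__":
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE relay_passes (orbiter TEXT, lander TEXT, data_mbits REAL)")
        conn.executemany("INSERT INTO relay_passes VALUES (?, ?, ?)",
                         [("MRO", "MSL", 512.0), ("ODY", "MSL", 128.0)])
        print(run_restricted_query(conn, "SELECT orbiter, SUM(data_mbits) FROM relay_passes GROUP BY orbiter"))
    ```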

  8. A Study of the Efficiency of Spatial Indexing Methods Applied to Large Astronomical Databases

    NASA Astrophysics Data System (ADS)

    Donaldson, Tom; Berriman, G. Bruce; Good, John; Shiao, Bernie

    2018-01-01

    Spatial indexing of astronomical databases generally uses quadrature methods, which partition the sky into cells used to create an index (usually a B-tree) written as a database column. We report the results of a study to compare the performance of two common indexing methods, HTM and HEALPix, on Solaris and Windows database servers installed with a PostgreSQL database, and on a Windows server installed with MS SQL Server. The indexing was applied to the 2MASS All-Sky Catalog and to the Hubble Source Catalog. On each server, the study compared indexing performance by submitting 1 million queries at each index level, with random sky positions and random cone-search radii computed on a logarithmic scale between 1 arcsec and 1 degree, and measuring the time to complete each query and write the output. These simulated queries, intended to model realistic use patterns, were run in a uniform way over many combinations of indexing method and indexing level. The query times in all simulations are strongly I/O-bound and are linear in the number of records returned for large numbers of sources. There are, however, considerable differences between simulations, which reveal that hardware I/O throughput is a more important factor in managing the performance of a DBMS than the choice of indexing scheme. The choice of index itself is relatively unimportant: for comparable index levels, the performance is consistent within the scatter of the timings. At small index levels (large cells; e.g., level 4, cell size 3.7 deg), there is large scatter in the timings because of wide variations in the number of sources found in the cells. At larger index levels, performance improves and scatter decreases, but the improvement at level 8 (cell size 14 arcmin) and higher is masked to some extent by the timing scatter caused by the range of query sizes. At very high levels (e.g., 20; cell size 0.0004 arcsec), the granularity of the cells becomes so high that a large number of extraneous empty cells begin to degrade performance. Thus, for the use patterns studied here, database performance is not critically dependent on the exact choice of index or level.
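
    A minimal Python sketch of the simulated query generation described above, assuming a hypothetical run_cone_search back end: positions are drawn uniformly on the sphere and cone radii log-uniformly between 1 arcsec and 1 degree, and the total query time is measured.

    ```python
    # Sketch of the simulated cone-search query generation: random sky positions
    # (uniform on the sphere) and radii drawn on a logarithmic scale, 1" .. 1 deg.
    # run_cone_search() is a hypothetical stand-in for the indexed DBMS under test.
    import math
    import random
    import time

    ARCSEC_DEG = 1.0 / 3600.0

    def random_cone_query():
        ra = random.uniform(0.0, 360.0)                              # right ascension, degrees
        dec = math.degrees(math.asin(random.uniform(-1, 1)))         # uniform on the sphere
        radius = 10 ** random.uniform(math.log10(ARCSEC_DEG), 0.0)   # log-uniform radius in degrees
        return ra, dec, radius

    def benchmark(run_cone_search, n_queries=1000):
        """Time n_queries random cone searches; returns (total_seconds, rows_returned)."""
        total_rows, start = 0, time.perf_counter()
        for _ in range(n_queries):
            ra, dec, radius = random_cone_query()
            total_rows += len(run_cone_search(ra, dec, radius))
        return time.perf_counter() - start, total_rows

    if __name__ == "__main__":
        fake_search = lambda ra, dec, radius: []  # placeholder back end for demonstration
        print(benchmark(fake_search, n_queries=10))
    ```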

  9. Occam's razor: supporting visual query expression for content-based image queries

    NASA Astrophysics Data System (ADS)

    Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.

    2005-01-01

    This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty-eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).

  10. Geometric Representations of Condition Queries on Three-Dimensional Vector Fields

    NASA Technical Reports Server (NTRS)

    Henze, Chris

    1999-01-01

    Condition queries on distributed data ask where particular conditions are satisfied. It is possible to represent condition queries as geometric objects by plotting field data in various spaces derived from the data, and by selecting loci within these derived spaces which signify the desired conditions. Rather simple geometric partitions of derived spaces can represent complex condition queries because much complexity can be encapsulated in the derived space mapping itself. A geometric view of condition queries provides a useful conceptual unification, allowing one to intuitively understand many existing vector field feature detection algorithms -- and to design new ones -- as variations on a common theme. A geometric representation of condition queries also provides a simple and coherent basis for computer implementation, reducing a wide variety of existing and potential vector field feature detection techniques to a few simple geometric operations.

  11. Occam's razor: supporting visual query expression for content-based image queries

    NASA Astrophysics Data System (ADS)

    Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.

    2004-12-01

    This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty-eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).

  12. Retrieval feedback in MEDLINE.

    PubMed Central

    Srinivasan, P

    1996-01-01

    OBJECTIVE: To investigate a new approach for query expansion based on retrieval feedback. The first objective in this study was to examine alternative query-expansion methods within the same retrieval-feedback framework. The three alternatives proposed are: expansion on the MeSH query field alone, expansion on the free-text field alone, and expansion on both the MeSH and the free-text fields. The second objective was to gain further understanding of retrieval feedback by examining possible dependencies on relevant documents during the feedback cycle. DESIGN: Comparative study of retrieval effectiveness using the original unexpanded and the alternative expanded user queries on a MEDLINE test collection of 75 queries and 2,334 MEDLINE citations. MEASUREMENTS: Retrieval effectiveness of the original unexpanded and the alternative expanded queries was compared using 11-point-average precision scores (11-AvgP). These are averages of precision scores obtained at 11 standard recall points. RESULTS: All three expansion strategies significantly improved the original queries in terms of retrieval effectiveness. Expansion on MeSH alone was equivalent to expansion on both MeSH and the free-text fields. Expansion on the free-text field alone improved the queries significantly less than did the other two strategies. The second part of the study indicated that retrieval-feedback-based expansion yields significant performance improvements independent of the availability of relevant documents for feedback information. CONCLUSIONS: Retrieval feedback offers a robust procedure for query expansion that is most effective for MEDLINE when applied to the MeSH field. PMID:8653452
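
    For reference, the sketch below computes the 11-point average precision (11-AvgP) measure used in this study from a ranked list of relevance judgments, assuming the usual interpolation convention (maximum precision at any recall at or above each standard level).

    ```python
    # 11-point average precision: precision interpolated at recall levels
    # 0.0, 0.1, ..., 1.0 (max precision at any recall >= the level), then averaged.
    def eleven_point_avg_precision(relevance, total_relevant):
        """relevance: ranked list of 0/1 judgments; total_relevant: relevant docs in the collection."""
        points = []           # (recall, precision) after each retrieved document
        hits = 0
        for rank, rel in enumerate(relevance, start=1):
            hits += rel
            points.append((hits / total_relevant, hits / rank))
        levels = [i / 10 for i in range(11)]
        interpolated = []
        for level in levels:
            precisions = [p for r, p in points if r >= level]
            interpolated.append(max(precisions) if precisions else 0.0)
        return sum(interpolated) / len(levels)

    if __name__ == "__main__":
        # 10 retrieved documents, 4 relevant documents in the collection
        print(round(eleven_point_avg_precision([1, 0, 1, 0, 0, 1, 0, 0, 1, 0], 4), 3))
    ```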

  13. Query Expansion Using SNOMED-CT and Weighing Schemes

    DTIC Science & Technology

    2014-11-01

    For this research, we have used SNOMED-CT along with the UMLS Metathesaurus as our ontology in the medical domain to expand the queries. ... the University of the Basque Country discuss their findings on query expansion using external sources headlined by the Unified Medical Language System (UMLS) ...

  14. Categorical and Specificity Differences between User-Supplied Tags and Search Query Terms for Images. An Analysis of "Flickr" Tags and Web Image Search Queries

    ERIC Educational Resources Information Center

    Chung, EunKyung; Yoon, JungWon

    2009-01-01

    Introduction: The purpose of this study is to compare characteristics and features of user supplied tags and search query terms for images on the "Flickr" Website in terms of categories of pictorial meanings and level of term specificity. Method: This study focuses on comparisons between tags and search queries using Shatford's categorization…

  15. Design Recommendations for Query Languages

    DTIC Science & Technology

    1980-09-01

    S.L. Ehrenreich, Human Factors Technical Area. ... respond to queries that it recognizes as faulty. Codd (1974) states that in designing a natural query language, attention must be given to dealing ... impaired. Codd (1974) also regarded the user's perception of the data base to be of critical importance in properly designing a query language system.

  16. Agent-Based Framework for Discrete Entity Simulations

    DTIC Science & Technology

    2006-11-01

    ... Postgres database server for environment queries of neighbors and continuum data. As expected for raw database queries (no database optimizations in ... form. Eventually the code was ported to GNU C++ on the same single Intel Pentium 4 CPU running RedHat Linux 9.0 and a Postgres database server. ... Again Postgres was used for environmental queries, and the tool remained relatively slow because of the immense number of queries necessary to assess ...

  17. An SSVEP-Based Brain-Computer Interface for Text Spelling With Adaptive Queries That Maximize Information Gain Rates.

    PubMed

    Akce, Abdullah; Norton, James J S; Bretl, Timothy

    2015-09-01

    This paper presents a brain-computer interface for text entry using steady-state visually evoked potentials (SSVEP). Like other SSVEP-based spellers, ours identifies the desired input character by posing questions (or queries) to users through a visual interface. Each query defines a mapping from possible characters to steady-state stimuli. The user responds by attending to one of these stimuli. Unlike other SSVEP-based spellers, ours chooses from a much larger pool of possible queries, on the order of ten thousand instead of ten. The larger query pool allows our speller to adapt more effectively to the inherent structure of what is being typed and to the input performance of the user, both of which make certain queries provide more information than others. In particular, our speller chooses queries from this pool that maximize the amount of information received per unit of time, a measure of mutual information that we call the information gain rate. To validate our interface, we compared it with two other state-of-the-art SSVEP-based spellers, which were re-implemented to use the same input mechanism. Results showed that our interface, with the larger query pool, allowed users to spell multiple-word texts nearly twice as fast as they could with the compared spellers.
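
    The following toy Python sketch illustrates the information-gain-rate criterion in isolation: among candidate queries it picks the one with the largest expected entropy reduction per second. The character prior, outcome likelihoods, and durations are invented for illustration; this is not the authors' speller.

    ```python
    # Toy "information gain rate" selection: expected entropy reduction per second.
    # All numbers below are made up for illustration.
    import math

    def entropy(p):
        return -sum(x * math.log2(x) for x in p if x > 0)

    def expected_posterior_entropy(prior, likelihoods):
        """likelihoods[o][c] = P(observation o | character c); returns E_o[H(posterior)]."""
        h = 0.0
        for like in likelihoods:
            joint = [l * p for l, p in zip(like, prior)]
            p_obs = sum(joint)
            if p_obs > 0:
                h += p_obs * entropy([j / p_obs for j in joint])
        return h

    def best_query(prior, candidates):
        """candidates: list of (likelihoods, duration_seconds); returns index with max gain rate."""
        def rate(cand):
            likelihoods, duration = cand
            gain = entropy(prior) - expected_posterior_entropy(prior, likelihoods)
            return gain / duration
        return max(range(len(candidates)), key=lambda i: rate(candidates[i]))

    if __name__ == "__main__":
        prior = [0.5, 0.3, 0.2]                                       # belief over 3 characters
        q_fast = ([[0.8, 0.3, 0.3], [0.2, 0.7, 0.7]], 1.0)            # noisy binary query, 1 s
        q_slow = ([[0.95, 0.05, 0.05], [0.05, 0.95, 0.95]], 3.0)      # cleaner but slower query
        print(best_query(prior, [q_fast, q_slow]))
    ```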

  18. Query construction, entropy, and generalization in neural-network models

    NASA Astrophysics Data System (ADS)

    Sollich, Peter

    1994-05-01

    We study query construction algorithms, which aim at improving the generalization ability of systems that learn from examples by choosing optimal, nonredundant training sets. We set up a general probabilistic framework for deriving such algorithms from the requirement of optimizing a suitable objective function; specifically, we consider the objective functions entropy (or information gain) and generalization error. For two learning scenarios, the high-low game and the linear perceptron, we evaluate the generalization performance obtained by applying the corresponding query construction algorithms and compare it to training on random examples. We find qualitative differences between the two scenarios due to the different structure of the underlying rules (nonlinear and "noninvertible" versus linear); in particular, for the linear perceptron, random examples lead to the same generalization ability as a sequence of queries in the limit of an infinite number of examples. We also investigate learning algorithms which are ill matched to the learning environment and find that, in this case, minimum entropy queries can in fact yield a lower generalization ability than random examples. Finally, we study the efficiency of single queries and its dependence on the learning history, i.e., on whether the previous training examples were generated randomly or by querying, and the difference between globally and locally optimal query construction.

  19. Spatial information semantic query based on SPARQL

    NASA Astrophysics Data System (ADS)

    Xiao, Zhifeng; Huang, Lei; Zhai, Xiaofang

    2009-10-01

    How can the efficiency of spatial information queries be enhanced in today's fast-growing information age? We are rich in geospatial data but poor in up-to-date geospatial information and knowledge that is ready to be accessed by public users. This paper adopts an approach to querying spatial semantics by building an ontology in the Web Ontology Language (OWL) format and introducing the SPARQL Protocol and RDF Query Language (SPARQL) to search spatial semantic relations. It is important to establish spatial semantics that support effective spatial reasoning for performing semantic queries. Compared to earlier keyword-based information retrieval techniques that rely on syntax, we use semantic approaches in our spatial query system. Semantic approaches need to be supported by an ontology, so we use OWL to describe spatial information extracted from the large-scale map of Wuhan. Spatial information expressed by an ontology with formal semantics is available to machines for processing and to people for understanding. The approach is illustrated by a case study using SPARQL to query geo-spatial ontology instances of Wuhan. The paper shows that using SPARQL to search OWL ontology instances can ensure the accuracy and applicability of the results. The results also indicate that constructing a geo-spatial semantic query system has a positive effect on spatial query formulation and retrieval.
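
    A minimal sketch of the kind of SPARQL query over OWL/RDF instances described above, assuming the third-party rdflib package; the tiny example graph is invented and is not the Wuhan ontology used in the paper.

    ```python
    # Tiny made-up spatial ontology queried with SPARQL via rdflib (third-party package).
    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/geo#")
    g = Graph()
    g.add((EX.WuhanUniversity, EX.locatedIn, EX.WuchangDistrict))
    g.add((EX.WuchangDistrict, EX.partOf, EX.Wuhan))
    g.add((EX.WuhanUniversity, EX.featureType, Literal("campus")))

    query = """
    PREFIX ex: <http://example.org/geo#>
    SELECT ?feature WHERE {
        ?feature ex:locatedIn ?district .
        ?district ex:partOf ex:Wuhan .
    }
    """
    # Prints every feature that is (transitively, via one hop here) inside Wuhan.
    for row in g.query(query):
        print(row.feature)
    ```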

  20. DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data.

    PubMed

    Putri, Fadhilah Kurnia; Song, Giltae; Kwon, Joonho; Rao, Praveen

    2017-09-25

    One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query (DISPAQ), which efficiently identifies profitable areas by exploiting the Apache Software Foundation's Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data.
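
    As background for the Z-Skyline step, the sketch below shows a plain Z-order (Morton) encoding of 2-D grid cells in Python; the grid coordinates are illustrative only, and this is not the DISPAQ implementation.

    ```python
    # Z-order (Morton) encoding: interleave the bits of a cell's (x, y) coordinates
    # so that nearby cells tend to receive nearby one-dimensional keys.
    def interleave_bits(x: int, y: int, bits: int = 16) -> int:
        """Morton code of cell (x, y): bit i of x goes to bit 2i, bit i of y to bit 2i+1."""
        code = 0
        for i in range(bits):
            code |= ((x >> i) & 1) << (2 * i)
            code |= ((y >> i) & 1) << (2 * i + 1)
        return code

    if __name__ == "__main__":
        cells = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (2, 2)]
        for cx, cy in cells:
            print((cx, cy), "->", interleave_bits(cx, cy))
    ```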

  1. DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data †

    PubMed Central

    Putri, Fadhilah Kurnia; Song, Giltae; Rao, Praveen

    2017-01-01

    One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query (DISPAQ) which efficiently identifies profitable areas by exploiting the Apache Software Foundation’s Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data. PMID:28946679

  2. Pulsed coherent population trapping with repeated queries for producing single-peaked high contrast Ramsey interference

    NASA Astrophysics Data System (ADS)

    Warren, Z.; Shahriar, M. S.; Tripathi, R.; Pati, G. S.

    2018-02-01

    A repeated query technique has been demonstrated as a new interrogation method in pulsed coherent population trapping for producing single-peaked Ramsey interference with high contrast. This technique enhances the contrast of the central Ramsey fringe by nearly 1.5 times and significantly suppresses the side fringes by using more query pulses (>10) in the pulse cycle. Theoretical models have been developed to simulate Ramsey interference and analyze the characteristics of the Ramsey spectrum produced by the repeated query technique. Experiments have also been carried out employing the repeated query technique in a prototype rubidium clock to study its frequency stability performance.

  3. Concept locator: a client-server application for retrieval of UMLS metathesaurus concepts through complex boolean query.

    PubMed

    Nadkarni, P M

    1997-08-01

    Concept Locator (CL) is a client-server application that accesses a Sybase relational database server containing a subset of the UMLS Metathesaurus for the purpose of retrieval of concepts corresponding to one or more query expressions supplied to it. CL's query grammar permits complex Boolean expressions, wildcard patterns, and parenthesized (nested) subexpressions. CL translates the query expressions supplied to it into one or more SQL statements that actually perform the retrieval. The generated SQL is optimized by the client to take advantage of the strengths of the server's query optimizer, and sidesteps its weaknesses, so that execution is reasonably efficient.
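
    A toy Python sketch in the spirit of the translation described above: a flat Boolean expression of wildcard terms is turned into a single parameterised SQL statement. The grammar handled here (no nesting) and the table and column names are hypothetical, not Concept Locator's actual Metathesaurus schema.

    ```python
    # Toy Boolean-to-SQL translation: terms with '*' wildcards joined by AND/OR
    # become LIKE conditions. Table/column names are illustrative stand-ins.
    import sqlite3

    def boolean_to_sql(expr: str):
        """'heart* AND failure OR cardiomyopath*' -> (parameterised SQL, parameters)."""
        conditions, params = [], []
        for token in expr.split():
            upper = token.upper()
            if upper in ("AND", "OR"):
                conditions.append(upper)
            else:
                conditions.append("term LIKE ?")
                params.append(token.replace("*", "%"))
        sql = "SELECT DISTINCT cui, term FROM concept_terms WHERE " + " ".join(conditions)
        return sql, params

    if __name__ == "__main__":
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE concept_terms (cui TEXT, term TEXT)")
        conn.executemany("INSERT INTO concept_terms VALUES (?, ?)",
                         [("C0018801", "heart failure"), ("C0878544", "cardiomyopathy")])
        sql, params = boolean_to_sql("heart* OR cardiomyopath*")
        print(conn.execute(sql, params).fetchall())
    ```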

  4. Evolution of Query Optimization Methods

    NASA Astrophysics Data System (ADS)

    Hameurlain, Abdelkader; Morvan, Franck

    Query optimization is the most critical phase in query processing. In this paper, we try to describe synthetically the evolution of query optimization methods from uniprocessor relational database systems to data Grid systems through parallel, distributed and data integration systems. We point out a set of parameters to characterize and compare query optimization methods, mainly: (i) size of the search space, (ii) type of method (static or dynamic), (iii) modification types of execution plans (re-optimization or re-scheduling), (iv) level of modification (intra-operator and/or inter-operator), (v) type of event (estimation errors, delay, user preferences), and (vi) nature of decision-making (centralized or decentralized control).

  5. Content-Aware DataGuide with Incremental Index Update using Frequently Used Paths

    NASA Astrophysics Data System (ADS)

    Sharma, A. K.; Duhan, Neelam; Khattar, Priyanka

    2010-11-01

    The size of the WWW is increasing day by day. Due to the absence of structured data on the Web, it is very difficult for information retrieval tools to fully utilize Web information. As a solution to this problem, XML pages come into play, providing structural information to users to some extent. Without efficient indexes, query processing can be quite inefficient due to exhaustive traversal of XML data. In this paper, an improved content-centric approach to the Content-Aware DataGuide, an indexing technique for XML databases, is proposed; it uses frequently used paths from historical query logs to improve query performance. The index can be updated incrementally according to changes in the query workload, so the overhead of reconstruction is minimized. Frequently used paths are extracted by applying a sequential pattern mining algorithm to the queries in the query workload; the data structures are then updated incrementally. This indexing technique proves to be efficient, as partial-matching queries can be executed efficiently and users obtain more relevant documents in the results.

  6. Macromolecular Structure Database. Final Progress Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilliland, Gary L.

    2003-09-23

    The central activity of the PDB continues to be the collection, archiving and distribution of high quality structural data to the scientific community on a timely basis. In support of these activities NIST has continued its roles in developing the physical archive, in developing data uniformity, in dealing with NMR issues and in the distribution of PDB data through CD-ROMs. The physical archive holdings have been organized, inventoried, and a database has been created to facilitate their use. Data from individual PDB entries have been annotated to produce uniform values, improving tremendously the accuracy of results of queries. Working with the NMR community we have established data items specific for NMR that will be included in new entries and facilitate data deposition. The PDB CD-ROM production has continued on a quarterly basis, and new products are being distributed.

  7. Autocorrelation and Regularization of Query-Based Information Retrieval Scores

    DTIC Science & Technology

    2008-02-01

    ... of the most general information retrieval models [Salton, 1968]. By treating a query as a very short document, documents and queries can be rep... [Salton, 1971]. In the context of single link hierarchical clustering, Jardine and van Rijsbergen showed that ranking all k clusters and retrieving a ... a document about “dogs”, then the system will always miss this document when a user queries “dog”. Salton recognized that a document’s representation ...

  8. Query Log Analysis of an Electronic Health Record Search Engine

    PubMed Central

    Yang, Lei; Mei, Qiaozhu; Zheng, Kai; Hanauer, David A.

    2011-01-01

    We analyzed a longitudinal collection of query logs of a full-text search engine designed to facilitate information retrieval in electronic health records (EHR). The collection, 202,905 queries and 35,928 user sessions recorded over a course of 4 years, represents the information-seeking behavior of 533 medical professionals, including frontline practitioners, coding personnel, patient safety officers, and biomedical researchers, for patient data stored in EHR systems. In this paper, we present descriptive statistics of the queries, a categorization of information needs manifested through the queries, as well as temporal patterns of the users’ information-seeking behavior. The results suggest that information needs in the medical domain are substantially more sophisticated than those that general-purpose web search engines need to accommodate. Therefore, we envision that there exists a significant challenge, along with significant opportunities, in providing intelligent query recommendations to facilitate information retrieval in EHR. PMID:22195150

  9. Efficient hemodynamic event detection utilizing relational databases and wavelet analysis

    NASA Technical Reports Server (NTRS)

    Saeed, M.; Mark, R. G.

    2001-01-01

    Development of a temporal query framework for time-oriented medical databases has hitherto been a challenging problem. We describe a novel method for the detection of hemodynamic events in multiparameter trends utilizing wavelet coefficients in a MySQL relational database. Storage of the wavelet coefficients allowed for a compact representation of the trends, and provided robust descriptors for the dynamics of the parameter time series. A data model was developed to allow for simplified queries along several dimensions and time scales. Of particular importance, the data model and wavelet framework allowed for queries to be processed with minimal table-join operations. A web-based search engine was developed to allow for user-defined queries. Typical queries required between 0.01 and 0.02 seconds, with at least two orders of magnitude improvement in speed over conventional queries. This powerful and innovative structure will facilitate research on large-scale time-oriented medical databases.
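
    The sketch below illustrates the underlying idea, with SQLite and the third-party PyWavelets package standing in for the MySQL setup described above: a parameter trend is decomposed with a discrete wavelet transform and its coarse coefficients are stored so that queries run against a compact representation. The table layout and threshold query are invented.

    ```python
    # Store coarse wavelet coefficients of a parameter trend in a relational table
    # (SQLite stands in for the MySQL database used by the authors; PyWavelets is
    # a third-party package; the schema is illustrative only).
    import sqlite3
    import pywt

    def store_trend(conn, signal_id, samples, wavelet="db2", level=2):
        coeffs = pywt.wavedec(samples, wavelet, level=level)
        approx = coeffs[0]  # coarse approximation: a compact descriptor of the trend
        conn.executemany(
            "INSERT INTO wavelet_coeffs (signal_id, idx, value) VALUES (?, ?, ?)",
            [(signal_id, i, float(v)) for i, v in enumerate(approx)],
        )

    if __name__ == "__main__":
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE wavelet_coeffs (signal_id TEXT, idx INTEGER, value REAL)")
        heart_rate = [72, 73, 71, 75, 90, 110, 112, 108, 95, 80, 76, 74, 73, 72, 71, 70]
        store_trend(conn, "hr_patient_1", heart_rate)
        # Example "event" query: signals whose coarse coefficients exceed a threshold.
        rows = conn.execute(
            "SELECT DISTINCT signal_id FROM wavelet_coeffs WHERE value > ?", (200,)
        ).fetchall()
        print(rows)
    ```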

  10. A Fuzzy Query Mechanism for Human Resource Websites

    NASA Astrophysics Data System (ADS)

    Lai, Lien-Fu; Wu, Chao-Chin; Huang, Liang-Tsung; Kuo, Jung-Chih

    Users' preferences often contain imprecision and uncertainty that are difficult for traditional human resource websites to deal with. In this paper, we apply fuzzy logic theory to develop a fuzzy query mechanism for human resource websites. First, a storing mechanism is proposed to store fuzzy data in conventional database management systems without modifying the DBMS model. Second, a fuzzy query language is proposed for users to make fuzzy queries on fuzzy databases. A user's fuzzy requirement can be expressed by a fuzzy query which consists of a set of fuzzy conditions. Third, each fuzzy condition is associated with a fuzzy importance to differentiate between fuzzy conditions according to their degrees of importance. Fourth, the fuzzy weighted average is utilized to aggregate all fuzzy conditions based on their degrees of importance and degrees of matching. Through the mutual compensation of all fuzzy conditions, the ordering of query results can be obtained according to the user's preference.
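
    A toy version of the ranking step described above: each condition carries a degree of importance and, per candidate, a degree of matching, and candidates are ordered by the fuzzy weighted average. The data below are made up.

    ```python
    # Rank candidates by the fuzzy weighted average of their matching degrees,
    # weighted by the importance of each fuzzy condition. Example data are invented.
    def fuzzy_weighted_average(matches, importances):
        """Weighted average of matching degrees in [0, 1], weighted by condition importance."""
        total = sum(importances)
        return sum(m * w for m, w in zip(matches, importances)) / total

    if __name__ == "__main__":
        importances = [0.9, 0.6, 0.3]   # e.g. "salary about 50k", "young", "near Taipei"
        candidates = {
            "A": [0.8, 0.5, 1.0],
            "B": [1.0, 0.2, 0.4],
            "C": [0.6, 0.9, 0.7],
        }
        ranked = sorted(candidates,
                        key=lambda c: fuzzy_weighted_average(candidates[c], importances),
                        reverse=True)
        print(ranked)
    ```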

  11. Integrating Smart Health in the US Health Care System: Infodemiology Study of Asthma Monitoring in the Google Era

    PubMed Central

    Sampri, Alexia; Sypsa, Karla; Tsagarakis, Konstantinos P

    2018-01-01

    Background With the internet’s penetration and use constantly expanding, this vast amount of information can be employed in order to better assess issues in the US health care system. Google Trends, a popular tool in big data analytics, has been widely used in the past to examine interest in various medical and health-related topics and has shown great potential in forecasting, prediction, and nowcasting. As empirical relationships between online queries and human behavior have been shown to exist, a new opportunity to explore the behavior toward asthma (a common respiratory disease) is present. Objective This study aimed at forecasting the online behavior toward asthma and examined the correlations between queries and reported cases in order to explore the possibility of nowcasting asthma prevalence in the United States using online search traffic data. Methods Applying Holt-Winters exponential smoothing to Google Trends time series from 2004 to 2015 for the term “asthma,” forecasts for online queries at state and national levels are estimated from 2016 to 2020 and validated against available Google query data from January 2016 to June 2017. Correlations among yearly Google queries and between Google queries and reported asthma cases are examined. Results Our analysis shows that search queries exhibit seasonality within each year and that the relationships between each two years’ queries are statistically significant (P<.05). Estimated forecasting models for a 5-year period (2016 through 2020) for Google queries are robust and validated against available data from January 2016 to June 2017. Significant correlations were found between (1) online queries and National Health Interview Survey lifetime asthma (r=–.82, P=.001) and current asthma (r=–.77, P=.004) rates from 2004 to 2015 and (2) online queries and Behavioral Risk Factor Surveillance System lifetime (r=–.78, P=.003) and current asthma (r=–.79, P=.002) rates from 2004 to 2014. The correlations are negative, but lag analysis to identify the period of response cannot be employed until short-interval data on asthma prevalence are made available. Conclusions Online behavior toward asthma can be accurately predicted, and significant correlations between online queries and reported cases exist. This method of forecasting Google queries can be used by health care officials to nowcast asthma prevalence by city, state, or nationally, subject to future availability of daily, weekly, or monthly data on reported cases. This method could therefore be used for improved monitoring and assessment of the needs of the current population of patients with asthma. PMID:29530839
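
    A minimal sketch of the forecasting step, assuming the third-party statsmodels package; the monthly series below is synthetic, whereas the study fits Holt-Winters models to the actual Google Trends series for "asthma".

    ```python
    # Holt-Winters exponential smoothing forecast of a monthly query-volume series.
    # The series here is synthetic (trend + yearly seasonality + noise).
    import numpy as np
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    rng = np.random.default_rng(0)
    months = np.arange(12 * 12)  # 12 years of monthly data, as in 2004-2015
    series = 50 + 0.1 * months + 10 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 2, months.size)

    model = ExponentialSmoothing(series, trend="add", seasonal="add", seasonal_periods=12)
    fit = model.fit()
    forecast = fit.forecast(5 * 12)  # five years ahead, mirroring the 2016-2020 horizon
    print(forecast[:12])             # first forecast year
    ```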

  12. Fast Inbound Top-K Query for Random Walk with Restart.

    PubMed

    Zhang, Chao; Jiang, Shan; Chen, Yucheng; Sun, Yidan; Han, Jiawei

    2015-09-01

    Random walk with restart (RWR) is widely recognized as one of the most important node proximity measures for graphs, as it captures the holistic graph structure and is robust to noise in the graph. In this paper, we study a novel query based on the RWR measure, called the inbound top-k (Ink) query. Given a query node q and a number k, the Ink query aims at retrieving the k nodes in the graph that have the largest weighted RWR scores to q. Ink queries can be highly useful for various applications such as traffic scheduling, disease treatment, and targeted advertising. Nevertheless, none of the existing RWR computation techniques can accurately and efficiently process the Ink query in large graphs. We propose two algorithms, namely Squeeze and Ripple, both of which can accurately answer the Ink query in a fast and incremental manner. To identify the top-k nodes, Squeeze iteratively performs matrix-vector multiplication and estimates the lower and upper bounds for all the nodes in the graph. Ripple employs a more aggressive strategy by only estimating the RWR scores for the nodes falling in the vicinity of q; the nodes outside the vicinity do not need to be evaluated because their RWR scores are propagated from the boundary of the vicinity and are thus upper bounded. Ripple incrementally expands the vicinity until the top-k result set can be obtained. Our extensive experiments on real-life graph data sets show that Ink queries can retrieve interesting results, and the proposed algorithms are orders of magnitude faster than the state-of-the-art method.
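
    For orientation, the sketch below computes RWR scores by brute-force power iteration and takes the k nodes with the largest scores toward the query node; it is the naive baseline that such work improves upon, not the Squeeze or Ripple algorithms.

    ```python
    # Brute-force inbound top-k via random walk with restart (RWR) power iteration.
    # Reference illustration only; far too slow for large graphs.
    import numpy as np

    def rwr(adj, restart_node, c=0.15, tol=1e-10, max_iter=1000):
        """Stationary RWR distribution with restart probability c at restart_node."""
        n = adj.shape[0]
        col_sums = adj.sum(axis=0)
        P = adj / np.where(col_sums == 0, 1, col_sums)   # column-stochastic transition matrix
        e = np.zeros(n); e[restart_node] = 1.0
        r = np.full(n, 1.0 / n)
        for _ in range(max_iter):
            r_new = (1 - c) * P @ r + c * e
            if np.abs(r_new - r).sum() < tol:
                break
            r = r_new
        return r

    def inbound_top_k(adj, q, k, c=0.15):
        """Nodes u with the largest RWR score toward q, i.e. r_u[q] when restarting at u."""
        scores = {u: rwr(adj, u, c)[q] for u in range(adj.shape[0]) if u != q}
        return sorted(scores, key=scores.get, reverse=True)[:k]

    if __name__ == "__main__":
        A = np.array([[0, 1, 1, 0],
                      [1, 0, 1, 0],
                      [1, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
        print(inbound_top_k(A, q=3, k=2))
    ```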

  13. Sleep-wake time perception varies by direct or indirect query.

    PubMed

    Alameddine, Y; Ellenbogen, J M; Bianchi, M T

    2015-01-15

    The diagnosis of insomnia rests on self-report of difficulty initiating or maintaining sleep. However, subjective reports may be unreliable, and possibly may vary by the method of inquiry. We investigated this possibility by comparing within-individual response to direct versus indirect time queries after overnight polysomnography. We obtained self-reported sleep-wake times via morning questionnaires in 879 consecutive adult diagnostic polysomnograms. Responses were compared within subjects (direct versus indirect query) and across groups defined by apnea-hypopnea index and by self-reported insomnia symptoms in pre-sleep questionnaires. Direct queries required a time duration response, while indirect queries required clock times from which we calculated time durations. Direct and indirect queries of sleep latency were the same in only 41% of cases, and total sleep time queries matched in only 5.4%. For both latency and total sleep, the most common discrepancy involved the indirect value being larger than the direct response. The discrepancy between direct and indirect queries was not related to objective sleep metrics. The degree of discrepancy was not related to the presence of insomnia symptoms, although patients reporting insomnia symptoms showed underestimation of total sleep duration by direct response. Self-reported sleep latency and total sleep time are often internally inconsistent when comparing direct and indirect survey queries of each measure. These discrepancies represent substantive challenges to effective clinical practice, particularly when diagnosis and management depends on self-reported sleep patterns, as with insomnia. Although self-reported sleep-wake times remains fundamental to clinical practice, objective measures provide clinically relevant adjunctive information. © 2015 American Academy of Sleep Medicine.

  14. Secure and Efficient k-NN Queries

    PubMed Central

    Asif, Hafiz; Vaidya, Jaideep; Shafiq, Basit; Adam, Nabil

    2017-01-01

    Given the morass of available data, ranking and best match queries are often used to find records of interest. As such, k-NN queries, which give the k closest matches to a query point, are of particular interest, and have many applications. We study this problem in the context of the financial sector, wherein an investment portfolio database is queried for matching portfolios. Given the sensitivity of the information involved, our key contribution is to develop a secure k-NN computation protocol that can enable the computation k-NN queries in a distributed multi-party environment while taking domain semantics into account. The experimental results show that the proposed protocols are extremely efficient. PMID:29218333

  15. Nearest private query based on quantum oblivious key distribution

    NASA Astrophysics Data System (ADS)

    Xu, Min; Shi, Run-hua; Luo, Zhen-yu; Peng, Zhen-wan

    2017-12-01

    A nearest private query is a special private query which involves two parties, a user and a data owner, where the user has a private input (e.g., an integer) and the data owner has a private data set, and the user wants to query which element in the owner's private data set is the nearest to his input without revealing their respective private information. In this paper, we first present a quantum protocol for the nearest private query, which is based on quantum oblivious key distribution (QOKD). Compared to related classical protocols, our protocol has the advantages of higher security and better feasibility, so it has better prospects for application.

  16. Cognitive issues in searching images with visual queries

    NASA Astrophysics Data System (ADS)

    Yu, ByungGu; Evens, Martha W.

    1999-01-01

    In this paper, we propose our image indexing technique and visual query processing technique. Our mental images are different from actual retinal images, and many things, such as personal interests, personal experiences, perceptual context, and the characteristics of spatial objects, affect our spatial perception. These individual differences are propagated into our mental images, and so our visual queries become different from the real images that we want to find. This is a hard problem, and few people have tried to work on it. In this paper, we survey the human mental imagery system and human spatial perception, and we discuss several kinds of visual queries. We also propose our own approach to visual query interpretation and processing.

  17. Applying Wave (registered trademark) to Build an Air Force Community of Interest Shared Space

    DTIC Science & Technology

    2007-08-01

    Performance. It is essential that an inverse transform be defined for every transform, or else the query mediator must be smart enough to figure out how ... to invert it. Without an inverse transform, if an incoming query constrains on the transformed attribute, the query mediator might generate a query ... plan that is horribly inefficient. If you must code a custom transformation function, you must also code the inverse transform. Putting the ...

  18. Concept Based Tie-breaking and Maximal Marginal Relevance Retrieval in Microblog Retrieval

    DTIC Science & Technology

    2014-11-01

    ... the same score, another signal will be used to rank these documents to break the ties, but the relative orders of other documents against these ... documents remain the same. The tie-breaking step above is repeatedly applied to further break ties until all candidate signals are applied and the ranking ... searched it on the Yahoo! search engine, which returned some query suggestions for the query. The original queries as well as their query suggestions ...

  19. Multi-field query expansion is effective for biomedical dataset retrieval.

    PubMed

    Bouadjenek, Mohamed Reda; Verspoor, Karin

    2017-01-01

    In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the proposed method transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive, since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time, and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one. © The Author(s) 2017. Published by Oxford University Press.
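
    A minimal sketch of a Rocchio-style expansion like the one compared above, using plain term frequencies and illustrative alpha/beta values rather than the parameters tuned in the paper.

    ```python
    # Rocchio-style expansion: move the query toward the centroid of (pseudo-)relevant
    # documents and append the top-weighted new terms. Weights and data are illustrative.
    from collections import Counter

    def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.75, n_new_terms=5):
        """feedback_docs: list of token lists assumed relevant (e.g. top-ranked results)."""
        weights = Counter({t: alpha for t in query_terms})
        for doc in feedback_docs:
            tf = Counter(doc)
            for term, freq in tf.items():
                weights[term] += beta * freq / len(feedback_docs)
        new_terms = [t for t, _ in weights.most_common() if t not in query_terms][:n_new_terms]
        return list(query_terms) + new_terms

    if __name__ == "__main__":
        query = ["breast", "cancer", "dataset"]
        top_docs = [["breast", "cancer", "gene", "expression", "microarray"],
                    ["tumor", "gene", "expression", "rna", "seq"]]
        print(rocchio_expand(query, top_docs))
    ```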

  20. Multi-field query expansion is effective for biomedical dataset retrieval

    PubMed Central

    2017-01-01

    Abstract In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the proposed method transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive, since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time, and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one. PMID:29220457

  1. Efficient processing of multiple nested event pattern queries over multi-dimensional event streams based on a triaxial hierarchical model.

    PubMed

    Xiao, Fuyuan; Aritsugi, Masayoshi; Wang, Qing; Zhang, Rong

    2016-09-01

    For efficient and sophisticated analysis of complex event patterns that appear in streams of big data from health care information systems, and to support decision-making, a triaxial hierarchical model is proposed in this paper. Our triaxial hierarchical model is developed by focusing on hierarchies among nested event pattern queries with an event concept hierarchy, thereby allowing us to identify the relationships among the expressions and sub-expressions of the queries extensively. We devise a cost-based heuristic by means of the triaxial hierarchical model to find an optimised query execution plan in terms of the costs of both the operators and the communications between them. According to the triaxial hierarchical model, we can also calculate how to reuse the results of the common sub-expressions in multiple queries. By integrating the optimised query execution plan with the reuse schemes, a multi-query optimisation strategy is developed to accomplish efficient processing of multiple nested event pattern queries. We present empirical studies in which the performance of the multi-query optimisation strategy was examined under various stream input rates and workloads. Specifically, the workloads of pattern queries can be used to support monitoring of patients' conditions. On the other hand, experiments with varying stream input rates can correspond to changes in the number of patients that a system must manage, whereas burst input rates can correspond to rushes of patients to be taken care of. The experimental results show that, in Workload 1, our proposal improves throughput by about 4 and 2 times compared with the related works, respectively; in Workload 2, it improves throughput by about 3 and 2 times compared with the related works, respectively; and in Workload 3, it improves throughput by about 6 times compared with the related work. The experimental results demonstrate that our proposal is able to process complex queries efficiently, which can support health information systems and further decision-making. Copyright © 2016 Elsevier B.V. All rights reserved.

  2. Kangaroo – A pattern-matching program for biological sequences

    PubMed Central

    2002-01-01

    Background Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. Results Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. Conclusion A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats. PMID:12150718
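
    In the same spirit as Kangaroo, the short Python example below uses a regular expression to locate mononucleotide repeats (runs of a single base of length six or more) in an invented coding sequence; it is not the Kangaroo code itself.

    ```python
    # Locate mononucleotide repeats (runs of one base, length >= 6) with a regex.
    # The example sequence is invented for illustration.
    import re

    MONONUCLEOTIDE_REPEAT = re.compile(r"(A{6,}|C{6,}|G{6,}|T{6,})")

    def find_repeats(sequence):
        return [(m.start(), m.group()) for m in MONONUCLEOTIDE_REPEAT.finditer(sequence)]

    if __name__ == "__main__":
        cds = "ATGGCCAAAAAAAAGCTTTTTTTCGGA"
        for pos, run in find_repeats(cds):
            print(f"repeat '{run}' at position {pos}")
    ```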

  3. MV-OPES: Multivalued-Order Preserving Encryption Scheme: A Novel Scheme for Encrypting Integer Value to Many Different Values

    NASA Astrophysics Data System (ADS)

    Kadhem, Hasan; Amagasa, Toshiyuki; Kitagawa, Hiroyuki

    Encryption can provide strong security for sensitive data against inside and outside attacks. This is especially true in the “Database as Service” model, where confidentiality and privacy are important issues for the client. In fact, existing encryption approaches are vulnerable to a statistical attack because each value is encrypted to another fixed value. This paper presents a novel database encryption scheme called MV-OPES (Multivalued — Order Preserving Encryption Scheme), which allows privacy-preserving queries over encrypted databases with an improved security level. Our idea is to encrypt a value to different multiple values to prevent statistical attacks. At the same time, MV-OPES preserves the order of the integer values to allow comparison operations to be directly applied on encrypted data. Using calculated distance (range), we propose a novel method that allows a join query between relations based on inequality over encrypted values. We also present techniques to offload query execution load to a database server as much as possible, thereby making a better use of server resources in a database outsourcing environment. Our scheme can easily be integrated with current database systems as it is designed to work with existing indexing structures. It is robust against statistical attack and the estimation of true values. MV-OPES experiments show that security for sensitive data can be achieved with reasonable overhead, establishing the practicability of the scheme.
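
    The toy sketch below only illustrates the one-to-many, order-preserving mapping idea: each integer is encrypted to a random value inside its own disjoint bucket, so equal plaintexts yield different ciphertexts while ciphertext order still reflects plaintext order. It is a didactic stand-in, not the MV-OPES construction, and offers no real security.

    ```python
    # Toy one-to-many order-preserving mapping: plaintext v -> random value in the
    # bucket [v*W, v*W + W - 1]. Didactic only; NOT the MV-OPES scheme, not secure.
    import random

    BUCKET_WIDTH = 1000

    def encrypt(value: int) -> int:
        return value * BUCKET_WIDTH + random.randrange(BUCKET_WIDTH)

    def decrypt(ciphertext: int) -> int:
        return ciphertext // BUCKET_WIDTH

    if __name__ == "__main__":
        salaries = [40, 40, 55, 70]
        encrypted = [encrypt(s) for s in salaries]
        print(encrypted)                                   # equal inputs -> different outputs
        assert all(decrypt(c) == s for c, s in zip(encrypted, salaries))
        # Range predicate evaluated directly on ciphertexts: plaintext >= 50
        # corresponds to ciphertext >= 50 * BUCKET_WIDTH (low edge of 50's bucket).
        print([decrypt(c) for c in encrypted if c >= 50 * BUCKET_WIDTH])
    ```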

  4. 78 FR 20473 - National Practitioner Data Bank

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-04-05

    ... may self-query. Information under the HCQIA is reported by medical malpractice payers, state medical... Organizations (QIOs). Individual health care practitioners and entities may self-query. Information under... have access to this information. Individual practitioners, providers, and suppliers may self-query the...

  5. Outcomes of extended oral contraceptive regimens with a shortened hormone-free interval to manage breakthrough bleeding.

    PubMed

    Sulak, Patricia J; Carl, Jenny; Gopalakrishnan, Isai; Coffee, Andrea; Kuehl, Thomas J

    2004-10-01

    To evaluate in a clinical practice setting the acceptance, continuation and variability of extending the active interval of oral contraceptives (OCs) with introduction of a shortened hormone-free interval (HFI) to manage breakthrough bleeding. A retrospective review was undertaken of patients seen by one obstetrician/gynecologist and counseled on extending the active interval of OCs with a shortened HFI of 3-4 days to manage bleeding. Electronic medical records were searched for the phrase "extending the number of active pills" for patients counseled between January 1, 2000, and January 31, 2003, with follow-up through January 31, 2004. A structured query of each patient's initial and follow-up records was performed. The 220 patients counseled on the extended regimen were 14-52 years of age (mean 36.4, SD 9.3 years). At initial counseling before extending, the majority of patients cited more than one reason for using OCs in the standard fashion with 59% using OCs for noncontraceptive reasons. Reasons for extending the active interval of OCs included premenstrual symptoms (45%), dysmenorrhea/pelvic pain (40%), heavy withdrawal bleeding (36%), menstrual associated headaches (35%), convenience (13%), acne associated with menses (10%) and other (15%). Of the 181 patients with follow-up data, 174 (96%) attempted an extended regimen with 121 (67%) continuing to do so at last follow-up. Follow-up intervals ranged from 0.3 to 3.8 years (mean 1.6 years). Using Kaplan-Meier product limit survival analysis, 60% of patients continued using extended patterns of OCs for more than 2 years. For 121 currently extending, the HFI varied from 0 to 7 days with 88% utilizing a 0 to 4 day HFI. Sixty percent of patients offered extending the active interval and shortening the HFI of OCs initiate and continue this pattern for more than 2 years without serious sequelae or pregnancy while individually directing both the number of days of continuous pills and the length of the HFI. Copyright 2004 Elsevier Inc.

  6. Query by example video based on fuzzy c-means initialized by fixed clustering center

    NASA Astrophysics Data System (ADS)

    Hou, Sujuan; Zhou, Shangbo; Siddique, Muhammad Abubakar

    2012-04-01

    Currently, the high complexity of video contents has posed the following major challenges for fast retrieval: (1) efficient similarity measurements, and (2) efficient indexing on the compact representations. A video-retrieval strategy based on fuzzy c-means (FCM) is presented for querying by example. Initially, the query video is segmented and represented by a set of shots; each shot is represented by a key frame, and video processing techniques are then used to find visual cues to represent the key frame. Next, because the FCM algorithm is sensitive to initialization, we initialize the cluster centers with the shots of the query video so that users can achieve appropriate convergence. After an FCM cluster is initialized by the query video, each shot of the query video is considered a benchmark point in that cluster, and each shot in the database possesses a class label. The similarity between a database shot and the benchmark point with the same class label can be transformed into the distance between them. Finally, the similarity between the query video and a video in the database is transformed into the number of similar shots. Our experimental results demonstrated the performance of this proposed approach.

  7. Army technology development. IBIS query. Software to support the Image Based Information System (IBIS) expansion for mapping, charting and geodesy

    NASA Technical Reports Server (NTRS)

    Friedman, S. Z.; Walker, R. E.; Aitken, R. B.

    1986-01-01

    The Image Based Information System (IBIS) has been under development at the Jet Propulsion Laboratory (JPL) since 1975. It is a collection of more than 90 programs that enable processing of image, graphical, and tabular data for spatial analysis. IBIS can be utilized to create comprehensive geographic data bases. From these data, an analyst can study various attributes describing characteristics of a given study area. Even complex combinations of disparate data types can be synthesized to obtain a new perspective on spatial phenomena. In 1984, new query software was developed enabling direct Boolean queries of IBIS data bases through the submission of easily understood expressions. An improved syntax methodology, a data dictionary, and display software simplified the analysts' tasks associated with building, executing, and subsequently displaying the results of a query. The primary purpose of this report is to describe the features and capabilities of the new query software. A secondary purpose of this report is to compare this new query software to the query software developed previously (Friedman, 1982). With respect to this topic, the relative merits and drawbacks of both approaches are covered.

  8. NCBI2RDF: enabling full RDF-based access to NCBI databases.

    PubMed

    Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor

    2013-01-01

    RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments.

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Madduri, Kamesh; Wu, Kesheng

    The Resource Description Framework (RDF) is a popular data model for representing linked data sets arising from the web, as well as large scientific data repositories such as UniProt. RDF data intrinsically represents a labeled and directed multi-graph. SPARQL is a query language for RDF that expresses subgraph pattern-finding queries on this implicit multigraph in a SQL-like syntax. SPARQL queries generate complex intermediate join queries; to compute these joins efficiently, we propose a new strategy based on bitmap indexes. We store the RDF data in column-oriented structures as compressed bitmaps along with two dictionaries. This paper makes three new contributions. (i) We present an efficient parallel strategy for parsing the raw RDF data, building dictionaries of unique entities, and creating compressed bitmap indexes of the data. (ii) We utilize the constructed bitmap indexes to efficiently answer SPARQL queries, simplifying the join evaluations. (iii) To quantify the performance impact of using bitmap indexes, we compare our approach to the state-of-the-art triple-store RDF-3X. We find that our bitmap index-based approach to answering queries is up to an order of magnitude faster for a variety of SPARQL queries, on gigascale RDF data sets.
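
    A toy Python sketch of the dictionary-plus-bitmap idea, with plain integers standing in for the compressed bitmaps the paper uses and with invented data: each (predicate, object) pair keeps a bitmap over subject ids, and a two-pattern join on a shared subject variable reduces to a bitwise AND.

```python
from collections import defaultdict

# Dictionary-encode subjects and build one bitmap of subject ids per
# (predicate, object) pair; a shared-subject join is then a bitwise AND.
triples = [
    ("p1", "type", "Protein"), ("p2", "type", "Protein"),
    ("p1", "organism", "Human"), ("p3", "organism", "Human"),
]

subject_id, subjects = {}, []
index = defaultdict(int)                   # (predicate, object) -> bitmap over subjects
for s, p, o in triples:
    sid = subject_id.setdefault(s, len(subjects))
    if sid == len(subjects):
        subjects.append(s)
    index[(p, o)] |= 1 << sid              # set this subject's bit

# SPARQL-style patterns:  ?x type Protein .  ?x organism Human
hits = index[("type", "Protein")] & index[("organism", "Human")]
print([subjects[i] for i in range(len(subjects)) if hits >> i & 1])   # ['p1']
```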

  10. [On the seasonality of dermatoses: a retrospective analysis of search engine query data depending on the season].

    PubMed

    Köhler, M J; Springer, S; Kaatz, M

    2014-09-01

    The volume of search engine queries about disease-relevant items reflects public interest and correlates with disease prevalence as proven by the example of flu (influenza). Other influences include media attention or holidays. The present work investigates if the seasonality of prevalence or symptom severity of dermatoses correlates with search engine query data. The relative weekly volume of dermatological relevant search terms was assessed by the online tool Google Trends for the years 2009-2013. For each item, the degree of seasonality was calculated via frequency analysis and a geometric approach. Many dermatoses show a marked seasonality, reflected by search engine query volumes. Unexpected seasonal variations of these queries suggest a previously unknown variability of the respective disease prevalence. Furthermore, using the example of allergic rhinitis, a close correlation of search engine query data with actual pollen count can be demonstrated. In many cases, search engine query data are appropriate to estimate seasonal variability in prevalence of common dermatoses. This finding may be useful for real-time analysis and formation of hypotheses concerning pathogenetic or symptom aggravating mechanisms and may thus contribute to improvement of diagnostics and prevention of skin diseases.
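
    One simple way to score the degree of seasonality from weekly query volumes, loosely in the spirit of the frequency analysis mentioned above (the article's exact scoring is not reproduced here), is the share of detrended signal power at the one-cycle-per-year frequency.

```python
import numpy as np

# Seasonality score: fraction of (mean-removed) spectral power at ~1 cycle/year.
def seasonality_score(weekly_volume):
    x = np.asarray(weekly_volume, dtype=float)
    x = x - x.mean()                                  # remove the DC component
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0)            # cycles per week
    annual = np.argmin(np.abs(freqs - 1.0 / 52.0))    # bin closest to 1 cycle / 52 weeks
    return spectrum[annual] / spectrum[1:].sum()

weeks = np.arange(5 * 52)
demo = 50 + 20 * np.sin(2 * np.pi * weeks / 52) \
       + np.random.default_rng(0).normal(0, 3, len(weeks))
print(round(seasonality_score(demo), 2))              # close to 1 for strongly seasonal series
```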

  11. HodDB: Design and Analysis of a Query Processor for Brick.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fierro, Gabriel; Culler, David

    Brick is a recently proposed metadata schema and ontology for describing building components and the relationships between them. It represents buildings as directed labeled graphs using the RDF data model. Using the SPARQL query language, building-agnostic applications query a Brick graph to discover the set of resources and relationships they require to operate. Latency-sensitive applications, such as user interfaces, demand response, and model-predictive control, require fast queries — conventionally less than 100ms. We benchmark a set of popular open-source and commercial SPARQL databases against three real Brick models using seven application queries and find that none of them meet this performance target. This lack of performance can be attributed to design decisions that optimize for queries over large graphs consisting of billions of triples, but give poor spatial locality and join performance on the small dense graphs typical of Brick. We present the design and evaluation of HodDB, an RDF/SPARQL database for Brick built over a node-based index structure. HodDB performs Brick queries 3-700x faster than leading SPARQL databases and consistently meets the 100ms threshold, enabling the portability of important latency-sensitive building applications.

  12. Visual graph query formulation and exploration: a new perspective on information retrieval at the edge

    NASA Astrophysics Data System (ADS)

    Kase, Sue E.; Vanni, Michelle; Knight, Joanne A.; Su, Yu; Yan, Xifeng

    2016-05-01

    Within operational environments decisions must be made quickly based on the information available. Identifying an appropriate knowledge base and accurately formulating a search query are critical tasks for decision-making effectiveness in dynamic situations. The spreading of graph data management tools to access large graph databases is a rapidly emerging research area of potential benefit to the intelligence community. A graph representation provides a natural way of modeling data in a wide variety of domains. Graph structures use nodes, edges, and properties to represent and store data. This research investigates the advantages of information search by graph query initiated by the analyst and interactively refined within the contextual dimensions of the answer space toward a solution. The paper introduces SLQ, a user-friendly graph querying system enabling the visual formulation of schemaless and structureless graph queries. SLQ is demonstrated with an intelligence analyst information search scenario focused on identifying individuals responsible for manufacturing a mosquito-hosted deadly virus. The scenario highlights the interactive construction of graph queries without prior training in complex query languages or graph databases, intuitive navigation through the problem space, and visualization of results in graphical format.

  13. Representation and alignment of sung queries for music information retrieval

    NASA Astrophysics Data System (ADS)

    Adams, Norman H.; Wakefield, Gregory H.

    2005-09-01

    The pursuit of robust and rapid query-by-humming systems, which search melodic databases using sung queries, is a common theme in music information retrieval. The retrieval aspect of this database problem has received considerable attention, whereas the front-end processing of sung queries and the data structure to represent melodies has been based on musical intuition and historical momentum. The present work explores three time series representations for sung queries: a sequence of notes, a ``smooth'' pitch contour, and a sequence of pitch histograms. The performance of the three representations is compared using a collection of naturally sung queries. It is found that the most robust performance is achieved by the representation with highest dimension, the smooth pitch contour, but that this representation presents a formidable computational burden. For all three representations, it is necessary to align the query and target in order to achieve robust performance. The computational cost of the alignment is quadratic, hence it is necessary to keep the dimension small for rapid retrieval. Accordingly, iterative deepening is employed to achieve both robust performance and rapid retrieval. Finally, the conventional iterative framework is expanded to adapt the alignment constraints based on previous iterations, further expediting retrieval without degrading performance.
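
    The abstract does not name the alignment algorithm, but the quadratic-cost alignment it refers to is commonly realized with dynamic time warping; a minimal sketch of that step (without band constraints or the iterative-deepening refinement) looks like this.

```python
import numpy as np

# Basic dynamic time warping between a sung-query pitch contour and a target
# melody contour; the full (len(q) x len(t)) table is the quadratic cost.
def dtw_distance(query, target):
    q, t = np.asarray(query, float), np.asarray(target, float)
    n, m = len(q), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(q[i - 1] - t[j - 1])           # local pitch difference
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Toy MIDI-pitch contours: a slightly stretched rendition of the same phrase.
print(dtw_distance([60, 60, 62, 64], [60, 62, 62, 64, 64]))
```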

  14. Concept-based query language approach to enterprise information systems

    NASA Astrophysics Data System (ADS)

    Niemi, Timo; Junkkari, Marko; Järvelin, Kalervo

    2014-01-01

    In enterprise information systems (EISs) it is necessary to model, integrate and compute very diverse data. In advanced EISs the stored data often are based both on structured (e.g. relational) and semi-structured (e.g. XML) data models. In addition, the ad hoc information needs of end-users may require the manipulation of data-oriented (structural), behavioural and deductive aspects of data. Contemporary languages capable of treating this kind of diversity suit only persons with good programming skills. In this paper we present a concept-oriented query language approach to manipulate this diversity so that the programming skill requirements are considerably reduced. In our query language, the features which need technical knowledge are hidden in application-specific concepts and structures. Therefore, users need not be aware of the underlying technology. Application-specific concepts and structures are represented by the modelling primitives of the extended RDOOM (relational deductive object-oriented modelling) which contains primitives for all crucial real world relationships (is-a relationship, part-of relationship, association), XML documents and views. Our query language also supports intensional and extensional-intensional queries, in addition to conventional extensional queries. In its query formulation, the end-user combines available application-specific concepts and structures through shared variables.

  15. Relativistic quantum private database queries

    NASA Astrophysics Data System (ADS)

    Sun, Si-Jia; Yang, Yu-Guang; Zhang, Ming-Ou

    2015-04-01

    Recently, Jakobi et al. (Phys Rev A 83, 022301, 2011) suggested the first practical private database query protocol (J-protocol) based on the Scarani et al. (Phys Rev Lett 92, 057901, 2004) quantum key distribution protocol. Unfortunately, the J-protocol is just a cheat-sensitive private database query protocol. In this paper, we present an idealized relativistic quantum private database query protocol based on Minkowski causality and the properties of quantum information. Also, we prove that the protocol is secure in terms of the user security and the database security.

  16. Flexible Decision Support in Device-Saturated Environments

    DTIC Science & Technology

    2003-10-01

    also output tuples to a remote MySQL or Postgres database. 3.3 GUI The GUI allows the user to pose queries using SQL and to display query...DatabaseConnection.java – handles connections to an external database (such as MySQL or Postgres ). • Debug.java – contains the code for printing out Debug messages...also provided. It is possible to output the results of queries to a MySQL or Postgres database for archival and the GUI can query those results

  17. Improving biomedical information retrieval by linear combinations of different query expansion techniques.

    PubMed

    Abdulla, Ahmed AbdoAziz Ahmed; Lin, Hongfei; Xu, Bo; Banbhrani, Santosh Kumar

    2016-07-25

    Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information Retrieval (IR) programs scour unstructured materials such as text documents in large reserves of data that are usually stored on computers. IR is related to the representation, storage, and organization of information items, as well as to access. One of the main problems in IR is to determine which documents are relevant to the user's needs and which are not. Under the current regime, users cannot construct queries precisely enough to retrieve particular pieces of data from large reserves of data, and basic information retrieval systems produce low-quality search results. In this paper we present a new technique that refines information retrieval searches to better represent the user's information need and thereby enhance retrieval performance: we apply different query expansion techniques and combine them linearly, two expansion results at a time. Query expansions expand the search query, for example, by finding synonyms and reweighting original terms. They provide significantly more focused, particularized search results than do basic search queries. Retrieval performance is measured by variants of MAP (Mean Average Precision); according to our experimental results, the combination of the best query expansion results improves the set of retrieved documents and outperforms our baseline by 21.06 %, and even outperforms a previous study by 7.12 %. We propose several query expansion techniques and their linear combinations to make user queries more intelligible to search engines and to produce higher-quality search results.
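
    A linear combination of two expansion runs can be sketched as a weighted sum of normalized retrieval scores; the weight and the score dictionaries below are illustrative, not values from the paper.

```python
# Fuse two query-expansion result lists by a weighted sum of normalized scores.
def combine(scores_a, scores_b, lam=0.5):
    def norm(scores):
        top = max(scores.values(), default=1.0) or 1.0
        return {doc: s / top for doc, s in scores.items()}
    a, b = norm(scores_a), norm(scores_b)
    fused = {d: lam * a.get(d, 0.0) + (1 - lam) * b.get(d, 0.0) for d in set(a) | set(b)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

synonym_run = {"doc1": 3.2, "doc2": 1.1, "doc3": 0.4}     # scores after synonym expansion
reweight_run = {"doc2": 2.5, "doc3": 2.0, "doc4": 0.9}    # scores after term reweighting
print(combine(synonym_run, reweight_run, lam=0.6))
```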

  18. A distributed query execution engine of big attributed graphs.

    PubMed

    Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif

    2016-01-01

    A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the graph vertices and edges need to be associated with descriptive attributes. Such graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations including reachability, pattern matching and shortest path, where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation, and performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine for G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes while the graph data are maintained in a relational store which is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries which are pushed to the underlying relational stores while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and the scalability of DG-SPARQL on querying massive attributed graph datasets, in addition to its ability to outperform Apache Giraph, a popular distributed graph processing system, by orders of magnitude.

  19. Folksonomical P2P File Sharing Networks Using Vectorized KANSEI Information as Search Tags

    NASA Astrophysics Data System (ADS)

    Ohnishi, Kei; Yoshida, Kaori; Oie, Yuji

    We present the concept of folksonomical peer-to-peer (P2P) file sharing networks that allow participants (peers) to freely assign structured search tags to files. These networks are similar to folksonomies in the present Web from the point of view that users assign search tags to information distributed over a network. As a concrete example, we consider an unstructured P2P network using vectorized Kansei (human sensitivity) information as structured search tags for file search. Vectorized Kansei information as search tags indicates what participants feel about their files and is assigned by the participant to each of their files. A search query also has the same form of search tags and indicates what participants want to feel about files that they will eventually obtain. A method that enables file search using vectorized Kansei information is the Kansei query-forwarding method, which probabilistically propagates a search query to peers that are likely to hold more files having search tags that are similar to the query. The similarity between the search query and the search tags is measured in terms of their dot product. The simulation experiments examine if the Kansei query-forwarding method can provide equal search performance for all peers in a network in which only the Kansei information and the tendency with respect to file collection are different among all of the peers. The simulation results show that the Kansei query forwarding method and a random-walk-based query forwarding method, for comparison, work effectively in different situations and are complementary. Furthermore, the Kansei query forwarding method is shown, through simulations, to be superior to or equal to the random-walk based one in terms of search speed.
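
    A toy sketch of the forwarding rule described above: score each neighbour by the dot product between the query's Kansei vector and a per-neighbour tag summary, then forward the query probabilistically in proportion to those scores. The vectors and peer names are invented.

```python
import random

# Probabilistic forwarding weighted by dot-product similarity of Kansei vectors.
def choose_neighbour(query_vec, neighbour_vecs, rng=random.Random(0)):
    dots = {peer: max(sum(q * v for q, v in zip(query_vec, vec)), 0.0)
            for peer, vec in neighbour_vecs.items()}
    total = sum(dots.values())
    if total == 0:                                   # fall back to a random walk
        return rng.choice(list(neighbour_vecs))
    r, acc = rng.uniform(0, total), 0.0
    for peer, d in dots.items():
        acc += d
        if r <= acc:
            return peer

neighbours = {"peerA": [0.9, 0.1, 0.3], "peerB": [0.2, 0.8, 0.5], "peerC": [0.1, 0.1, 0.1]}
print(choose_neighbour([1.0, 0.2, 0.4], neighbours))
```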

  20. Quantum algorithms on Walsh transform and Hamming distance for Boolean functions

    NASA Astrophysics Data System (ADS)

    Xie, Zhengwei; Qiu, Daowen; Cai, Guangya

    2018-06-01

    The Walsh spectrum, or Walsh transform, is an alternative description of Boolean functions. In this paper, we explore quantum algorithms to approximate the absolute value of the Walsh transform $W_f$ at a single point $z_0$ (i.e., $|W_f(z_0)|$) for $n$-variable Boolean functions with probability at least $8/\pi^2$ using $O\!\left(\frac{1}{|W_f(z_0)|\varepsilon}\right)$ queries, promised that the accuracy is $\varepsilon$, while the best known classical algorithm requires $O(2^n)$ queries. The Hamming distance between Boolean functions is used to study linearity testing and other important problems. We take advantage of the Walsh transform to calculate the Hamming distance between two $n$-variable Boolean functions $f$ and $g$ using $O(1)$ queries in some cases. Then, we exploit another quantum algorithm which converts computing the Hamming distance between two Boolean functions into quantum amplitude estimation (i.e., approximate counting). If $\mathrm{Ham}(f,g)=t\neq 0$, we can approximately compute $\mathrm{Ham}(f,g)$ with probability at least $2/3$ by combining our algorithm with the Approx-Count$(f,\varepsilon)$ algorithm, using an expected number of $\Theta\!\left(\sqrt{N/(\lfloor \varepsilon t\rfloor +1)}+\sqrt{t(N-t)}/(\lfloor \varepsilon t\rfloor +1)\right)$ queries, promised that the accuracy is $\varepsilon$. Moreover, our algorithm is optimal, while the exact query complexity for the above problem is $\Theta(N)$ and the classical query complexity with accuracy $\varepsilon$ is $O\!\left(\frac{1}{\varepsilon^2}\cdot\frac{N}{t+1}\right)$, where $N=2^n$. Finally, we present three exact quantum query algorithms for two promise problems on Hamming distance using $O(1)$ queries, while any classical deterministic algorithm solving the problem uses $\Omega(2^n)$ queries.
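
    For reference, under one common unnormalized convention (normalization factors differ between papers), the quantities above are related by

\[
W_f(z) \;=\; \sum_{x \in \{0,1\}^n} (-1)^{\,f(x)\,\oplus\,(x \cdot z)},
\qquad
\mathrm{Ham}(f,g) \;=\; \bigl|\{\,x : f(x) \neq g(x)\,\}\bigr| \;=\; \frac{2^n - W_{f \oplus g}(0)}{2},
\]

    with $N = 2^n$; this identity is what allows a Walsh-transform evaluation at the zero point to stand in for a Hamming-distance computation.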

  1. Parasol: An Architecture for Cross-Cloud Federated Graph Querying

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lieberman, Michael; Choudhury, Sutanay; Hughes, Marisa

    2014-06-22

    Large scale data fusion of multiple datasets can often provide insights that examining datasets individually cannot. However, when these datasets reside in different data centers and cannot be collocated due to technical, administrative, or policy barriers, a unique set of problems arise that hamper querying and data fusion. To address these problems, a system and architecture named Parasol is presented that enables federated queries over graph databases residing in multiple clouds. Parasol’s design is flexible and requires only minimal assumptions for participant clouds. Query optimization techniques are also described that are compatible with Parasol’s lightweight architecture. Experiments on a prototype implementation of Parasol indicate its suitability for cross-cloud federated graph queries.

  2. Querying XML Data with SPARQL

    NASA Astrophysics Data System (ADS)

    Bikakis, Nikos; Gioldasis, Nektarios; Tsinaraki, Chrisa; Christodoulakis, Stavros

    SPARQL is today the standard access language for Semantic Web data. In the recent years XML databases have also acquired industrial importance due to the widespread applicability of XML in the Web. In this paper we present a framework that bridges the heterogeneity gap and creates an interoperable environment where SPARQL queries are used to access XML databases. Our approach assumes that fairly generic mappings between ontology constructs and XML Schema constructs have been automatically derived or manually specified. The mappings are used to automatically translate SPARQL queries to semantically equivalent XQuery queries which are used to access the XML databases. We present the algorithms and the implementation of SPARQL2XQuery framework, which is used for answering SPARQL queries over XML databases.

  3. Searching Electronic Health Records for Temporal Patterns in Patient Histories: A Case Study with Microsoft Amalga

    PubMed Central

    Plaisant, Catherine; Lam, Stanley; Shneiderman, Ben; Smith, Mark S.; Roseman, David; Marchand, Greg; Gillam, Michael; Feied, Craig; Handler, Jonathan; Rappaport, Hank

    2008-01-01

    As electronic health records (EHR) become more widespread, they enable clinicians and researchers to pose complex queries that can benefit immediate patient care and deepen understanding of medical treatment and outcomes. However, current query tools make complex temporal queries difficult to pose, and physicians have to rely on computer professionals to specify the queries for them. This paper describes our efforts to develop a novel query tool implemented in a large operational system at the Washington Hospital Center (Microsoft Amalga, formerly known as Azyxxi). We describe our design of the interface to specify temporal patterns and the visual presentation of results, and report on a pilot user study looking for adverse reactions following radiology studies using contrast. PMID:18999158

  4. Seasons, Searches, and Intentions: What The Internet Can Tell Us About The Bed Bug (Hemiptera: Cimicidae) Epidemic.

    PubMed

    Sentana-Lledo, Daniel; Barbu, Corentin M; Ngo, Michelle N; Wu, Yage; Sethuraman, Karthik; Levy, Michael Z

    2016-01-01

    The common bed bug (Cimex lectularius L.) is once again prevalent in the United States. We investigated temporal patterns in Google search queries for bed bugs and co-occurring terms, and conducted in-person surveys to explore the intentions behind searches that included those terms. Searches for "bed bugs" rose steadily through 2011 and then plateaued, suggesting that the epidemic has reached an equilibrium in the United States. However, queries including terms that survey respondents associated strongly with having bed bugs (e.g., "exterminator," "remedies") continued to climb, while terms more closely associated with informational searches (e.g., "hotels," "about") fell. Respondents' rankings of terms and nonseasonal trends in Google search volume as assessed by a cosinor model were significantly correlated (Kendall's Tau-b P = 0.015). We find no evidence from Google Trends that the bed bug epidemic in the United States has reached equilibrium. © The Authors 2015. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  5. A Simple and Customizable Web Interface to the Virtual Solar Observatory

    NASA Astrophysics Data System (ADS)

    Hughitt, V. Keith; Hourcle, J.; Suarez-Sola, I.; Davey, A.

    2010-05-01

    As the variety and number of solar data sources continue to increase at a rapid rate, providing methods to search through these sources becomes ever more important. By taking advantage of the power of modern JavaScript libraries, a new version of the Virtual Solar Observatory's web interface aims to provide a significantly faster and simpler way to explore the multitude of data repositories available. Querying asynchronously serves not only to eliminate bottlenecks resulting from slow or unresponsive data providers, but also to display results as soon as they are returned. Implicit pagination and post-query filtering enable users to work with large result sets, while a more modular and customizable UI provides a mechanism for customizing both the look-and-feel and the behavior of the VSO web interface. Finally, the new web interface features a custom widget system capable of displaying additional tools and information alongside the standard VSO search form. Interested users can also write their own widgets and submit them for future incorporation into VSO.
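
    The "display results as each provider answers" behaviour can be sketched in a few lines of Python (the real interface does this with asynchronous JavaScript in the browser; the provider URLs below are placeholders, not actual VSO endpoints):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen

# Placeholder provider endpoints; real VSO providers are not listed here.
providers = {
    "provider-a": "https://example.org/vso/a?query=aia",
    "provider-b": "https://example.org/vso/b?query=aia",
}

def fetch(name, url, timeout=10):
    with urlopen(url, timeout=timeout) as resp:
        return name, resp.read()

with ThreadPoolExecutor(max_workers=len(providers)) as pool:
    futures = [pool.submit(fetch, name, url) for name, url in providers.items()]
    for fut in as_completed(futures):          # render each result set as it arrives
        try:
            name, payload = fut.result()
            print(name, len(payload), "bytes")
        except Exception as exc:               # a slow or down provider does not block the rest
            print("provider failed:", exc)
```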

  6. Flora-On: Occurrence data of the vascular flora of mainland Portugal.

    PubMed

    Pereira, Ana Júlia; Francisco, Ana; Porto, Miguel

    2016-01-01

    The Flora-On dataset currently includes 253,310 occurrence records for the class Embryopsidae (vascular plants), comprising data collated via the platform http://flora-on.pt/ relating to observation records of vascular plants across mainland Portugal. Observations are uploaded directly to the database primarily by experienced botanists and naturalists, typically on a weekly basis, and consist of geo-referenced data points for species (or infraspecific taxa) along with their date of observation and phenological state. The Flora-On project aims to compile and make publicly accessible chorological, ecological, morphological and photographic information for the entire vascular flora of Portugal. The project's website offers powerful query and visualization capabilities, of which we highlight the probabilistic bioclimatic and phenological queries which operate based on the empirical density distributions of species in those variables. Flora-On was created and continues to be maintained by volunteers who are Associate members of Sociedade Portuguesa de Botânica (Botanical Society of Portugal). Given its focus on research-grade and current data, the Flora-On project represents a significant contribution to the knowledge of the present distribution and status of the Portuguese flora.

  7. PIML: the Pathogen Information Markup Language.

    PubMed

    He, Yongqun; Vines, Richard R; Wattam, Alice R; Abramochkin, Georgiy V; Dickerman, Allan W; Eckart, J Dana; Sobral, Bruno W S

    2005-01-01

    A vast amount of information about human, animal and plant pathogens has been acquired, stored and displayed in varied formats through different resources, both electronically and otherwise. However, there is no community standard format for organizing this information or agreement on machine-readable format(s) for data exchange, thereby hampering interoperation efforts across information systems harboring such infectious disease data. The Pathogen Information Markup Language (PIML) is a free, open, XML-based format for representing pathogen information. XSLT-based visual presentations of valid PIML documents were developed and can be accessed through the PathInfo website or as part of the interoperable web services federation known as ToolBus/PathPort. Currently, detailed PIML documents are available for 21 pathogens deemed of high priority with regard to public health and national biological defense. A dynamic query system allows simple queries as well as comparisons among these pathogens. Continuing efforts are being taken to include other groups' supporting PIML and to develop more PIML documents. All the PIML-related information is accessible from http://www.vbi.vt.edu/pathport/pathinfo/

  8. Rapid Exploitation and Analysis of Documents

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Buttler, D J; Andrzejewski, D; Stevens, K D

    Analysts are overwhelmed with information. They have large archives of historical data, both structured and unstructured, and continuous streams of relevant messages and documents that they need to match to current tasks, digest, and incorporate into their analysis. The purpose of the READ project is to develop technologies to make it easier to catalog, classify, and locate relevant information. We approached this task from multiple angles. First, we tackle the issue of processing large quantities of information in reasonable time. Second, we provide mechanisms that allow users to customize their queries based on latent topics exposed from corpus statistics. Third, we assist users in organizing query results, adding localized expert structure over results. Fourth, we use word sense disambiguation techniques to increase the precision of matching user generated keyword lists with terms and concepts in the corpus. Fifth, we enhance co-occurrence statistics with latent topic attribution, to aid entity relationship discovery. Finally we quantitatively analyze the quality of three popular latent modeling techniques to examine under which circumstances each is useful.

  9. R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server

    PubMed Central

    Cannone, Jamie J.; Sweeney, Blake A.; Petrov, Anton I.; Gutell, Robin R.; Zirbel, Craig L.; Leontis, Neocles

    2015-01-01

    The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa. PMID:26048960

  10. Improved data retrieval from TreeBASE via taxonomic and linguistic data enrichment

    PubMed Central

    Anwar, Nadia; Hunt, Ela

    2009-01-01

    Background TreeBASE, the only data repository for phylogenetic studies, is not being used effectively since it does not meet the taxonomic data retrieval requirements of the systematics community. We show, through an examination of the queries performed on TreeBASE, that data retrieval using taxon names is unsatisfactory. Results We report on a new wrapper supporting taxon queries on TreeBASE by utilising a Taxonomy and Classification Database (TCl-Db) we created. TCl-Db holds merged and consolidated taxonomic names from multiple data sources and can be used to translate hierarchical, vernacular and synonym queries into specific query terms in TreeBASE. The query expansion supported by TCl-Db shows very significant information retrieval quality improvement. The wrapper can be accessed at the URL The methodology we developed is scalable and can be applied to new data, as those become available in the future. Conclusion Significantly improved data retrieval quality is shown for all queries, and additional flexibility is achieved via user-driven taxonomy selection. PMID:19426482

  11. SPARQL Assist language-neutral query composer

    PubMed Central

    2012-01-01

    Background SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. Results We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. Conclusions To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources. PMID:22373327

  12. SPARQL assist language-neutral query composer.

    PubMed

    McCarthy, Luke; Vandervalk, Ben; Wilkinson, Mark

    2012-01-25

    SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources.

  13. Querying Safety Cases

    NASA Technical Reports Server (NTRS)

    Denney, Ewen W.; Naylor, Dwight; Pai, Ganesh

    2014-01-01

    Querying a safety case to show how the various stakeholders' concerns about system safety are addressed has been put forth as one of the benefits of argument-based assurance (in a recent study by the Health Foundation, UK, which reviewed the use of safety cases in safety-critical industries). However, neither the literature nor current practice offer much guidance on querying mechanisms appropriate for, or available within, a safety case paradigm. This paper presents a preliminary approach that uses a formal basis for querying safety cases, specifically Goal Structuring Notation (GSN) argument structures. Our approach semantically enriches GSN arguments with domain-specific metadata that the query language leverages, along with its inherent structure, to produce views. We have implemented the approach in our toolset AdvoCATE, and illustrate it by application to a fragment of the safety argument for an Unmanned Aircraft System (UAS) being developed at NASA Ames. We also discuss the potential practical utility of our query mechanism within the context of the existing framework for UAS safety assurance.

  14. Parallel Index and Query for Large Scale Data Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chou, Jerry; Wu, Kesheng; Ruebel, Oliver

    2011-07-18

    Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to processing of a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.

  15. Information Network Model Query Processing

    NASA Astrophysics Data System (ADS)

    Song, Xiaopu

    Information Networking Model (INM) [31] is a novel database model for real world objects and relationships management. It naturally and directly supports various kinds of static and dynamic relationships between objects. In INM, objects are networked through various natural and complex relationships. INM Query Language (INM-QL) [30] is designed to explore such information network, retrieve information about schema, instance, their attributes, relationships, and context-dependent information, and process query results in the user specified form. INM database management system has been implemented using Berkeley DB, and it supports INM-QL. This thesis is mainly focused on the implementation of the subsystem that is able to effectively and efficiently process INM-QL. The subsystem provides a lexical and syntactical analyzer of INM-QL, and it is able to choose appropriate evaluation strategies and index mechanism to process queries in INM-QL without the user's intervention. It also uses intermediate result structure to hold intermediate query result and other helping structures to reduce complexity of query processing.

  16. Using Bitmap Indexing Technology for Combined Numerical and Text Queries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stockinger, Kurt; Cieslewicz, John; Wu, Kesheng

    2006-10-16

    In this paper, we describe a strategy of using compressed bitmap indices to speed up queries on both numerical data and text documents. By using an efficient compression algorithm, these compressed bitmap indices are compact even for indices with millions of distinct terms. Moreover, bitmap indices can be used very efficiently to answer Boolean queries over text documents involving multiple query terms. Existing inverted indices for text searches are usually inefficient for corpora with a very large number of terms as well as for queries involving a large number of hits. We demonstrate that our compressed bitmap index technology overcomes both of those shortcomings. In a performance comparison against a commonly used database system, our indices answer queries 30 times faster on average. To provide full SQL support, we integrated our indexing software, called FastBit, with MonetDB. The integrated system MonetDB/FastBit provides not only efficient searches on a single table as FastBit does, but also answers join queries efficiently. Furthermore, MonetDB/FastBit also provides a very efficient retrieval mechanism for result records.
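
    A toy illustration of answering a combined text/numeric query with bitmaps, using Python integers as uncompressed stand-ins for the compressed bitmaps FastBit builds (documents and query are invented):

```python
# Build per-term and per-year bitmaps, then combine them with bitwise AND/OR.
docs = [
    {"id": 0, "year": 2004, "text": "bitmap index join"},
    {"id": 1, "year": 2006, "text": "compressed bitmap text search"},
    {"id": 2, "year": 2005, "text": "inverted index text search"},
]

term_bitmap, year_bitmap = {}, {}
for d in docs:
    for term in set(d["text"].split()):
        term_bitmap[term] = term_bitmap.get(term, 0) | (1 << d["id"])
    year_bitmap[d["year"]] = year_bitmap.get(d["year"], 0) | (1 << d["id"])

# Text predicate: "bitmap" AND "search"; numeric predicate: 2005 <= year <= 2006.
text_hits = term_bitmap.get("bitmap", 0) & term_bitmap.get("search", 0)
year_hits = 0
for y in range(2005, 2007):
    year_hits |= year_bitmap.get(y, 0)
hits = text_hits & year_hits
print([d["id"] for d in docs if hits >> d["id"] & 1])   # [1]
```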

  17. Integrating Smart Health in the US Health Care System: Infodemiology Study of Asthma Monitoring in the Google Era.

    PubMed

    Mavragani, Amaryllis; Sampri, Alexia; Sypsa, Karla; Tsagarakis, Konstantinos P

    2018-03-12

    With the internet's penetration and use constantly expanding, this vast amount of information can be employed in order to better assess issues in the US health care system. Google Trends, a popular tool in big data analytics, has been widely used in the past to examine interest in various medical and health-related topics and has shown great potential in forecasting, prediction, and nowcasting. As empirical relationships between online queries and human behavior have been shown to exist, a new opportunity to explore the behavior toward asthma, a common respiratory disease, is present. This study aimed at forecasting the online behavior toward asthma and examined the correlations between queries and reported cases in order to explore the possibility of nowcasting asthma prevalence in the United States using online search traffic data. Applying Holt-Winters exponential smoothing to Google Trends time series from 2004 to 2015 for the term "asthma," forecasts for online queries at state and national levels are estimated from 2016 to 2020 and validated against available Google query data from January 2016 to June 2017. Correlations among yearly Google queries and between Google queries and reported asthma cases are examined. Our analysis shows that search queries exhibit seasonality within each year and the relationships between each 2 years' queries are statistically significant (P<.05). Estimated forecasting models for a 5-year period (2016 through 2020) for Google queries are robust and validated against available data from January 2016 to June 2017. Significant correlations were found between (1) online queries and National Health Interview Survey lifetime asthma (r=-.82, P=.001) and current asthma (r=-.77, P=.004) rates from 2004 to 2015 and (2) online queries and Behavioral Risk Factor Surveillance System lifetime (r=-.78, P=.003) and current asthma (r=-.79, P=.002) rates from 2004 to 2014. The correlations are negative, but lag analysis to identify the period of response cannot be employed until short-interval data on asthma prevalence are made available. Online behavior toward asthma can be accurately predicted, and significant correlations between online queries and reported cases exist. This method of forecasting Google queries can be used by health care officials to nowcast asthma prevalence by city, state, or nationally, subject to future availability of daily, weekly, or monthly data on reported cases. This method could therefore be used for improved monitoring and assessment of the needs surrounding the current population of patients with asthma. ©Amaryllis Mavragani, Alexia Sampri, Karla Sypsa, Konstantinos P Tsagarakis. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 12.03.2018.
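
    A rough sketch of the Holt-Winters step with statsmodels, assuming weekly Google Trends volumes have been exported to a CSV with "week" and "asthma" columns (the file name and column names are placeholders, not a documented format):

```python
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Load a weekly Google Trends export (placeholder file/column names).
series = (pd.read_csv("asthma_trends.csv", parse_dates=["week"])
            .set_index("week")["asthma"]
            .asfreq("W"))

# Additive trend and additive yearly (52-week) seasonality, as one plausible setup.
model = ExponentialSmoothing(series, trend="add", seasonal="add",
                             seasonal_periods=52).fit()
forecast = model.forecast(52 * 5)            # five years ahead, as in the study horizon
print(forecast.head())
```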

  18. BioFed: federated query processing over life sciences linked open data.

    PubMed

    Hasnain, Ali; Mehmood, Qaiser; Sana E Zainab, Syeda; Saleem, Muhammad; Warren, Claude; Zehra, Durre; Decker, Stefan; Rebholz-Schuhmann, Dietrich

    2017-03-15

    Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but this still requires solutions to query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality has to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state-of-the-art solution FedX and forms an important benchmark for the life science domain. The efficient cataloguing approach of the federated query processing system 'BioFed', the triple-pattern-wise source selection, and the semantic source normalisation form the core of our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Last but not least, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., DrugBank, SIDER). BioFed is a solution for a single-point-of-access for a large number of SPARQL endpoints providing life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. BioFed fully supports SPARQL 1.1 and gives access to the endpoint's availability based on the EndpointData graph. Our evaluation of BioFed against FedX is based on 20 heterogeneous federated SPARQL queries and shows competitive execution performance in comparison to FedX, which can be attributed to the provision of provenance information for the source selection. Developing and testing federated query engines for life sciences data is still a challenging task. According to our findings, it is advantageous to optimise the source selection. The cataloguing of SPARQL endpoints, including type and property indexing, leads to efficient querying of data resources over the Web of Data. This could even be further improved through the use of ontologies, e.g., for abstract normalisation of query terms.

  19. Neurological Basis and Potential Modification of Emotional Intelligence through Affective/Behavioral Training

    DTIC Science & Technology

    2014-12-01

    December 2014. TYPE OF REPORT: Final. PREPARED FOR: U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland ... healthy adults (31 men), ranging in age from 18-50 years, were also queried about their practice of medication. The sample included 28 individuals who ...

  20. An adaptable architecture for patient cohort identification from diverse data sources.

    PubMed

    Bache, Richard; Miles, Simon; Taweel, Adel

    2013-12-01

    We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. The architecture has the key feature that queries defined according to the query model are both pre- and post-processed, and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a database management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses, thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity.
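
    The separation the article describes, one query model evaluated once over facts delivered by a lightweight per-source adaptor, can be sketched schematically as follows (all names are invented for illustration):

```python
from abc import ABC, abstractmethod
from datetime import datetime, timedelta

# One adaptor per data warehouse: its only job is to extract clinical facts.
class SourceAdaptor(ABC):
    @abstractmethod
    def extract_facts(self, patient_id):
        """Return a patient's clinical facts as (code, value, timestamp) tuples."""

# The query model is implemented once and reused for every source.
class EligibilityQuery:
    def __init__(self, criteria):
        self.criteria = criteria          # predicates over a patient's list of facts

    def count_eligible(self, adaptor, patient_ids):
        # Criteria evaluation (including temporal reasoning) is independent of
        # how any particular warehouse represents its data.
        return sum(
            all(criterion(adaptor.extract_facts(pid)) for criterion in self.criteria)
            for pid in patient_ids
        )

# Example criterion: an HbA1c value above 7.0 recorded within the last 180 days.
def recent_high_hba1c(facts, now=datetime(2013, 1, 1)):
    return any(code == "HbA1c" and value > 7.0 and (now - ts) < timedelta(days=180)
               for code, value, ts in facts)
```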

  1. The Profile-Query Relationship.

    ERIC Educational Resources Information Center

    Shepherd, Michael A.; Phillips, W. J.

    1986-01-01

    Defines relationship between user profile and user query in terms of relationship between clusters of documents retrieved by each, and explores the expression of cluster similarity and cluster overlap as linear functions of similarity existing between original pairs of profiles and queries, given the desired retrieval threshold. (23 references)…

  2. Sugeno Fuzzy Integral as a Basis for the Interpretation of Flexible Queries Involving Monotonic Aggregates.

    ERIC Educational Resources Information Center

    Bosc, P.; Lietard, L.; Pivert, O.

    2003-01-01

    Considers flexible querying of relational databases. Highlights include SQL languages and basic aggregate operators; Sugeno's fuzzy integral; evaluation examples; and how and under what conditions other aggregate functions could be applied to fuzzy sets in a flexible query. (Author/LRW)

  3. Massive Query Resolution for Rapid Selective Dissemination of Information.

    ERIC Educational Resources Information Center

    Cohen, Jonathan D.

    1999-01-01

    Outlines an efficient approach to performing query resolution which, when matched with a keyword scanner, offers rapid selecting and routing for massive Boolean queries, and which is suitable for implementation on a desktop computer. Demonstrates the system's operation with large examples in a practical setting. (AEF)

  4. Querying Proofs (Work in Progress)

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen; Lueth, Christoph

    2011-01-01

    We motivate and introduce the basis for a query language designed for inspecting electronic representations of proofs. We argue that there is much to learn from large proofs beyond their validity, and that a dedicated query language can provide a principled way of implementing a family of useful operations.

  5. Matching health information seekers' queries to medical terms

    PubMed Central

    2012-01-01

    Background The Internet is a major source of health information but most seekers are not familiar with medical vocabularies. Hence, their searches fail due to bad query formulation. Several methods have been proposed to improve information retrieval: query expansion, syntactic and semantic techniques or knowledge-based methods. However, it would be useful to clean those queries which are misspelled. In this paper, we propose a simple yet efficient method in order to correct misspellings of queries submitted by health information seekers to a medical online search tool. Methods In addition to query normalizations and exact phonetic term matching, we tested two approximate string comparators: the similarity score function of Stoilos and the normalized Levenshtein edit distance. We propose here to combine them to increase the number of matched medical terms in French. We first took a sample of query logs to determine the thresholds and processing times. In the second run, at a greater scale, we tested different combinations of query normalizations before or after misspelling correction, with the thresholds retained from the first run. Results According to the total number of suggestions (around 163, the number of the first sample of queries), at a threshold comparator score of 0.3, the normalized Levenshtein edit distance gave the highest F-Measure (88.15%) and at a threshold comparator score of 0.7, the Stoilos function gave the highest F-Measure (84.31%). By combining Levenshtein and Stoilos, the highest F-Measure (80.28%) is obtained with 0.2 and 0.7 thresholds respectively. However, queries are composed of several terms that may be combinations of medical terms. The process of query normalization and segmentation is thus required. The highest F-Measure (64.18%) is obtained when this process is performed before spelling correction. Conclusions Despite the widely known high performance of the normalized edit distance of Levenshtein, we show in this paper that its combination with the Stoilos algorithm improved the results for misspelling correction of user queries. Accuracy is improved by combining spelling, phoneme-based information and string normalizations and segmentations into medical terms. These encouraging results have enabled the integration of this method into two projects funded by the French National Research Agency-Technologies for Health Care. The first aims to facilitate the coding process of clinical free texts contained in Electronic Health Records and discharge summaries, whereas the second aims at improving information retrieval through Electronic Health Records. PMID:23095521
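
    The matching step can be sketched as a normalized Levenshtein distance combined with a second comparator, each with its own threshold. The Stoilos score is left as a pluggable function here, and treating the combination as a conjunction of the two thresholds is an assumption made for illustration, not a detail taken from the paper.

```python
# The Stoilos comparator is assumed to be supplied elsewhere; only the Levenshtein
# side and the two-threshold combination rule are sketched here.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def normalized_levenshtein(a, b):
    return levenshtein(a, b) / max(len(a), len(b), 1)

def suggest(term, dictionary, stoilos_sim, lev_threshold=0.2, stoilos_threshold=0.7):
    return [w for w in dictionary
            if normalized_levenshtein(term, w) <= lev_threshold
            and stoilos_sim(term, w) >= stoilos_threshold]

# Example with a trivial stand-in for the Stoilos function.
print(suggest("diabete", ["diabetes", "diabetic", "dialysis"],
              stoilos_sim=lambda a, b: 1.0 - normalized_levenshtein(a, b)))
```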

  6. SOCR data dashboard: an integrated big data archive mashing medicare, labor, census and econometric information.

    PubMed

    Husain, Syed S; Kalinin, Alexandr; Truong, Anh; Dinov, Ivo D

    Intuitive formulation of informative and computationally-efficient queries on big and complex datasets presents a number of challenges. As data collection is increasingly streamlined and ubiquitous, data exploration, discovery and analytics get considerably harder. Exploratory querying of heterogeneous and multi-source information is both difficult and necessary to advance our knowledge about the world around us. We developed a mechanism to integrate dispersed multi-source data and service the mashed information via human and machine interfaces in a secure, scalable manner. This process facilitates the exploration of subtle associations between variables, population strata, or clusters of data elements, which may be opaque to standard independent inspection of the individual sources. This new platform includes a device-agnostic tool (Dashboard webapp, http://socr.umich.edu/HTML5/Dashboard/) for graphical querying, navigating and exploring the multivariate associations in complex heterogeneous datasets. The paper illustrates this core functionality and service-oriented infrastructure using healthcare data (e.g., US data from the 2010 Census, Demographic and Economic surveys, Bureau of Labor Statistics, and Center for Medicare Services) as well as Parkinson's Disease neuroimaging data. Both the back-end data archive and the front-end dashboard interfaces are continuously expanded to include additional data elements and new ways to customize the human and machine interactions. A client-side data import utility allows for easy and intuitive integration of user-supplied datasets. This completely open-science framework may be used for exploratory analytics, confirmatory analyses, meta-analyses, and education and training purposes in a wide variety of fields.

  7. Distributed XQuery-Based Integration and Visualization of Multimodality Brain Mapping Data

    PubMed Central

    Detwiler, Landon T.; Suciu, Dan; Franklin, Joshua D.; Moore, Eider B.; Poliakov, Andrew V.; Lee, Eunjung S.; Corina, David P.; Ojemann, George A.; Brinkley, James F.

    2008-01-01

    This paper addresses the need for relatively small groups of collaborating investigators to integrate distributed and heterogeneous data about the brain. Although various national efforts facilitate large-scale data sharing, these approaches are generally too “heavyweight” for individual or small groups of investigators, with the result that most data sharing among collaborators continues to be ad hoc. Our approach to this problem is to create a “lightweight” distributed query architecture, in which data sources are accessible via web services that accept arbitrary query languages but return XML results. A Distributed XQuery Processor (DXQP) accepts distributed XQueries in which subqueries are shipped to the remote data sources to be executed, with the resulting XML integrated by DXQP. A web-based application called DXBrain accesses DXQP, allowing a user to create, save and execute distributed XQueries, and to view the results in various formats including a 3-D brain visualization. Example results are presented using distributed brain mapping data sources obtained in studies of language organization in the brain, but any other XML source could be included. The advantage of this approach is that it is very easy to add and query a new source, the tradeoff being that the user needs to understand XQuery and the schemata of the underlying sources. For small numbers of known sources this burden is not onerous for a knowledgeable user, leading to the conclusion that the system helps to fill the gap between ad hoc local methods and large scale but complex national data sharing efforts. PMID:19198662

  8. Learning multiple relative attributes with humans in the loop.

    PubMed

    Qian, Buyue; Wang, Xiang; Cao, Nan; Jiang, Yu-Gang; Davidson, Ian

    2014-12-01

    Semantic attributes have been recognized as a more spontaneous manner to describe and annotate image content. It is widely accepted that image annotation using semantic attributes is a significant improvement over traditional binary or multiclass annotation due to its naturally continuous and relative properties. Though useful, existing approaches rely on abundant supervision and high-quality training data, which limits their applicability. Two standard methods to overcome small amounts of guidance and low-quality training data are transfer and active learning. In the context of relative attributes, this would entail learning multiple relative attributes simultaneously and actively querying a human for additional information. This paper addresses the two main limitations in existing work: 1) it actively adds humans to the learning loop so that minimal additional guidance can be given and 2) it learns multiple relative attributes simultaneously and thereby leverages dependence amongst them. In this paper, we formulate a joint active learning-to-rank framework with pairwise supervision to achieve these two aims, which also has other benefits such as the ability to be kernelized. The proposed framework optimizes over a set of ranking functions (measuring the strength of the presence of attributes) simultaneously and dependently on each other. The proposed pairwise queries take the form "which one of these two pictures is more natural?" These queries can be easily answered by humans. An extensive empirical study on real image data sets shows that our proposed method, compared with several state-of-the-art methods, achieves superior retrieval performance while requiring significantly less human input.

  9. Remembering the Important Things: Semantic Importance in Stream Reasoning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yan, Rui; Greaves, Mark T.; Smith, William P.

    Reasoning and querying over data streams rely on the ability to deliver a sequence of stream snapshots to the processing algorithms. These snapshots are typically provided using windows as views into streams and associated window management strategies. Generally, the goal of any window management strategy is to preserve the most important data in the current window and preferentially evict the rest, so that the retained data can continue to be exploited. A simple timestamp-based strategy is first-in-first-out (FIFO), in which items are replaced in strict order of arrival. All timestamp-based strategies implicitly assume that a temporal ordering reliably reflects importance to the processing task at hand, and thus that window management using timestamps will maximize the ability of the processing algorithms to deliver accurate interpretations of the stream. In this work, we explore a general notion of semantic importance that can be used for window management for streams of RDF data using semantically-aware processing algorithms like deduction or semantic query. Semantic importance exploits the information carried in RDF and surrounding ontologies for ranking window data in terms of its likely contribution to the processing algorithms. We explore the general semantic categories of query contribution, provenance, and trustworthiness, as well as the contribution of domain-specific ontologies. We describe how these categories behave using several concrete examples. Finally, we consider how a stream window management strategy based on semantic importance could improve overall processing performance, especially as available window sizes decrease.
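
    To make the contrast between timestamp-based and importance-based window management concrete, here is a small hypothetical Python sketch: a FIFO window keeps the most recent items, while an importance-based window evicts the lowest-scoring item instead of the oldest. The scoring function is a stand-in, not the semantic-importance ranking developed in the paper.

```python
# Sketch of importance-based window management versus FIFO, illustrating the
# general idea only (the scoring function below is a stand-in, not the
# semantic-importance ranking developed in the paper).
import heapq
from collections import deque

def fifo_window(stream, capacity):
    """Keep the `capacity` most recently arrived items."""
    window = deque(maxlen=capacity)
    for item in stream:
        window.append(item)
    return list(window)

def importance_window(stream, capacity, importance):
    """Keep the `capacity` items with the highest importance score seen so far."""
    heap = []  # min-heap of (score, arrival_index, item)
    for idx, item in enumerate(stream):
        entry = (importance(item), idx, item)
        if len(heap) < capacity:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)  # evict least important, not oldest
    return [item for _, _, item in sorted(heap, key=lambda e: e[1])]

if __name__ == "__main__":
    # Toy RDF-like triples; pretend triples about "trust" matter most.
    triples = [("s1", "p", "o"), ("s2", "trust", "o"), ("s3", "p", "o"),
               ("s4", "trust", "o"), ("s5", "p", "o")]
    score = lambda t: 1.0 if t[1] == "trust" else 0.1
    print(fifo_window(triples, 2))                # last two arrivals
    print(importance_window(triples, 2, score))   # the two "trust" triples
```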

  10. Locality in Search Engine Queries and Its Implications for Caching

    DTIC Science & Technology

    2001-05-01

    in the question of whether caching might be effective for search engines as well. They study two real search engine traces by examining query...locality and its implications for caching. The two search engines studied are Vivisimo and Excite. Their trace analysis results show that queries have

  11. Flexible Querying of Lifelong Learner Metadata

    ERIC Educational Resources Information Center

    Poulovassilis, A.; Selmer, P.; Wood, P. T.

    2012-01-01

    This paper discusses the provision of flexible querying facilities over heterogeneous data arising from lifelong learners' educational and work experiences. A key aim of such querying facilities is to allow learners to identify possible choices for their future learning and professional development by seeing what others have done. We motivate and…

  12. A Comparison of Two Methods for Boolean Query Relevancy Feedback.

    ERIC Educational Resources Information Center

    Salton, G.; And Others

    1984-01-01

    Evaluates and compares two recently proposed automatic methods for relevance feedback of Boolean queries (Dillon method, which uses probabilistic approach as basis, and disjunctive normal form method). Conclusions are drawn concerning the use of effective feedback methods in a Boolean query environment. Nineteen references are included. (EJS)

  13. Consistent Query Answering of Conjunctive Queries under Primary Key Constraints

    ERIC Educational Resources Information Center

    Pema, Enela

    2014-01-01

    An inconsistent database is a database that violates one or more of its integrity constraints. In reality, violations of integrity constraints arise frequently under several different circumstances. Inconsistent databases have long posed the challenge to develop suitable tools for meaningful query answering. A principled approach for querying…

  14. 28 CFR 25.7 - Querying records in the system.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 28 Judicial Administration 1 2010-07-01 2010-07-01 false Querying records in the system. 25.7 Section 25.7 Judicial Administration DEPARTMENT OF JUSTICE DEPARTMENT OF JUSTICE INFORMATION SYSTEMS The National Instant Criminal Background Check System § 25.7 Querying records in the system. (a) The following...

  15. 76 FR 9295 - Privacy Act; Exempt Record System

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-02-17

    ... entities can self-query. One of the primary purposes of these data will be use of this information by a... information on all other queries to the data bank, disclosure of law enforcement queries could compromise ongoing investigation activities. The premature disclosure of the existence of a law enforcement activity...

  16. A New Publicly Available Chemical Query Language, CSRML, to support Chemotype Representations for Application to Data-Mining and Modeling

    EPA Science Inventory

    A new XML-based query language, CSRML, has been developed for representing chemical substructures, molecules, reaction rules, and reactions. CSRML queries are capable of integrating additional forms of information beyond the simple substructure (e.g., SMARTS) or reaction transfor...

  17. Experimental quantum private queries with linear optics

    NASA Astrophysics Data System (ADS)

    de Martini, Francesco; Giovannetti, Vittorio; Lloyd, Seth; Maccone, Lorenzo; Nagali, Eleonora; Sansoni, Linda; Sciarrino, Fabio

    2009-07-01

    The quantum private query is a quantum cryptographic protocol to recover information from a database, preserving both user and data privacy: the user can test whether someone has retained information on which query was asked and the database provider can test the amount of information released. Here we discuss a variant of the quantum private query algorithm that admits a simple linear optical implementation: it employs the photon’s momentum (or time slot) as address qubits and its polarization as bus qubit. A proof-of-principle experimental realization is implemented.

  18. Natural Language Query System Design for Interactive Information Storage and Retrieval Systems. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Liu, I-Hsiung

    1985-01-01

    The currently developed multi-level language interfaces of information systems are generally designed for experienced users. These interfaces commonly ignore the nature and needs of the largest user group, i.e., casual users. This research identifies the importance of natural language query system research within information storage and retrieval system development; addresses the topics of developing such a query system; and finally, proposes a framework for the development of natural language query systems in order to facilitate the communication between casual users and information storage and retrieval systems.

  19. Provenance Storage, Querying, and Visualization in PBase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kianmajd, Parisa; Ludascher, Bertram; Missier, Paolo

    2015-01-01

    We present PBase, a repository for scientific workflows and their corresponding provenance information that facilitates the sharing of experiments among the scientific community. PBase is interoperable since it uses ProvONE, a standard provenance model for scientific workflows. Workflows and traces are stored in RDF, and with the support of SPARQL and the tree cover encoding, the repository provides a scalable infrastructure for querying the provenance data. Furthermore, through its user interface, it is possible to: visualize workflows and execution traces; visualize reachability relations within these traces; issue SPARQL queries; and visualize query results.

  20. A Semantic Basis for Proof Queries and Transformations

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen W.; Luth, Christoph

    2013-01-01

    We extend the query language PrQL, designed for inspecting machine representations of proofs, to also allow transformation of proofs. PrQL natively supports hiproofs which express proof structure using hierarchically nested labelled trees, which we claim is a natural way of taming the complexity of huge proofs. Query-driven transformations enable manipulation of this structure, in particular, to transform proofs produced by interactive theorem provers into forms that assist their understanding, or that could be consumed by other tools. In this paper we motivate and define basic transformation operations, using an abstract denotational semantics of hiproofs and queries. This extends our previous semantics for queries based on syntactic tree representations.We define update operations that add and remove sub-proofs, and manipulate the hierarchy to group and ungroup nodes. We show that

  1. iSMART: Ontology-based Semantic Query of CDA Documents

    PubMed Central

    Liu, Shengping; Ni, Yuan; Mei, Jing; Li, Hanyu; Xie, Guotong; Hu, Gang; Liu, Haifeng; Hou, Xueqiao; Pan, Yue

    2009-01-01

    The Health Level 7 Clinical Document Architecture (CDA) is widely accepted as the format for electronic clinical documents. With the rich ontological references in CDA documents, ontology-based semantic queries can be performed to retrieve CDA documents. In this paper, we present iSMART (interactive Semantic MedicAl Record reTrieval), a prototype system designed for ontology-based semantic querying of CDA documents. The clinical information in CDA documents is extracted into RDF triples by a declarative XML to RDF transformer. An ontology reasoner is developed to infer additional information by combining the background knowledge from the SNOMED CT ontology. Then an RDF query engine is leveraged to enable the semantic queries. This system has been evaluated using real clinical documents collected from a large hospital in southern China. PMID:20351883

  2. From headache to tumour: An examination of health anxiety, health-related Internet use and 'query escalation'.

    PubMed

    Singh, Karmpaul; Brown, Richard J

    2016-09-01

    The current study aimed to explore the phenomenon of disease-related 'query escalation' in high/low health anxious Internet users (N = 40). During a 15-minute health-related Internet search, participants rated their anxiety and the perceived seriousness of information on each page. Post-search interviews determined the reasons for, and effects of, escalating queries to consider serious diseases. Both groups were found to be significantly more anxious after escalating queries. The high group was significantly more likely to escalate queries. Evaluating personal relevance of material was the main reason for escalations and moderated anxiety post-escalation. We conclude that searching for online disease information can increase anxiety, particularly for people worried about their health. © The Author(s) 2015.

  3. Reactome graph database: Efficient access to complex pathway data

    PubMed Central

    Korninger, Florian; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D’Eustachio, Peter

    2018-01-01

    Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types. PMID:29377902
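
    For readers unfamiliar with graph-database access, the following hypothetical Python sketch shows the general pattern of running a Cypher query against a Neo4j instance with the official driver. The connection details, node labels and property names are illustrative assumptions, not the documented Reactome schema or ContentService API.

```python
# Hedged sketch of querying a graph database with Cypher from Python, in the
# spirit of the Reactome ContentService's Neo4j back end. The connection URI,
# credentials, node labels and property names below are illustrative
# assumptions, not the actual Reactome schema.
from neo4j import GraphDatabase  # pip install neo4j

CYPHER = """
MATCH (p:Pathway)-[:hasEvent]->(r:Reaction)
WHERE p.displayName = $pathway
RETURN r.displayName AS reaction
LIMIT 10
"""

def reactions_in_pathway(uri, user, password, pathway):
    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            result = session.run(CYPHER, pathway=pathway)
            return [record["reaction"] for record in result]
    finally:
        driver.close()

if __name__ == "__main__":
    # Assumes a locally running Neo4j instance populated with pathway data.
    print(reactions_in_pathway("bolt://localhost:7687", "neo4j", "password",
                               "Apoptosis"))
```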

  4. An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring.

    PubMed

    Alirezaie, Marjan; Kiselev, Andrey; Längkvist, Martin; Klügl, Franziska; Loutfi, Amy

    2017-11-05

    This paper presents a framework in which satellite images are classified and augmented with additional semantic information to enable queries not only about what can be found on the map at a particular location, but also about paths that can be taken. This is achieved by a reasoning framework based on qualitative spatial reasoning that is able to find answers to high-level queries that may depend on the current situation. This framework, called SemCityMap, provides the full pipeline from enriching the raw image data with rudimentary labels, to the integration of knowledge representation and reasoning methods, to user interfaces for high-level querying. To illustrate the utility of SemCityMap in a disaster scenario, we use an urban environment (central Stockholm) in combination with a flood simulation. We show that the system provides useful answers to high-level queries also with respect to the current flood status. Examples of such queries concern path planning for vehicles or retrieval of safe regions such as "find all regions close to schools and far from the flooded area". The particular advantage of our approach lies in the fact that ontological information and reasoning is explicitly integrated, so that queries can be formulated in a natural way using concepts at an appropriate level of abstraction, including additional constraints.

  5. An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring

    PubMed Central

    Alirezaie, Marjan; Klügl, Franziska; Loutfi, Amy

    2017-01-01

    This paper presents a framework in which satellite images are classified and augmented with additional semantic information to enable queries not only about what can be found on the map at a particular location, but also about paths that can be taken. This is achieved by a reasoning framework based on qualitative spatial reasoning that is able to find answers to high-level queries that may depend on the current situation. This framework, called SemCityMap, provides the full pipeline from enriching the raw image data with rudimentary labels, to the integration of knowledge representation and reasoning methods, to user interfaces for high-level querying. To illustrate the utility of SemCityMap in a disaster scenario, we use an urban environment (central Stockholm) in combination with a flood simulation. We show that the system provides useful answers to high-level queries also with respect to the current flood status. Examples of such queries concern path planning for vehicles or retrieval of safe regions such as “find all regions close to schools and far from the flooded area”. The particular advantage of our approach lies in the fact that ontological information and reasoning is explicitly integrated, so that queries can be formulated in a natural way using concepts at an appropriate level of abstraction, including additional constraints. PMID:29113073

  6. SEQUOIA: significance enhanced network querying through context-sensitive random walk and minimization of network conductance.

    PubMed

    Jeong, Hyundoo; Yoon, Byung-Jun

    2017-03-14

    Network querying algorithms provide computational means to identify conserved network modules in large-scale biological networks that are similar to known functional modules, such as pathways or molecular complexes. Two main challenges for network querying algorithms are the high computational complexity of detecting potential isomorphism between the query and the target graphs and ensuring the biological significance of the query results. In this paper, we propose SEQUOIA, a novel network querying algorithm that effectively addresses these issues by utilizing a context-sensitive random walk (CSRW) model for network comparison and minimizing the network conductance of potential matches in the target network. The CSRW model, inspired by the pair hidden Markov model (pair-HMM) that has been widely used for sequence comparison and alignment, can accurately assess the node-to-node correspondence between different graphs by accounting for node insertions and deletions. The proposed algorithm identifies high-scoring network regions based on the CSRW scores, which are subsequently extended by maximally reducing the network conductance of the identified subnetworks. Performance assessment based on real PPI networks and known molecular complexes shows that SEQUOIA outperforms existing methods and clearly enhances the biological significance of the query results. The source code and datasets can be downloaded from http://www.ece.tamu.edu/~bjyoon/SEQUOIA.

  7. Labeling RDF Graphs for Linear Time and Space Querying

    NASA Astrophysics Data System (ADS)

    Furche, Tim; Weinzierl, Antonius; Bry, François

    Indices and data structures for web querying have mostly considered tree shaped data, reflecting the view of XML documents as tree-shaped. However, for RDF (and when querying ID/IDREF constraints in XML) data is indisputably graph-shaped. In this chapter, we first study existing indexing and labeling schemes for RDF and other graph data with focus on support for efficient adjacency and reachability queries. For XML, labeling schemes are an important part of the widespread adoption of XML, in particular for mapping XML to existing (relational) database technology. However, the existing indexing and labeling schemes for RDF (and graph data in general) sacrifice one of the most attractive properties of XML labeling schemes, the constant time (and per-node space) test for adjacency (child) and reachability (descendant). In the second part, we introduce the first labeling scheme for RDF data that retains this property and thus achieves linear time and space processing of acyclic RDF queries on a significantly larger class of graphs than previous approaches (which are mostly limited to tree-shaped data). Finally, we show how this labeling scheme can be applied to (acyclic) SPARQL queries to obtain an evaluation algorithm with time and space complexity linear in the number of resources in the queried RDF graph.
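
    The constant-time reachability test mentioned above can be illustrated, for the tree-shaped case the chapter builds on, with a classic interval (pre/post-order) labeling; the Python sketch below is a generic illustration under that assumption, not the chapter's RDF labeling scheme.

```python
# Sketch of classic interval (pre/post-order) labeling for tree-shaped data,
# which gives a constant-time reachability (descendant) test. The chapter's
# labeling scheme extends such guarantees to a broader class of RDF graphs;
# this sketch only illustrates the tree case it builds on.

def label_tree(root, children):
    """Assign each node an interval [start, end); descendant intervals nest inside it."""
    labels, counter = {}, 0

    def visit(node):
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        labels[node] = (start, counter)

    visit(root)
    return labels

def reaches(labels, ancestor, descendant):
    """O(1) test: is `descendant` in the subtree rooted at `ancestor`?"""
    a_start, a_end = labels[ancestor]
    d_start, d_end = labels[descendant]
    return a_start <= d_start and d_end <= a_end

if __name__ == "__main__":
    children = {"A": ["B", "C"], "B": ["D"]}
    labels = label_tree("A", children)
    print(reaches(labels, "A", "D"))  # True
    print(reaches(labels, "C", "D"))  # False
```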

  8. NCBI2RDF: Enabling Full RDF-Based Access to NCBI Databases

    PubMed Central

    Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor

    2013-01-01

    RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments. PMID:23984425
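
    The E-utilities layer that the decomposed SPARQL queries are forwarded to can also be called directly; the hypothetical Python sketch below issues a plain esearch request and returns matching record identifiers, with no SPARQL decomposition involved.

```python
# Minimal sketch of querying NCBI through the E-utilities programmatic
# interface (the layer NCBI2RDF forwards decomposed SPARQL queries to).
# Only a small subset of parameters is shown; no SPARQL translation is done here.
import requests  # pip install requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch(db: str, term: str, retmax: int = 5):
    """Return a list of record IDs in `db` matching `term`."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    response = requests.get(EUTILS, params=params, timeout=30)
    response.raise_for_status()
    return response.json()["esearchresult"]["idlist"]

if __name__ == "__main__":
    # e.g. PubMed IDs for articles mentioning a gene name
    print(esearch("pubmed", "BRCA1"))
```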

  9. A New Publicly Available Chemical Query Language, CSRML ...

    EPA Pesticide Factsheets

    A new XML-based query language, CSRML, has been developed for representing chemical substructures, molecules, reaction rules, and reactions. CSRML queries are capable of integrating additional forms of information beyond the simple substructure (e.g., SMARTS) or reaction transformation (e.g., SMIRKS, reaction SMILES) queries currently in use. Chemotypes, a term used to represent advanced CSRML queries for repeated application, can be encoded not only with connectivity and topology, but also with properties of atoms, bonds, electronic systems, or molecules. The CSRML language has been developed in parallel with a public set of chemotypes, i.e., the ToxPrint chemotypes, which are designed to provide excellent coverage of environmental, regulatory and commercial use chemical space, as well as to represent features and frameworks believed to be especially relevant to toxicity concerns. A software application, ChemoTyper, has also been developed and made publicly available to enable chemotype searching and fingerprinting against a target structure set. The public ChemoTyper houses the ToxPrint chemotype CSRML dictionary, as well as a reference implementation so that the query specifications may be adopted by other chemical structure knowledge systems. The full specifications of the XML standard used in CSRML-based chemotypes are publicly available to facilitate and encourage the exchange of structural knowledge. Paper details specifications for a new XML-based query lan

  10. An approach for heterogeneous and loosely coupled geospatial data distributed computing

    NASA Astrophysics Data System (ADS)

    Chen, Bin; Huang, Fengru; Fang, Yu; Huang, Zhou; Lin, Hui

    2010-07-01

    Most GIS (Geographic Information System) applications tend to have heterogeneous and autonomous geospatial information resources, and the availability of these local resources is unpredictable and dynamic under a distributed computing environment. In order to make use of these local resources together to solve larger geospatial information processing problems that are related to an overall situation, in this paper, with the support of peer-to-peer computing technologies, we propose a geospatial data distributed computing mechanism that involves loosely coupled geospatial resource directories and the concept of an Equivalent Distributed Program of global geospatial queries to solve geospatial distributed computing problems under heterogeneous GIS environments. First, a geospatial query process schema for distributed computing, as well as a method for equivalent transformation from a global geospatial query to distributed local queries at the SQL (Structured Query Language) level to solve the coordinating problem among heterogeneous resources, are presented. Second, peer-to-peer technologies are used to maintain a loosely coupled network environment that consists of autonomous geospatial information resources, to achieve decentralized and consistent synchronization among global geospatial resource directories, and to carry out distributed transaction management of local queries. Finally, based on the developed prototype system, example applications of simple and complex geospatial data distributed queries are presented to illustrate the procedure of global geospatial information processing.
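
    The idea of an equivalent transformation from one global query into local SQL queries can be sketched with two in-memory SQLite databases standing in for autonomous peers; the table and column names below are invented for illustration, and none of the peer-to-peer machinery is reproduced.

```python
# Hedged sketch of decomposing one global spatial range query into equivalent
# local SQL queries and merging the partial results. Two in-memory SQLite
# databases stand in for autonomous peers; the table and column names are
# invented for the example.
import sqlite3

def make_peer(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE poi (name TEXT, x REAL, y REAL)")
    conn.executemany("INSERT INTO poi VALUES (?, ?, ?)", rows)
    return conn

LOCAL_SQL = "SELECT name, x, y FROM poi WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ?"

def global_range_query(peers, xmin, xmax, ymin, ymax):
    """Ship the same range predicate to every peer and union the answers."""
    results = []
    for peer in peers:
        results.extend(peer.execute(LOCAL_SQL, (xmin, xmax, ymin, ymax)).fetchall())
    return results

if __name__ == "__main__":
    peer_a = make_peer([("school", 1.0, 1.0), ("park", 9.0, 9.0)])
    peer_b = make_peer([("clinic", 2.0, 1.5)])
    print(global_range_query([peer_a, peer_b], 0.0, 5.0, 0.0, 5.0))
    # -> [('school', 1.0, 1.0), ('clinic', 2.0, 1.5)]
```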

  11. Reactome graph database: Efficient access to complex pathway data.

    PubMed

    Fabregat, Antonio; Korninger, Florian; Viteri, Guilherme; Sidiropoulos, Konstantinos; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning

    2018-01-01

    Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.

  12. miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST

    PubMed Central

    Kim, You Jung; Boyd, Andrew; Athey, Brian D.; Patel, Jignesh M.

    2005-01-01

    A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Existing tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the number of sequences in the query set is large. In this paper, we present a new algorithm, called miBLAST, that evaluates such batch workloads efficiently. At the core, miBLAST employs a q-gram filtering and an index join for efficiently detecting similarity between the query sequences and database sequences. This set-oriented technique, which indexes both the query and the database sets, results in substantial performance improvements over existing methods. Our results show that miBLAST is significantly faster than BLAST in many cases. For example, miBLAST aligned 247 965 oligonucleotide sequences in the Affymetrix probe set against the Human UniGene in 1.26 days, compared with 27.27 days with BLAST (an improvement by a factor of 22). The relative performance of miBLAST increases for larger word sizes; however, it decreases for longer queries. miBLAST employs the familiar BLAST statistical model and output format, guaranteeing the same accuracy as BLAST and facilitating a seamless transition for existing BLAST users. PMID:16061938
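
    The q-gram filtering step can be illustrated with a small hypothetical Python sketch: query q-grams are indexed once, and database sequences sharing too few q-grams with a query are pruned before any costly alignment. The q value and cutoff are placeholders, and miBLAST's index join and BLAST statistics are not reproduced.

```python
# Sketch of q-gram filtering for a batch of query sequences: database
# sequences that share too few q-grams with a query are pruned before any
# expensive alignment. The q value and cutoff are illustrative.
from collections import defaultdict

def qgrams(seq: str, q: int):
    return {seq[i:i + q] for i in range(len(seq) - q + 1)}

def build_index(queries, q):
    """Map each q-gram to the set of query ids containing it."""
    index = defaultdict(set)
    for qid, seq in queries.items():
        for gram in qgrams(seq, q):
            index[gram].add(qid)
    return index

def candidate_pairs(queries, database, q=4, min_shared=2):
    """Yield (query_id, db_id) pairs sharing at least `min_shared` q-grams."""
    index = build_index(queries, q)
    for db_id, db_seq in database.items():
        shared = defaultdict(int)
        for gram in qgrams(db_seq, q):
            for qid in index.get(gram, ()):
                shared[qid] += 1
        for qid, count in shared.items():
            if count >= min_shared:
                yield qid, db_id

if __name__ == "__main__":
    queries = {"q1": "ACGTACGTAC", "q2": "TTTTGGGGCC"}
    database = {"d1": "GGACGTACGTTT", "d2": "CCCCCCCCCC"}
    print(list(candidate_pairs(queries, database)))  # [('q1', 'd1')]
```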

  13. Intelligent data management

    NASA Technical Reports Server (NTRS)

    Campbell, William J.

    1985-01-01

    Intelligent data management is the concept of interfacing a user to a database management system with a value-added service that will allow a full range of data management operations at a high level of abstraction using human written language. The development of such a system will be based on expert systems and related artificial intelligence technologies, and will allow the capturing of procedural and relational knowledge about data management operations and the support of a user with such knowledge in an on-line, interactive manner. Such a system will have the following capabilities: (1) the ability to construct a model of the user's view of the database, based on the query syntax; (2) the ability to transform English queries and commands into database instructions and processes; (3) the ability to use heuristic knowledge to rapidly prune the data space in search processes; and (4) the ability to use an on-line explanation system to allow the user to understand what the system is doing and why it is doing it. Additional information is given in outline form.

  14. Query-Driven Visualization and Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruebel, Oliver; Bethel, E. Wes; Prabhat, Mr.

    2012-11-01

    This report focuses on an approach to high performance visualization and analysis, termed query-driven visualization and analysis (QDV). QDV aims to reduce the amount of data that needs to be processed by the visualization, analysis, and rendering pipelines. The goal of the data reduction process is to separate out data that is "scientifically interesting" and to focus visualization, analysis, and rendering on that interesting subset. The premise is that for any given visualization or analysis task, the data subset of interest is much smaller than the larger, complete data set. This strategy---extracting smaller data subsets of interest and focusing the visualization processing on these subsets---is complementary to the approach of increasing the capacity of the visualization, analysis, and rendering pipelines through parallelism. This report discusses the fundamental concepts in QDV, their relationship to different stages in the visualization and analysis pipelines, and presents QDV's application to problems in diverse areas, ranging from forensic cybersecurity to high energy physics.
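
    A minimal sketch of the data-reduction step, assuming invented field names and thresholds: a compound range predicate builds a boolean mask, and only the selected subset is handed to downstream analysis or rendering.

```python
# Tiny sketch of the data-reduction step behind query-driven visualization:
# a compound range predicate selects the "interesting" subset, and only that
# subset is handed to downstream analysis or rendering. Field names and
# thresholds are invented for the example.
import numpy as np

rng = np.random.default_rng(0)
particles = {
    "energy":  rng.uniform(0.0, 10.0, size=100_000),
    "density": rng.uniform(0.0, 1.0,  size=100_000),
}

# Query: high-energy, low-density particles only.
mask = (particles["energy"] > 8.0) & (particles["density"] < 0.2)

subset = {name: values[mask] for name, values in particles.items()}
print(f"kept {mask.sum()} of {mask.size} records "
      f"({100.0 * mask.mean():.1f}%) for visualization")
```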

  15. DUK - A Fast and Efficient Kmer Based Sequence Matching Tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Mingkun; Copeland, Alex; Han, James

    2011-03-21

    A new tool, DUK, has been developed to perform the matching task. Matching determines whether a query sequence partially or totally matches given reference sequences. Matching is similar to alignment. Indeed, many traditional analysis tasks like contaminant removal use alignment tools. But for matching, there is no need to know which bases of a query sequence match which positions of a reference sequence; it only needs to know whether a match exists or not. This subtle difference can make the matching task much faster than alignment. DUK is accurate, versatile, fast, and has efficient memory usage. It uses a k-mer hashing method to index reference sequences and a Poisson model to calculate p-values. DUK is carefully implemented in C++ with an object-oriented design. The resulting classes can also be used to develop other tools quickly. DUK has been widely used at JGI for a wide range of applications such as contaminant removal, organelle genome separation, and assembly refinement. Many real applications and simulated datasets demonstrate its power.
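
    The core idea, matching by shared k-mers and judging significance with a Poisson background model, can be sketched as follows; the expected background hit rate and cutoff are placeholders, and this is not DUK's implementation.

```python
# Sketch of k-mer-set matching with a Poisson significance test, illustrating
# the idea behind DUK (not its implementation). The expected background hit
# rate and the p-value cutoff below are placeholders.
import math

def kmers(seq: str, k: int):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def poisson_sf(observed: int, lam: float) -> float:
    """P(X >= observed) for X ~ Poisson(lam)."""
    cumulative = sum(math.exp(-lam) * lam ** i / math.factorial(i)
                     for i in range(observed))
    return max(0.0, 1.0 - cumulative)

def matches(query: str, reference_kmers: set, k: int,
            expected_random_hits: float, alpha: float = 1e-3) -> bool:
    hits = sum(1 for km in kmers(query, k) if km in reference_kmers)
    return poisson_sf(hits, expected_random_hits) < alpha

if __name__ == "__main__":
    reference = "ACGTACGTACGTTTTGGGACGTACGT"
    ref_index = kmers(reference, 8)              # "index" the reference once
    contaminated = "ACGTACGTACGTTTTGG"           # shares many 8-mers
    unrelated = "TTTTTTTTTTTTTTTTT"
    for name, read in [("contaminated", contaminated), ("unrelated", unrelated)]:
        print(name, matches(read, ref_index, 8, expected_random_hits=0.01))
```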

  16. Providing R-Tree Support for Mongodb

    NASA Astrophysics Data System (ADS)

    Xiang, Longgang; Shao, Xiaotian; Wang, Dehao

    2016-06-01

    Supporting large amounts of spatial data is a significant characteristic of modern databases. However, unlike some mature relational databases, such as Oracle and PostgreSQL, most current burgeoning NoSQL databases are not well designed for storing geospatial data, which is becoming increasingly important in various fields. In this paper, we propose a novel method to provide an R-tree index, as well as corresponding spatial range query and nearest neighbour query functions, for MongoDB, one of the most prevalent NoSQL databases. First, after an in-depth analysis of MongoDB's features, we devise an efficient tabular document structure which flattens the R-tree index into MongoDB collections. Further, the relevant mechanisms for R-tree operations are described, and then we discuss in detail how to integrate the R-tree into MongoDB. Finally, we present experimental results which show that our proposed method outperforms the built-in spatial index of MongoDB. Our research will greatly facilitate big data management issues with MongoDB in a variety of geospatial information applications.
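
    The flattening idea can be illustrated with a hypothetical Python sketch in which R-tree nodes become id-keyed documents and a range query follows child ids recursively; plain dictionaries stand in for a MongoDB collection, and the node layout is an assumption rather than the paper's exact schema.

```python
# Sketch of flattening R-tree nodes into id-keyed "documents" and answering a
# range query by following child ids, mimicking how a tree can be stored as a
# MongoDB collection. Plain dicts stand in for the collection here, and the
# node layout is an assumption, not the paper's exact schema.

def intersects(a, b):
    """Do two rectangles (xmin, ymin, xmax, ymax) overlap?"""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

# Each "document" has an _id, its bounding box (mbr), and either child node
# ids (internal node) or data entries (leaf).
NODES = {
    "root": {"_id": "root", "mbr": (0, 0, 10, 10), "children": ["n1", "n2"]},
    "n1":   {"_id": "n1", "mbr": (0, 0, 5, 5),
             "entries": [("school", (1, 1, 1, 1)), ("park", (4, 4, 4, 4))]},
    "n2":   {"_id": "n2", "mbr": (5, 5, 10, 10),
             "entries": [("harbour", (9, 9, 9, 9))]},
}

def range_query(node_id, window, store=NODES):
    node = store[node_id]   # with MongoDB this would be collection.find_one({"_id": node_id})
    if not intersects(node["mbr"], window):
        return []
    if "children" in node:
        hits = []
        for child_id in node["children"]:
            hits.extend(range_query(child_id, window, store))
        return hits
    return [name for name, mbr in node["entries"] if intersects(mbr, window)]

if __name__ == "__main__":
    print(range_query("root", (0, 0, 4.5, 4.5)))  # ['school', 'park']
```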

  17. A Fine-Grained and Privacy-Preserving Query Scheme for Fog Computing-Enhanced Location-Based Service

    PubMed Central

    Yin, Fan; Tang, Xiaohu

    2017-01-01

    Location-based services (LBS), as one of the most popular location-awareness applications, has been further developed to achieve low-latency with the assistance of fog computing. However, privacy issues remain a research challenge in the context of fog computing. Therefore, in this paper, we present a fine-grained and privacy-preserving query scheme for fog computing-enhanced location-based services, hereafter referred to as FGPQ. In particular, mobile users can obtain the fine-grained searching result satisfying not only the given spatial range but also the searching content. Detailed privacy analysis shows that our proposed scheme indeed achieves the privacy preservation for the LBS provider and mobile users. In addition, extensive performance analyses and experiments demonstrate that the FGPQ scheme can significantly reduce computational and communication overheads and ensure the low-latency, which outperforms existing state-of-the art schemes. Hence, our proposed scheme is more suitable for real-time LBS searching. PMID:28696395

  18. Gene Expression Omnibus (GEO): Microarray data storage, submission, retrieval, and analysis

    PubMed Central

    Barrett, Tanya

    2006-01-01

    The Gene Expression Omnibus (GEO) repository at the National Center for Biotechnology Information (NCBI) archives and freely distributes high-throughput molecular abundance data, predominantly gene expression data generated by DNA microarray technology. The database has a flexible design that can handle diverse styles of both unprocessed and processed data in a MIAME- (Minimum Information About a Microarray Experiment) supportive infrastructure that promotes fully annotated submissions. GEO currently stores about a billion individual gene expression measurements, derived from over 100 organisms, submitted by over 1,500 laboratories, addressing a wide range of biological phenomena. To maximize the utility of these data, several user-friendly Web-based interfaces and applications have been implemented that enable effective exploration, query, and visualization of these data, at the level of individual genes or entire studies. This chapter describes how the data are stored, submission procedures, and mechanisms for data retrieval and query. GEO is publicly accessible at http://www.ncbi.nlm.nih.gov/projects/geo/. PMID:16939800

  19. A Fine-Grained and Privacy-Preserving Query Scheme for Fog Computing-Enhanced Location-Based Service.

    PubMed

    Yang, Xue; Yin, Fan; Tang, Xiaohu

    2017-07-11

    Location-based services (LBS), as one of the most popular location-awareness applications, has been further developed to achieve low-latency with the assistance of fog computing. However, privacy issues remain a research challenge in the context of fog computing. Therefore, in this paper, we present a fine-grained and privacy-preserving query scheme for fog computing-enhanced location-based services, hereafter referred to as FGPQ. In particular, mobile users can obtain the fine-grained searching result satisfying not only the given spatial range but also the searching content. Detailed privacy analysis shows that our proposed scheme indeed achieves the privacy preservation for the LBS provider and mobile users. In addition, extensive performance analyses and experiments demonstrate that the FGPQ scheme can significantly reduce computational and communication overheads and ensure the low-latency, which outperforms existing state-of-the art schemes. Hence, our proposed scheme is more suitable for real-time LBS searching.

  20. A history of health-related quality of life outcomes in psychiatry.

    PubMed

    Revicki, Dennis A; Kleinman, Leah; Cella, David

    2014-06-01

    Health-related quality of life (HRQoL) is a multidimensional concept that includes subjective reports of symptoms, side effects, functioning in multiple life domains, and general perceptions of life satisfaction and quality. Rather than estimating it from external observations, interview, or clinical assessment, it is best measured by direct query. Due to a perception that respondents may not be reliable or credible, there has been some reluctance to use self-report outcomes in psychiatry. More recently, and increasingly, HRQoL assessment through direct patient query has become common when evaluating a range of psychiatric, psychological, and social therapies. With few exceptions, psychiatric patients are credible and reliable reporters of this information. This article summarizes studies that highlight the development, validation, and application of HRQoL measures in psychiatry. Thoughtful application of these tools in psychiatric research can provide a much-needed patient perspective in the future of comparative effectiveness research, patient-centered outcomes research, and clinical care.

  1. Child pornography in peer-to-peer networks.

    PubMed

    Steel, Chad M S

    2009-08-01

    The presence of child pornography in peer-to-peer networks is not disputed, but there has been little effort to date to quantify and analyze the distribution and nature of that content. By performing an analysis of queries and query hits on the largest peer-to-peer network, we are able to both quantify and describe the nature of querying by child pornographers as well as the content they are sharing. Child pornography-related content was identified and analyzed in 235,513 user queries and 194,444 query hits. The research confirmed a large amount of peer-to-peer traffic is dedicated to child pornography, but supply and demand must be separated for a better understanding. The most prevalent query and the top two most prevalent filenames returned as query hits were child pornography-related. However, it would be inaccurate to state that child pornography dominates peer-to-peer traffic, as 1% of all queries were related to child pornography and 1.45% of all query hits (unique filenames) were related to child pornography, consistent with a smaller study (Hughes et al., 2008). In addition to the above, research indicates that the median age searched for was 13 years old, and the majority of queries were gender-neutral, but of those with gender-related terms, 79% were female-oriented. Distribution-wise, the vast majority of content-specific searches are for movies at 99%, though images are still the most prevalent in availability. There is no shortage of child pornography supply and demand on peer-to-peer networks, and by analyzing how consumers seek and distributors advertise content we can better understand their motivations. Understanding the behavior of child pornographers and how they search for content, when contrasted with those sharing content, provides a basis for finding and combating that behavior. For law enforcement, knowing the specific terms used allows more timely and accurate forensics and better identification of those seeking and distributing child pornography. For Internet researchers, better filtering and monitoring is possible. For mental health professionals, understanding the preferences and behaviors of those searching supports more effective treatment.

  2. A study on PubMed search tag usage pattern: association rule mining of a full-day PubMed query log.

    PubMed

    Mosa, Abu Saleh Mohammad; Yoo, Illhoi

    2013-01-09

    The practice of evidence-based medicine requires efficient biomedical literature search such as PubMed/MEDLINE. Retrieval performance relies highly on the efficient use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand the usage pattern of search tags by the end user in PubMed/MEDLINE search. A PubMed query log file was obtained from the National Library of Medicine containing anonymous user identification, timestamp, and query text. Inconsistent records were removed from the dataset and the search tags were extracted from the query texts. A total of 2,917,159 queries, issued by a total of 613,061 users, were selected for this study. The analysis of frequent co-occurrences and usage patterns of the search tags was conducted using an association mining algorithm. The percentage of search tag usage was low (11.38% of the total queries) and only 2.95% of queries contained two or more tags. Three out of four users used no search tag and about two-thirds of them issued fewer than four queries. Among the queries containing at least one tagged search term, the average number of search tags was almost half of the number of total search terms. Navigational search tags are more frequently used than informational search tags. While no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches. The low percentage of search tag usage implies that PubMed/MEDLINE users do not utilize the features of PubMed/MEDLINE widely, are not aware of such features, or solely depend on the high-recall-focused query translation of PubMed's Automatic Term Mapping. The users need further education and an interactive search application for effective use of the search tags in order to fulfill their biomedical information needs from PubMed/MEDLINE.
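
    The kind of association analysis described above can be sketched with a tiny support/confidence calculation over per-query tag sets; the example tag sets below are invented and this is not the mining algorithm used in the study.

```python
# Tiny sketch of mining pairwise associations between search tags: count
# supports over per-query tag sets and derive confidence for each ordered
# pair. The example tag sets are invented for illustration.
from itertools import permutations
from collections import Counter

queries = [          # one set of search field tags per logged query
    {"[ti]", "[au]"},
    {"[ti]", "[au]", "[dp]"},
    {"[ti]"},
    {"[au]", "[dp]"},
]

single = Counter(tag for tags in queries for tag in tags)
pair = Counter(p for tags in queries for p in permutations(sorted(tags), 2))

n = len(queries)
for (a, b), count in sorted(pair.items()):
    support = count / n                 # fraction of queries containing both tags
    confidence = count / single[a]      # P(b in query | a in query)
    print(f"{a} -> {b}: support={support:.2f} confidence={confidence:.2f}")
```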

  3. A Study on Pubmed Search Tag Usage Pattern: Association Rule Mining of a Full-day Pubmed Query Log

    PubMed Central

    2013-01-01

    Background The practice of evidence-based medicine requires efficient biomedical literature search such as PubMed/MEDLINE. Retrieval performance relies highly on the efficient use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand the usage pattern of search tags by the end user in PubMed/MEDLINE search. Methods A PubMed query log file was obtained from the National Library of Medicine containing anonymous user identification, timestamp, and query text. Inconsistent records were removed from the dataset and the search tags were extracted from the query texts. A total of 2,917,159 queries, issued by a total of 613,061 users, were selected for this study. The analysis of frequent co-occurrences and usage patterns of the search tags was conducted using an association mining algorithm. Results The percentage of search tag usage was low (11.38% of the total queries) and only 2.95% of queries contained two or more tags. Three out of four users used no search tag and about two-thirds of them issued fewer than four queries. Among the queries containing at least one tagged search term, the average number of search tags was almost half of the number of total search terms. Navigational search tags are more frequently used than informational search tags. While no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches. Conclusions The low percentage of search tag usage implies that PubMed/MEDLINE users do not utilize the features of PubMed/MEDLINE widely, are not aware of such features, or solely depend on the high-recall-focused query translation of PubMed’s Automatic Term Mapping. The users need further education and an interactive search application for effective use of the search tags in order to fulfill their biomedical information needs from PubMed/MEDLINE. PMID:23302604

  4. Urban Soil Hydrology: bridging the data gap with a nationwide field study

    NASA Astrophysics Data System (ADS)

    Schifman, L. A.; Shuster, W.

    2016-12-01

    Urban communities generally rely on hydrologic models or tools for assessing suitable sites for green infrastructure. These rainfall-runoff models, e.g. the National Stormwater Calculator (NSWC), query soil hydrologic information from national databases, e.g. the Soil Survey Geographic Database (SSURGO), or estimate it via pedotransfer-based algorithms like USDA Rosetta. As part of urban soil hydrologic assessments we collected soil textural and hydrologic data in 12 cities throughout the United States and compared these measurements to NSWC- and SSURGO-queried infiltration rates (Kunsat) and Rosetta-estimated drainage rates (Ksat and Kunsat). We found that soil hydrologic parameters obtained through pedotransfer functions and queries to soil databases are not representative of field-measured values (RMSE ranging from 6.2 to 15.2 for infiltration and from 13.2 to 16.3 for drainage). Although the NSWC queries SSURGO, we found that SSURGO overestimates infiltration and the NSWC underestimates it, with mean errors (MEs) of 4.9 and -1.4, respectively. In Rosetta, we found that pedotransfer functions overestimated drainage rates (MEs 1.8 to 3.8). In an attempt to improve drainage estimates using Rosetta, the soil texture was adjusted in soils with an apparent portion of finer sands. Here, sand included very coarse, coarse, and medium sand, whereas silt included fine and very fine sand and silt, with the justification that fine sands behave similarly to silt. These adjusted estimates generally underestimated drainage and were still not suitable for use in planning for stormwater detention (e.g., infiltrative green infrastructure). With this work we highlight the importance of obtaining field-measured values when assessing sites for green infrastructure planning instead of relying on estimates, as the discrepancies in sensitive parameters such as Kunsat and Ksat have implications for parameter selection, error propagation through rainfall-runoff models, and the over- or under-design of stormwater control measures for detention.
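
    The comparison metrics reported above (RMSE and mean error between database estimates and field measurements) amount to the following short calculation; the numbers in the example are made up.

```python
# Short sketch of the comparison metrics used above: root-mean-square error
# and mean error (estimate minus field measurement) over paired values.
# The numbers below are made up for illustration.
import math

def rmse(estimates, measurements):
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(estimates, measurements))
                     / len(measurements))

def mean_error(estimates, measurements):
    """Positive values mean the database/model overestimates the field value."""
    return sum(e - m for e, m in zip(estimates, measurements)) / len(measurements)

if __name__ == "__main__":
    field_kunsat = [1.2, 0.4, 3.0, 0.9]        # e.g. cm/h, field-measured
    database_kunsat = [5.5, 2.1, 8.0, 4.0]     # e.g. values queried from a soil database
    print("RMSE:", round(rmse(database_kunsat, field_kunsat), 2))
    print("ME:  ", round(mean_error(database_kunsat, field_kunsat), 2))
```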

  5. AlzPharm: integration of neurodegeneration data using RDF.

    PubMed

    Lam, Hugo Y K; Marenco, Luis; Clark, Tim; Gao, Yong; Kinoshita, June; Shepherd, Gordon; Miller, Perry; Wu, Elizabeth; Wong, Gwendolyn T; Liu, Nian; Crasto, Chiquito; Morse, Thomas; Stephens, Susie; Cheung, Kei-Hoi

    2007-05-09

    Neuroscientists often need to access a wide range of data sets distributed over the Internet. These data sets, however, are typically neither integrated nor interoperable, resulting in a barrier to answering complex neuroscience research questions. Domain ontologies can enable the querying of heterogeneous data sets, but they are not sufficient for neuroscience since the data of interest commonly span multiple research domains. To this end, e-Neuroscience seeks to provide an integrated platform for neuroscientists to discover new knowledge through seamless integration of the very diverse types of neuroscience data. Here we present a Semantic Web approach to building this e-Neuroscience framework by using the Resource Description Framework (RDF) and its vocabulary description language, RDF Schema (RDFS), as a standard data model to facilitate both representation and integration of the data. We have constructed a pilot ontology for BrainPharm (a subset of SenseLab) using RDFS and then converted a subset of the BrainPharm data into RDF according to the ontological structure. We have also integrated the converted BrainPharm data with existing RDF hypothesis and publication data from a pilot version of SWAN (Semantic Web Applications in Neuromedicine). Our implementation uses the RDF Data Model in Oracle Database 10g release 2 for data integration, query, and inference, while our Web interface allows users to query the data and retrieve the results in a convenient fashion. Accessing and integrating biomedical data which cuts across multiple disciplines will be increasingly indispensable and beneficial to neuroscience researchers. The Semantic Web approach we undertook has demonstrated a promising way to semantically integrate data sets created independently. It also shows how advanced queries and inferences can be performed over the integrated data, which are hard to achieve using traditional data integration approaches. Our pilot results suggest that our Semantic Web approach is suitable for realizing e-Neuroscience and generic enough to be applied in other biomedical fields.
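
    The general pattern, merging RDF from independently created sources into one graph and answering a SPARQL query over the union, can be sketched with rdflib; the tiny vocabulary and data below are invented and are not the BrainPharm or SWAN ontologies.

```python
# Minimal sketch of the general pattern: load RDF from two independently
# created sources into one graph and answer a SPARQL query over the union.
# The vocabulary and data below are invented for illustration.
from rdflib import Graph  # pip install rdflib

SOURCE_A = """
@prefix ex: <http://example.org/> .
ex:drugX ex:targets ex:receptorY .
"""

SOURCE_B = """
@prefix ex: <http://example.org/> .
ex:receptorY ex:implicatedIn ex:diseaseZ .
"""

QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?drug ?disease WHERE {
    ?drug ex:targets ?receptor .
    ?receptor ex:implicatedIn ?disease .
}
"""

graph = Graph()
graph.parse(data=SOURCE_A, format="turtle")
graph.parse(data=SOURCE_B, format="turtle")  # integration is just merging triples

for drug, disease in graph.query(QUERY):
    print(drug, "may be relevant to", disease)
```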

  6. AlzPharm: integration of neurodegeneration data using RDF

    PubMed Central

    Lam, Hugo YK; Marenco, Luis; Clark, Tim; Gao, Yong; Kinoshita, June; Shepherd, Gordon; Miller, Perry; Wu, Elizabeth; Wong, Gwendolyn T; Liu, Nian; Crasto, Chiquito; Morse, Thomas; Stephens, Susie; Cheung, Kei-Hoi

    2007-01-01

    Background Neuroscientists often need to access a wide range of data sets distributed over the Internet. These data sets, however, are typically neither integrated nor interoperable, resulting in a barrier to answering complex neuroscience research questions. Domain ontologies can enable the querying of heterogeneous data sets, but they are not sufficient for neuroscience since the data of interest commonly span multiple research domains. To this end, e-Neuroscience seeks to provide an integrated platform for neuroscientists to discover new knowledge through seamless integration of the very diverse types of neuroscience data. Here we present a Semantic Web approach to building this e-Neuroscience framework by using the Resource Description Framework (RDF) and its vocabulary description language, RDF Schema (RDFS), as a standard data model to facilitate both representation and integration of the data. Results We have constructed a pilot ontology for BrainPharm (a subset of SenseLab) using RDFS and then converted a subset of the BrainPharm data into RDF according to the ontological structure. We have also integrated the converted BrainPharm data with existing RDF hypothesis and publication data from a pilot version of SWAN (Semantic Web Applications in Neuromedicine). Our implementation uses the RDF Data Model in Oracle Database 10g release 2 for data integration, query, and inference, while our Web interface allows users to query the data and retrieve the results in a convenient fashion. Conclusion Accessing and integrating biomedical data which cuts across multiple disciplines will be increasingly indispensable and beneficial to neuroscience researchers. The Semantic Web approach we undertook has demonstrated a promising way to semantically integrate data sets created independently. It also shows how advanced queries and inferences can be performed over the integrated data, which are hard to achieve using traditional data integration approaches. Our pilot results suggest that our Semantic Web approach is suitable for realizing e-Neuroscience and generic enough to be applied in other biomedical fields. PMID:17493287

  7. Radiation dose reduction: comparative assessment of publication volume between interventional and diagnostic radiology.

    PubMed

    Hansmann, Jan; Henzler, Thomas; Gaba, Ron C; Morelli, John N

    2017-01-01

    We aimed to quantify and compare awareness regarding radiation dose reduction within the interventional radiology and diagnostic radiology communities. Abstracts accepted to the annual meetings of the Society of Interventional Radiology (SIR), the Cardiovascular and Interventional Radiological Society of Europe (CIRSE), the Radiological Society of North America (RSNA), and the European Congress of Radiology (ECR) between 2005 and 2015 were analyzed using the search terms "interventional/computed tomography" and "radiation dose/radiation dose reduction." A PubMed query using the above-mentioned search terms for the years of 2005-2015 was performed. Between 2005 and 2015, a total of 14 520 abstracts (mean, 660±297 abstracts) and 80 614 abstracts (mean, 3664±1025 abstracts) were presented at interventional and diagnostic radiology meetings, respectively. Significantly fewer abstracts related to radiation dose were presented at the interventional radiology meetings compared with the diagnostic radiology meetings (162 abstracts [1% of total] vs. 2706 [3% of total]; P < 0.001). On average 15±7 interventional radiology abstracts (range, 6-27) and 246±105 diagnostic radiology abstracts (range, 112-389) pertaining to radiation dose were presented at each meeting. The PubMed query revealed an average of 124±39 publications (range, 79-187) and 1205±307 publications (range, 829-1672) related to interventional and diagnostic radiology dose reduction per year, respectively (P < 0.001). The observed increase in the number of abstracts regarding radiation dose reduction in the interventional radiology community over the past 10 years has not mirrored the increased volume seen within diagnostic radiology, suggesting that increased education and discussion about this topic may be warranted.

  8. Radiation dose reduction: comparative assessment of publication volume between interventional and diagnostic radiology

    PubMed Central

    Hansmann, Jan; Henzler, Thomas; Gaba, Ron C.; Morelli, John N.

    2017-01-01

    PURPOSE We aimed to quantify and compare awareness regarding radiation dose reduction within the interventional radiology and diagnostic radiology communities. METHODS Abstracts accepted to the annual meetings of the Society of Interventional Radiology (SIR), the Cardiovascular and Interventional Radiological Society of Europe (CIRSE), the Radiological Society of North America (RSNA), and the European Congress of Radiology (ECR) between 2005 and 2015 were analyzed using the search terms “interventional/computed tomography” and “radiation dose/radiation dose reduction.” A PubMed query using the above-mentioned search terms for the years of 2005–2015 was performed. RESULTS Between 2005 and 2015, a total of 14 520 abstracts (mean, 660±297 abstracts) and 80 614 abstracts (mean, 3664±1025 abstracts) were presented at interventional and diagnostic radiology meetings, respectively. Significantly fewer abstracts related to radiation dose were presented at the interventional radiology meetings compared with the diagnostic radiology meetings (162 abstracts [1% of total] vs. 2706 [3% of total]; P < 0.001). On average 15±7 interventional radiology abstracts (range, 6–27) and 246±105 diagnostic radiology abstracts (range, 112–389) pertaining to radiation dose were presented at each meeting. The PubMed query revealed an average of 124±39 publications (range, 79–187) and 1205±307 publications (range, 829–1672) related to interventional and diagnostic radiology dose reduction per year, respectively (P < 0.001). CONCLUSION The observed increase in the number of abstracts regarding radiation dose reduction in the interventional radiology community over the past 10 years has not mirrored the increased volume seen within diagnostic radiology, suggesting that increased education and discussion about this topic may be warranted. PMID:28287072

  9. STARS 2.0: 2nd-generation open-source archiving and query software

    NASA Astrophysics Data System (ADS)

    Winegar, Tom

    2008-07-01

    The Subaru Telescope is in the process of developing an open-source alternative to the 1st-generation software and databases (STARS 1) used for archiving and query. For STARS 2, we have chosen PHP and Python for scripting and MySQL as the database software. We have collected feedback from staff and observers, and used this feedback to significantly improve the design and functionality of our future archiving and query software. Archiving - We identified two weaknesses in 1st-generation STARS archiving software: a complex and inflexible table structure and uncoordinated system administration for our business model: taking pictures from the summit and archiving them in both Hawaii and Japan. We adopted a simplified and normalized table structure with passive keyword collection, and we are designing an archive-to-archive file transfer system that automatically reports real-time status and error conditions and permits error recovery. Query - We identified several weaknesses in 1st-generation STARS query software: inflexible query tools, poor sharing of calibration data, and no automatic file transfer mechanisms to observers. We are developing improved query tools and sharing of calibration data, and multi-protocol unassisted file transfer mechanisms for observers. In the process, we have redefined a 'query': from an invisible search result that, at present, can only be transferred once and only in-house, with little status and error reporting and no error recovery - to a stored search result that can be monitored, transferred to different locations with multiple protocols, reporting status and error conditions and permitting recovery from errors.
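    The redefined notion of a 'query' above, a stored search result whose status and errors can be monitored and whose transfer can be retried, can be sketched roughly as follows. The table and column names are hypothetical, and SQLite stands in for the MySQL database chosen for STARS 2.

    ```python
    # Illustrative sketch of a "stored search result" with monitorable status.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE stored_query (
            id INTEGER PRIMARY KEY,
            criteria TEXT,        -- the user's search criteria
            status TEXT,          -- e.g. 'queued', 'transferring', 'done', 'error'
            last_error TEXT
        )
    """)

    def submit_query(criteria):
        cur = conn.execute(
            "INSERT INTO stored_query (criteria, status, last_error) VALUES (?, 'queued', NULL)",
            (criteria,))
        return cur.lastrowid

    def update_status(query_id, status, error=None):
        conn.execute("UPDATE stored_query SET status = ?, last_error = ? WHERE id = ?",
                     (status, error, query_id))

    qid = submit_query("object=M31 AND date>=2008-01-01")
    update_status(qid, "transferring")
    update_status(qid, "error", "remote archive unreachable")  # error is recorded ...
    update_status(qid, "transferring")                         # ... and the transfer retried
    print(conn.execute("SELECT status, last_error FROM stored_query WHERE id = ?", (qid,)).fetchone())
    ```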

  10. Federated queries of clinical data repositories: the sum of the parts does not equal the whole

    PubMed Central

    Weber, Griffin M

    2013-01-01

    Background and objective In 2008 we developed a shared health research information network (SHRINE), which for the first time enabled research queries across the full patient populations of four Boston hospitals. It uses a federated architecture, where each hospital returns only the aggregate count of the number of patients who match a query. This allows hospitals to retain control over their local databases and comply with federal and state privacy laws. However, because patients may receive care from multiple hospitals, the result of a federated query might differ from what the result would be if the query were run against a single central repository. This paper describes the situations when this happens and presents a technique for correcting these errors. Methods We use a one-time process of identifying which patients have data in multiple repositories by comparing one-way hash values of patient demographics. This enables us to partition the local databases such that all patients within a given partition have data at the same subset of hospitals. Federated queries are then run separately on each partition independently, and the combined results are presented to the user. Results Using theoretical bounds and simulated hospital networks, we demonstrate that once the partitions are made, SHRINE can produce more precise estimates of the number of patients matching a query. Conclusions Uncertainty in the overlap of patient populations across hospitals limits the effectiveness of SHRINE and other federated query tools. Our technique reduces this uncertainty while retaining an aggregate federated architecture. PMID:23349080
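    A rough sketch of the partitioning step described above, under the assumption of hashed demographics and toy records: each site contributes one-way hashes, overlaps are detected by comparing hashes, and patients are grouped by the exact subset of sites holding their data, so that aggregate queries can then be run per partition.

    ```python
    # Sketch of one-way-hash overlap detection and partitioning. The demographic
    # fields and the toy records are placeholders for illustration only.
    import hashlib
    from collections import defaultdict

    def demographic_hash(name, dob, zip_code):
        """One-way hash of normalized demographics (no raw identifiers are shared)."""
        key = f"{name.strip().lower()}|{dob}|{zip_code}"
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    # hospital -> set of demographic hashes it holds (toy data)
    hospital_hashes = {
        "Hospital A": {demographic_hash("Ann Lee", "1970-01-01", "02115"),
                       demographic_hash("Bo Chan", "1980-02-02", "02120")},
        "Hospital B": {demographic_hash("Ann Lee", "1970-01-01", "02115")},
    }

    # For each patient hash, find the set of hospitals where it appears.
    patient_hospitals = {}
    for hospital, hashes in hospital_hashes.items():
        for h in hashes:
            patient_hospitals.setdefault(h, set()).add(hospital)

    # Partition patients by the exact subset of hospitals holding their data.
    partitions = defaultdict(list)
    for h, hospitals in patient_hospitals.items():
        partitions[frozenset(hospitals)].append(h)

    for subset, patients in partitions.items():
        print(sorted(subset), "->", len(patients), "patient(s)")
    ```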

  11. An adaptable architecture for patient cohort identification from diverse data sources

    PubMed Central

    Bache, Richard; Miles, Simon; Taweel, Adel

    2013-01-01

    Objective We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. Method The architecture has the key feature that queries defined according to the query model are both pre- and post-processed and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. Results We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Discussion Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a database management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. Conclusions The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses, thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity. PMID:24064442
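    The key architectural point, one shared query model plus a lightweight adaptor per source, might look roughly like the following sketch; the class names, the fact format, and the stubbed warehouse data are assumptions for illustration, not the authors' implementation.

    ```python
    # Sketch: the eligibility-query model is implemented once; each data source only
    # needs a small adaptor that extracts clinical facts in a common form.
    from abc import ABC, abstractmethod
    from datetime import date

    class SourceAdaptor(ABC):
        """One lightweight adaptor per heterogeneous data source."""
        @abstractmethod
        def extract_facts(self, patient_id):
            """Return facts as (code, date) pairs in a source-independent form."""

    class WarehouseAAdaptor(SourceAdaptor):
        def extract_facts(self, patient_id):
            # In reality this would translate the local schema; here it is stubbed.
            return [("diabetes", date(2020, 3, 1)), ("metformin", date(2021, 6, 1))]

    def eligible(facts, required_code, on_or_after):
        """Shared query model: a temporal criterion evaluated over extracted facts."""
        return any(code == required_code and when >= on_or_after for code, when in facts)

    adaptor = WarehouseAAdaptor()
    facts = adaptor.extract_facts("patient-001")
    print(eligible(facts, "metformin", date(2021, 1, 1)))  # True for the stubbed facts
    ```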

  12. Adverse Reactions Associated With Cannabis Consumption as Evident From Search Engine Queries

    PubMed Central

    Lev-Ran, Shaul

    2017-01-01

    Background Cannabis is one of the most widely used psychoactive substances worldwide, but adverse drug reactions (ADRs) associated with its use are difficult to study because of its prohibited status in many countries. Objective Internet search engine queries have been used to investigate ADRs in pharmaceutical drugs. In this proof-of-concept study, we tested whether these queries can be used to detect the adverse reactions of cannabis use. Methods We analyzed anonymized queries from US-based users of Bing, a widely used search engine, made over a period of 6 months and compared the results with the prevalence of cannabis use as reported in the US National Survey on Drug Use and Health (NSDUH) and with ADRs reported in the Food and Drug Administration’s Adverse Drug Reporting System. Predicted prevalence of cannabis use was estimated from the fraction of people making queries about cannabis, marijuana, and 121 additional synonyms. Predicted ADRs were estimated from queries containing layperson descriptions of symptoms from a list of 195 ICD-10 symptoms. Results Our results indicated that the predicted prevalence of cannabis use at the US census regional level reaches an R2 of .71 when compared with NSDUH data. Queries for ADRs made by people who also searched for cannabis reveal many of the known adverse effects of cannabis (eg, cough and psychotic symptoms), as well as plausible unknown reactions (eg, pyrexia). Conclusions These results indicate that search engine queries can serve as an important tool for the study of adverse reactions of illicit drugs, which are difficult to study in other settings. PMID:29074469
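    The regional comparison described above can be illustrated schematically: a query-derived prevalence estimate per region is regressed against survey-reported prevalence and summarized with R2. The numbers below are placeholders only, not the study's data.

    ```python
    # Schematic comparison of query-derived estimates with survey prevalence via R^2.
    import numpy as np

    # Fraction of users issuing cannabis-related queries, per hypothetical region
    query_fraction = np.array([0.021, 0.034, 0.027, 0.040])
    # Survey-reported prevalence for the same regions (placeholder values)
    survey_prevalence = np.array([0.09, 0.13, 0.11, 0.15])

    # Fit a simple linear model and report R^2
    slope, intercept = np.polyfit(query_fraction, survey_prevalence, 1)
    predicted = slope * query_fraction + intercept
    ss_res = np.sum((survey_prevalence - predicted) ** 2)
    ss_tot = np.sum((survey_prevalence - survey_prevalence.mean()) ** 2)
    print("R^2 =", 1 - ss_res / ss_tot)
    ```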

  13. A weight based genetic algorithm for selecting views

    NASA Astrophysics Data System (ADS)

    Talebian, Seyed H.; Kareem, Sameem A.

    2013-03-01

    A data warehouse is a technology designed to support decision making. It is built by extracting large amounts of data from different operational systems, transforming them into a consistent form, and loading them into a central repository. The type of queries in a data warehouse environment differs from that in operational systems. In contrast to operational systems, the analytical queries issued in data warehouses involve summarization of large volumes of data and therefore normally take a long time to answer. At the same time, these queries must be answered quickly so that managers can make decisions in as short a time as possible. As a result, improving the performance of queries is an essential need in this environment. One of the most popular methods for this task is using pre-computed query results: whenever a new query is submitted by the user, instead of computing the query on the fly over a large underlying database, the pre-computed results, or views, are used to answer it. Although the ideal option would be to pre-compute and save all possible views, in practice this is not a feasible choice because of disk space constraints and the overhead of view updates. Therefore, we need to select a subset of the possible views to save on disk. The problem of selecting the right subset of views is considered an important challenge in data warehousing. In this paper we suggest a Weighted Based Genetic Algorithm (WBGA) for solving the view selection problem with two objectives.
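    A generic sketch of the genetic-algorithm idea for view selection under a disk-space constraint is shown below. The fitness simply trades off query benefit against maintenance cost with a single weight; it is not the authors' WBGA, and the benefit, cost, and size figures are illustrative.

    ```python
    # Toy genetic algorithm for selecting a subset of views under a space budget.
    import random

    benefit = [30, 22, 18, 15, 9, 6]   # query-time saving if the view is materialized
    cost    = [10,  7,  6,  5, 3, 2]   # update/maintenance cost of the view
    size    = [40, 25, 20, 15, 8, 5]   # storage needed by the view
    CAPACITY, W, POP, GENS = 70, 0.5, 30, 60

    def fitness(bits):
        if sum(s for s, b in zip(size, bits) if b) > CAPACITY:
            return float("-inf")       # infeasible: exceeds the disk budget
        return (sum(b * x for b, x in zip(bits, benefit))
                - W * sum(b * c for b, c in zip(bits, cost)))

    def crossover(a, b):
        p = random.randrange(1, len(a))
        return a[:p] + b[p:]

    def mutate(bits, rate=0.1):
        return [1 - g if random.random() < rate else g for g in bits]

    population = [[random.randint(0, 1) for _ in benefit] for _ in range(POP)]
    for _ in range(GENS):
        population.sort(key=fitness, reverse=True)
        parents = population[:POP // 2]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(POP - len(parents))]
        population = parents + children

    best = max(population, key=fitness)
    print("selected views:", [i for i, g in enumerate(best) if g], "fitness:", fitness(best))
    ```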

  14. SCEC UCVM - Unified California Velocity Model

    NASA Astrophysics Data System (ADS)

    Small, P.; Maechling, P. J.; Jordan, T. H.; Ely, G. P.; Taborda, R.

    2011-12-01

    The SCEC Unified California Velocity Model (UCVM) is a software framework for a state-wide California velocity model. UCVM provides researchers with two new capabilities: (1) the ability to query Vp, Vs, and density from any standard regional California velocity model through a uniform interface, and (2) the ability to combine multiple velocity models into a single state-wide model. These features are crucial in order to support large-scale ground motion simulations and to facilitate improvements in the underlying velocity models. UCVM provides integrated support for the following standard velocity models: SCEC CVM-H, SCEC CVM-S and the CVM-SI variant, USGS Bay Area (cencalvm), Lin-Thurber Statewide, and other smaller regional models. New models may be easily incorporated as they become available. Two query interfaces are provided: a Linux command line program, and a C application programming interface (API). The C API query interface is simple, fully independent of any specific model, and MPI-friendly. Input coordinates are geographic longitude/latitude and the vertical coordinate may be either depth or elevation. Output parameters include Vp, Vs, and density along with the identity of the model from which these material properties were obtained. In addition to access to the standard models, UCVM also includes a high resolution statewide digital elevation model, Vs30 map, and an optional near-surface geo-technical layer (GTL) based on Ely's Vs30-derived GTL. The elevation and Vs30 information is bundled along with the returned Vp,Vs velocities and density, so that all relevant information is retrieved with a single query. When the GTL is enabled, it is blended with the underlying crustal velocity models along a configurable transition depth range with an interpolation function. Multiple, possibly overlapping, regional velocity models may be combined together into a single state-wide model. This is accomplished by tiling the regional models on top of one another in three dimensions in a researcher-specified order. No reconciliation is performed within overlapping model regions, although a post-processing tool is provided to perform a simple numerical smoothing. Lastly, a 3D region from a combined model may be extracted and exported into a CVM-Etree. This etree may then be queried by UCVM much like a standard velocity model but with less overhead and generally better performance due to the efficiency of the etree data structure.
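    The tiling behaviour described above, regional models consulted in a researcher-specified priority order with the first covering model answering the query, can be mimicked conceptually as follows. This is not the UCVM C API; the toy models, their extents, and their uniform material properties are invented for illustration, and depth is ignored for simplicity.

    ```python
    # Conceptual sketch of tiled velocity-model queries with a priority order.
    def make_model(name, lon_range, lat_range, vp, vs, rho):
        def query(lon, lat, depth):
            # depth is ignored in this toy model
            if lon_range[0] <= lon <= lon_range[1] and lat_range[0] <= lat <= lat_range[1]:
                return {"model": name, "vp": vp, "vs": vs, "density": rho}
            return None
        return query

    # Priority order: a detailed regional model first, then a coarser statewide fallback.
    models = [
        make_model("regional",  (-119.0, -117.0), (33.5, 34.5), 5500.0, 3200.0, 2650.0),
        make_model("statewide", (-125.0, -114.0), (32.0, 42.0), 6000.0, 3500.0, 2700.0),
    ]

    def tiled_query(lon, lat, depth):
        for model in models:
            hit = model(lon, lat, depth)
            if hit is not None:
                return hit
        return {"model": "none", "vp": None, "vs": None, "density": None}

    print(tiled_query(-118.2, 34.0, 1000.0))  # answered by the regional model
    print(tiled_query(-121.0, 37.0, 1000.0))  # falls through to the statewide model
    ```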

  15. How Do Children Reformulate Their Search Queries?

    ERIC Educational Resources Information Center

    Rutter, Sophie; Ford, Nigel; Clough, Paul

    2015-01-01

    Introduction: This paper investigates techniques used by children in year 4 (age eight to nine) of a UK primary school to reformulate their queries, and how they use information retrieval systems to support query reformulation. Method: An in-depth study analysing the interactions of twelve children carrying out search tasks in a primary school…

  16. Improving Web Search for Difficult Queries

    ERIC Educational Resources Information Center

    Wang, Xuanhui

    2009-01-01

    Search engines have now become essential tools in all aspects of our life. Although a variety of information needs can be served very successfully, there are still a lot of queries that search engines can not answer very effectively and these queries always make users feel frustrated. Since it is quite often that users encounter such "difficult…

  17. Overview of the TREC 2014 Session Track

    DTIC Science & Technology

    2014-11-01

    …except all of them have length mi = 1 and thus they have no current/final query. Participants were to run the 1,021 current queries against their search engines under each of the following three conditions separately: RL1, ignoring the session prior to this query; RL2, considering all the items (1), (2) and…

  18. A "Simple Query Interface" Adapter for the Discovery and Exchange of Learning Resources

    ERIC Educational Resources Information Center

    Massart, David

    2006-01-01

    Developed as part of CEN/ISSS Workshop on Learning Technology efforts to improve interoperability between learning resource repositories, the Simple Query Interface (SQI) is an Application Program Interface (API) for querying heterogeneous repositories of learning resource metadata. In the context of the ProLearn Network of Excellence, SQI is used…

  19. An Experimental Investigation of Complexity in Database Query Formulation Tasks

    ERIC Educational Resources Information Center

    Casterella, Gretchen Irwin; Vijayasarathy, Leo

    2013-01-01

    Information Technology professionals and other knowledge workers rely on their ability to extract data from organizational databases to respond to business questions and support decision making. Structured query language (SQL) is the standard programming language for querying data in relational databases, and SQL skills are in high demand and are…

  20. Cognitive search model and a new query paradigm

    NASA Astrophysics Data System (ADS)

    Xu, Zhonghui

    2001-06-01

    This paper proposes a cognitive model in which people begin to search for pictures by using semantic content and find the right picture by judging whether its visual content is a proper visualization of the semantics desired. The essential point is that human search is not just a process of matching computation on visual features but rather a process of visualizing the semantic content that is known. For people to search electronic images the way they manually do in the model, we suggest that querying be a semantic-driven process like design. A query-by-design paradigm is proposed in the sense that what you design is what you find. Unlike query-by-example, query-by-design allows users to specify the semantic content through an iterative and incremental interaction process, so that a retrieval can start with association and identification of the given semantic content and be refined as further visual cues become available. An experimental image retrieval system, Kuafu, has been under development using the query-by-design paradigm, and an iconic language is adopted.

  1. Secure Nearest Neighbor Query on Crowd-Sensing Data

    PubMed Central

    Cheng, Ke; Wang, Liangmin; Zhong, Hong

    2016-01-01

    Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals, as the data owners, are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to perform many safety operations due to computation and storage capability constraints. In light of the Multi Owners and Multi Users (MOMU) situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed by protocols of secure two-party computation and a secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between security and query performance compared to other schemes. PMID:27669253

  2. Evaluation methodology for query-based scene understanding systems

    NASA Astrophysics Data System (ADS)

    Huster, Todd P.; Ross, Timothy D.; Culbertson, Jared L.

    2015-05-01

    In this paper, we are proposing a method for the principled evaluation of scene understanding systems in a query-based framework. We can think of a query-based scene understanding system as a generalization of typical sensor exploitation systems where instead of performing a narrowly defined task (e.g., detect, track, classify, etc.), the system can perform general user-defined tasks specified in a query language. Examples of this type of system have been developed as part of DARPA's Mathematics of Sensing, Exploitation, and Execution (MSEE) program. There is a body of literature on the evaluation of typical sensor exploitation systems, but the open-ended nature of the query interface introduces new aspects to the evaluation problem that have not been widely considered before. In this paper, we state the evaluation problem and propose an approach to efficiently learn about the quality of the system under test. We consider the objective of the evaluation to be to build a performance model of the system under test, and we rely on the principles of Bayesian experiment design to help construct and select optimal queries for learning about the parameters of that model.
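    The final point, using Bayesian experiment design to pick informative queries, can be sketched in a toy form: maintain a posterior over a single performance parameter of the system under test and choose the candidate query whose expected outcome most reduces posterior entropy. The response model and the candidate-query difficulties below are assumptions, not the MSEE evaluation protocol.

    ```python
    # Toy Bayesian experiment design: pick the query with the largest expected
    # reduction in posterior entropy about the system's "skill" parameter.
    import numpy as np

    theta = np.linspace(0.05, 0.95, 19)          # possible skill values
    prior = np.full_like(theta, 1.0 / len(theta))

    def p_correct(skill, difficulty):
        return skill * (1.0 - difficulty) + 0.5 * difficulty  # toy response model

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def expected_posterior_entropy(prior, difficulty):
        total = 0.0
        for outcome in (1, 0):                   # correct / incorrect answer
            like = p_correct(theta, difficulty) if outcome else 1 - p_correct(theta, difficulty)
            evidence = np.sum(prior * like)
            posterior = prior * like / evidence
            total += evidence * entropy(posterior)
        return total

    difficulties = [0.1, 0.4, 0.7]               # candidate queries
    gains = [entropy(prior) - expected_posterior_entropy(prior, d) for d in difficulties]
    best = int(np.argmax(gains))
    print("ask the query with difficulty", difficulties[best], "expected gain", round(gains[best], 4))
    ```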

  3. Practical quantum private query of blocks based on unbalanced-state Bennett-Brassard-1984 quantum-key-distribution protocol

    NASA Astrophysics Data System (ADS)

    Wei, Chun-Yan; Gao, Fei; Wen, Qiao-Yan; Wang, Tian-Yin

    2014-12-01

    Until now, the only kind of practical quantum private query (QPQ), quantum-key-distribution (QKD)-based QPQ, focuses on the retrieval of a single bit. In fact, a meaningful message is generally composed of multiple adjacent bits (i.e., a multi-bit block). To obtain such a message from the database, the user Alice has to query l times to get each bit ai. In this condition, the server Bob could compromise Alice's privacy once he obtains the address she queried in any of the l queries, since each ai contributes to the message Alice retrieves. Apparently, the longer the retrieved message is, the worse the user privacy becomes. To solve this problem, via an unbalanced-state technique and based on a variant of the multi-level BB84 protocol, we present a protocol for QPQ of blocks, which allows the user to retrieve a multi-bit block from the database in one query. Our protocol is somewhat like the high-dimension version of the first QKD-based QPQ protocol proposed by Jacobi et al., but some nontrivial modifications are necessary.

  4. SPANG: a SPARQL client supporting generation and reuse of queries for distributed RDF databases.

    PubMed

    Chiba, Hirokazu; Uchiyama, Ikuo

    2017-02-08

    Toward improved interoperability of distributed biological databases, an increasing number of datasets have been published in the standardized Resource Description Framework (RDF). Although the powerful SPARQL Protocol and RDF Query Language (SPARQL) provides a basis for exploiting RDF databases, writing SPARQL code is burdensome for users including bioinformaticians. Thus, an easy-to-use interface is necessary. We developed SPANG, a SPARQL client that has unique features for querying RDF datasets. SPANG dynamically generates typical SPARQL queries according to specified arguments. It can also call SPARQL template libraries constructed in a local system or published on the Web. Further, it enables combinatorial execution of multiple queries, each with a distinct target database. These features facilitate easy and effective access to RDF datasets and integrative analysis of distributed data. SPANG helps users to exploit RDF datasets by generation and reuse of SPARQL queries through a simple interface. This client will enhance integrative exploitation of biological RDF datasets distributed across the Web. This software package is freely available at http://purl.org/net/spang .
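    The kind of request a client such as SPANG sends to a remote SPARQL endpoint can be illustrated with the generic SPARQLWrapper library; this is not SPANG itself, and the endpoint URL below is a placeholder that would need to be replaced with a real service.

    ```python
    # Sketch of a remote SPARQL query with SPARQLWrapper (endpoint is a placeholder).
    from SPARQLWrapper import SPARQLWrapper, JSON

    endpoint = SPARQLWrapper("https://example.org/sparql")  # hypothetical RDF endpoint
    endpoint.setQuery("""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?s ?label WHERE {
            ?s rdfs:label ?label .
        } LIMIT 10
    """)
    endpoint.setReturnFormat(JSON)

    results = endpoint.query().convert()
    for row in results["results"]["bindings"]:
        print(row["s"]["value"], row["label"]["value"])
    ```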

  5. Analysis of Online Information Searching for Cardiovascular Diseases on a Consumer Health Information Portal

    PubMed Central

    Jadhav, Ashutosh; Sheth, Amit; Pathak, Jyotishman

    2014-01-01

    Since the early 2000s, Internet usage for health information searching has increased significantly. Studying search queries can help us to understand users’ “information need” and how they formulate search queries (“expression of information need”). Although cardiovascular diseases (CVD) affect a large percentage of the population, few studies have investigated how and what users search for regarding CVD. We address this knowledge gap in the community by analyzing a large corpus of 10 million CVD related search queries from MayoClinic.com. Using UMLS MetaMap and UMLS semantic types/concepts, we developed a rule-based approach to categorize the queries into 14 health categories. We analyzed structural properties, types (keyword-based/Wh-questions/Yes-No questions) and linguistic structure of the queries. Our results show that the most searched health categories are ‘Diseases/Conditions’, ‘Vital-Signs’, ‘Symptoms’ and ‘Living-with’. CVD queries are longer and are predominantly keyword-based. This study extends our knowledge about online health information searching and provides useful insights for Web search engines and health websites. PMID:25954380
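    The rule-based categorization step can be sketched in simplified form with keyword rules; the study itself mapped queries to categories using UMLS MetaMap and UMLS semantic types, so the patterns below are illustrative stand-ins only.

    ```python
    # Simplified keyword-rule categorizer for health search queries.
    import re

    RULES = {
        "Symptoms":            r"\b(chest pain|shortness of breath|palpitations|fatigue)\b",
        "Diseases/Conditions": r"\b(heart attack|atrial fibrillation|hypertension|heart failure)\b",
        "Vital-Signs":         r"\b(blood pressure|heart rate|pulse)\b",
        "Living-with":         r"\b(diet|exercise|lifestyle|recovery)\b",
    }

    def categorize(query):
        query = query.lower()
        matched = [cat for cat, pattern in RULES.items() if re.search(pattern, query)]
        return matched or ["Other"]

    for q in ["normal blood pressure range",
              "what is atrial fibrillation",
              "exercise after heart attack"]:
        print(q, "->", categorize(q))
    ```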

  6. Secure Nearest Neighbor Query on Crowd-Sensing Data.

    PubMed

    Cheng, Ke; Wang, Liangmin; Zhong, Hong

    2016-09-22

    Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals, as the data owners, are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to perform many safety operations due to computation and storage capability constraints. In light of the Multi Owners and Multi Users (MOMU) situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed by protocols of secure two-party computation and a secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between security and query performance compared to other schemes.

  7. Practical quantum private query of blocks based on unbalanced-state Bennett-Brassard-1984 quantum-key-distribution protocol

    PubMed Central

    Wei, Chun-Yan; Gao, Fei; Wen, Qiao-Yan; Wang, Tian-Yin

    2014-01-01

    Until now, the only kind of practical quantum private query (QPQ), quantum-key-distribution (QKD)-based QPQ, focuses on the retrieval of a single bit. In fact, a meaningful message is generally composed of multiple adjacent bits (i.e., a multi-bit block). To obtain such a message from the database, the user Alice has to query l times to get each bit ai. In this condition, the server Bob could compromise Alice's privacy once he obtains the address she queried in any of the l queries, since each ai contributes to the message Alice retrieves. Apparently, the longer the retrieved message is, the worse the user privacy becomes. To solve this problem, via an unbalanced-state technique and based on a variant of the multi-level BB84 protocol, we present a protocol for QPQ of blocks, which allows the user to retrieve a multi-bit block from the database in one query. Our protocol is somewhat like the high-dimension version of the first QKD-based QPQ protocol proposed by Jacobi et al., but some nontrivial modifications are necessary. PMID:25518810

  8. A Novel Database to Rank and Display Archeomagnetic Intensity Data

    NASA Astrophysics Data System (ADS)

    Donadini, F.; Korhonen, K.; Riisager, P.; Pesonen, L. J.; Kahma, K.

    2005-12-01

    To understand the content and the causes of the changes in the Earth's magnetic field beyond the observatory records, one has to rely on archeomagnetic and lake sediment paleomagnetic data. The regional archeointensity curves are often of different quality and temporally variable, which hampers the global analysis of the data in terms of dipole vs non-dipole field. We have developed a novel archeointensity database application utilizing MySQL, PHP (PHP Hypertext Preprocessor), and the Generic Mapping Tools (GMT) for ranking and displaying geomagnetic intensity data from the last 12000 years. Our application has the advantage that no specific software is required to query the database and view the results. Querying the database is performed using any Web browser; a fill-out form is used to enter the site location and a minimum ranking value to select the data points to be displayed. The form also features the possibility to select plotting of the data as an archeointensity curve with error bars, and a Virtual Axial Dipole Moment (VADM) or ancient field value (Ba) curve calculated using the CALS7K model (Continuous Archaeomagnetic and Lake Sediment geomagnetic model) of Korte and Constable (2005). The results of a query are displayed on a Web page containing a table summarizing the query parameters, a table showing the archeointensity values satisfying the query parameters, and a plot of VADM or Ba as a function of sample age. The database consists of eight related tables. The main one, INTENSITIES, stores the 3704 archeointensity measurements collected from 159 publications as VADM (and VDM when available) and Ba values, including their standard deviations and sampling locations. It also contains the number of samples and specimens measured from each site. The REFS table stores the references to a particular study. The names, latitudes, and longitudes of the regions where the samples were collected are stored in the SITES table. The MATERIALS, METHODS, SPECIMEN_TYPES and DATING_METHODS tables store information about the sample materials, intensity determination methods, specimen types and age determination methods. The SIGMA_COUNT table is used indirectly for ranking data according to the number of samples measured and their standard deviations. Each intensity measurement is assigned a score (0–2) depending on the number of specimens measured and their standard deviations, the intensity determination method, the type of specimens measured and the materials. The ranking of each data point is calculated as the sum of the four scores and varies between 0 and 8. Additionally, users can select the parameters that will be included in the ranking.
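    The ranking scheme, four criteria scored 0-2 and summed to a rank of 0-8, might be expressed along the following lines. The thresholds and the lookup tables are hypothetical; the database's actual criteria are defined in its SIGMA_COUNT and related tables.

    ```python
    # Illustrative 0-8 ranking from four 0-2 scores (thresholds are hypothetical).
    def rank_intensity_record(n_specimens, rel_std_dev, method, specimen_type, material):
        # (1) number of specimens and their scatter
        if n_specimens >= 3 and rel_std_dev <= 0.10:
            stat_score = 2
        elif n_specimens >= 2 and rel_std_dev <= 0.20:
            stat_score = 1
        else:
            stat_score = 0
        # (2) intensity determination method
        method_score = {"thellier": 2, "microwave": 2, "shaw": 1}.get(method.lower(), 0)
        # (3) specimen type
        specimen_score = {"single fragment": 2, "bulk sample": 1}.get(specimen_type.lower(), 0)
        # (4) material
        material_score = {"baked clay": 2, "pottery": 2, "lava": 1}.get(material.lower(), 0)
        return stat_score + method_score + specimen_score + material_score

    print(rank_intensity_record(4, 0.08, "Thellier", "single fragment", "pottery"))  # well constrained
    print(rank_intensity_record(1, 0.35, "unknown", "unknown", "unknown"))           # poorly constrained
    ```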

  9. The StarView intelligent query mechanism

    NASA Technical Reports Server (NTRS)

    Semmel, R. D.; Silberberg, D. P.

    1993-01-01

    The StarView interface is being developed to facilitate the retrieval of scientific and engineering data produced by the Hubble Space Telescope. While predefined screens in the interface can be used to specify many common requests, ad hoc requests require a dynamic query formulation capability. Unfortunately, logical level knowledge is too sparse to support this capability. In particular, essential formulation knowledge is lost when the domain of interest is mapped to a set of database relation schemas. Thus, a system known as QUICK has been developed that uses conceptual design knowledge to facilitate query formulation. By heuristically determining strongly associated objects at the conceptual level, QUICK is able to formulate semantically reasonable queries in response to high-level requests that specify only attributes of interest. Moreover, by exploiting constraint knowledge in the conceptual design, QUICK assures that queries are formulated quickly and will execute efficiently.

  10. SP2Bench: A SPARQL Performance Benchmark

    NASA Astrophysics Data System (ADS)

    Schmidt, Michael; Hornung, Thomas; Meier, Michael; Pinkel, Christoph; Lausen, Georg

    A meaningful analysis and comparison of both existing storage schemes for RDF data and evaluation approaches for SPARQL queries necessitates a comprehensive and universal benchmark platform. We present SP2Bench, a publicly available, language-specific performance benchmark for the SPARQL query language. SP2Bench is settled in the DBLP scenario and comprises a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries. The generated documents mirror vital key characteristics and social-world distributions encountered in the original DBLP data set, while the queries implement meaningful requests on top of this data, covering a variety of SPARQL operator constellations and RDF access patterns. In this chapter, we discuss requirements and desiderata for SPARQL benchmarks and present the SP2Bench framework, including its data generator, benchmark queries and performance metrics.

  11. Object-Oriented Query Language For Events Detection From Images Sequences

    NASA Astrophysics Data System (ADS)

    Ganea, Ion Eugen

    2015-09-01

    This paper presents a method to represent events extracted from image sequences and the query language used for event detection. Using an object-oriented model, the spatial and temporal relationships between salient objects, and also between events, are stored and queried. This work aims to unify the storing and querying phases of video event processing. The object-oriented language syntax used for event processing allows the instantiation of index classes in order to improve the accuracy of the query results. The experiments were performed on image sequences from the sports domain and show the reliability and robustness of the proposed language. To extend the language, a specific syntax will be added for constructing templates for abnormal events and for detecting incidents, which is the final goal of the research.

  12. Geospatial Data Management Platform for Urban Groundwater

    NASA Astrophysics Data System (ADS)

    Gaitanaru, D.; Priceputu, A.; Gogu, C. R.

    2012-04-01

    Due to the large number of civil engineering projects and research studies, large quantities of geo-data are produced for urban environments. These data are usually redundant and are spread across different institutions and private companies. Time-consuming operations such as data processing and information harmonisation are the main reason the re-use of data is systematically avoided. Urban groundwater data show the same complex situation. Underground structures (subway lines, deep foundations, underground parking facilities, and others), urban facility networks (sewer systems, water supply networks, heating conduits, etc.), drainage systems, surface water works and many others change continuously; as a consequence, their influence on groundwater changes systematically. However, these activities provide a large quantity of data, so aquifer modelling and behaviour prediction can be carried out using monitored quantitative and qualitative parameters. Due to the rapid evolution of technology in the past few years, transferring large amounts of information over the Internet has become a feasible solution for sharing geoscience data. Furthermore, standard platform-independent means to do this have been developed (specific mark-up languages such as GML, GeoSciML, WaterML, GWML, and CityML). They allow large geospatial databases to be easily updated and shared over the Internet, even between different companies or research centres that do not necessarily use the same database structures. For Bucharest City (Romania), an integrated platform for groundwater geospatial data management is being developed under the framework of a national research project - "Sedimentary media modeling platform for groundwater management in urban areas" (SIMPA) - financed by the National Authority for Scientific Research of Romania. The platform architecture is based on three components: a geospatial database, a desktop application (a complex set of hydrogeological and geological analysis tools) and a front-end geoportal service. The SIMPA platform makes use of mark-up transfer standards to provide a user-friendly application that can be accessed through the Internet to query, analyse, and visualise geospatial data related to urban groundwater. The platform holds the information within the local groundwater geospatial databases, and the user is able to access these data through a geoportal service. The database architecture allows storing accurate and very detailed geological, hydrogeological, and infrastructure information that can be straightforwardly generalized and further upscaled. The geoportal service offers the possibility of querying a dataset from the spatial database. The query is coded in a standard mark-up language and sent to the server over the standard Hypertext Transfer Protocol (HTTP) to be processed by the local application. After validation of the query, the results are sent back to the user to be displayed by the geoportal application. The main advantage of the SIMPA platform is that it offers the user the possibility to make a primary multi-criteria query, which results in a smaller set of data to be analysed afterwards. This improves both the transfer process parameters and the user's means of creating the desired query.
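    The query round-trip described above, a multi-criteria request encoded in a mark-up document, posted to the geoportal over HTTP, and answered with the matching subset of data, is sketched schematically below. The endpoint URL and the request vocabulary are assumptions used only to show the protocol flow.

    ```python
    # Schematic geoportal round-trip: post an XML query over HTTP, receive results.
    import requests

    query_xml = """<?xml version="1.0" encoding="UTF-8"?>
    <GroundwaterQuery>
      <Area minLon="26.00" minLat="44.38" maxLon="26.20" maxLat="44.50"/>
      <Layer>Quaternary aquifer</Layer>
      <Parameter name="piezometric_head" from="2010-01-01" to="2011-01-01"/>
    </GroundwaterQuery>
    """

    response = requests.post(
        "https://example.org/simpa/geoportal/query",  # placeholder endpoint
        data=query_xml.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        timeout=30,
    )
    response.raise_for_status()
    print(response.status_code, len(response.content), "bytes of result data")
    ```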

  13. Hepatic CT image query using Gabor features

    NASA Astrophysics Data System (ADS)

    Zhao, Chenguang; Cheng, Hongyan; Zhuang, Tiange

    2004-07-01

    A retrieval scheme for liver computed tomography (CT) images based on Gabor texture is presented. For each hepatic CT image, we manually delineate abnormal regions within the liver area. Then, a continuous Gabor transform is used to analyze the texture of the pathology-bearing region and extract the corresponding feature vectors. For a given sample image, we compare its feature vector with those of other images, and the most similar, highest-ranked images are retrieved. In experiments, 45 liver CT images are collected, and the effectiveness of Gabor texture for content-based retrieval is verified.
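    The retrieval scheme can be sketched compactly: Gabor filter responses over the delineated region are summarized into a feature vector, and candidate images are ranked by feature distance to the query. The filter-bank parameters below are typical choices rather than the paper's exact settings, and random arrays stand in for the delineated regions.

    ```python
    # Gabor-texture features and distance-based ranking (toy regions).
    import numpy as np
    from skimage.filters import gabor

    def gabor_features(region, frequencies=(0.1, 0.2, 0.3), thetas=(0, np.pi / 4, np.pi / 2)):
        """Mean and standard deviation of Gabor magnitude per frequency/orientation."""
        feats = []
        for f in frequencies:
            for t in thetas:
                real, imag = gabor(region, frequency=f, theta=t)
                magnitude = np.hypot(real, imag)
                feats.extend([magnitude.mean(), magnitude.std()])
        return np.array(feats)

    def rank_by_similarity(query_region, candidate_regions):
        """Return candidate indices sorted by Euclidean distance in feature space."""
        q = gabor_features(query_region)
        dists = [np.linalg.norm(q - gabor_features(r)) for r in candidate_regions]
        return np.argsort(dists)

    rng = np.random.default_rng(0)
    query = rng.random((64, 64))                       # stand-in for a delineated ROI
    candidates = [rng.random((64, 64)) for _ in range(5)]
    print(rank_by_similarity(query, candidates))
    ```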

  14. LETTER TO THE EDITOR: Constant-time solution to the global optimization problem using Brüschweiler's ensemble search algorithm

    NASA Astrophysics Data System (ADS)

    Protopopescu, V.; D'Helon, C.; Barhen, J.

    2003-06-01

    A constant-time solution of the continuous global optimization problem (GOP) is obtained by using an ensemble algorithm. We show that under certain assumptions, the solution can be guaranteed by mapping the GOP onto a discrete unsorted search problem, whereupon Brüschweiler's ensemble search algorithm is applied. For adequate sensitivities of the measurement technique, the query complexity of the ensemble search algorithm depends linearly on the size of the function's domain. Advantages and limitations of an eventual NMR implementation are discussed.

  15. Novel Surveillance of Psychological Distress during the Great Recession

    PubMed Central

    Ayers, John W.; Althouse, Benjamin M.; Allem, Jon-Patrick; Childers, Matthew A.; Zafar, Waleed; Latkin, Carl; Ribisl, Kurt M.; Brownstein, John S.

    2015-01-01

    Background Economic stressors have been retrospectively associated with net population increases in nonspecific psychological distress (PD). However, no sentinels exist to evaluate contemporaneous associations. Aggregate Internet search query surveillance was used to monitor population changes in PD around the United States’ Great Recession. Methods Monthly PD query trends were compared with unemployment, underemployment, homes in delinquency and foreclosure, median home value or sale prices, and S&P 500 trends for 2004–2010. Time series analyses, where economic indicators predicted PD one to seven months into the future, were performed in 2011. Results PD queries surpassed 1,000,000 per month, of which 300,000 may be attributable to the Great Recession. A one percentage point increase in mortgage delinquencies and foreclosures was associated with a 16% (95%CI, 9–24) increase in PD queries one-month, and 11% (95%CI, 3–18) four months later, in reference to a pre-Great Recession mean. Unemployment and underemployment had similar associations half and one-quarter the intensity. “Anxiety disorder,” “what is depression,” “signs of depression,” “depression symptoms,” and “symptoms of depression” were the queries exhibiting the strongest associations with mortgage delinquencies and foreclosures, unemployment or underemployment. Housing prices and S&P 500 trends were not associated with PD queries. Limitations A non-traditional measure of PD was used. It is unclear if actual clinically significant depression or anxiety increased during the Great Recession. Alternative explanations for strong associations between the Great Recession and PD queries, such as media, were explored and rejected. Conclusions Because the economy is constantly changing, this work not only provides a snapshot of recent associations between the economy and PD queries but also a framework and toolkit for real-time surveillance going forward. Health resources, clinician screening patterns, and policy debate may potentially be informed by changes in PD query trends. PMID:22835843

  16. Novel surveillance of psychological distress during the great recession.

    PubMed

    Ayers, John W; Althouse, Benjamin M; Allem, Jon-Patrick; Childers, Matthew A; Zafar, Waleed; Latkin, Carl; Ribisl, Kurt M; Brownstein, John S

    2012-12-15

    Economic stressors have been retrospectively associated with net population increases in nonspecific psychological distress (PD). However, no sentinels exist to evaluate contemporaneous associations. Aggregate Internet search query surveillance was used to monitor population changes in PD around the United States' Great Recession. Monthly PD query trends were compared with unemployment, underemployment, homes in delinquency and foreclosure, median home value or sale prices, and S&P 500 trends for 2004-2010. Time series analyses, where economic indicators predicted PD one to seven months into the future, were performed in 2011. PD queries surpassed 1,000,000 per month, of which 300,000 may be attributable to the Great Recession. A one percentage point increase in mortgage delinquencies and foreclosures was associated with a 16% (95%CI, 9-24) increase in PD queries one-month, and 11% (95%CI, 3-18) four months later, in reference to a pre-Great Recession mean. Unemployment and underemployment had similar associations half and one-quarter the intensity. "Anxiety disorder", "what is depression", "signs of depression", "depression symptoms", and "symptoms of depression" were the queries exhibiting the strongest associations with mortgage delinquencies and foreclosures, unemployment or underemployment. Housing prices and S&P 500 trends were not associated with PD queries. A non-traditional measure of PD was used. It is unclear if actual clinically significant depression or anxiety increased during the Great Recession. Alternative explanations for strong associations between the Great Recession and PD queries, such as media, were explored and rejected. Because the economy is constantly changing, this work not only provides a snapshot of recent associations between the economy and PD queries but also a framework and toolkit for real-time surveillance going forward. Health resources, clinician screening patterns, and policy debate may be informed by changes in PD query trends. Copyright © 2012 Elsevier B.V. All rights reserved.

  17. A natural language interface plug-in for cooperative query answering in biological databases.

    PubMed

    Jamil, Hasan M

    2012-06-11

    One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural language like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schema with an aim to providing cooperative responses to queries tailored to users' interpretations. Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from underlying database schema. The plug-in introduced in this paper is generic and facilitates connecting user selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.

  18. POPE: Partial Order Preserving Encoding

    DTIC Science & Technology

    2016-09-09

    Alex X. Liu, Ann L. Wang, and Bezawada Bruhadeshwar. Fast range query processing with strong privacy protection for cloud computing. Proc. VLDB… …of these schemes the directory in the persistent client storage depends on the full dataset. … We abuse notation and use OPE to refer to both…

  19. They’re heating up: Internet search query trends reveal significant public interest in heat-not-burn tobacco products

    PubMed Central

    Caputi, Theodore L.; Leas, Eric; Dredze, Mark; Cohen, Joanna E.; Ayers, John W.

    2017-01-01

    Heat-not-burn tobacco products, battery powered devices that heat leaf tobacco to approximately 500 degrees Fahrenheit to produce an inhalable aerosol, are being introduced in markets around the world. Japan, where manufacturers have marketed several heat-not-burn brands since 2014, has been the focal national test market, with the intention of developing global marketing strategies. We used Google search query data to estimate, for the first time, the scale and growth potential of heat-not-burn tobacco products. Average monthly searches for heat-not-burn products rose 1,426% (95%CI: 746,3574) between their first (2015) and second (2016) complete years on the market and an additional 100% (95%CI: 60, 173) between the products’ second (2016) and third years on the market (Jan-Sep 2017). There are now between 5.9 and 7.5 million heat-not-burn related Google searches in Japan each month based on September 2017 estimates. Moreover, forecasts relying on the historical trends suggest heat-not-burn searches will increase an additional 32% (95%CI: -4 to 79) during 2018, compared to current estimates for 2017 (Jan-Sep), with continued growth thereafter expected. Contrasting heat-not-burn’s rise in Japan to electronic cigarettes’ rise in the United States, we find searches for heat-not-burn eclipsed electronic cigarette searches during April 2016. Moreover, the change in average monthly queries for heat-not-burn in Japan between 2015 and 2017 was 399 (95% CI: 184, 1490) times larger than the change in average monthly queries for electronic cigarettes in the United States over the same time period, increasing by 2,956% (95% CI: 1729, 7304) compared to only 7% (95% CI: 3,13). Our findings are a clarion call for tobacco control leaders to ready themselves as heat-not-burn tobacco products will likely garner substantial interest as they are introduced into new markets. Public health practitioners should expand heat-not-burn tobacco product surveillance, adjust existing tobacco control strategies to account for heat-not-burn tobacco products, and preemptively study the health risks/benefits, popular perceptions, and health messaging around heat-not-burn tobacco products. PMID:29020019

  20. They're heating up: Internet search query trends reveal significant public interest in heat-not-burn tobacco products.

    PubMed

    Caputi, Theodore L; Leas, Eric; Dredze, Mark; Cohen, Joanna E; Ayers, John W

    2017-01-01

    Heat-not-burn tobacco products, battery powered devices that heat leaf tobacco to approximately 500 degrees Fahrenheit to produce an inhalable aerosol, are being introduced in markets around the world. Japan, where manufacturers have marketed several heat-not-burn brands since 2014, has been the focal national test market, with the intention of developing global marketing strategies. We used Google search query data to estimate, for the first time, the scale and growth potential of heat-not-burn tobacco products. Average monthly searches for heat-not-burn products rose 1,426% (95%CI: 746,3574) between their first (2015) and second (2016) complete years on the market and an additional 100% (95%CI: 60, 173) between the products' second (2016) and third years on the market (Jan-Sep 2017). There are now between 5.9 and 7.5 million heat-not-burn related Google searches in Japan each month based on September 2017 estimates. Moreover, forecasts relying on the historical trends suggest heat-not-burn searches will increase an additional 32% (95%CI: -4 to 79) during 2018, compared to current estimates for 2017 (Jan-Sep), with continued growth thereafter expected. Contrasting heat-not-burn's rise in Japan to electronic cigarettes' rise in the United States, we find searches for heat-not-burn eclipsed electronic cigarette searches during April 2016. Moreover, the change in average monthly queries for heat-not-burn in Japan between 2015 and 2017 was 399 (95% CI: 184, 1490) times larger than the change in average monthly queries for electronic cigarettes in the United States over the same time period, increasing by 2,956% (95% CI: 1729, 7304) compared to only 7% (95% CI: 3,13). Our findings are a clarion call for tobacco control leaders to ready themselves as heat-not-burn tobacco products will likely garner substantial interest as they are introduced into new markets. Public health practitioners should expand heat-not-burn tobacco product surveillance, adjust existing tobacco control strategies to account for heat-not-burn tobacco products, and preemptively study the health risks/benefits, popular perceptions, and health messaging around heat-not-burn tobacco products.

  1. Measuring the development of conceptual understanding in chemistry

    NASA Astrophysics Data System (ADS)

    Claesgens, Jennifer Marie

    The purpose of this dissertation research is to investigate and characterize how students learn chemistry from pre-instruction to deeper understanding of the subject matter in their general chemistry coursework. Based on preliminary work, I believe that students have a general pathway of learning across the "big ideas," or concepts, in chemistry that can be characterized over the course of instruction. My hypothesis is that as students learn chemistry they build from experience and logical reasoning, then relate chemistry-specific ideas in a pair-wise fashion before making more complete multi-relational links for deeper understanding of the subject matter. This proposed progression of student learning, which starts at Notions, moves to Recognition, and then to Formulation, is described in the ChemQuery Perspectives framework. My research continues the development of ChemQuery, an NSF-funded assessment system that uses a framework of the key ideas in the discipline and criterion-referenced analysis using item response theory (IRT) to map student progress. Specifically, this research investigates the potential for using criterion-referenced analysis to describe and measure how students learn chemistry, followed by more detailed task analysis of patterns in student responses found in the data. My research question asks: does IRT work to describe and measure how students learn chemistry and if so, what is discovered about how students learn? Although my findings seem neither to entirely support nor to entirely refute the pathway of student understanding proposed in the ChemQuery Perspectives framework, my research does provide an indication of trouble spots. For example, it seems that the pathway from Notions to Recognition is holding, but there are difficulties around the transition from Recognition to Formulation that cannot be resolved with this data. Nevertheless, this research has produced the following, which has contributed to the development of the ChemQuery assessment system: (a) 13 new change items with good fits and 3 new change items that need further study, (b) a refined scoring guide, and (c) a set of item exemplars that can then be developed further into a computer-adapted model so that more data can be captured.

  2. UGent Participation in the Microblog Track 2012

    DTIC Science & Technology

    2012-11-01

    …original one. A counter example is the query “Steve Jobs’ health” in topic 106. For this query, the user wants to know about the health situation of Steve… (5x), “destroyer” (2x), “creator” (3x), “composed” (2x), and “products” (2x). These terms can form another query like: “Steve Jobs is… Jobs’. However, the returned title terms of the top 30 tweets for this query are “steve” (7x), “jobs” (7x), “is” (2x), “of” (3x), “health” (1x), “apple”…

  3. Hybrid Schema Matching for Deep Web

    NASA Astrophysics Data System (ADS)

    Chen, Kerui; Zuo, Wanli; He, Fengling; Chen, Yongheng

    Schema matching is the process of identifying semantic mappings, or correspondences, between two or more schemas. It is a first and critical step of data integration. For schema matching of the deep web, most research is interested only in the query interface and rarely pays attention to the abundant schema information contained in query result pages. This paper proposes a mixed schema matching technique that combines attributes appearing in the query structures and query results of different data sources and mines the matched schemas within them. Experimental results demonstrate the effectiveness of this method for improving the accuracy of schema matching.
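    The combination of interface and result-page evidence can be illustrated with a toy matcher that pairs attribute labels by string similarity; the labels and the similarity threshold below are invented for illustration and are not the paper's method.

    ```python
    # Toy schema matcher: pair interface attributes with result-page attributes
    # by string similarity, so result pages can confirm or add correspondences.
    from difflib import SequenceMatcher

    interface_attrs   = ["Title", "Author", "Publication Year"]
    result_page_attrs = ["book title", "writer", "year published", "price"]

    def similarity(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def match_schemas(source, target, threshold=0.5):
        matches = []
        for s in source:
            best = max(target, key=lambda t: similarity(s, t))
            score = similarity(s, best)
            if score >= threshold:
                matches.append((s, best, round(score, 2)))
        return matches

    for s, t, score in match_schemas(interface_attrs, result_page_attrs):
        print(f"{s!r} <-> {t!r} (similarity {score})")
    ```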

  4. TreeQ-VISTA: An Interactive Tree Visualization Tool with Functional Annotation Query Capabilities

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gu, Shengyin; Anderson, Iain; Kunin, Victor

    2007-05-07

    Summary: We describe a general multiplatform exploratory tool called TreeQ-Vista, designed for presenting functional annotations in a phylogenetic context. Traits, such as phenotypic and genomic properties, are interactively queried from a relational database with a user-friendly interface which provides a set of tools for users with or without SQL knowledge. The query results are projected onto a phylogenetic tree and can be displayed in multiple color groups. A rich set of browsing, grouping and query tools are provided to facilitate trait exploration, comparison and analysis. Availability: The program, detailed tutorial and examples are available online at http://genome-test.lbl.gov/vista/TreeQVista.

  5. The I4 Online Query Tool for Earth Observations Data

    NASA Technical Reports Server (NTRS)

    Stefanov, William L.; Vanderbloemen, Lisa A.; Lawrence, Samuel J.

    2015-01-01

    The NASA Earth Observation System Data and Information System (EOSDIS) delivers an average of 22 terabytes per day of data collected by orbital and airborne sensor systems to end users through an integrated online search environment (the Reverb/ECHO system). Earth observations data collected by sensors on the International Space Station (ISS) are not currently included in the EOSDIS system, and are only accessible through various individual online locations. This increases the effort required by end users to query multiple datasets, and limits the opportunity for data discovery and innovations in analysis. The Earth Science and Remote Sensing Unit of the Exploration Integration and Science Directorate at NASA Johnson Space Center has collaborated with the School of Earth and Space Exploration at Arizona State University (ASU) to develop the ISS Instrument Integration Implementation (I4) data query tool to provide end users a clean, simple online interface for querying both current and historical ISS Earth Observations data. The I4 interface is based on the Lunaserv and Lunaserv Global Explorer (LGE) open-source software packages developed at ASU for query of lunar datasets. In order to avoid mirroring existing databases - and the need to continually sync/update those mirrors - our design philosophy is for the I4 tool to be a pure query engine only. Once an end user identifies a specific scene or scenes of interest, I4 transparently takes the user to the appropriate online location to download the data. The tool consists of two public-facing web interfaces. The Map Tool provides a graphic geobrowser environment where the end user can navigate to an area of interest and select single or multiple datasets to query. The Map Tool displays active image footprints for the selected datasets (Figure 1). Selecting a footprint will open a pop-up window that includes a browse image and a link to available image metadata, along with a link to the online location to order or download the actual data. Search results are either delivered in the form of browse images linked to the appropriate online database, similar to the Map Tool, or they may be transferred within the I4 environment for display as footprints in the Map Tool. Datasets searchable through I4 (http://eol.jsc.nasa.gov/I4_tool) currently include: Crew Earth Observations (CEO) cataloged and uncataloged handheld astronaut photography; Sally Ride EarthKAM; Hyperspectral Imager for the Coastal Ocean (HICO); and the ISS SERVIR Environmental Research and Visualization System (ISERV). The ISS is a unique platform in that it will have multiple users over its lifetime, and that no single remote sensing system has a permanent internal or external berth. The open source I4 tool is designed to enable straightforward addition of new datasets as they become available such as ISS-RapidSCAT, Cloud Aerosol Transport System (CATS), and the High Definition Earth Viewing (HDEV) system. Data from other sensor systems, such as those operated by the ISS International Partners or under the auspices of the US National Laboratory program, can also be added to I4 provided sufficient access to enable searching of data or metadata is available. Commercial providers of remotely sensed data from the ISS may be particularly interested in I4 as an additional means of directing potential customers and clients to their products.

  6. Visualizing whole-brain DTI tractography with GPU-based Tuboids and LoD management.

    PubMed

    Petrovic, Vid; Fallon, James; Kuester, Falko

    2007-01-01

    Diffusion Tensor Imaging (DTI) of the human brain, coupled with tractography techniques, enables the extraction of large collections of three-dimensional tract pathways per subject. These pathways and pathway bundles represent the connectivity between different brain regions and are critical for the understanding of brain-related diseases. A flexible and efficient GPU-based rendering technique for DTI tractography data is presented that addresses common performance bottlenecks and image-quality issues, allowing interactive render rates to be achieved on commodity hardware. An occlusion query-based pathway LoD management system for streamlines/streamtubes/tuboids is introduced that optimizes input geometry, vertex processing, and fragment processing loads, and helps reduce overdraw. The tuboid, a fully-shaded streamtube impostor constructed entirely on the GPU from streamline vertices, is also introduced. Unlike full streamtubes and other impostor constructs, tuboids require little to no preprocessing or extra space over the original streamline data. The supported fragment processing levels of detail range from texture-based draft shading to full raycast normal computation, Phong shading, environment mapping, and curvature-correct text labeling. The presented text labeling technique for tuboids provides adaptive, aesthetically pleasing labels that appear attached to the surface of the tubes. Furthermore, an occlusion query aggregating and scheduling scheme for tuboids is described that reduces the query overhead. Results for a tractography dataset are presented and demonstrate that LoD-managed tuboids offer benefits over traditional streamtubes both in performance and appearance.

  7. LibKiSAO: a Java library for Querying KiSAO.

    PubMed

    Zhukova, Anna; Adams, Richard; Laibe, Camille; Le Novère, Nicolas

    2012-09-24

    The Kinetic Simulation Algorithm Ontology (KiSAO) supplies information about existing algorithms available for the simulation of Systems Biology models, their characteristics, parameters and inter-relationships. KiSAO enables the unambiguous identification of algorithms from simulation descriptions. Information about analogous methods having similar characteristics and about algorithm parameters incorporated into KiSAO is desirable for simulation tools. To retrieve this information programmatically an application programming interface (API) for KiSAO is needed. We developed libKiSAO, a Java library to enable querying of the KiSA Ontology. It implements methods to retrieve information about simulation algorithms stored in KiSAO, their characteristics and parameters, and methods to query the algorithm hierarchy and search for similar algorithms providing comparable results for the same simulation set-up. Using libKiSAO, simulation tools can make logical inferences based on this knowledge and choose the most appropriate algorithm to perform a simulation. LibKiSAO also enables simulation tools to handle a wider range of simulation descriptions by determining which of the available methods are similar and can be used instead of the one indicated in the simulation description if that one is not implemented. LibKiSAO enables Java applications to easily access information about simulation algorithms, their characteristics and parameters stored in the OWL-encoded Kinetic Simulation Algorithm Ontology. LibKiSAO can be used by simulation description editors and simulation tools to improve reproducibility of computational simulation tasks and facilitate model re-use.

  8. MP3C - the Minor Planet Physical Properties Catalogue: a New VO Service For Multi-database Query

    NASA Astrophysics Data System (ADS)

    Tanga, Paolo; Delbo, M.; Gerakis, J.

    2013-10-01

    In the last few years we witnessed a large growth in the number of asteroids for which we have physical properties. However, these data are dispersed in a multiplicity of catalogs. Extracting data and combining them for further analysis requires custom tools, a situation further complicated by the variety of data sources, some of them standardized (Planetary Data System), others not. With these problems in mind, we created a new Virtual Observatory service named “Minor Planet Physical Properties Catalogue” (abbreviated as MP3C - http://mp3c.oca.eu/). MP3C is not a new database, but rather a portal allowing the user to access selected properties of objects by easy SQL query, even from different sources. At present, such diverse data as orbital parameters, photometric and light curve parameters, sizes and albedos derived by IRAS, AKARI and WISE, SDSS colors, SMASS taxonomy, family membership, satellite data, and stellar occultation results are included. Other data sources will be added in the near future. The physical-properties output of MP3C can be tuned by the user through query criteria based upon ranges of values of the ingested quantities. The resulting list of objects can be used for interactive plots through standard VO tools such as TOPCAT. Also, their ephemerides and visibilities from given sites can be computed. We are targeting full VO compliance for providing a new standardized service to the community.
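
    The snippet below is a minimal Python/SQLite sketch of the kind of range-criteria query across ingested sources that such a portal exposes; the table names, columns, and values are invented for illustration and do not reflect the real MP3C schema.

        # Hypothetical schema: a local SQLite mock of two ingested catalogues,
        # queried with range criteria the way the portal's query interface does.
        import sqlite3

        con = sqlite3.connect(":memory:")
        con.executescript("""
        CREATE TABLE wise_albedo (number INTEGER, albedo REAL);
        CREATE TABLE akari_diameter (number INTEGER, diameter_km REAL);
        INSERT INTO wise_albedo VALUES (1, 0.42), (2, 0.07), (3, 0.24);     -- dummy values
        INSERT INTO akari_diameter VALUES (1, 120.0), (2, 35.5), (3, 410.0); -- dummy values
        """)

        # Range criteria on quantities coming from two different source tables.
        rows = con.execute("""
            SELECT w.number, w.albedo, a.diameter_km
            FROM wise_albedo w JOIN akari_diameter a ON w.number = a.number
            WHERE w.albedo BETWEEN 0.1 AND 0.5
              AND a.diameter_km BETWEEN 50 AND 500
        """).fetchall()
        print(rows)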

  9. Complementary and alternative medicine among veterans and military personnel: a synthesis of population surveys.

    PubMed

    Davis, Margot T; Mulvaney-Day, Norah; Larson, Mary Jo; Hoover, Ronald; Mauch, Danna

    2014-12-01

    Recent reports reinforce the widespread interest in complementary and alternative medicine (CAM), not only among military personnel with combat-related disorders, but also among providers who are pressed to respond to patient demand for these therapies. However, an understanding of utilization of CAM therapies in this population is lacking. The goals of this study are to synthesize the content of self-report population surveys with information on use of CAM in military and veteran populations, assess gaps in knowledge, and suggest ways to address current limitations. The research team conducted a literature review of population surveys to identify CAM definitions, whether military status was queried, the medical and psychological conditions queried, and each specific CAM question. Utilization estimates specific to military/veterans were summarized, and limitations to knowledge were classified. Seven surveys of CAM utilization were conducted with military/veteran groups. In addition, 7 household surveys queried military status, although there was no military/veteran subgroup analysis. Definitions of CAM varied widely, limiting cross-survey analysis. Among active duty and Reserve military, CAM use ranged between 37% and 46%. Survey estimates do not specify CAM use that is associated with a medical or behavioral health condition. Comparisons between surveys are hampered due to variation in methodologies. Too little is known about reasons for using CAM and the conditions for which it is used. Additional information could be drawn from current surveys with additional subgroup analysis, and future surveys of CAM should include a military status variable.

  10. FLASHFLOOD: A 3D Field-based similarity search and alignment method for flexible molecules

    NASA Astrophysics Data System (ADS)

    Pitman, Michael C.; Huber, Wolfgang K.; Horn, Hans; Krämer, Andreas; Rice, Julia E.; Swope, William C.

    2001-07-01

    A three-dimensional field-based similarity search and alignment method for flexible molecules is introduced. The conformational space of a flexible molecule is represented in terms of fragments and torsional angles of allowed conformations. A user-definable property field is used to compute features of fragment pairs. Features are generalizations of CoMMA descriptors (Silverman, B.D. and Platt, D.E., J. Med. Chem., 39 (1996) 2129.) that characterize local regions of the property field by its local moments. The features are invariant under coordinate system transformations. Features taken from a query molecule are used to form alignments with fragment pairs in the database. An assembly algorithm is then used to merge the fragment pairs into full structures, aligned to the query. Key to the method is the use of a context adaptive descriptor scaling procedure as the basis for similarity. This allows the user to tune the weights of the various feature components based on examples relevant to the particular context under investigation. The property fields may range from simple, phenomenological fields, to fields derived from quantum mechanical calculations. We apply the method to the dihydrofolate/methotrexate benchmark system, and show that when one injects relevant contextual information into the descriptor scaling procedure, better results are obtained more efficiently. We also show how the method works and include computer times for a query from a database that represents approximately 23 million conformers of seventeen flexible molecules.
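
    A hedged Python sketch of the context-adaptive descriptor scaling idea: feature weights are derived from user-supplied context examples and then used in a weighted similarity ranking. The weighting rule and data here are illustrative assumptions, not the paper's actual procedure.

        # Illustrative only: weight feature components according to how consistent
        # they are across examples the user marks as relevant to the context,
        # then rank database descriptors by a weighted similarity to the query.
        import numpy as np

        def context_weights(relevant_features: np.ndarray, eps: float = 1e-9) -> np.ndarray:
            # Down-weight components that vary a lot across the relevant examples.
            return 1.0 / (relevant_features.var(axis=0) + eps)

        def weighted_similarity(query: np.ndarray, candidate: np.ndarray, w: np.ndarray) -> float:
            return float(-np.sum(w * (query - candidate) ** 2))   # higher = more similar

        rng = np.random.default_rng(0)
        db = rng.normal(size=(100, 8))        # 100 fragment-pair descriptors, 8 features (synthetic)
        relevant = rng.normal(size=(5, 8))    # context examples supplied by the user (synthetic)
        w = context_weights(relevant)
        query_vec = rng.normal(size=8)
        ranked = sorted(range(len(db)),
                        key=lambda i: weighted_similarity(query_vec, db[i], w),
                        reverse=True)
        print(ranked[:5])                     # indices of the five most similar descriptors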

  11. Classification of Automated Search Traffic

    NASA Astrophysics Data System (ADS)

    Buehrer, Greg; Stokes, Jack W.; Chellapilla, Kumar; Platt, John C.

    As web search providers seek to improve both relevance and response times, they are challenged by the ever-increasing tax of automated search query traffic. Third party systems interact with search engines for a variety of reasons, such as monitoring a web site’s rank, augmenting online games, or possibly to maliciously alter click-through rates. In this paper, we investigate automated traffic (sometimes referred to as bot traffic) in the query stream of a large search engine provider. We define automated traffic as any search query not generated by a human in real time. We first provide examples of different categories of query logs generated by automated means. We then develop many different features that distinguish between queries generated by people searching for information, and those generated by automated processes. We categorize these features into two classes, either an interpretation of the physical model of human interactions, or as behavioral patterns of automated interactions. Using these detection features, we next classify the query stream using multiple binary classifiers. In addition, a multiclass classifier is then developed to identify subclasses of both normal and automated traffic. An active learning algorithm is used to suggest which user sessions to label to improve the accuracy of the multiclass classifier, while also seeking to discover new classes of automated traffic. Performance analyses are then provided. Finally, the multiclass classifier is used to predict the subclass distribution for the search query stream.
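
    The following Python sketch illustrates the general setup of session-level features feeding a binary human-versus-automated classifier; the features, synthetic data, and random-forest choice are assumptions for illustration rather than the classifiers used in the paper.

        # Illustrative sketch only: session-level features (query rate, click ratio,
        # distinct terms) fed to a binary classifier, loosely mirroring the setup above.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(1)
        n = 2000
        # Synthetic features per session: queries/minute, fraction of queries with clicks,
        # number of distinct query terms.
        human = np.column_stack([rng.gamma(2, 1, n), rng.beta(5, 2, n), rng.poisson(6, n)])
        bot = np.column_stack([rng.gamma(9, 2, n), rng.beta(1, 8, n), rng.poisson(2, n)])
        X = np.vstack([human, bot])
        y = np.r_[np.zeros(n), np.ones(n)]          # 0 = human, 1 = automated

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
        print("held-out accuracy:", clf.score(X_te, y_te))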

  12. Meshable: searching PubMed abstracts by utilizing MeSH and MeSH-derived topical terms.

    PubMed

    Kim, Sun; Yeganova, Lana; Wilbur, W John

    2016-10-01

    Medical Subject Headings (MeSH®) is a controlled vocabulary for indexing and searching biomedical literature. MeSH terms and subheadings are organized in a hierarchical structure and are used to indicate the topics of an article. Biologists can use either MeSH terms as queries or the MeSH interface provided in PubMed® for searching PubMed abstracts. However, these are rarely used, and there is no convenient way to link standardized MeSH terms to user queries. Here, we introduce a web interface which allows users to enter queries to find MeSH terms closely related to the queries. Our method relies on co-occurrence of text words and MeSH terms to find keywords that are related to each MeSH term. A query is then matched with the keywords for MeSH terms, and candidate MeSH terms are ranked based on their relatedness to the query. The experimental results show that our method achieves the best performance among several term extraction approaches in terms of topic coherence. Moreover, the interface can be effectively used to find full names of abbreviations and to disambiguate user queries. Availability: https://www.ncbi.nlm.nih.gov/IRET/MESHABLE/ (contact: sun.kim@nih.gov). Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
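
    A toy Python sketch of the underlying idea: each MeSH term carries a set of co-occurrence-derived keywords, and candidate terms are ranked by how strongly a query's tokens match those keywords. The keyword weights shown are invented, not real MeSH data.

        # Toy illustration: rank MeSH terms by summed keyword weights matching the query.
        mesh_keywords = {                      # hypothetical weights, not real MeSH associations
            "Myocardial Infarction": {"heart": 2.0, "attack": 1.5, "infarct": 3.0},
            "Migraine Disorders":    {"headache": 2.5, "migraine": 3.0, "aura": 1.2},
        }

        def rank_mesh(query: str, top_k: int = 5):
            tokens = query.lower().split()
            scores = {term: sum(kw.get(t, 0.0) for t in tokens)
                      for term, kw in mesh_keywords.items()}
            return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

        print(rank_mesh("severe heart attack"))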

  13. Web queries as a source for syndromic surveillance.

    PubMed

    Hulth, Anette; Rydevik, Gustaf; Linde, Annika

    2009-01-01

    In the field of syndromic surveillance, various sources are exploited for outbreak detection, monitoring and prediction. This paper describes a study on queries submitted to a medical web site, with influenza as a case study. The hypothesis of the work was that queries on influenza and influenza-like illness would provide a basis for the estimation of the timing of the peak and the intensity of the yearly influenza outbreaks that would be as good as the existing laboratory and sentinel surveillance. We calculated the occurrence of various queries related to influenza from search logs submitted to a Swedish medical web site for two influenza seasons. These figures were subsequently used to generate two models, one to estimate the number of laboratory verified influenza cases and one to estimate the proportion of patients with influenza-like illness reported by selected General Practitioners in Sweden. We applied an approach designed for highly correlated data, partial least squares regression. In our work, we found that certain web queries on influenza follow the same pattern as that obtained by the two other surveillance systems for influenza epidemics, and that they have equal power for the estimation of the influenza burden in society. Web queries give unique access to ill individuals who are not (yet) seeking care. This paper shows the potential of web queries as an accurate, cheap and labour-efficient source for syndromic surveillance.
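
    A minimal sketch of the modelling step with scikit-learn's partial least squares regression, using synthetic weekly query counts in place of the Swedish surveillance data.

        # Minimal sketch: weekly counts of influenza-related queries regressed against
        # laboratory-verified case counts with PLS. All data below are synthetic.
        import numpy as np
        from sklearn.cross_decomposition import PLSRegression

        rng = np.random.default_rng(2)
        weeks, n_queries = 52, 20
        X = rng.poisson(30, size=(weeks, n_queries)).astype(float)   # query counts per week
        y = X[:, :5].sum(axis=1) * 0.8 + rng.normal(0, 5, weeks)     # lab-verified cases (synthetic)

        pls = PLSRegression(n_components=2).fit(X, y)
        print("R^2 on training data:", pls.score(X, y))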

  14. An Evaluation of the Interactive Query Expansion in an Online Library Catalogue with a Graphical User Interface.

    ERIC Educational Resources Information Center

    Hancock-Beaulieu, Micheline; And Others

    1995-01-01

    An online library catalog was used to evaluate an interactive query expansion facility based on relevance feedback for the Okapi probabilistic term-weighting retrieval system. A graphical user interface allowed searchers to select candidate terms extracted from relevant retrieved items to reformulate queries. Results suggested that the…

  15. A New Framework for Textual Information Mining over Parse Trees. CRESST Report 805

    ERIC Educational Resources Information Center

    Mousavi, Hamid; Kerr, Deirdre; Iseli, Markus R.

    2011-01-01

    Textual information mining is a challenging problem that has resulted in the creation of many different rule-based linguistic query languages. However, these languages generally are not optimized for the purpose of text mining. In other words, they usually consider queries as individuals and only return raw results for each query. Moreover they…

  16. Query Classification and Study of University Students' Search Trends

    ERIC Educational Resources Information Center

    Maabreh, Majdi A.; Al-Kabi, Mohammed N.; Alsmadi, Izzat M.

    2012-01-01

    Purpose: This study is an attempt to develop an automatic identification method for Arabic web queries and divide them into several query types using data mining. In addition, it seeks to evaluate the impact of the academic environment on using the internet. Design/methodology/approach: The web log files were collected from one of the higher…

  17. Searching the Web: The Public and Their Queries.

    ERIC Educational Resources Information Center

    Spink, Amanda; Wolfram, Dietmar; Jansen, Major B. J.; Saracevic, Tefko

    2001-01-01

    Reports findings from a study of searching behavior by over 200,000 users of the Excite search engine. Analysis of over one million queries revealed most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features. Concludes that Web searching by the public differs significantly from searching of…

  18. The Limitations of Term Co-Occurrence Data for Query Expansion in Document Retrieval Systems.

    ERIC Educational Resources Information Center

    Peat, Helen J.; Willett, Peter

    1991-01-01

    Identifies limitations in the use of term co-occurrence data as a basis for automatic query expansion in natural language document retrieval systems. The use of similarity coefficients to calculate the degree of similarity between pairs of terms is explained, and frequency and discriminatory characteristics for nearest neighbors of query terms are…

  19. User Oriented Platform for Data Analytics in Medical Imaging Repositories.

    PubMed

    Valerio, Miguel; Godinho, Tiago Marques; Costa, Carlos

    2016-01-01

    The production of medical imaging studies and associated data has been growing in the last decades. Their primary use is to support medical diagnosis and treatment processes. However, the secondary use of the tremendous amount of stored data is generally more limited. Nowadays, medical imaging repositories have turned into rich databanks holding not only the images themselves, but also a wide range of metadata related to the medical practice. Exploring these repositories through data analysis and business intelligence techniques has the potential of increasing the efficiency and quality of the medical practice. Nevertheless, the continuous production of tremendous amounts of data makes their analysis by conventional approaches difficult. This article proposes a novel automated methodology to derive knowledge from medical imaging repositories that does not disrupt the regular medical practice. Our method is able to apply statistical analysis and business intelligence techniques directly on top of live institutional repositories. It is a Web-based solution that provides extensive dashboard capabilities, including complete charting and reporting options, combined with data mining components. Moreover, it enables the operator to set a wide range of query parameters and operators through an intuitive graphical interface.

  20. Analyzing locomotion synthesis with feature-based motion graphs.

    PubMed

    Mahmudi, Mentar; Kallmann, Marcelo

    2013-05-01

    We propose feature-based motion graphs for realistic locomotion synthesis among obstacles. Among several advantages, feature-based motion graphs achieve improved results in search queries, eliminate the need of postprocessing for foot skating removal, and reduce the computational requirements in comparison to traditional motion graphs. Our contributions are threefold. First, we show that choosing transitions based on relevant features significantly reduces graph construction time and leads to improved search performances. Second, we employ a fast channel search method that confines the motion graph search to a free channel with guaranteed clearance among obstacles, achieving faster and improved results that avoid expensive collision checking. Lastly, we present a motion deformation model based on Inverse Kinematics applied over the transitions of a solution branch. Each transition is assigned a continuous deformation range that does not exceed the original transition cost threshold specified by the user for the graph construction. The obtained deformation improves the reachability of the feature-based motion graph and in turn also reduces the time spent during search. The results obtained by the proposed methods are evaluated and quantified, and they demonstrate significant improvements in comparison to traditional motion graph techniques.

  1. Quantifying killing of orangutans and human-orangutan conflict in Kalimantan, Indonesia.

    PubMed

    Meijaard, Erik; Buchori, Damayanti; Hadiprakarsa, Yokyok; Utami-Atmoko, Sri Suci; Nurcahyo, Anton; Tjiu, Albertus; Prasetyo, Didik; Nardiyono; Christie, Lenny; Ancrenaz, Marc; Abadi, Firman; Antoni, I Nyoman Gede; Armayadi, Dedy; Dinato, Adi; Ella; Gumelar, Pajar; Indrawan, Tito P; Kussaritano; Munajat, Cecep; Priyono, C Wawan Puji; Purwanto, Yadi; Puspitasari, Dewi; Putra, M Syukur Wahyu; Rahmat, Abdi; Ramadani, Harri; Sammy, Jim; Siswanto, Dedi; Syamsuri, Muhammad; Andayani, Noviar; Wu, Huanhuan; Wells, Jessie Anne; Mengersen, Kerrie

    2011-01-01

    Human-orangutan conflict and hunting are thought to pose a serious threat to orangutan existence in Kalimantan, the Indonesian part of Borneo. No data existed prior to the present study to substantiate these threats. We investigated the rates, spatial distribution and causes of conflict and hunting through an interview-based survey in the orangutan's range in Kalimantan, Indonesia. Between April 2008 and September 2009, we interviewed 6983 respondents in 687 villages to obtain socio-economic information, assess knowledge of local wildlife in general and orangutan encounters specifically, and to query respondents about their knowledge on orangutan conflicts and killing, and relevant laws. This survey revealed estimated killing rates of between 750 and 1800 animals killed in the last year, and between 1950 and 3100 animals killed per year on average within the lifetime of the survey respondents. These killing rates are higher than previously thought and are high enough to pose a serious threat to the continued existence of orangutans in Kalimantan. Importantly, the study contributes to our understanding of the spatial variation in threats, and the underlying causes of those threats, which can be used to facilitate the development of targeted conservation management.

  2. Quantifying Killing of Orangutans and Human-Orangutan Conflict in Kalimantan, Indonesia

    PubMed Central

    Meijaard, Erik; Buchori, Damayanti; Hadiprakarsa, Yokyok; Utami-Atmoko, Sri Suci; Nurcahyo, Anton; Tjiu, Albertus; Prasetyo, Didik; Nardiyono; Christie, Lenny; Ancrenaz, Marc; Abadi, Firman; Antoni, I Nyoman Gede; Armayadi, Dedy; Dinato, Adi; Ella; Gumelar, Pajar; Indrawan, Tito P.; Kussaritano; Munajat, Cecep; Priyono, C. Wawan Puji; Purwanto, Yadi; Puspitasari, Dewi; Putra, M. Syukur Wahyu; Rahmat, Abdi; Ramadani, Harri; Sammy, Jim; Siswanto, Dedi; Syamsuri, Muhammad; Andayani, Noviar; Wu, Huanhuan; Wells, Jessie Anne; Mengersen, Kerrie

    2011-01-01

    Human-orangutan conflict and hunting are thought to pose a serious threat to orangutan existence in Kalimantan, the Indonesian part of Borneo. No data existed prior to the present study to substantiate these threats. We investigated the rates, spatial distribution and causes of conflict and hunting through an interview-based survey in the orangutan's range in Kalimantan, Indonesia. Between April 2008 and September 2009, we interviewed 6983 respondents in 687 villages to obtain socio-economic information, assess knowledge of local wildlife in general and orangutan encounters specifically, and to query respondents about their knowledge on orangutan conflicts and killing, and relevant laws. This survey revealed estimated killing rates of between 750 and 1800 animals killed in the last year, and between 1950 and 3100 animals killed per year on average within the lifetime of the survey respondents. These killing rates are higher than previously thought and are high enough to pose a serious threat to the continued existence of orangutans in Kalimantan. Importantly, the study contributes to our understanding of the spatial variation in threats, and the underlying causes of those threats, which can be used to facilitate the development of targeted conservation management. PMID:22096582

  3. Computing health quality measures using Informatics for Integrating Biology and the Bedside.

    PubMed

    Klann, Jeffrey G; Murphy, Shawn N

    2013-04-19

    The Health Quality Measures Format (HQMF) is a Health Level 7 (HL7) standard for expressing computable Clinical Quality Measures (CQMs). Creating tools to process HQMF queries in clinical databases will become increasingly important as the United States moves forward with its Health Information Technology Strategic Plan to Stages 2 and 3 of the Meaningful Use incentive program (MU2 and MU3). Informatics for Integrating Biology and the Bedside (i2b2) is one of the analytical databases used as part of the Office of the National Coordinator (ONC)'s Query Health platform to move toward this goal. Our goal is to integrate i2b2 with the Query Health HQMF architecture, to prepare for other HQMF use-cases (such as MU2 and MU3), and to articulate the functional overlap between i2b2 and HQMF. Therefore, we analyze the structure of HQMF, and then we apply this understanding to HQMF computation on the i2b2 clinical analytical database platform. Specifically, we develop a translator between two query languages, HQMF and i2b2, so that the i2b2 platform can compute HQMF queries. We use the HQMF structure of queries for aggregate reporting, which define clinical data elements and the temporal and logical relationships between them. We use the i2b2 XML format, which allows flexible querying of a complex clinical data repository in an easy-to-understand domain-specific language. The translator can represent nearly any i2b2-XML query as HQMF and execute in i2b2 nearly any HQMF query expressible in i2b2-XML. This translator is part of the freely available reference implementation of the QueryHealth initiative. We analyze limitations of the conversion and find it covers many, but not all, of the complex temporal and logical operators required by quality measures. HQMF is an expressive language for defining quality measures, and it will be important to understand and implement for CQM computation, in both meaningful use and population health. However, its current form might allow complexity that is intractable for current database systems (both in terms of implementation and computation). Our translator, which supports the subset of HQMF currently expressible in i2b2-XML, may represent the beginnings of a practical compromise. It is being pilot-tested in two Query Health demonstration projects, and it can be further expanded to balance computational tractability with the advanced features needed by measure developers.
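
    To make the translation idea concrete, the sketch below maps a single, grossly simplified quality-measure criterion onto a panel-style query element; neither structure is real HQMF or real i2b2-XML, both are hypothetical stand-ins for illustration only.

        # Grossly simplified, hypothetical structures: NOT real HQMF or i2b2 XML,
        # only an illustration of translating a declarative criterion into a query panel.
        criterion = {
            "element": "HbA1c result",
            "value": {"op": ">", "threshold": 9.0},
            "timing": {"relative_to": "measurement period", "relation": "during"},
        }

        def to_panel(c):
            """Map one quality-measure criterion onto a panel-style query element."""
            return {
                "panel": {
                    "item_key": c["element"],
                    "constrain_by_value": {
                        "operator": c["value"]["op"],
                        "value": c["value"]["threshold"],
                    },
                    "constrain_by_date": c["timing"]["relation"] + " " + c["timing"]["relative_to"],
                }
            }

        print(to_panel(criterion))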

  4. Computing Health Quality Measures Using Informatics for Integrating Biology and the Bedside

    PubMed Central

    Murphy, Shawn N

    2013-01-01

    Background The Health Quality Measures Format (HQMF) is a Health Level 7 (HL7) standard for expressing computable Clinical Quality Measures (CQMs). Creating tools to process HQMF queries in clinical databases will become increasingly important as the United States moves forward with its Health Information Technology Strategic Plan to Stages 2 and 3 of the Meaningful Use incentive program (MU2 and MU3). Informatics for Integrating Biology and the Bedside (i2b2) is one of the analytical databases used as part of the Office of the National Coordinator (ONC)’s Query Health platform to move toward this goal. Objective Our goal is to integrate i2b2 with the Query Health HQMF architecture, to prepare for other HQMF use-cases (such as MU2 and MU3), and to articulate the functional overlap between i2b2 and HQMF. Therefore, we analyze the structure of HQMF, and then we apply this understanding to HQMF computation on the i2b2 clinical analytical database platform. Specifically, we develop a translator between two query languages, HQMF and i2b2, so that the i2b2 platform can compute HQMF queries. Methods We use the HQMF structure of queries for aggregate reporting, which define clinical data elements and the temporal and logical relationships between them. We use the i2b2 XML format, which allows flexible querying of a complex clinical data repository in an easy-to-understand domain-specific language. Results The translator can represent nearly any i2b2-XML query as HQMF and execute in i2b2 nearly any HQMF query expressible in i2b2-XML. This translator is part of the freely available reference implementation of the QueryHealth initiative. We analyze limitations of the conversion and find it covers many, but not all, of the complex temporal and logical operators required by quality measures. Conclusions HQMF is an expressive language for defining quality measures, and it will be important to understand and implement for CQM computation, in both meaningful use and population health. However, its current form might allow complexity that is intractable for current database systems (both in terms of implementation and computation). Our translator, which supports the subset of HQMF currently expressible in i2b2-XML, may represent the beginnings of a practical compromise. It is being pilot-tested in two Query Health demonstration projects, and it can be further expanded to balance computational tractability with the advanced features needed by measure developers. PMID:23603227

  5. Adverse Reactions Associated With Cannabis Consumption as Evident From Search Engine Queries.

    PubMed

    Yom-Tov, Elad; Lev-Ran, Shaul

    2017-10-26

    Cannabis is one of the most widely used psychoactive substances worldwide, but adverse drug reactions (ADRs) associated with its use are difficult to study because of its prohibited status in many countries. Internet search engine queries have been used to investigate ADRs in pharmaceutical drugs. In this proof-of-concept study, we tested whether these queries can be used to detect the adverse reactions of cannabis use. We analyzed anonymized queries from US-based users of Bing, a widely used search engine, made over a period of 6 months and compared the results with the prevalence of cannabis use as reported in the US National Survey on Drug Use and Health (NSDUH) and with ADRs reported in the Food and Drug Administration's Adverse Drug Reporting System. Predicted prevalence of cannabis use was estimated from the fraction of people making queries about cannabis, marijuana, and 121 additional synonyms. Predicted ADRs were estimated from queries containing layperson descriptions matching a list of 195 ICD-10 symptoms. Our results indicated that the predicted prevalence of cannabis use at the US census regional level reaches an R² of .71 against NSDUH data. Queries for ADRs made by people who also searched for cannabis reveal many of the known adverse effects of cannabis (eg, cough and psychotic symptoms), as well as plausible unknown reactions (eg, pyrexia). These results indicate that search engine queries can serve as an important tool for the study of adverse reactions of illicit drugs, which are difficult to study in other settings. ©Elad Yom-Tov, Shaul Lev-Ran. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 26.10.2017.
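
    A small sketch of the regional comparison described above: the fraction of users issuing cannabis-related queries per region is correlated with survey prevalence and summarised by R². All numbers are invented for illustration.

        # Sketch of the comparison: per-region query fraction vs. survey prevalence.
        import numpy as np

        query_fraction = np.array([0.031, 0.046, 0.027, 0.052])      # share of users per region (made up)
        survey_prevalence = np.array([0.085, 0.120, 0.078, 0.135])   # survey-style estimates (made up)

        r = np.corrcoef(query_fraction, survey_prevalence)[0, 1]
        print("R^2 =", round(r ** 2, 3))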

  6. An SQL query generator for CLIPS

    NASA Technical Reports Server (NTRS)

    Snyder, James; Chirica, Laurian

    1990-01-01

    As expert systems become more widely used, their access to large amounts of external information becomes increasingly important. This information exists in several forms such as statistical, tabular data, knowledge gained by experts, and large databases of information maintained by companies. Because many expert systems, including CLIPS, do not provide access to this external information, much of the usefulness of expert systems is left untapped. The scope of this paper is to describe a database extension for the CLIPS expert system shell. The current industry standard database language is SQL. Due to SQL standardization, large amounts of information stored on various computers, potentially at different locations, will be more easily accessible. Expert systems should be able to directly access these existing databases rather than requiring information to be re-entered into the expert system environment. The ORACLE relational database management system (RDBMS) was used to provide a database connection within the CLIPS environment. To facilitate relational database access, a query generation system was developed as a CLIPS user function. The queries are entered in a CLIPS-like syntax and are passed to the query generator, which constructs an SQL query and submits it to the ORACLE RDBMS for execution. The query results are asserted as CLIPS facts. The query generator was developed primarily for use within the ICADS project (Intelligent Computer Aided Design System) currently being developed by the CAD Research Unit at California Polytechnic State University (Cal Poly). In ICADS, there are several parallel or distributed expert systems accessing a common knowledge base of facts. Each expert system has a narrow domain of interest and therefore needs only certain portions of the information. The query generator provides a common method of accessing this information and allows the expert system to specify what data is needed without specifying how to retrieve it.
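
    The sketch below shows, in miniature and in Python rather than C/CLIPS, how a CLIPS-like structured query might be turned into an SQL SELECT string; the query shape and schema are hypothetical, not the ICADS query generator itself.

        # Hypothetical miniature of the idea: a CLIPS-like structured query is turned
        # into an SQL SELECT; in the real system the results would then be asserted as facts.
        def generate_sql(query):
            # query example: ("select", ["name", "gpa"], "students", ("gpa", ">", 3.5))
            _, columns, table, (col, op, value) = query
            return "SELECT {} FROM {} WHERE {} {} {};".format(
                ", ".join(columns), table, col, op, repr(value))

        print(generate_sql(("select", ["name", "gpa"], "students", ("gpa", ">", 3.5))))
        # -> SELECT name, gpa FROM students WHERE gpa > 3.5;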

  7. A Coding Method for Efficient Subgraph Querying on Vertex- and Edge-Labeled Graphs

    PubMed Central

    Zhu, Lei; Song, Qinbao; Guo, Yuchen; Du, Lei; Zhu, Xiaoyan; Wang, Guangtao

    2014-01-01

    Labeled graphs are widely used to model complex data in many domains, so subgraph querying has been attracting more and more attention from researchers around the world. Unfortunately, subgraph querying is very time consuming since it involves subgraph isomorphism testing that is known to be an NP-complete problem. In this paper, we propose a novel coding method for subgraph querying that is based on Laplacian spectrum and the number of walks. Our method follows the filtering-and-verification framework and works well on graph databases with frequent updates. We also propose novel two-step filtering conditions that can filter out most false positives and prove that the two-step filtering conditions satisfy the no-false-negative requirement (no dismissal in answers). Extensive experiments on both real and synthetic graphs show that, compared with six existing counterpart methods, our method can effectively improve the efficiency of subgraph querying. PMID:24853266
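
    A hedged sketch of the filtering-and-verification framework using networkx: a cheap spectral necessary-condition filter prunes candidates before exact subgraph isomorphism testing. The filter here uses only the largest Laplacian eigenvalue and is far simpler than the paper's coding method.

        # Filtering-and-verification sketch: prune with cheap necessary conditions,
        # then run exact (NP-hard) induced subgraph isomorphism on the survivors.
        import networkx as nx
        from networkx.algorithms import isomorphism

        def maybe_contains(g, q):
            # Necessary conditions: enough nodes/edges, and the largest Laplacian
            # eigenvalue of the candidate is no smaller than that of the query.
            if g.number_of_nodes() < q.number_of_nodes() or g.number_of_edges() < q.number_of_edges():
                return False
            return max(nx.laplacian_spectrum(g)) >= max(nx.laplacian_spectrum(q)) - 1e-9

        def query_db(database, q):
            candidates = [g for g in database if maybe_contains(g, q)]              # filtering
            return [g for g in candidates
                    if isomorphism.GraphMatcher(g, q).subgraph_is_isomorphic()]      # verification

        db = [nx.path_graph(4), nx.complete_graph(5), nx.cycle_graph(6)]
        print(len(query_db(db, nx.path_graph(3))))   # graphs containing the query as induced subgraph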

  8. Learning from the U.S. Department of Veterans Affairs Quality Enhancement Research Initiative: QUERI Series

    PubMed Central

    Graham, Ian D; Tetroe, Jacqueline

    2009-01-01

    As the recent collection of papers from the Quality Enhancement Research Initiative (QUERI) Series indicates, knowledge is leading to considerable action in the United States (U.S.) Department of Veterans Affairs (VA). The QUERI Series offers clinical researchers, implementation scientists, health systems, and health research funders from around the globe a unique window into both the practice and science of implementation or knowledge translation (KT) in the VA. By describing successes and challenges as well as setbacks and disappointments, the QUERI Series is all the more useful. From the vantage point of Canadian KT researchers and officials at a national health research funding agency, we offer a number of observations and lessons that can be learned from QUERI. "Knowledge, if it does not determine action, is dead to us." Plotinus (Roman philosopher 205AD-270AD) PMID:19267920

  9. QRFXFreeze: Queryable Compressor for RFX.

    PubMed

    Senthilkumar, Radha; Nandagopal, Gomathi; Ronald, Daphne

    2015-01-01

    The verbose nature of XML has long been discussed, and many compression techniques for XML data have been devised over the years. Some of the techniques incorporate support for querying the XML database in its compressed format, while others have to be decompressed before they can be queried. XML compression schemes that support direct, immediate querying with no compromise in query time are forced to compromise on space. In this paper, we propose the compressor QRFXFreeze, which not only reduces storage space but also supports efficient querying. The compressor does this without decompressing the compressed XML file. The compressor supports all kinds of XML documents along with insert, update, and delete operations. The forte of QRFXFreeze is that the textual data are semantically compressed and are indexed to reduce the querying time. Experimental results show that the proposed compressor performs much better than other well-known compressors.
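
    As a toy illustration of query-friendly compression (not the QRFXFreeze algorithm itself), the sketch below dictionary-encodes repeated element text and answers equality queries directly on the codes, without reconstructing the document.

        # Toy illustration: dictionary-encode repeated text values and query the codes.
        records = ["malaria", "dengue", "malaria", "cholera", "malaria", "dengue"]

        # "Compress": store each distinct string once plus a list of small integer codes.
        dictionary = {v: i for i, v in enumerate(dict.fromkeys(records))}
        codes = [dictionary[v] for v in records]

        def query_equals(value):
            # Answer an equality query on the compressed representation.
            code = dictionary.get(value)
            return [i for i, c in enumerate(codes) if c == code] if code is not None else []

        print(query_equals("malaria"))   # positions of matching elements: [0, 2, 4]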

  10. Practical private database queries based on a quantum-key-distribution protocol

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jakobi, Markus; Humboldt-Universitaet zu Berlin, D-10117 Berlin; Simon, Christoph

    2011-02-15

    Private queries allow a user, Alice, to learn an element of a database held by a provider, Bob, without revealing which element she is interested in, while limiting her information about the other elements. We propose to implement private queries based on a quantum-key-distribution protocol, with changes only in the classical postprocessing of the key. This approach makes our scheme both easy to implement and loss tolerant. While unconditionally secure private queries are known to be impossible, we argue that an interesting degree of security can be achieved by relying on fundamental physical principles instead of unverifiable security assumptions in order to protect both the user and the database. We think that the scope exists for such practical private queries to become another remarkable application of quantum information in the footsteps of quantum key distribution.

  11. Data Processing on Database Management Systems with Fuzzy Query

    NASA Astrophysics Data System (ADS)

    Şimşek, Irfan; Topuz, Vedat

    In this study, a fuzzy query tool (SQLf) for non-fuzzy database management systems was developed. In addition, samples of fuzzy queries were made by using real data with the tool developed in this study. Performance of SQLf was tested with data about Marmara University students' food grants. The food grant data were collected in a MySQL database by using a form filled in on the web. The students filled in a form on the web to describe their social and economic conditions for the food grant request. This form consists of questions which have fuzzy and crisp answers. The main purpose of this fuzzy query is to determine the students who deserve the grant. SQLf easily found the students eligible for the grant through predefined fuzzy values. The fuzzy query tool (SQLf) can also be used easily with other database systems such as Oracle and SQL Server.
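
    A minimal sketch of a fuzzy selection of the kind described: a crisp attribute is mapped to a membership degree and rows above a threshold qualify. The membership breakpoints and data are invented for illustration.

        # Minimal fuzzy-selection sketch: crisp income values are mapped to a
        # membership degree for "low income"; rows above a threshold qualify.
        def low_income_membership(income):
            if income <= 1000:
                return 1.0
            if income >= 3000:
                return 0.0
            return (3000 - income) / 2000.0        # linear ramp between invented breakpoints

        students = [("Ali", 800), ("Ayse", 1800), ("Mehmet", 2900)]
        eligible = [(name, round(low_income_membership(inc), 2))
                    for name, inc in students
                    if low_income_membership(inc) >= 0.5]
        print(eligible)                            # [('Ali', 1.0), ('Ayse', 0.6)]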

  12. An intelligent user interface for browsing satellite data catalogs

    NASA Technical Reports Server (NTRS)

    Cromp, Robert F.; Crook, Sharon

    1989-01-01

    A large scale domain-independent spatial data management expert system that serves as a front-end to databases containing spatial data is described. This system is unique for two reasons. First, it uses spatial search techniques to generate a list of all the primary keys that fall within a user's spatial constraints prior to invoking the database management system, thus substantially decreasing the amount of time required to answer a user's query. Second, a domain-independent query expert system uses a domain-specific rule base to preprocess the user's English query, effectively mapping a broad class of queries into a smaller subset that can be handled by a commercial natural language processing system. The methods used by the spatial search module and the query expert system are explained, and the system architecture for the spatial data management expert system is described. The system is applied to data from the International Ultraviolet Explorer (IUE) satellite, and results are given.
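
    The two-stage idea can be sketched as follows: a cheap spatial pass collects the primary keys inside the region of interest, and only those keys are passed to the database query. The catalog layout and coordinates below are illustrative assumptions, not the IUE system itself.

        # Sketch of the two-stage approach: spatial prefilter first, database query second.
        catalog = [  # (primary_key, right_ascension_deg, declination_deg) -- invented entries
            ("IUE_0001", 10.2, -5.1),
            ("IUE_0002", 11.0, -4.7),
            ("IUE_0003", 250.3, 36.4),
        ]

        def keys_in_region(ra_min, ra_max, dec_min, dec_max):
            return [pk for pk, ra, dec in catalog
                    if ra_min <= ra <= ra_max and dec_min <= dec <= dec_max]

        keys = keys_in_region(9.0, 12.0, -6.0, -4.0)
        sql = "SELECT * FROM observations WHERE id IN ({});".format(
            ", ".join(repr(k) for k in keys))
        print(sql)   # only the spatially matching keys reach the database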

  13. ESTminer: a Web interface for mining EST contig and cluster databases.

    PubMed

    Huang, Yecheng; Pumphrey, Janie; Gingle, Alan R

    2005-03-01

    ESTminer is a Web application and database schema for interactive mining of expressed sequence tag (EST) contig and cluster datasets. The Web interface contains a query frame that allows the selection of contigs/clusters with specific cDNA library makeup or a threshold number of members. The results are displayed as color-coded tree nodes, where the color indicates the fractional size of each cDNA library component. The nodes are expandable, revealing library statistics as well as EST or contig members, with links to sequence data, GenBank records or user configurable links. Also, the interface allows 'queries within queries' where the result set of a query is further filtered by the subsequent query. ESTminer is implemented in Java/JSP and the package, including MySQL and Oracle schema creation scripts, is available from http://cggc.agtec.uga.edu/Data/download.asp (contact: agingle@uga.edu).

  14. Classifying Clinical Trial Eligibility Criteria to Facilitate Phased Cohort Identification Using Clinical Data Repositories.

    PubMed

    Wang, Amy Y; Lancaster, William J; Wyatt, Matthew C; Rasmussen, Luke V; Fort, Daniel G; Cimino, James J

    2017-01-01

    A major challenge in using electronic health record repositories for research is the difficulty matching subject eligibility criteria to query capabilities of the repositories. We propose categories for study criteria corresponding to the effort needed for querying those criteria: "easy" (supporting automated queries), mixed (initial automated querying with manual review), "hard" (fully manual record review), and "impossible" or "point of enrollment" (not typically in health repositories). We obtained a sample of 292 criteria from 20 studies from ClinicalTrials.gov. Six independent reviewers, three each from two academic research institutions, rated criteria according to our four types. We observed high interrater reliability both within and between institutions. The analysis demonstrated typical features of criteria that map with varying levels of difficulty to repositories. We propose using these features to improve enrollment workflow through more standardized study criteria, self-service repository queries, and analyst-mediated retrievals.

  15. Classifying Clinical Trial Eligibility Criteria to Facilitate Phased Cohort Identification Using Clinical Data Repositories

    PubMed Central

    Wang, Amy Y.; Lancaster, William J.; Wyatt, Matthew C.; Rasmussen, Luke V.; Fort, Daniel G.; Cimino, James J.

    2017-01-01

    A major challenge in using electronic health record repositories for research is the difficulty matching subject eligibility criteria to query capabilities of the repositories. We propose categories for study criteria corresponding to the effort needed for querying those criteria: “easy” (supporting automated queries), mixed (initial automated querying with manual review), “hard” (fully manual record review), and “impossible” or “point of enrollment” (not typically in health repositories). We obtained a sample of 292 criteria from 20 studies from ClinicalTrials.gov. Six independent reviewers, three each from two academic research institutions, rated criteria according to our four types. We observed high interrater reliability both within and between institutions. The analysis demonstrated typical features of criteria that map with varying levels of difficulty to repositories. We propose using these features to improve enrollment workflow through more standardized study criteria, self-service repository queries, and analyst-mediated retrievals. PMID:29854246

  16. Use of controlled vocabularies to improve biomedical information retrieval tasks.

    PubMed

    Pasche, Emilie; Gobeill, Julien; Vishnyakova, Dina; Ruch, Patrick; Lovis, Christian

    2013-01-01

    The high heterogeneity of biomedical vocabulary is a major obstacle for information retrieval in large biomedical collections. Therefore, using biomedical controlled vocabularies is crucial for managing these contents. We investigate the impact of query expansion based on controlled vocabularies to improve the effectiveness of two search engines. Our strategy relies on the enrichment of users' queries with additional terms, directly derived from such vocabularies applied to infectious diseases and chemical patents. We observed that query expansion based on pathogen names resulted in improvements of the top-precision of our first search engine, while the normalization of diseases degraded the top-precision. The expansion of chemical entities, which was performed on the second search engine, positively affected the mean average precision. We have shown that query expansion of some types of biomedical entities has a great potential to improve search effectiveness; therefore a fine-tuning of query expansion strategies could help improving the performances of search engines.
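
    A hedged sketch of vocabulary-based query expansion: terms derived from a controlled vocabulary (here, pathogen names tied to a disease concept) are appended to the user's query. The mini-vocabulary is invented for illustration and is not the vocabulary used in the study.

        # Illustrative expansion: append vocabulary-derived terms to the user's query.
        vocabulary = {   # hypothetical concept-to-pathogen mapping
            "influenza": ["influenza A virus", "influenza B virus", "H1N1"],
            "tuberculosis": ["Mycobacterium tuberculosis"],
        }

        def expand(query):
            terms = [query]
            for concept, pathogens in vocabulary.items():
                if concept in query.lower():
                    terms.extend(pathogens)
            return " OR ".join('"{}"'.format(t) for t in terms)

        print(expand("influenza treatment guidelines"))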

  17. Building a Smart Portal for Astronomy

    NASA Astrophysics Data System (ADS)

    Derriere, S.; Boch, T.

    2011-07-01

    The development of a portal for accessing astronomical resources is not an easy task. The ever-increasing complexity of the data products can result in very complex user interfaces, requiring a lot of effort and learning from the user in order to perform searches. This is often a design choice, where the user must explicitly set many constraints, while the portal search logic remains simple. We investigated a different approach, where the query interface is kept as simple as possible (ideally, a simple text field, like for Google search), and the search logic is made much more complex to interpret the query in a relevant manner. We will present the implications of this approach in terms of interpretation and categorization of the query parameters (related to astronomical vocabularies), translation (mapping) of these concepts into the portal components metadata, identification of query schemes and use cases matching the input parameters, and delivery of query results to the user.

  18. Content-aware network storage system supporting metadata retrieval

    NASA Astrophysics Data System (ADS)

    Liu, Ke; Qin, Leihua; Zhou, Jingli; Nie, Xuejun

    2008-12-01

    Nowadays, content-based network storage has become a hot research topic in academia and industry [1]. To address the decline in hit rate caused by data migration and to enable content-based queries, we develop a new content-aware storage system that supports metadata retrieval to improve query performance. Firstly, we extend the SCSI command descriptor block so that the system can understand self-defined query requests. Secondly, the extracted metadata is encoded in extensible markup language (XML) to improve universality. Thirdly, according to the demands of information lifecycle management (ILM), data are stored at different storage levels and retrieved with the corresponding query strategies. Fourthly, because the file content identifier plays an important role in locating data and calculating block correlation, we use it to fetch files and sort query results through a friendly user interface. Finally, the experiments indicate that the retrieval strategy and sorting algorithm enhance retrieval efficiency and precision.

  19. Remembrance of inferences past: Amortization in human hypothesis generation.

    PubMed

    Dasgupta, Ishita; Schulz, Eric; Goodman, Noah D; Gershman, Samuel J

    2018-05-21

    Bayesian models of cognition assume that people compute probability distributions over hypotheses. However, the required computations are frequently intractable or prohibitively expensive. Since people often encounter many closely related distributions, selective reuse of computations (amortized inference) is a computationally efficient use of the brain's limited resources. We present three experiments that provide evidence for amortization in human probabilistic reasoning. When sequentially answering two related queries about natural scenes, participants' responses to the second query systematically depend on the structure of the first query. This influence is sensitive to the content of the queries, only appearing when the queries are related. Using a cognitive load manipulation, we find evidence that people amortize summary statistics of previous inferences, rather than storing the entire distribution. These findings support the view that the brain trades off accuracy and computational cost, to make efficient use of its limited cognitive resources to approximate probabilistic inference. Copyright © 2018 Elsevier B.V. All rights reserved.

  20. R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server.

    PubMed

    Cannone, Jamie J; Sweeney, Blake A; Petrov, Anton I; Gutell, Robin R; Zirbel, Craig L; Leontis, Neocles

    2015-07-01

    The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
