Sample records for aggregate query quantities

  1. Spatial aggregation query in dynamic geosensor networks

    NASA Astrophysics Data System (ADS)

    Yi, Baolin; Feng, Dayang; Xiao, Shisong; Zhao, Erdun

    2007-11-01

    Wireless sensor networks are widely used in civilian and military applications such as environmental monitoring and vehicle tracking. In many of these applications, research mainly aims at building sensor-network-based systems that deliver the sensed data to applications. However, existing work has seldom addressed spatial aggregation queries in light of the dynamic characteristics of sensor networks. In this paper, we investigate how to process spatial aggregation queries over dynamic geosensor networks in which both the sink node and the sensor nodes are mobile, and we propose several improvements to the enabling techniques. Sensor mobility makes existing routing protocols that rely on a fixed framework or on neighborhood information infeasible. We present an improved location-based, stateless implicit geographic forwarding (IGF) protocol for routing a query toward the area specified by the query window, and a diameter-based window aggregation query (DWAQ) algorithm for query propagation and data aggregation within the query window. Finally, to account for the changing location of the sink node, we present two schemes for forwarding the result to the sink node. Simulation results show that the proposed algorithms can improve query latency and query accuracy.

  2. A Spatiotemporal Aggregation Query Method Using Multi-Thread Parallel Technique Based on Regional Division

    NASA Astrophysics Data System (ADS)

    Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.

    2015-07-01

    Existing spatiotemporal databases support spatiotemporal aggregation queries over massive moving-object datasets. Because of the large data volumes and single-threaded processing, query speed cannot meet application requirements. Moreover, query efficiency is more sensitive to spatial variation than to temporal variation. In this paper, we propose a spatiotemporal aggregation query method that uses multi-thread parallel processing based on regional division, and we implement it on the server. Concretely, we divide the spatiotemporal domain into several spatiotemporal cubes, compute the spatiotemporal aggregation on all cubes using multi-thread parallel processing, and then integrate the query results. Tests and analysis on real datasets show that this method improves query speed significantly.
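
    The divide, aggregate in parallel, then integrate pattern described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the cube sizes (`cell`, `span`), the `(x, y, t, value)` record layout and all function names are assumptions made for the example, and in CPython a thread pool mainly helps when the per-cube work releases the interpreter lock (for example, I/O or native code).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def cube_key(x, y, t, cell=10.0, span=60.0):
    """Index of the spatiotemporal cube containing the point (x, y, t).

    `cell` (spatial edge) and `span` (temporal edge) are assumed values.
    """
    return (int(x // cell), int(y // cell), int(t // span))


def partition(records, cell=10.0, span=60.0):
    """Group (x, y, t, value) records by the cube they fall into."""
    cubes = defaultdict(list)
    for x, y, t, value in records:
        cubes[cube_key(x, y, t, cell, span)].append(value)
    return cubes


def aggregate_cube(values):
    """Partial aggregate for one cube: (count, sum)."""
    return len(values), sum(values)


def parallel_aggregate(records, workers=4):
    """Aggregate every cube in a thread pool, then integrate the partial results."""
    cubes = partition(records)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(aggregate_cube, cubes.values()))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return count, total


records = [(1.0, 2.0, 5.0, 3.0), (15.0, 2.0, 5.0, 4.0), (1.0, 2.0, 70.0, 5.0)]
count, total = parallel_aggregate(records)
```

    Count and sum are used here because their partial results can be combined by simple addition; an average would be derived from the two after integration.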

  3. Sugeno Fuzzy Integral as a Basis for the Interpretation of Flexible Queries Involving Monotonic Aggregates.

    ERIC Educational Resources Information Center

    Bosc, P.; Lietard, L.; Pivert, O.

    2003-01-01

    Considers flexible querying of relational databases. Highlights include SQL languages and basic aggregate operators; Sugeno's fuzzy integral; evaluation examples; and how and under what conditions other aggregate functions could be applied to fuzzy sets in a flexible query. (Author/LRW)
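
    For readers unfamiliar with the Sugeno fuzzy integral mentioned here, a minimal sketch follows. It uses the standard discrete form (sort the degrees in decreasing order, then take the maximum over i of the minimum of the i-th degree and the measure of the top-i elements). The cardinality-based measure `proportion` is an assumption chosen only to keep the example short; it is not taken from the paper.

```python
def sugeno_integral(degrees, measure):
    """Discrete Sugeno integral of membership degrees in [0, 1].

    `measure(k, n)` is an assumed cardinality-based fuzzy measure: it must be
    non-decreasing in k, with measure(0, n) == 0 and measure(n, n) == 1.
    """
    ordered = sorted(degrees, reverse=True)
    n = len(ordered)
    return max(min(x, measure(i, n)) for i, x in enumerate(ordered, start=1))


def proportion(k, n):
    """Measure of any subset of size k out of n: simply k / n (illustrative)."""
    return k / n


result = sugeno_integral([0.9, 0.6, 0.3], proportion)
```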

  4. Aggregating Queries Against Large Inventories of Remotely Accessible Data

    NASA Astrophysics Data System (ADS)

    Gallagher, J. H. R.; Fulker, D. W.

    2016-12-01

    Those seeking to discover data for a specific purpose often encounter search results that are so large as to be useless without computing assistance. This situation arises, with increasing frequency, in part because repositories contain ever greater numbers of granules, and their granularities may well be poorly aligned or even orthogonal to the data-selection needs of the user. This presentation describes a recently developed service for simultaneously querying large lists of OPeNDAP-accessible granules to extract specified data. The specifications include a richly expressive set of data-selection criteria—applicable to content as well as metadata—and the service has been tested successfully against lists naming hundreds of thousands of granules. Querying such numbers of local files (i.e., granules) on a desktop or laptop computer is practical (by using a scripting language, e.g.), but this practicality is diminished when the data are remote and thus best accessed through a Web-services interface. In these cases, which are increasingly common, scripted queries can take many hours because of inherent network latencies. Furthermore, communication dropouts can add fragility to such scripts, yielding gaps in the acquired results. In contrast, OPeNDAP's new aggregated-query services enable data discovery in the context of very large inventory sizes. These capabilities have been developed for use with OPeNDAP's Hyrax server, which is an open-source realization of DAP (for "Data Access Protocol," a specification widely used in NASA, NOAA and other data-intensive contexts). These aggregated-query services exhibit good response times (on the order of seconds, not hours) even for inventories that list hundreds of thousands of source granules.

  5. RiPPAS: A Ring-Based Privacy-Preserving Aggregation Scheme in Wireless Sensor Networks

    PubMed Central

    Zhang, Kejia; Han, Qilong; Cai, Zhipeng; Yin, Guisheng

    2017-01-01

    Recently, data privacy in wireless sensor networks (WSNs) has received increasing attention. The characteristics of WSNs mean that users' queries are mainly aggregation queries. In this paper, the problem of processing aggregation queries in WSNs with data privacy preservation is investigated. A Ring-based Privacy-Preserving Aggregation Scheme (RiPPAS) is proposed. RiPPAS adopts a ring structure to perform aggregation. It uses a pseudonym mechanism for anonymous communication and a homomorphic encryption technique to add noise to data that are easily disclosed. RiPPAS can handle both sum() queries and min()/max() queries, whereas the existing privacy-preserving aggregation methods can only deal with sum() queries. For processing sum() queries, RiPPAS has advantages over the existing methods in privacy preservation and communication efficiency, as shown by theoretical analysis and simulation results. For processing min()/max() queries, RiPPAS provides effective privacy preservation and has low communication overhead. PMID:28178197

  6. 40 CFR 355.14 - Do I have to aggregate extremely hazardous substances to determine the total quantity present?

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 28 2011-07-01 Do I have to aggregate extremely hazardous substances to determine the total quantity present? 355.14 Section 355.14 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) SUPERFUND, EMERGENCY PLANNING, AND COMMUNITY RIGHT-TO...

  7. A Fuzzy Query Mechanism for Human Resource Websites

    NASA Astrophysics Data System (ADS)

    Lai, Lien-Fu; Wu, Chao-Chin; Huang, Liang-Tsung; Kuo, Jung-Chih

    Users' preferences often contain imprecision and uncertainty that are difficult for traditional human resource websites to deal with. In this paper, we apply fuzzy logic theory to develop a fuzzy query mechanism for human resource websites. First, a storage mechanism is proposed to store fuzzy data in conventional database management systems without modifying DBMS models. Second, a fuzzy query language is proposed for users to make fuzzy queries on fuzzy databases. A user's fuzzy requirement can be expressed by a fuzzy query that consists of a set of fuzzy conditions. Third, each fuzzy condition is associated with a fuzzy importance to differentiate between fuzzy conditions according to their degrees of importance. Fourth, the fuzzy weighted average is used to aggregate all fuzzy conditions based on their degrees of importance and degrees of matching. Through the mutual compensation of all fuzzy conditions, the ordering of query results can be obtained according to the user's preference.
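
    The aggregation step in this abstract can be illustrated with a simplified sketch. Note the simplification: the paper speaks of a fuzzy weighted average over fuzzy importances, whereas the code below uses crisp numbers in [0, 1] for both importance and matching degree. The candidate data and function names are hypothetical.

```python
def weighted_match(conditions):
    """Weighted average of matching degrees.

    `conditions` is a list of (importance, matching_degree) pairs, both crisp
    values in [0, 1] (a simplification of fuzzy importances).
    """
    total_weight = sum(w for w, _ in conditions)
    if total_weight == 0:
        return 0.0
    return sum(w * m for w, m in conditions) / total_weight


def rank(candidates):
    """Order candidate names by descending overall matching degree."""
    return sorted(candidates, key=lambda name: weighted_match(candidates[name]), reverse=True)


# Hypothetical candidates, each scored against the same two conditions.
candidates = {
    "A": [(1.0, 0.5), (0.5, 1.0)],
    "B": [(1.0, 0.8), (0.5, 0.2)],
}
ordering = rank(candidates)
```

    Candidate A ranks first here because its strong match on the second condition compensates for a weaker match on the first, which is the mutual compensation the abstract refers to.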

  8. Query Transformations for Result Merging

    DTIC Science & Technology

    2014-11-01

    tors, term dependence, query expansion 1. INTRODUCTION Federated search deals with the problem of aggregating results from multiple search engines. The...individual search engines are (i) typically focused on a particular domain or a particular corpus, (ii) employ diverse retrieval models, and (iii...determine which search engines are appropriate for addressing the information need (resource selection), and (ii) merging the results returned by

  9. Aggregated Computational Toxicology Online Resource

    EPA Pesticide Factsheets

    Aggregated Computational Toxicology Online Resource (ACToR) is EPA's online aggregator of all the public sources of chemical toxicity data. ACToR aggregates data from over 1,000 public sources on over 500,000 chemicals and is searchable by chemical name, by other identifiers and by chemical structure. It can be used to query a specific chemical and find all publicly available hazard, exposure and risk assessment data. It also provides access to EPA's ToxCast, ToxRefDB, DSSTox and Dashboard data.

  10. On-demand information retrieval in sensor networks with localised query and energy-balanced data collection.

    PubMed

    Teng, Rui; Zhang, Bing

    2011-01-01

    On-demand information retrieval enables users to query and collect up-to-date sensing information from sensor nodes. Since high energy efficiency is required in a sensor network, it is desirable to disseminate query messages with small traffic overhead and to collect sensing data with low energy consumption. However, on-demand query messages are generally forwarded to sensor nodes in network-wide broadcasts, which create large traffic overhead. In addition, since on-demand information retrieval may introduce intermittent and spatial data collections, constructing and maintaining conventional aggregation structures such as clusters and chains is costly. In this paper, we propose an on-demand information retrieval approach that exploits the name resolution of data queries according to the attribute and location of each sensor node. The proposed approach localises each query dissemination and enables localised data collection with maximised aggregation. To illustrate the effectiveness of the proposed approach, an analytical model that describes the criteria of sink proxy selection is provided. The evaluation results reveal that the proposed scheme significantly reduces energy consumption and improves the balance of energy consumption among sensor nodes by alleviating heavy traffic near the sink.

  11. Web page sorting algorithm based on query keyword distance relation

    NASA Astrophysics Data System (ADS)

    Yang, Han; Cui, Hong Gang; Tang, Hao

    2017-08-01

    To improve web page sorting, this paper proposes the idea of query keyword clustering, based on the characteristics of the relationships among the search keywords within a web page. This idea is expressed as the degree of aggregation of the search keywords in the web page. Building on the PageRank algorithm, a clustering degree factor for the query keywords is added so that it can take part in the quantitative calculation. The paper thus proposes an improved PageRank algorithm based on the distance relation between search keywords. The experimental results show the feasibility and effectiveness of the method.

  12. ASSET Queries: A Set-Oriented and Column-Wise Approach to Modern OLAP

    NASA Astrophysics Data System (ADS)

    Chatziantoniou, Damianos; Sotiropoulos, Yannis

    Modern data analysis has given birth to numerous grouping constructs and programming paradigms, way beyond the traditional group by. Applications such as data warehousing, web log analysis, streams monitoring and social networks understanding necessitated the use of data cubes, grouping variables, windows and MapReduce. In this paper we review the associated set (ASSET) concept and discuss its applicability in both continuous and traditional data settings. Given a set of values B, an associated set over B is just a collection of annotated data multisets, one for each b ∈ B. The goal is to efficiently compute aggregates over these data sets. An ASSET query consists of repeated definitions of associated sets and aggregates of these, possibly correlated, resembling a spreadsheet document. We review systems implementing ASSET queries both in continuous and persistent contexts and argue for associated sets' analytical abilities and optimization opportunities.
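
    The associated-set idea can be pictured with a small sketch: for each value b in a base set B, collect a multiset of rows, then compute an aggregate per multiset. This is only an illustration of the concept as the abstract states it, not the ASSET query language or any system reviewed in the paper; the `sales` data, the `condition` callbacks and the function names are hypothetical.

```python
def associated_sets(base_values, rows, condition):
    """For each b in base_values, collect the multiset (as a list, so duplicates
    are kept) of rows for which condition(b, row) holds."""
    return {b: [row for row in rows if condition(b, row)] for b in base_values}


def aggregate(assets, fn):
    """Apply an aggregate function to the multiset associated with each b."""
    return {b: fn(rows) for b, rows in assets.items()}


# Hypothetical data: (customer, amount) rows and a base set of customers.
sales = [("alice", 10), ("bob", 5), ("alice", 10), ("carol", 7)]
customers = ["alice", "bob", "dave"]

# First associated set: each customer's own sales, aggregated by total amount.
own = associated_sets(customers, sales, lambda b, row: row[0] == b)
totals = aggregate(own, lambda rows: sum(amount for _, amount in rows))

# Second associated set over the same base: everyone else's sales, aggregated by count.
others = associated_sets(customers, sales, lambda b, row: row[0] != b)
other_counts = aggregate(others, len)
```

    Each definition adds a new "column" of aggregates next to the base values, which is the spreadsheet-like layout the abstract alludes to.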

  13. Aggregating land use quantity and intensity to link water quality in upper catchment of Miyun Reservoir

    NASA Astrophysics Data System (ADS)

    Xu, E.

    2015-12-01

    Land use is closely related to the hydrological and biochemical processes that influence water quality. Quantifying the relationship between the two can help manage land use effectively to improve water quality. Previous studies mainly used land use quantity as an indicator to link to water quality parameters and lacked insight into the influence of land use intensity. Taking the upper catchment of Miyun Reservoir as a case study, we propose a method of aggregating land use quantity and intensity to build a new land use indicator and investigate its explanatory power for water quality. Six nutrient concentrations from 52 sub-watersheds covering the whole catchment were used to characterize the spatial distribution of water eutrophication. Land use intensity was measured and mapped using spatial techniques and empirical conversion coefficients, combining remote sensing with socio-economic statistical data. The new land use indicator was then calculated and linked to nutrient concentrations by Pearson correlation coefficients. Results demonstrated that the new land use indicator, which incorporates intensity information, can quantify the differing potential of land uses to export nutrients. Compared with traditional indicators characterized only by land use quantity, most Pearson correlation coefficients between the new indicator and water nutrient concentrations increased. The new information enhanced the explanatory power of land use for water nutrient concentrations. It can therefore help in better understanding the impact of land use on water quality and in guiding land use management decisions.

  14. Federated querying architecture with clinical & translational health IT application.

    PubMed

    Livne, Oren E; Schultz, N Dustin; Narus, Scott P

    2011-10-01

    We present a software architecture that federates data from multiple heterogeneous health informatics data sources owned by multiple organizations. The architecture builds upon state-of-the-art open-source Java and XML frameworks in innovative ways. It consists of (a) a federated query engine, which manages federated queries and result set aggregation via a patient identification service; and (b) data source facades, which translate the physical data models into a common model on-the-fly and handle large result set streaming. System modules are connected via reusable Apache Camel integration routes and deployed to an OSGi enterprise service bus. We present an application of our architecture that allows users to construct queries via the i2b2 web front-end, and federates patient data from the University of Utah Enterprise Data Warehouse and the Utah Population database. Our system can be easily adopted, extended and integrated with existing SOA Healthcare and HL7 frameworks such as i2b2 and caGrid.

  15. Federated queries of clinical data repositories: the sum of the parts does not equal the whole

    PubMed Central

    Weber, Griffin M

    2013-01-01

    Background and objective: In 2008 we developed a shared health research information network (SHRINE), which for the first time enabled research queries across the full patient populations of four Boston hospitals. It uses a federated architecture, where each hospital returns only the aggregate count of the number of patients who match a query. This allows hospitals to retain control over their local databases and comply with federal and state privacy laws. However, because patients may receive care from multiple hospitals, the result of a federated query might differ from what the result would be if the query were run against a single central repository. This paper describes the situations when this happens and presents a technique for correcting these errors. Methods: We use a one-time process of identifying which patients have data in multiple repositories by comparing one-way hash values of patient demographics. This enables us to partition the local databases such that all patients within a given partition have data at the same subset of hospitals. Federated queries are then run independently on each partition, and the combined results are presented to the user. Results: Using theoretical bounds and simulated hospital networks, we demonstrate that once the partitions are made, SHRINE can produce more precise estimates of the number of patients matching a query. Conclusions: Uncertainty in the overlap of patient populations across hospitals limits the effectiveness of SHRINE and other federated query tools. Our technique reduces this uncertainty while retaining an aggregate federated architecture. PMID:23349080
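
    A toy sketch of why the sum of per-hospital counts need not equal the true count: when only aggregate counts are returned, the number of distinct matching patients can only be bounded between the largest single count and the sum of all counts. The code below illustrates that general point and the use of one-way hashes for matching. It is not the SHRINE partitioning technique itself, and the patient strings, the choice of SHA-256 and the function names are assumptions for the example.

```python
import hashlib


def one_way_hash(demographics):
    """One-way hash of a demographics string (SHA-256 chosen for illustration)."""
    return hashlib.sha256(demographics.encode("utf-8")).hexdigest()


def count_bounds(counts):
    """Bounds on the number of distinct patients given only per-site counts:
    at least the largest count (full overlap), at most the sum (no overlap)."""
    if not counts:
        return 0, 0
    return max(counts), sum(counts)


# Hypothetical matching patients at two hospitals; "ben" is seen at both.
hospital_a = {one_way_hash(p) for p in ["ann|1970-01-01", "ben|1980-02-02"]}
hospital_b = {one_way_hash(p) for p in ["ben|1980-02-02", "cat|1990-03-03"]}

counts = [len(hospital_a), len(hospital_b)]
low, high = count_bounds(counts)

# Comparing hash values reveals the overlap without exchanging raw demographics.
true_total = len(hospital_a | hospital_b)
```

    Here the two sites report 2 patients each, so the federated answer lies somewhere between 2 and 4, while the hash comparison shows the true count is 3.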

  16. Predicting Drug Recalls From Internet Search Engine Queries.

    PubMed

    Yom-Tov, Elad

    2017-01-01

    Batches of pharmaceuticals are sometimes recalled from the market when a safety issue or a defect is detected in specific production runs of a drug. Such problems are usually detected when patients or healthcare providers report abnormalities to medical authorities. Here, we test the hypothesis that defective production lots can be detected earlier by monitoring queries to Internet search engines. We extracted queries from the USA to the Bing search engine that mentioned one of 5195 pharmaceutical drugs during 2015, together with all recall notifications issued by the Food and Drug Administration (FDA) during that year. Using attributes that quantify the change in query volume at the state level, we attempted to predict whether a recall of a specific drug would be ordered by the FDA within a time horizon ranging from 1 to 40 days in the future. Our results show that future drug recalls can indeed be identified with an AUC of 0.791 and a lift at 5% of approximately 6 when predicting a recall occurring one day ahead. This performance degrades as predictions are made for longer periods ahead. The most indicative attributes for prediction are sudden spikes in query volume about a specific medicine in each state. Recalls of prescription drugs and those estimated to be of medium risk are more likely to be identified using search query data. These findings suggest that aggregated Internet search engine data can be used to facilitate early warning of faulty batches of medicines.

  17. A Query Integrator and Manager for the Query Web

    PubMed Central

    Brinkley, James F.; Detwiler, Landon T.

    2012-01-01

    We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. PMID:22531831

  18. Semantics Enabled Queries in EuroGEOSS: a Discovery Augmentation Approach

    NASA Astrophysics Data System (ADS)

    Santoro, M.; Mazzetti, P.; Fugazza, C.; Nativi, S.; Craglia, M.

    2010-12-01

    regards the multilinguality relationship; ii. The resulting queries are submitted to the federated catalog services; iii. The DAC performs a “smart” aggregation of the queries results and provides them back to the client. In the second case, the main discovery steps are: i. the user browses the federated semantic repositories and selects the concepts/terms-of-interest; ii. The DAC creates the set of geospatial queries based on the selected concepts/terms and submits them to the federated catalog services; iii. The DAC performs a “smart” aggregation of the queries results and provides them back to the client. A Graphical User Interface (GUI) was also developed for testing and interacting with the DAC. The entire brokering framework is deployed in the context of EuroGEOSS infrastructure and it is used in a couple of GEOSS AIP-3 use scenarios: the “e-Habitat Use Scenario” for the Biodiversity and Climate Change topic, and the “Comprehensive Drought Index Use Scenario” for Water/Drought topic

  19. Language model: Extension to solve inconsistency, incompleteness, and short query in cultural heritage collection

    NASA Astrophysics Data System (ADS)

    Tan, Kian Lam; Lim, Chen Kim

    2017-10-01

    With the explosive growth of online information such as email messages, news articles, and scientific literature, many institutions and museums are converting their cultural collections from physical to digital format. However, this conversion has resulted in issues of inconsistency and incompleteness. In addition, the use of inaccurate keywords has resulted in the short query problem. Most of the time, the inconsistency and incompleteness are caused by aggregation faults in annotating a document, while the short query problem is caused by naive users who lack prior knowledge and experience in the cultural heritage domain. In this paper, we present an approach to solving the problems of inconsistency, incompleteness and short queries by incorporating the Term Similarity Matrix into the Language Model. Our approach is tested on the Cultural Heritage in CLEF (CHiC) collection, which consists of short queries and documents. The results show that the proposed approach is effective and improves retrieval accuracy.

  20. GenoQuery: a new querying module for functional annotation in a genomic warehouse

    PubMed Central

    Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine

    2008-01-01

    Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improving functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731

  1. Querying Proofs

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen; Lueth, Christoph

    2012-01-01

    We motivate and introduce a query language PrQL designed for inspecting machine representations of proofs. PrQL natively supports hiproofs which express proof structure using hierarchical nested labelled trees. The core language presented in this paper is locally structured (first-order), with queries built using recursion and patterns over proof structure and rule names. We define the syntax and semantics of locally structured queries, demonstrate their power, and sketch some implementation experiments.

  2. The role of economics in the QUERI program: QUERI Series.

    PubMed

    Smith, Mark W; Barnett, Paul G

    2008-04-22

    The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics.

  3. IJA: an efficient algorithm for query processing in sensor networks.

    PubMed

    Lee, Hyun Chang; Lee, Young Jae; Lim, Ji Hyang; Kim, Dong Hwa

    2011-01-01

    One of the main features of sensor networks is the processing of real-time state information after gathering the needed data from many domains. The component technologies that make up each sensor node, including physical sensors, processors, actuators and power, have advanced significantly over the last decade. Thanks to these advances, sensor networks have over time been adopted across industry for sensing physical phenomena. However, sensor nodes are considerably constrained: with their limited energy and memory resources, they have a very limited ability to process information compared with conventional computer systems. Query processing over the nodes must therefore respect these limitations. Because of these problems, join operations in sensor networks are typically processed in a distributed manner over a set of nodes, and such processing has been studied. While simple queries, such as select and aggregate queries, have been addressed in the literature, the processing of join queries in sensor networks remains to be investigated. Therefore, in this paper, we propose and describe an Incremental Join Algorithm (IJA) for sensor networks to reduce the overhead caused by moving a join pair to the final join node, or to minimize the communication cost, which is the main consumer of battery power when processing distributed queries in sensor network environments. The simulation results show that the proposed IJA algorithm significantly reduces the number of bytes moved to join nodes compared with the popular synopsis join algorithm.

  4. IJA: An Efficient Algorithm for Query Processing in Sensor Networks

    PubMed Central

    Lee, Hyun Chang; Lee, Young Jae; Lim, Ji Hyang; Kim, Dong Hwa

    2011-01-01

    One of the main features of sensor networks is the processing of real-time state information after gathering the needed data from many domains. The component technologies that make up each sensor node, including physical sensors, processors, actuators and power, have advanced significantly over the last decade. Thanks to these advances, sensor networks have over time been adopted across industry for sensing physical phenomena. However, sensor nodes are considerably constrained: with their limited energy and memory resources, they have a very limited ability to process information compared with conventional computer systems. Query processing over the nodes must therefore respect these limitations. Because of these problems, join operations in sensor networks are typically processed in a distributed manner over a set of nodes, and such processing has been studied. While simple queries, such as select and aggregate queries, have been addressed in the literature, the processing of join queries in sensor networks remains to be investigated. Therefore, in this paper, we propose and describe an Incremental Join Algorithm (IJA) for sensor networks to reduce the overhead caused by moving a join pair to the final join node, or to minimize the communication cost, which is the main consumer of battery power when processing distributed queries in sensor network environments. The simulation results show that the proposed IJA algorithm significantly reduces the number of bytes moved to join nodes compared with the popular synopsis join algorithm. PMID:22319375

  5. The role of economics in the QUERI program: QUERI Series

    PubMed Central

    Smith, Mark W; Barnett, Paul G

    2008-01-01

    Background: The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. Methods: We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Results: Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Conclusion: Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics. PMID:18430199

  6. In-context query reformulation for failing SPARQL queries

    NASA Astrophysics Data System (ADS)

    Viswanathan, Amar; Michaelis, James R.; Cassidy, Taylor; de Mel, Geeth; Hendler, James

    2017-05-01

    Knowledge bases for decision support systems are growing increasingly complex, through continued advances in data ingest and management approaches. However, humans do not possess the cognitive capabilities to retain a bird's-eye view of such knowledge bases, and may end up issuing unsatisfiable queries to such systems. This work focuses on the implementation of a query reformulation approach for graph-based knowledge bases, specifically designed to support the Resource Description Framework (RDF). The reformulation approach presented is instance- and schema-aware. Thus, in contrast to relaxation techniques found in the state-of-the-art, the presented approach produces in-context query reformulation.

  7. Query Expansion and Query Translation as Logical Inference.

    ERIC Educational Resources Information Center

    Nie, Jian-Yun

    2003-01-01

    Examines query expansion during query translation in cross language information retrieval and develops a general framework for inferential information retrieval in two particular contexts: using fuzzy logic and probability theory. Obtains evaluation formulas that are shown to strongly correspond to those used in other information retrieval models.…

  8. A Method of Data Aggregation for Wearable Sensor Systems

    PubMed Central

    Shen, Bo; Fu, Jun-Song

    2016-01-01

    Data aggregation has been considered an effective way to decrease the data to be transferred in sensor networks. Particularly for wearable sensor systems, a smaller battery has less energy, which makes energy conservation in data transmission more important. Nevertheless, wearable sensor systems usually have features such as frequent dynamic changes of topology and data that vary over a large range, to which current aggregation methods cannot adapt. In this paper, we study a system composed of many wearable devices with sensors, such as the network of a tactical unit, and introduce an energy consumption-balanced method of data aggregation, named LDA-RT. In the proposed method, we develop a query algorithm based on the idea of ‘happened-before’ to construct a dynamic and energy-balancing routing tree. We also present a distributed data aggregating and sorting algorithm to execute top-k queries and decrease the data that must be transferred among wearable devices. Combining these algorithms, LDA-RT tries to balance energy consumption to prolong the lifetime of wearable sensor systems. Evaluation results indicate that LDA-RT performs well in constructing routing trees and balancing energy. It also outperforms the filter-based top-k monitoring approach in energy consumption, load balance, and the network’s lifetime, especially for highly dynamic data sources. PMID:27347953
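
    As a rough illustration of the in-network top-k idea mentioned in this abstract (not the LDA-RT algorithm itself, whose details are not given here), the Python sketch below has each node of a hypothetical routing tree merge its own readings with its children's partial results and forward only the k largest values. The tree layout, the node names, and the function `top_k_up_tree` are assumptions made for illustration.

```python
import heapq

def top_k_up_tree(tree, readings, node, k):
    """Return the k largest readings in the subtree rooted at `node`.

    tree:     dict mapping a node to the list of its children (hypothetical layout)
    readings: dict mapping a node to the values it sensed locally
    """
    candidates = list(readings.get(node, []))
    for child in tree.get(node, []):
        # Each child forwards at most k values, which bounds per-link traffic.
        candidates.extend(top_k_up_tree(tree, readings, child, k))
    return heapq.nlargest(k, candidates)

# Made-up topology and readings.
tree = {"sink": ["a", "b"], "a": ["c"], "b": [], "c": []}
readings = {"sink": [3], "a": [9, 1], "b": [7, 8], "c": [10, 2]}
result = top_k_up_tree(tree, readings, "sink", k=3)
print(result)
```

    Because the k largest values of a union are always contained in the union of each part's k largest values, truncating at every hop does not change the final answer; it only limits how much each link has to carry.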

  9. Protein aggregate turbidity: Simulation of turbidity profiles for mixed-aggregation reactions.

    PubMed

    Hall, Damien; Zhao, Ran; Dehlsen, Ian; Bloomfield, Nathaniel; Williams, Steven R; Arisaka, Fumio; Goto, Yuji; Carver, John A

    2016-04-01

    Due to their colloidal nature, all protein aggregates scatter light in the visible wavelength region when formed in aqueous solution. This phenomenon makes solution turbidity, a quantity proportional to the relative loss in forward intensity of scattered light, a convenient method for monitoring protein aggregation in biochemical assays. Although turbidity is often taken to be a linear descriptor of the progress of aggregation reactions, this assumption is usually made without performing the necessary checks to provide it with a firm underlying basis. In this article, we outline utilitarian methods for simulating the turbidity generated by homogeneous and mixed-protein aggregation reactions containing fibrous, amorphous, and crystalline structures. The approach is based on a combination of Rayleigh-Gans-Debye theory and approximate forms of the Mie scattering equations. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.

  10. A Review of Statistical Disclosure Control Techniques Employed by Web-Based Data Query Systems.

    PubMed

    Matthews, Gregory J; Harel, Ofer; Aseltine, Robert H

    We systematically reviewed the statistical disclosure control techniques employed for releasing aggregate data in Web-based data query systems listed in the National Association for Public Health Statistics and Information Systems (NAPHSIS). Each Web-based data query system was examined to see whether (1) it employed any type of cell suppression, (2) it used secondary cell suppression, and (3) suppressed cell counts could be calculated. No more than 30 minutes was spent on each system. Of the 35 systems reviewed, no suppression was observed in more than half (n = 18); observed counts below the threshold were observed in 2 sites; and suppressed values were recoverable in 9 sites. Six sites effectively suppressed small counts. This inquiry has revealed substantial weaknesses in the protective measures used in data query systems containing sensitive public health data. Many systems utilized no disclosure control whatsoever, and the vast majority of those that did deployed it inconsistently or inadequately.
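
    To make the notion of a recoverable suppressed cell concrete, here is a minimal Python sketch under stated assumptions: the counts are made up, the threshold of 5 is hypothetical, and the rule "hide nonzero counts below the threshold" is only one common form of primary suppression, not necessarily the rule used by any system in the review. It shows that when a single cell in a row is hidden but the row total is published, the hidden value follows by subtraction.

```python
def suppress(counts, threshold):
    """Primary suppression: hide every nonzero count below the threshold."""
    return [None if 0 < c < threshold else c for c in counts]

def recover_single(published, row_total):
    """If exactly one cell is hidden, subtract the visible cells from the total."""
    hidden = [i for i, c in enumerate(published) if c is None]
    if len(hidden) != 1:
        return None  # zero or several hidden cells: no unique value by subtraction
    visible_sum = sum(c for c in published if c is not None)
    return row_total - visible_sum

counts = [12, 3, 40]                       # made-up counts for one table row
published = suppress(counts, threshold=5)  # the small cell is hidden
recovered = recover_single(published, row_total=sum(counts))
print(published, recovered)
```

    Secondary (complementary) suppression addresses this by hiding at least one additional cell in the same row or column, so that subtraction from the published total no longer yields a unique value.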

  11. VISAGE: Interactive Visual Graph Querying.

    PubMed

    Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

    2016-06-01

    Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with "wildcard" nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE's ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries.

  12. VISAGE: Interactive Visual Graph Querying

    PubMed Central

    Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

    2017-01-01

    Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with “wildcard” nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE’s ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries. PMID:28553670

  13. SkyQuery - A Prototype Distributed Query and Cross-Matching Web Service for the Virtual Observatory

    NASA Astrophysics Data System (ADS)

    Thakar, A. R.; Budavari, T.; Malik, T.; Szalay, A. S.; Fekete, G.; Nieto-Santisteban, M.; Haridas, V.; Gray, J.

    2002-12-01

    We have developed a prototype distributed query and cross-matching service for the VO community, called SkyQuery, which is implemented with hierarchical Web Services. SkyQuery enables astronomers to run combined queries on existing distributed heterogeneous astronomy archives. SkyQuery provides a simple, user-friendly interface to run distributed queries over the federation of registered astronomical archives in the VO. The SkyQuery client connects to the portal Web Service, which farms the query out to the individual archives, which are also Web Services called SkyNodes. The cross-matching algorithm is run recursively on each SkyNode. Each archive is a relational DBMS with an HTM index for fast spatial lookups. The results of the distributed query are returned as an XML DataSet that is automatically rendered by the client. SkyQuery also returns the image cutout corresponding to the query result. SkyQuery finds not only matches between the various catalogs, but also dropouts - objects that exist in some of the catalogs but not in others. This is often as important as finding matches. We demonstrate the utility of SkyQuery with a brown-dwarf search between SDSS and 2MASS, and a search for radio-quiet quasars in SDSS, 2MASS and FIRST. The importance of a service like SkyQuery for the worldwide astronomical community cannot be overstated: data on the same objects in various archives is mapped in different wavelength ranges and looks very different due to different errors, instrument sensitivities and other peculiarities of each archive. Our cross-matching algorithm performs a fuzzy spatial join across multiple catalogs. This type of cross-matching is currently often done by eye, one object at a time. A static cross-identification table for a set of archives would become obsolete by the time it was built - the exponential growth of astronomical data means that a dynamic cross-identification mechanism like SkyQuery is the only viable option. SkyQuery was funded by a

  14. Querying Safety Cases

    NASA Technical Reports Server (NTRS)

    Denney, Ewen W.; Naylor, Dwight; Pai, Ganesh

    2014-01-01

    Querying a safety case to show how the various stakeholders' concerns about system safety are addressed has been put forth as one of the benefits of argument-based assurance (in a recent study by the Health Foundation, UK, which reviewed the use of safety cases in safety-critical industries). However, neither the literature nor current practice offer much guidance on querying mechanisms appropriate for, or available within, a safety case paradigm. This paper presents a preliminary approach that uses a formal basis for querying safety cases, specifically Goal Structuring Notation (GSN) argument structures. Our approach semantically enriches GSN arguments with domain-specific metadata that the query language leverages, along with its inherent structure, to produce views. We have implemented the approach in our toolset AdvoCATE, and illustrate it by application to a fragment of the safety argument for an Unmanned Aircraft System (UAS) being developed at NASA Ames. We also discuss the potential practical utility of our query mechanism within the context of the existing framework for UAS safety assurance.

  15. Noesis: Ontology based Scoped Search Engine and Resource Aggregator for Atmospheric Science

    NASA Astrophysics Data System (ADS)

    Ramachandran, R.; Movva, S.; Li, X.; Cherukuri, P.; Graves, S.

    2006-12-01

    The goal for search engines is to return results that are both accurate and complete. The search engines should find only what you really want and find everything you really want. Search engines (even meta search engines) lack semantics. The basis for search is simply based on string matching between the user's query term and the resource database and the semantics associated with the search string is not captured. For example, if an atmospheric scientist is searching for "pressure" related web resources, most search engines return inaccurate results such as web resources related to blood pressure. In this presentation Noesis, which is a meta-search engine and a resource aggregator that uses domain ontologies to provide scoped search capabilities will be described. Noesis uses domain ontologies to help the user scope the search query to ensure that the search results are both accurate and complete. The domain ontologies guide the user to refine their search query and thereby reduce the user's burden of experimenting with different search strings. Semantics are captured by refining the query terms to cover synonyms, specializations, generalizations and related concepts. Noesis also serves as a resource aggregator. It categorizes the search results from different online resources such as education materials, publications, datasets, web search engines that might be of interest to the user.

  16. Aggregate Resource Inventory and Needs Forecast Study : Final Report

    DOT National Transportation Integrated Search

    2002-09-01

    This study identified and inventoried ODOT-owned and leased aggregate sites throughout the state, assessing the : quality and estimated quantity of material. In addition, an aggregate needs forecast was prepared, projecting that : 60,801,320 Mg of ag...

  17. The CMS DBS query language

    NASA Astrophysics Data System (ADS)

    Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee

    2010-04-01

    The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to the underlying database. We will describe the design of the query system, provide details of the language components and an overview of how this component fits into the overall data discovery system architecture.
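
    The abstract says the query builder uses a graph representation of the schema to discover join conditions. A minimal Python sketch of that general idea, assuming a made-up schema graph (the table names and join conditions below are hypothetical and are not the real DBS schema), is a breadth-first search for the shortest chain of links between two tables:

```python
from collections import deque

# Hypothetical schema graph: an edge (A, B) carries the join condition between A and B.
SCHEMA_EDGES = {
    ("dataset", "block"): "block.dataset_id = dataset.id",
    ("block", "file"): "file.block_id = block.id",
    ("file", "run"): "run.file_id = file.id",
}

def build_adjacency(edges):
    """Turn the edge dictionary into an undirected adjacency list."""
    adj = {}
    for (a, b), cond in edges.items():
        adj.setdefault(a, []).append((b, cond))
        adj.setdefault(b, []).append((a, cond))
    return adj

def join_path(edges, start, goal):
    """Return the join conditions linking `start` to `goal`, or None if unreachable."""
    adj = build_adjacency(edges)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, conds = queue.popleft()
        if table == goal:
            return conds
        for nxt, cond in adj.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, conds + [cond]))
    return None

conditions = join_path(SCHEMA_EDGES, "dataset", "file")
print(" AND ".join(conditions))
```

    With a path in hand, a builder of this kind could emit the intermediate tables in the FROM clause and the collected conditions in the WHERE clause, so the user only names the entities of interest.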

  18. CUFID-query: accurate network querying through random walk based network flow estimation.

    PubMed

    Jeong, Hyundoo; Qian, Xiaoning; Yoon, Byung-Jun

    2017-12-28

    Functional modules in biological networks consist of numerous biomolecules and their complicated interactions. Recent studies have shown that biomolecules in a functional module tend to have similar interaction patterns and that such modules are often conserved across biological networks of different species. As a result, such conserved functional modules can be identified through comparative analysis of biological networks. In this work, we propose a novel network querying algorithm based on the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) framework combined with an efficient seed-and-extension approach. The proposed algorithm, CUFID-query, can accurately detect conserved functional modules as small subnetworks in the target network that are expected to perform similar functions to the given query functional module. The CUFID framework was recently developed for probabilistic pairwise global comparison of biological networks, and it has been applied to pairwise global network alignment, where the framework was shown to yield accurate network alignment results. In the proposed CUFID-query algorithm, we adopt the CUFID framework and extend it for local network alignment, specifically to solve network querying problems. First, in the seed selection phase, the proposed method utilizes the CUFID framework to compare the query and the target networks and to predict the probabilistic node-to-node correspondence between the networks. Next, the algorithm selects and greedily extends the seed in the target network by iteratively adding nodes that have frequent interactions with other nodes in the seed network, in a way that the conductance of the extended network is maximally reduced. Finally, CUFID-query removes irrelevant nodes from the querying results based on the personalized PageRank vector for the induced network that includes the fully extended network and its neighboring nodes. Through extensive

  19. Knowledge Query Language (KQL)

    DTIC Science & Technology

    2016-02-12

    Lexington Massachusetts ... EXECUTIVE SUMMARY Currently, queries for data ...retrieval from non-Structured Query Language (NoSQL) data stores are tightly coupled to the specific implementation of the data store implementation...independent of the storage content and format for querying NoSQL or relational data stores. This approach uses address expressions (or A-Expressions

  20. Heuristic query optimization for query multiple table and multiple clausa on mobile finance application

    NASA Astrophysics Data System (ADS)

    Indrayana, I. N. E.; P, N. M. Wirasyanti D.; Sudiartha, I. KG

    2018-01-01

    Mobile applications allow many users to access data without being limited by space and time. Over time, the volume of data in such an application will increase. Data access time becomes a problem once the data reach tens of thousands to millions of records. The objective of this research is to maintain the performance of data execution for large data records. One way to maintain data access time performance is to apply a query optimization method. The optimization used in this research is the heuristic query optimization method. The application built is a mobile-based financial application using a MySQL database with stored procedures. This application is used by more than one business entity in one database, which leads to rapid data growth. The stored procedure contains a query optimized using the heuristic method. Query optimization is performed on a “Select” query that involves more than one table with multiple clauses. Evaluation is done by calculating the average access time using optimized and unoptimized queries. Access time is also measured as the amount of data in the database increases. The evaluation results show that data execution time with heuristic query optimization is relatively faster than data execution time without query optimization.

  1. Occam's razor: supporting visual query expression for content-based image queries

    NASA Astrophysics Data System (ADS)

    Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.

    2005-01-01

    This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).

  2. Multi-Bit Quantum Private Query

    NASA Astrophysics Data System (ADS)

    Shi, Wei-Xu; Liu, Xing-Tong; Wang, Jian; Tang, Chao-Jing

    2015-09-01

    Most of the existing Quantum Private Queries (QPQ) protocols provide only single-bit queries service, thus have to be repeated several times when more bits are retrieved. Wei et al.'s scheme for block queries requires a high-dimension quantum key distribution system to sustain, which is still restricted in the laboratory. Here, based on Markus Jakobi et al.'s single-bit QPQ protocol, we propose a multi-bit quantum private query protocol, in which the user can get access to several bits within one single query. We also extend the proposed protocol to block queries, using a binary matrix to guard database security. Analysis in this paper shows that our protocol has better communication complexity, implementability and can achieve a considerable level of security.

  3. Knowledge Query Language (KQL)

    DTIC Science & Technology

    2016-02-01

    unlimited. ... EXECUTIVE SUMMARY Currently, queries for data ...retrieval from non-Structured Query Language (NoSQL) data stores are tightly coupled to the specific implementation of the data store implementation, making...of the storage content and format for querying NoSQL or relational data stores. This approach uses address expressions (or A-Expressions) embedded in

  4. Construction aggregates

    USGS Publications Warehouse

    Nelson, T.I.; Bolen, W.P.

    2007-01-01

    Construction aggregates, primarily stone, sand and gravel, are recovered from widespread naturally occurring mineral deposits and processed for use primarily in the construction industry. They are mined, crushed, sorted by size and sold loose or combined with portland cement or asphaltic cement to make concrete products to build roads, houses, buildings, and other structures. Much smaller quantities are used in agriculture, cement manufacture, chemical and metallurgical processes, glass production and many other products.

  5. Occam's razor: supporting visual query expression for content-based image queries

    NASA Astrophysics Data System (ADS)

    Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.

    2004-12-01

    This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).

  6. Federated Space-Time Query for Earth Science Data Using OpenSearch Conventions

    NASA Astrophysics Data System (ADS)

    Lynnes, C.; Beaumont, B.; Duerr, R. E.; Hua, H.

    2009-12-01

    The past decade has seen a burgeoning of remote sensing and Earth science data providers, as evidenced in the growth of the Earth Science Information Partner (ESIP) federation. At the same time, the need to combine diverse data sets to enable understanding of the Earth as a system has also grown. While the expansion of data providers is in general a boon to such studies, the diversity presents a challenge to finding useful data for a given study. Locating all the data files with aerosol information for a particular volcanic eruption, for example, may involve learning and using several different search tools to execute the requisite space-time queries. To address this issue, the ESIP federation is developing a federated space-time query framework, based on the OpenSearch convention (www.opensearch.org), with Geo and Time extensions. In this framework, data providers publish OpenSearch Description Documents that describe in a machine-readable form how to execute queries against the provider. The novelty of OpenSearch is that the space-time query interface becomes both machine callable and easy enough to integrate into the web browser's search box. This flexibility, together with a simple REST (HTTP-get) interface, should allow a variety of data providers to participate in the federated search framework, from large institutional data centers to individual scientists. The simple interface enables trivial querying of multiple data sources and participation in recursive-like federated searches--all using the same common OpenSearch interface. This simplicity also makes the construction of clients easy, as do existing OpenSearch client libraries in a variety of languages. Moreover, a number of clients and aggregation services already exist and OpenSearch is already supported by a number of web browsers such as Firefox and Internet Explorer.
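
    To show what a machine-callable space-time query might look like, the Python sketch below fills in an OpenSearch-style URL template of the kind a Description Document would publish. The placeholders `{searchTerms}`, `{geo:box}`, `{time:start}` and `{time:end}` follow the OpenSearch conventions with the Geo and Time extensions; the host `example.org`, the query-string parameter names, the helper `fill_template`, and all values are assumptions for illustration only.

```python
import re
from urllib.parse import quote

def fill_template(template, values):
    """Substitute {name} and optional {name?} placeholders in a URL template."""
    def replace(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # Keep commas and colons readable in bounding boxes and timestamps.
            return quote(str(values[name]), safe=",:")
        if optional:
            return ""  # optional parameter left empty
        raise KeyError(f"missing required parameter: {name}")
    return re.sub(r"\{([A-Za-z:]+)(\?)?\}", replace, template)

# Hypothetical template; a real one would come from a provider's Description Document.
TEMPLATE = ("https://example.org/search?q={searchTerms}"
            "&bbox={geo:box?}&start={time:start?}&end={time:end?}")

url = fill_template(TEMPLATE, {
    "searchTerms": "aerosol",
    "geo:box": "-20,60,-10,70",            # west,south,east,north (made-up values)
    "time:start": "2009-01-01T00:00:00Z",  # made-up start time
})
print(url)
```

    Because every provider exposes the same kind of template, a federated client could in principle loop over several templates with one set of values and issue a plain HTTP GET for each resulting URL.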

  7. Incremental Query Rewriting with Resolution

    NASA Astrophysics Data System (ADS)

    Riazanov, Alexandre; Aragão, Marcelo A. T.

    We address the problem of semantic querying of relational databases (RDB) modulo knowledge bases using very expressive knowledge representation formalisms, such as full first-order logic or its various fragments. We propose to use a resolution-based first-order logic (FOL) reasoner for computing schematic answers to deductive queries, with the subsequent translation of these schematic answers to SQL queries which are evaluated using a conventional relational DBMS. We call our method incremental query rewriting, because an original semantic query is rewritten into a (potentially infinite) series of SQL queries. In this chapter, we outline the main idea of our technique - using abstractions of databases and constrained clauses for deriving schematic answers, and provide completeness and soundness proofs to justify the applicability of this technique to the case of resolution for FOL without equality. The proposed method can be directly used with regular RDBs, including legacy databases. Moreover, we propose it as a potential basis for an efficient Web-scale semantic search technology.

  8. A Semantic Graph Query Language

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kaplan, I L

    2006-10-16

    Semantic graphs can be used to organize large amounts of information from a number of sources into one unified structure. A semantic query language provides a foundation for extracting information from the semantic graph. The graph query language described here provides a simple, powerful method for querying semantic graphs.

  9. SPARK: Adapting Keyword Query to Semantic Search

    NASA Astrophysics Data System (ADS)

    Zhou, Qi; Wang, Chong; Xiong, Miao; Wang, Haofen; Yu, Yong

    Semantic search promises to provide more accurate results than present-day keyword search. However, progress with semantic search has been delayed due to the complexity of its query languages. In this paper, we explore a novel approach of adapting keywords to querying the semantic web: the approach automatically translates keyword queries into formal logic queries so that end users can use familiar keywords to perform semantic search. A prototype system named 'SPARK' has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In the experiment, SPARK achieved an encouraging translation result.

  10. Querying and Ranking XML Documents.

    ERIC Educational Resources Information Center

    Schlieder, Torsten; Meuss, Holger

    2002-01-01

    Discussion of XML, information retrieval, precision, and recall focuses on a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Topics include a query model based on tree matching; structured queries and term-based ranking; and term frequency and…

  11. Quantum Private Queries

    NASA Astrophysics Data System (ADS)

    Giovannetti, Vittorio; Lloyd, Seth; Maccone, Lorenzo

    2008-06-01

    We propose a cheat sensitive quantum protocol to perform a private search on a classical database which is efficient in terms of communication complexity. It allows a user to retrieve an item from the database provider without revealing which item he or she retrieved: if the provider tries to obtain information on the query, the person querying the database can find it out. The protocol ensures also perfect data privacy of the database: the information that the user can retrieve in a single query is bounded and does not depend on the size of the database. With respect to the known (quantum and classical) strategies for private information retrieval, our protocol displays an exponential reduction in communication complexity and in running-time computational complexity.

  12. Comparative Analysis of Rank Aggregation Techniques for Metasearch Using Genetic Algorithm

    ERIC Educational Resources Information Center

    Kaur, Parneet; Singh, Manpreet; Singh Josan, Gurpreet

    2017-01-01

    Rank Aggregation techniques have found wide applications for metasearch along with other streams such as Sports, Voting System, Stock Markets, and Reduction in Spam. This paper presents the optimization of rank lists for web queries put by the user on different MetaSearch engines. A metaheuristic approach such as Genetic algorithm based rank…

  13. Secure Skyline Queries on Cloud Platform.

    PubMed

    Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

    2017-04-01

    Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.
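
    For readers unfamiliar with the underlying operation, the following Python sketch computes a skyline over plaintext data. It is not the secure protocol proposed in the paper; it only illustrates the dominance test that such a protocol has to evaluate over encrypted values. The points are made up, and the convention that smaller is better in every dimension is an assumption.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly better in one."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))

def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Made-up two-dimensional records, e.g. (cost, distance); smaller is better.
points = [(1, 9), (3, 3), (2, 8), (4, 4), (9, 1), (5, 5)]
result = skyline(points)
print(result)
```

    Here (4, 4) and (5, 5) drop out because (3, 3) is at least as good in both dimensions and strictly better in one, while the remaining points each offer a trade-off that no other point beats outright.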

  14. Secure Skyline Queries on Cloud Platform

    PubMed Central

    Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

    2017-01-01

    Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions. PMID:28883710

  15. Lost in translation? A multilingual Query Builder improves the quality of PubMed queries: a randomised controlled trial.

    PubMed

    Schuers, Matthieu; Joulakian, Mher; Kerdelhué, Gaetan; Segas, Léa; Grosjean, Julien; Darmoni, Stéfan J; Griffon, Nicolas

    2017-07-03

    MEDLINE is the most widely used medical bibliographic database in the world. Most of its citations are in English and this can be an obstacle for some researchers to access the information the database contains. We created a multilingual query builder to facilitate access to the PubMed subset using a language other than English. The aim of our study was to assess the impact of this multilingual query builder on the quality of PubMed queries for non-native English speaking physicians and medical researchers. A randomised controlled study was conducted among French speaking general practice residents. We designed a multi-lingual query builder to facilitate information retrieval, based on available MeSH translations and providing users with both an interface and a controlled vocabulary in their own language. Participating residents were randomly allocated either the French or the English version of the query builder. They were asked to translate 12 short medical questions into MeSH queries. The main outcome was the quality of the query. Two librarians blind to the arm independently evaluated each query, using a modified published classification that differentiated eight types of errors. Twenty residents used the French version of the query builder and 22 used the English version. 492 queries were analysed. There were significantly more perfect queries in the French group vs. the English group (respectively 37.9% vs. 17.9%; p < 0.01). It took significantly more time for the members of the English group than the members of the French group to build each query, respectively 194 sec vs. 128 sec; p < 0.01. This multi-lingual query builder is an effective tool to improve the quality of PubMed queries in particular for researchers whose first language is not English.

  16. Monitoring Moving Queries inside a Safe Region

    PubMed Central

    Al-Khalidi, Haidar; Taniar, David; Alamri, Sultan

    2014-01-01

    With mobile moving range queries, there is a need to recalculate the relevant surrounding objects of interest whenever the query moves. Therefore, monitoring the moving query is very costly. The safe region is one method that has been proposed to minimise the communication and computation cost of continuously monitoring a moving range query. Inside the safe region the set of objects of interest to the query does not change; thus there is no need to update the query while it is inside its safe region. However, when the query leaves its safe region the mobile device has to reevaluate the query, necessitating communication with the server. Knowing when and where the mobile device will leave a safe region is widely known as a difficult problem. To solve this problem, we propose a novel method to monitor the position of the query over time using a linear function based on the direction of the query obtained by periodic monitoring of its position. Periodic monitoring ensures that the query is aware of its location all the time. This method reduces the costs associated with communications in client-server architecture. Computational results show that our method is successful in handling moving query patterns. PMID:24696652
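The idea of predicting when a moving query leaves its safe region with a linear function can be sketched as follows. This is an illustrative sketch only: it assumes a circular safe region and constant velocity estimated from two periodic position samples, neither of which is stated in the abstract, and the function names are hypothetical.

```python
import math
from typing import Optional, Tuple

Vec = Tuple[float, float]


def estimate_velocity(p_prev: Vec, p_now: Vec, dt: float) -> Vec:
    """Velocity estimated from two periodic position samples taken dt apart."""
    return ((p_now[0] - p_prev[0]) / dt, (p_now[1] - p_prev[1]) / dt)


def time_to_exit(p: Vec, v: Vec, center: Vec, radius: float) -> Optional[float]:
    """Time until position p + v*t leaves the circle (center, radius).

    Returns 0.0 if p is already outside and None if the query is stationary.
    """
    dx, dy = p[0] - center[0], p[1] - center[1]
    a = v[0] ** 2 + v[1] ** 2
    if a == 0:
        return None  # not moving: never leaves under the linear model
    c = dx * dx + dy * dy - radius ** 2
    if c > 0:
        return 0.0  # already outside the safe region
    b = 2 * (dx * v[0] + dy * v[1])
    # Solve a*t^2 + b*t + c = 0; with c <= 0 the larger root is non-negative.
    disc = b * b - 4 * a * c
    return (-b + math.sqrt(disc)) / (2 * a)


if __name__ == "__main__":
    v = estimate_velocity((0.0, 0.0), (2.0, 0.0), dt=2.0)
    print(time_to_exit((2.0, 0.0), v, center=(0.0, 0.0), radius=5.0))
```

Under this model the client only needs to contact the server near the predicted exit time, rather than after every position update.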

  17. Sociocultural dimensions of supply and demand for natural aggregate; examples from the Mid-Atlantic region, United States

    USGS Publications Warehouse

    Robinson, Gilpin R.; Brown, William M.

    2002-01-01

    The United States uses large quantities of natural aggregate to build and maintain a continuously expanding infrastructure. In recent years, per capita demand for aggregate in the United States has grown to about 9.7 metric tons (10.7 tons) per person per year. Over the next 25 years, the aggregate industry expects to mine quantities equivalent to all aggregate mined in the United States over the past 100 years. The issues surrounding supply and demand for aggregate in the mid-Atlantic states of Maryland, Pennsylvania, Virginia, and West Virginia illustrate competing requirements for industrial minerals and many simultaneous social and environmental objectives.

  18. Searching for rare diseases in PubMed: a blind comparison of Orphanet expert query and query based on terminological knowledge.

    PubMed

    Griffon, N; Schuers, M; Dhombres, F; Merabti, T; Kerdelhué, G; Rollin, L; Darmoni, S J

    2016-08-02

    Despite international initiatives like Orphanet, it remains difficult to find up-to-date information about rare diseases. The aim of this study is to propose an exhaustive set of queries for PubMed based on terminological knowledge and to evaluate it versus the queries based on expertise provided by the most frequently used resource in Europe: Orphanet. Four rare disease terminologies (MeSH, OMIM, HPO and HRDO) were manually mapped to each other, permitting the automatic creation of expanded terminological queries for rare diseases. For 30 rare diseases, 30 citations retrieved by Orphanet expert query and/or query based on terminological knowledge were assessed for relevance by two independent reviewers unaware of the query's origin. An adjudication procedure was used to resolve any discrepancy. Precision, relative recall and F-measure were all computed. For each Orphanet rare disease (n = 8982), there was a corresponding terminological query, in contrast with only 2284 queries provided by Orphanet. Only 553 citations were evaluated due to queries with 0 or only a few hits. There were no significant differences between the Orpha query and terminological query in terms of precision, respectively 0.61 vs 0.52 (p = 0.13). Nevertheless, terminological queries retrieved more citations more often than Orpha queries (0.57 vs. 0.33; p = 0.01). Interestingly, Orpha queries seemed to retrieve older citations than terminological queries (p < 0.0001). The terminological queries proposed in this study are now available for all rare diseases. They may be a useful tool for both precision- and recall-oriented literature searches.
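The three evaluation measures named in the abstract can be computed as in the sketch below. It assumes the common definition of relative recall (relevant citations retrieved by one query divided by relevant citations retrieved by the pooled queries); the abstract does not spell out its exact definition, and the citation identifiers are made-up illustrative data.

```python
from typing import Set


def precision(retrieved: Set[str], relevant: Set[str]) -> float:
    """Fraction of retrieved citations that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def relative_recall(retrieved: Set[str], relevant: Set[str], pooled: Set[str]) -> float:
    """Relevant citations found by this query, relative to all relevant
    citations found by the pooled queries (assumed definition)."""
    pool_relevant = pooled & relevant
    return len(retrieved & relevant) / len(pool_relevant) if pool_relevant else 0.0


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    expert = {"c1", "c2", "c3", "c4"}               # hypothetical expert-query hits
    terminological = {"c3", "c4", "c5", "c6", "c7"}  # hypothetical terminological hits
    relevant = {"c1", "c3", "c5", "c6"}              # hypothetical reviewer judgments
    pooled = expert | terminological
    for name, hits in (("expert", expert), ("terminological", terminological)):
        p = precision(hits, relevant)
        r = relative_recall(hits, relevant, pooled)
        print(name, round(p, 2), round(r, 2), round(f_measure(p, r), 2))
```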

  19. Personalized query suggestion based on user behavior

    NASA Astrophysics Data System (ADS)

    Chen, Wanyu; Hao, Zepeng; Shao, Taihua; Chen, Honghui

    Query suggestions help users refine their queries after they input an initial query. Previous work mainly concentrated on similarity-based and context-based query suggestion approaches. However, models that focus on adapting to a specific user (personalization) can help to improve the probability of the user being satisfied. In this paper, we propose a personalized query suggestion model based on users’ search behavior (UB model), where we inject relevance between queries and users’ search behavior into a basic probabilistic model. For the relevance between queries, we consider their semantic similarity and co-occurrence, which indicates the behavior information from other users in web search. Regarding the current user’s preference for a query, we combine the user’s short-term and long-term search behavior in a linear fashion and deal with the data sparsity problem with Bayesian probabilistic matrix factorization (BPMF). In particular, we also investigate the impact of different personalization strategies (the combination of the user’s short-term and long-term search behavior) on the performance of query suggestion reranking. We quantify the improvement of our proposed UB model against a state-of-the-art baseline using the public AOL query logs and show that it beats the baseline in terms of metrics used in query suggestion reranking. The experimental results show that: (i) for personalized ranking, users’ behavioral information helps to improve query suggestion effectiveness; and (ii) given a query, merging information inferred from the short-term and long-term search behavior of a particular user can result in a better performance than both plain approaches.
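The linear combination of short-term and long-term behavior described in the abstract can be sketched as a simple reranking step. This is a hedged illustration: the weight `lam`, the per-query preference scores and the function names are assumptions, and the BPMF step used in the paper to handle sparsity is omitted.

```python
from typing import Dict, List


def combined_preference(short_term: float, long_term: float, lam: float = 0.5) -> float:
    """Linear mix of short-term and long-term preference (lam is hypothetical)."""
    return lam * short_term + (1.0 - lam) * long_term


def rerank(candidates: List[str],
           short: Dict[str, float],
           long: Dict[str, float],
           lam: float = 0.5) -> List[str]:
    """Order candidate suggestions by the combined preference, highest first.
    Candidates with no recorded behavior score 0.0."""
    return sorted(
        candidates,
        key=lambda q: combined_preference(short.get(q, 0.0), long.get(q, 0.0), lam),
        reverse=True,
    )


if __name__ == "__main__":
    candidates = ["jaguar car", "jaguar animal", "jaguar os"]
    short = {"jaguar car": 0.9, "jaguar animal": 0.1, "jaguar os": 0.4}
    long = {"jaguar car": 0.2, "jaguar animal": 0.8, "jaguar os": 0.4}
    print(rerank(candidates, short, long, lam=0.5))
    print(rerank(candidates, short, long, lam=0.0))  # long-term behavior only
```

Setting `lam` to 0 or 1 recovers the two "plain" strategies that the abstract compares against the merged one.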

  20. Evolution of Query Optimization Methods

    NASA Astrophysics Data System (ADS)

    Hameurlain, Abdelkader; Morvan, Franck

    Query optimization is the most critical phase in query processing. In this paper, we try to describe synthetically the evolution of query optimization methods from uniprocessor relational database systems to data Grid systems through parallel, distributed and data integration systems. We point out a set of parameters to characterize and compare query optimization methods, mainly: (i) size of the search space, (ii) type of method (static or dynamic), (iii) modification types of execution plans (re-optimization or re-scheduling), (iv) level of modification (intra-operator and/or inter-operator), (v) type of event (estimation errors, delay, user preferences), and (vi) nature of decision-making (centralized or decentralized control).

  1. Optimizing a Query by Transformation and Expansion.

    PubMed

    Glocker, Katrin; Knurr, Alexander; Dieter, Julia; Dominick, Friederike; Forche, Melanie; Koch, Christian; Pascoe Pérez, Analie; Roth, Benjamin; Ückert, Frank

    2017-01-01

    In the biomedical sector not only the amount of information produced and uploaded into the web is enormous, but also the number of sources where these data can be found. Clinicians and researchers spend huge amounts of time on trying to access this information and to filter the most important answers to a given question. As the formulation of these queries is crucial, automated query expansion is an effective tool to optimize a query and receive the best possible results. In this paper we introduce the concept of a workflow for an optimization of queries in the medical and biological sector by using a series of tools for expansion and transformation of the query. After the definition of attributes by the user, the query string is compared to previous queries in order to add semantic co-occurring terms to the query. Additionally, the query is enlarged by an inclusion of synonyms. The translation into database specific ontologies ensures the optimal query formulation for the chosen database(s). As this process can be performed in various databases at once, the results are ranked and normalized in order to achieve a comparable list of answers for a question.
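The synonym-inclusion step of the workflow can be sketched as below. This is only an illustration under assumptions: the synonym table is hypothetical, and the Boolean string format is a generic one rather than the syntax of any particular database targeted by the authors.

```python
from typing import Dict, List


def expand_query(terms: List[str], synonyms: Dict[str, List[str]]) -> str:
    """Build a Boolean query: each term is OR-ed with its synonyms,
    and the resulting groups are AND-ed together."""
    groups = []
    for term in terms:
        alternatives = [term] + [s for s in synonyms.get(term, []) if s != term]
        quoted = ['"{}"'.format(a) for a in alternatives]
        if len(quoted) > 1:
            groups.append("(" + " OR ".join(quoted) + ")")
        else:
            groups.append(quoted[0])
    return " AND ".join(groups)


if __name__ == "__main__":
    # Hypothetical synonym table; a real workflow would draw on an ontology.
    table = {"heart attack": ["myocardial infarction"]}
    print(expand_query(["heart attack", "aspirin"], table))
```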

  2. Facilitating Cohort Discovery by Enhancing Ontology Exploration, Query Management and Query Sharing for Large Clinical Data Repositories.

    PubMed

    Tao, Shiqiang; Cui, Licong; Wu, Xi; Zhang, Guo-Qiang

    2017-01-01

    To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution.

  3. Facilitating Cohort Discovery by Enhancing Ontology Exploration, Query Management and Query Sharing for Large Clinical Data Repositories

    PubMed Central

    Tao, Shiqiang; Cui, Licong; Wu, Xi; Zhang, Guo-Qiang

    2017-01-01

    To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution. PMID:29854239

  4. A Framework for WWW Query Processing

    NASA Technical Reports Server (NTRS)

    Wu, Binghui Helen; Wharton, Stephen (Technical Monitor)

    2000-01-01

    Query processing is the most common operation in a DBMS. Sophisticated query processing has been mainly targeted at a single enterprise environment providing centralized control over data and metadata. Query submission by anonymous users on the web differs in that load balancing or DBMS access control becomes the key issue. This paper provides a solution by introducing a framework for WWW query processing. The success of this framework lies in the utilization of query optimization techniques and the ontological approach. This methodology has proved to be cost effective at the NASA Goddard Space Flight Center Distributed Active Archive Center (GDAAC).

  5. Information Network Model Query Processing

    NASA Astrophysics Data System (ADS)

    Song, Xiaopu

    Information Networking Model (INM) [31] is a novel database model for real world objects and relationships management. It naturally and directly supports various kinds of static and dynamic relationships between objects. In INM, objects are networked through various natural and complex relationships. INM Query Language (INM-QL) [30] is designed to explore such information network, retrieve information about schema, instance, their attributes, relationships, and context-dependent information, and process query results in the user specified form. INM database management system has been implemented using Berkeley DB, and it supports INM-QL. This thesis is mainly focused on the implementation of the subsystem that is able to effectively and efficiently process INM-QL. The subsystem provides a lexical and syntactical analyzer of INM-QL, and it is able to choose appropriate evaluation strategies and index mechanism to process queries in INM-QL without the user's intervention. It also uses intermediate result structure to hold intermediate query result and other helping structures to reduce complexity of query processing.

  6. The Profile-Query Relationship.

    ERIC Educational Resources Information Center

    Shepherd, Michael A.; Phillips, W. J.

    1986-01-01

    Defines relationship between user profile and user query in terms of relationship between clusters of documents retrieved by each, and explores the expression of cluster similarity and cluster overlap as linear functions of similarity existing between original pairs of profiles and queries, given the desired retrieval threshold. (23 references)…

  7. Design Recommendations for Query Languages

    DTIC Science & Technology

    1980-09-01

    DESIGN RECOMMENDATIONS FOR QUERY LANGUAGES S.L. Ehrenreich Submitted by: Stanley M. Halpin, Acting Chief HUMAN FACTORS TECHNICAL AREA Approved by: Edgar ...respond to queries that it recognizes as faulty. Codd (1974) states that in designing a natural query language, attention must be given to dealing...impaired. Codd (1974) also regarded the user’s perception of the data base to be of critical importance in properly designing a query language system

  8. An advanced web query interface for biological databases

    PubMed Central

    Latendresse, Mario; Karp, Peter D.

    2010-01-01

    Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715
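To give a feel for how a list-comprehension-based language can connect several object types without an explicit join clause, here is a sketch in Python. It is not BioVelo syntax, and the gene and pathway records are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical miniature database with two object types.
GENES: List[Dict[str, str]] = [
    {"id": "g1", "name": "trpA", "pathway": "p1"},
    {"id": "g2", "name": "trpB", "pathway": "p1"},
    {"id": "g3", "name": "lacZ", "pathway": "p2"},
]
PATHWAYS: List[Dict[str, str]] = [
    {"id": "p1", "name": "tryptophan biosynthesis"},
    {"id": "p2", "name": "lactose degradation"},
]


def genes_in_pathways_matching(genes: List[Dict[str, str]],
                               pathways: List[Dict[str, str]],
                               keyword: str) -> List[Tuple[str, str]]:
    """Pair each gene with its pathway when the pathway name contains keyword.
    The comprehension ranges over both object types and filters with conditions."""
    return [
        (g["name"], p["name"])
        for g in genes
        for p in pathways
        if g["pathway"] == p["id"] and keyword in p["name"]
    ]


if __name__ == "__main__":
    print(genes_in_pathways_matching(GENES, PATHWAYS, "biosynthesis"))
```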

  9. An Ensemble Approach for Expanding Queries

    DTIC Science & Technology

    2012-11-01

    0.39 pain^0.39 Hospital 15094 0.82 hospital^0.82 Miscarriage 45 3.35 miscarriage^3.35 Radiotherapy 53 3.28 radiotherapy^3.28 Hypoaldosteronism 3...negated query is the expansion of the original query with negation terms preceding each word. For example, the negated version of “miscarriage^3.35...includes “no miscarriage”^3.35 and “not miscarriage”^3.35. If a document is the result of both original query and negated query, its score is
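The negated-query expansion described in the excerpt can be sketched as follows. The negation words "no" and "not" and the `term^weight` notation come from the excerpt; the function names are hypothetical, and how the scores of the original and negated queries are combined is truncated in the source, so it is not modeled here.

```python
from typing import List, Sequence, Tuple

WeightedTerm = Tuple[str, float]


def negate_terms(weighted_terms: List[WeightedTerm],
                 negations: Sequence[str] = ("no", "not")) -> List[WeightedTerm]:
    """Prefix every term with each negation word, keeping the term's weight."""
    return [(neg + " " + term, weight)
            for term, weight in weighted_terms
            for neg in negations]


def to_boosted_query(weighted_terms: List[WeightedTerm]) -> str:
    """Render terms in term^weight notation, quoting multi-word phrases."""
    parts = []
    for term, weight in weighted_terms:
        text = '"{}"'.format(term) if " " in term else term
        parts.append("{}^{:g}".format(text, weight))
    return " ".join(parts)


if __name__ == "__main__":
    original = [("miscarriage", 3.35), ("radiotherapy", 3.28)]
    print(to_boosted_query(original))
    print(to_boosted_query(negate_terms(original)))
```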

  10. Mining Longitudinal Web Queries: Trends and Patterns.

    ERIC Educational Resources Information Center

    Wang, Peiling; Berry, Michael W.; Yang, Yiheng

    2003-01-01

    Analyzed user queries submitted to an academic Web site during a four-year period, using a relational database, to examine users' query behavior, to identify problems they encounter, and to develop techniques for optimizing query analysis and mining. Linguistic analyses focus on query structures, lexicon, and word associations using statistical…

  11. Querying XML Data with SPARQL

    NASA Astrophysics Data System (ADS)

    Bikakis, Nikos; Gioldasis, Nektarios; Tsinaraki, Chrisa; Christodoulakis, Stavros

    SPARQL is today the standard access language for Semantic Web data. In recent years XML databases have also acquired industrial importance due to the widespread applicability of XML in the Web. In this paper we present a framework that bridges the heterogeneity gap and creates an interoperable environment where SPARQL queries are used to access XML databases. Our approach assumes that fairly generic mappings between ontology constructs and XML Schema constructs have been automatically derived or manually specified. The mappings are used to automatically translate SPARQL queries to semantically equivalent XQuery queries which are used to access the XML databases. We present the algorithms and the implementation of the SPARQL2XQuery framework, which is used for answering SPARQL queries over XML databases.

  12. Advanced Query Formulation in Deductive Databases.

    ERIC Educational Resources Information Center

    Niemi, Timo; Jarvelin, Kalervo

    1992-01-01

    Discusses deductive databases and database management systems (DBMS) and introduces a framework for advanced query formulation for end users. Recursive processing is described, a sample extensional database is presented, query types are explained, and criteria for advanced query formulation from the end user's viewpoint are examined. (31…

  13. SPLICE: A program to assemble partial query solutions from three-dimensional database searches into novel ligands

    NASA Astrophysics Data System (ADS)

    Ho, Chris M. W.; Marshall, Garland R.

    1993-12-01

    SPLICE is a program that processes partial query solutions retrieved from 3D, structural databases to generate novel, aggregate ligands. It is designed to interface with the database searching program FOUNDATION, which retrieves fragments containing any combination of a user-specified minimum number of matching query elements. SPLICE eliminates aspects of structures that are physically incapable of binding within the active site. Then, a systematic rule-based procedure is performed upon the remaining fragments to ensure receptor complementarity. All modifications are automated and remain transparent to the user. Ligands are then assembled by linking components into composite structures through overlapping bonds. As a control experiment, FOUNDATION and SPLICE were used to reconstruct a known HIV-1 protease inhibitor after it had been fragmented, reoriented, and added to a sham database of fifty different small molecules. To illustrate the capabilities of this program, a 3D search query containing the pharmacophoric elements of an aspartic proteinase-inhibitor crystal complex was searched using FOUNDATION against a subset of the Cambridge Structural Database. One hundred thirty-one compounds were retrieved, each containing any combination of at least four query elements. Compounds were automatically screened and edited for receptor complementarity. Numerous combinations of fragments were discovered that could be linked to form novel structures, containing a greater number of pharmacophoric elements than any single retrieved fragment.

  14. CrossQuery: a web tool for easy associative querying of transcriptome data.

    PubMed

    Wagner, Toni U; Fischer, Andreas; Thoma, Eva C; Schartl, Manfred

    2011-01-01

    Enormous amounts of data are being generated by modern methods such as transcriptome or exome sequencing and microarray profiling. Primary analyses such as quality control, normalization, statistics and mapping are highly complex and need to be performed by specialists. Thereafter, results are handed back to biomedical researchers, who are then confronted with complicated data lists. For rather simple tasks like data filtering, sorting and cross-association there is a need for new tools which can be used by non-specialists. Here, we describe CrossQuery, a web tool that enables straightforward, simple syntax queries to be executed on transcriptome sequencing and microarray datasets. We provide deep-sequencing data sets of stem cell lines derived from the model fish Medaka and microarray data of human endothelial cells. In the example datasets provided, mRNA expression levels, gene, transcript and sample identification numbers, GO-terms and gene descriptions can be freely correlated, filtered and sorted. Queries can be saved for later reuse and results can be exported to standard formats that allow copy-and-paste to all widespread data visualization tools such as Microsoft Excel. CrossQuery enables researchers to quickly and freely work with transcriptome and microarray data sets requiring only minimal computer skills. Furthermore, CrossQuery allows growing association of multiple datasets as long as at least one common point of correlated information, such as transcript identification numbers or GO-terms, is shared between samples. For advanced users, the object-oriented plug-in and event-driven code design of both server-side and client-side scripts allow easy addition of new features, data sources and data types.

  15. A Visual Interface for Querying Heterogeneous Phylogenetic Databases.

    PubMed

    Jamil, Hasan M

    2017-01-01

    Despite the recent growth in the number of phylogenetic databases, access to this wealth of resources remains largely driven by tool-based or form-based interfaces. It is our thesis that the flexibility afforded by declarative query languages may offer the opportunity to access these repositories in a better way, and to use such a language to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way, and is used as the storage model for our PhyloBase database for which PhyQL serves as the query language. We have implemented a visual interface for the end users to pose PhyQL queries using visual icons, and drag and drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in the PhyQL buffer allow secondary querying on the computed results, making it a truly powerful querying architecture.

  16. Relativistic quantum private database queries

    NASA Astrophysics Data System (ADS)

    Sun, Si-Jia; Yang, Yu-Guang; Zhang, Ming-Ou

    2015-04-01

    Recently, Jakobi et al. (Phys Rev A 83, 022301, 2011) suggested the first practical private database query protocol (J-protocol) based on the Scarani et al. (Phys Rev Lett 92, 057901, 2004) quantum key distribution protocol. Unfortunately, the J-protocol is just a cheat-sensitive private database query protocol. In this paper, we present an idealized relativistic quantum private database query protocol based on Minkowski causality and the properties of quantum information. Also, we prove that the protocol is secure in terms of the user security and the database security.

  17. Spatial and symbolic queries for 3D image data

    NASA Astrophysics Data System (ADS)

    Benson, Daniel C.; Zick, Gregory L.

    1992-04-01

    We present a query system for an object-oriented biomedical imaging database containing 3-D anatomical structures and their corresponding 2-D images. The graphical interface facilitates the formation of spatial queries, nonspatial or symbolic queries, and combined spatial/symbolic queries. A query editor is used for the creation and manipulation of 3-D query objects as volumes, surfaces, lines, and points. Symbolic predicates are formulated through a combination of text fields and multiple choice selections. Query results, which may include images, image contents, composite objects, graphics, and alphanumeric data, are displayed in multiple views. Objects returned by the query may be selected directly within the views for further inspection or modification, or for use as query objects in subsequent queries. Our image database query system provides visual feedback and manipulation of spatial query objects, multiple views of volume data, and the ability to combine spatial and symbolic queries. The system allows for incremental enhancement of existing objects and the addition of new objects and spatial relationships. The query system is designed for databases containing symbolic and spatial data. This paper discusses its application to data acquired in biomedical 3-D image reconstruction, but it is applicable to other areas such as CAD/CAM, geographical information systems, and computer vision.

  18. PAQ: Persistent Adaptive Query Middleware for Dynamic Environments

    NASA Astrophysics Data System (ADS)

    Rajamani, Vasanth; Julien, Christine; Payton, Jamie; Roman, Gruia-Catalin

    Pervasive computing applications often entail continuous monitoring tasks, issuing persistent queries that return continuously updated views of the operational environment. We present PAQ, a middleware that supports applications' needs by approximating a persistent query as a sequence of one-time queries. PAQ introduces an integration strategy abstraction that allows composition of one-time query responses into streams representing sophisticated spatio-temporal phenomena of interest. A distinguishing feature of our middleware is the realization that the suitability of a persistent query's result is a function of the application's tolerance for accuracy weighed against the associated overhead costs. In PAQ, programmers can specify an inquiry strategy that dictates how information is gathered. Since network dynamics impact the suitability of a particular inquiry strategy, PAQ associates an introspection strategy with a persistent query, that evaluates the quality of the query's results. The result of introspection can trigger application-defined adaptation strategies that alter the nature of the query. PAQ's simple API makes developing adaptive querying systems easily realizable. We present the key abstractions, describe their implementations, and demonstrate the middleware's usefulness through application examples and evaluation.
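The central idea of approximating a persistent query by a sequence of one-time queries, combined through an integration strategy, can be sketched as follows. This is not PAQ's API; every name here is hypothetical, and inquiry, introspection and adaptation strategies are left out.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def persistent_query(one_time_query: Callable[[], T],
                     integrate: Callable[[List[T]], R],
                     rounds: int) -> List[R]:
    """Issue the one-time query `rounds` times and, after each round,
    apply the integration strategy to all responses collected so far."""
    history: List[T] = []
    stream: List[R] = []
    for _ in range(rounds):
        history.append(one_time_query())
        stream.append(integrate(history))
    return stream


def running_mean(values: List[float]) -> float:
    """Example integration strategy: average of all responses so far."""
    return sum(values) / len(values)


if __name__ == "__main__":
    readings = iter([1.0, 3.0, 5.0])  # stand-in for sensed values
    print(persistent_query(lambda: next(readings), running_mean, rounds=3))
```

An adaptive variant would inspect each integrated result and change the query rate or scope in response, which is the role the abstract gives to introspection and adaptation strategies.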

  19. Activity-driven changes in the mechanical properties of fire ant aggregations

    NASA Astrophysics Data System (ADS)

    Tennenbaum, Michael; Fernandez-Nieves, Alberto

    2017-11-01

    Fire ant aggregations are active materials composed of individual constituents that are able to transform internal energy into work. We find using rheology and direct visualization that the aggregation undergoes activity cycles that affect the mechanical properties of the system. When the activity is high, the aggregation approximately equally stores and dissipates energy, it is more homogeneous, and exerts a high outward force. When the activity is low, the aggregation is predominantly elastic, it is more heterogeneous, and it exerts a small outward force. We rationalize our results using a simple kinetic model where the number of active ants within the aggregation is the essential quantity.

  20. EquiX-A Search and Query Language for XML.

    ERIC Educational Resources Information Center

    Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander

    2002-01-01

    Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)

  1. Assisting Consumer Health Information Retrieval with Query Recommendations

    PubMed Central

    Zeng, Qing T.; Crowell, Jonathan; Plovnick, Robert M.; Kim, Eunjung; Ngo, Long; Dibble, Emily

    2006-01-01

    Objective: Health information retrieval (HIR) on the Internet has become an important practice for millions of people, many of whom have problems forming effective queries. We have developed and evaluated a tool to assist people in health-related query formation. Design: We developed the Health Information Query Assistant (HIQuA) system. The system suggests alternative/additional query terms related to the user's initial query that can be used as building blocks to construct a better, more specific query. The recommended terms are selected according to their semantic distance from the original query, which is calculated on the basis of concept co-occurrences in medical literature and log data as well as semantic relations in medical vocabularies. Measurements: An evaluation of the HIQuA system was conducted and a total of 213 subjects participated in the study. The subjects were randomized into 2 groups. One group was given query recommendations and the other was not. Each subject performed HIR for both a predefined and a self-defined task. Results: The study showed that providing HIQuA recommendations resulted in statistically significantly higher rates of successful queries (odds ratio = 1.66, 95% confidence interval = 1.16–2.38), although no statistically significant impact on user satisfaction or the users' ability to accomplish the predefined retrieval task was found. Conclusion: Providing semantic-distance-based query recommendations can help consumers with query formation during HIR. PMID:16221944

  2. Querying Proofs (Work in Progress)

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen; Lueth, Christoph

    2011-01-01

    We motivate and introduce the basis for a query language designed for inspecting electronic representations of proofs. We argue that there is much to learn from large proofs beyond their validity, and that a dedicated query language can provide a principled way of implementing a family of useful operations.

  3. Implementation of Quantum Private Queries Using Nuclear Magnetic Resonance

    NASA Astrophysics Data System (ADS)

    Wang, Chuan; Hao, Liang; Zhao, Lian-Jie

    2011-08-01

    We present a modified protocol for the realization of a quantum private query process on a classical database. Using one-qubit query and CNOT operation, the query process can be realized in a two-mode database. In the query process, the data privacy is preserved as the sender would not reveal any information about the database besides her query information, and the database provider cannot retain any information about the query. We implement the quantum private query protocol in a nuclear magnetic resonance system. The density matrix of the memory registers are constructed.

  4. Relational similarity-based model of data part 1: foundations and query systems

    NASA Astrophysics Data System (ADS)

    Belohlavek, Radim; Vychodil, Vilem

    2017-10-01

    We present a general rank-aware model of data which supports handling of similarity in relational databases. The model is based on the assumption that in many cases it is desirable to replace equalities on values in data tables by similarity relations expressing degrees to which the values are similar. In this context, we study various phenomena which emerge in the model, including similarity-based queries and similarity-based data dependencies. Central notion in our model is that of a ranked data table over domains with similarities which is our counterpart to the notion of relation on relation scheme from the classical relational model. Compared to other approaches which cover related problems, we do not propose a similarity-based or ranking module on top of the classical relational model. Instead, we generalize the very core of the model by replacing the classical, two-valued logic upon which the classical model is built by a more general logic involving a scale of truth degrees that, in addition to the classical truth degrees 0 and 1, contains intermediate truth degrees. While the classical truth degrees 0 and 1 represent nonequality and equality of values, and subsequently mismatch and match of queries, the intermediate truth degrees in the new model represent similarity of values and partial match of queries. Moreover, the truth functions of many-valued logical connectives in the new model serve to aggregate degrees of similarity. The presented approach is conceptually clean, logically sound, and retains most properties of the classical model while enabling us to employ new types of queries and data dependencies. Most importantly, similarity is not handled in an ad hoc way or by putting a "similarity module" atop the classical model in our approach. Rather, it is consistently viewed as a notion that generalizes and replaces equality in the very core of the relational model. 
    We present fundamentals of the formal model and two equivalent query systems which are

  5. SPARQL Query Re-writing Using Partonomy Based Transformation Rules

    NASA Astrophysics Data System (ADS)

    Jain, Prateek; Yeh, Peter Z.; Verma, Kunal; Henson, Cory A.; Sheth, Amit P.

    Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. For querying ontologies containing spatial information, the precise relationships between spatial entities have to be specified in the basic graph pattern of the SPARQL query, which can result in long and complex queries. We present a novel approach to help users intuitively write SPARQL queries to query spatial data, rather than relying on knowledge of the ontology structure. Our framework re-writes queries, using transformation rules to exploit part-whole relations between geographical entities to address the mismatches between query constraints and knowledge base. Our experiments were performed on completely third-party datasets and queries. Evaluations were performed on the Geonames dataset using questions from the National Geographic Bee serialized into SPARQL, and on the British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in retrieval of results and ease in writing queries.

  6. Improve Performance of Data Warehouse by Query Cache

    NASA Astrophysics Data System (ADS)

    Gour, Vishal; Sarangdevot, S. S.; Sharma, Anand; Choudhary, Vinod

    2010-11-01

    The primary goal of a data warehouse is to free the information locked up in the operational database so that decision makers and business analysts can run queries, analysis, and planning regardless of data changes in the operational database. Because the number of queries is large, there is in certain cases a reasonable probability that the same query is submitted by one or multiple users at different times. Each time a query is executed, all the data in the warehouse is analyzed to generate its result. In this paper we study how using a query cache improves the performance of a data warehouse and try to identify the common problems faced. These are the problems data warehouse administrators face in minimizing response time and improving the overall efficiency of queries in the data warehouse, particularly when the data warehouse is updated at regular intervals.
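
    The caching idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `QueryCache` class, its method names, and the use of an in-memory SQLite database as a stand-in for a warehouse are all assumptions made for illustration. Results are stored under the query text and parameters, and the whole cache is dropped when the warehouse is refreshed.

```python
import sqlite3


class QueryCache:
    """Hypothetical result cache keyed by SQL text and bound parameters."""

    def __init__(self, conn):
        self.conn = conn
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(sql, params):
        # Collapse runs of whitespace so layout differences share one entry.
        # Assumes literal values are passed as parameters, not inlined in sql.
        return (" ".join(sql.split()), tuple(params))

    def query(self, sql, params=()):
        key = self._key(sql, params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        rows = self.conn.execute(sql, params).fetchall()
        self._store[key] = rows
        return rows

    def invalidate(self):
        """Drop every cached result, e.g. after a scheduled warehouse load."""
        self._store.clear()


def make_demo_warehouse():
    """Build a tiny in-memory table standing in for a warehouse fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10), ("south", 20), ("north", 5)],
    )
    return conn
```

    A repeated query is answered from the cache until `invalidate()` is called. Coarse invalidation of this kind suits a warehouse that is updated only at regular intervals; finer-grained schemes would need to track which tables each cached query reads.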

  7. Applying Query Structuring in Cross-language Retrieval.

    ERIC Educational Resources Information Center

    Pirkola, Ari; Puolamaki, Deniz; Jarvelin, Kalervo

    2003-01-01

    Explores ways to apply query structuring in cross-language information retrieval. Tested were: English queries translated into Finnish using an electronic dictionary, and run in a Finnish newspaper databases; effects of compound-based structuring using a proximity operator for translation equivalents of query language compound components; and a…

  8. Evaluation of Sub Query Performance in SQL Server

    NASA Astrophysics Data System (ADS)

    Oktavia, Tanty; Sujarwo, Surya

    2014-03-01

    The paper explores several sub query methods used in a query and their impact on query performance. The study uses an experimental approach to evaluate the performance of each sub query method combined with an indexing strategy. The sub query methods consist of IN, EXISTS, a relational operator, and a relational operator combined with the TOP operator. The experiment shows that using a relational operator combined with an indexing strategy in a sub query gives better performance than using the same method without an indexing strategy, and also than the other methods. In summary, for applications that emphasize the performance of retrieving data from a database, it is better to use a relational operator combined with an indexing strategy. This study was done on Microsoft SQL Server 2012.
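
    The four sub query forms named in this abstract can be written side by side as a rough sketch. The schema, table names, and data below are hypothetical, and the example runs on SQLite rather than SQL Server, so `LIMIT 1` stands in for T-SQL's `TOP 1`. It shows only that the forms return the same rows; it makes no claim about their relative speed.

```python
import sqlite3

# Four ways to select the orders of one named customer (hypothetical schema).
FORMS = {
    "in": (
        "SELECT id FROM orders WHERE customer_id IN "
        "(SELECT id FROM customers WHERE name = ?) ORDER BY id"
    ),
    "exists": (
        "SELECT o.id FROM orders o WHERE EXISTS "
        "(SELECT 1 FROM customers c WHERE c.id = o.customer_id AND c.name = ?) "
        "ORDER BY o.id"
    ),
    "relational": (
        "SELECT id FROM orders WHERE customer_id = "
        "(SELECT id FROM customers WHERE name = ?) ORDER BY id"
    ),
    # SQLite has no TOP; LIMIT 1 plays the role of T-SQL's SELECT TOP 1.
    "relational_top": (
        "SELECT id FROM orders WHERE customer_id = "
        "(SELECT id FROM customers WHERE name = ? ORDER BY id LIMIT 1) ORDER BY id"
    ),
}


def build_db():
    """Create a small in-memory database with an index on the join column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Bob")])
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 2), (3, 1)])
    return conn


def run_all(conn, name):
    """Run every form for one customer name and return the order ids per form."""
    return {
        label: [row[0] for row in conn.execute(sql, (name,))]
        for label, sql in FORMS.items()
    }
```

    Calling `run_all(build_db(), "Ada")` yields the same list of order ids under each key. The relational-operator forms are only valid when the sub query returns at most one row, which is why they are interchangeable with IN and EXISTS here.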

  9. Distributed query plan generation using multiobjective genetic algorithm.

    PubMed

    Panicker, Shina; Kumar, T V Vijay

    2014-01-01

    A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability.
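
    The biobjective formulation above rests on Pareto dominance between candidate plans, which NSGA-II uses when sorting a population into non-dominated fronts. The sketch below computes only the first such front for a handful of plans. The `Plan` type, the plan names, and the cost values are invented for illustration, and the crowding-distance and genetic operators of NSGA-II are left out.

```python
from typing import List, NamedTuple


class Plan(NamedTuple):
    """A candidate distributed query plan with its two costs (hypothetical)."""
    name: str
    lpc: float  # total local processing cost
    cc: float   # total site-to-site communication cost


def dominates(a: Plan, b: Plan) -> bool:
    """True if a is no worse than b on both costs and strictly better on one."""
    no_worse = a.lpc <= b.lpc and a.cc <= b.cc
    strictly_better = a.lpc < b.lpc or a.cc < b.cc
    return no_worse and strictly_better


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Return the plans that no other plan dominates, in input order."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]
```

    A single-objective approach would collapse the two costs into one total and return one plan; the front instead keeps every plan that represents a distinct trade-off between LPC and CC.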

  10. Distributed Query Plan Generation Using Multiobjective Genetic Algorithm

    PubMed Central

    Panicker, Shina; Vijay Kumar, T. V.

    2014-01-01

    A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability. PMID:24963513

  11. SPARQL Assist language-neutral query composer

    PubMed Central

    2012-01-01

    Background: SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. Results: We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. Conclusions: To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources. PMID:22373327

  12. SPARQL assist language-neutral query composer.

    PubMed

    McCarthy, Luke; Vandervalk, Ben; Wilkinson, Mark

    2012-01-25

    SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources.

  13. Spatial information semantic query based on SPARQL

    NASA Astrophysics Data System (ADS)

    Xiao, Zhifeng; Huang, Lei; Zhai, Xiaofang

    2009-10-01

    How can the efficiency of spatial information inquiries be enhanced in today's fast-growing information age? We are rich in geospatial data but poor in up-to-date geospatial information and knowledge that are ready to be accessed by public users. This paper adopts an approach for querying spatial semantics by building a Web Ontology Language (OWL) format ontology and introducing the SPARQL Protocol and RDF Query Language (SPARQL) to search spatial semantic relations. It is important to establish spatial semantics that support effective spatial reasoning for performing semantic queries. Compared to earlier keyword-based and information retrieval techniques that rely on syntax, we use semantic approaches in our spatial query system. Semantic approaches need to be developed through ontology, so we use OWL to describe spatial information extracted from the large-scale map of Wuhan. Spatial information expressed by ontology with formal semantics is available to machines for processing and to people for understanding. The approach is illustrated by a case study using SPARQL to query geo-spatial ontology instances of Wuhan. The paper shows that making use of SPARQL to search OWL ontology instances can ensure the result's accuracy and applicability. The result also indicates that constructing a geo-spatial semantic query system has positive effects on forming spatial queries and retrieval.

  14. Information Retrieval Using UMLS-based Structured Queries

    PubMed Central

    Fagan, Lawrence M.; Berrios, Daniel C.; Chan, Albert; Cucina, Russell; Datta, Anupam; Shah, Maulik; Surendran, Sujith

    2001-01-01

    During the last three years, we have developed and described components of ELBook, a semantically based information-retrieval system [1-4]. Using these components, domain experts can specify a query model, indexers can use the query model to index documents, and end-users can search these documents for instances of indexed queries.

  15. Lumpy Demand and the Diagrammatics of Aggregation.

    ERIC Educational Resources Information Center

    Shmanske, Stephen; Packey, Daniel

    1999-01-01

    Illustrates how a simple discontinuity in an individual's demand curve, or inverse-demand curve, affects the shape of market aggregate curves. Shows, for private goods, that an infinitesimal change in quantity can lead to large changes in consumption patterns; for collective goods, the analysis suggests a theory of coalition building. (DSK)

  16. Development of tiered aggregate specifications for FDOT use : [summary].

    DOT National Transportation Integrated Search

    2012-01-01

    Limestone in Florida's peninsula plays a major role in transportation. As a road surface, a base layer, or as aggregate in asphalt and cement concretes, most road building projects require large quantities of limestone. Millions of tons of ...

  17. Bistable aggregate of all-trans-astaxanthin in an aqueous solution

    NASA Astrophysics Data System (ADS)

    Mori, Yuso; Yamano, Kuniko; Hashimoto, Hideki

    1996-05-01

    The temperature dependence of the optical absorption spectra for astaxanthin aggregate has been studied between 2 and 32°C. Red-shifted absorption bands as compared to the monomer absorption band are found above 21°C in addition to the blue-shifted band of the aggregate. The spectra suggest that the molecular arrangement in the aggregate is a bistable one consisting of head-to-tail and card-packed arrangements. A diagram describing the bistability together with the monomer state is proposed in the space defined by the free energy and the quantity of Σi = 1 N< θ12 + < σθ12 for the ith molecule in the N-molecule aggregate.

  18. Code query by example

    NASA Astrophysics Data System (ADS)

    Vaucouleur, Sebastien

    2011-02-01

    We introduce code query by example for customisation of evolvable software products in general and of enterprise resource planning systems (ERPs) in particular. The concept is based on an initial empirical study on practices around ERP systems. We motivate our design choices based on those empirical results, and we show how the proposed solution helps with respect to the infamous upgrade problem: the conflict between the need for customisation and the need for upgrade of ERP systems. We further show how code query by example can be used as a form of lightweight static analysis, to detect automatically potential defects in large software products. Code query by example as a form of lightweight static analysis is particularly interesting in the context of ERP systems: it is often the case that programmers working in this field are not computer science specialists but more of domain experts. Hence, they require a simple language to express custom rules.

  19. Query Auto-Completion Based on Word2vec Semantic Similarity

    NASA Astrophysics Data System (ADS)

    Shao, Taihua; Chen, Honghui; Chen, Wanyu

    2018-04-01

    Query auto-completion (QAC) is the first step of information retrieval, which helps users formulate the entire query after inputting only a few prefixes. Regarding the models of QAC, the traditional method ignores the contribution from the semantic relevance between queries. However, similar queries always express extremely similar search intention. In this paper, we propose a hybrid model FS-QAC based on query semantic similarity as well as the query frequency. We choose word2vec method to measure the semantic similarity between intended queries and pre-submitted queries. By combining both features, our experiments show that FS-QAC model improves the performance when predicting the user’s query intention and helping formulate the right query. Our experimental results show that the optimal hybrid model contributes to a 7.54% improvement in terms of MRR against a state-of-the-art baseline using the public AOL query logs.
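
    The improvement reported above is stated in terms of mean reciprocal rank (MRR). As a reminder of how that metric is computed, the sketch below scores a set of completion lists against the query each user finally submitted. The function names and the example data are illustrative only; this is the standard definition of the metric, not a reproduction of the FS-QAC model or its evaluation.

```python
from typing import List, Sequence, Tuple


def reciprocal_rank(ranked: Sequence[str], target: str) -> float:
    """Return 1/rank of target in the ranked completions, or 0.0 if absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == target:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(cases: List[Tuple[Sequence[str], str]]) -> float:
    """Average the reciprocal rank over (ranked completions, submitted query) pairs."""
    if not cases:
        raise ValueError("at least one test case is required")
    return sum(reciprocal_rank(ranked, target) for ranked, target in cases) / len(cases)
```

    A model that places the intended query first scores 1 for that prefix, second place scores 1/2, and a list that misses the query altogether scores 0, so the metric rewards putting the right completion near the top.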

  20. Inhibition of p53 Mutant Peptide Aggregation In Vitro by Cationic Osmolyte Acetylcholine Chloride.

    PubMed

    Chen, Zhaolin; Kanapathipillai, Mathumai

    2017-01-01

    Mutations of tumor suppressor protein p53 are present in about 50% of all cancers. It has been reported that the p53 mutations cause aggregation and subsequent loss of p53 function, leading to cancer progression. In this study we focus on the inhibitory effects of the cationic osmolyte molecules acetylcholine chloride and choline on an aggregation-prone 10 amino acid p53 mutant peptide WRPILTIITL, and the corresponding wildtype peptide RRPILTIITL in vitro. The characterization tools used for this study include Thioflavin-T (ThT) induced fluorescence, transmission electron microscopy (TEM), Congo red binding, turbidity, dynamic light scattering (DLS), and cell viability assays. The results show that acetylcholine chloride in micromolar concentrations significantly inhibits p53 mutant peptide aggregation in vitro, and could be a promising candidate for p53 mutant/misfolded protein aggregation inhibition.

  1. An SQL query generator for CLIPS

    NASA Technical Reports Server (NTRS)

    Snyder, James; Chirica, Laurian

    1990-01-01

    As expert systems become more widely used, their access to large amounts of external information becomes increasingly important. This information exists in several forms such as statistical, tabular data, knowledge gained by experts and large databases of information maintained by companies. Because many expert systems, including CLIPS, do not provide access to this external information, much of the usefulness of expert systems is left untapped. The scope of this paper is to describe a database extension for the CLIPS expert system shell. The current industry standard database language is SQL. Due to SQL standardization, large amounts of information stored on various computers, potentially at different locations, will be more easily accessible. Expert systems should be able to directly access these existing databases rather than requiring information to be re-entered into the expert system environment. The ORACLE relational database management system (RDBMS) was used to provide a database connection within the CLIPS environment. To facilitate relational database access a query generation system was developed as a CLIPS user function. The queries are entered in a CLIPS-like syntax and are passed to the query generator, which constructs and submits for execution an SQL query to the ORACLE RDBMS. The query results are asserted as CLIPS facts. The query generator was developed primarily for use within the ICADS project (Intelligent Computer Aided Design System) currently being developed by the CAD Research Unit in the California Polytechnic State University (Cal Poly). In ICADS, there are several parallel or distributed expert systems accessing a common knowledge base of facts. Each expert system has a narrow domain of interest and therefore needs only certain portions of the information. The query generator provides a common method of accessing this information and allows the expert system to specify what data is needed without specifying how to retrieve it.
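
    The general shape of such a query generator, a declarative request turned into SQL with the rows handed back as facts, can be sketched as follows. Everything here is an assumption made for illustration: the request format, the function names, the table in the usage note, and the use of SQLite in place of ORACLE. The abstract does not show the actual CLIPS syntax, so none is reproduced.

```python
import re
import sqlite3

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _ident(name):
    """Accept only plain identifiers so names can be spliced into SQL safely."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def build_select(table, columns, where=None):
    """Turn a declarative request into a parameterized SELECT and its values."""
    cols = ", ".join(_ident(c) for c in columns)
    sql = f"SELECT {cols} FROM {_ident(table)}"
    params = []
    if where:
        clauses = []
        for column, value in where.items():
            clauses.append(f"{_ident(column)} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def query_as_facts(conn, table, columns, where=None):
    """Run the generated query and return each row as a (table, values...) tuple."""
    sql, params = build_select(table, columns, where)
    return [(table, *row) for row in conn.execute(sql, params)]
```

    A caller states what it wants, for example `query_as_facts(conn, "beams", ["id", "span"], {"material": "steel"})`, and never writes SQL itself. The returned tuples play the role that asserted facts play in the system described above.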

  2. An overview of aggregate resources in the United States

    USGS Publications Warehouse

    Langer, William H.; Scott, P.W.; Bristow, C.M.

    2002-01-01

    In 2000 the USA produced about 2.7 billion tonnes of aggregate worth about $13.7 billion. Both crushed stone and sand and gravel are produced in virtually every State, although limited quantities are available in the Gulf Coastal Plain, the Colorado Plateau, the Wyoming Basin and the Great Plains. Prices vary depending on the product and location. Most aggregates are transported by road, and minor amounts by railroad, barge on navigable inland channels, and through the Great Lakes ports. Imports and exports of aggregates are very minor. A major amount of crushed stone aggregates is consumed by concrete aggregate. Recycled aggregates account for about 8% of total demand, although the amount recycled is thought to be increasing. Current issues facing the industry include the differences in quality specifications between States, adjusting to the increasing concern for the impact of aggregate mining on the environment, health issues from particulate matter and crystalline silica, and the complexity of obtaining permits for extraction. Reduction in the number of companies extracting aggregates is likely to occur through acquisitions.

  3. Mining the SDSS SkyServer SQL queries log

    NASA Astrophysics Data System (ADS)

    Hirota, Vitor M.; Santos, Rafael; Raddick, Jordan; Thakar, Ani

    2016-05-01

    SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) astronomic catalog, provides a set of tools that allows data access for astronomers and scientific education. One of SkyServer's data access interfaces allows users to enter ad-hoc SQL statements to query the catalog. SkyServer also presents some template queries that can be used as a basis for more complex queries. This interface has logged over 330 million queries submitted since 2001. It is expected that analysis of this data can be used to investigate usage patterns, identify potential new classes of queries, find similar queries, etc., and to shed some light on how users interact with the Sloan Digital Sky Survey data and how scientists have adopted the new paradigm of e-Science, which could in turn lead to enhancements of the user interfaces and experience in general. In this paper we review some approaches to SQL query mining, apply the traditional techniques used in the literature and present lessons learned, namely, that the general text mining approach for feature extraction and clustering does not seem to be adequate for this type of data, and, most importantly, we find that this type of analysis can result in very different queries being clustered together.

  4. Fragger: a protein fragment picker for structural queries.

    PubMed

    Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J

    2017-01-01

    Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of a new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
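
    The search described in this abstract combines a distance threshold with a regular expression over one-letter amino acid codes, and ranks the matches by distance. The sketch below is not Fragger's code or interface. It assumes each fragment is stored as an identifier, a sequence, and a list of backbone coordinates, and it computes RMSD directly on the given coordinates without superposition, which a real tool may well perform first. Structural gaps in the query are not handled.

```python
import math
import re


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length lists of 3D points."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def search(database, query_coords, threshold, seq_pattern=None):
    """Return (fragment id, distance) pairs within threshold, nearest first.

    database is an iterable of (fragment id, sequence, coordinates) tuples.
    seq_pattern, if given, must match the whole one-letter sequence.
    """
    pattern = re.compile(seq_pattern) if seq_pattern else None
    hits = []
    for frag_id, sequence, coords in database:
        if len(coords) != len(query_coords):
            continue  # only fragments of the query's length are comparable
        if pattern and not pattern.fullmatch(sequence):
            continue
        distance = rmsd(query_coords, coords)
        if distance <= threshold:
            hits.append((distance, frag_id))
    return [(frag_id, distance) for distance, frag_id in sorted(hits)]
```

    Applying the sequence filter before the distance computation keeps the more expensive geometric comparison off fragments that could never be returned.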

  5. Manchester visual query language

    NASA Astrophysics Data System (ADS)

    Oakley, John P.; Davis, Darryl N.; Shann, Richard T.

    1993-04-01

    We report a database language for visual retrieval which allows queries on image feature information which has been computed and stored along with images. The language is novel in that it provides facilities for dealing with feature data which has actually been obtained from image analysis. Each line in the Manchester Visual Query Language (MVQL) takes a set of objects as input and produces another, usually smaller, set as output. The MVQL constructs are mainly based on proven operators from the field of digital image analysis. An example is the Hough-group operator which takes as input a specification for the objects to be grouped, a specification for the relevant Hough space, and a definition of the voting rule. The output is a ranked list of high scoring bins. The query could be directed towards one particular image or an entire image database, in the latter case the bins in the output list would in general be associated with different images. We have implemented MVQL in two layers. The command interpreter is a Lisp program which maps each MVQL line to a sequence of commands which are used to control a specialized database engine. The latter is a hybrid graph/relational system which provides low-level support for inheritance and schema evolution. In the paper we outline the language and provide examples of useful queries. We also describe our solution to the engineering problems associated with the implementation of MVQL.

  6. Enabling Incremental Query Re-Optimization.

    PubMed

    Liu, Mengmeng; Ives, Zachary G; Loo, Boon Thau

    2016-01-01

    As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs, and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations.

  7. Enabling Incremental Query Re-Optimization

    PubMed Central

    Liu, Mengmeng; Ives, Zachary G.; Loo, Boon Thau

    2017-01-01

    As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs, and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations. PMID:28659658

  8. RDF-GL: A SPARQL-Based Graphical Query Language for RDF

    NASA Astrophysics Data System (ADS)

    Hogenboom, Frederik; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay

    This chapter presents RDF-GL, a graphical query language (GQL) for RDF. The GQL is based on the textual query language SPARQL and mainly focuses on SPARQL SELECT queries. The advantage of a GQL over textual query languages is that complexity is hidden through the use of graphical symbols. RDF-GL is supported by a Java-based editor, SPARQLinG, which is presented as well. The editor does not only allow for RDF-GL query creation, but also converts RDF-GL queries to SPARQL queries and is able to subsequently execute these. Experiments show that using the GQL in combination with the editor makes RDF querying more accessible for end users.

  9. The StarView intelligent query mechanism

    NASA Technical Reports Server (NTRS)

    Semmel, R. D.; Silberberg, D. P.

    1993-01-01

    The StarView interface is being developed to facilitate the retrieval of scientific and engineering data produced by the Hubble Space Telescope. While predefined screens in the interface can be used to specify many common requests, ad hoc requests require a dynamic query formulation capability. Unfortunately, logical level knowledge is too sparse to support this capability. In particular, essential formulation knowledge is lost when the domain of interest is mapped to a set of database relation schemas. Thus, a system known as QUICK has been developed that uses conceptual design knowledge to facilitate query formulation. By heuristically determining strongly associated objects at the conceptual level, QUICK is able to formulate semantically reasonable queries in response to high-level requests that specify only attributes of interest. Moreover, by exploiting constraint knowledge in the conceptual design, QUICK assures that queries are formulated quickly and will execute efficiently.

  10. Query-Based Outlier Detection in Heterogeneous Information Networks.

    PubMed

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-03-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user's search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks.

  11. Query-Based Outlier Detection in Heterogeneous Information Networks

    PubMed Central

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-01-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user’s search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks. PMID:27064397

  12. Query Health: standards-based, cross-platform population health surveillance

    PubMed Central

    Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N

    2014-01-01

    Objective Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Materials and methods Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. Results We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. Discussions This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Conclusions Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites. PMID:24699371

  13. Complex dynamics of our economic life on different scales: insights from search engine query data.

    PubMed

    Preis, Tobias; Reith, Daniel; Stanley, H Eugene

    2010-12-28

    Search engine query data deliver insight into the behaviour of individuals who are the smallest possible scale of our economic life. Individuals are submitting several hundred million search engine queries around the world each day. We study weekly search volume data for various search terms from 2004 to 2010 that are offered by the search engine Google for scientific use, providing information about our economic life on an aggregated collective level. We ask the question whether there is a link between search volume data and financial market fluctuations on a weekly time scale. Both collective 'swarm intelligence' of Internet users and the group of financial market participants can be regarded as a complex system of many interacting subunits that react quickly to external changes. We find clear evidence that weekly transaction volumes of S&P 500 companies are correlated with weekly search volume of corresponding company names. Furthermore, we apply a recently introduced method for quantifying complex correlations in time series with which we find a clear tendency that search volume time series and transaction volume time series show recurring patterns.

  14. SAM Methods Query

    EPA Pesticide Factsheets

    Laboratories measuring target chemical, radiochemical, pathogens, and biotoxin analytes in environmental samples can use this online query tool to identify analytical methods included in EPA's Selected Analytical Methods for Environmental Remediation

  15. Cumulative query method for influenza surveillance using search engine data.

    PubMed

    Seo, Dong-Woo; Jo, Min-Woo; Sohn, Chang Hwan; Shin, Soo-Yong; Lee, JaeHo; Yu, Maengsoo; Kim, Won Young; Lim, Kyoung Soo; Lee, Sang-Il

    2014-12-16

    Internet search queries have become an important data source in syndromic surveillance systems. However, there is currently no syndromic surveillance system using Internet search query data in South Korea. The objective of this study was to examine correlations between our cumulative query method and national influenza surveillance data. Our study was based on the local search engine, Daum (approximately 25% market share), and influenza-like illness (ILI) data from the Korea Centers for Disease Control and Prevention. A quota sampling survey was conducted with 200 participants to obtain popular queries. We divided the study period into two sets: Set 1 (the 2009/10 epidemiological year for development set 1 and 2010/11 for validation set 1) and Set 2 (2010/11 for development set 2 and 2011/12 for validation set 2). Pearson's correlation coefficients were calculated between the Daum data and the ILI data for the development set. We selected the combined queries for which the correlation coefficients were .7 or higher and listed them in descending order. Then, we created a cumulative query method n, where n represents the number of cumulative combined queries in descending order of the correlation coefficient. In validation set 1, 13 cumulative query methods were applied, and 8 had higher correlation coefficients (min=.916, max=.943) than that of the highest single combined query. Further, 11 of 13 cumulative query methods had an r value of ≥.7, whereas 4 of 13 combined queries had an r value of ≥.7. In validation set 2, 8 of 15 cumulative query methods showed higher correlation coefficients (min=.975, max=.987) than that of the highest single combined query. All 15 cumulative query methods had an r value of ≥.7, whereas 6 of 15 combined queries had an r value of ≥.7. The cumulative query method showed relatively higher correlation with national influenza surveillance data than combined queries in the development and validation sets.
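
    The selection and accumulation steps described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes that "cumulative query method n" means summing the weekly search volumes of the n best-correlated queries, and the function names and all numbers are made up for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / (sx * sy)

def cumulative_query_methods(query_series, ili, threshold=0.7):
    """Rank queries by correlation with ILI, keep r >= threshold,
    and build running sums of the top-n query volumes (assumed reading)."""
    scored = [(pearson(series, ili), name) for name, series in query_series.items()]
    ranked = sorted((s for s in scored if s[0] >= threshold), reverse=True)
    methods = []
    running = [0.0] * len(ili)
    for n, (_, name) in enumerate(ranked, start=1):
        running = [a + b for a, b in zip(running, query_series[name])]
        names = [nm for _, nm in ranked[:n]]
        methods.append((n, names, list(running)))
    return methods

# Toy weekly data (hypothetical values, not from the study).
ili = [1, 2, 4, 8, 4, 2]
queries = {
    "flu": [2, 4, 8, 16, 8, 4],
    "fever": [1, 3, 4, 7, 5, 2],
    "weather": [5, 1, 4, 2, 6, 3],
}
methods = cumulative_query_methods(queries, ili)
for n, names, series in methods:
    print(n, names, round(pearson(series, ili), 3))
```

    Each cumulative series can then be correlated against a held-out validation season in the same way as the single queries.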

  16. Query Health: standards-based, cross-platform population health surveillance.

    PubMed

    Klann, Jeffrey G; Buck, Michael D; Brown, Jeffrey; Hadley, Marc; Elmore, Richard; Weber, Griffin M; Murphy, Shawn N

    2014-01-01

    Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites. Published by the BMJ Publishing Group Limited. 

  17. Producing approximate answers to database queries

    NASA Technical Reports Server (NTRS)

    Vrbsky, Susan V.; Liu, Jane W. S.

    1993-01-01

    We have designed and implemented a query processor, called APPROXIMATE, that makes approximate answers available if part of the database is unavailable or if there is not enough time to produce an exact answer. The accuracy of the approximate answers produced improves monotonically with the amount of data retrieved to produce the result. The exact answer is produced if all of the needed data are available and query processing is allowed to continue until completion. The monotone query processing algorithm of APPROXIMATE works within the standard relational algebra framework and can be implemented on a relational database system with little change to the relational architecture. We describe here the approximation semantics of APPROXIMATE that serves as the basis for meaningful approximations of both set-valued and single-valued queries. We show how APPROXIMATE is implemented to make effective use of semantic information, provided by an object-oriented view of the database, and describe the additional overhead required by APPROXIMATE.

  18. Path querying system on mobile devices

    NASA Astrophysics Data System (ADS)

    Lin, Xing; Wang, Yifei; Tian, Yuan; Wu, Lun

    2006-01-01

    Traditional approaches to path querying problems are not efficient and convenient under most circumstances. A more convenient and reliable approach to this problem has to be found. This paper is devoted to a path querying solution on mobile devices. By using an improved Dijkstra's shortest path algorithm and a natural language translating module, this system can help people find the shortest path between two places through their cell phones or other mobile devices. The chosen path is prompted in text of natural language, as well as a map picture. This system would be useful in solving best path querying problems and have potential to be a profitable business system.
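
    For reference, the textbook form of Dijkstra's shortest path algorithm mentioned in this abstract can be sketched as below. This is the standard priority-queue version and not the authors' improved variant, which the abstract does not describe; the graph, place names and weights are hypothetical.

```python
import heapq

def shortest_path(graph, source, target):
    """Return (distance, path) from source to target, or (None, []) if unreachable.
    graph maps each node to a dict of {neighbour: non-negative edge weight}."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, []
    # Walk the predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path

# Hypothetical road graph with distances as edge weights.
roads = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 3},
    "C": {"B": 2, "D": 7},
    "D": {},
}
print(shortest_path(roads, "A", "D"))
```

    The returned node list is the kind of result that a natural-language module could turn into step-by-step text directions.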

  19. Flexible Querying of Lifelong Learner Metadata

    ERIC Educational Resources Information Center

    Poulovassilis, A.; Selmer, P.; Wood, P. T.

    2012-01-01

    This paper discusses the provision of flexible querying facilities over heterogeneous data arising from lifelong learners' educational and work experiences. A key aim of such querying facilities is to allow learners to identify possible choices for their future learning and professional development by seeing what others have done. We motivate and…

  20. Does query expansion limit our learning? A comparison of social-based expansion to content-based expansion for medical queries on the internet.

    PubMed

    Pentoney, Christopher; Harwell, Jeff; Leroy, Gondy

    2014-01-01

    Searching for medical information online is a common activity. While it has been shown that forming good queries is difficult, Google's query suggestion tool, a type of query expansion, aims to facilitate query formation. However, it is unknown how this expansion, which is based on what others searched for, affects the information gathering of the online community. To measure the impact of social-based query expansion, this study compared it with content-based expansion, i.e., what is really in the text. We used 138,906 medical queries from the AOL User Session Collection and expanded them using Google's Autocomplete method (social-based) and the content of the Google Web Corpus (content-based). We evaluated the specificity and ambiguity of the expansion terms for trigram queries. We also looked at the impact on the actual results using domain diversity and expansion edit distance. Results showed that the social-based method provided more precise expansion terms as well as terms that were less ambiguous. Expanded queries do not differ significantly in diversity when expanded using the social-based method (6.72 different domains returned in the first ten results, on average) vs. content-based method (6.73 different domains, on average).

  1. Policy Compliance of Queries for Private Information Retrieval

    DTIC Science & Technology

    2010-11-01

    SPARQL, unfortunately, is not in RDF and so we had to develop tools to translate SPARQL queries into RDF to be used by our policy compliance prototype...policy-assurance/sparql2n3.py) that accepts SPARQL queries and returns the translated query in our simplified ontology. An example of a translated

  2. Device-independent quantum private query

    NASA Astrophysics Data System (ADS)

    Maitra, Arpita; Paul, Goutam; Roy, Sarbani

    2017-04-01

    In quantum private query (QPQ), a client obtains values corresponding to his or her query only, and nothing else from the server, and the server does not get any information about the queries. V. Giovannetti et al. [Phys. Rev. Lett. 100, 230502 (2008)], 10.1103/PhysRevLett.100.230502 gave the first QPQ protocol and since then quite a few variants and extensions have been proposed. However, none of the existing protocols are device independent; i.e., all of them assume implicitly that the entangled states supplied to the client and the server are of a certain form. In this work, we exploit the idea of a local CHSH game and connect it with the scheme of Y. G. Yang et al. [Quantum Info. Process. 13, 805 (2014)], 10.1007/s11128-013-0692-8 to present the concept of a device-independent QPQ protocol.

  3. Model-based query language for analyzing clinical processes.

    PubMed

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    Nowadays large databases of clinical process data exist in hospitals. However, these data are rarely used to their full extent. In order to perform queries on hospital processes, one must either choose from the predefined queries or develop queries using an MS Excel-type software system, which is not always a trivial task. In this paper we propose a new query language for analyzing clinical processes that is easily understood by non-IT professionals as well. We develop this language based on a process modeling language which is also described in this paper. Prototypes of both languages have already been verified using real examples from hospitals.

  4. AQBE — QBE Style Queries for Archetyped Data

    NASA Astrophysics Data System (ADS)

    Sachdeva, Shelly; Yaginuma, Daigo; Chu, Wanming; Bhalla, Subhash

    Large-scale adoption of electronic healthcare applications requires semantic interoperability. Recent proposals put forward an advanced (multi-level) DBMS architecture for repository services for patients' health records. These also require query interfaces at multiple levels, including the level of semi-skilled users. In this regard, a high-level user interface for querying the new form of standardized Electronic Health Records system has been examined in this study. It proposes a step-by-step graphical query interface to allow semi-skilled users to write queries. Its aim is to decrease user effort and communication ambiguities, and increase user friendliness.

  5. Query-Time Optimization Techniques for Structured Queries in Information Retrieval

    ERIC Educational Resources Information Center

    Cartright, Marc-Allen

    2013-01-01

    The use of information retrieval (IR) systems is evolving towards larger, more complicated queries. Both the IR industrial and research communities have generated significant evidence indicating that in order to continue improving retrieval effectiveness, increases in retrieval model complexity may be unavoidable. From an operational perspective,…

  6. Consistent Query Answering of Conjunctive Queries under Primary Key Constraints

    ERIC Educational Resources Information Center

    Pema, Enela

    2014-01-01

    An inconsistent database is a database that violates one or more of its integrity constraints. In reality, violations of integrity constraints arise frequently under several different circumstances. Inconsistent databases have long posed the challenge to develop suitable tools for meaningful query answering. A principled approach for querying…

  7. Improving Web Search for Difficult Queries

    ERIC Educational Resources Information Center

    Wang, Xuanhui

    2009-01-01

    Search engines have now become essential tools in all aspects of our life. Although a variety of information needs can be served very successfully, there are still a lot of queries that search engines cannot answer very effectively and these queries always make users feel frustrated. Since it is quite often that users encounter such "difficult…

  8. Matching health information seekers' queries to medical terms

    PubMed Central

    2012-01-01

    Background The Internet is a major source of health information but most seekers are not familiar with medical vocabularies. Hence, their searches fail due to bad query formulation. Several methods have been proposed to improve information retrieval: query expansion, syntactic and semantic techniques or knowledge-based methods. However, it would be useful to clean those queries which are misspelled. In this paper, we propose a simple yet efficient method in order to correct misspellings of queries submitted by health information seekers to a medical online search tool. Methods In addition to query normalizations and exact phonetic term matching, we tested two approximate string comparators: the similarity score function of Stoilos and the normalized Levenshtein edit distance. We propose here to combine them to increase the number of matched medical terms in French. We first took a sample of query logs to determine the thresholds and processing times. In the second run, at a greater scale, we tested different combinations of query normalizations before or after misspelling correction with the thresholds retained from the first run. Results According to the total number of suggestions (around 163, the number of the first sample of queries), at a threshold comparator score of 0.3, the normalized Levenshtein edit distance gave the highest F-Measure (88.15%) and at a threshold comparator score of 0.7, the Stoilos function gave the highest F-Measure (84.31%). By combining Levenshtein and Stoilos, the highest F-Measure (80.28%) is obtained with 0.2 and 0.7 thresholds respectively. However, queries are composed of several terms that may be combinations of medical terms. The process of query normalization and segmentation is thus required. The highest F-Measure (64.18%) is obtained when this process is performed before spelling correction. Conclusions Despite the widely known high performance of the normalized edit distance of Levenshtein, we show in this paper that its
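
    One of the two comparators named in this abstract, the normalized Levenshtein edit distance, can be sketched as below. This is an illustration under stated assumptions and not the authors' implementation: the distance is normalized by the length of the longer string, the 0.3 threshold is treated as a maximum allowed distance, and the English vocabulary is hypothetical (the study matched French terms). The Stoilos function is not shown.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def normalized_levenshtein(a, b):
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0

def suggest(query, vocabulary, max_distance=0.3):
    """Return vocabulary terms within max_distance of the query, closest first."""
    scored = [(normalized_levenshtein(query, term), term) for term in vocabulary]
    return [term for d, term in sorted(scored) if d <= max_distance]

# Hypothetical vocabulary and misspelled query.
terms = ["diabetes", "dialysis", "asthma"]
print(suggest("diabetis", terms))
```

    A lower threshold makes the matcher stricter; the abstract reports that the best threshold depends on which comparator, or combination of comparators, is used.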

  9. Processing SPARQL queries with regular expressions in RDF databases

    PubMed Central

    2011-01-01

    Background As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users’ requests for extracting information from the RDF data as well as the lack of users’ knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. Results In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns. PMID:21489225

  10. Processing SPARQL queries with regular expressions in RDF databases.

    PubMed

    Lee, Jinsoo; Pham, Minh-Duc; Lee, Jihwan; Han, Wook-Shin; Cho, Hune; Yu, Hwanjo; Lee, Jeong-Hoon

    2011-03-29

    As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users' requests for extracting information from the RDF data as well as the lack of users' knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.
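
    To make the problem concrete, the sketch below shows the naive baseline that a regular expression filter implies: scan every candidate triple and test its literal against the pattern. It is not the framework proposed in the paper, and the triples and prefixes are hypothetical. In SPARQL the same condition would be written with the standard filter FILTER regex(?label, "^insulin", "i").

```python
import re

# Hypothetical RDF triples as (subject, predicate, object) tuples.
TRIPLES = [
    ("ex:P1", "rdfs:label", "Insulin receptor"),
    ("ex:P2", "rdfs:label", "Insulin-like growth factor"),
    ("ex:P3", "rdfs:label", "Hemoglobin subunit alpha"),
    ("ex:P1", "ex:organism", "Homo sapiens"),
]

def filter_regex(triples, predicate, pattern, flags=0):
    """Return subjects whose object literal for `predicate` matches `pattern`.
    Every triple is visited, so the cost grows linearly with the data size."""
    rx = re.compile(pattern, flags)
    return [s for s, p, o in triples if p == predicate and rx.search(o)]

print(filter_regex(TRIPLES, "rdfs:label", r"^insulin", re.IGNORECASE))
```

    The full scan in this baseline is the cost that an index-aware approach aims to avoid.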

  11. Multidimensional indexing structure for use with linear optimization queries

    NASA Technical Reports Server (NTRS)

    Bergman, Lawrence David (Inventor); Castelli, Vittorio (Inventor); Chang, Yuan-Chi (Inventor); Li, Chung-Sheng (Inventor); Smith, John Richard (Inventor)

    2002-01-01

    Linear optimization queries, which usually arise in various decision support and resource planning applications, are queries that retrieve the top N data records (where N is an integer greater than zero) which satisfy a specific optimization criterion. The optimization criterion is to either maximize or minimize a linear equation. The coefficients of the linear equation are given at query time. Methods and apparatus are disclosed for constructing, maintaining and utilizing a multidimensional indexing structure of database records to improve the execution speed of linear optimization queries. Database records with numerical attributes are organized into a number of layers and each layer represents a geometric structure called a convex hull. Such linear optimization queries are processed by searching from the outer-most layer of this multi-layer indexing structure inwards. At least one record per layer will satisfy the query criterion and the number of layers needed to be searched depends on the spatial distribution of records, the query-issued linear coefficients, and N, the number of records to be returned. When N is small compared to the total size of the database, answering the query typically requires searching only a small fraction of all relevant records, resulting in a tremendous speedup as compared to linearly scanning the entire dataset.
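
    The layered structure described in this abstract can be illustrated in two dimensions. The sketch below is a simplification under stated assumptions and not the disclosed method: records are distinct 2-D points, layers are built by repeatedly peeling convex hulls (Andrew's monotone chain), and a top-N maximization query simply scans the first N layers, which is sufficient because the k-th best record for a linear function cannot lie deeper than layer k. All names and data are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Hull vertices in counter-clockwise order (collinear edge points excluded)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def convex_layers(points):
    """Peel hulls from the outside in; layers[0] is the outer-most layer."""
    remaining = set(points)
    layers = []
    while remaining:
        hull = convex_hull(list(remaining))
        layers.append(hull)
        remaining -= set(hull)
    return layers

def top_n(layers, coeffs, n):
    """Top-n records maximizing coeffs[0]*x + coeffs[1]*y, scanning n layers only.
    To minimize instead, negate the coefficients."""
    def score(p):
        return coeffs[0] * p[0] + coeffs[1] * p[1]
    candidates = [p for layer in layers[:n] for p in layer]
    return sorted(candidates, key=score, reverse=True)[:n]

records = [(0, 0), (4, 0), (4, 4), (0, 4),
           (1, 1), (3, 1), (3, 3), (1, 3), (2, 2)]
layers = convex_layers(records)
print(top_n(layers, (1, 1), 2))
```

    The layers are built once, while the coefficients are supplied only at query time, matching the split between index construction and query processing in the abstract.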

  12. RCQ-GA: RDF Chain Query Optimization Using Genetic Algorithms

    NASA Astrophysics Data System (ADS)

    Hogenboom, Alexander; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay

    The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools. Fast query engines are needed for efficient querying of large amounts of data, usually represented using RDF. We focus on optimizing a special class of SPARQL queries, the so-called RDF chain queries. For this purpose, we devise a genetic algorithm called RCQ-GA that determines the order in which joins need to be performed for an efficient evaluation of RDF chain queries. The approach is benchmarked against a two-phase optimization algorithm previously proposed in the literature. The more complex a query is, the more RCQ-GA outperforms the benchmark in solution quality, execution time needed, and consistency of solution quality. When the algorithms are constrained by a time limit, the overall performance of RCQ-GA compared to the benchmark further improves.

  13. FRS EZ Query

    EPA Pesticide Factsheets

    This page is the starting point for EZ Query. This page describes how to select key data elements from EPA's Facility Information Database and Geospatial Reference Database to build a tabular report or a Comma Separated Value (CSV) files for downloading.

  14. Experimental quantum private queries with linear optics

    NASA Astrophysics Data System (ADS)

    de Martini, Francesco; Giovannetti, Vittorio; Lloyd, Seth; Maccone, Lorenzo; Nagali, Eleonora; Sansoni, Linda; Sciarrino, Fabio

    2009-07-01

    The quantum private query is a quantum cryptographic protocol to recover information from a database, preserving both user and data privacy: the user can test whether someone has retained information on which query was asked and the database provider can test the amount of information released. Here we discuss a variant of the quantum private query algorithm that admits a simple linear optical implementation: it employs the photon’s momentum (or time slot) as address qubits and its polarization as bus qubit. A proof-of-principle experimental realization is implemented.

  15. Provenance Storage, Querying, and Visualization in PBase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kianmajd, Parisa; Ludascher, Bertram; Missier, Paolo

    2015-01-01

    We present PBase, a repository for scientific workflows and their corresponding provenance information that facilitates the sharing of experiments among the scientific community. PBase is interoperable since it uses ProvONE, a standard provenance model for scientific workflows. Workflows and traces are stored in RDF, and with the support of SPARQL and the tree cover encoding, the repository provides a scalable infrastructure for querying the provenance data. Furthermore, through its user interface, it is possible to: visualize workflows and execution traces; visualize reachability relations within these traces; issue SPARQL queries; and visualize query results.

  16. A study of medical and health queries to web search engines.

    PubMed

    Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk

    2004-03-01

    This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i) comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii) comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii) medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i) a small percentage of web queries are medical or health related, (ii) the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii) over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggest some implications for the use of general web search engines when seeking medical/health information.

  17. Multiple Query Evaluation Based on an Enhanced Genetic Algorithm.

    ERIC Educational Resources Information Center

    Tamine, Lynda; Chrisment, Claude; Boughanem, Mohand

    2003-01-01

    Explains the use of genetic algorithms to combine results from multiple query evaluations to improve relevance in information retrieval. Discusses niching techniques, relevance feedback techniques, and evolution heuristics, and compares retrieval results obtained by both genetic multiple query evaluation and classical single query evaluation…

  18. Federated ontology-based queries over cancer data

    PubMed Central

    2012-01-01

    Background Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. In particular, this is valid in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast amounts of data, however, coupled with the use of different terms - or semantic heterogeneity - in each discipline makes the retrieval and integration of information difficult. Results Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality is given at the structural metadata level, without capitalising on the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures. A graphical user

  19. Executing SPARQL Queries over the Web of Linked Data

    NASA Astrophysics Data System (ADS)

    Hartig, Olaf; Bizer, Christian; Freytag, Johann-Christoph

    The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that is not addressed by traditional research on federated query processing. In this paper we present an approach to execute SPARQL queries over the Web of Linked Data. The main idea of our approach is to discover data that might be relevant for answering a query during the query execution itself. This discovery is driven by following RDF links between data sources based on URIs in the query and in partial results. The URIs are resolved over the HTTP protocol into RDF data which is continuously added to the queried dataset. This paper describes concepts and algorithms to implement our approach using an iterator-based pipeline. We introduce a formalization of the pipelining approach and show that classical iterators may cause blocking due to the latency of HTTP requests. To avoid blocking, we propose an extension of the iterator paradigm. The evaluation of our approach shows its strengths as well as the still existing challenges.
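    The discovery step described above (resolving URIs found in the query and in partial results, and adding the retrieved RDF data to the queried dataset) can be sketched in a few lines. This is a simplified illustration only: it does not model the iterator-based pipeline or the non-blocking iterator extension, and the `WEB` dictionary, the `ex:` prefix, and the `traverse` function are hypothetical stand-ins for real HTTP lookups.

```python
from collections import deque

# Hypothetical stand-in for the Web: URI -> RDF triples obtained by dereferencing it.
WEB = {
    "ex:alice": [("ex:alice", "knows", "ex:bob")],
    "ex:bob": [("ex:bob", "knows", "ex:carol"), ("ex:bob", "name", "Bob")],
    "ex:carol": [("ex:carol", "name", "Carol")],
}

def traverse(seed_uris, lookup=WEB.get):
    """Grow the queried dataset by dereferencing URIs found in retrieved triples."""
    dataset, seen, queue = set(), set(), deque(seed_uris)
    while queue:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        for triple in lookup(uri) or []:
            dataset.add(triple)
            # Follow links: any subject or object that looks like a URI (assumed "ex:" prefix).
            for term in (triple[0], triple[2]):
                if term.startswith("ex:") and term not in seen:
                    queue.append(term)
    return dataset

data = traverse(["ex:alice"])
names = sorted(o for s, p, o in data if p == "name")
```

    Starting from a single seed URI, the dataset grows to include triples from sources that were not known when the traversal began.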

  20. Query Expansion Using SNOMED-CT and Weighing Schemes

    DTIC Science & Technology

    2014-11-01

    For this research, we have used SNOMED-CT along with UMLS Metathesaurus as our ontology in medical domain to expand the queries. ...University of the Basque country discuss their finding on query expansion using external sources headlined by Unified Medical Language System (UMLS

  1. A Relational Algebra Query Language for Programming Relational Databases

    ERIC Educational Resources Information Center

    McMaster, Kirby; Sambasivam, Samuel; Anderson, Nicole

    2011-01-01

    In this paper, we describe a Relational Algebra Query Language (RAQL) and Relational Algebra Query (RAQ) software product we have developed that allows database instructors to teach relational algebra through programming. Instead of defining query operations using mathematical notation (the approach commonly taken in database textbooks), students…
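    To illustrate the general idea of writing relational algebra as program steps rather than mathematical notation, a minimal Python sketch follows. It does not reproduce RAQL syntax, which the abstract does not show; the function names `select`, `project`, and `natural_join` and the sample relations are assumptions made for illustration.

```python
def select(relation, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Projection (pi): keep the named attributes and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attributes."""
    result = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[a] == r[a] for a in shared):
                result.append({**l, **r})
    return result

# Made-up sample relations.
students = [{"sid": 1, "name": "Ann"}, {"sid": 2, "name": "Bob"}]
enrolled = [
    {"sid": 1, "course": "DB"},
    {"sid": 2, "course": "OS"},
    {"sid": 1, "course": "OS"},
]

# Names of students enrolled in "OS": join, then select, then project.
joined = natural_join(students, enrolled)
os_names = project(select(joined, lambda r: r["course"] == "OS"), ["name"])
```

    Each operator takes relations and returns a relation, so a query becomes an ordinary sequence of function calls.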

  2. Cognitive issues in searching images with visual queries

    NASA Astrophysics Data System (ADS)

    Yu, ByungGu; Evens, Martha W.

    1999-01-01

    In this paper, we propose our image indexing technique and visual query processing technique. Our mental images are different from the actual retinal images and many things, such as personal interests, personal experiences, perceptual context, the characteristics of spatial objects, and so on, affect our spatial perception. These private differences are propagated into our mental images and so our visual queries become different from the real images that we want to find. This is a hard problem and few people have tried to work on it. In this paper, we survey the human mental imagery system, the human spatial perception, and discuss several kinds of visual queries. Also, we propose our own approach to visual query interpretation and processing.

  3. Ontology-based geospatial data query and integration

    USGS Publications Warehouse

    Zhao, T.; Zhang, C.; Wei, M.; Peng, Z.-R.

    2008-01-01

    Geospatial data sharing is an increasingly important subject as large amount of data is produced by a variety of sources, stored in incompatible formats, and accessible through different GIS applications. Past efforts to enable sharing have produced standardized data format such as GML and data access protocols such as Web Feature Service (WFS). While these standards help enabling client applications to gain access to heterogeneous data stored in different formats from diverse sources, the usability of the access is limited due to the lack of data semantics encoded in the WFS feature types. Past research has used ontology languages to describe the semantics of geospatial data but ontology-based queries cannot be applied directly to legacy data stored in databases or shapefiles, or to feature data in WFS services. This paper presents a method to enable ontology query on spatial data available from WFS services and on data stored in databases. We do not create ontology instances explicitly and thus avoid the problems of data replication. Instead, user queries are rewritten to WFS getFeature requests and SQL queries to database. The method also has the benefits of being able to utilize existing tools of databases, WFS, and GML while enabling query based on ontology semantics. © 2008 Springer-Verlag Berlin Heidelberg.

  4. Fast Query-Optimized Kernel-Machine Classification

    NASA Technical Reports Server (NTRS)

    Mazzoni, Dominic; DeCoste, Dennis

    2004-01-01

    A recently developed algorithm performs kernel-machine classification via incremental approximate nearest support vectors. The algorithm implements support-vector machines (SVMs) at speeds 10 to 100 times those attainable by use of conventional SVM algorithms. The algorithm offers potential benefits for classification of images, recognition of speech, recognition of handwriting, and diverse other applications in which there are requirements to discern patterns in large sets of data. SVMs constitute a subset of kernel machines (KMs), which have become popular as models for machine learning and, more specifically, for automated classification of input data on the basis of labeled training data. While similar in many ways to k-nearest-neighbors (k-NN) models and artificial neural networks (ANNs), SVMs tend to be more accurate. Using representations that scale only linearly in the numbers of training examples, while exploring nonlinear (kernelized) feature spaces that are exponentially larger than the original input dimensionality, KMs elegantly and practically overcome the classic curse of dimensionality. However, the price that one must pay for the power of KMs is that query-time complexity scales linearly with the number of training examples, making KMs often orders of magnitude more computationally expensive than are ANNs, decision trees, and other popular machine learning alternatives. The present algorithm treats an SVM classifier as a special form of a k-NN. The algorithm is based partly on an empirical observation that one can often achieve the same classification as that of an exact KM by using only a small fraction of the nearest support vectors (SVs) of a query. The exact KM output is a weighted sum over the kernel values between the query and the SVs. In this algorithm, the KM output is approximated with a k-NN classifier, the output of which is a weighted sum only over the kernel values involving k selected SVs. Before query time, there are gathered
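    The contrast between the exact weighted sum over all SVs and the sum over only k selected SVs can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: it uses a fixed k instead of the incremental selection the abstract mentions, and the Gaussian kernel, the function names, and the support vectors and weights are all made up for the example.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as tuples (assumed kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def exact_output(query, svs, weights, bias=0.0):
    """Exact kernel-machine output: weighted sum over all support vectors."""
    return bias + sum(w * rbf(query, sv) for sv, w in zip(svs, weights))

def knn_output(query, svs, weights, k, bias=0.0):
    """Approximate output: weighted sum over only the k nearest support vectors."""
    nearest = sorted(zip(svs, weights), key=lambda p: math.dist(query, p[0]))[:k]
    return bias + sum(w * rbf(query, sv) for sv, w in nearest)

# Made-up support vectors and signed weights.
svs = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0), (3.2, 2.9), (-3.0, 3.0)]
weights = [1.0, 0.8, -1.0, -0.7, -0.5]
query = (0.1, 0.0)

exact = exact_output(query, svs, weights)
approx = knn_output(query, svs, weights, k=2)
same_class = (exact > 0) == (approx > 0)
```

    For this query the distant SVs contribute almost nothing to the sum, so two of the five SVs are enough to reproduce the sign of the exact output.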

  5. Optimization of the Controlled Evaluation of Closed Relational Queries

    NASA Astrophysics Data System (ADS)

    Biskup, Joachim; Lochner, Jan-Hendrik; Sonntag, Sebastian

    For relational databases, controlled query evaluation is an effective inference control mechanism preserving confidentiality regarding a previously declared confidentiality policy. Implementations of controlled query evaluation usually lack efficiency due to costly theorem prover calls. Suitably constrained controlled query evaluation can be implemented efficiently, but is not flexible enough from the perspective of database users and security administrators. In this paper, we propose an optimized framework for controlled query evaluation in relational databases, being efficiently implementable on the one hand and relaxing the constraints of previous approaches on the other hand.

  6. Secure and Cost-Effective Distributed Aggregation for Mobile Sensor Networks

    PubMed Central

    Guo, Kehua; Zhang, Ping; Ma, Jianhua

    2016-01-01

    Secure data aggregation (SDA) schemes are widely used in distributed applications, such as mobile sensor networks, to reduce communication cost, prolong the network life cycle and provide security. However, most SDA are only suited for a single type of statistics (i.e., summation-based or comparison-based statistics) and are not applicable to obtaining multiple statistic results. Most SDA are also inefficient for dynamic networks. This paper presents multi-functional secure data aggregation (MFSDA), in which the mapping step and coding step are introduced to provide value-preserving and order-preserving and, later, to enable arbitrary statistics support in the same query. MFSDA is suited for dynamic networks because these active nodes can be counted directly from aggregation data. The proposed scheme is tolerant to many types of attacks. The network load of the proposed scheme is balanced, and no significant bottleneck exists. The MFSDA includes two versions: MFSDA-I and MFSDA-II. The first one can obtain accurate results, while the second one is a more generalized version that can significantly reduce network traffic at the expense of less accuracy loss. PMID:27120599

  7. Query construction, entropy, and generalization in neural-network models

    NASA Astrophysics Data System (ADS)

    Sollich, Peter

    1994-05-01

    We study query construction algorithms, which aim at improving the generalization ability of systems that learn from examples by choosing optimal, nonredundant training sets. We set up a general probabilistic framework for deriving such algorithms from the requirement of optimizing a suitable objective function; specifically, we consider the objective functions entropy (or information gain) and generalization error. For two learning scenarios, the high-low game and the linear perceptron, we evaluate the generalization performance obtained by applying the corresponding query construction algorithms and compare it to training on random examples. We find qualitative differences between the two scenarios due to the different structure of the underlying rules (nonlinear and ``noninvertible'' versus linear); in particular, for the linear perceptron, random examples lead to the same generalization ability as a sequence of queries in the limit of an infinite number of examples. We also investigate learning algorithms which are ill matched to the learning environment and find that, in this case, minimum entropy queries can in fact yield a lower generalization ability than random examples. Finally, we study the efficiency of single queries and its dependence on the learning history, i.e., on whether the previous training examples were generated randomly or by querying, and the difference between globally and locally optimal query construction.

  8. Dynamics of proteins aggregation. I. Universal scaling in unbounded media

    NASA Astrophysics Data System (ADS)

    Zheng, Size; Javidpour, Leili; Shing, Katherine S.; Sahimi, Muhammad

    2016-10-01

    It is well understood that in some cases proteins do not fold correctly and, depending on their environment, even properly-folded proteins change their conformation spontaneously, taking on a misfolded state that leads to protein aggregation and formation of large aggregates. An important factor that contributes to the aggregation is the interactions between the misfolded proteins. Depending on the aggregation environment, the aggregates may take on various shapes forming larger structures, such as protein plaques that are often toxic. Their deposition in tissues is a major contributing factor to many neuro-degenerative diseases, such as Alzheimer's, Parkinson's, amyotrophic lateral sclerosis, and prion. This paper represents the first part in a series devoted to molecular simulation of protein aggregation. We use the PRIME, a meso-scale model of proteins, together with extensive discontinuous molecular dynamics simulation to study the aggregation process in an unbounded fluid system, as the first step toward MD simulation of the same phenomenon in crowded cellular environments. Various properties of the aggregates have been computed, including dynamic evolution of aggregate-size distribution, mean aggregate size, number of peptides that contribute to the formation of β sheets, number of various types of hydrogen bonds formed in the system, radius of gyration of the aggregates, and the aggregates' diffusivity. We show that many of such quantities follow dynamic scaling, similar to those for aggregation of colloidal clusters. In particular, at long times the mean aggregate size S(t) grows with time as $S(t) \sim t^{z}$, where z is the dynamic exponent. To our knowledge, this is the first time that the qualitative similarity between aggregation of proteins and colloidal aggregates has been pointed out.
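    A power-law growth of the mean aggregate size with time means that log S is linear in log t with slope z, so the exponent can be read off a log-log fit. The sketch below shows that generic procedure; it is not taken from the paper, and the synthetic data, the prefactor 2.0, and the exponent 0.5 are assumptions chosen only to exercise the fit.

```python
import math

def fit_exponent(times, sizes):
    """Least-squares slope of log(size) against log(time), i.e. the exponent z."""
    xs = [math.log(t) for t in times]
    ys = [math.log(s) for s in sizes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic long-time data generated with an assumed exponent of 0.5.
times = [10.0, 20.0, 40.0, 80.0, 160.0]
sizes = [2.0 * t ** 0.5 for t in times]
z = fit_exponent(times, sizes)
```

    With simulation output, only the long-time portion of the series would be passed to `fit_exponent`, since the scaling is stated to hold at long times.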

  9. SAM Pathogen Methods Query

    EPA Pesticide Factsheets

    Laboratories measuring target pathogen analytes in environmental samples can use this online query tool to identify analytical methods in EPA's Selected Analytical Methods for Environmental Remediation and Recovery for select pathogens.

  10. The Hub Population Health System: distributed ad hoc queries and alerts

    PubMed Central

    Anane, Sheila; Taverna, John; Amirfar, Sam; Stubbs-Dame, Remle; Singer, Jesse

    2011-01-01

    The Hub Population Health System enables the creation and distribution of queries for aggregate count information, clinical decision support alerts at the point-of-care for patients who meet specified conditions, and secure messages sent directly to provider electronic health record (EHR) inboxes. Using a metronidazole medication recall, the New York City Department of Health was able to determine the number of affected patients and message providers, and distribute an alert to participating practices. As of September 2011, the system is live in 400 practices and within a year will have over 532 practices with 2500 providers, representing over 2.5 million New Yorkers. The Hub can help public health experts to evaluate population health and quality improvement activities throughout the ambulatory care network. Multiple EHR vendors are building these features in partnership with the department's regional extension center in anticipation of new meaningful use requirements. PMID:22071531

  11. Cognitive search model and a new query paradigm

    NASA Astrophysics Data System (ADS)

    Xu, Zhonghui

    2001-06-01

    This paper proposes a cognitive model in which people begin to search pictures by using semantic content and find a right picture by judging whether its visual content is a proper visualization of the semantics desired. It is essential that human search is not just a process of matching computation on visual feature but rather a process of visualization of the semantic content known. For people to search electronic images in the way they do manually in the model, we suggest that querying be a semantic-driven process like design. A query-by-design paradigm is proposed in the sense that what you design is what you find. Unlike query-by-example, query-by-design allows users to specify the semantic content through an iterative and incremental interaction process so that a retrieval can start with association and identification of the given semantic content and get refined while further visual cues are available. An experimental image retrieval system, Kuafu, has been under development using the query-by-design paradigm and an iconic language is adopted.

  12. Amount of Televised Alcohol Advertising Exposure and the Quantity of Alcohol Consumed by Youth.

    PubMed

    Naimi, Timothy S; Ross, Craig S; Siegel, Michael B; DeJong, William; Jernigan, David H

    2016-09-01

    Although studies demonstrate that exposure to brand-specific alcohol advertising is associated with an increased likelihood of youth consuming particular brands, the relationship between quantity of brand-specific advertising exposure and quantity of brand-specific consumption has not been firmly established. Using the Alcohol Brand Research Among Underage Drinkers (ABRAND) national sample of 1,031 young drinkers (ages 13-20), this study examined the relationship between their aggregated past-year exposure to advertising (in adstock units, a measure based on gross rating points) for 61 alcohol brands that advertised on the 20 most popular nonsports television programs viewed by underage youth and their aggregated total consumption of those same brands during the past 30 days. Predictive models adjusted for other media exposure, predictors of youth's alcohol consumption, and the consumption of brands not advertised on the 20 shows. For the fully adjusted models, each 100 adstock unit increase in exposure (about 1 SD) was associated with an increase of 5.9 drinks (95% CI [0.9, 11.0 drinks]) consumed during the past 30 days among those with less than 300 units of advertising exposure, and an increase of 55.7 drinks (95% CI [13.9, 97.4 drinks]) among those with 300 or more adstock units of exposure. Among underage youth, the quantity of brand-specific advertising exposure is positively associated with the total quantity of consumption of those advertised brands, even after controlling for the consumption of non-advertised brands. Future research should examine exposure-consumption relationships longitudinally and in other media.

  13. Amount of Televised Alcohol Advertising Exposure and the Quantity of Alcohol Consumed by Youth

    PubMed Central

    Naimi, Timothy S.; Ross, Craig S.; Siegel, Michael B.; DeJong, William; Jernigan, David H.

    2016-01-01

    Objective: Although studies demonstrate that exposure to brand-specific alcohol advertising is associated with an increased likelihood of youth consuming particular brands, the relationship between quantity of brand-specific advertising exposure and quantity of brand-specific consumption has not been firmly established. Method: Using the Alcohol Brand Research Among Underage Drinkers (ABRAND) national sample of 1,031 young drinkers (ages 13–20), this study examined the relationship between their aggregated past-year exposure to advertising (in adstock units, a measure based on gross rating points) for 61 alcohol brands that advertised on the 20 most popular nonsports television programs viewed by underage youth and their aggregated total consumption of those same brands during the past 30 days. Predictive models adjusted for other media exposure, predictors of youth’s alcohol consumption, and the consumption of brands not advertised on the 20 shows. Results: For the fully adjusted models, each 100 adstock unit increase in exposure (about 1 SD) was associated with an increase of 5.9 drinks (95% CI [0.9, 11.0 drinks]) consumed during the past 30 days among those with less than 300 units of advertising exposure, and an increase of 55.7 drinks (95% CI [13.9, 97.4 drinks]) among those with 300 or more adstock units of exposure. Conclusions: Among underage youth, the quantity of brand-specific advertising exposure is positively associated with the total quantity of consumption of those advertised brands, even after controlling for the consumption of non-advertised brands. Future research should examine exposure–consumption relationships longitudinally and in other media. PMID:27588530

  14. A New Publicly Available Chemical Query Language, CSRML ...

    EPA Pesticide Factsheets

    A new XML-based query language, CSRML, has been developed for representing chemical substructures, molecules, reaction rules, and reactions. CSRML queries are capable of integrating additional forms of information beyond the simple substructure (e.g., SMARTS) or reaction transformation (e.g., SMIRKS, reaction SMILES) queries currently in use. Chemotypes, a term used to represent advanced CSRML queries for repeated application, can be encoded not only with connectivity and topology, but also with properties of atoms, bonds, electronic systems, or molecules. The CSRML language has been developed in parallel with a public set of chemotypes, i.e., the ToxPrint chemotypes, which are designed to provide excellent coverage of environmental, regulatory and commercial use chemical space, as well as to represent features and frameworks believed to be especially relevant to toxicity concerns. A software application, ChemoTyper, has also been developed and made publicly available to enable chemotype searching and fingerprinting against a target structure set. The public ChemoTyper houses the ToxPrint chemotype CSRML dictionary, as well as a reference implementation so that the query specifications may be adopted by other chemical structure knowledge systems. The full specifications of the XML standard used in CSRML-based chemotypes are publicly available to facilitate and encourage the exchange of structural knowledge. Paper details specifications for a new XML-based query lan

  15. SAM Biotoxin Methods Query

    EPA Pesticide Factsheets

    Laboratories measuring target biotoxin analytes in environmental samples can use this online query tool to identify analytical methods included in EPA's Selected Analytical Methods for Environmental Remediation and Recovery for select biotoxins.

  16. SAM Radiochemical Methods Query

    EPA Pesticide Factsheets

    Laboratories measuring target radiochemical analytes in environmental samples can use this online query tool to identify analytical methods in EPA's Selected Analytical Methods for Environmental Remediation and Recovery for select radiochemical analytes.

  17. SAM Chemical Methods Query

    EPA Pesticide Factsheets

    Laboratories measuring target chemical, radiochemical, pathogens, and biotoxin analytes in environmental samples can use this online query tool to identify analytical methods in EPA's Selected Analytical Methods for Environmental Remediation and Recovery

  18. Aggregated Indexing of Biomedical Time Series Data

    PubMed Central

    Woodbridge, Jonathan; Mortazavi, Bobak; Sarrafzadeh, Majid; Bui, Alex A.T.

    2016-01-01

    Remote and wearable medical sensing has the potential to create very large and high dimensional datasets. Medical time series databases must be able to efficiently store, index, and mine these datasets to enable medical professionals to effectively analyze data collected from their patients. Conventional high dimensional indexing methods are a two stage process. First, a superset of the true matches is efficiently extracted from the database. Second, supersets are pruned by comparing each of their objects to the query object and rejecting any objects falling outside a predetermined radius. This pruning stage heavily dominates the computational complexity of most conventional search algorithms. Therefore, indexing algorithms can be significantly improved by reducing the amount of pruning. This paper presents an online algorithm to aggregate biomedical time series data to significantly reduce the search space (index size) without compromising the quality of search results. This algorithm is built on the observation that biomedical time series signals are composed of cyclical and often similar patterns. This algorithm takes in a stream of segments and groups them into highly concentrated collections. Locality Sensitive Hashing (LSH) is used to reduce the overall complexity of the algorithm, allowing it to run online. The output of this aggregation is used to populate an index. The proposed algorithm yields logarithmic growth of the index (with respect to the total number of objects) while keeping sensitivity and specificity simultaneously above 98%. Both memory and runtime complexities of time series search are improved when using aggregated indexes. In addition, data mining tasks, such as clustering, exhibit runtimes that are orders of magnitude faster when run on aggregated indexes. PMID:27617298
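    The grouping of a stream of segments into collections by a locality-sensitive hash can be sketched as below. The abstract does not say which LSH family is used, so the random-hyperplane (sign) hash here is an assumption, as are the function names, the segment length, and the made-up `beat` segment; the sketch shows only the bucketing step, not the full indexing algorithm.

```python
import random
from collections import defaultdict

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals for sign-based LSH (assumed family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(segment, hyperplanes):
    """One bit per hyperplane: which side of it the segment falls on."""
    return tuple(
        int(sum(s * h for s, h in zip(segment, plane)) >= 0)
        for plane in hyperplanes
    )

def aggregate(stream, hyperplanes):
    """Group incoming segments by signature; each bucket becomes one index entry."""
    buckets = defaultdict(list)
    for segment in stream:
        buckets[signature(segment, hyperplanes)].append(segment)
    return buckets

planes = make_hyperplanes(dim=4, n_bits=8)
beat = [0.1, 0.9, -0.4, 0.2]  # made-up segment
stream = [beat, [2 * v for v in beat], [-v for v in beat], beat]
index = aggregate(stream, planes)
```

    Repeated and rescaled copies of the same pattern land in one bucket while the inverted pattern lands in another, so four incoming segments produce only two index entries.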

  19. A Semantic Basis for Proof Queries and Transformations

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen W.; Luth, Christoph

    2013-01-01

    We extend the query language PrQL, designed for inspecting machine representations of proofs, to also allow transformation of proofs. PrQL natively supports hiproofs which express proof structure using hierarchically nested labelled trees, which we claim is a natural way of taming the complexity of huge proofs. Query-driven transformations enable manipulation of this structure, in particular, to transform proofs produced by interactive theorem provers into forms that assist their understanding, or that could be consumed by other tools. In this paper we motivate and define basic transformation operations, using an abstract denotational semantics of hiproofs and queries. This extends our previous semantics for queries based on syntactic tree representations. We define update operations that add and remove sub-proofs, and manipulate the hierarchy to group and ungroup nodes. We show that

  20. Query Language for Location-Based Services: A Model Checking Approach

    NASA Astrophysics Data System (ADS)

    Hoareau, Christian; Satoh, Ichiro

    We present a model checking approach to the rationale, implementation, and applications of a query language for location-based services. Such query mechanisms are necessary so that users, objects, and/or services can effectively benefit from the location-awareness of their surrounding environment. The underlying data model is founded on a symbolic model of space organized in a tree structure. Once extended to a semantic model for modal logic, we regard location query processing as a model checking problem, and thus define location queries as hybrid logic-based formulas. Our approach is unique among existing research because it explores the connection between location models and query processing in ubiquitous computing systems, relies on a sound theoretical basis, and provides modal logic-based query mechanisms for expressive searches over a decentralized data structure. A prototype implementation is also presented and will be discussed.

  1. Polylogarithmic equilibrium treatment of molecular aggregation and critical concentrations.

    PubMed

    Michel, Denis; Ruelle, Philippe

    2017-02-15

    A full equilibrium treatment of molecular aggregation is presented for prototypes of 1D and 3D aggregates, with and without nucleation. By skipping complex kinetic parameters like aggregate size-dependent diffusion, the equilibrium treatment allows us to predict directly time-independent quantities such as critical concentrations. The relationships between the macroscopic equilibrium constants for different paths are first established by statistical corrections and so as to comply with the detailed balance constraints imposed by nucleation, and the composition of the mixture resulting from homogeneous aggregation is then analyzed using a polylogarithmic function. Several critical concentrations are distinguished: the residual monomer concentration at equilibrium (RMC) and the critical nucleation concentration (CNC), which is the threshold concentration of total subunits necessary for initiating aggregation. When increasing the concentration of total subunits, the RMC converges more strongly to its asymptotic value, the equilibrium constant of depolymerization, for 3D aggregates and in the case of nucleation. The CNC moderately depends on the number of subunits in the nucleus, but sharply increases with the difference between the equilibrium constants of polymerization and nucleation. As the RMC and CNC can be numerically but not analytically determined, ansatz equations connecting them to thermodynamic parameters are proposed.

  2. How Do Children Reformulate Their Search Queries?

    ERIC Educational Resources Information Center

    Rutter, Sophie; Ford, Nigel; Clough, Paul

    2015-01-01

    Introduction: This paper investigates techniques used by children in year 4 (age eight to nine) of a UK primary school to reformulate their queries, and how they use information retrieval systems to support query reformulation. Method: An in-depth study analysing the interactions of twelve children carrying out search tasks in a primary school…

  3. Spatial and Temporal Uncertainty of Crop Yield Aggregations

    NASA Technical Reports Server (NTRS)

    Porwollik, Vera; Mueller, Christoph; Elliott, Joshua; Chryssanthacopoulos, James; Iizumi, Toshichika; Ray, Deepak K.; Ruane, Alex C.; Arneth, Almut; Balkovic, Juraj; Ciais, Philippe

    2016-01-01

    The aggregation of simulated gridded crop yields to national or regional scale requires information on temporal and spatial patterns of crop-specific harvested areas. This analysis estimates the uncertainty of simulated gridded yield time series related to the aggregation with four different harvested area data sets. We compare aggregated yield time series from the Global Gridded Crop Model Inter-comparison project for four crop types from 14 models at global, national, and regional scale to determine aggregation-driven differences in mean yields and temporal patterns as measures of uncertainty. The quantity and spatial patterns of harvested areas differ for individual crops among the four datasets applied for the aggregation. Also simulated spatial yield patterns differ among the 14 models. These differences in harvested areas and simulated yield patterns lead to differences in aggregated productivity estimates, both in mean yield and in the temporal dynamics. Among the four investigated crops, wheat yield (17% relative difference) is most affected by the uncertainty introduced by the aggregation at the global scale. The correlation of temporal patterns of global aggregated yield time series can be as low as for soybean (r = 0.28). For the majority of countries, mean relative differences of nationally aggregated yields account for 10% or less. The spatial and temporal difference can be substantially higher for individual countries. Of the top-10 crop producers, aggregated national multi-annual mean relative difference of yields can be up to 67% (maize, South Africa), 43% (wheat, Pakistan), 51% (rice, Japan), and 427% (soybean, Bolivia). Correlations of differently aggregated yield time series can be as low as r = 0.56 (maize, India), r = 0.05 (wheat, Russia), r = 0.13 (rice, Vietnam), and r = -0.01 (soybean, Uruguay). The aggregation to sub-national scale in comparison to country scale shows that spatial uncertainties can cancel out in countries with
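    How the choice of harvested-area data set can change an aggregated yield is easy to see in a small area-weighted mean. The sketch below is an illustration only: the abstract does not give the exact aggregation formula or the definition of relative difference, so the weighting, the `relative_difference` measure, and all numbers are assumptions.

```python
def aggregate_yield(yields, areas):
    """Area-weighted mean yield over grid cells; zero-area cells contribute nothing."""
    total_area = sum(areas)
    if total_area == 0:
        return float("nan")
    return sum(y * a for y, a in zip(yields, areas)) / total_area

def relative_difference(values):
    """Assumed spread measure: (max - min) of the aggregates relative to their mean."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean

# Made-up simulated yield per grid cell (t/ha).
grid_yields = [2.0, 4.0, 6.0]

# Made-up harvested area per cell (ha) from two hypothetical data sets.
area_sets = {
    "dataset_a": [100.0, 100.0, 100.0],
    "dataset_b": [300.0, 0.0, 0.0],
}

national = {name: aggregate_yield(grid_yields, a) for name, a in area_sets.items()}
spread = relative_difference(list(national.values()))
```

    The simulated yields are identical in both cases; only the area weights differ, yet the aggregated value moves from 4.0 to 2.0 t/ha.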

  4. A distributed query execution engine of big attributed graphs.

    PubMed

    Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif

    2016-01-01

    A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the graph vertices and edges need to be associated with descriptive attributes. Such graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations including reachability, pattern matching and shortest path where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation and performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine of G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes while the graph data are maintained in a relational store which is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries which are pushed to the underlying relational stores while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and the scalability of DG-SPARQL on querying massive attributed graph datasets in addition to its ability to outperform Apache Giraph, a popular distributed graph processing system, by orders of magnitude.

  5. Ceramic ware waste as coarse aggregate for structural concrete production.

    PubMed

    García-González, Julia; Rodríguez-Robles, Desirée; Juan-Valdés, Andrés; Morán-Del Pozo, Julia M; Guerra-Romero, M Ignacio

    2015-01-01

    The manufacture of any kind of product inevitably entails the production of waste. The quantity of waste generated by the ceramic industry, a very important sector in Spain, is between 5% and 8% of the final output and it is therefore necessary to find an effective waste recovery method. The aim of the study reported in the present article was to seek a sustainable means of managing waste from the ceramic industry through the incorporation of this type of waste in the total replacement of conventional aggregate (gravel) used in structural concrete. Having verified that the recycled ceramic aggregates met all the technical requirements imposed by current Spanish legislation, established in the Code on Structural Concrete (EHE-08), a control concrete mix and a recycled concrete mix using 100% recycled ceramic aggregate instead of coarse natural aggregate were prepared. The concretes obtained were subjected to the appropriate tests in order to conduct a comparison of their mechanical properties. The results show that the concretes made using ceramic sanitary ware aggregate possessed the same mechanical properties as those made with conventional aggregate. It is therefore possible to conclude that the reuse of recycled ceramic aggregate to produce recycled concrete is a feasible alternative for the sustainable management of this waste.

  6. Web queries as a source for syndromic surveillance.

    PubMed

    Hulth, Anette; Rydevik, Gustaf; Linde, Annika

    2009-01-01

    In the field of syndromic surveillance, various sources are exploited for outbreak detection, monitoring and prediction. This paper describes a study on queries submitted to a medical web site, with influenza as a case study. The hypothesis of the work was that queries on influenza and influenza-like illness would provide a basis for the estimation of the timing of the peak and the intensity of the yearly influenza outbreaks that would be as good as the existing laboratory and sentinel surveillance. We calculated the occurrence of various queries related to influenza from search logs submitted to a Swedish medical web site for two influenza seasons. These figures were subsequently used to generate two models, one to estimate the number of laboratory verified influenza cases and one to estimate the proportion of patients with influenza-like illness reported by selected General Practitioners in Sweden. We applied an approach designed for highly correlated data, partial least squares regression. In our work, we found that certain web queries on influenza follow the same pattern as that obtained by the two other surveillance systems for influenza epidemics, and that they have equal power for the estimation of the influenza burden in society. Web queries give a unique access to ill individuals who are not (yet) seeking care. This paper shows the potential of web queries as an accurate, cheap and labour extensive source for syndromic surveillance.

  7. A novel adaptive Cuckoo search for optimal query plan generation.

    PubMed

    Gomathi, Ramalingam; Sharmila, Dhandapani

    2014-01-01

    The emergence of multiple web pages day by day leads to the development of the semantic web technology. A World Wide Web Consortium (W3C) standard for storing semantic web data is the resource description framework (RDF). To enhance the efficiency in the execution time for querying large RDF graphs, the evolving metaheuristic algorithms become an alternate to the traditional query optimization methods. This paper focuses on the problem of query optimization of semantic web data. An efficient algorithm called adaptive Cuckoo search (ACS) for querying and generating optimal query plan for large RDF graphs is designed in this research. Experiments were conducted on different datasets with varying number of predicates. The experimental results have exposed that the proposed approach has provided significant results in terms of query execution time. The extent to which the algorithm is efficient is tested and the results are documented.

  8. Secure Nearest Neighbor Query on Crowd-Sensing Data

    PubMed Central

    Cheng, Ke; Wang, Liangmin; Zhong, Hong

    2016-01-01

    Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals as the data owner are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to finish many security operations due to computation and storage capability constraints. In light of the Multi Owners and Multi Users (MOMU) situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed by protocols of secure two-party computation and secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists the collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between the security and query performance compared to other schemes. PMID:27669253

  9. Evaluation methodology for query-based scene understanding systems

    NASA Astrophysics Data System (ADS)

    Huster, Todd P.; Ross, Timothy D.; Culbertson, Jared L.

    2015-05-01

    In this paper, we are proposing a method for the principled evaluation of scene understanding systems in a query-based framework. We can think of a query-based scene understanding system as a generalization of typical sensor exploitation systems where instead of performing a narrowly defined task (e.g., detect, track, classify, etc.), the system can perform general user-defined tasks specified in a query language. Examples of this type of system have been developed as part of DARPA's Mathematics of Sensing, Exploitation, and Execution (MSEE) program. There is a body of literature on the evaluation of typical sensor exploitation systems, but the open-ended nature of the query interface introduces new aspects to the evaluation problem that have not been widely considered before. In this paper, we state the evaluation problem and propose an approach to efficiently learn about the quality of the system under test. We consider the objective of the evaluation to be to build a performance model of the system under test, and we rely on the principles of Bayesian experiment design to help construct and select optimal queries for learning about the parameters of that model.

  10. Secure Nearest Neighbor Query on Crowd-Sensing Data.

    PubMed

    Cheng, Ke; Wang, Liangmin; Zhong, Hong

    2016-09-22

    Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals as the data owner are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to finish many security operations due to computation and storage capability constraints. In light of the Multi Owners and Multi Users (MOMU) situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed by protocols of secure two-party computation and secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists the collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between the security and query performance compared to other schemes.

  11. CSRQ: Communication-Efficient Secure Range Queries in Two-Tiered Sensor Networks

    PubMed Central

    Dai, Hua; Ye, Qingqun; Yang, Geng; Xu, Jia; He, Ruiliang

    2016-01-01

    In recent years, we have seen many applications of secure query in two-tiered wireless sensor networks. Storage nodes are responsible for storing data from nearby sensor nodes and answering queries from Sink. It is critical to protect data security from a compromised storage node. In this paper, the Communication-efficient Secure Range Query (CSRQ)—a privacy and integrity preserving range query protocol—is proposed to prevent attackers from gaining information of both data collected by sensor nodes and queries issued by Sink. To preserve privacy and integrity, in addition to employing the encoding mechanisms, a novel data structure called encrypted constraint chain is proposed, which embeds the information of integrity verification. Sink can use this encrypted constraint chain to verify the query result. The performance evaluation shows that CSRQ has lower communication cost than the current range query protocols. PMID:26907293

  12. QBIC project: querying images by content, using color, texture, and shape

    NASA Astrophysics Data System (ADS)

    Niblack, Carlton W.; Barber, Ron; Equitz, Will; Flickner, Myron D.; Glasman, Eduardo H.; Petkovic, Dragutin; Yanker, Peter; Faloutsos, Christos; Taubin, Gabriel

    1993-04-01

    In the query by image content (QBIC) project we are studying methods to query large on-line image databases using the images' content as the basis of the queries. Examples of the content we use include color, texture, and shape of image objects and regions. Potential applications include medical ('Give me other images that contain a tumor with a texture like this one'), photo-journalism ('Give me images that have blue at the top and red at the bottom'), and many others in art, fashion, cataloging, retailing, and industry. Key issues include derivation and computation of attributes of images and objects that provide useful query functionality, retrieval methods based on similarity as opposed to exact match, query by image example or user drawn image, the user interfaces, query refinement and navigation, high dimensional database indexing, and automatic and semi-automatic database population. We currently have a prototype system written in X/Motif and C running on an RS/6000 that allows a variety of queries, and a test database of over 1000 images and 1000 objects populated from commercially available photo clip art images. In this paper we present the main algorithms for color, texture, shape and sketch query that we use, show example query results, and discuss future directions.

  13. Querying and Extracting Timeline Information from Road Traffic Sensor Data

    PubMed Central

    Imawan, Ardi; Indikawati, Fitri Indra; Kwon, Joonho; Rao, Praveen

    2016-01-01

    The escalation of traffic congestion in urban cities has urged many countries to use intelligent transportation system (ITS) centers to collect historical traffic sensor data from multiple heterogeneous sources. By analyzing historical traffic data, we can obtain valuable insights into traffic behavior. Many existing applications have been proposed with limited analysis results because of the inability to cope with several types of analytical queries. In this paper, we propose the QET (querying and extracting timeline information) system—a novel analytical query processing method based on a timeline model for road traffic sensor data. To address query performance, we build a TQ-index (timeline query-index) that exploits spatio-temporal features of timeline modeling. We also propose an intuitive timeline visualization method to display congestion events obtained from specified query parameters. In addition, we demonstrate the benefit of our system through a performance evaluation using a Busan ITS dataset and a Seattle freeway dataset. PMID:27563900

  14. Automatic building information model query generation

    DOE PAGES

    Jiang, Yufei; Yu, Nan; Ming, Jiang; ...

    2015-12-01

    Energy efficient building design and construction calls for extensive collaboration between different subfields of the Architecture, Engineering and Construction (AEC) community. Performing building design and construction engineering raises challenges on data integration and software interoperability. Using a Building Information Modeling (BIM) data hub to host and integrate building models is a promising solution to address those challenges, which can ease building design information management. However, the partial model query mechanism of the current BIM data hub collaboration model has several limitations, which prevent designers and engineers from taking advantage of BIM. To address this problem, we propose a general and effective approach to generate query code based on a Model View Definition (MVD). This approach is demonstrated through a software prototype called QueryGenerator. In conclusion, by demonstrating a case study using multi-zone air flow analysis, we show how our approach and tool can help domain experts to use BIM to drive building design with less labour and lower overhead cost.

  15. Automatic building information model query generation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiang, Yufei; Yu, Nan; Ming, Jiang

    Energy efficient building design and construction calls for extensive collaboration between different subfields of the Architecture, Engineering and Construction (AEC) community. Performing building design and construction engineering raises challenges on data integration and software interoperability. Using a Building Information Modeling (BIM) data hub to host and integrate building models is a promising solution to address those challenges, which can ease building design information management. However, the partial model query mechanism of the current BIM data hub collaboration model has several limitations, which prevent designers and engineers from taking advantage of BIM. To address this problem, we propose a general and effective approach to generate query code based on a Model View Definition (MVD). This approach is demonstrated through a software prototype called QueryGenerator. In conclusion, by demonstrating a case study using multi-zone air flow analysis, we show how our approach and tool can help domain experts to use BIM to drive building design with less labour and lower overhead cost.

  16. Nearest private query based on quantum oblivious key distribution

    NASA Astrophysics Data System (ADS)

    Xu, Min; Shi, Run-hua; Luo, Zhen-yu; Peng, Zhen-wan

    2017-12-01

    Nearest private query is a special private query which involves two parties, a user and a data owner, where the user has a private input (e.g., an integer) and the data owner has a private data set, and the user wants to query which element in the owner's private data set is the nearest to his input without either party revealing its private information. In this paper, we first present a quantum protocol for nearest private query, which is based on quantum oblivious key distribution (QOKD). Compared to the related classical protocols, our protocol has the advantages of higher security and better feasibility, so it has better application prospects.

  17. Labeling RDF Graphs for Linear Time and Space Querying

    NASA Astrophysics Data System (ADS)

    Furche, Tim; Weinzierl, Antonius; Bry, François

    Indices and data structures for web querying have mostly considered tree shaped data, reflecting the view of XML documents as tree-shaped. However, for RDF (and when querying ID/IDREF constraints in XML) data is indisputably graph-shaped. In this chapter, we first study existing indexing and labeling schemes for RDF and other graph data with a focus on support for efficient adjacency and reachability queries. For XML, labeling schemes are an important part of the widespread adoption of XML, in particular for mapping XML to existing (relational) database technology. However, the existing indexing and labeling schemes for RDF (and graph data in general) sacrifice one of the most attractive properties of XML labeling schemes, the constant time (and per-node space) test for adjacency (child) and reachability (descendant). In the second part, we introduce the first labeling scheme for RDF data that retains this property and thus achieves linear time and space processing of acyclic RDF queries on a significantly larger class of graphs than previous approaches (which are mostly limited to tree-shaped data). Finally, we show how this labeling scheme can be applied to (acyclic) SPARQL queries to obtain an evaluation algorithm with time and space complexity linear in the number of resources in the queried RDF graph.
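
    The constant-time descendant test that the abstract attributes to XML labeling schemes can be illustrated with the classic interval (start/end) labeling of a tree. The sketch below is only that textbook tree case, not the chapter's RDF labeling scheme; the function names `label_tree` and `is_descendant` and the dictionary-based tree encoding are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Tuple


def label_tree(children: Dict[Hashable, List[Hashable]],
               root: Hashable) -> Dict[Hashable, Tuple[int, int]]:
    """Assign each node a (start, end) interval in one depth-first pass.

    `children` maps a node to its child list (hypothetical encoding);
    leaves may be omitted from the mapping.
    """
    labels: Dict[Hashable, Tuple[int, int]] = {}
    start: Dict[Hashable, int] = {}
    counter = 0
    # Iterative DFS; each stack entry is (node, entering?).
    stack = [(root, True)]
    while stack:
        node, entering = stack.pop()
        if entering:
            start[node] = counter
            counter += 1
            stack.append((node, False))
            # Reverse so that children are visited in their listed order.
            for child in reversed(children.get(node, [])):
                stack.append((child, True))
        else:
            labels[node] = (start[node], counter)
            counter += 1
    return labels


def is_descendant(labels: Dict[Hashable, Tuple[int, int]],
                  ancestor: Hashable, node: Hashable) -> bool:
    """Constant-time test: node's interval lies strictly inside ancestor's."""
    a_start, a_end = labels[ancestor]
    n_start, n_end = labels[node]
    return a_start < n_start and n_end < a_end


tree = {"a": ["b", "c"], "b": ["d"]}
labels = label_tree(tree, "a")
print(is_descendant(labels, "a", "d"), is_descendant(labels, "c", "d"))
```

    Labeling takes one traversal and two integers per node, which is the linear-time, constant-per-node-space property the abstract refers to. The containment test breaks down once a node can have several parents, which is the gap for graph-shaped data that the chapter sets out to address.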

  18. Concept-based query language approach to enterprise information systems

    NASA Astrophysics Data System (ADS)

    Niemi, Timo; Junkkari, Marko; Järvelin, Kalervo

    2014-01-01

    In enterprise information systems (EISs) it is necessary to model, integrate and compute very diverse data. In advanced EISs the stored data often are based both on structured (e.g. relational) and semi-structured (e.g. XML) data models. In addition, the ad hoc information needs of end-users may require the manipulation of data-oriented (structural), behavioural and deductive aspects of data. Contemporary languages capable of treating this kind of diversity suit only persons with good programming skills. In this paper we present a concept-oriented query language approach to manipulate this diversity so that the programming skill requirements are considerably reduced. In our query language, the features which need technical knowledge are hidden in application-specific concepts and structures. Therefore, users need not be aware of the underlying technology. Application-specific concepts and structures are represented by the modelling primitives of the extended RDOOM (relational deductive object-oriented modelling) which contains primitives for all crucial real world relationships (is-a relationship, part-of relationship, association), XML documents and views. Our query language also supports intensional and extensional-intensional queries, in addition to conventional extensional queries. In its query formulation, the end-user combines available application-specific concepts and structures through shared variables.

  19. Advanced SPARQL querying in small molecule databases.

    PubMed

    Galgonek, Jakub; Hurt, Tomáš; Michlíková, Vendula; Onderka, Petr; Schwarz, Jan; Vondrášek, Jiří

    2016-01-01

    In recent years, the Resource Description Framework (RDF) and the SPARQL query language have become more widely used in the area of cheminformatics and bioinformatics databases. These technologies allow better interoperability of various data sources and powerful searching facilities. However, we identified several deficiencies that make usage of such RDF databases restrictive or challenging for common users. We extended a SPARQL engine to be able to use special procedures inside SPARQL queries. This allows the user to work with data that cannot be simply precomputed and thus cannot be directly stored in the database. We designed an algorithm that checks a query against data ontology to identify possible user errors. This greatly improves query debugging. We also introduced an approach to visualize retrieved data in a user-friendly way, based on templates describing visualizations of resource classes. To integrate all of our approaches, we developed a simple web application. Our system was implemented successfully, and we demonstrated its usability on the ChEBI database transformed into RDF form. To demonstrate procedure call functions, we employed compound similarity searching based on OrChem. The application is publicly available at https://bioinfo.uochb.cas.cz/projects/chemRDF.

  20. Autocorrelation and Regularization of Query-Based Information Retrieval Scores

    DTIC Science & Technology

    2008-02-01

    of the most general information retrieval models [Salton, 1968]. By treating a query as a very short document, documents and queries can be rep... Salton, 1971]. In the context of single link hierarchical clustering, Jardine and van Rijsbergen showed that ranking all k clusters and retrieving a... a document about “dogs”, then the system will always miss this document when a user queries “dog”. Salton recognized that a document’s representation

  1. Study on Mechanical Properties of Concrete Using Plastic Waste as an Aggregate

    NASA Astrophysics Data System (ADS)

    Jaivignesh, B.; Sofi, A.

    2017-07-01

    Disposal of large quantities of plastic causes land, water and air pollution, so a study was conducted on recycling plastic in concrete. This work investigates the replacement of natural aggregate in concrete with non-biodegradable plastic aggregate made from mixed plastic waste. Several tests were conducted, such as compressive strength of cubes, split tensile strength of cylinders and flexural strength of prisms, to identify the properties and behavior of concrete using plastic aggregate. Fine aggregate was replaced by 10%, 15% and 20% by weight with plastic fine (PF) aggregate, and for each fine aggregate replacement, 15%, 20% and 25% of the coarse aggregate was also replaced with plastic coarse (PC) aggregate. The literature reports that adding plastic aggregate reduces the strength of concrete because of poor bonding between concrete and plastic aggregate, so 0.3% steel fiber by weight of cement was added to improve the concrete strength. In total, 60 cubes, 60 cylinders and 40 prisms were cast to determine the compressive strength, split tensile strength and flexural strength, respectively. The cast specimens were tested at 7 and 28 days. The results for concrete using plastic aggregate were compared with conventional concrete. The results show a reduction in the mechanical properties of concrete with added plastic aggregate. This reduction in strength is mainly due to poor bond strength between cement and plastic aggregate.

  2. Multi-field query expansion is effective for biomedical dataset retrieval

    PubMed Central

    2017-01-01

    In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data
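
    As a rough illustration of the Rocchio-style query expansion named in the record above, the sketch below reweights the query terms and adds the highest-weighted terms from a feedback set. It is a generic positive-feedback variant (no negative term), not the authors' implementation; the function name `rocchio_expand`, the parameter values `alpha=1.0` and `beta=0.75`, and the sample documents are all assumptions.

```python
from collections import Counter
from typing import Dict, List


def rocchio_expand(query_terms: List[str],
                   feedback_docs: List[List[str]],
                   alpha: float = 1.0,
                   beta: float = 0.75,
                   n_new_terms: int = 3) -> Dict[str, float]:
    """Return term weights for the expanded query.

    weight(t) = alpha * tf_query(t) + beta * mean tf of t over feedback docs.
    Only the `n_new_terms` best-weighted non-query terms are added.
    """
    q = Counter(query_terms)
    centroid: Counter = Counter()
    for doc in feedback_docs:
        for term, tf in Counter(doc).items():
            centroid[term] += tf / len(feedback_docs)
    weights = {t: alpha * q.get(t, 0) + beta * centroid.get(t, 0.0)
               for t in set(q) | set(centroid)}
    # Rank candidate expansion terms by weight, breaking ties alphabetically.
    new_terms = sorted((t for t in weights if t not in q),
                       key=lambda t: (-weights[t], t))[:n_new_terms]
    return {t: weights[t] for t in list(q) + new_terms}


# Hypothetical feedback set: tokenised top-ranked results of a first run.
docs = [["gene", "rna", "seq", "rna"], ["rna", "expression", "mouse"]]
print(rocchio_expand(["gene", "expression"], docs, n_new_terms=1))
```

    The feedback documents have to come from an initial execution of the query, which is the extra cost the abstract contrasts with lexicon-based expansion.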

  3. Optimization on Fc for Improvement of Stability and Aggregation Resistance.

    PubMed

    Chen, Xiaobo; Zeng, Fang; Huang, Tao; Cheng, Liang; Liu, Huan; Gong, Rui

    2016-01-01

    Fc-based therapeutics, including therapeutic full-size monoclonal antibodies (mAbs) and Fc-fusion proteins, represent the fastest-growing market in the biopharmaceutical industry. However, one major challenge during the development of Fc-based therapeutics is how to maintain their efficacy in clinical use. Many factors may lead to failure in final marketing. For example, the stability and aggregation resistance might not be high enough to withstand the stresses of fermentation, purification, formulation, storage, shipment and other steps in manufacture and sale. Low stability and high aggregation tendency lead to decreased bioactivity and increased risk of immunogenicity, resulting in serious side effects. Because Fc is one of the major parts of monoclonal antibodies and Fc-fusion proteins, engineering Fc to increase its stability and to reduce or eliminate aggregation due to incorrect association is of great importance and could further extend the potential of Fc-based therapeutics. Many studies focus on Fc optimization for better physical and chemical characteristics and function through structure-based computer-aided rational design, high-throughput screening, expression system selection and other methods. The identification of optimized Fc mutants increases the clinical potential of existing therapeutic mAbs and Fc-fusion proteins, and accelerates the development of new Fc-based therapeutics. Here we provide an overview of the related field, and discuss recent advances and future directions in optimization of Fc-based therapeutics with modified stability and aggregation resistance.

  4. Parasol: An Architecture for Cross-Cloud Federated Graph Querying

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lieberman, Michael; Choudhury, Sutanay; Hughes, Marisa

    2014-06-22

    Large scale data fusion of multiple datasets can often provide insights that examining datasets individually cannot. However, when these datasets reside in different data centers and cannot be collocated due to technical, administrative, or policy barriers, a unique set of problems arise that hamper querying and data fusion. To address these problems, a system and architecture named Parasol is presented that enables federated queries over graph databases residing in multiple clouds. Parasol’s design is flexible and requires only minimal assumptions for participant clouds. Query optimization techniques are also described that are compatible with Parasol’s lightweight architecture. Experiments on a prototype implementation of Parasol indicate its suitability for cross-cloud federated graph queries.

  5. Visual Aggregate Analysis of Eligibility Features of Clinical Trials

    PubMed Central

    He, Zhe; Carini, Simona; Sim, Ida; Weng, Chunhua

    2015-01-01

    Objective: To develop a method for profiling the collective populations targeted for recruitment by multiple clinical studies addressing the same medical condition using one eligibility feature each time. Methods: Using a previously published database COMPACT as the backend, we designed a scalable method for visual aggregate analysis of clinical trial eligibility features. This method consists of four modules for eligibility feature frequency analysis, query builder, distribution analysis, and visualization, respectively. This method is capable of analyzing (1) frequently used qualitative and quantitative features for recruiting subjects for a selected medical condition, (2) distribution of study enrollment on consecutive value points or value intervals of each quantitative feature, and (3) distribution of studies on the boundary values, permissible value ranges, and value range widths of each feature. All analysis results were visualized using Google Charts API. Five recruited potential users assessed the usefulness of this method for identifying common patterns in any selected eligibility feature for clinical trial participant selection. Results: We implemented this method as a Web-based analytical system called VITTA (Visual Analysis Tool of Clinical Study Target Populations). We illustrated the functionality of VITTA using two sample queries involving quantitative features BMI and HbA1c for conditions “hypertension” and “Type 2 diabetes”, respectively. The recruited potential users rated the user-perceived usefulness of VITTA with an average score of 86.4/100. Conclusions: We contributed a novel aggregate analysis method to enable the interrogation of common patterns in quantitative eligibility criteria and the collective target populations of multiple related clinical studies. A larger-scale study is warranted to formally assess the usefulness of VITTA among clinical investigators and sponsors in various therapeutic areas. PMID:25615940

  6. Visual aggregate analysis of eligibility features of clinical trials.

    PubMed

    He, Zhe; Carini, Simona; Sim, Ida; Weng, Chunhua

    2015-04-01

    To develop a method for profiling the collective populations targeted for recruitment by multiple clinical studies addressing the same medical condition using one eligibility feature each time. Using a previously published database COMPACT as the backend, we designed a scalable method for visual aggregate analysis of clinical trial eligibility features. This method consists of four modules for eligibility feature frequency analysis, query builder, distribution analysis, and visualization, respectively. This method is capable of analyzing (1) frequently used qualitative and quantitative features for recruiting subjects for a selected medical condition, (2) distribution of study enrollment on consecutive value points or value intervals of each quantitative feature, and (3) distribution of studies on the boundary values, permissible value ranges, and value range widths of each feature. All analysis results were visualized using Google Charts API. Five recruited potential users assessed the usefulness of this method for identifying common patterns in any selected eligibility feature for clinical trial participant selection. We implemented this method as a Web-based analytical system called VITTA (Visual Analysis Tool of Clinical Study Target Populations). We illustrated the functionality of VITTA using two sample queries involving quantitative features BMI and HbA1c for conditions "hypertension" and "Type 2 diabetes", respectively. The recruited potential users rated the user-perceived usefulness of VITTA with an average score of 86.4/100. We contributed a novel aggregate analysis method to enable the interrogation of common patterns in quantitative eligibility criteria and the collective target populations of multiple related clinical studies. A larger-scale study is warranted to formally assess the usefulness of VITTA among clinical investigators and sponsors in various therapeutic areas.

  7. Parallel Index and Query for Large Scale Data Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chou, Jerry; Wu, Kesheng; Ruebel, Oliver

    2011-07-18

    Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to processing of a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.

  8. Fast Inbound Top-K Query for Random Walk with Restart.

    PubMed

    Zhang, Chao; Jiang, Shan; Chen, Yucheng; Sun, Yidan; Han, Jiawei

    2015-09-01

    Random walk with restart (RWR) is widely recognized as one of the most important node proximity measures for graphs, as it captures the holistic graph structure and is robust to noise in the graph. In this paper, we study a novel query based on the RWR measure, called the inbound top-k (Ink) query. Given a query node q and a number k, the Ink query aims at retrieving k nodes in the graph that have the largest weighted RWR scores to q. Ink queries can be highly useful for various applications such as traffic scheduling, disease treatment, and targeted advertising. Nevertheless, none of the existing RWR computation techniques can accurately and efficiently process the Ink query in large graphs. We propose two algorithms, namely Squeeze and Ripple, both of which can accurately answer the Ink query in a fast and incremental manner. To identify the top-k nodes, Squeeze iteratively performs matrix-vector multiplication and estimates the lower and upper bounds for all the nodes in the graph. Ripple employs a more aggressive strategy by only estimating the RWR scores for the nodes falling in the vicinity of q; the nodes outside the vicinity do not need to be evaluated because their RWR scores are propagated from the boundary of the vicinity and thus upper bounded. Ripple incrementally expands the vicinity until the top-k result set can be obtained. Our extensive experiments on real-life graph data sets show that Ink queries can retrieve interesting results, and the proposed algorithms are orders of magnitude faster than state-of-the-art methods.
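
    As a point of reference for the RWR measure and the inbound top-k query described above, the following is a minimal brute-force sketch in Python. It is not Squeeze or Ripple, whose details the abstract does not give. It simply runs a power iteration from every node and ranks the nodes by their score at q. The function names, the restart probability of 0.15, the unweighted scores, and the toy graph are all illustrative assumptions.

```python
def rwr_scores(adj, source, c=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart from `source`, computed by power iteration.

    adj: dict mapping each node to a list of out-neighbours (toy input format).
    c:   restart probability (assumed value, not taken from the paper).
    Returns a dict node -> probability; the values sum to 1.
    """
    nodes = list(adj)
    r = {v: 0.0 for v in nodes}
    r[source] = 1.0
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            out = adj[u]
            if not out:
                # Dangling node: return its walking mass to the source.
                nxt[source] += (1 - c) * r[u]
                continue
            share = (1 - c) * r[u] / len(out)
            for v in out:
                nxt[v] += share
        nxt[source] += c  # restart mass
        diff = sum(abs(nxt[v] - r[v]) for v in nodes)
        r = nxt
        if diff < tol:
            break
    return r


def inbound_top_k(adj, q, k, c=0.15):
    """Brute-force baseline: k nodes whose RWR score *to* q is largest."""
    scores = {u: rwr_scores(adj, u, c)[q] for u in adj if u != q}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy graph: d -> b -> a <-> q
toy = {"q": ["a"], "a": ["q"], "b": ["a"], "d": ["b"]}
print(inbound_top_k(toy, "q", 2))
```

    This baseline costs one full RWR computation per node, which is the expense that bounding and vicinity-based strategies such as those named in the abstract aim to avoid.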

  9. Query-by-example surgical activity detection.

    PubMed

    Gao, Yixin; Vedula, S Swaroop; Lee, Gyusung I; Lee, Mija R; Khudanpur, Sanjeev; Hager, Gregory D

    2016-06-01

    Easy acquisition of surgical data opens many opportunities to automate skill evaluation and teaching. Current technology to search tool motion data for surgical activity segments of interest is limited by the need for manual pre-processing, which can be prohibitive at scale. We developed a content-based information retrieval method, query-by-example (QBE), to automatically detect activity segments within surgical data recordings of long duration that match a query. The example segment of interest (query) and the surgical data recording (target trial) are time series of kinematics. Our approach includes an unsupervised feature learning module using a stacked denoising autoencoder (SDAE), two scoring modules based on asymmetric subsequence dynamic time warping (AS-DTW) and template matching, respectively, and a detection module. A distance matrix of the query against the trial is computed using the SDAE features, followed by AS-DTW combined with template scoring, to generate a ranked list of candidate subsequences (substrings). To evaluate the quality of the ranked list against the ground-truth, thresholding conventional DTW distances and bipartite matching are applied. We computed the recall, precision, F1-score, and a Jaccard index-based score on three experimental setups. We evaluated our QBE method using a suture throw maneuver as the query, on two tool motion datasets (JIGSAWS and MISTIC-SL) captured in a training laboratory. We observed a recall of 93, 90 and 87 % and a precision of 93, 91, and 88 % with same surgeon same trial (SSST), same surgeon different trial (SSDT) and different surgeon (DS) experiment setups on JIGSAWS, and a recall of 87, 81 and 75 % and a precision of 72, 61, and 53 % with SSST, SSDT and DS experiment setups on MISTIC-SL, respectively. We developed a novel, content-based information retrieval method to automatically detect multiple instances of an activity within long surgical recordings. Our method demonstrated adequate recall

  10. Geometric Representations of Condition Queries on Three-Dimensional Vector Fields

    NASA Technical Reports Server (NTRS)

    Henze, Chris

    1999-01-01

    Condition queries on distributed data ask where particular conditions are satisfied. It is possible to represent condition queries as geometric objects by plotting field data in various spaces derived from the data, and by selecting loci within these derived spaces which signify the desired conditions. Rather simple geometric partitions of derived spaces can represent complex condition queries because much complexity can be encapsulated in the derived space mapping itself. A geometric view of condition queries provides a useful conceptual unification, allowing one to intuitively understand many existing vector field feature detection algorithms -- and to design new ones -- as variations on a common theme. A geometric representation of condition queries also provides a simple and coherent basis for computer implementation, reducing a wide variety of existing and potential vector field feature detection techniques to a few simple geometric operations.

  11. LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions.

    PubMed

    Chen, Jinbo; Scholz, Uwe; Zhou, Ruonan; Lange, Matthias

    2018-03-01

    In order to access and filter content of life-science databases, full text search is a widely applied query interface. But its high flexibility and intuitiveness is paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspelling and support expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries by their query profile like research field, geographic location, co-authorship, affiliation etc. require user's registration and its public accessibility that contradict privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstructs possible linguistic contexts of a given keyword query. The context is referred from the text records that are stored in the databases that are going to be queried or extracted for a general purpose query suggestion from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the further computation of customized distributed word vectors. The latter are used to suggest alternative keyword queries. An evaluation of the query suggestion quality was done for plant science use cases. Locally present experts enable a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function which has been performed using ontology term similarities. LAILAPS-QSM mean information content similarity for 15 representative queries is 0.70, whereas 34% have a score above 0.80. In comparison, the information content similarity for human expert made query suggestions

  12. DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data.

    PubMed

    Putri, Fadhilah Kurnia; Song, Giltae; Kwon, Joonho; Rao, Praveen

    2017-09-25

    One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query (DISPAQ), which efficiently identifies profitable areas by exploiting the Apache Software Foundation's Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data.
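
    The combination of skyline processing with a Z-order curve mentioned above can be sketched generically as follows. This is not the DISPAQ implementation. It is a single-machine illustration that assumes two non-negative integer attributes where smaller is better, and the function names are hypothetical. It relies on the fact that a Z-order (Morton) code never decreases when a coordinate increases, so a point can only be dominated by points that precede it in Z-order.

```python
def morton2(x, y, bits=16):
    """Interleave the bits of two non-negative integers (Z-order code)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return z


def dominates(p, q):
    """p dominates q: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def z_skyline(points):
    """Skyline of 2-D integer points, scanning candidates in Z-order.

    Because dominating points always come earlier in Z-order, a point that
    enters the skyline is never evicted later; only one pass is needed.
    """
    skyline = []
    for p in sorted(set(points), key=lambda pt: morton2(*pt)):
        if not any(dominates(s, p) for s in skyline):
            skyline.append(p)
    return skyline


# Hypothetical candidates as (cost, waiting_time) pairs
candidates = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (2, 6)]
print(z_skyline(candidates))
```

    Each dominance check here compares against the current skyline only, which is where a reduction in the number of tests would come from in a distributed setting.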

  13. iSMART: Ontology-based Semantic Query of CDA Documents

    PubMed Central

    Liu, Shengping; Ni, Yuan; Mei, Jing; Li, Hanyu; Xie, Guotong; Hu, Gang; Liu, Haifeng; Hou, Xueqiao; Pan, Yue

    2009-01-01

    The Health Level 7 Clinical Document Architecture (CDA) is widely accepted as the format for electronic clinical documents. With the rich ontological references in CDA documents, the ontology-based semantic query could be performed to retrieve CDA documents. In this paper, we present iSMART (interactive Semantic MedicAl Record reTrieval), a prototype system designed for ontology-based semantic query of CDA documents. The clinical information in CDA documents will be extracted into RDF triples by a declarative XML to RDF transformer. An ontology reasoner is developed to infer additional information by combining the background knowledge from SNOMED CT ontology. Then an RDF query engine is leveraged to enable the semantic queries. This system has been evaluated using the real clinical documents collected from a large hospital in southern China. PMID:20351883

  14. Automatic query formulations in information retrieval.

    PubMed

    Salton, G; Buckley, C; Fox, E A

    1983-07-01

    Modern information retrieval systems are designed to supply relevant information in response to requests received from the user population. In most retrieval environments the search requests consist of keywords, or index terms, interrelated by appropriate Boolean operators. Since it is difficult for untrained users to generate effective Boolean search requests, trained search intermediaries are normally used to translate original statements of user need into useful Boolean search formulations. Methods are introduced in this study which reduce the role of the search intermediaries by making it possible to generate Boolean search formulations completely automatically from natural language statements provided by the system patrons. Frequency considerations are used automatically to generate appropriate term combinations as well as Boolean connectives relating the terms. Methods are covered to produce automatic query formulations both in a standard Boolean logic system, as well as in an extended Boolean system in which the strict interpretation of the connectives is relaxed. Experimental results are supplied to evaluate the effectiveness of the automatic query formulation process, and methods are described for applying the automatic query formulation process in practice.

  15. Improving accuracy for identifying related PubMed queries by an integrated approach.

    PubMed

    Lu, Zhiyong; Wilbur, W John

    2009-10-01

    PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users' search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments.

  16. cPath: open source software for collecting, storing, and querying biological pathways

    PubMed Central

    Cerami, Ethan G; Bader, Gary D; Gross, Benjamin E; Sander, Chris

    2006-01-01

    Background: Biological pathways, including metabolic pathways, protein interaction networks, signal transduction pathways, and gene regulatory networks, are currently represented in over 220 diverse databases. These data are crucial for the study of specific biological processes, including human diseases. Standard exchange formats for pathway information, such as BioPAX, CellML, SBML and PSI-MI, enable convenient collection of this data for biological research, but mechanisms for common storage and communication are required. Results: We have developed cPath, an open source database and web application for collecting, storing, and querying biological pathway data. cPath makes it easy to aggregate custom pathway data sets available in standard exchange formats from multiple databases, present pathway data to biologists via a customizable web interface, and export pathway data via a web service to third-party software, such as Cytoscape, for visualization and analysis. cPath is software only, and does not include new pathway information. Key features include: a built-in identifier mapping service for linking identical interactors and linking to external resources; built-in support for PSI-MI and BioPAX standard pathway exchange formats; a web service interface for searching and retrieving pathway data sets; and thorough documentation. The cPath software is freely available under the LGPL open source license for academic and commercial use. Conclusion: cPath is a robust, scalable, modular, professional-grade software platform for collecting, storing, and querying biological pathways. It can serve as the core data handling component in information systems for pathway visualization, analysis and modeling. PMID:17101041

  17. cPath: open source software for collecting, storing, and querying biological pathways.

    PubMed

    Cerami, Ethan G; Bader, Gary D; Gross, Benjamin E; Sander, Chris

    2006-11-13

    Biological pathways, including metabolic pathways, protein interaction networks, signal transduction pathways, and gene regulatory networks, are currently represented in over 220 diverse databases. These data are crucial for the study of specific biological processes, including human diseases. Standard exchange formats for pathway information, such as BioPAX, CellML, SBML and PSI-MI, enable convenient collection of this data for biological research, but mechanisms for common storage and communication are required. We have developed cPath, an open source database and web application for collecting, storing, and querying biological pathway data. cPath makes it easy to aggregate custom pathway data sets available in standard exchange formats from multiple databases, present pathway data to biologists via a customizable web interface, and export pathway data via a web service to third-party software, such as Cytoscape, for visualization and analysis. cPath is software only, and does not include new pathway information. Key features include: a built-in identifier mapping service for linking identical interactors and linking to external resources; built-in support for PSI-MI and BioPAX standard pathway exchange formats; a web service interface for searching and retrieving pathway data sets; and thorough documentation. The cPath software is freely available under the LGPL open source license for academic and commercial use. cPath is a robust, scalable, modular, professional-grade software platform for collecting, storing, and querying biological pathways. It can serve as the core data handling component in information systems for pathway visualization, analysis and modeling.

  18. Multi-field query expansion is effective for biomedical dataset retrieval.

    PubMed

    Bouadjenek, Mohamed Reda; Verspoor, Karin

    2017-01-01

    In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery

  19. Sleep-wake time perception varies by direct or indirect query.

    PubMed

    Alameddine, Y; Ellenbogen, J M; Bianchi, M T

    2015-01-15

    The diagnosis of insomnia rests on self-report of difficulty initiating or maintaining sleep. However, subjective reports may be unreliable, and possibly may vary by the method of inquiry. We investigated this possibility by comparing within-individual response to direct versus indirect time queries after overnight polysomnography. We obtained self-reported sleep-wake times via morning questionnaires in 879 consecutive adult diagnostic polysomnograms. Responses were compared within subjects (direct versus indirect query) and across groups defined by apnea-hypopnea index and by self-reported insomnia symptoms in pre-sleep questionnaires. Direct queries required a time duration response, while indirect queries required clock times from which we calculated time durations. Direct and indirect queries of sleep latency were the same in only 41% of cases, and total sleep time queries matched in only 5.4%. For both latency and total sleep, the most common discrepancy involved the indirect value being larger than the direct response. The discrepancy between direct and indirect queries was not related to objective sleep metrics. The degree of discrepancy was not related to the presence of insomnia symptoms, although patients reporting insomnia symptoms showed underestimation of total sleep duration by direct response. Self-reported sleep latency and total sleep time are often internally inconsistent when comparing direct and indirect survey queries of each measure. These discrepancies represent substantive challenges to effective clinical practice, particularly when diagnosis and management depend on self-reported sleep patterns, as with insomnia. Although self-reported sleep-wake times remain fundamental to clinical practice, objective measures provide clinically relevant adjunctive information. © 2015 American Academy of Sleep Medicine.

  20. Analyzing Medical Image Search Behavior: Semantics and Prediction of Query Results.

    PubMed

    De-Arteaga, Maria; Eggel, Ivan; Kahn, Charles E; Müller, Henning

    2015-10-01

    Log files of information retrieval systems that record user behavior have been used to improve the outcomes of retrieval systems, understand user behavior, and predict events. In this article, a log file of the ARRS GoldMiner search engine containing 222,005 consecutive queries is analyzed. Time stamps are available for each query, as well as masked IP addresses, which makes it possible to identify queries from the same person. This article describes the ways in which physicians (or Internet searchers interested in medical images) search and proposes potential improvements by suggesting query modifications. For example, many queries contain only a few terms and therefore are not specific; others contain spelling mistakes or non-medical terms that likely lead to poor or empty results. One of the goals of this report is to predict the number of results a query will have since such a model allows search engines to automatically propose query modifications in order to avoid result lists that are empty or too large. This prediction is made based on characteristics of the query terms themselves. Prediction of empty results has an accuracy above 88%, and thus can be used to automatically modify the query to avoid empty result sets for a user. The semantic analysis and data of reformulations done by users in the past can aid the development of better search systems, particularly to improve results for novice users. Therefore, this paper gives important ideas to better understand how people search and how to use this knowledge to improve the performance of specialized medical search engines.

  1. Query Log Analysis of an Electronic Health Record Search Engine

    PubMed Central

    Yang, Lei; Mei, Qiaozhu; Zheng, Kai; Hanauer, David A.

    2011-01-01

    We analyzed a longitudinal collection of query logs of a full-text search engine designed to facilitate information retrieval in electronic health records (EHR). The collection, 202,905 queries and 35,928 user sessions recorded over a course of 4 years, represents the information-seeking behavior of 533 medical professionals, including frontline practitioners, coding personnel, patient safety officers, and biomedical researchers for patient data stored in EHR systems. In this paper, we present descriptive statistics of the queries, a categorization of information needs manifested through the queries, as well as temporal patterns of the users’ information-seeking behavior. The results suggest that information needs in the medical domain are substantially more sophisticated than those that general-purpose web search engines need to accommodate. Therefore, we envision there exists a significant challenge, along with significant opportunities, to provide intelligent query recommendations to facilitate information retrieval in EHR. PMID:22195150

  2. VPipe: Virtual Pipelining for Scheduling of DAG Stream Query Plans

    NASA Astrophysics Data System (ADS)

    Wang, Song; Gupta, Chetan; Mehta, Abhay

    There are data streams all around us that can be harnessed for tremendous business and personal advantage. For an enterprise-level stream processing system such as CHAOS [1] (Continuous, Heterogeneous Analytic Over Streams), handling of complex query plans with resource constraints is challenging. While several scheduling strategies exist for stream processing, efficient scheduling of complex DAG query plans is still largely unsolved. In this paper, we propose a novel execution scheme for scheduling complex directed acyclic graph (DAG) query plans with meta-data enriched stream tuples. Our solution, called Virtual Pipelined Chain (or VPipe Chain for short), effectively extends the "Chain" pipelining scheduling approach to complex DAG query plans.

  3. End-User Use of Data Base Query Language: Pros and Cons.

    ERIC Educational Resources Information Center

    Nicholes, Walter

    1988-01-01

    Man-machine interface, the concept of a computer "query," a review of database technology, and a description of the use of query languages at Brigham Young University are discussed. The pros and cons of end-user use of database query languages are explored. (Author/MLW)

  4. Bat-Inspired Algorithm Based Query Expansion for Medical Web Information Retrieval.

    PubMed

    Khennak, Ilyes; Drias, Habiba

    2017-02-01

    With the increasing amount of medical data available on the Web, looking for health information has become one of the most widely searched topics on the Internet. Patients and people of several backgrounds are now using Web search engines to acquire medical information, including information about a specific disease, medical treatment or professional advice. Nonetheless, due to a lack of medical knowledge, many laypeople have difficulties in forming appropriate queries to articulate their inquiries, which renders their search queries imprecise due to the use of unclear keywords. The use of these ambiguous and vague queries to describe the patients' needs has resulted in a failure of Web search engines to retrieve accurate and relevant information. One of the most natural and promising methods to overcome this drawback is Query Expansion. In this paper, an original approach based on Bat Algorithm is proposed to improve the retrieval effectiveness of query expansion in the medical field. In contrast to the existing literature, the proposed approach uses Bat Algorithm to find the best expanded query among a set of expanded query candidates, while maintaining low computational complexity. Moreover, this new approach allows the determination of the length of the expanded query empirically. Numerical results on MEDLINE, the on-line medical information database, show that the proposed approach is more effective and efficient compared to the baseline.

  5. Active Learning by Querying Informative and Representative Examples.

    PubMed

    Huang, Sheng-Jun; Jin, Rong; Zhou, Zhi-Hua

    2014-10-01

    Active learning reduces the labeling cost by iteratively selecting the most valuable data to query their labels. It has attracted a lot of interest given the abundance of unlabeled data and the high cost of labeling. Most active learning approaches select either informative or representative unlabeled instances to query their labels, which could significantly limit their performance. Although several active learning algorithms were proposed to combine the two query selection criteria, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this limitation by developing a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an unlabeled instance. Further, by incorporating the correlation among labels, we extend the QUIRE approach to multi-label learning by actively querying instance-label pairs. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches in both single-label and multi-label learning.
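
    To make the two selection criteria concrete, the sketch below scores each unlabeled instance by the product of an informativeness term (binary entropy of a model's predicted probability) and a representativeness term (mean Gaussian-kernel similarity to the other unlabeled points). This is a deliberately simple heuristic of the ad hoc kind the abstract contrasts QUIRE with. It is not the QUIRE min-max formulation, and the function names, kernel, and the `beta` trade-off parameter are assumptions made for illustration.

```python
import math


def informativeness(prob):
    """Binary entropy (in bits) of a predicted positive-class probability."""
    if prob <= 0.0 or prob >= 1.0:
        return 0.0
    return -(prob * math.log2(prob) + (1 - prob) * math.log2(1 - prob))


def representativeness(i, points):
    """Mean Gaussian-kernel similarity (unit bandwidth) of point i to the rest."""
    others = [p for j, p in enumerate(points) if j != i]
    if not others:
        return 0.0
    sims = (math.exp(-math.dist(points[i], p) ** 2) for p in others)
    return sum(sims) / len(others)


def select_query(points, probs, beta=1.0):
    """Index of the unlabeled point maximising entropy * density**beta."""
    scores = [
        informativeness(pr) * representativeness(i, points) ** beta
        for i, pr in enumerate(probs)
    ]
    return max(range(len(points)), key=scores.__getitem__)


# Hypothetical pool: three clustered points and one uncertain outlier
pool = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
predicted = [0.5, 0.9, 0.9, 0.5]
print(select_query(pool, predicted))
```

    In this toy pool the outlier is as uncertain as the first point but sits in a sparse region, so the density term steers the query toward the clustered point instead.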

  6. Which factors predict the time spent answering queries to a drug information centre?

    PubMed Central

    Reppe, Linda A.; Spigset, Olav

    2010-01-01

    Objective: To develop a model based upon factors able to predict the time spent answering drug-related queries to Norwegian drug information centres (DICs). Setting and method: Drug-related queries received at 5 DICs in Norway from March to May 2007 were randomly assigned to 20 employees until each of them had answered a minimum of five queries. The employees reported the number of drugs involved, the type of literature search performed, and whether the queries were considered judgmental or not, using a specifically developed scoring system. Main outcome measures: The scores of these three factors were added together to define a workload score for each query. Workload and its individual factors were subsequently related to the measured time spent answering the queries by simple or multiple linear regression analyses. Results: Ninety-six query/answer pairs were analyzed. Workload significantly predicted the time spent answering the queries (adjusted R² = 0.22, P < 0.001). Literature search was the individual factor best predicting the time spent answering the queries (adjusted R² = 0.17, P < 0.001), and this variable also contributed the most in the multiple regression analyses. Conclusion: The most important workload factor predicting the time spent handling the queries in this study was the type of literature search that had to be performed. The categorisation of queries as judgmental or not also affected the time spent answering the queries. The number of drugs involved did not significantly influence the time spent answering drug information queries. PMID:20922480
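
    For readers unfamiliar with the statistic reported above, the following sketch fits a simple least-squares line and computes the adjusted R² from it. The data in the example are invented for illustration and are not the study's workload scores or times. The function names are hypothetical.

```python
def fit_simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def adjusted_r2(x, y, n_predictors=1):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    intercept, slope = fit_simple_ols(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r2 = 1 - ss_res / ss_tot
    n = len(y)
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# Invented example: workload score vs. time spent (arbitrary units)
workload = [1, 2, 3, 4]
time_spent = [1, 3, 2, 4]
print(round(adjusted_r2(workload, time_spent), 2))
```

    The adjustment penalises the plain R² for the number of predictors, which matters when comparing the single-factor and multiple regression models mentioned in the abstract.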

  7. SPARQLGraph: a web-based platform for graphically querying biological Semantic Web databases.

    PubMed

    Schweiger, Dominik; Trajanoski, Zlatko; Pabinger, Stephan

    2014-08-15

    Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way. SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the just recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers. This new graphical way of creating queries for biological Semantic Web databases considerably facilitates usability as it removes the requirement of knowing specific query languages and database structures. The system is freely available at http://sparqlgraph.i-med.ac.at.

  8. Aggregation Behavior and a Putative Aggregation Pheromone in Sugar Beet Root Maggot Flies (Diptera: Ulidiidae)

    PubMed Central

    Emmert, Susan Y.; Tindall, Kelly; Ding, Hongjian; Boetel, Mark A.; Rajabaskar, D.; Eigenbrode, Sanford D.

    2017-01-01

    Male-biased aggregations of sugar beet root maggot, Tetanops myopaeformis (Röder) (Diptera: Ulidiidae), flies were observed on utility poles near sugar beet (Beta vulgaris L. [Chenopodiaceae]) fields in southern Idaho; this contrasts with the approximately equal sex ratio typically observed within fields. Peak observation of mating pairs coincided with peak diurnal abundance of flies. Volatiles released by individual male and female flies were sampled from 08:00 to 24:00 hours in the laboratory using solid-phase microextraction and analyzed using gas chromatography/mass spectrometry (GC/MS). Eleven compounds were uniquely detected from males. Three of these compounds (2-undecanol, 2-decanol, and sec-nonyl acetate) were detected in greater quantities during 12:00–24:00 hours than during 08:00–12:00 hours. The remaining eight compounds uniquely detected from males did not exhibit temporal trends in release. Both sexes produced 2-nonanol, but males produced substantially higher (ca. 80-fold) concentrations of this compound than females, again peaking after 12:00 hours. The temporal synchrony among male aggregation behavior, peak mating rates, and release of certain volatile compounds by males suggest that T. myopaeformis flies exhibit lekking behavior and produce an associated pheromone. Field assays using synthetic blends of the putative aggregation pheromone showed evidence of attraction in both females and males. PMID:28423428

  9. Graphical modeling and query language for hospitals.

    PubMed

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    So far there has been little evidence that implementation of health information technologies (HIT) is leading to health care cost savings. One of the reasons for this lack of impact by HIT likely lies in the complexity of business process ownership in hospitals. The goal of our research is to develop a business model-based method for hospital use which would allow doctors to retrieve ad-hoc information directly from various hospital databases. We have developed a special domain-specific process modelling language called MedMod. Formally, we define the MedMod language as a profile on UML Class diagrams, but we also demonstrate it on examples, where we explain the semantics of all its elements informally. Moreover, we have developed the Process Query Language (PQL), which is based on the MedMod process definition language. The purpose of PQL is to allow a doctor to query (filter) runtime data of hospital processes described using MedMod. The MedMod language tries to overcome deficiencies in existing process modeling languages by allowing a loosely defined sequence of steps in the clinical process to be specified. The main advantages of PQL lie in two areas, usability and efficiency. They are: 1) the view of data through the "glasses" of a familiar process, 2) simple and easy-to-perceive means of setting filtering conditions that require no more expertise than using spreadsheet applications, 3) a dynamic response to each step in the construction of the complete query, which greatly shortens the learning curve and reduces the error rate, and 4) the selected means of filtering and data retrieval allow queries to be executed in O(n) time with respect to the size of the dataset. We plan to continue developing this project with three further steps. First, we are planning to develop user-friendly graphical editors for the MedMod process modeling and query languages. The second step is to evaluate the usability of the proposed language and tool

  10. Improving accuracy for identifying related PubMed queries by an integrated approach

    PubMed Central

    Lu, Zhiyong; Wilbur, W. John

    2009-01-01

    PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users’ search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1,539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1,396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments. PMID:19162232

  11. Searching the Web: The Public and Their Queries.

    ERIC Educational Resources Information Center

    Spink, Amanda; Wolfram, Dietmar; Jansen, Major B. J.; Saracevic, Tefko

    2001-01-01

    Reports findings from a study of searching behavior by over 200,000 users of the Excite search engine. Analysis of over one million queries revealed most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features. Concludes that Web searching by the public differs significantly from searching of…

  12. Querying graphs in protein-protein interactions networks using feedback vertex set.

    PubMed

    Blin, Guillaume; Sikora, Florian; Vialette, Stéphane

    2010-01-01

    Recent techniques are rapidly increasing our knowledge of interactions between proteins. The interpretation of this new information depends on our ability to retrieve known substructures in the data, the Protein-Protein Interaction (PPI) networks. From an algorithmic point of view, this is a hard task since it often leads to NP-hard problems. To overcome this difficulty, many authors have provided tools for querying patterns with a restricted topology, i.e., paths or trees in PPI networks. Such restrictions lead to the development of fixed-parameter tractable (FPT) algorithms, which can be practicable for restricted sizes of queries. Unfortunately, Graph Homomorphism is a W[1]-hard problem, and hence no FPT algorithm can be found when patterns are in the shape of general graphs. However, Dost et al. gave an algorithm (which is not implemented) to query graphs with a bounded treewidth in PPI networks (the treewidth of the query being involved in the time complexity). In this paper, we propose another algorithm for querying patterns in the shape of graphs, also based on dynamic programming and the color-coding technique. To transform graph queries into trees without loss of information, we use a feedback vertex set coupled with a node duplication mechanism. Hence, our algorithm is FPT for querying graphs with a bounded size of their feedback vertex set. It gives an alternative to the treewidth parameter, which can be better or worse for a given query. We provide a Python implementation, which allows us to validate our approach on real data. In particular, we retrieve some human queries in the shape of graphs in the fly PPI network.

  13. Dynamic Querying of Mass-Storage RDF Data with Rule-Based Entailment Regimes

    NASA Astrophysics Data System (ADS)

    Ianni, Giovambattista; Krennwallner, Thomas; Martello, Alessandra; Polleres, Axel

    RDF Schema (RDFS) as a lightweight ontology language is gaining popularity and, consequently, tools for scalable RDFS inference and querying are needed. SPARQL has become recently a W3C standard for querying RDF data, but it mostly provides means for querying simple RDF graphs only, whereas querying with respect to RDFS or other entailment regimes is left outside the current specification. In this paper, we show that SPARQL faces certain unwanted ramifications when querying ontologies in conjunction with RDF datasets that comprise multiple named graphs, and we provide an extension for SPARQL that remedies these effects. Moreover, since RDFS inference has a close relationship with logic rules, we generalize our approach to select a custom ruleset for specifying inferences to be taken into account in a SPARQL query. We show that our extensions are technically feasible by providing benchmark results for RDFS querying in our prototype system GiaBATA, which uses Datalog coupled with a persistent Relational Database as a back-end for implementing SPARQL with dynamic rule-based inference. By employing different optimization techniques like magic set rewriting our system remains competitive with state-of-the-art RDFS querying systems.

  14. Representation and alignment of sung queries for music information retrieval

    NASA Astrophysics Data System (ADS)

    Adams, Norman H.; Wakefield, Gregory H.

    2005-09-01

    The pursuit of robust and rapid query-by-humming systems, which search melodic databases using sung queries, is a common theme in music information retrieval. The retrieval aspect of this database problem has received considerable attention, whereas the front-end processing of sung queries and the data structure to represent melodies has been based on musical intuition and historical momentum. The present work explores three time series representations for sung queries: a sequence of notes, a "smooth" pitch contour, and a sequence of pitch histograms. The performance of the three representations is compared using a collection of naturally sung queries. It is found that the most robust performance is achieved by the representation with highest dimension, the smooth pitch contour, but that this representation presents a formidable computational burden. For all three representations, it is necessary to align the query and target in order to achieve robust performance. The computational cost of the alignment is quadratic, hence it is necessary to keep the dimension small for rapid retrieval. Accordingly, iterative deepening is employed to achieve both robust performance and rapid retrieval. Finally, the conventional iterative framework is expanded to adapt the alignment constraints based on previous iterations, further expediting retrieval without degrading performance.
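    The quadratic alignment cost mentioned in this abstract can be illustrated with a minimal sketch. It assumes dynamic time warping (DTW) over two pitch sequences, which is a common alignment technique but is not confirmed by the abstract as the authors' method; the function name and the sample pitch values are hypothetical.

```python
def dtw_distance(query, target):
    """Dynamic time warping cost between two numeric sequences.

    Assumption: absolute pitch difference is the local cost. The table has
    (len(query) + 1) * (len(target) + 1) cells, hence the quadratic cost.
    """
    n, m = len(query), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - target[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j],      # query step only
                                     cost[i][j - 1],      # target step only
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical pitch sequences (e.g. MIDI note numbers); the target holds
# one note longer than the sung query, which warping absorbs at zero cost.
sung_query = [60, 62, 64]
stored_melody = [60, 62, 62, 64]
alignment_cost = dtw_distance(sung_query, stored_melody)
```

    Because every cell of the table is filled once, halving the length of both sequences cuts the work to roughly a quarter, which is why a low-dimensional representation matters for rapid retrieval.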

  15. Characterization of seed nuclei in glucagon aggregation using light scattering methods and field-flow fractionation

    PubMed Central

    Hoppe, Cindy C; Nguyen, Lida T; Kirsch, Lee E; Wiencek, John M

    2008-01-01

    Background Glucagon is a peptide hormone with many uses as a therapeutic agent, including the emergency treatment of hypoglycemia. Physical instability of glucagon in solution leads to problems with the manufacture, formulation, and delivery of this pharmaceutical product. Glucagon has been shown to aggregate and form fibrils and gels in vitro. Small oligomeric precursors serve to initiate and nucleate the aggregation process. In this study, these initial aggregates, or seed nuclei, are characterized in bulk solution using light scattering methods and field-flow fractionation. Results High molecular weight aggregates of glucagon were detected in otherwise monomeric solutions using light scattering techniques. These aggregates were detected upon initial mixing of glucagon powder in dilute HCl and NaOH. In the pharmaceutically relevant case of acidic glucagon, the removal of aggregates by filtration significantly slowed the aggregation process. Field-flow fractionation was used to separate aggregates from monomeric glucagon and determine relative mass. The molar mass of the large aggregates was shown to grow appreciably over time as the glucagon solutions gelled. Conclusion The results of this study indicate that initial glucagon solutions are predominantly monomeric, but contain small quantities of large aggregates. These results suggest that the initial aggregates are seed nuclei, or intermediates which catalyze the aggregation process, even at low concentrations. PMID:18613970

  16. Data Processing on Database Management Systems with Fuzzy Query

    NASA Astrophysics Data System (ADS)

    Şimşek, Irfan; Topuz, Vedat

    In this study, a fuzzy query tool (SQLf) for non-fuzzy database management systems was developed. In addition, sample fuzzy queries were run on real data with the tool. The performance of SQLf was tested with data on the Marmara University students' food grant. The food grant data were collected in a MySQL database through a web form in which students described their social and economic conditions for the food grant request. This form consists of questions which have fuzzy and crisp answers. The main purpose of this fuzzy query is to determine the students who deserve the grant. SQLf easily found the eligible students for the grant through predefined fuzzy values. The fuzzy query tool (SQLf) could be used easily with other database systems such as Oracle and SQL Server.

  17. A high performance, ad-hoc, fuzzy query processing system for relational databases

    NASA Technical Reports Server (NTRS)

    Mansfield, William H., Jr.; Fleischman, Robert M.

    1992-01-01

    Database queries involving imprecise or fuzzy predicates are currently an evolving area of academic and industrial research. Such queries place severe stress on the indexing and I/O subsystems of conventional database environments since they involve the search of large numbers of records. The Datacycle architecture and research prototype is a database environment that uses filtering technology to perform an efficient, exhaustive search of an entire database. It has recently been modified to include fuzzy predicates in its query processing. The approach obviates the need for complex index structures, provides unlimited query throughput, permits the use of ad-hoc fuzzy membership functions, and provides a deterministic response time largely independent of query complexity and load. This paper describes the Datacycle prototype implementation of fuzzy queries and some recent performance results.

  18. Searching for cancer information on the internet: analyzing natural language search queries.

    PubMed

    Bader, Judith L; Theofanos, Mary Frances

    2003-12-11

    Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine result. To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary

  19. Searching for Cancer Information on the Internet: Analyzing Natural Language Search Queries

    PubMed Central

    Theofanos, Mary Frances

    2003-01-01

    Background Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine result. Objective To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. Methods The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Results Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11

  20. Effective Multi-Query Expansions: Collaborative Deep Networks for Robust Landmark Retrieval.

    PubMed

    Wang, Yang; Lin, Xuemin; Wu, Lin; Zhang, Wenjie

    2017-03-01

    Given a query photo issued by a user (q-user), landmark retrieval returns a set of photos whose landmarks are similar to those of the query. Existing studies on landmark retrieval focus on exploiting the geometries of landmarks for similarity matches between candidate photos and a query photo. We observe that the same landmarks provided by different users over a social media community may convey different geometry information depending on the viewpoints and/or angles, and may, subsequently, yield very different results. In fact, dealing with landmarks with low-quality shapes caused by the photography of q-users is often nontrivial and has seldom been studied. In this paper, we propose a novel framework, namely multi-query expansions, to retrieve semantically robust landmarks in two steps. First, we identify the top-k photos regarding the latent topics of a query landmark to construct a multi-query set so as to remedy its possible low-quality shape. For this purpose, we significantly extend the techniques of Latent Dirichlet Allocation. Then, motivated by typical collaborative filtering methods, we propose to learn collaborative deep-network-based, semantic, nonlinear, high-level features over the latent factor for landmark photos as the training set, which is formed by matrix factorization over the collaborative user-photo matrix regarding the multi-query set. The learned deep network is further applied to generate the features for all the other photos, meanwhile resulting in a compact multi-query set within such space. Then, the final ranking scores are calculated over the high-level feature space between the multi-query set and all other photos, which are ranked to serve as the final ranking list of landmark retrieval. Extensive experiments are conducted on real-world social media data with both landmark photos together with their user information to show the superior performance over the existing methods, especially our recently

  1. Massive Query Resolution for Rapid Selective Dissemination of Information.

    ERIC Educational Resources Information Center

    Cohen, Jonathan D.

    1999-01-01

    Outlines an efficient approach to performing query resolution which, when matched with a keyword scanner, offers rapid selecting and routing for massive Boolean queries, and which is suitable for implementation on a desktop computer. Demonstrates the system's operation with large examples in a practical setting. (AEF)

  2. Using Bitmap Indexing Technology for Combined Numerical and Text Queries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stockinger, Kurt; Cieslewicz, John; Wu, Kesheng

    2006-10-16

    In this paper, we describe a strategy of using compressed bitmap indices to speed up queries on both numerical data and text documents. By using an efficient compression algorithm, these compressed bitmap indices are compact even for indices with millions of distinct terms. Moreover, bitmap indices can be used very efficiently to answer Boolean queries over text documents involving multiple query terms. Existing inverted indices for text searches are usually inefficient for corpora with a very large number of terms as well as for queries involving a large number of hits. We demonstrate that our compressed bitmap index technology overcomes both of those shortcomings. In a performance comparison against a commonly used database system, our indices answer queries 30 times faster on average. To provide full SQL support, we integrated our indexing software, called FastBit, with MonetDB. The integrated system MonetDB/FastBit provides not only efficient searches on a single table as FastBit does, but also answers join queries efficiently. Furthermore, MonetDB/FastBit also provides a very efficient retrieval mechanism of result records.
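    How a bitmap index answers Boolean queries over text can be sketched as follows. This is a toy, uncompressed illustration that uses Python integers as bitmaps; it is not the FastBit API and omits the compression the abstract describes. The function names and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(docs):
    """Map each term to an integer whose bit i is set if document i contains it."""
    index = defaultdict(int)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            index[term] |= 1 << doc_id
    return index


def matching_ids(bitmap):
    """Decode a bitmap into the sorted list of document ids whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical three-document corpus.
docs = ["bitmap index query", "inverted index text", "bitmap text query"]
index = build_bitmap_index(docs)

# Boolean AND / OR over terms become single bitwise operations on bitmaps.
both = matching_ids(index["bitmap"] & index["query"])
either = matching_ids(index["index"] | index["text"])
```

    The point of the sketch is that a multi-term Boolean query reduces to bitwise operations whose cost depends on bitmap length rather than on the number of hits.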

  3. Consumer-Resource Dynamics: Quantity, Quality, and Allocation

    PubMed Central

    Getz, Wayne M.; Owen-Smith, Norman

    2011-01-01

    Background The dominant paradigm for modeling the complexities of interacting populations and food webs is a system of coupled ordinary differential equations in which the state of each species, population, or functional trophic group is represented by an aggregated numbers-density or biomass-density variable. Here, using the metaphysiological approach to model consumer-resource interactions, we formulate a two-state paradigm that represents each population or group in a food web in terms of both its quantity and quality. Methodology and Principal Findings The formulation includes an allocation function controlling the relative proportion of extracted resources to increasing quantity versus elevating quality. Since lower quality individuals senesce more rapidly than higher quality individuals, an optimal allocation proportion exists and we derive an expression for how this proportion depends on population parameters that determine the senescence rate, the per-capita mortality rate, and the effects of these rates on the dynamics of the quality variable. We demonstrate that oscillations do not arise in our model from quantity-quality interactions alone, but require consumer-resource interactions across trophic levels that can be stabilized through judicious resource allocation strategies. Analysis and simulations provide compelling arguments for the necessity of populations to evolve quality-related dynamics in the form of maternal effects, storage or other appropriate structures. They also indicate that resource allocation switching between investments in abundance versus quality provide a powerful mechanism for promoting the stability of consumer-resource interactions in seasonally forcing environments. Conclusions/Significance Our simulations show that physiological inefficiencies associated with this switching can be favored by selection due to the diminished exposure of inefficient consumers to strong oscillations associated with the well-known paradox of

  4. TokSearch: A search engine for fusion experimental data

    DOE PAGES

    Sammuli, Brian S.; Barr, Jayson L.; Eidietis, Nicholas W.; ...

    2018-04-01

    At a typical fusion research site, experimental data is stored using archive technologies that deal with each discharge as an independent set of data. These technologies (e.g. MDSplus or HDF5) are typically supplemented with a database that aggregates metadata for multiple shots to allow for efficient querying of certain predefined quantities. Often, however, a researcher will need to extract information from the archives, possibly for many shots, that is not available in the metadata store or otherwise indexed for quick retrieval. To address this need, a new search tool called TokSearch has been added to the General Atomics TokSys control design and analysis suite [1]. This tool provides the ability to rapidly perform arbitrary, parallelized queries of archived tokamak shot data (both raw and analyzed) over large numbers of shots. The TokSearch query API borrows concepts from SQL, and users can choose to implement queries in either Matlab™ or Python.

  5. TokSearch: A search engine for fusion experimental data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sammuli, Brian S.; Barr, Jayson L.; Eidietis, Nicholas W.

    At a typical fusion research site, experimental data is stored using archive technologies that deal with each discharge as an independent set of data. These technologies (e.g. MDSplus or HDF5) are typically supplemented with a database that aggregates metadata for multiple shots to allow for efficient querying of certain predefined quantities. Often, however, a researcher will need to extract information from the archives, possibly for many shots, that is not available in the metadata store or otherwise indexed for quick retrieval. To address this need, a new search tool called TokSearch has been added to the General Atomics TokSys control design and analysis suite [1]. This tool provides the ability to rapidly perform arbitrary, parallelized queries of archived tokamak shot data (both raw and analyzed) over large numbers of shots. The TokSearch query API borrows concepts from SQL, and users can choose to implement queries in either Matlab™ or Python.

  6. Using search engine query data to track pharmaceutical utilization: a study of statins.

    PubMed

    Schuster, Nathaniel M; Rogers, Mary A M; McMahon, Laurence F

    2010-08-01

    To examine temporal and geographic associations between Google queries for health information and healthcare utilization benchmarks. Retrospective longitudinal study. Using Google Trends and Google Insights for Search data, the search terms Lipitor (atorvastatin calcium; Pfizer, Ann Arbor, MI) and simvastatin were evaluated for change over time and for association with Lipitor revenues. The relationship between query data and community-based resource use per Medicare beneficiary was assessed for 35 US metropolitan areas. Google queries for Lipitor significantly decreased from January 2004 through June 2009 and queries for simvastatin significantly increased (P <.001 for both), particularly after Lipitor came off patent (P <.001 for change in slope). The mean number of Google queries for Lipitor correlated (r = 0.98) with the percentage change in Lipitor global revenues from 2004 to 2008 (P <.001). Query preference for Lipitor over simvastatin was positively associated (r = 0.40) with a community's use of Medicare services. For every 1% increase in utilization of Medicare services in a community, there was a 0.2-unit increase in the ratio of Lipitor queries to simvastatin queries in that community (P = .02). Specific search engine queries for medical information correlate with pharmaceutical revenue and with overall healthcare utilization in a community. This suggests that search query data can track community-wide characteristics in healthcare utilization and have the potential for informing payers and policy makers regarding trends in utilization.

  7. BioFed: federated query processing over life sciences linked open data.

    PubMed

    Hasnain, Ali; Mehmood, Qaiser; Sana E Zainab, Syeda; Saleem, Muhammad; Warren, Claude; Zehra, Durre; Decker, Stefan; Rebholz-Schuhmann, Dietrich

    2017-03-15

    Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but this still requires solutions to query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality has to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state of the art solution FedX and forms an important benchmark for the life science domain. The efficient cataloguing approach of the federated query processing system 'BioFed', the triple pattern wise source selection and the semantic source normalisation form the core of our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Last but not least, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., DrugBank, Sider). BioFed is a solution for a single-point-of-access for a large number of SPARQL endpoints providing life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. BioFed fully supports SPARQL 1.1 and gives access to the

  8. 28 CFR 25.7 - Querying records in the system.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 28 Judicial Administration 1 2010-07-01 2010-07-01 false Querying records in the system. 25.7 Section 25.7 Judicial Administration DEPARTMENT OF JUSTICE DEPARTMENT OF JUSTICE INFORMATION SYSTEMS The National Instant Criminal Background Check System § 25.7 Querying records in the system. (a) The following...

  9. DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data

    PubMed Central

    Putri, Fadhilah Kurnia; Song, Giltae; Rao, Praveen

    2017-01-01

    One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query (DISPAQ) which efficiently identifies profitable areas by exploiting the Apache Software Foundation’s Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data. PMID:28946679
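
    The skyline refinement step mentioned in this abstract can be illustrated with a minimal sketch. It is not the DISPAQ implementation: the two attributes (profit and demand, both "higher is better"), the function names and the sample points are assumptions made for illustration. The sketch relies on one property of the Z-order (Morton) curve: a point that dominates another always has the larger Z-address, so after sorting by descending Z-address a point can only be dominated by points seen earlier.

```python
def interleave_bits(x, y, bits=16):
    """Z-order (Morton) address of a 2-D point with non-negative integer coordinates."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def dominates(a, b):
    """True if a is at least as good as b in both attributes and differs from b."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b


def z_sorted_skyline(points):
    """Skyline (maximising both attributes) computed in one pass over Z-sorted points."""
    ordered = sorted(set(points), key=lambda p: interleave_bits(*p), reverse=True)
    skyline = []
    for p in ordered:
        # Only already-accepted points can dominate p, because of the Z-order sort.
        if not any(dominates(s, p) for s in skyline):
            skyline.append(p)
    return skyline


# Hypothetical (profit, demand) summaries for six candidate areas.
areas = [(5, 1), (1, 5), (3, 3), (2, 2), (4, 1), (0, 0)]
print(z_sorted_skyline(areas))
```

    Because accepted points are never removed later, the pass needs no backtracking; a distributed variant would run the same filter per partition and merge the local skylines.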

  10. A novel methodology for querying web images

    NASA Astrophysics Data System (ADS)

    Prabhakara, Rashmi; Lee, Ching Cheng

    2005-01-01

    Ever since the advent of the Internet, there has been an immense growth in the amount of image data that is available on the World Wide Web. With such a magnitude of image availability, an efficient and effective image retrieval system is required to make use of this information. This research presents an effective image matching and indexing technique that improves on existing integrated image retrieval methods. The proposed technique follows a two-phase approach, integrating query by topic and query by example specification methods. The first phase consists of topic-based image retrieval using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. It consists of a focused crawler that lets the user enter not only the keyword for the topic-based search but also the scope in which to find the images. The second phase uses the query by example specification to perform a low-level content-based image match that retrieves a smaller set of results closer to the example image. Information related to the image feature is automatically extracted from the query image by the image processing system. A computationally inexpensive technique based on color features is used to perform content-based matching of images. The main goal is to develop a functional image search and indexing system and to demonstrate that better retrieval results can be achieved with this proposed hybrid search technique.

  11. A novel methodology for querying web images

    NASA Astrophysics Data System (ADS)

    Prabhakara, Rashmi; Lee, Ching Cheng

    2004-12-01

    Ever since the advent of the Internet, there has been an immense growth in the amount of image data that is available on the World Wide Web. With such a magnitude of image availability, an efficient and effective image retrieval system is required to make use of this information. This research presents an effective image matching and indexing technique that improves on existing integrated image retrieval methods. The proposed technique follows a two-phase approach, integrating query by topic and query by example specification methods. The first phase consists of topic-based image retrieval using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. It consists of a focused crawler that lets the user enter not only the keyword for the topic-based search but also the scope in which to find the images. The second phase uses the query by example specification to perform a low-level content-based image match that retrieves a smaller set of results closer to the example image. Information related to the image feature is automatically extracted from the query image by the image processing system. A computationally inexpensive technique based on color features is used to perform content-based matching of images. The main goal is to develop a functional image search and indexing system and to demonstrate that better retrieval results can be achieved with this proposed hybrid search technique.

  12. Data Aggregation System: A system for information retrieval on demand over relational and non-relational distributed data sources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ball, G.; Kuznetsov, V.; Evans, D.

    We present the Data Aggregation System, a system for information retrieval and aggregation from heterogeneous sources of relational and non-relational data for the Compact Muon Solenoid experiment on the CERN Large Hadron Collider. The experiment currently has a number of organically-developed data sources, including front-ends to a number of different relational databases and non-database data services which do not share common data structures or APIs (Application Programming Interfaces), and cannot at this stage be readily converged. DAS provides a single interface for querying all these services, a caching layer to speed up access to expensive underlying calls and the ability to merge records from different data services pertaining to a single primary key.

  13. The DEDUCE Guided Query Tool: Providing Simplified Access to Clinical Data for Research and Quality Improvement

    PubMed Central

    Horvath, Monica M.; Winfield, Stephanie; Evans, Steve; Slopek, Steve; Shang, Howard; Ferranti, Jeffrey

    2011-01-01

    In many healthcare organizations, comparative effectiveness research and quality improvement (QI) investigations are hampered by a lack of access to data created as a byproduct of patient care. Data collection often hinges upon either manual chart review or ad hoc requests to technical experts who support legacy clinical systems. In order to facilitate this needed capacity for data exploration at our institution (Duke University Health System), we have designed and deployed a robust Web application for cohort identification and data extraction: the Duke Enterprise Data Unified Content Explorer (DEDUCE). DEDUCE is envisioned as a simple, web-based environment that allows investigators access to administrative, financial, and clinical information generated during patient care. By using business intelligence tools to create a view into Duke Medicine's enterprise data warehouse, DEDUCE provides a guided query functionality using a wizard-like interface that lets users filter through millions of clinical records, explore aggregate reports, and export extracts. Researchers and QI specialists can obtain detailed patient- and observation-level extracts without needing to understand structured query language or the underlying database model. Developers designing such tools must devote sufficient training and develop application safeguards to ensure that patient-centered clinical researchers understand when observation-level extracts should be used. This may mitigate the risk of data being misunderstood and consequently used in an improper fashion. PMID:21130181

  14. The DEDUCE Guided Query tool: providing simplified access to clinical data for research and quality improvement.

    PubMed

    Horvath, Monica M; Winfield, Stephanie; Evans, Steve; Slopek, Steve; Shang, Howard; Ferranti, Jeffrey

    2011-04-01

    In many healthcare organizations, comparative effectiveness research and quality improvement (QI) investigations are hampered by a lack of access to data created as a byproduct of patient care. Data collection often hinges upon either manual chart review or ad hoc requests to technical experts who support legacy clinical systems. In order to facilitate this needed capacity for data exploration at our institution (Duke University Health System), we have designed and deployed a robust Web application for cohort identification and data extraction: the Duke Enterprise Data Unified Content Explorer (DEDUCE). DEDUCE is envisioned as a simple, web-based environment that allows investigators access to administrative, financial, and clinical information generated during patient care. By using business intelligence tools to create a view into Duke Medicine's enterprise data warehouse, DEDUCE provides a Guided Query functionality using a wizard-like interface that lets users filter through millions of clinical records, explore aggregate reports, and export extracts. Researchers and QI specialists can obtain detailed patient- and observation-level extracts without needing to understand structured query language or the underlying database model. Developers designing such tools must devote sufficient training and develop application safeguards to ensure that patient-centered clinical researchers understand when observation-level extracts should be used. This may mitigate the risk of data being misunderstood and consequently used in an improper fashion. Copyright © 2010 Elsevier Inc. All rights reserved.

  15. Object-Oriented Query Language For Events Detection From Images Sequences

    NASA Astrophysics Data System (ADS)

    Ganea, Ion Eugen

    2015-09-01

    This paper presents a method for representing events extracted from image sequences, together with the query language used for event detection. Using an object-oriented model, the spatial and temporal relationships between salient objects, and between events, are stored and queried. This work aims to unify the storing and querying phases of video event processing. The object-oriented language syntax used for event processing allows the instantiation of index classes in order to improve the accuracy of the query results. The experiments were performed on image sequences from the sports domain, and they show the reliability and robustness of the proposed language. As an extension, a specific syntax will be added for constructing templates for abnormal events and for detecting incidents, which is the final goal of the research.

  16. Four queries concerning the metaphysics of early human embryogenesis.

    PubMed

    Howsepian, A A

    2008-04-01

    In this essay, I attempt to provide answers to the following four queries concerning the metaphysics of early human embryogenesis. (1) Following its first cellular fission, is it coherent to claim that one and only one of two "blastomeric" twins of a human zygote is identical with that zygote? (2) Following the fusion of two human pre-embryos, is it coherent to claim that one and only one pre-fusion pre-embryo is identical with that postfusion pre-embryo? (3) Does a live human being come into existence only when its brain comes into existence? (4) At implantation, does a pre-embryo become a mere part of its mother? I argue that either if things have quidditative properties or if criterialism is false, then queries (1) and (2) can be answered in the affirmative; that in light of recent developments in theories of human death and in light of a more "functional" theory of brains, query (3) can be answered in the negative; and that plausible mereological principles require a negative answer to query (4).

  17. Clean Air Markets - Facility Attributes and Contacts Query Wizard

    EPA Pesticide Factsheets

    The Facility Attributes and Contacts Query Wizard is part of a suite of Clean Air Markets-related tools that are accessible at http://camddataandmaps.epa.gov/gdm/index.cfm. The Facility Attributes and Contact module gives the user access to current and historical facility, owner, and representative data using custom queries, via the Facility Attributes Query Wizard, or Quick Reports. In addition, data regarding EPA, State, and local agency staff are also available. The Query Wizard can be used to search for data about a facility or facilities by identifying characteristics such as associated programs, owners, representatives, locations, and unit characteristics, facility inventories, and classifications. EPA's Clean Air Markets Division (CAMD) includes several market-based regulatory programs designed to improve air quality and ecosystems. The most well-known of these programs are EPA's Acid Rain Program and the NOx Programs, which reduce emissions of sulfur dioxide (SO2) and nitrogen oxides (NOx), compounds that adversely affect air quality, the environment, and public health. CAMD also plays an integral role in the development and implementation of the Clean Air Interstate Rule (CAIR).

  18. Ballistic aggregation in systems of inelastic particles: Cluster growth, structure, and aging

    NASA Astrophysics Data System (ADS)

    Paul, Subhajit; Das, Subir K.

    2017-07-01

    We study far-from-equilibrium dynamics in models of freely cooling granular gas and ballistically aggregating compact clusters. For both cases, from event-driven molecular dynamics simulations, we present detailed results on structure and dynamics in space dimensions d = 1 and 2. Via appropriate analyses it has been confirmed that the ballistic aggregation mechanism applies in d = 1 granular gases as well. Aging phenomena for this mechanism, in both dimensions, have been studied via the two-time density autocorrelation function. This quantity is demonstrated to exhibit a scaling property similar to that in the standard phase transition kinetics. The corresponding functional forms have been quantified and the outcomes have been discussed in connection with the structural properties. Our results on aging establish a more complete equivalence between the granular gas and the ballistic aggregation models in d = 1.

  19. Query Classification and Study of University Students' Search Trends

    ERIC Educational Resources Information Center

    Maabreh, Majdi A.; Al-Kabi, Mohammed N.; Alsmadi, Izzat M.

    2012-01-01

    Purpose: This study is an attempt to develop an automatic identification method for Arabic web queries and divide them into several query types using data mining. In addition, it seeks to evaluate the impact of the academic environment on using the internet. Design/methodology/approach: The web log files were collected from one of the higher…

  20. Predicting the hand, foot, and mouth disease incidence using search engine query data and climate variables: an ecological study in Guangdong, China

    PubMed Central

    Du, Zhicheng; Xu, Lin; Zhang, Wangjian; Zhang, Dingmei; Yu, Shicheng; Hao, Yuantao

    2017-01-01

    Objectives Hand, foot, and mouth disease (HFMD) has caused a substantial burden in China, especially in Guangdong Province. Based on the enhanced surveillance system, we aimed to explore whether the addition of temperature and search engine query data improves the risk prediction of HFMD. Design Ecological study. Setting and participants Information on the confirmed cases of HFMD, climate parameters and search engine query logs was collected. A total of 1.36 million HFMD cases were identified from the surveillance system during 2011–2014. Analyses were conducted at aggregate level and no confidential information was involved. Outcome measures A seasonal autoregressive integrated moving average (ARIMA) model with external variables (ARIMAX) was used to predict the HFMD incidence from 2011 to 2014, taking into account temperature and search engine query data (Baidu Index, BDI). Statistics of goodness-of-fit and precision of prediction were used to compare models (1) based on surveillance data only, and with the addition of (2) temperature, (3) BDI, and (4) both temperature and BDI. Results A high correlation between HFMD incidence and BDI (r=0.794, p<0.001) or temperature (r=0.657, p<0.001) was observed using both time series plot and correlation matrix. A linear effect of BDI (without lag) and non-linear effect of temperature (1 week lag) on HFMD incidence were found in a distributed lag non-linear model. Compared with the model based on surveillance data only, the ARIMAX model including BDI reached the best goodness-of-fit with an Akaike information criterion (AIC) value of −345.332, whereas the model including both BDI and temperature had the most accurate prediction in terms of the mean absolute percentage error (MAPE) of 101.745%. Conclusions An ARIMAX model incorporating search engine query data significantly improved the prediction of HFMD. Further studies are warranted to examine whether including search engine query data also improves the prediction of

  1. HodDB: Design and Analysis of a Query Processor for Brick.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fierro, Gabriel; Culler, David

    Brick is a recently proposed metadata schema and ontology for describing building components and the relationships between them. It represents buildings as directed labeled graphs using the RDF data model. Using the SPARQL query language, building-agnostic applications query a Brick graph to discover the set of resources and relationships they require to operate. Latency-sensitive applications, such as user interfaces, demand response and model-predictive control, require fast queries, conventionally less than 100ms. We benchmark a set of popular open-source and commercial SPARQL databases against three real Brick models using seven application queries and find that none of them meet this performance target. This lack of performance can be attributed to design decisions that optimize for queries over large graphs consisting of billions of triples, but give poor spatial locality and join performance on the small dense graphs typical of Brick. We present the design and evaluation of HodDB, an RDF/SPARQL database for Brick built over a node-based index structure. HodDB performs Brick queries 3-700x faster than leading SPARQL databases and consistently meets the 100ms threshold, enabling the portability of important latency-sensitive building applications.

  2. Front Range Infrastructure Resources Project--Aggregate Resources Activities

    USGS Publications Warehouse

    ,

    1998-01-01

    Infrastructure, such as roads, buildings, airports, and dams, is built and maintained by use of large quantities of aggregate—sand, gravel, and stone. As urban areas expand, local sources of these resources become inaccessible. Other competitive land uses have a higher value than aggregate resources. For example, gravel cannot be mined from under a subdivision. The failure to plan for the protection and extraction of infrastructure resources often results in increased consumer cost, environmental damage, and an adversarial relationship between the industry and the community.

  3. Large scale study of multiple-molecule queries

    PubMed Central

    2009-01-01

    Background In ligand-based screening, as well as in other chemoinformatics applications, one seeks to effectively search large repositories of molecules in order to retrieve molecules that are similar, typically, to a single lead molecule. However, in some cases, multiple molecules from the same family are available to seed the query and search for other members of the same family. Multiple-molecule query methods have been less studied than single-molecule query methods. Furthermore, the previous studies have relied on proprietary data and sometimes have not used proper cross-validation methods to assess the results. In contrast, here we develop and compare multiple-molecule query methods using several large publicly available data sets and backgrounds. We also create a framework based on a strict cross-validation protocol to allow unbiased benchmarking for direct comparison in future studies across several performance metrics. Results Fourteen different multiple-molecule query methods were defined and benchmarked using: (1) 41 publicly available data sets of related molecules with similar biological activity; and (2) publicly available background data sets consisting of up to 175,000 molecules randomly extracted from the ChemDB database and other sources. Eight of the fourteen methods were parameter free, and six of them fit one or two free parameters to the data using a careful cross-validation protocol. All the methods were assessed and compared for their ability to retrieve members of the same family against the background data set by using several performance metrics including the Area Under the Accumulation Curve (AUAC), Area Under the Curve (AUC), F1-measure, and BEDROC metrics. Consistent with the previous literature, the best parameter-free methods are the MAX-SIM and MIN-RANK methods, which score a molecule to a family by the maximum similarity, or minimum ranking, obtained across the family. One new parameterized method introduced in this study and two

  4. High Performance Visualization using Query-Driven Visualization and Analytics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bethel, E. Wes; Campbell, Scott; Dart, Eli

    2006-06-15

    Query-driven visualization and analytics is a unique approach for high-performance visualization that offers new capabilities for knowledge discovery and hypothesis testing. The new capabilities akin to finding needles in haystacks are the result of combining technologies from the fields of scientific visualization and scientific data management. This approach is crucial for rapid data analysis and visualization in the petascale regime. This article describes how query-driven visualization is applied to a hero-sized network traffic analysis problem.

  5. Browsing schematics: Query-filtered graphs with context nodes

    NASA Technical Reports Server (NTRS)

    Ciccarelli, Eugene C.; Nardi, Bonnie A.

    1988-01-01

    The early results of a research project to create tools for building interfaces to intelligent systems on the NASA Space Station are reported. One such tool is the Schematic Browser which helps users engaged in engineering problem solving find and select schematics from among a large set. Users query for schematics with certain components, and the Schematic Browser presents a graph whose nodes represent the schematics with those components. The query greatly reduces the number of choices presented to the user, filtering the graph to a manageable size. Users can reformulate and refine the query serially until they locate the schematics of interest. To help users maintain orientation as they navigate a large body of data, the graph also includes nodes that are not matches but provide global and local context for the matching nodes. Context nodes include landmarks, ancestors, siblings, children and previous matches.

  6. Automatically Preparing Safe SQL Queries

    NASA Astrophysics Data System (ADS)

    Bisht, Prithvi; Sistla, A. Prasad; Venkatakrishnan, V. N.

    We present the first sound program source transformation approach for automatically transforming the code of a legacy web application to employ PREPARE statements in place of unsafe SQL queries. Our approach therefore opens the way for eradicating the SQL injection threat vector from legacy web applications.
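
    The kind of rewrite this abstract targets can be shown on a small scale. The sketch below is not the authors' transformation tool; it only contrasts a string-concatenated query with its parameterized equivalent, using Python's standard sqlite3 module. The table, column and function names are assumptions made for illustration.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_ids_unsafe(conn, name):
    """Legacy style: user input is spliced into the SQL text (injectable)."""
    sql = "SELECT id FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(sql)]


def find_ids_prepared(conn, name):
    """Prepared style: the SQL text is fixed and the input is bound as data."""
    sql = "SELECT id FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(sql, (name,))]


conn = make_demo_db()
attack = "' OR '1'='1"
print(find_ids_unsafe(conn, attack))    # the injected condition matches every row
print(find_ids_prepared(conn, attack))  # the input is treated as a literal name
```

    With the bound parameter, the attacker-controlled string can no longer change the structure of the query, which is the property an automatic transformation to PREPARE statements aims to guarantee.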

  7. Towards Hybrid Online On-Demand Querying of Realtime Data with Stateful Complex Event Processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, Qunzhi; Simmhan, Yogesh; Prasanna, Viktor K.

    Emerging Big Data applications in areas like e-commerce and energy industry require both online and on-demand queries to be performed over vast and fast data arriving as streams. These present novel challenges to Big Data management systems. Complex Event Processing (CEP) is recognized as a high performance online query scheme which in particular deals with the velocity aspect of the 3-V’s of Big Data. However, traditional CEP systems do not consider data variety and lack the capability to embed ad hoc queries over the volume of data streams. In this paper, we propose H2O, a stateful complex event processing framework, to support hybrid online and on-demand queries over realtime data. We propose a semantically enriched event and query model to address data variety. A formal query algebra is developed to precisely capture the stateful and containment semantics of online and on-demand queries. We describe techniques to achieve the interactive query processing over realtime data featured by efficient online querying, dynamic stream data persistence and on-demand access. The system architecture is presented and the current implementation status reported.

  8. Feasibility of Homomorphic Encryption for Sharing I2B2 Aggregate-Level Data in the Cloud

    PubMed Central

    Raisaro, Jean Louis; Klann, Jeffrey G; Wagholikar, Kavishwar B; Estiri, Hossein; Hubaux, Jean-Pierre; Murphy, Shawn N

    2018-01-01

    The biomedical community is lagging in the adoption of cloud computing for the management of medical data. The primary obstacles are concerns about privacy and security. In this paper, we explore the feasibility of using advanced privacy-enhancing technologies in order to enable the sharing of sensitive clinical data in a public cloud. Our goal is to facilitate sharing of clinical data in the cloud by minimizing the risk of unintended leakage of sensitive clinical information. In particular, we focus on homomorphic encryption, a specific type of encryption that offers the ability to run computation on the data while the data remains encrypted. This paper demonstrates that homomorphic encryption can be used efficiently to compute aggregating queries on the ciphertexts, along with providing end-to-end confidentiality of aggregate-level data from the i2b2 data model. PMID:29888067
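
    To make "computing aggregating queries on the ciphertexts" concrete, here is a toy sketch of an additively homomorphic sum. It uses the Paillier scheme purely as an illustration; the abstract does not state which scheme the authors used, and the tiny primes, function names and per-site counts below are assumptions. The key sizes are far too small for any real use.

```python
import math
import secrets


def keygen(p=1789, q=1861):
    """Toy Paillier key generation with generator g = n + 1 (demo-sized primes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; correct because g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


# Hypothetical per-site patient counts, summed without decrypting any of them.
n, priv = keygen()
site_counts = [12, 30, 7]
total = encrypt(n, 0)
for count in site_counts:
    total = add_encrypted(n, total, encrypt(n, count))
print(decrypt(n, priv, total))
```

    Only the holder of the private key learns the total; the party that performs the additions sees ciphertexts only.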

  9. Feasibility of Homomorphic Encryption for Sharing I2B2 Aggregate-Level Data in the Cloud.

    PubMed

    Raisaro, Jean Louis; Klann, Jeffrey G; Wagholikar, Kavishwar B; Estiri, Hossein; Hubaux, Jean-Pierre; Murphy, Shawn N

    2018-01-01

    The biomedical community is lagging in the adoption of cloud computing for the management of medical data. The primary obstacles are concerns about privacy and security. In this paper, we explore the feasibility of using advanced privacy-enhancing technologies in order to enable the sharing of sensitive clinical data in a public cloud. Our goal is to facilitate sharing of clinical data in the cloud by minimizing the risk of unintended leakage of sensitive clinical information. In particular, we focus on homomorphic encryption, a specific type of encryption that offers the ability to run computation on the data while the data remains encrypted. This paper demonstrates that homomorphic encryption can be used efficiently to compute aggregating queries on the ciphertexts, along with providing end-to-end confidentiality of aggregate-level data from the i2b2 data model.

  10. A web-based data-querying tool based on ontology-driven methodology and flowchart-based model.

    PubMed

    Ping, Xiao-Ou; Chung, Yufang; Tseng, Yi-Ju; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei

    2013-10-08

    Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, "degree of liver damage," "degree of liver damage when applying a mutually exclusive setting

  11. A Web-Based Data-Querying Tool Based on Ontology-Driven Methodology and Flowchart-Based Model

    PubMed Central

    Ping, Xiao-Ou; Chung, Yufang; Liang, Ja-Der; Yang, Pei-Ming; Huang, Guan-Tarn; Lai, Feipei

    2013-01-01

    Background Because of the increased adoption rate of electronic medical record (EMR) systems, more health care records have been increasingly accumulating in clinical data repositories. Therefore, querying the data stored in these repositories is crucial for retrieving the knowledge from such large volumes of clinical data. Objective The aim of this study is to develop a Web-based approach for enriching the capabilities of the data-querying system along the three following considerations: (1) the interface design used for query formulation, (2) the representation of query results, and (3) the models used for formulating query criteria. Methods The Guideline Interchange Format version 3.5 (GLIF3.5), an ontology-driven clinical guideline representation language, was used for formulating the query tasks based on the GLIF3.5 flowchart in the Protégé environment. The flowchart-based data-querying model (FBDQM) query execution engine was developed and implemented for executing queries and presenting the results through a visual and graphical interface. To examine a broad variety of patient data, the clinical data generator was implemented to automatically generate the clinical data in the repository, and the generated data, thereby, were employed to evaluate the system. The accuracy and time performance of the system for three medical query tasks relevant to liver cancer were evaluated based on the clinical data generator in the experiments with varying numbers of patients. Results In this study, a prototype system was developed to test the feasibility of applying a methodology for building a query execution engine using FBDQMs by formulating query tasks using the existing GLIF. The FBDQM-based query execution engine was used to successfully retrieve the clinical data based on the query tasks formatted using the GLIF3.5 in the experiments with varying numbers of patients. The accuracy of the three queries (ie, “degree of liver damage,” “degree of liver damage

  12. Evolutionary Multiobjective Query Workload Optimization of Cloud Data Warehouses

    PubMed Central

    Dokeroglu, Tansel; Sert, Seyyit Alper; Cinar, Muhammet Serkan

    2014-01-01

    With the advent of Cloud databases, query optimizers need to find Pareto-optimal solutions in terms of response time and monetary cost. Our novel approach minimizes both objectives by deploying alternative virtual resources and query plans that make use of the virtual resource elasticity of the Cloud. We propose an exact multiobjective branch-and-bound algorithm and a robust multiobjective genetic algorithm for the optimization of distributed data warehouse query workloads on the Cloud. To investigate the effectiveness of our approach, we incorporate the devised algorithms into a prototype system. Finally, through several experiments with different workloads and virtual resource configurations, we report notable findings on alternative deployments as well as the advantages and disadvantages of the multiobjective algorithms we propose. PMID:24892048
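
    As a rough illustration of what "Pareto-optimal in terms of response time and monetary cost" means, the sketch below filters a list of candidate plans down to the non-dominated ones. It is not the branch-and-bound or genetic algorithm from the record; the names `Plan`, `dominates`, `pareto_front` and the sample numbers are assumptions made only for this example.

```python
from typing import List, Tuple

# Each candidate plan is (response_time_seconds, monetary_cost); both are minimized.
# The tuple layout is an assumption for this sketch, not taken from the paper.
Plan = Tuple[float, float]


def dominates(a: Plan, b: Plan) -> bool:
    """True if plan a is no worse than b on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Return the plans that no other plan dominates, in input order."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


# Hypothetical (time, cost) pairs for four alternative deployments.
candidates = [(10.0, 5.0), (8.0, 7.0), (12.0, 6.0), (8.0, 9.0)]
front = pareto_front(candidates)
```

    With these sample values, the plan (12.0, 6.0) is dominated by (10.0, 5.0) and (8.0, 9.0) is dominated by (8.0, 7.0), so only the two remaining plans stay on the front.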

  13. Research on Extension of Sparql Ontology Query Language Considering the Computation of Indoor Spatial Relations

    NASA Astrophysics Data System (ADS)

    Li, C.; Zhu, X.; Guo, W.; Liu, Y.; Huang, H.

    2015-05-01

    A method suitable for complex indoor semantic queries that considers the computation of indoor spatial relations is provided according to the characteristics of indoor space. This paper designs an ontology model describing the space-related information of humans, events and indoor space objects (e.g. Storey and Room) as well as their relations to support indoor semantic queries. The ontology concepts are used in the IndoorSPARQL query language, which extends SPARQL syntax for representing and querying indoor space. Four types of specific primitives for indoor queries, "Adjacent", "Opposite", "Vertical" and "Contain", are defined as query functions in IndoorSPARQL to support quantitative spatial computations. A method is also proposed to analyze the query language. Finally, this paper adopts this method to realize indoor semantic queries on the study area by constructing the ontology model for the study building. The experimental results show that the proposed method can effectively support complex semantic queries over indoor space.

  14. Design of FastQuery: How to Generalize Indexing and Querying System for Scientific Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Jerry; Wu, Kesheng

    2011-04-18

    Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies such as FastBit are critical for facilitating interactive exploration of large datasets. These technologies rely on adding auxiliary information to existing datasets to accelerate query processing. To use these indices, we need to match the relational data model used by the indexing systems with the array data model used by most scientific data, and to provide an efficient input and output layer for reading and writing the indices. In this work, we present a flexible design that can be easily applied to most scientific data formats. We demonstrate this flexibility by applying it to two of the most commonly used scientific data formats, HDF5 and NetCDF. We present two case studies using simulation data from the particle accelerator and climate simulation communities. To demonstrate the effectiveness of the new design, we also present a detailed performance study using both synthetic and real scientific workloads.

  15. Locality in Search Engine Queries and Its Implications for Caching

    DTIC Science & Technology

    2001-05-01

    in the question of whether caching might be effective for search engines as well. They study two real search engine traces by examining query...locality and its implications for caching. The two search engines studied are Vivisimo and Excite. Their trace analysis results show that queries have

  16. A Comparison of Two Methods for Boolean Query Relevancy Feedback.

    ERIC Educational Resources Information Center

    Salton, G.; And Others

    1984-01-01

    Evaluates and compares two recently proposed automatic methods for relevance feedback of Boolean queries (Dillon method, which uses probabilistic approach as basis, and disjunctive normal form method). Conclusions are drawn concerning the use of effective feedback methods in a Boolean query environment. Nineteen references are included. (EJS)

  17. Superfund Chemical Data Matrix (SCDM) Query

    EPA Pesticide Factsheets

    This site allows you to easily query the Superfund Chemical Data Matrix (SCDM) and generate a list of the corresponding Hazard Ranking System (HRS) factor values, benchmarks, and data elements that you need.

  18. Queries over Unstructured Data: Probabilistic Methods to the Rescue

    NASA Astrophysics Data System (ADS)

    Sarawagi, Sunita

    Unstructured data like emails, addresses, invoices, call transcripts, reviews, and press releases are now an integral part of any large enterprise. A challenge of modern business intelligence applications is analyzing and querying data seamlessly across structured and unstructured sources. This requires the development of automated techniques for extracting structured records from text sources and resolving entity mentions in data from various sources. The success of any automated method for extraction and integration depends on how effectively it unifies diverse clues in the unstructured source and in existing structured databases. We argue that statistical learning techniques like Conditional Random Fields (CRFs) provide an accurate, elegant and principled framework for tackling these tasks. Given the inherent noise in real-world sources, it is important to capture the uncertainty of the above operations via imprecise data models. CRFs provide a sound probability distribution over extractions but are not easy to represent and query in a relational framework. We present methods of approximating this distribution to query-friendly row and column uncertainty models. Finally, we present models for representing the uncertainty of de-duplication and algorithms for various Top-K count queries on imprecise duplicates.

  19. Categorical and Specificity Differences between User-Supplied Tags and Search Query Terms for Images. An Analysis of "Flickr" Tags and Web Image Search Queries

    ERIC Educational Resources Information Center

    Chung, EunKyung; Yoon, JungWon

    2009-01-01

    Introduction: The purpose of this study is to compare characteristics and features of user supplied tags and search query terms for images on the "Flickr" Website in terms of categories of pictorial meanings and level of term specificity. Method: This study focuses on comparisons between tags and search queries using Shatford's categorization…

  20. a Novel Approach of Indexing and Retrieving Spatial Polygons for Efficient Spatial Region Queries

    NASA Astrophysics Data System (ADS)

    Zhao, J. H.; Wang, X. Z.; Wang, F. Y.; Shen, Z. H.; Zhou, Y. C.; Wang, Y. L.

    2017-10-01

    Spatial region queries are more and more widely used in web-based applications. Mechanisms to provide efficient query processing over geospatial data are essential. However, due to the massive geospatial data volume, heavy geometric computation, and high access concurrency, it is difficult to get a response in real time. Spatial indexes are usually used in this situation. In this paper, based on the k-d tree, we introduce a distributed KD-Tree (DKD-Tree) suitable for polygon data, and a two-step query algorithm. The spatial index construction is recursive and iterative, and the query is an in-memory process. Both the index and query methods can be processed in parallel, and are implemented based on HDFS, Spark and Redis. Experiments on a large volume of Remote Sensing image metadata have been carried out, and the advantages of our method are investigated by comparing with spatial region queries executed on PostgreSQL and PostGIS. Results show that our approach not only greatly improves the efficiency of spatial region queries, but also has good scalability. Moreover, the two-step spatial range query algorithm can also save cluster resources to support a large number of concurrent queries. Therefore, this method is very useful when building large geographic information systems.

  1. A Query System Implementation Case Study.

    ERIC Educational Resources Information Center

    Hiser, Judith N.; Neil, M. Elizabeth

    1985-01-01

    The Department of Administrative Programming Services of Clemson University investigated products available in user-friendly retrieval systems. The test of INTELLECT, a natural language query system written by Artificial Intelligence Corporation, is described. (Author/MLW)

  2. Querying Large Biological Network Datasets

    ERIC Educational Resources Information Center

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of data available requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  3. Improving biomedical information retrieval by linear combinations of different query expansion techniques.

    PubMed

    Abdulla, Ahmed AbdoAziz Ahmed; Lin, Hongfei; Xu, Bo; Banbhrani, Santosh Kumar

    2016-07-25

    Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information Retrieval (IR) programs scour unstructured materials such as text documents in large reserves of data that are usually stored on computers. IR is related to the representation, storage, and organization of information items, as well as to access. In IR one of the main problems is to determine which documents are relevant to the user's needs and which are not. Under the current regime, users cannot construct queries precisely enough to retrieve particular pieces of data from large reserves of data. Basic information retrieval systems produce low-quality search results. In this paper we present a new technique to refine information retrieval searches so that they better represent the user's information need: we use different query expansion techniques and apply linear combinations between them, combining two expansion results at a time. Query expansions expand the search query, for example, by finding synonyms and reweighting original terms. They provide significantly more focused, particularized search results than do basic search queries. Retrieval performance is measured by some variants of MAP (Mean Average Precision). According to our experimental results, the combination of the best query expansion results improves the retrieved documents and outperforms our baseline by 21.06%; it even outperforms a previous study by 7.12%. We propose several query expansion techniques and their linear combinations to make user queries more cognizable to search engines and to produce higher-quality search results.
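
    The pairwise linear combination described in this record can be pictured with the short sketch below. It assumes each expansion technique yields a per-document relevance score and mixes two score tables with a weight `alpha`; the function name `combine_linear`, the weight, and the sample scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def combine_linear(scores_a: Dict[str, float],
                   scores_b: Dict[str, float],
                   alpha: float = 0.5) -> Dict[str, float]:
    """Mix two score tables as alpha * a + (1 - alpha) * b.

    A document missing from one table is treated as scoring 0.0 there
    (an assumption made for this sketch).
    """
    docs = set(scores_a) | set(scores_b)
    return {d: alpha * scores_a.get(d, 0.0) + (1.0 - alpha) * scores_b.get(d, 0.0)
            for d in docs}


def rank(scores: Dict[str, float]) -> List[str]:
    """Document ids sorted by descending combined score."""
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical scores from two different query expansion runs.
expansion_a = {"d1": 0.8, "d2": 0.4, "d3": 0.1}
expansion_b = {"d1": 0.2, "d2": 0.9, "d4": 0.5}

ranking = rank(combine_linear(expansion_a, expansion_b, alpha=0.5))
```

    In practice the weight would be tuned against a measure such as MAP on held-out queries; here it is fixed at 0.5 only to keep the example small.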

  4. Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments

    PubMed Central

    Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

    2015-01-01

    A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods. PMID:26393613
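
    The two conditions that define a CM range monitoring query can be sketched as a naive server-side filter, shown below. This is only the query semantics, re-evaluated from scratch on each call; it is not the GQR-tree index, and the class names, attribute keys and coordinates are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MovingObject:
    oid: str
    x: float
    y: float
    attrs: Dict[str, str]  # non-spatial attribute values


@dataclass
class CMRangeQuery:
    rect: Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
    required: Dict[str, str]                 # non-spatial query values


def matches(query: CMRangeQuery, obj: MovingObject) -> bool:
    """Condition (i): attributes match; condition (ii): object lies in the range."""
    xmin, ymin, xmax, ymax = query.rect
    in_range = xmin <= obj.x <= xmax and ymin <= obj.y <= ymax
    content_ok = all(obj.attrs.get(k) == v for k, v in query.required.items())
    return content_ok and in_range


def evaluate(query: CMRangeQuery, objects: List[MovingObject]) -> List[str]:
    """Ids of the objects currently satisfying the query, in input order."""
    return [o.oid for o in objects if matches(query, o)]


# Hypothetical fleet and query: taxis inside the square (0, 0)-(5, 5).
fleet = [
    MovingObject("taxi1", 2.0, 3.0, {"type": "taxi"}),
    MovingObject("taxi2", 9.0, 9.0, {"type": "taxi"}),
    MovingObject("bus1", 1.0, 1.0, {"type": "bus"}),
]
query = CMRangeQuery(rect=(0.0, 0.0, 5.0, 5.0), required={"type": "taxi"})

before = evaluate(query, fleet)
fleet[1].x, fleet[1].y = 4.0, 4.0  # taxi2 moves into the query range
after = evaluate(query, fleet)
```

    Re-running the filter after every movement is exactly the server workload and communication cost that an index such as the GQR-tree is meant to reduce.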

  5. Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments.

    PubMed

    Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

    2015-09-18

    A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods.

  6. An Experimental Investigation of Complexity in Database Query Formulation Tasks

    ERIC Educational Resources Information Center

    Casterella, Gretchen Irwin; Vijayasarathy, Leo

    2013-01-01

    Information Technology professionals and other knowledge workers rely on their ability to extract data from organizational databases to respond to business questions and support decision making. Structured query language (SQL) is the standard programming language for querying data in relational databases, and SQL skills are in high demand and are…

  7. Research on presentation and query service of geo-spatial data based on ontology

    NASA Astrophysics Data System (ADS)

    Li, Hong-wei; Li, Qin-chao; Cai, Chang

    2008-10-01

    The paper analyzes the deficiencies of current GIS in the presentation and query of geo-spatial data; discusses the advantages of ontology for formalizing geo-spatial data and presenting semantic granularity; takes a land-use classification system as an example to construct a domain ontology, described in OWL; realizes grade-level and category presentation of land-use data based on the idea of vertical and horizontal navigation; and then discusses ontology-based query modes for geo-spatial data, including data query based on types and grade levels, on instances and spatial relations, and synthetic query based on types and instances. These methods enrich the query modes of current GIS and are a useful attempt. The paper points out that the key to ontology-based presentation and query of spatial data is to construct a domain ontology that correctly reflects geo-concepts and their spatial relations and to realize its fine formal description.

  8. VIGOR: Interactive Visual Exploration of Graph Query Results.

    PubMed

    Pienta, Robert; Hohman, Fred; Endert, Alex; Tamersoy, Acar; Roundy, Kevin; Gates, Chris; Navathe, Shamkant; Chau, Duen Horng

    2018-01-01

    Finding patterns in graphs has become a vital challenge in many domains, from biological systems and network security to finance (e.g., finding money laundering rings of bankers and business owners). While there is significant interest in graph databases and querying techniques, less research has focused on helping analysts make sense of underlying patterns within a group of subgraph results. Visualizing graph query results is challenging, requiring effective summarization of a large number of subgraphs, each having potentially shared node-values, rich node features, and flexible structure across queries. We present VIGOR, a novel interactive visual analytics system, for exploring and making sense of query results. VIGOR uses multiple coordinated views, leveraging different data representations and organizations to streamline analysts' sensemaking process. VIGOR contributes: (1) an exemplar-based interaction technique, where an analyst starts with a specific result and relaxes constraints to find other similar results or starts with only the structure (i.e., without node value constraints), and adds constraints to narrow in on specific results; and (2) a novel feature-aware subgraph result summarization. Through a collaboration with Symantec, we demonstrate how VIGOR helps tackle real-world problems through the discovery of security blindspots in a cybersecurity dataset with over 11,000 incidents. We also evaluate VIGOR with a within-subjects study, demonstrating VIGOR's ease of use over a leading graph database management system, and its ability to help analysts understand their results at higher speed and make fewer errors.

  9. Soil organic carbon response to shrub encroachment regulated by soil aggregates

    NASA Astrophysics Data System (ADS)

    Zhu, Y.; Li, H.; Shen, H.; Feng, Y.; Fang, J.

    2017-12-01

    Shrub encroachment changes soil organic carbon content, but its mechanism remains highly uncertain, particularly in deep soil. Soil organic carbon is usually associated with the quantity of stable aggregates. In this study, we conducted a field investigation of typical steppe and desert steppe in Inner Mongolia to examine the impact of shrub encroachment on soil organic carbon in relation to soil aggregates at a depth of 0-500 cm. The results show that in the desert steppe, soil aggregate content across particle sizes tends to be lower in shrub patches than in the herb matrix at different depths, while the organic carbon content of soil aggregates below 50 cm tends to be higher in shrub patches than in the herb matrix; as a result, whole-soil organic carbon in shrub patches is lower than in the herb matrix at 0 to 50 cm and higher below 50 cm. In the typical steppe, there is no significant difference in soil aggregate structure between shrub patches and the herb matrix, but the organic carbon content of soil aggregates, especially of large aggregates, is significantly higher in shrub patches, so that whole-soil organic carbon content in shrub patches is significantly higher than in the herb matrix. The rate of change in soil organic carbon content (0-100 cm) due to shrub encroachment showed a significant negative correlation with the mean weight diameter of soil aggregates in the herb matrix. We also found that the variation of soil organic carbon in the desert steppe is not dominated by aggregates of any particular size, whereas the change in typical steppe soil organic carbon is mainly contributed by > 0.25 mm and 0.053-0.25 mm aggregates. The results suggest that the effect of shrub encroachment on soil organic carbon is regulated by soil aggregates but varies among grassland types, which should provide some insights into our understanding on

  10. Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data.

    PubMed

    Aji, Ablimit; Wang, Fusheng; Saltz, Joel H

    2012-11-06

    Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image-based computer-aided diagnosis. One major requirement for this is effective querying of such enormous amounts of data with fast response, which faces two major challenges: the "big data" challenge and high computational complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on-demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for micro-anatomic objects. To reduce query response time, we propose cost-based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce.

  11. Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data

    PubMed Central

    Aji, Ablimit; Wang, Fusheng; Saltz, Joel H.

    2013-01-01

    Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image-based computer-aided diagnosis. One major requirement for this is effective querying of such enormous amounts of data with fast response, which faces two major challenges: the “big data” challenge and high computational complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on-demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for micro-anatomic objects. To reduce query response time, we propose cost-based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce. PMID:24501719

  12. Monotonically improving approximate answers to relational algebra queries

    NASA Technical Reports Server (NTRS)

    Smith, Kenneth P.; Liu, J. W. S.

    1989-01-01

    We present here a query processing method that produces approximate answers to queries posed in standard relational algebra. This method is monotone in the sense that the accuracy of the approximate result improves with the amount of time spent producing the result. This strategy enables us to trade the time to produce the result for the accuracy of the result. An approximate relational model that characterizes approximate relations and a partial order for comparing them is developed. Relational operators which operate on and return approximate relations are defined.

  13. Virtual Solar Observatory Distributed Query Construction

    NASA Technical Reports Server (NTRS)

    Gurman, J. B.; Dimitoglou, G.; Bogart, R.; Davey, A.; Hill, F.; Martens, P.

    2003-01-01

    Through a prototype implementation (Tian et al., this meeting) the VSO has already demonstrated the capability of unifying geographically distributed data sources following the Web Services paradigm and utilizing mechanisms such as the Simple Object Access Protocol (SOAP). So far, four participating sites (Stanford, Montana State University, National Solar Observatory and the Solar Data Analysis Center) permit Web-accessible, time-based searches that allow browse access to a number of diverse data sets. Our latest work includes the extension of the simple, time-based queries to include numerous other searchable observation parameters. For VSO users, this extended functionality enables more refined searches. For the VSO, it is a proof of concept that more complex, distributed queries can be effectively constructed and that results from heterogeneous, remote sources can be synthesized and presented to users as a single, virtual data product.

  14. Advances in nowcasting influenza-like illness rates using search query logs

    NASA Astrophysics Data System (ADS)

    Lampos, Vasileios; Miller, Andrew C.; Crossan, Steve; Stefansen, Christian

    2015-08-01

    User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.

  15. Advances in nowcasting influenza-like illness rates using search query logs.

    PubMed

    Lampos, Vasileios; Miller, Andrew C; Crossan, Steve; Stefansen, Christian

    2015-08-03

    User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.

  16. Recycling of construction debris as aggregate in the Mid-Atlantic Region, USA

    USGS Publications Warehouse

    Robinson, G.R.; Menzie, W.D.; Hyun, H.

    2004-01-01

    Reclaimed asphalt pavement (RAP) and portland cement concrete (RPCC) are abundant and available substitutes for natural aggregate in many areas. This paper presents an overview of factors that affect recycled aggregate cost, availability, and engineering performance, and the results of a survey of business practices in the Mid-Atlantic region. For RAP, processing costs are less than those for virgin natural aggregate. Use of efficient asphalt pavement stripping technology, on-site reclamation, and linked two-way transport of asphalt debris and processed asphalt paving mix between asphalt mix plants and paving sites has led to extensive recycling of asphalt pavement in the Mid-Atlantic region of the US. Most of the sites that recycle asphalt pavement (RAP) are located in or near urban areas close to important transportation corridors. RPCC is a viable aggregate source in urban settings where unit costs for processed aggregate from RPCC and natural aggregate are comparable. Disposal fees charged at RPCC recycling sites help defray processing costs and the significantly lower tipping fees at recycling sites versus landfill disposal sites encourage recycling of construction debris as aggregate. Construction contractors and construction debris recycling centers, many of which have the ability to crush and process concrete debris at the job site, produce most RPCC. Production of RPCC aggregate from construction debris that is processed on site using portable equipment moved to the construction site eliminates transportation costs for aggregate and provides an economic incentive for RPCC use. Processing costs, quality and performance issues, and lack of large quantities where needed limit RPCC use. Most RPCC suppliers in the Mid-Atlantic area are located in counties with population densities greater than 400 people/km² (1036 people/mile²) and that have high unit-value costs and limited local availability of natural aggregate. © 2004 Published by Elsevier B.V.

  17. Practical private database queries based on a quantum-key-distribution protocol

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jakobi, Markus; Humboldt-Universitaet zu Berlin, D-10117 Berlin; Simon, Christoph

    2011-02-15

    Private queries allow a user, Alice, to learn an element of a database held by a provider, Bob, without revealing which element she is interested in, while limiting her information about the other elements. We propose to implement private queries based on a quantum-key-distribution protocol, with changes only in the classical postprocessing of the key. This approach makes our scheme both easy to implement and loss tolerant. While unconditionally secure private queries are known to be impossible, we argue that an interesting degree of security can be achieved by relying on fundamental physical principles instead of unverifiable security assumptions in order to protect both the user and the database. We think that the scope exists for such practical private queries to become another remarkable application of quantum information in the footsteps of quantum key distribution.

  18. Conceptual mapping of user's queries to medical subject headings.

    PubMed Central

    Zieman, Y. L.; Bleich, H. L.

    1997-01-01

    This paper describes a way to map users' queries to relevant Medical Subject Headings (MeSH terms) used by the National Library of Medicine to index the biomedical literature. The method, called SENSE (SEarch with New SEmantics), transforms words and phrases in the users' queries into primary conceptual components and compares these components with those of the MeSH vocabulary. Similar to the way in which most numbers can be split into numerical factors and expressed as their product--for example, 42 can be expressed as 2*21, 6*7, 3*14, 2*3*7,--so most medical concepts can be split into "semantic factors" and expressed as their juxtaposition. Note that if we split 42 into its primary factors, the breakdown is unique: 2*3*7. Similarly, when we split medical concepts into their "primary semantic factors" the breakdown is also unique. For example, the MeSH term 'renovascular hypertension' can be split morphologically into reno, vascular, hyper, and tension--morphemes that can then be translated into their primary semantic factors--kidney, blood vessel, high, and pressure. By "factoring" each MeSH term in this way, and by similarly factoring the user's query, we can match query to MeSH term by searching for combinations of common factors. Unlike UMLS and other methods that match at the level of words or phrases, SENSE matches at the level of concepts; in this way, a wide variety of words and phrases that have the same meaning produce the same match. Now used in PaperChase, the method is surprisingly powerful in matching users' queries to Medical Subject Headings. PMID:9357680
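
    The "semantic factoring" idea in this record can be illustrated with a toy sketch. The morpheme-to-factor table below uses only the renovascular hypertension example from the abstract; the table itself, the function names, and the use of set overlap (Jaccard similarity) to pick a match are assumptions for illustration and are not the actual SENSE implementation.

```python
from typing import FrozenSet, Iterable

# Hypothetical lookup from morphemes or plain words to primary semantic factors.
FACTOR_OF = {
    "reno": "kidney", "kidney": "kidney",
    "vascular": "blood vessel",
    "hyper": "high", "high": "high",
    "tension": "pressure", "pressure": "pressure",
}

# Hypothetical morphological splits of two index terms.
TERM_MORPHEMES = {
    "renovascular hypertension": ["reno", "vascular", "hyper", "tension"],
    "hypertension": ["hyper", "tension"],
}


def factors(pieces: Iterable[str]) -> FrozenSet[str]:
    """Translate morphemes or words into a set of semantic factors; unknown pieces are skipped."""
    return frozenset(FACTOR_OF[p] for p in pieces if p in FACTOR_OF)


def best_match(query_pieces: Iterable[str]) -> str:
    """Return the index term whose factor set overlaps most with the query's."""
    q = factors(query_pieces)

    def overlap(term: str) -> float:
        t = factors(TERM_MORPHEMES[term])
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return max(TERM_MORPHEMES, key=overlap)


specific = best_match(["kidney", "vascular", "high", "pressure"])
general = best_match(["high", "pressure"])
```

    Because matching happens on factors rather than surface words, the query words "kidney" and "high" reach the same term as the morphemes "reno" and "hyper" would.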

  19. Persistent Identifiers for Improved Accessibility for Linked Data Querying

    NASA Astrophysics Data System (ADS)

    Shepherd, A.; Chandler, C. L.; Arko, R. A.; Fils, D.; Jones, M. B.; Krisnadhi, A.; Mecum, B.

    2016-12-01

    The adoption of linked open data principles within the geosciences has increased the amount of accessible information available on the Web. However, this data is difficult to consume for those who are unfamiliar with Semantic Web technologies such as Web Ontology Language (OWL), Resource Description Framework (RDF) and SPARQL - the RDF query language. Consumers would need to understand the structure of the data and how to efficiently query it. Furthermore, understanding how to query doesn't solve problems of poor precision and recall in search results. For consumers unfamiliar with the data, full-text searches are most accessible, but not ideal as they arrest the advantages of data disambiguation and co-reference resolution efforts. Conversely, URI searches across linked data can deliver improved search results, but knowledge of these exact URIs may remain difficult to obtain. The increased adoption of Persistent Identifiers (PIDs) can lead to improved linked data querying by a wide variety of consumers. Because PIDs resolve to a single entity, they are an excellent data point for disambiguating content. At the same time, PIDs are more accessible and prominent than a single data provider's linked data URI. When present in linked open datasets, PIDs provide balance between the technical and social hurdles of linked data querying as evidenced by the NSF EarthCube GeoLink project. The GeoLink project, funded by NSF's EarthCube initiative, has brought together data repositories that include content from field expeditions, laboratory analyses, journal publications, conference presentations, theses/reports, and funding awards that span scientific studies from marine geology to marine ecosystems and biogeochemistry to paleoclimatology.

  20. Secure quantum private information retrieval using phase-encoded queries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Olejnik, Lukasz

    We propose a quantum solution to the classical private information retrieval (PIR) problem, which allows one to query a database in a private manner. The protocol offers privacy thresholds and allows the user to obtain information from a database in a way that offers the potential adversary, in this model the database owner, no possibility of deterministically establishing the query contents. This protocol may also be viewed as a solution to the symmetrically private information retrieval problem in that it can offer database security (inability for a querying user to steal its contents). Compared to classical solutions, the protocol offers substantial improvement in terms of communication complexity. In comparison with the recent quantum private queries [Phys. Rev. Lett. 100, 230502 (2008)] protocol, it is more efficient in terms of communication complexity and the number of rounds, while offering a clear privacy parameter. We discuss the security of the protocol and analyze its strengths and conclude that using this technique makes it challenging to obtain the unconditional (in the information-theoretic sense) privacy degree; nevertheless, in addition to being simple, the protocol still offers a privacy level. The oracle used in the protocol is inspired both by the classical computational PIR solutions as well as the Deutsch-Jozsa oracle.

  1. Ad-Hoc Queries over Document Collections - A Case Study

    NASA Astrophysics Data System (ADS)

    Löser, Alexander; Lutter, Steffen; Düssel, Patrick; Markl, Volker

    We discuss the novel problem of supporting analytical business intelligence queries over web-based textual content, e.g., BI-style reports based on hundreds of thousands of documents from an ad-hoc web search result. Neither conventional search engines nor conventional Business Intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. "Google Squared" and our system GOOLAP.info are examples of these kinds of systems. They execute information extraction methods over one or several document collections at query time and integrate extracted records into a common view or tabular structure. Frequent extraction and object resolution failures cause incomplete records that cannot be joined into a record answering the query. Our focus is the identification of join-reordering heuristics maximizing the size of complete records answering a structured query. With respect to given costs for document extraction we propose two novel join-operations: The multi-way CJ-operator joins records from multiple relationships extracted from a single document. The two-way join-operator DJ ensures data density by removing incomplete records from results. In a preliminary case study we observe that our join-reordering heuristics positively impact result size and record density and lower execution costs.

  2. Recycling of air pollution control residues from municipal solid waste incineration into lightweight aggregates.

    PubMed

    Quina, Margarida J; Bordado, João M; Quinta-Ferreira, Rosa M

    2014-02-01

    This work focuses on the assessment of technological properties and on the leaching behavior of lightweight aggregates (LWA) produced by incorporating different quantities of air pollution control (APC) residues from municipal solid waste (MSW) incineration. Currently this hazardous waste has been mostly landfilled after stabilization/solidification. The LWA were produced by pelletizing natural clay, APC residues as-received from incineration plant, or after a washing treatment, a small amount of oil and water. The pellets were fired in a laboratory chamber furnace over calcium carbonate. The main technological properties of the LWA were evaluated, mainly concerning morphology, bulk and particle densities, compressive strength, bloating index, water adsorption and porosity. Given that APC residues do not have expansive (bloating) properties, the incorporation into LWA is only possible in moderate quantities, such as 3% as received or 5% after pre-washing treatment. The leaching behavior of heavy metals from sintered LWA using water or acid solutions was investigated, and despite the low acid neutralization capacity of the synthetic aggregates, the released quantities were low over a wide pH range. In conclusion, after a washing pre-treatment and if the percentage of incorporation is low, these residues may be incorporated into LWA. However, the recycling of APC residues from MSW incineration into LWA did not reveal any technical advantage. Copyright © 2013 Elsevier Ltd. All rights reserved.

  3. STARS 2.0: 2nd-generation open-source archiving and query software

    NASA Astrophysics Data System (ADS)

    Winegar, Tom

    2008-07-01

    The Subaru Telescope is in the process of developing an open-source alternative to the 1st-generation software and databases (STARS 1) used for archiving and query. For STARS 2, we have chosen PHP and Python for scripting and MySQL as the database software. We have collected feedback from staff and observers, and used this feedback to significantly improve the design and functionality of our future archiving and query software. Archiving - We identified two weaknesses in 1st-generation STARS archiving software: a complex and inflexible table structure and uncoordinated system administration for our business model: taking pictures from the summit and archiving them in both Hawaii and Japan. We adopted a simplified and normalized table structure with passive keyword collection, and we are designing an archive-to-archive file transfer system that automatically reports real-time status and error conditions and permits error recovery. Query - We identified several weaknesses in 1st-generation STARS query software: inflexible query tools, poor sharing of calibration data, and no automatic file transfer mechanisms to observers. We are developing improved query tools and sharing of calibration data, and multi-protocol unassisted file transfer mechanisms for observers. In the process, we have redefined a 'query': from an invisible search result that can only transfer once in-house right now, with little status and error reporting and no error recovery - to a stored search result that can be monitored, transferred to different locations with multiple protocols, reporting status and error conditions and permitting recovery from errors.

  4. Adverse Reactions Associated With Cannabis Consumption as Evident From Search Engine Queries

    PubMed Central

    Lev-Ran, Shaul

    2017-01-01

    Background Cannabis is one of the most widely used psychoactive substances worldwide, but adverse drug reactions (ADRs) associated with its use are difficult to study because of its prohibited status in many countries. Objective Internet search engine queries have been used to investigate ADRs in pharmaceutical drugs. In this proof-of-concept study, we tested whether these queries can be used to detect the adverse reactions of cannabis use. Methods We analyzed anonymized queries from US-based users of Bing, a widely used search engine, made over a period of 6 months and compared the results with the prevalence of cannabis use as reported in the US National Survey on Drug Use in the Household (NSDUH) and with ADRs reported in the Food and Drug Administration’s Adverse Drug Reporting System. Predicted prevalence of cannabis use was estimated from the fraction of people making queries about cannabis, marijuana, and 121 additional synonyms. Predicted ADRs were estimated from queries containing layperson descriptions of symptoms from a list of 195 ICD-10 symptoms. Results Our results indicated that the predicted prevalence of cannabis use at the US census regional level reaches an R2 of .71 against NSDUH data. Queries for ADRs made by people who also searched for cannabis reveal many of the known adverse effects of cannabis (eg, cough and psychotic symptoms), as well as plausible unknown reactions (eg, pyrexia). Conclusions These results indicate that search engine queries can serve as an important tool for the study of adverse reactions of illicit drugs, which are difficult to study in other settings. PMID:29074469

  5. Web search queries can predict stock market volumes.

    PubMed

    Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar

    2012-01-01

    We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enables us to also investigate user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.

  6. Web Search Queries Can Predict Stock Market Volumes

    PubMed Central

    Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar

    2012-01-01

    We live in a computerized and networked society where many of our actions leave a digital trace and affect other people’s actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enables us to also investigate user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www. PMID:22829871

  7. SeqWare Query Engine: storing and searching sequence data in the cloud.

    PubMed

    O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F

    2010-12-21

    Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of

  8. SeqWare Query Engine: storing and searching sequence data in the cloud

    PubMed Central

    2010-01-01

    Background Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. Results In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data

  9. Generating PubMed Chemical Queries for Consumer Health Literature

    PubMed Central

    Loo, Jeffery; Chang, Hua Florence; Hochstein, Colette; Sun, Ying

    2005-01-01

    Two popular NLM resources that provide information for consumers about chemicals and their safety are the Household Products Database and Haz-Map. Search queries to PubMed via web links were generated from these databases. The query retrieves consumer health-oriented literature about adverse effects of chemicals. The retrieval was limited to a manageable set of 20 to 60 citations, achieved by successively applying increasing limits to the search until the desired number of references was reached. PMID:16779322

  10. Analysis of DNS Cache Effects on Query Distribution

    PubMed Central

    2013-01-01

    This paper studies the DNS cache effects that occur on query distribution at the CN top-level domain (TLD) server. We first filter out malformed DNS queries according to six categories to clean the polluted log data. A model for DNS resolution, more specifically DNS caching, is presented. We demonstrate the presence and magnitude of DNS cache effects and cache sharing effects on the request distribution through an analytic model and simulation. CN TLD log data results are provided and analyzed based on the cache model. The approximate TTL distribution for domain names is inferred quantitatively. PMID:24396313

  11. Analysis of DNS cache effects on query distribution.

    PubMed

    Wang, Zheng

    2013-01-01

    This paper studies the DNS cache effects that occur on query distribution at the CN top-level domain (TLD) server. We first filter out malformed DNS queries according to six categories to clean the polluted log data. A model for DNS resolution, more specifically DNS caching, is presented. We demonstrate the presence and magnitude of DNS cache effects and cache sharing effects on the request distribution through an analytic model and simulation. CN TLD log data results are provided and analyzed based on the cache model. The approximate TTL distribution for domain names is inferred quantitatively.
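
    The cache effect described in this abstract (a resolver cache hides repeat lookups from the TLD server until the TTL expires) can be shown with a small deterministic Python sketch. It is not the paper's model: the single shared cache, the one TTL for all names, and the names `upstream_queries` and `queries` are assumptions made for illustration.

```python
def upstream_queries(client_queries, ttl):
    """Return the client queries that miss the cache and reach the upstream server.

    client_queries: list of (timestamp_seconds, domain), sorted by timestamp.
    ttl: seconds a cached answer stays valid (one value for every name,
         which is a simplifying assumption).
    """
    expires = {}    # domain -> time at which its cached answer expires
    forwarded = []  # queries the upstream (e.g. TLD) server actually sees
    for t, name in client_queries:
        if expires.get(name, float("-inf")) <= t:
            # Cache miss (never seen, or entry expired): ask upstream, re-cache.
            forwarded.append((t, name))
            expires[name] = t + ttl
        # Otherwise the answer comes from the cache and upstream sees nothing.
    return forwarded


# Made-up client workload: six lookups for two names.
queries = [
    (0, "a.cn"), (10, "a.cn"), (20, "b.cn"),
    (70, "a.cn"), (75, "b.cn"), (90, "b.cn"),
]

seen = upstream_queries(queries, ttl=60)
print(f"{len(seen)} of {len(queries)} queries reach the upstream server")
```

    With a 60-second TTL, the lookups at t=10 and t=75 are absorbed by the cache, so the upstream log under-counts client demand. Inferring a TTL distribution from such a log amounts to reasoning backwards from the gaps between the queries that do get through.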

  12. Semantic Annotations and Querying of Web Data Sources

    NASA Astrophysics Data System (ADS)

    Hornung, Thomas; May, Wolfgang

    A large part of the Web, actually holding a significant portion of the useful information throughout the Web, consists of views on hidden databases, provided by numerous heterogeneous interfaces that are partly human-oriented via Web forms ("Deep Web"), and partly based on Web Services (only machine accessible). In this paper we present an approach for annotating these sources in a way that makes them citizens of the Semantic Web. We illustrate how queries can be stated in terms of the ontology, and how the annotations are used to select and access appropriate sources and to answer the queries.

  13. Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.

    PubMed

    Woo, Hyekyung; Cho, Youngtae; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

    2016-07-04

    As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using

  14. Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea

    PubMed Central

    Woo, Hyekyung; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

    2016-01-01

    Background As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. Objective In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Methods Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. Results In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). Conclusions These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In

  15. An index-based algorithm for fast on-line query processing of latent semantic analysis.

    PubMed

    Zhang, Mingxi; Li, Pohan; Wang, Wei

    2017-01-01

    Latent Semantic Analysis (LSA) is widely used for finding documents whose semantics are similar to a keyword query. Although LSA yields promising similarity results, the existing LSA algorithms involve many unnecessary operations in similarity computation and candidate checking during on-line query processing, which is expensive in terms of time and cannot respond efficiently to query requests, especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA, with the aim of efficiently searching for the documents similar to a given query. We rewrite the similarity equation of LSA in terms of an intermediate value called partial similarity, which is stored in a purpose-built index called the partial index. To reduce the search space, we give an approximate form of the similarity equation, and then develop an efficient algorithm for building the partial index, which skips the partial similarities lower than a given threshold θ. Based on the partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo-document vector, and the similarities between the query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponding to non-zero entries in the pseudo-document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments comparing ILSA with LSA demonstrate the efficiency and effectiveness of our proposed algorithm.
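
    The partial-index idea can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the ILSA algorithm itself: the abstract does not give the rewritten similarity equation, so the sketch assumes the query-document score is a plain dot product in the latent space, which makes the partial similarity of a (term, document) pair the dot product of their latent vectors. The toy vectors and the names `build_partial_index` and `score_query` are hypothetical.

```python
from collections import defaultdict


def build_partial_index(term_vecs, doc_vecs, theta):
    """Precompute partial similarities, skipping those lower than theta.

    term_vecs, doc_vecs: dicts mapping an id to its latent-space vector.
    Returns index[term] = list of (doc_id, partial_similarity).
    """
    index = {}
    for term, tv in term_vecs.items():
        postings = []
        for doc, dv in doc_vecs.items():
            partial = sum(a * b for a, b in zip(tv, dv))
            if partial >= theta:  # pruned entries are never stored
                postings.append((doc, partial))
        index[term] = postings
    return index


def score_query(index, query_weights):
    """Accumulate stored partial similarities over the query's non-zero terms."""
    scores = defaultdict(float)
    for term, weight in query_weights.items():
        for doc, partial in index.get(term, []):
            scores[doc] += weight * partial
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy two-dimensional latent space (made-up numbers).
term_vecs = {"query": (1.0, 0.0), "index": (0.8, 0.2), "tumor": (0.0, 1.0)}
doc_vecs = {"d1": (0.9, 0.1), "d2": (0.1, 0.9)}

index = build_partial_index(term_vecs, doc_vecs, theta=0.2)
ranked = score_query(index, {"query": 1.0, "index": 1.0})
print(ranked)
```

    In this toy run the pair ("query", "d2") has a partial similarity of 0.1, which falls under the threshold and is dropped, so the score for d2 is an approximation (0.26 rather than 0.36). That is the trade-off the abstract describes: a smaller index and fewer operations in exchange for ignoring contributions that matter little to the ranking.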

  16. Querying databases of trajectories of differential equations 2: Index functions

    NASA Technical Reports Server (NTRS)

    Grossman, Robert

    1991-01-01

    Suppose that a large number of parameterized trajectories $\gamma$ of a dynamical system evolving in $\mathbb{R}^N$ are stored in a database. Let $\eta \subset \mathbb{R}^N$ denote a parameterized path in Euclidean space, and let $\|\cdot\|$ denote a norm on the space of paths. Data structures and indices for trajectories are defined and algorithms are given to answer queries of the following forms: Query 1. Given a path $\eta$, determine whether $\eta$ occurs as a subtrajectory of any trajectory $\gamma$ from the database. If so, return the trajectory; otherwise, return null. Query 2. Given a path $\eta$, return the trajectory $\gamma$ from the database which minimizes the norm $\|\eta - \gamma\|$.
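
    To make the two query forms concrete, here is a brute-force Python baseline. It is explicitly not the index-function method the report develops: it assumes every path is a list of points sampled at the same parameter values, uses a discrete L2 norm as a stand-in for the unspecified path norm, and scans the whole database. All names (`path_distance`, `query1`, `query2`, `database`) are hypothetical.

```python
import math


def path_distance(eta, gamma):
    """Discrete L2 norm of eta - gamma for two equally sampled paths."""
    return math.sqrt(sum(
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in zip(eta, gamma)
    ))


def is_subtrajectory(eta, gamma, tol=1e-9):
    """True if eta matches some contiguous run of samples of gamma."""
    m = len(eta)
    return any(
        path_distance(eta, gamma[i:i + m]) <= tol
        for i in range(len(gamma) - m + 1)
    )


def query1(eta, database):
    """Query 1: return a trajectory containing eta, or None."""
    for gamma in database:
        if is_subtrajectory(eta, gamma):
            return gamma
    return None


def query2(eta, database):
    """Query 2: return the trajectory minimizing the distance to eta.

    Assumes eta and every stored trajectory have the same number of samples.
    """
    return min(database, key=lambda gamma: path_distance(eta, gamma))


# Two made-up trajectories in the plane.
parabola = [(0, 0), (1, 1), (2, 4), (3, 9)]
flat = [(0, 0), (1, 0), (2, 0), (3, 0)]
database = [parabola, flat]

print(query1([(1, 1), (2, 4)], database))
print(query2([(0, 0), (1, 0.1), (2, 0.1), (3, 0)], database))
```

    Both functions cost time proportional to the size of the database, which is the cost an index over trajectories is meant to avoid.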

  17. Predicting the hand, foot, and mouth disease incidence using search engine query data and climate variables: an ecological study in Guangdong, China.

    PubMed

    Du, Zhicheng; Xu, Lin; Zhang, Wangjian; Zhang, Dingmei; Yu, Shicheng; Hao, Yuantao

    2017-10-06

    Hand, foot, and mouth disease (HFMD) has caused a substantial burden in China, especially in Guangdong Province. Based on the enhanced surveillance system, we aimed to explore whether the addition of temperature and search engine query data improves the risk prediction of HFMD. Ecological study. Information on the confirmed cases of HFMD, climate parameters and search engine query logs was collected. A total of 1.36 million HFMD cases were identified from the surveillance system during 2011-2014. Analyses were conducted at aggregate level and no confidential information was involved. A seasonal autoregressive integrated moving average (ARIMA) model with external variables (ARIMAX) was used to predict the HFMD incidence from 2011 to 2014, taking into account temperature and search engine query data (Baidu Index, BDI). Statistics of goodness-of-fit and precision of prediction were used to compare models (1) based on surveillance data only, and with the addition of (2) temperature, (3) BDI, and (4) both temperature and BDI. A high correlation between HFMD incidence and BDI (r=0.794, p<0.001) or temperature (r=0.657, p<0.001) was observed using both time series plot and correlation matrix. A linear effect of BDI (without lag) and non-linear effect of temperature (1 week lag) on HFMD incidence were found in a distributed lag non-linear model. Compared with the model based on surveillance data only, the ARIMAX model including BDI reached the best goodness-of-fit with an Akaike information criterion (AIC) value of -345.332, whereas the model including both BDI and temperature had the most accurate prediction in terms of the mean absolute percentage error (MAPE) of 101.745%. An ARIMAX model incorporating search engine query data significantly improved the prediction of HFMD. Further studies are warranted to examine whether including search engine query data also improves the prediction of other infectious diseases in other settings.

  18. Semantic querying of relational data for clinical intelligence: a semantic web services-based approach

    PubMed Central

    2013-01-01

    Background Clinical Intelligence, as a research and engineering discipline, is dedicated to the development of tools for data analysis for the purposes of clinical research, surveillance, and effective health care management. Self-service ad hoc querying of clinical data is one desirable type of functionality. Since most of the data are currently stored in relational or similar form, ad hoc querying is problematic as it requires specialised technical skills and the knowledge of particular data schemas. Results A possible solution is semantic querying where the user formulates queries in terms of domain ontologies that are much easier to navigate and comprehend than data schemas. In this article, we are exploring the possibility of using SADI Semantic Web services for semantic querying of clinical data. We have developed a prototype of a semantic querying infrastructure for the surveillance of, and research on, hospital-acquired infections. Conclusions Our results suggest that SADI can support ad-hoc, self-service, semantic queries of relational data in a Clinical Intelligence context. The use of SADI compares favourably with approaches based on declarative semantic mappings from data schemas to ontologies, such as query rewriting and RDFizing by materialisation, because it can easily cope with situations when (i) some computation is required to turn relational data into RDF or OWL, e.g., to implement temporal reasoning, or (ii) integration with external data sources is necessary. PMID:23497556

  19. Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records.

    PubMed

    Luo, Yuan; Szolovits, Peter

    2016-01-01

    In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing on narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem into the interval query problem, for which optimal query/update time is in general logarithmic. We next perform a tight time complexity analysis on the basic interval tree query algorithm and show its nonoptimality when applied to a collection of 13 query types from Allen's interval algebra. We then study two closely related state-of-the-art interval query algorithms, proposed query reformulations, and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic time stabbing-max query time complexity and solves the stabbing-interval query tasks on all of Allen's relations in logarithmic time, attaining the theoretic lower bound. Updating time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions.
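
    As an illustration of the interval query formulation only, the following is a minimal sketch of a stabbing query over stand-off annotations using a textbook augmented search tree (each node stores the largest end offset in its subtree). It is not the algorithm proposed in the record above; the annotation tuples, labels, and function names are hypothetical, and spans are assumed to be half-open character offsets.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-off annotation: (start, end, label), span is [start, end).
Annotation = Tuple[int, int, str]

@dataclass
class Node:
    ann: Annotation
    max_end: int                      # largest end offset in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(anns: List[Annotation]) -> Optional[Node]:
    """Build a balanced tree from annotations already sorted by start offset."""
    if not anns:
        return None
    mid = len(anns) // 2
    left, right = build(anns[:mid]), build(anns[mid + 1:])
    max_end = max([anns[mid][1]] + [c.max_end for c in (left, right) if c])
    return Node(anns[mid], max_end, left, right)

def stab(node: Optional[Node], pos: int, out: List[Annotation]) -> List[Annotation]:
    """Collect every annotation whose span [start, end) covers offset pos."""
    if node is None or node.max_end <= pos:
        return out                    # nothing in this subtree ends after pos
    stab(node.left, pos, out)
    start, end, _ = node.ann
    if start <= pos < end:
        out.append(node.ann)
    if start <= pos:                  # right subtree starts at or after `start`
        stab(node.right, pos, out)
    return out

annotations = sorted([
    (0, 5, "Problem"), (3, 12, "Medication"), (10, 20, "Dose"), (15, 18, "Unit"),
])
root = build(annotations)
hits_4 = stab(root, 4, [])
hits_16 = stab(root, 16, [])
```

    Here hits_4 holds the annotations covering offset 4 and hits_16 those covering offset 16, each in order of start offset. The max_end field is what lets the search skip whole subtrees that end before the queried position.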

  20. Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records

    PubMed Central

    Luo, Yuan; Szolovits, Peter

    2016-01-01

    In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing on narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem into the interval query problem, for which optimal query/update time is in general logarithmic. We next perform a tight time complexity analysis on the basic interval tree query algorithm and show its nonoptimality when applied to a collection of 13 query types from Allen’s interval algebra. We then study two closely related state-of-the-art interval query algorithms, proposed query reformulations, and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic time stabbing-max query time complexity and solves the stabbing-interval query tasks on all of Allen’s relations in logarithmic time, attaining the theoretic lower bound. Updating time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions. PMID:27478379

  1. Automatic Query Formulations in Information Retrieval.

    ERIC Educational Resources Information Center

    Salton, G.; And Others

    1983-01-01

    Introduces methods designed to reduce role of search intermediaries by generating Boolean search formulations automatically using term frequency considerations from natural language statements provided by system patrons. Experimental results are supplied and methods are described for applying automatic query formulation process in practice.…

  2. Fuzzy queries above relational database

    NASA Astrophysics Data System (ADS)

    Smolka, Pavel; Bradac, Vladimir

    2017-11-01

    The aim of the theme is to introduce a possibility of fuzzy queries implemented in relational databases. The issue is described on a model which identifies the appropriate part of the problem domain for fuzzy approach. The model is demonstrated on a database of wines focused on searching in it. The construction of the database complies with the Law of the Czech Republic.

  3. An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring.

    PubMed

    Alirezaie, Marjan; Kiselev, Andrey; Längkvist, Martin; Klügl, Franziska; Loutfi, Amy

    2017-11-05

    This paper presents a framework in which satellite images are classified and augmented with additional semantic information to enable queries about what can be found on the map at a particular location, but also about paths that can be taken. This is achieved by a reasoning framework based on qualitative spatial reasoning that is able to find answers to high level queries that may vary on the current situation. This framework, called SemCityMap, provides the full pipeline from enriching the raw image data with rudimentary labels to the integration of a knowledge representation and reasoning methods to user interfaces for high level querying. To illustrate the utility of SemCityMap in a disaster scenario, we use an urban environment (central Stockholm) in combination with a flood simulation. We show that the system provides useful answers to high-level queries also with respect to the current flood status. Examples of such queries concern path planning for vehicles or retrieval of safe regions such as "find all regions close to schools and far from the flooded area". The particular advantage of our approach lies in the fact that ontological information and reasoning is explicitly integrated so that queries can be formulated in a natural way using concepts on appropriate level of abstraction, including additional constraints.

  4. An Ontology-Based Reasoning Framework for Querying Satellite Images for Disaster Monitoring

    PubMed Central

    Alirezaie, Marjan; Klügl, Franziska; Loutfi, Amy

    2017-01-01

    This paper presents a framework in which satellite images are classified and augmented with additional semantic information to enable queries about what can be found on the map at a particular location, but also about paths that can be taken. This is achieved by a reasoning framework based on qualitative spatial reasoning that is able to find answers to high level queries that may vary on the current situation. This framework called SemCityMap, provides the full pipeline from enriching the raw image data with rudimentary labels to the integration of a knowledge representation and reasoning methods to user interfaces for high level querying. To illustrate the utility of SemCityMap in a disaster scenario, we use an urban environment—central Stockholm—in combination with a flood simulation. We show that the system provides useful answers to high-level queries also with respect to the current flood status. Examples of such queries concern path planning for vehicles or retrieval of safe regions such as “find all regions close to schools and far from the flooded area”. The particular advantage of our approach lies in the fact that ontological information and reasoning is explicitly integrated so that queries can be formulated in a natural way using concepts on appropriate level of abstraction, including additional constraints. PMID:29113073

  5. FTree query construction for virtual screening: a statistical analysis.

    PubMed

    Gerlach, Christof; Broughton, Howard; Zaliani, Andrea

    2008-02-01

    FTrees (FT) is a known chemoinformatic tool able to condense molecular descriptions into a graph object and to search for actives in large databases using graph similarity. The query graph is classically derived from a known active molecule, or a set of actives, for which a similar compound has to be found. Recently, FT similarity has been extended to fragment space, widening its capabilities. If a user were able to build a knowledge-based FT query from information other than a known active structure, the similarity search could be combined with other, normally separate, fields like de-novo design or pharmacophore searches. With this aim in mind, we performed a comprehensive analysis of several databases in terms of FT description and provide a basic statistical analysis of the FT spaces so far at hand. Vendors' catalogue collections and MDDR as a source of potential or known "actives", respectively, have been used. With the results reported herein, a set of ranges, mean values and standard deviations for several query parameters are presented in order to set a reference guide for the users. Applications on how to use this information in FT query building are also provided, using a newly built 3D-pharmacophore from 57 5HT-1F agonists and a published one which was used for virtual screening for tRNA-guanine transglycosylase (TGT) inhibitors.

  6. Secure quantum private information retrieval using phase-encoded queries

    NASA Astrophysics Data System (ADS)

    Olejnik, Lukasz

    2011-08-01

    We propose a quantum solution to the classical private information retrieval (PIR) problem, which allows one to query a database in a private manner. The protocol offers privacy thresholds and allows the user to obtain information from a database in a way that offers the potential adversary, in this model the database owner, no possibility of deterministically establishing the query contents. This protocol may also be viewed as a solution to the symmetrically private information retrieval problem in that it can offer database security (inability for a querying user to steal its contents). Compared to classical solutions, the protocol offers substantial improvement in terms of communication complexity. In comparison with the recent quantum private queries [Phys. Rev. Lett. 100, 230502 (2008)] protocol, it is more efficient in terms of communication complexity and the number of rounds, while offering a clear privacy parameter. We discuss the security of the protocol and analyze its strengths and conclude that using this technique makes it challenging to obtain the unconditional (in the information-theoretic sense) privacy degree; nevertheless, in addition to being simple, the protocol still offers a privacy level. The oracle used in the protocol is inspired both by the classical computational PIR solutions as well as the Deutsch-Jozsa oracle.

  7. FTree query construction for virtual screening: a statistical analysis

    NASA Astrophysics Data System (ADS)

    Gerlach, Christof; Broughton, Howard; Zaliani, Andrea

    2008-02-01

    FTrees (FT) is a known chemoinformatic tool able to condense molecular descriptions into a graph object and to search for actives in large databases using graph similarity. The query graph is classically derived from a known active molecule, or a set of actives, for which a similar compound has to be found. Recently, FT similarity has been extended to fragment space, widening its capabilities. If a user were able to build a knowledge-based FT query from information other than a known active structure, the similarity search could be combined with other, normally separate, fields like de-novo design or pharmacophore searches. With this aim in mind, we performed a comprehensive analysis of several databases in terms of FT description and provide a basic statistical analysis of the FT spaces so far at hand. Vendors' catalogue collections and MDDR as a source of potential or known "actives", respectively, have been used. With the results reported herein, a set of ranges, mean values and standard deviations for several query parameters are presented in order to set a reference guide for the users. Applications on how to use this information in FT query building are also provided, using a newly built 3D-pharmacophore from 57 5HT-1F agonists and a published one which was used for virtual screening for tRNA-guanine transglycosylase (TGT) inhibitors.

  8. Adverse Reactions Associated With Cannabis Consumption as Evident From Search Engine Queries.

    PubMed

    Yom-Tov, Elad; Lev-Ran, Shaul

    2017-10-26

    Cannabis is one of the most widely used psychoactive substances worldwide, but adverse drug reactions (ADRs) associated with its use are difficult to study because of its prohibited status in many countries. Internet search engine queries have been used to investigate ADRs in pharmaceutical drugs. In this proof-of-concept study, we tested whether these queries can be used to detect the adverse reactions of cannabis use. We analyzed anonymized queries from US-based users of Bing, a widely used search engine, made over a period of 6 months and compared the results with the prevalence of cannabis use as reported in the US National Survey on Drug Use in the Household (NSDUH) and with ADRs reported in the Food and Drug Administration's Adverse Drug Reporting System. Predicted prevalence of cannabis use was estimated from the fraction of people making queries about cannabis, marijuana, and 121 additional synonyms. Predicted ADRs were estimated from queries containing layperson descriptions of a list of 195 ICD-10 symptoms. Our results indicated that the predicted prevalence of cannabis use at the US census regional level reaches an R² of .71 with NSDUH data. Queries for ADRs made by people who also searched for cannabis reveal many of the known adverse effects of cannabis (eg, cough and psychotic symptoms), as well as plausible unknown reactions (eg, pyrexia). These results indicate that search engine queries can serve as an important tool for the study of adverse reactions of illicit drugs, which are difficult to study in other settings. ©Elad Yom-Tov, Shaul Lev-Ran. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 26.10.2017.

  9. Visually defining and querying consistent multi-granular clinical temporal abstractions.

    PubMed

    Combi, Carlo; Oliboni, Barbara

    2012-02-01

    The main goal of this work is to propose a framework for the visual specification and query of consistent multi-granular clinical temporal abstractions. We focus on the issue of querying patient clinical information by visually defining and composing temporal abstractions, i.e., high level patterns derived from several time-stamped raw data. In particular, we focus on the visual specification of consistent temporal abstractions with different granularities and on the visual composition of different temporal abstractions for querying clinical databases. Temporal abstractions on clinical data provide a concise and high-level description of temporal raw data, and a suitable way to support decision making. Granularities define partitions on the time line and allow one to represent time and, thus, temporal clinical information at different levels of detail, according to the requirements coming from the represented clinical domain. The visual representation of temporal information has been considered for several years in clinical domains. Proposed visualization techniques must be easy and quick to understand, and could benefit from visual metaphors that do not lead to ambiguous interpretations. Recently, physical metaphors such as strips, springs, weights, and wires have been proposed and evaluated on clinical users for the specification of temporal clinical abstractions. Visual approaches to boolean queries have been considered in recent years and confirmed that the visual support to the specification of complex boolean queries is both an important and difficult research topic. We propose and describe a visual language for the definition of temporal abstractions based on a set of intuitive metaphors (striped wall, plastered wall, brick wall), allowing the clinician to use different granularities. A new algorithm, underlying the visual language, allows the physician to specify only consistent abstractions, i.e., abstractions not containing contradictory conditions on

  10. Anytime query-tuned kernel machine classifiers via Cholesky factorization

    NASA Technical Reports Server (NTRS)

    DeCoste, D.

    2002-01-01

    We recently demonstrated 2 to 64-fold query-time speedups of Support Vector Machine and Kernel Fisher classifiers via a new computational geometry method for anytime output bounds (DeCoste, 2002). This new paper refines our approach in two key ways. First, we introduce a simple linear algebra formulation based on Cholesky factorization, yielding simpler equations and lower computational overhead. Second, this new formulation suggests new methods for achieving additional speedups, including tuning on query samples. We demonstrate effectiveness on benchmark datasets.

  11. Using Common Table Expressions to Build a Scalable Boolean Query Generator for Clinical Data Warehouses

    PubMed Central

    Harris, Daniel R.; Henderson, Darren W.; Kavuluru, Ramakanth; Stromberg, Arnold J.; Johnson, Todd R.

    2015-01-01

    We present a custom, Boolean query generator utilizing common-table expressions (CTEs) that is capable of scaling with big datasets. The generator maps user-defined Boolean queries, such as those interactively created in clinical-research and general-purpose healthcare tools, into SQL. We demonstrate the effectiveness of this generator by integrating our work into the Informatics for Integrating Biology and the Bedside (i2b2) query tool and show that it is capable of scaling. Our custom generator replaces and outperforms the default query generator found within the Clinical Research Chart (CRC) cell of i2b2. In our experiments, sixteen different types of i2b2 queries were identified by varying four constraints: date, frequency, exclusion criteria, and whether selected concepts occurred in the same encounter. We generated non-trivial, random Boolean queries based on these 16 types; the corresponding SQL queries produced by both generators were compared by execution times. The CTE-based solution significantly outperformed the default query generator and provided a much more consistent response time across all query types (M=2.03, SD=6.64 vs. M=75.82, SD=238.88 seconds). Without costly hardware upgrades, we provide a scalable solution based on CTEs with very promising empirical results centered on performance gains. The evaluation methodology used for this provides a means of profiling clinical data warehouse performance. PMID:25192572
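
    To make the idea of mapping a Boolean query onto common-table expressions concrete, here is a minimal sketch, assuming a hypothetical single-table schema observation(patient_id, concept) and an in-memory SQLite database. It is not the i2b2 generator described in the record above: it handles only "all of these concepts, none of those" and ignores the date, frequency, and same-encounter constraints.

```python
import sqlite3

def build_query(include, exclude):
    """Return (sql, params) selecting patients that have every concept in
    `include` and no concept in `exclude`. `include` must be non-empty."""
    concepts = list(include) + list(exclude)
    # One CTE per concept; each yields the set of patients with that concept.
    ctes = ", ".join(
        f"c{i} AS (SELECT DISTINCT patient_id FROM observation WHERE concept = ?)"
        for i in range(len(concepts))
    )
    parts = [f"SELECT patient_id FROM c{i}" for i in range(len(concepts))]
    # SQLite evaluates compound operators left to right:
    # (c0 INTERSECT c1 ...) EXCEPT cN ...
    body = " INTERSECT ".join(parts[:len(include)])
    for part in parts[len(include):]:
        body += f" EXCEPT {part}"
    return f"WITH {ctes} {body} ORDER BY patient_id", concepts

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation (patient_id INTEGER, concept TEXT)")
conn.executemany(
    "INSERT INTO observation VALUES (?, ?)",
    [(1, "diabetes"), (1, "metformin"), (2, "diabetes"),
     (3, "diabetes"), (3, "metformin"), (3, "insulin")],
)

sql, params = build_query(include=["diabetes", "metformin"], exclude=["insulin"])
rows = [r[0] for r in conn.execute(sql, params)]
```

    In this toy data set only patient 1 has both included concepts and lacks the excluded one. Concept values are passed as bound parameters rather than interpolated into the SQL text, so only the CTE names are generated by string formatting.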

  12. Query-based biclustering of gene expression data using Probabilistic Relational Models.

    PubMed

    Zhao, Hui; Cloots, Lore; Van den Bulcke, Tim; Wu, Yan; De Smet, Riet; Storms, Valerie; Meysman, Pieter; Engelen, Kristof; Marchal, Kathleen

    2011-02-15

    With the availability of large scale expression compendia it is now possible to view one's own findings in the light of what is already available and retrieve genes with an expression profile similar to a set of genes of interest (i.e., a query or seed set) for a subset of conditions. To that end, a query-based strategy is needed that maximally exploits the coexpression behaviour of the seed genes to guide the biclustering, but that at the same time is robust against the presence of noisy genes in the seed set as seed genes are often assumed, but not guaranteed, to be coexpressed in the queried compendium. Therefore, we developed ProBic, a query-based biclustering strategy based on Probabilistic Relational Models (PRMs) that exploits the use of prior distributions to extract the information contained within the seed set. We applied ProBic on a large scale Escherichia coli compendium to extend partially described regulons with potentially novel members. We compared ProBic's performance with previously published query-based biclustering algorithms, namely ISA and QDB, from the perspective of bicluster expression quality, robustness of the outcome against noisy seed sets and biological relevance. This comparison shows that ProBic is able to retrieve biologically relevant, high quality biclusters that retain their seed genes and that it is particularly strong in handling noisy seeds. ProBic is a query-based biclustering algorithm developed in a flexible framework, designed to detect biologically relevant, high quality biclusters that retain relevant seed genes even in the presence of noise or when dealing with low quality seed sets.

  13. Bio-TDS: bioscience query tool discovery system.

    PubMed

    Gnimpieba, Etienne Z; VanDiermen, Menno S; Gustafson, Shayla M; Conn, Bill; Lushbough, Carol M

    2017-01-04

    Bioinformatics and computational biology play a critical role in bioscience and biomedical research. As researchers design their experimental projects, one major challenge is to find the most relevant bioinformatics toolkits that will lead to new knowledge discovery from their data. The Bio-TDS (Bioscience Query Tool Discovery Systems, http://biotds.org/) has been developed to assist researchers in retrieving the most applicable analytic tools by allowing them to formulate their questions as free text. The Bio-TDS is a flexible retrieval system that affords users from multiple bioscience domains (e.g. genomic, proteomic, bio-imaging) the ability to query over 12 000 analytic tool descriptions integrated from well-established, community repositories. One of the primary components of the Bio-TDS is the ontology and natural language processing workflow for annotation, curation, query processing, and evaluation. The Bio-TDS's scientific impact was evaluated using sample questions posed by researchers retrieved from Biostars, a site focusing on biological data analysis. The Bio-TDS was compared to five similar bioscience analytic tool retrieval systems, with the Bio-TDS outperforming the others in terms of relevance and completeness. The Bio-TDS offers researchers the capacity to associate their bioscience question with the most relevant computational toolsets required for the data analysis in their knowledge discovery process. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. Regular paths in SparQL: querying the NCI Thesaurus.

    PubMed

    Detwiler, Landon T; Suciu, Dan; Brinkley, James F

    2008-11-06

    OWL, the Web Ontology Language, provides syntax and semantics for representing knowledge for the semantic web. Many of the constructs of OWL have a basis in the field of description logics. While the formal underpinnings of description logics have led to a highly computable language, it has come at a cognitive cost. OWL ontologies are often unintuitive to readers lacking a strong logic background. In this work we describe GLEEN, a regular path expression library, which extends the RDF query language SparQL to support complex path expressions over OWL and other RDF-based ontologies. We illustrate the utility of GLEEN by showing how it can be used in a query-based approach to defining simpler, more intuitive views of OWL ontologies. In particular we show how relatively simple GLEEN-enhanced SparQL queries can create views of the OWL version of the NCI Thesaurus that match the views generated by the web-based NCI browser.

  15. Complex analyses on clinical information systems using restricted natural language querying to resolve time-event dependencies.

    PubMed

    Safari, Leila; Patrick, Jon D

    2018-06-01

    This paper reports on a generic framework to provide clinicians with the ability to conduct complex analyses on elaborate research topics using cascaded queries to resolve internal time-event dependencies in the research questions, as an extension to the proposed Clinical Data Analytics Language (CliniDAL). A cascaded query model is proposed to resolve internal time-event dependencies in the queries which can have up to five levels of criteria starting with a query to define subjects to be admitted into a study, followed by a query to define the time span of the experiment. Three more cascaded queries can be required to define control groups, control variables and output variables which all together simulate a real scientific experiment. According to the complexity of the research questions, the cascaded query model has the flexibility of merging some lower level queries for simple research questions or adding a nested query to each level to compose more complex queries. Three different scenarios (one of them contains two studies) are described and used for evaluation of the proposed solution. CliniDAL's complex analyses solution enables answering complex queries with time-event dependencies at most in a few hours which manually would take many days. An evaluation of results of the research studies based on the comparison between CliniDAL and SQL solutions reveals high usability and efficiency of CliniDAL's solution. Copyright © 2018 Elsevier Inc. All rights reserved.

  16. Clean Air Markets - Allowances Query Wizard

    EPA Pesticide Factsheets

    The Allowances Query Wizard is part of a suite of Clean Air Markets-related tools that are accessible at http://camddataandmaps.epa.gov/gdm/index.cfm. The Allowances module allows the user to view allowance data associated with EPA's emissions trading programs. Allowance data can be specified and organized using the Allowance Query Wizard to find allowances information associated with specific accounts, companies, transactions, programs, facilities, representatives, allowance type, or by date. Quick Reports and Prepackaged Datasets are also available for data that are commonly requested. EPA's Clean Air Markets Division (CAMD) includes several market-based regulatory programs designed to improve air quality and ecosystems. The most well-known of these programs are EPA's Acid Rain Program and the NOx Programs, which reduce emissions of sulfur dioxide (SO2) and nitrogen oxides (NOx), compounds that adversely affect air quality, the environment, and public health. CAMD also plays an integral role in the development and implementation of the Clean Air Interstate Rule (CAIR).

  17. An index-based algorithm for fast on-line query processing of latent semantic analysis

    PubMed Central

    Li, Pohan; Wang, Wei

    2017-01-01

    Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantics are similar to a keyword query. Although LSA yields promising similarity results, the existing LSA algorithms involve lots of unnecessary operations in similarity computation and candidate check during on-line query processing, which is expensive in terms of time cost and cannot efficiently respond to the query request, especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA towards efficiently searching the similar documents to a given query. We rewrite the similarity equation of LSA combined with an intermediate value called partial similarity that is stored in a designed index called partial index. For reducing the searching space, we give an approximate form of similarity equation, and then develop an efficient algorithm for building partial index, which skips the partial similarities lower than a given threshold θ. Based on partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes that correspond to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm. PMID:28520747
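
    The general pattern of a thresholded index with score accumulation can be sketched as follows. This is an assumption-laden illustration rather than the ILSA algorithm itself: the partial similarity values are made-up numbers supplied directly instead of being derived from an LSA decomposition, and the names partial, build_partial_index, and score are hypothetical.

```python
from collections import defaultdict

def build_partial_index(partial, theta):
    """Keep, per term, only the (doc, value) pairs with |value| >= theta."""
    index = {}
    for term, row in partial.items():
        index[term] = [(doc, v) for doc, v in row.items() if abs(v) >= theta]
    return index

def score(index, query):
    """Accumulate weight * partial value over the query's non-zero terms."""
    scores = defaultdict(float)
    for term, weight in query.items():
        for doc, v in index.get(term, []):
            scores[doc] += weight * v
    return dict(scores)

# Hypothetical precomputed partial similarities: term -> {document: value}.
partial = {
    "gene": {"d1": 0.5, "d2": 0.25, "d3": 0.0625},
    "query": {"d1": 0.125, "d2": 0.5},
}
index = build_partial_index(partial, theta=0.1)

# Query given as a sparse vector of non-zero term weights.
scores = score(index, {"gene": 1.0, "query": 2.0})
```

    With theta set to 0.1, the small entry for d3 is never stored, so d3 is not considered as a candidate at query time. Raising theta shrinks the index further at the cost of a coarser approximation of the full similarity score.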

  18. Effects of self-aggregation on the hydration of an amphiphilic antidepressant drug in different aqueous media

    NASA Astrophysics Data System (ADS)

    Taboada, Pablo; Gutiérrez-Pichel, Manuel; Mosquera, Víctor

    2004-03-01

    Apparent molal volumes and adiabatic compressibilities of aqueous solutions of the amphiphilic antidepressant drug clomipramine hydrochloride have been determined from density and ultrasound velocity measurements in the temperature range 288.15-313.15 K in buffered aqueous solution of pH 3.0 and 5.5. Critical concentrations of aggregation of this drug were obtained from inflections on the plots of the sound velocity against drug concentration. Apparent molal adiabatic compressibilities of the aggregates formed by the drug, calculated by combining the ultrasound velocity and density data, were typical of those for a stacked aggregate. From the temperature dependence of the critical concentration and using the mass action model combined with the Phillips definition of the critical concentration the thermodynamic standard quantities: free Gibbs energy, enthalpy and entropy of aggregate formation were calculated. The critical concentration and energy involved in the aggregation process of this drug have been also evaluated experimentally using isothermal titration calorimetry at 298.15 K. The solvent-drug interactions have been discussed from compressibility and calorimetry data.

  19. Generating and Executing Complex Natural Language Queries across Linked Data.

    PubMed

    Hamon, Thierry; Mougin, Fleur; Grabar, Natalia

    2015-01-01

    With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, SPARQL language, and interfaces in Natural Language question-answering provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions into SPARQL queries. We use Natural Language Processing tools, semantic resources, and the RDF triples description. The method is designed on 50 questions over 3 biomedical knowledge bases, and evaluated on 27 questions. It achieves 0.78 F-measure on the test set. The method for translating natural language questions into SPARQL queries is implemented as a Perl module available at http://search.cpan.org/~thhamon/RDF-NLP-SPARQLQuery.

  20. Searching for Images: The Analysis of Users' Queries for Image Retrieval in American History.

    ERIC Educational Resources Information Center

    Choi, Youngok; Rasmussen, Edie M.

    2003-01-01

    Studied users' queries for visual information in American history to identify the image attributes important for retrieval and the characteristics of users' queries for digital images, based on queries from 38 faculty and graduate students. Results of pre- and post-test questionnaires and interviews suggest principal categories of search terms.…

  1. The Localized Discovery and Recovery for Query Packet Losses in Wireless Sensor Networks with Distributed Detector Clusters

    PubMed Central

    Teng, Rui; Leibnitz, Kenji; Miura, Ryu

    2013-01-01

    An essential application of wireless sensor networks is to successfully respond to user queries. Query packet losses occur in the query dissemination due to wireless communication problems such as interference, multipath fading, packet collisions, etc. The losses of query messages at sensor nodes result in the failure of sensor nodes reporting the requested data. Hence, the reliable and successful dissemination of query messages to sensor nodes is a non-trivial problem. The target of this paper is to enable highly successful query delivery to sensor nodes by localized and energy-efficient discovery, and recovery of query losses. We adopt local and collective cooperation among sensor nodes to increase the success rate of distributed discoveries and recoveries. To enable the scalability in the operations of discoveries and recoveries, we employ a distributed name resolution mechanism at each sensor node to allow sensor nodes to self-detect the correlated queries and query losses, and then efficiently locally respond to the query losses. We prove that the collective discovery of query losses has a high impact on the success of query dissemination and reveal that scalability can be achieved by using the proposed approach. We further study the novel features of the cooperation and competition in the collective recovery at PHY and MAC layers, and show that the appropriate number of detectors can achieve optimal successful recovery rate. We evaluate the proposed approach with both mathematical analyses and computer simulations. The proposed approach enables a high rate of successful delivery of query messages and it results in short route lengths to recover from query losses. The proposed approach is scalable and operates in a fully distributed manner. PMID:23748172

  2. Superfund Chemical Data Matrix (SCDM) Query - Popup

    EPA Pesticide Factsheets

    This site allows you to easily query the Superfund Chemical Data Matrix (SCDM) and generate a list of the corresponding Hazardous Ranking System (HRS) factor values, benchmarks, and data elements that you need.

  3. Queries for Bias Testing

    NASA Technical Reports Server (NTRS)

    Gordon, Diana F.

    1992-01-01

    Selecting a good bias prior to concept learning can be difficult. Therefore, dynamic bias adjustment is becoming increasingly popular. Current dynamic bias adjustment systems, however, are limited in their ability to identify erroneous assumptions about the relationship between the bias and the target concept. Without proper diagnosis, it is difficult to identify and then remedy faulty assumptions. We have developed an approach that makes these assumptions explicit, actively tests them with queries to an oracle, and adjusts the bias based on the test results.

  4. PropBase Query Layer: a single portal to UK subsurface physical property databases

    NASA Astrophysics Data System (ADS)

    Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham

    2013-04-01

    Until recently, the delivery of geological information for industry and public was achieved by geological mapping. Now pervasively available computers mean that 3D geological models can deliver realistic representations of the geometric location of geological units, represented as shells or volumes. The next phase of this process is to populate these with physical properties data that describe subsurface heterogeneity and its associated uncertainty. Achieving this requires capture and serving of physical, hydrological and other property information from diverse sources to populate these models. The British Geological Survey (BGS) holds large volumes of subsurface property data, derived both from their own research data collection and also other, often commercially derived data sources. This can be voxelated to incorporate this data into the models to demonstrate property variation within the subsurface geometry. All property data held by BGS has for many years been stored in relational databases to ensure their long-term continuity. However these have, by necessity, complex structures; each database contains positional reference data and model information, and also metadata such as sample identification information and attributes that define the source and processing. Whilst this is critical to assessing these analyses, it also hugely complicates the understanding of variability of the property under assessment and requires multiple queries to study related datasets making extracting physical properties from these databases difficult. Therefore the PropBase Query Layer has been created to allow simplified aggregation and extraction of all related data and its presentation of complex data in simple, mostly denormalized, tables which combine information from multiple databases into a single system. The structure from each relational database is denormalized in a generalised structure, so that each dataset can be viewed together in a common format using a simple

  5. Towards computational improvement of DNA database indexing and short DNA query searching.

    PubMed

    Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska

    2014-09-03

    In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach, identifying all query hits in the database, without having to examine all entries in the indexed data structure, limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported, if the database is searched against a query shorter than [Formula: see text] nucleotides, such that [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution of this drawback is also presented.
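
    The general idea of indexing a DNA database with a mapping function and then looking up exact query hits can be sketched as follows. This is a minimal illustration, not the authors' indexing equation: the 2-bit base-4 encoding, the use of overlapping words, and the names `encode`, `build_index` and `find_exact_hits` are all assumptions made for the example.

```python
from collections import defaultdict

# 2-bit code per nucleotide; this particular mapping is an assumption for illustration.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(word):
    """Map a DNA word to an integer by reading it as a base-4 number."""
    value = 0
    for base in word:
        value = value * 4 + CODE[base]
    return value


def build_index(database, k):
    """Index every overlapping k-length word of the database by its integer code."""
    index = defaultdict(list)
    for start in range(len(database) - k + 1):
        index[encode(database[start:start + k])].append(start)
    return index


def find_exact_hits(database, index, query, k):
    """Return start positions of exact query hits; the query must be at least k long."""
    if len(query) < k:
        raise ValueError("query shorter than the indexed word length")
    # Only positions whose first k bases match the query prefix are examined.
    candidates = index.get(encode(query[:k]), [])
    return [pos for pos in candidates if database[pos:pos + len(query)] == query]


database = "ACGTACGTTACG"
index = build_index(database, 3)
print(find_exact_hits(database, index, "ACG", 3))   # [0, 4, 9]
print(find_exact_hits(database, index, "TACG", 3))  # [3, 8]
```

    Only the index bucket for the query prefix is scanned, which is the sense in which a lookup avoids examining every entry of the indexed structure.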

  6. Generalized query-based active learning to identify differentially methylated regions in DNA.

    PubMed

    Haque, Md Muksitul; Holder, Lawrence B; Skinner, Michael K; Cook, Diane J

    2013-01-01

    Active learning is a supervised learning technique that reduces the number of examples required for building a successful classifier, because it can choose the data it learns from. This technique holds promise for many biological domains in which classified examples are expensive and time-consuming to obtain. Most traditional active learning methods ask very specific queries to the Oracle (e.g., a human expert) to label an unlabeled example. The example may consist of numerous features, many of which are irrelevant. Removing such features will create a shorter query with only relevant features, and it will be easier for the Oracle to answer. We propose a generalized query-based active learning (GQAL) approach that constructs generalized queries based on multiple instances. By constructing appropriately generalized queries, we can achieve higher accuracy compared to traditional active learning methods. We apply our active learning method to find differentially DNA methylated regions (DMRs). DMRs are DNA locations in the genome that are known to be involved in tissue differentiation, epigenetic regulation, and disease. We also apply our method on 13 other data sets and show that our method is better than another popular active learning technique.

  7. Application of Electron Backscatter Diffraction to evaluate the ASR risk of concrete aggregates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rößler, C., E-mail: christiane.roessler@uni-weimar.de; Möser, B.; Giebson, C.

    Alkali-Silica Reaction (ASR) is a frequent cause of reduced concrete durability. Eliminating the application of alkali reactive aggregates would reduce the quantity of ASR concrete deterioration in the field. This study introduces an Electron Backscatter Diffraction (EBSD) technique to distinguish the ASR risk of slow-late reacting aggregates by measuring microstructural properties of quartz. Quantifying the amount of quartz grain boundaries and the associated misorientation of grains can thereby be used to differentiate microstructures bearing an ASR risk. It is also shown that dissolution of quartz in high pH environments occurs along quartz grain and subgrain boundaries. Results of EBSD analysis are compared with ASR performance testing on concrete prisms and optical light microscopy characterization of quartz microstructure. EBSD opens new possibilities to quantitatively characterize microstructure of quartz in concrete aggregates with respect to ASR. This leads to a better understanding of the actual cause of ASR.

  8. An alternative database approach for management of SNOMED CT and improved patient data queries.

    PubMed

    Campbell, W Scott; Pedersen, Jay; McClay, James C; Rao, Praveen; Bastola, Dhundy; Campbell, James R

    2015-10-01

    SNOMED CT is the international lingua franca of terminologies for human health. Based in Description Logics (DL), the terminology enables data queries that incorporate inferences between data elements, as well as those relationships that are explicitly stated. However, the ontologic and polyhierarchical nature of the SNOMED CT concept model makes it difficult to implement in its entirety within electronic health record systems that largely employ object oriented or relational database architectures. The result is a reduction of data richness, limitations of query capability and increased systems overhead. The hypothesis of this research was that a graph database (graph DB) architecture using SNOMED CT as the basis for the data model and subsequently modeling patient data upon the semantic core of SNOMED CT could exploit the full value of the terminology to enrich and support advanced data querying capability of patient data sets. The hypothesis was tested by instantiating a graph DB with the fully classified SNOMED CT concept model. The graph DB instance was tested for integrity by calculating the transitive closure table for the SNOMED CT hierarchy and comparing the results with transitive closure tables created using current, validated methods. The graph DB was then populated with 461,171 anonymized patient record fragments and over 2.1 million associated SNOMED CT clinical findings. Queries, including concept negation and disjunction, were then run against the graph database and an enterprise Oracle relational database (RDBMS) of the same patient data sets. The graph DB was then populated with laboratory data encoded using LOINC, as well as medication data encoded with RxNorm, and complex queries were performed using LOINC, RxNorm and SNOMED CT to identify uniquely described patient populations. A graph database instance was successfully created for two international releases of SNOMED CT and two US SNOMED CT editions. Transitive closure tables and descriptive

  9. On Relevance Weight Estimation and Query Expansion.

    ERIC Educational Resources Information Center

    Robertson, S. E.

    1986-01-01

    A Bayesian argument is used to suggest modifications to the Robertson and Jones relevance weighting formula to accommodate the addition to the query of terms taken from the relevant documents identified during the search. (Author)
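
    For orientation, the relevance weight usually cited as the Robertson and Jones (Robertson/Sparck Jones) formula is shown below in its common 0.5-corrected form. This is the standard textbook form, given as background only; the modified formula proposed in this record is not stated in the abstract and is not reproduced here. The symbols are the conventional ones: $N$ is the number of documents in the collection, $n$ the number containing the term, $R$ the number of known relevant documents, and $r$ the number of relevant documents containing the term.

```latex
w = \log \frac{(r + 0.5)\,(N - n - R + r + 0.5)}{(n - r + 0.5)\,(R - r + 0.5)}
```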

  10. miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST

    PubMed Central

    Kim, You Jung; Boyd, Andrew; Athey, Brian D.; Patel, Jignesh M.

    2005-01-01

    A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Existing tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the number of sequences in the query set is large. In this paper, we present a new algorithm, called miBLAST, that evaluates such batch workloads efficiently. At the core, miBLAST employs a q-gram filtering and an index join for efficiently detecting similarity between the query sequences and database sequences. This set-oriented technique, which indexes both the query and the database sets, results in substantial performance improvements over existing methods. Our results show that miBLAST is significantly faster than BLAST in many cases. For example, miBLAST aligned 247 965 oligonucleotide sequences in the Affymetrix probe set against the Human UniGene in 1.26 days, compared with 27.27 days with BLAST (an improvement by a factor of 22). The relative performance of miBLAST increases for larger word sizes; however, it decreases for longer queries. miBLAST employs the familiar BLAST statistical model and output format, guaranteeing the same accuracy as BLAST and facilitating a seamless transition for existing BLAST users. PMID:16061938
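
    The set-oriented idea of indexing both the query set and the database set and then joining the two indexes on shared q-grams can be sketched as below. This is a simplified illustration under stated assumptions, not miBLAST itself: the function names, the `min_shared` threshold and the toy sequences are hypothetical, and the alignment step that would follow the filter is omitted.

```python
from collections import defaultdict


def qgram_index(sequences, q):
    """Map each q-gram to the set of sequence ids that contain it."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for i in range(len(seq) - q + 1):
            index[seq[i:i + q]].add(seq_id)
    return index


def candidate_pairs(queries, database, q, min_shared):
    """Join the two q-gram indexes; keep pairs sharing at least min_shared distinct q-grams."""
    query_index = qgram_index(queries, q)
    db_index = qgram_index(database, q)
    shared = defaultdict(int)
    for gram, query_ids in query_index.items():
        # The join: a q-gram present in both indexes links every query/database pair holding it.
        for db_id in db_index.get(gram, ()):
            for query_id in query_ids:
                shared[(query_id, db_id)] += 1
    return {pair for pair, count in shared.items() if count >= min_shared}


queries = {"q1": "ACGTAC", "q2": "TTTTTT"}
database = {"d1": "GGACGTACGG", "d2": "CCCCCCCC"}
pairs = candidate_pairs(queries, database, q=4, min_shared=2)
print(pairs)  # {('q1', 'd1')}
```

    Because the whole batch is processed in one pass over the joined indexes, each database q-gram is visited once per batch rather than once per query, which is the intuition behind evaluating queries as a set.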

  11. Cytotoxicity Assessment of Some Carbon Nanotubes and Related Carbon Nanoparticle Aggregates and the Implications for Anthropogenic Carbon Nanotube Aggregates in the Environment

    PubMed Central

    Murr, L. E.; Garza, K. M.; Soto, K. F.; Carrasco, A.; Powell, T. G.; Ramirez, D. A.; Guerrero, P. A.; Lopez, D. A.; Venzor, J.

    2005-01-01

    Nanotechnology and nanomaterials have become the new frontier world-wide over the past few years and prospects for the production and novel uses of large quantities of carbon nanotubes in particular are becoming an increasing reality. Correspondingly, the potential health risks for these and other nanoparticulate materials have been of considerable concern. Toxicological studies, while sparse, have been concerned with virtually uncharacterized, single wall carbon nanotubes, and the conclusions have been conflicting and uncertain. In this research we performed viability assays on a murine lung macrophage cell line to assess the comparative cytotoxicity of commercial, single wall carbon nanotubes (ropes) and two different multiwall carbon nanotube samples; utilizing chrysotile asbestos nanotubes and black carbon nanoaggregates as toxicity standards. These nanotube materials were completely characterized by transmission electron microscopy and observed to be aggregates ranging from 1 to 2 μm in mean diameter, with closed ends. The cytotoxicity data indicated a strong concentration relationship and toxicity for all the carbon nanotube materials relative to the asbestos nanotubes and black carbon. A commercial multiwall carbon nanotube aggregate exhibiting this significant cell response was observed to be identical in structure to multiwall carbon nanotube aggregates demonstrated to be ubiquitous in the environment, and especially in indoor environments, where natural gas or propane cooking stoves exist. Correspondingly, preliminary epidemiological data, although sparse, indicate a correlation between asthma incidence or classification, and exposure to gas stoves. These results suggest a number of novel epidemiological and etiological avenues for asthma triggers and related respiratory or other environmental health effects, especially since indoor number concentrations for multiwall carbon nanotube aggregates is at least 10 times the outdoor concentration, and

  12. Experimental aggregation of volcanic ash: the role of liquid bonding

    NASA Astrophysics Data System (ADS)

    Mueller, S.; Kueppers, U.; Jacob, M.; Ayris, P. M.; Dingwell, D. B.

    2015-12-01

    Explosive volcanic eruptions may release vast quantities of ash. Because of its size, it has the greatest dispersal potential and can be distributed globally. Ash may pose severe risks for 1) air traffic, 2) human and animal health, 3) agriculture and 4) infrastructure. Such ash particles can however cluster and form ash aggregates that range in size from millimeters to centimeters. During their growth, weight and aerodynamic properties change. This leads to significantly changed transport and settling behavior. The physico-chemical processes involved in aggregation are quantitatively poorly constrained. We have performed laboratory ash aggregation experiments using the ProCell Lab System® of Glatt Ingenieurtechnik GmbH. Solid particles are set into motion in a fluidized bed over a range of well-controlled boundary conditions (e.g., air flow rate, gas temperature, humidity, liquid composition). In this manner we simulate the variable gas-particle flow conditions expected in eruption plumes and pyroclastic density currents. We have used 1) soda-lime glass beads as an analogue material and 2) natural volcanic ash from Laacher See Volcano (Germany). In order to influence form, size, stability and the production rate of aggregates, a range of experimental conditions (e.g., particle concentration, degree of turbulence, temperature and moisture in the process chamber and the composition of the liquid phase) have been employed. We have successfully reproduced several features of natural ash aggregates, including round, internally structured ash pellets up to 3 mm in diameter. These experimental results help to constrain the boundary conditions required for the generation of spherical, internally-structured ash aggregates that survive deposition and are preserved in the volcanological record. These results should also serve as input parameters for models of ash transport and ash mass distribution.

  13. Collusion-aware privacy-preserving range query in tiered wireless sensor networks.

    PubMed

    Zhang, Xiaoying; Dong, Lei; Peng, Hui; Chen, Hong; Zhao, Suyun; Li, Cuiping

    2014-12-11

    Wireless sensor networks (WSNs) are indispensable building blocks for the Internet of Things (IoT). With the development of WSNs, privacy issues have drawn more attention. Existing work on the privacy-preserving range query mainly focuses on privacy preservation and integrity verification in two-tiered WSNs in the case of compromised master nodes, but neglects the damage of node collusion. In this paper, we propose a series of collusion-aware privacy-preserving range query protocols in two-tiered WSNs. To the best of our knowledge, this paper is the first to consider collusion attacks for a range query in tiered WSNs while fulfilling the preservation of privacy and integrity. To preserve the privacy of data and queries, we propose a novel encoding scheme to conceal sensitive information. To preserve the integrity of the results, we present a verification scheme using the correlation among data. In addition, two schemes are further presented to improve result accuracy and reduce communication cost. Finally, theoretical analysis and experimental results confirm the efficiency, accuracy and privacy of our proposals.

  14. Collusion-Aware Privacy-Preserving Range Query in Tiered Wireless Sensor Networks†

    PubMed Central

    Zhang, Xiaoying; Dong, Lei; Peng, Hui; Chen, Hong; Zhao, Suyun; Li, Cuiping

    2014-01-01

    Wireless sensor networks (WSNs) are indispensable building blocks for the Internet of Things (IoT). With the development of WSNs, privacy issues have drawn more attention. Existing work on the privacy-preserving range query mainly focuses on privacy preservation and integrity verification in two-tiered WSNs in the case of compromised master nodes, but neglects the damage of node collusion. In this paper, we propose a series of collusion-aware privacy-preserving range query protocols in two-tiered WSNs. To the best of our knowledge, this paper is the first to consider collusion attacks for a range query in tiered WSNs while fulfilling the preservation of privacy and integrity. To preserve the privacy of data and queries, we propose a novel encoding scheme to conceal sensitive information. To preserve the integrity of the results, we present a verification scheme using the correlation among data. In addition, two schemes are further presented to improve result accuracy and reduce communication cost. Finally, theoretical analysis and experimental results confirm the efficiency, accuracy and privacy of our proposals. PMID:25615731

  15. Solving the problem of Trans-Genomic Query with alignment tables.

    PubMed

    Parker, Douglass Stott; Hsiao, Ruey-Lung; Xing, Yi; Resch, Alissa M; Lee, Christopher J

    2008-01-01

    The trans-genomic query (TGQ) problem--enabling the free query of biological information, even across genomes--is a central challenge facing bioinformatics. Solutions to this problem can alter the nature of the field, moving it beyond the jungle of data integration and expanding the number and scope of questions that can be answered. An alignment table is a binary relationship on locations (sequence segments). An important special case of alignment tables are hit tables: tables of pairs of highly similar segments produced by alignment tools like BLAST. However, alignment tables also include general binary relationships, and can represent any useful connection between sequence locations. They can be curated, and provide a high-quality queryable backbone of connections between biological information. Alignment tables thus can be a natural foundation for TGQ, as they permit a central part of the TGQ problem to be reduced to purely technical problems involving tables of locations. Key challenges in implementing alignment tables include efficient representation and indexing of sequence locations. We define a location datatype that can be incorporated naturally into common off-the-shelf database systems. We also describe an implementation of alignment tables in BLASTGRES, an extension of the open-source POSTGRESQL database system that provides indexing and operators on locations required for querying alignment tables. This paper also reviews several successful large-scale applications of alignment tables for Trans-Genomic Query. Tables with millions of alignments have been used in queries about alternative splicing, an area of genomic analysis concerning the way in which a single gene can yield multiple transcripts. Comparative genomics is a large potential application area for TGQ and alignment tables.
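
    The notion of an alignment table as a binary relationship on locations can be illustrated with a small in-memory sketch. Everything here is hypothetical and for illustration only: the `Location` class, the half-open coordinate convention, the `aligned_to` helper and the sequence names are assumptions, and the linear scan stands in for the indexing that a database extension such as BLASTGRES would provide. None of this is the BLASTGRES interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """A sequence segment: sequence id plus a half-open [start, end) interval (assumed convention)."""
    seq_id: str
    start: int
    end: int

    def overlaps(self, other):
        """True when both segments lie on the same sequence and share at least one position."""
        return (self.seq_id == other.seq_id
                and self.start < other.end
                and other.start < self.end)


def aligned_to(table, region):
    """Return the partner locations of every table row that overlaps region on either side."""
    result = []
    for left, right in table:
        if left.overlaps(region):
            result.append(right)
        if right.overlaps(region):
            result.append(left)
    return result


# A toy alignment table: each row pairs two locations (hypothetical coordinates).
table = [
    (Location("human_chr1", 100, 200), Location("mouse_chr4", 500, 600)),
    (Location("human_chr1", 300, 400), Location("mouse_chr4", 900, 1000)),
]
hits = aligned_to(table, Location("human_chr1", 150, 160))
print(hits)  # [Location(seq_id='mouse_chr4', start=500, end=600)]
```

    Reducing the cross-genome question to an overlap test on locations is what lets the problem be handled with ordinary table operations.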

  16. Private database queries based on counterfactual quantum key distribution

    NASA Astrophysics Data System (ADS)

    Zhang, Jia-Li; Guo, Fen-Zhuo; Gao, Fei; Liu, Bin; Wen, Qiao-Yan

    2013-08-01

    Based on the fundamental concept of quantum counterfactuality, we propose a protocol to achieve quantum private database queries, which is a theoretical study of how counterfactuality can be employed beyond counterfactual quantum key distribution (QKD). By adding crucial detecting apparatus to the device of QKD, the privacy of both the distrustful user and the database owner can be guaranteed. Furthermore, the proposed private-database-query protocol makes full use of the low efficiency in the counterfactual QKD, and by adjusting the relevant parameters, the protocol obtains excellent flexibility and extensibility.

  17. Rice SNP-seek database update: new SNPs, indels, and queries.

    PubMed

    Mansueto, Locedie; Fuentes, Roven Rommel; Borja, Frances Nikki; Detras, Jeffery; Abriol-Santos, Juan Miguel; Chebotarov, Dmytro; Sanciangco, Millicent; Palis, Kevin; Copetti, Dario; Poliakov, Alexandre; Dubchak, Inna; Solovyev, Victor; Wing, Rod A; Hamilton, Ruaraidh Sackville; Mauleon, Ramil; McNally, Kenneth L; Alexandrov, Nickolai

    2017-01-04

    We describe updates to the Rice SNP-Seek Database since its first release. We ran a new SNP-calling pipeline followed by filtering that resulted in complete, base, filtered and core SNP datasets. Besides the Nipponbare reference genome, the pipeline was run on genome assemblies of IR 64, 93-11, DJ 123 and Kasalath. New genotype query and display features are added for reference assemblies, SNP datasets and indels. JBrowse now displays BAM, VCF and other annotation tracks, the additional genome assemblies and an embedded VISTA genome comparison viewer. Middleware is redesigned for improved performance by using a hybrid of HDF5 and RDMS for genotype storage. Query modules for genotypes, varieties and genes are improved to handle various constraints. An integrated list manager allows the user to pass query parameters for further analysis. The SNP Annotator adds traits, ontology terms, effects and interactions to markers in a list. Web-service calls were implemented to access most data. These features enable seamless querying of SNP-Seek across various biological entities, a step toward semi-automated gene-trait association discovery. URL: http://snp-seek.irri.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Querying databases of trajectories of differential equations: Data structures for trajectories

    NASA Technical Reports Server (NTRS)

    Grossman, Robert

    1989-01-01

    One approach to qualitative reasoning about dynamical systems is to extract qualitative information by searching or making queries on databases containing very large numbers of trajectories. The efficiency of such queries depends crucially upon finding an appropriate data structure for trajectories of dynamical systems. Suppose that a large number of parameterized trajectories $\gamma$ of a dynamical system evolving in $\mathbb{R}^N$ are stored in a database. Let $\eta \subset \mathbb{R}^N$ denote a parameterized path in Euclidean space, and let $\|\cdot\|$ denote a norm on the space of paths. A data structure is defined to represent trajectories of dynamical systems, and an algorithm is sketched which answers queries.

  19. The role of organizational research in implementing evidence-based practice: QUERI Series

    PubMed Central

    Yano, Elizabeth M

    2008-01-01

    Background Health care organizations exert significant influence on the manner in which clinicians practice and the processes and outcomes of care that patients experience. A greater understanding of the organizational milieu into which innovations will be introduced, as well as the organizational factors that are likely to foster or hinder the adoption and use of new technologies, care arrangements and quality improvement (QI) strategies are central to the effective implementation of research into practice. Unfortunately, much implementation research seems to not recognize or adequately address the influence and importance of organizations. Using examples from the U.S. Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI), we describe the role of organizational research in advancing the implementation of evidence-based practice into routine care settings. Methods Using the six-step QUERI process as a foundation, we present an organizational research framework designed to improve and accelerate the implementation of evidence-based practice into routine care. Specific QUERI-related organizational research applications are reviewed, with discussion of the measures and methods used to apply them. We describe these applications in the context of a continuum of organizational research activities to be conducted before, during and after implementation. Results Since QUERI's inception, various approaches to organizational research have been employed to foster progress through QUERI's six-step process. We report on how explicit integration of the evaluation of organizational factors into QUERI planning has informed the design of more effective care delivery system interventions and enabled their improved "fit" to individual VA facilities or practices. We examine the value and challenges in conducting organizational research, and briefly describe the contributions of organizational theory and environmental context to the research framework

  20. System for Performing Single Query Searches of Heterogeneous and Dispersed Databases

    NASA Technical Reports Server (NTRS)

    Maluf, David A. (Inventor); Okimura, Takeshi (Inventor); Gurram, Mohana M. (Inventor); Tran, Vu Hoang (Inventor); Knight, Christopher D. (Inventor); Trinh, Anh Ngoc (Inventor)

    2017-01-01

    The present invention is a distributed computer system of heterogeneous databases joined in an information grid and configured with an Application Programming Interface hardware which includes a search engine component for performing user-structured queries on multiple heterogeneous databases in real time. This invention reduces overhead associated with the impedance mismatch that commonly occurs in heterogeneous database queries.

  1. A journey to Semantic Web query federation in the life sciences.

    PubMed

    Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian

    2009-10-01

    As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query

  2. A journey to Semantic Web query federation in the life sciences

    PubMed Central

    Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian

    2009-01-01

    Background As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. Methods and results We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. 
    Conclusion We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies

  3. Hybrid Filtering in Semantic Query Processing

    ERIC Educational Resources Information Center

    Jeong, Hanjo

    2011-01-01

    This dissertation presents a hybrid filtering method and a case-based reasoning framework for enhancing the effectiveness of Web search. Web search may not reflect user needs, intent, context, and preferences, because today's keyword-based search is lacking semantic information to capture the user's context and intent in posing the search query.…

  4. GO2PUB: Querying PubMed with semantic expansion of gene ontology terms

    PubMed Central

    2012-01-01

    Background With the development of high throughput methods of gene analyses, there is a growing need for mining tools to retrieve relevant articles in PubMed. As PubMed grows, literature searches become more complex and time-consuming. Automated search tools with good precision and recall are necessary. We developed GO2PUB to automatically enrich PubMed queries with gene names, symbols and synonyms annotated by a GO term of interest or one of its descendants. Results GO2PUB enriches PubMed queries based on selected GO terms and keywords. It processes the result and displays the PMID, title, authors, abstract and bibliographic references of the articles. Gene names, symbols and synonyms that have been generated as extra keywords from the GO terms are also highlighted. GO2PUB is based on a semantic expansion of PubMed queries using the semantic inheritance between terms through the GO graph. Two experts manually assessed the relevance of GO2PUB, GoPubMed and PubMed on three queries about lipid metabolism. Experts’ agreement was high (kappa = 0.88). GO2PUB returned 69% of the relevant articles, GoPubMed: 40% and PubMed: 29%. GO2PUB and GoPubMed have 17% of their results in common, corresponding to 24% of the total number of relevant results. 70% of the articles returned by more than one tool were relevant. 36% of the relevant articles were returned only by GO2PUB, 17% only by GoPubMed and 14% only by PubMed. For determining whether these results can be generalized, we generated twenty queries based on random GO terms with a granularity similar to those of the first three queries and compared the proportions of GO2PUB and GoPubMed results. These were respectively of 77% and 40% for the first queries, and of 70% and 38% for the random queries. The two experts also assessed the relevance of seven of the twenty queries (the three related to lipid metabolism and four related to other domains). Expert agreement was high (0.93 and 0.8). GO2PUB and GoPubMed performances

  5. Query-biased preview over outsourced and encrypted data.

    PubMed

    Peng, Ningduo; Luo, Guangchun; Qin, Ke; Chen, Aiguo

    2013-01-01

    For both convenience and security, more and more users encrypt their sensitive data before outsourcing it to a third party such as a cloud storage service. However, searching for the desired documents becomes problematic since it is costly to download and decrypt each possibly needed document to check if it contains the desired content. An informative query-biased preview feature, as applied in modern search engines, could help the users to learn about the content without downloading the entire document. However, when the data are encrypted, securely extracting a keyword-in-context snippet from the data as a preview becomes a challenge. Based on a private information retrieval protocol and the core concept of searchable encryption, we propose a single-server and two-round solution to securely obtain a query-biased snippet over the encrypted data from the server. We achieve this novel result by making a document (plaintext) previewable under any cryptosystem and constructing a secure index to support dynamic computation for a best-matched snippet when queried by some keywords. For each document, the scheme has O(d) storage complexity and O(log(d/s) + s + d/s) communication complexity, where d is the document size and s is the snippet length.
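    As a rough reading of the communication bound quoted above, the following worked step treats the hidden constants of all three terms as equal, which is an assumption and not something the abstract states. Under that assumption the s + d/s part is smallest when the snippet length is near the square root of the document size.

```latex
% Assumption: unit constants in the O(.) bound; f is a hypothetical name.
f(s) = \log\frac{d}{s} + s + \frac{d}{s}, \qquad
g(s) = s + \frac{d}{s}, \qquad
g'(s) = 1 - \frac{d}{s^{2}} = 0 \;\Rightarrow\; s = \sqrt{d},
\qquad g(\sqrt{d}) = 2\sqrt{d}.
% Example: d = 10^{4},\ s = 10^{2} gives g = 200 and \log_{2}(d/s) = \log_{2} 100 \approx 6.6.
```

    In this reading the dominant cost grows like the square root of d rather than d itself; the real constants of the scheme may shift the best s.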

  6. Query-Biased Preview over Outsourced and Encrypted Data

    PubMed Central

    Luo, Guangchun; Qin, Ke; Chen, Aiguo

    2013-01-01

    For both convenience and security, more and more users encrypt their sensitive data before outsourcing it to a third party such as a cloud storage service. However, searching for the desired documents becomes problematic since it is costly to download and decrypt each possibly needed document to check if it contains the desired content. An informative query-biased preview feature, as applied in modern search engines, could help the users to learn about the content without downloading the entire document. However, when the data are encrypted, securely extracting a keyword-in-context snippet from the data as a preview becomes a challenge. Based on a private information retrieval protocol and the core concept of searchable encryption, we propose a single-server and two-round solution to securely obtain a query-biased snippet over the encrypted data from the server. We achieve this novel result by making a document (plaintext) previewable under any cryptosystem and constructing a secure index to support dynamic computation for a best-matched snippet when queried by some keywords. For each document, the scheme has O(d) storage complexity and O(log(d/s) + s + d/s) communication complexity, where d is the document size and s is the snippet length. PMID:24078798

  7. MP3C - the Minor Planet Physical Properties Catalogue: a New VO Service For Multi-database Query

    NASA Astrophysics Data System (ADS)

    Tanga, Paolo; Delbo, M.; Gerakis, J.

    2013-10-01

    In the last few years we witnessed a large growth in the number of asteroids for which we have physical properties. However, these data are dispersed in a multiplicity of catalogs. Extracting data and combining them for further analysis requires custom tools, a situation further complicated by the variety of data sources, some of them standardized (Planetary Data System) others not. With these problems in mind, we created a new Virtual Observatory service named “Minor Planet Physical Properties Catalogue” (abbreviated as MP3C - http://mp3c.oca.eu/). MP3C is not a new database, but rather a portal allowing the user to access selected properties of objects by easy SQL query, even from different sources. At present, such diverse data as orbital parameters, photometric and light curve parameters, sizes and albedos derived by IRAS, AKARI and WISE, SDSS colors, SMASS taxonomy, family membership, satellite data, stellar occultation results, are included. Other data sources will be added in the near future. The physical properties output of the MP3C can be tuned by the users by query criteria based upon ranges of values of the ingested quantities. The resulting list of objects can be used for interactive plots through standard VO tools such as TOPCAT. Also, their ephemerides and visibilities from given sites can be computed. We are targeting full VO compliance for providing a new standardized service to the community.

  8. Behavioral transitions induced by speed and noise in animal aggregates

    NASA Astrophysics Data System (ADS)

    Cambui, Dorílson S.; Iliass, Tarras

    2017-04-01

    In this paper, we used a self-propelled particle model to study the transition between phases of collective behavior observed in animal aggregates. In these systems, transitions occur when individuals shift from one collective state to another. We investigated transitions induced by both the speed and the noise. Statistical quantities that characterize the phase transition driven by noise, such as order parameter, the Binder cumulant and the susceptibility were analyzed, and we used the finite-size scaling theory to estimate the critical exponent ratios β/ν and γ/ν.
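    The statistical quantities named in this abstract can be sketched as follows. The abstract does not give its definitions, so the formulas below are the conventional ones for Vicsek-type self-propelled particle models and are assumptions here; all function names are hypothetical.

```python
import math

def polar_order_parameter(headings):
    """Magnitude of the mean unit velocity for heading angles in radians.

    Close to 1 for an aligned group, close to 0 for disordered motion.
    """
    n = len(headings)
    if n == 0:
        raise ValueError("need at least one particle")
    sx = sum(math.cos(t) for t in headings)
    sy = sum(math.sin(t) for t in headings)
    return math.hypot(sx, sy) / n

def binder_cumulant(samples):
    """Binder cumulant U = 1 - <phi^4> / (3 <phi^2>^2) over order-parameter samples."""
    m2 = sum(p ** 2 for p in samples) / len(samples)
    m4 = sum(p ** 4 for p in samples) / len(samples)
    return 1.0 - m4 / (3.0 * m2 * m2)

def susceptibility(samples, n_particles):
    """Susceptibility chi = N (<phi^2> - <phi>^2); the factor N is a common convention."""
    mean = sum(samples) / len(samples)
    m2 = sum(p ** 2 for p in samples) / len(samples)
    return n_particles * (m2 - mean * mean)
```

    In a finite-size scaling study these quantities would be sampled over many time steps for several system sizes and noise levels; that loop is omitted here.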

  9. Query by example video based on fuzzy c-means initialized by fixed clustering center

    NASA Astrophysics Data System (ADS)

    Hou, Sujuan; Zhou, Shangbo; Siddique, Muhammad Abubakar

    2012-04-01

    Currently, the high complexity of video contents has posed the following major challenges for fast retrieval: (1) efficient similarity measurements, and (2) efficient indexing on the compact representations. A video-retrieval strategy based on fuzzy c-means (FCM) is presented for querying by example. Initially, the query video is segmented and represented by a set of shots, each shot can be represented by a key frame, and then we used video processing techniques to find visual cues to represent the key frame. Next, because the FCM algorithm is sensitive to the initializations, here we initialized the cluster center by the shots of query video so that users could achieve appropriate convergence. After an FCM cluster was initialized by the query video, each shot of query video was considered a benchmark point in the aforesaid cluster, and each shot in the database possessed a class label. The similarity between the shots in the database with the same class label and benchmark point can be transformed into the distance between them. Finally, the similarity between the query video and the video in database was transformed into the number of similar shots. Our experimental results demonstrated the performance of this proposed approach.
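    The initialization idea described above, seeding the FCM cluster centers with the query video's shots, can be illustrated with a minimal fuzzy c-means loop. This is a generic textbook FCM under assumed inputs (shots as plain feature tuples, fuzzifier m = 2), not the authors' implementation; the function name and parameters are hypothetical.

```python
import math

def fcm(points, centers, m=2.0, iters=50, eps=1e-9):
    """Run fuzzy c-means starting from the given centers.

    points  : list of feature tuples (e.g. database shots)
    centers : initial centers (e.g. feature tuples of the query shots)
    Returns the final centers and the memberships from the last iteration.
    """
    c = len(centers)
    dim = len(points[0])
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership of each point in each cluster from relative distances.
        memberships = []
        for p in points:
            d = [max(math.dist(p, ctr), eps) for ctr in centers]
            row = [1.0 / sum((d[k] / d[j]) ** exponent for j in range(c))
                   for k in range(c)]
            memberships.append(row)
        # Each center becomes the membership-weighted mean of the points.
        new_centers = []
        for k in range(c):
            w = [memberships[i][k] ** m for i in range(len(points))]
            total = sum(w)
            new_centers.append(tuple(
                sum(w[i] * points[i][a] for i in range(len(points))) / total
                for a in range(dim)))
        centers = new_centers
    return centers, memberships
```

    A database shot can then be given the class label of its highest membership, for example `max(range(len(row)), key=row.__getitem__)` for a membership row.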

  10. Clean Air Markets - Compliance Query Wizard

    EPA Pesticide Factsheets

    The Compliance Query Wizard is part of a suite of Clean Air Markets-related tools that are accessible at http://ampd.epa.gov/ampd/. The Compliance module provides final compliance results. Using the Compliance Query Wizard, the user can find compliance information associated with specific programs, facilities, states or time frames. Quick Reports and Prepackaged Datasets are also available for data that are commonly requested. Final compliance results are available for all years since 1995 for the Acid Rain Program and for the various NOx trading programs EPA has operated since 1999. EPA's Clean Air Markets Division (CAMD) includes several market-based regulatory programs designed to improve air quality and ecosystems. The most well-known of these programs are EPA's Acid Rain Program and the NOx Programs, which reduce emissions of sulfur dioxide (SO2) and nitrogen oxides (NOx), compounds that adversely affect air quality, the environment, and public health. CAMD also plays an integral role in the development and implementation of the Clean Air Interstate Rule (CAIR).

  11. A natural language interface plug-in for cooperative query answering in biological databases.

    PubMed

    Jamil, Hasan M

    2012-06-11

    One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural language like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schema with an aim to providing cooperative responses to queries tailored to users' interpretations. Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from underlying database schema. The plug-in introduced in this paper is generic and facilitates connecting user selected natural language interfaces to arbitrary databases using a

  12. Query Optimization by Semantic Reasoning.

    DTIC Science & Technology

    1981-05-01

    condition holds, then formulas X and Y are said to be merge-compatible. Let xi be the variable in X that corresponds to variable yj in Y (x is not...Davidson, Ramez El-Masri, Sheldon Finkelstein, Hector Garcia, Mohammed Olumi, Tom Rogers, Neil Rowe, David Shaw, and Kyu-Young Whang. Special credit...for the simple queries, along with cost formulas and applicability conditions for the methods. Most recently has come the development of optimizers for

  13. Protein A chromatography increases monoclonal antibody aggregation rate during subsequent low pH virus inactivation hold

    PubMed Central

    Mazzer, Alice R.; Perraud, Xavier; Halley, Jennifer; O’Hara, John; Bracewell, Daniel G.

    2015-01-01

    Protein A chromatography is a near-ubiquitous method of mAb capture in bioprocesses. The use of low pH buffer for elution from protein A is known to contribute to product aggregation. Yet, a more limited set of evidence suggests that low pH may not be the sole cause of aggregation in protein A chromatography, rather, other facets of the process may contribute significantly. This paper presents a well-defined method for investigating this problem. An IgG4 was incubated in elution buffer after protein A chromatography (typical of the viral inactivation hold) and the quantity of monomer in neutralised samples was determined by size exclusion chromatography; elution buffers of different pH values predetermined to induce aggregation of the IgG4 were used. Rate constants for monomer decay over time were determined by fitting exponential decay functions to the data. Similar experiments were implemented in the absence of a chromatography step, i.e. IgG4 aggregation at low pH. Rate constants for aggregation after protein A chromatography were considerably higher than those from low pH exposure alone; a distinct shift in aggregation rates was apparent across the pH range tested. PMID:26346187
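    The rate-constant step described above, fitting an exponential decay to monomer content over time, might look like the sketch below. It assumes a single-exponential model M(t) = M0 * exp(-k t) and fits it by least squares on log-transformed values, which is a substitute technique chosen for brevity and not necessarily the fitting procedure used in the paper. The function name is hypothetical.

```python
import math

def fit_decay_rate(times, monomer):
    """Estimate (k, M0) for M(t) = M0 * exp(-k * t) by log-linear least squares.

    times   : hold times (any consistent unit)
    monomer : positive monomer quantities measured at those times
    """
    ys = [math.log(v) for v in monomer]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    slope = sxy / sxx                      # slope of ln(M) against t equals -k
    k = -slope
    m0 = math.exp(y_mean - slope * t_mean)
    return k, m0
```

    Comparing the fitted k for samples held after protein A chromatography with k for low pH exposure alone would mirror the comparison the abstract reports; no measured data are reproduced here.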

  14. Calcium oxalate monohydrate aggregation induced by aggregation of desialylated Tamm-Horsfall protein

    PubMed Central

    Viswanathan, Pragasam; Rimer, Jeffrey D.; Kolbach, Ann M.; Kleinman, Jack G.

    2011-01-01

    Tamm-Horsfall protein (THP) is thought to protect against calcium oxalate monohydrate (COM) stone formation by inhibiting COM aggregation. Several studies reported that stone formers produce THP with reduced levels of glycosylation, particularly sialic acid levels, which leads to reduced negative charge. In this study, normal THP was treated with neuraminidase to remove sialic acid residues, confirmed by an isoelectric point shift to higher pH. COM aggregation assays revealed that desialylated THP (ds-THP) promoted COM aggregation, while normal THP inhibited aggregation. The appearance of protein aggregates in solutions at ds-THP concentrations ≥1 µg/mL in 150 mM NaCl correlated with COM aggregation promotion, implying that ds-THP aggregation induced COM aggregation. The aggregation-promoting effect of the ds-THP was independent of pH above its isoelectric point, but was substantially reduced at low ionic strength, where protein aggregation was much reduced. COM aggregation promotion was maximized at a ds-THP to COM mass ratio of ~0.025, which can be explained by a model wherein partial COM surface coverage by ds-THP aggregates promotes crystal aggregation by bridging opposing COM surfaces, whereas higher surface coverage leads to repulsion between adsorbed ds-THP aggregates. Thus, desialylation of THP apparently abrogates a normal defensive action of THP by inducing protein aggregation, and subsequently COM aggregation, a condition that favors kidney stone formation. PMID:21229239

  15. Enhancing user privacy in SARG04-based private database query protocols

    NASA Astrophysics Data System (ADS)

    Yu, Fang; Qiu, Daowen; Situ, Haozhen; Wang, Xiaoming; Long, Shun

    2015-11-01

    The well-known SARG04 protocol can be used in a private query application to generate an oblivious key. By usage of the key, the user can retrieve one out of N items from a database without revealing which one he/she is interested in. However, the existing SARG04-based private query protocols are vulnerable to the attacks of faked data from the database since in its canonical form, the SARG04 protocol lacks means for one party to defend attacks from the other. While such attacks can cause significant loss of user privacy, a variant of the SARG04 protocol is proposed in this paper with new mechanisms designed to help the user protect its privacy in private query applications. In the protocol, it is the user who starts the session with the database, trying to learn from it bits of a raw key in an oblivious way. An honesty test is used to detect a cheating database who had transmitted faked data. The whole private query protocol has O(N) communication complexity for conveying at least N encrypted items. Compared with the existing SARG04-based protocols, it is efficient in communication for per-bit learning.

  16. Experiments on Interfaces To Support Query Expansion.

    ERIC Educational Resources Information Center

    Beaulieu, M.

    1997-01-01

    Focuses on the user and human-computer interaction aspects of the research based on the Okapi text retrieval system. Three experiments implementing different approaches to query expansion are described, including the use of graphical user interfaces with different windowing techniques. (Author/LRW)

  17. Secure and Privacy-Preserving Body Sensor Data Collection and Query Scheme.

    PubMed

    Zhu, Hui; Gao, Lijuan; Li, Hui

    2016-02-01

    With the development of body sensor networks and the pervasiveness of smart phones, different types of personal data can be collected in real time by body sensors, and the potential value of massive personal data has attracted considerable interest recently. However, the privacy issues of sensitive personal data are still challenging today. Aiming at these challenges, in this paper, we focus on the threats from telemetry interface and present a secure and privacy-preserving body sensor data collection and query scheme, named SPCQ, for outsourced computing. In the proposed SPCQ scheme, users' personal information is collected by body sensors in different types and converted into multi-dimension data, and each dimension is converted into the form of a number and uploaded to the cloud server, which provides a secure, efficient and accurate data query service, while the privacy of sensitive personal information and users' query data is guaranteed. Specifically, based on an improved homomorphic encryption technology over composite order group, we propose a special weighted Euclidean distance contrast algorithm (WEDC) for multi-dimension vectors over encrypted data. With the SPCQ scheme, the confidentiality of sensitive personal data, the privacy of data users' queries and accurate query service can be achieved in the cloud server. Detailed analysis shows that SPCQ can resist various security threats from telemetry interface. In addition, we also implement SPCQ on an embedded device, smart phone and laptop with a real medical database, and extensive simulation results demonstrate that our proposed SPCQ scheme is highly efficient in terms of computation and communication costs.
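    For orientation, the quantity that the WEDC algorithm compares is a weighted Euclidean distance between multi-dimension vectors. The sketch below computes it on plaintext only; the encrypted-domain comparison that SPCQ performs is not reproduced, and the function names and the example weights are hypothetical.

```python
import math

def weighted_euclidean(a, b, weights):
    """Weighted Euclidean distance sqrt(sum_i w_i * (a_i - b_i)^2)."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def closer_record(query, rec1, rec2, weights):
    """Return whichever of rec1 or rec2 is nearer to the query vector."""
    d1 = weighted_euclidean(query, rec1, weights)
    d2 = weighted_euclidean(query, rec2, weights)
    return rec1 if d1 <= d2 else rec2
```

    The weights let some dimensions (for example one vital sign) count more than others when records are ranked against a query.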

  18. Secure and Privacy-Preserving Body Sensor Data Collection and Query Scheme

    PubMed Central

    Zhu, Hui; Gao, Lijuan; Li, Hui

    2016-01-01

    With the development of body sensor networks and the pervasiveness of smart phones, different types of personal data can be collected in real time by body sensors, and the potential value of massive personal data has attracted considerable interest recently. However, the privacy issues of sensitive personal data are still challenging today. Aiming at these challenges, in this paper, we focus on the threats from telemetry interface and present a secure and privacy-preserving body sensor data collection and query scheme, named SPCQ, for outsourced computing. In the proposed SPCQ scheme, users’ personal information is collected by body sensors in different types and converted into multi-dimension data, and each dimension is converted into the form of a number and uploaded to the cloud server, which provides a secure, efficient and accurate data query service, while the privacy of sensitive personal information and users’ query data is guaranteed. Specifically, based on an improved homomorphic encryption technology over composite order group, we propose a special weighted Euclidean distance contrast algorithm (WEDC) for multi-dimension vectors over encrypted data. With the SPCQ scheme, the confidentiality of sensitive personal data, the privacy of data users’ queries and accurate query service can be achieved in the cloud server. Detailed analysis shows that SPCQ can resist various security threats from telemetry interface. In addition, we also implement SPCQ on an embedded device, smart phone and laptop with a real medical database, and extensive simulation results demonstrate that our proposed SPCQ scheme is highly efficient in terms of computation and communication costs. PMID:26840319

  19. Rydberg aggregates

    NASA Astrophysics Data System (ADS)

    Wüster, S.; Rost, J.-M.

    2018-02-01

    We review Rydberg aggregates, assemblies of a few Rydberg atoms exhibiting energy transport through collective eigenstates, considering isolated atoms or assemblies embedded within clouds of cold ground-state atoms. We classify Rydberg aggregates, and provide an overview of their possible applications as quantum simulators for phenomena from chemical or biological physics. Our main focus is on flexible Rydberg aggregates, in which atomic motion is an essential feature. In these, simultaneous control over Rydberg-Rydberg interactions, external trapping and electronic energies, allows Born-Oppenheimer surfaces for the motion of the entire aggregate to be tailored as desired. This is illustrated with theory proposals towards the demonstration of joint motion and excitation transport, conical intersections and non-adiabatic effects. Additional flexibility for quantum simulations is enabled by the use of dressed dipole-dipole interactions or the embedding of the aggregate in a cold gas or Bose-Einstein condensate environment. Finally we provide some guidance regarding the parameter regimes that are most suitable for the realization of either static or flexible Rydberg aggregates based on Li or Rb atoms. The current status of experimental progress towards enabling Rydberg aggregates is also reviewed.

  20. Supporting temporal queries on clinical relational databases: the S-WATCH-QL language.

    PubMed Central

    Combi, C.; Missora, L.; Pinciroli, F.

    1996-01-01

    Due to the ubiquitous and special nature of time, especially in clinical databases, there is a need for particular temporal data and operators. In this paper we describe S-WATCH-QL (Structured Watch Query Language), a temporal extension of SQL, the widespread query language based on the relational model. S-WATCH-QL extends the well-known SQL by the addition of: a) temporal data types that allow the storage of information with different levels of granularity; b) historical relations that can store together both instantaneous valid times and intervals; c) some temporal clauses, functions and predicates that allow complex temporal queries to be defined. PMID:8947722

  1. A Comparison of Query-by-Example Methods for Spoken Term Detection

    DTIC Science & Technology

    2009-09-01

    consistent “errors” between the index and the query. Few query terms have more than one pronunciation (avg. 1.1 prons. per term), as a result, there is... pron lex. one dict entry (llr) 73.01 47.66 21.11 all dict entries (avg+llr) 73.99 48.16 20.92 all dict entries (max+llr) 74.27 48.26 20.93 Table 1

  2. Cyclone: java-based querying and computing with Pathway/Genome databases.

    PubMed

    Le Fèvre, François; Smidtas, Serge; Schächter, Vincent

    2007-05-15

    Cyclone aims at facilitating the use of BioCyc, a collection of Pathway/Genome Databases (PGDBs). Cyclone provides a fully extensible Java Object API to analyze and visualize these data. Cyclone can read and write PGDBs, and can write its own data in the CycloneML format. This format is automatically generated from the BioCyc ontology by Cyclone itself, ensuring continued compatibility. Cyclone objects can also be stored in a relational database CycloneDB. Queries can be written in SQL, and in an intuitive and concise object-oriented query language, Hibernate Query Language (HQL). In addition, Cyclone interfaces easily with Java software including the Eclipse IDE for HQL edition, the Jung API for graph algorithms or Cytoscape for graph visualization. Cyclone is freely available under an open source license at: http://sourceforge.net/projects/nemo-cyclone. For download and installation instructions, tutorials, use cases and examples, see http://nemo-cyclone.sourceforge.net.

  3. Monitoring Influenza Epidemics in China with Search Query from Baidu

    PubMed Central

    Lv, Benfu; Peng, Geng; Chunara, Rumi; Brownstein, John S.

    2013-01-01

    Several approaches have been proposed for near real-time detection and prediction of the spread of influenza. These include search query data for influenza-related terms, which has been explored as a tool for augmenting traditional surveillance methods. In this paper, we present a method that uses Internet search query data from Baidu to model and monitor influenza activity in China. The objectives of the study are to present a comprehensive technique for: (i) keyword selection, (ii) keyword filtering, (iii) index composition and (iv) modeling and detection of influenza activity in China. Sequential time-series for the selected composite keyword index is significantly correlated with Chinese influenza case data. In addition, one-month ahead prediction of influenza cases for the first eight months of 2012 has a mean absolute percent error less than 11%. To our knowledge, this is the first study on the use of search query data from Baidu in conjunction with this approach for estimation of influenza activity in China. PMID:23750192
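    The accuracy figure quoted above is a mean absolute percent error (MAPE). A minimal sketch of that metric follows; the function name is hypothetical and no case counts from the study are reproduced.

```python
def mean_absolute_percent_error(actual, predicted):
    """MAPE in percent: 100 * mean(|actual - predicted| / |actual|).

    Every actual value must be non-zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

    Applied to monthly influenza case counts and their one-month-ahead predictions, a value below 11 would correspond to the result reported in the abstract.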

  4. Evaluation of recycled concrete aggregates for their suitability in construction activities: An experimental study.

    PubMed

    Puthussery, Joseph V; Kumar, Rakesh; Garg, Anurag

    2017-02-01

    Construction and demolition waste disposal is a major challenge in developing nations due to its ever increasing quantities. In this study, the recycling potential of waste concrete as aggregates in construction activities was studied. The metal leaching from the recycled concrete aggregates (RCA) collected from the demolition site of a 50-year-old building, was evaluated by performing three different leaching tests (compliance, availability and Toxic Characteristic Leaching Procedure). The metal leaching was found mostly within the permissible limit except for Hg. Several tests were performed to determine the physical and mechanical properties of the fine and coarse aggregates produced from recycled concrete. The properties of recycled aggregates were found to be satisfactory for their utilization in road construction activities. The suitability of using recycled fine and coarse aggregates with Portland pozzolanic cement to make a sustainable and environmentally friendly concrete mix design was also analyzed. No significant difference was observed in the compressive strength of various concrete mixes prepared by natural and recycled aggregates. However, only the tensile strength of the mix prepared with 25% recycled fine aggregates was comparable to that of the control concrete. For other mixes, the tensile strength of the concrete was found to drop significantly. In summary, RCA should be considered seriously as a building material for road construction, mass concrete works, lightly reinforced sections, etc. The present work will be useful for the waste managers and policy makers particularly in developing nations where proper guidelines are still lacking. Copyright © 2016 Elsevier Ltd. All rights reserved.

  5. DREAM: Classification scheme for dialog acts in clinical research query mediation.

    PubMed

    Hoxha, Julia; Chandar, Praveen; He, Zhe; Cimino, James; Hanauer, David; Weng, Chunhua

    2016-02-01

    Clinical data access involves complex but opaque communication between medical researchers and query analysts. Understanding such communication is indispensable for designing intelligent human-machine dialog systems that automate query formulation. This study investigates email communication and proposes a novel scheme for classifying dialog acts in clinical research query mediation. We analyzed 315 email messages exchanged in the communication for 20 data requests obtained from three institutions. The messages were segmented into 1333 utterance units. Through a rigorous process, we developed a classification scheme and applied it for dialog act annotation of the extracted utterances. Evaluation results with high inter-annotator agreement demonstrate the reliability of this scheme. This dataset is used to contribute preliminary understanding of dialog acts distribution and conversation flow in this dialog space. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. A SQL-Database Based Meta-CASE System and its Query Subsystem

    NASA Astrophysics Data System (ADS)

    Eessaar, Erki; Sgirka, Rünno

    Meta-CASE systems simplify the creation of CASE (Computer Aided System Engineering) systems. In this paper, we present a meta-CASE system that provides a web-based user interface and uses an object-relational database system (ORDBMS) as its basis. The use of ORDBMSs allows us to integrate different parts of the system and simplify the creation of meta-CASE and CASE systems. ORDBMSs provide a powerful query mechanism. The proposed system allows developers to use queries to evaluate and gradually improve artifacts and calculate values of software measures. We illustrate the use of the system by using the SimpleM modeling language and discuss the use of SQL in the context of queries about artifacts. We have created a prototype of the meta-CASE system by using the PostgreSQL™ ORDBMS and the PHP scripting language.

  7. Comparative analysis of online health queries originating from personal computers and smart devices on a consumer health information portal.

    PubMed

    Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen; Pathak, Jyotishman

    2014-07-04

    The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic's consumer health information website. We performed analyses on "Queries with considering repetition counts (QwR)" and "Queries without considering repetition counts (QwoR)". The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are "Symptoms" (1 in 3 search queries), "Causes", and "Treatments & Drugs". The distribution of search queries for different health categories differs with the device used for

  8. Achieve Location Privacy-Preserving Range Query in Vehicular Sensing

    PubMed Central

    Lu, Rongxing; Ma, Maode; Bao, Haiyong

    2017-01-01

    Modern vehicles are equipped with a plethora of on-board sensors and large on-board storage, which enables them to gather and store various local-relevant data. However, the wide application of vehicular sensing has its own challenges, among which location-privacy preservation and data query accuracy are two critical problems. In this paper, we propose a novel range query scheme, which helps the data requester to accurately retrieve the sensed data from the distributive on-board storage in vehicular ad hoc networks (VANETs) with location privacy preservation. The proposed scheme exploits structured scalars to denote the locations of data requesters and vehicles, and achieves the privacy-preserving location matching with the homomorphic Paillier cryptosystem technique. Detailed security analysis shows that the proposed range query scheme can successfully preserve the location privacy of the involved data requesters and vehicles, and protect the confidentiality of the sensed data. In addition, performance evaluations are conducted to show the efficiency of the proposed scheme, in terms of computation delay and communication overhead. Specifically, the computation delay and communication overhead are not dependent on the length of the scalar, and they are only proportional to the number of vehicles. PMID:28786943

  9. Achieve Location Privacy-Preserving Range Query in Vehicular Sensing.

    PubMed

    Kong, Qinglei; Lu, Rongxing; Ma, Maode; Bao, Haiyong

    2017-08-08

    Modern vehicles are equipped with a plethora of on-board sensors and large on-board storage, which enables them to gather and store various local-relevant data. However, the wide application of vehicular sensing has its own challenges, among which location-privacy preservation and data query accuracy are two critical problems. In this paper, we propose a novel range query scheme, which helps the data requester to accurately retrieve the sensed data from the distributive on-board storage in vehicular ad hoc networks (VANETs) with location privacy preservation. The proposed scheme exploits structured scalars to denote the locations of data requesters and vehicles, and achieves the privacy-preserving location matching with the homomorphic Paillier cryptosystem technique. Detailed security analysis shows that the proposed range query scheme can successfully preserve the location privacy of the involved data requesters and vehicles, and protect the confidentiality of the sensed data. In addition, performance evaluations are conducted to show the efficiency of the proposed scheme, in terms of computation delay and communication overhead. Specifically, the computation delay and communication overhead are not dependent on the length of the scalar, and they are only proportional to the number of vehicles.
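
    The abstract above does not spell out the matching protocol, so the following is only a toy illustration of the property such a scheme relies on: Paillier ciphertexts can be multiplied to add the underlying plaintexts without decrypting them. The function names, the tiny primes, and the fixed generator g = n + 1 are assumptions made for this sketch and are not taken from the paper. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy key pair from two primes (real keys use far larger primes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n with fresh randomness."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in 1..n-1
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g**m mod n**2 simplifies to 1 + m*n.
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c using the private key (lam, mu)."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def add_encrypted(n, c1, c2):
    """Multiply ciphertexts; the result decrypts to the sum of the plaintexts mod n."""
    return c1 * c2 % (n * n)


# Hypothetical values: two small integers standing in for encoded quantities.
n, priv = keygen(293, 433)
total = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
print(decrypt(n, priv, total))
```

    The primes here are far too small for any real security; they only keep the arithmetic easy to follow.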

  10. Pulsed coherent population trapping with repeated queries for producing single-peaked high contrast Ramsey interference

    NASA Astrophysics Data System (ADS)

    Warren, Z.; Shahriar, M. S.; Tripathi, R.; Pati, G. S.

    2018-02-01

    A repeated query technique has been demonstrated as a new interrogation method in pulsed coherent population trapping for producing single-peaked Ramsey interference with high contrast. This technique enhances the contrast of the central Ramsey fringe by nearly 1.5 times and significantly suppresses the side fringes by using more query pulses (>10) in the pulse cycle. Theoretical models have been developed to simulate Ramsey interference and analyze the characteristics of the Ramsey spectrum produced by the repeated query technique. Experiments have also been carried out employing a repeated query technique in a prototype rubidium clock to study its frequency stability performance.

  11. Total organic carbon in aggregates as a soil recovery indicator

    NASA Astrophysics Data System (ADS)

    Luciene Maltoni, Katia; Rodrigues Cassiolato, Ana Maria; Amorim Faria, Glaucia; Dubbin, William

    2015-04-01

    Soil aggregation promotes physical protection of organic matter, the preservation of which is crucial to improving soil structure and fertility and ensuring the sustainability of agro-ecosystems. The no-tillage cultivation system has been considered one of the strategies to increase total soil organic carbon (TOC) contents and soil aggregation, which are closely related and influenced by soil management systems. The aim of this study was to evaluate the distribution of soil aggregates and the total organic carbon inside aggregates, with regard to soil recovery, under 3 different soil management systems, i.e. 10 and 20 years of no-tillage cultivation as compared with soil under natural vegetation (Cerrado). Undisturbed soils (0-5; 5-10; and 10-20 cm depth) were collected from Brazil, Central Region. The soils, Oxisols from Cerrado, were collected from a field under Natural Vegetation-Cerrado (NV), and from fields that were under conventional tillage since the 1970s and were changed to the no-tillage cultivation system 10 and 20 years ago (NT-10; NT-20 respectively). The undisturbed samples were sieved (4 mm) and the aggregates retained were further fractionated by wet sieving through five sieves (2000, 1000, 500, 250, and 50 μm) with the aggregates distribution expressed as percentage retained by each sieve. The TOC was determined, for each aggregate size, by combustion (Thermo-Finnigan). A predominance of aggregates >2000 μm was observed under NV treatment (92, 91, 82 %), NT-10 (64, 73, 61 %), and NT-20 (71, 79, 63 %) for all three depths (0-5; 5-10; 10-20 cm). In addition, greater quantities of aggregates in sizes 1000, 500, 250 and 50 μm under NT-10 and NT-20 treatments explain the lower aggregate stability under these treatments compared to the soil under NV. The organic C concentration for NV in aggregates >2000 μm was 24.4; 14.2; 8.7 mg/g for each depth (0-5; 5-10; 10-20 cm, respectively), higher than in aggregates sized 250-50 μm (7.2; 5.5; 4.4 mg/g) for all depths

  12. Normalized Legal Drafting and the Query Method.

    ERIC Educational Resources Information Center

    Allen, Layman E.; Engholm, C. Rudy

    1978-01-01

    Normalized legal drafting, a mode of expressing ideas in legal documents so that the syntax that relates the constituent propositions is simplified and standardized, and the query method, a question-asking activity that teaches normalized drafting and provides practice, are examined. Some examples are presented. (JMD)

  13. Spatiotemporal conceptual platform for querying archaeological information systems

    NASA Astrophysics Data System (ADS)

    Partsinevelos, Panagiotis; Sartzetaki, Mary; Sarris, Apostolos

    2015-04-01

    Spatial and temporal distribution of archaeological sites has been shown to associate with several attributes including marine, water, mineral and food resources, climate conditions, geomorphological features, etc. In this study, archaeological settlement attributes are evaluated under various associations in order to provide a specialized query platform in a geographic information system (GIS). Towards this end, a spatial database is designed to include a series of archaeological findings for a secluded geographic area of Crete in Greece. The key categories of the geodatabase include the archaeological type (palace, burial site, village, etc.), temporal information of the habitation/usage period (pre Minoan, Minoan, Byzantine, etc.), and the extracted geographical attributes of the sites (distance to sea, altitude, resources, etc.). Most of the related spatial attributes are extracted with readily available GIS tools. Additionally, a series of conceptual data attributes are estimated, including: temporal relation of an era to a future one in terms of alteration of the archaeological type, topologic relations of various types and attributes, spatial proximity relations between various types. These complex spatiotemporal relational measures reveal new attributes towards better understanding of site selection for prehistoric and/or historic cultures, yet their potential combinations can become numerous. Therefore, after the quantification of the above mentioned attributes, they are classified according to their importance for archaeological site location modeling. Under this new classification scheme, the user may select a geographic area of interest and extract only the important attributes for a specific archaeological type. These extracted attributes may then be queried against the entire spatial database and provide a location map of possible new archaeological sites. This novel type of querying is robust since the user does not have to type a standard SQL query but

  14. Superfund Chemical Data Matrix (SCDM) Query - April 2016

    EPA Pesticide Factsheets

    This site allows you to easily query the Superfund Chemical Data Matrix (SCDM) and generate a list of the corresponding Hazardous Ranking System (HRS) factor values, benchmarks, and data elements that you need.

  15. A Coding Method for Efficient Subgraph Querying on Vertex- and Edge-Labeled Graphs

    PubMed Central

    Zhu, Lei; Song, Qinbao; Guo, Yuchen; Du, Lei; Zhu, Xiaoyan; Wang, Guangtao

    2014-01-01

    Labeled graphs are widely used to model complex data in many domains, so subgraph querying has been attracting more and more attention from researchers around the world. Unfortunately, subgraph querying is very time consuming since it involves subgraph isomorphism testing that is known to be an NP-complete problem. In this paper, we propose a novel coding method for subgraph querying that is based on Laplacian spectrum and the number of walks. Our method follows the filtering-and-verification framework and works well on graph databases with frequent updates. We also propose novel two-step filtering conditions that can filter out most false positives and prove that the two-step filtering conditions satisfy the no-false-negative requirement (no dismissal in answers). Extensive experiments on both real and synthetic graphs show that, compared with six existing counterpart methods, our method can effectively improve the efficiency of subgraph querying. PMID:24853266

  16. A Lightweight I/O Scheme to Facilitate Spatial and Temporal Queries of Scientific Data Analytics

    NASA Technical Reports Server (NTRS)

    Tian, Yuan; Liu, Zhuo; Klasky, Scott; Wang, Bin; Abbasi, Hasan; Zhou, Shujia; Podhorszki, Norbert; Clune, Tom; Logan, Jeremy; Yu, Weikuan

    2013-01-01

    In the era of petascale computing, more scientific applications are being deployed on leadership scale computing platforms to enhance the scientific productivity. Many I/O techniques have been designed to address the growing I/O bottleneck on large-scale systems by handling massive scientific data in a holistic manner. While such techniques have been leveraged in a wide range of applications, they have not been shown as adequate for many mission critical applications, particularly in data post-processing stage. One of the examples is that some scientific applications generate datasets composed of a vast amount of small data elements that are organized along many spatial and temporal dimensions but require sophisticated data analytics on one or more dimensions. Including such dimensional knowledge into data organization can be beneficial to the efficiency of data post-processing, which is often missing from existing I/O techniques. In this study, we propose a novel I/O scheme named STAR (Spatial and Temporal AggRegation) to enable high performance data queries for scientific analytics. STAR is able to dive into the massive data, identify the spatial and temporal relationships among data variables, and accordingly organize them into an optimized multi-dimensional data structure before writing them to storage. This technique not only facilitates the common access patterns of data analytics, but also further reduces the application turnaround time. In particular, STAR is able to enable efficient data queries along the time dimension, a practice common in scientific analytics but not yet supported by existing I/O techniques. In our case study with a critical climate modeling application GEOS-5, the experimental results on the Jaguar supercomputer demonstrate an improvement of up to 73 times for the read performance compared to the original I/O method.

  17. Knowledge Acquisition of Generic Queries for Information Retrieval

    PubMed Central

    Seol, Yoon-Ho; Johnson, Stephen B.; Cimino, James J.

    2002-01-01

    Several studies have identified clinical questions posed by health care professionals to understand the nature of information needs during clinical practice. To support access to digital information sources, it is necessary to integrate the information needs with a computer system. We have developed a conceptual guidance approach in information retrieval, based on a knowledge base that contains the patterns of information needs. The knowledge base uses a formal representation of clinical questions based on the UMLS knowledge sources, called the Generic Query model. To improve the coverage of the knowledge base, we investigated a method for extracting plausible clinical questions from the medical literature. This poster presents the Generic Query model, shows how it is used to represent the patterns of clinical questions, and describes the framework used to extract knowledge from the medical literature.

  18. Query-based learning for aerospace applications.

    PubMed

    Saad, E W; Choi, J J; Vian, J L; Wunsch, D C Ii

    2003-01-01

    Models of real-world applications often include a large number of parameters with a wide dynamic range, which contributes to the difficulties of neural network training. Creating the training data set for such applications becomes costly, if not impossible. In order to overcome the challenge, one can employ an active learning technique known as query-based learning (QBL) to add performance-critical data to the training set during the learning phase, thereby efficiently improving the overall learning/generalization. The performance-critical data can be obtained using an inverse mapping called network inversion (discrete network inversion and continuous network inversion) followed by oracle query. This paper investigates the use of both inversion techniques for QBL, and introduces an original heuristic to select the inversion target values for the continuous network inversion method. Efficiency and generalization were further enhanced by employing node decoupled extended Kalman filter (NDEKF) training and a causality index (CI) as a means to reduce the input search dimensionality. The benefits of the overall QBL approach are experimentally demonstrated in two aerospace applications: a classification problem with large input space and a control distribution problem.

  19. Comparative Analysis of Online Health Queries Originating From Personal Computers and Smart Devices on a Consumer Health Information Portal

    PubMed Central

    Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen

    2014-01-01

    Background: The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. Objective: The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Methods: Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic’s consumer health information website. We performed analyses on “Queries with considering repetition counts (QwR)” and “Queries without considering repetition counts (QwoR)”. The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Results: Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are “Symptoms” (1 in 3 search queries), “Causes”, and “Treatments & Drugs”. The distribution of search queries for

  20. Correlation between National Influenza Surveillance Data and Search Queries from Mobile Devices and Desktops in South Korea

    PubMed Central

    Seo, Dong-Woo; Sohn, Chang Hwan; Kim, Sung-Hoon; Ryoo, Seung Mok; Lee, Yoon-Seon; Lee, Jae Ho; Kim, Won Young; Lim, Kyoung Soo

    2016-01-01

    Background: Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that the mobile search volume surpasses the desktop search volume and mobile search patterns differ from desktop search patterns, the previous digital surveillance systems did not distinguish mobile and desktop search queries. The purpose of this study was to compare the performance of mobile and desktop search queries in terms of digital influenza surveillance. Methods and Results: The study period was from September 6, 2010 through August 30, 2014, which consisted of four epidemiological years. Influenza-like illness (ILI) and virologic surveillance data from the Korea Centers for Disease Control and Prevention were used. A total of 210 combined queries from our previous survey work were used for this study. Mobile and desktop weekly search data were extracted from Naver, which is the largest search engine in Korea. Spearman’s correlation analysis was used to examine the correlation of the mobile and desktop data with ILI and virologic data in Korea. We also performed lag correlation analysis. We observed that the influenza surveillance performance of mobile search queries matched or exceeded that of desktop search queries over time. The mean correlation coefficients of mobile search queries and the number of queries with an r-value of ≥ 0.7 equaled or became greater than those of desktop searches over the four epidemiological years. A lag correlation analysis of up to two weeks showed similar trends. Conclusion: Our study shows that mobile search queries for influenza surveillance have equaled or even become greater than desktop search queries over time. In the future development of influenza surveillance using search queries, the recognition of changing trend of mobile search data could be necessary. PMID:27391028

  1. Correlation between National Influenza Surveillance Data and Search Queries from Mobile Devices and Desktops in South Korea.

    PubMed

    Shin, Soo-Yong; Kim, Taerim; Seo, Dong-Woo; Sohn, Chang Hwan; Kim, Sung-Hoon; Ryoo, Seung Mok; Lee, Yoon-Seon; Lee, Jae Ho; Kim, Won Young; Lim, Kyoung Soo

    2016-01-01

    Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that the mobile search volume surpasses the desktop search volume and mobile search patterns differ from desktop search patterns, the previous digital surveillance systems did not distinguish mobile and desktop search queries. The purpose of this study was to compare the performance of mobile and desktop search queries in terms of digital influenza surveillance. The study period was from September 6, 2010 through August 30, 2014, which consisted of four epidemiological years. Influenza-like illness (ILI) and virologic surveillance data from the Korea Centers for Disease Control and Prevention were used. A total of 210 combined queries from our previous survey work were used for this study. Mobile and desktop weekly search data were extracted from Naver, which is the largest search engine in Korea. Spearman's correlation analysis was used to examine the correlation of the mobile and desktop data with ILI and virologic data in Korea. We also performed lag correlation analysis. We observed that the influenza surveillance performance of mobile search queries matched or exceeded that of desktop search queries over time. The mean correlation coefficients of mobile search queries and the number of queries with an r-value of ≥ 0.7 equaled or became greater than those of desktop searches over the four epidemiological years. A lag correlation analysis of up to two weeks showed similar trends. Our study shows that mobile search queries for influenza surveillance have equaled or even become greater than desktop search queries over time. In the future development of influenza surveillance using search queries, the recognition of changing trend of mobile search data could be necessary.
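
    The lag correlation analysis described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' code: the weekly numbers are invented, the function names are hypothetical, and the choice to pair search volume in week t with ILI in week t + lag is an assumption about how the lag was applied.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def lagged_spearman(search, ili, lag):
    """Correlate search volume in week t with ILI in week t + lag (lag >= 0)."""
    if lag:
        search, ili = search[:-lag], ili[lag:]
    return spearman(search, ili)


# Hypothetical weekly values, invented for illustration only.
search_volume = [120, 150, 310, 560, 900, 700, 400, 210]
ili_rate = [3.1, 3.0, 4.2, 7.9, 12.5, 14.0, 9.8, 5.5]

for lag in (0, 1, 2):
    print(lag, round(lagged_spearman(search_volume, ili_rate, lag), 3))
```

    Counting how many queries reach a coefficient of at least 0.7, as the abstract reports, would then be a matter of applying `lagged_spearman` to each query's series in turn.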

  2. SPANG: a SPARQL client supporting generation and reuse of queries for distributed RDF databases.

    PubMed

    Chiba, Hirokazu; Uchiyama, Ikuo

    2017-02-08

    Toward improved interoperability of distributed biological databases, an increasing number of datasets have been published in the standardized Resource Description Framework (RDF). Although the powerful SPARQL Protocol and RDF Query Language (SPARQL) provides a basis for exploiting RDF databases, writing SPARQL code is burdensome for users including bioinformaticians. Thus, an easy-to-use interface is necessary. We developed SPANG, a SPARQL client that has unique features for querying RDF datasets. SPANG dynamically generates typical SPARQL queries according to specified arguments. It can also call SPARQL template libraries constructed in a local system or published on the Web. Further, it enables combinatorial execution of multiple queries, each with a distinct target database. These features facilitate easy and effective access to RDF datasets and integrative analysis of distributed data. SPANG helps users to exploit RDF datasets by generation and reuse of SPARQL queries through a simple interface. This client will enhance integrative exploitation of biological RDF datasets distributed across the Web. This software package is freely available at http://purl.org/net/spang .

  3. Usage of the Jess Engine, Rules and Ontology to Query a Relational Database

    NASA Astrophysics Data System (ADS)

    Bak, Jaroslaw; Jedrzejek, Czeslaw; Falkowski, Maciej

    We present a prototypical implementation of a library tool, the Semantic Data Library (SDL), which integrates the Jess (Java Expert System Shell) engine, rules and ontology to query a relational database. The tool extends functionalities of previous OWL2Jess with SWRL implementations and takes full advantage of the Jess engine, by separating forward and backward reasoning. The optimization of integration of all these technologies is an advancement over previous tools. We discuss the complexity of the query algorithm. As a demonstration of capability of the SDL library, we execute queries using crime ontology which is being developed in the Polish PPBW project.

  4. IQARIS : a tool for the intelligent querying, analysis, and retrieval from information systems.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hummel, J. R.; Silver, R. B.

    Information glut is one of the primary characteristics of the electronic age. Managing such large volumes of information (e.g., keeping track of the types, where they are, their relationships, who controls them, etc.) can be done efficiently with an intelligent, user-oriented information management system. The purpose of this paper is to describe a concept for managing information resources based on an intelligent information technology system developed by the Argonne National Laboratory for managing digital libraries. The Argonne system, Intelligent Query (IQ), enables users to query digital libraries and view the holdings that match the query from different perspectives.

  5. TreeQ-VISTA: An Interactive Tree Visualization Tool with Functional Annotation Query Capabilities

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gu, Shengyin; Anderson, Iain; Kunin, Victor

    2007-05-07

    Summary: We describe a general multiplatform exploratory tool called TreeQ-Vista, designed for presenting functional annotations in a phylogenetic context. Traits, such as phenotypic and genomic properties, are interactively queried from a relational database with a user-friendly interface which provides a set of tools for users with or without SQL knowledge. The query results are projected onto a phylogenetic tree and can be displayed in multiple color groups. A rich set of browsing, grouping and query tools are provided to facilitate trait exploration, comparison and analysis. Availability: The program, detailed tutorial and examples are available online at http://genome-test.lbl.gov/vista/TreeQVista.

  6. Visual graph query formulation and exploration: a new perspective on information retrieval at the edge

    NASA Astrophysics Data System (ADS)

    Kase, Sue E.; Vanni, Michelle; Knight, Joanne A.; Su, Yu; Yan, Xifeng

    2016-05-01

    Within operational environments decisions must be made quickly based on the information available. Identifying an appropriate knowledge base and accurately formulating a search query are critical tasks for decision-making effectiveness in dynamic situations. The spreading of graph data management tools to access large graph databases is a rapidly emerging research area of potential benefit to the intelligence community. A graph representation provides a natural way of modeling data in a wide variety of domains. Graph structures use nodes, edges, and properties to represent and store data. This research investigates the advantages of information search by graph query initiated by the analyst and interactively refined within the contextual dimensions of the answer space toward a solution. The paper introduces SLQ, a user-friendly graph querying system enabling the visual formulation of schemaless and structureless graph queries. SLQ is demonstrated with an intelligence analyst information search scenario focused on identifying individuals responsible for manufacturing a mosquito-hosted deadly virus. The scenario highlights the interactive construction of graph queries without prior training in complex query languages or graph databases, intuitive navigation through the problem space, and visualization of results in graphical format.

  7. Effective Filtering of Query Results on Updated User Behavioral Profiles in Web Mining

    PubMed Central

    Sadesh, S.; Suganthe, R. C.

    2015-01-01

    Web with tremendous volume of information retrieves result for user related queries. With the rapid growth of web page recommendation, results retrieved based on data mining techniques did not offer higher performance filtering rate because relationships between user profile and queries were not analyzed in an extensive manner. At the same time, existing user profile based prediction in web data mining is not exhaustive in producing personalized result rate. To improve the query result rate on dynamics of user behavior over time, Hamilton Filtered Regime Switching User Query Probability (HFRS-UQP) framework is proposed. HFRS-UQP framework is split into two processes, where filtering and switching are carried out. The data mining based filtering in our research work uses the Hamilton Filtering framework to filter user result based on personalized information on automatic updated profiles through search engine. Maximized result is fetched, that is, filtered out with respect to user behavior profiles. The switching performs accurate filtering updated profiles using regime switching. The updating in profile change (i.e., switches) regime in HFRS-UQP framework identifies the second- and higher-order association of query result on the updated profiles. Experiment is conducted on factors such as personalized information search retrieval rate, filtering efficiency, and precision ratio. PMID:26221626

  8. Analytics-Driven Lossless Data Compression for Rapid In-situ Indexing, Storing, and Querying

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jenkins, John; Arkatkar, Isha; Lakshminarasimhan, Sriram

    2013-01-01

    The analysis of scientific simulations is highly data-intensive and is becoming an increasingly important challenge. Peta-scale data sets require the use of light-weight query-driven analysis methods, as opposed to heavy-weight schemes that optimize for speed at the expense of size. This paper is an attempt in the direction of query processing over losslessly compressed scientific data. We propose a co-designed double-precision compression and indexing methodology for range queries by performing unique-value-based binning on the most significant bytes of double precision data (sign, exponent, and most significant mantissa bits), and inverting the resulting metadata to produce an inverted index over a reduced data representation. Without the inverted index, our method matches or improves compression ratios over both general-purpose and floating-point compression utilities. The inverted index is light-weight, and the overall storage requirement for both reduced column and index is less than 135%, whereas existing DBMS technologies can require 200-400%. As a proof-of-concept, we evaluate univariate range queries that additionally return column values, a critical component of data analytics, against state-of-the-art bitmap indexing technology, showing multi-fold query performance improvements.
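
    The binning idea in this abstract can be sketched as follows. This covers only the indexing half (no compression) and is not the authors' implementation: the two-byte prefix, the function names, and the restriction to non-negative finite values (so that byte order of the encoding matches numeric order) are all assumptions made for this example.

```python
import struct
from collections import defaultdict


def high_bytes(value, n_bytes=2):
    """Top bytes of the big-endian IEEE 754 encoding:
    sign, exponent, and the leading mantissa bits."""
    return struct.pack(">d", value)[:n_bytes]


def build_index(column, n_bytes=2):
    """Inverted index: each distinct high-byte prefix (a bin) maps to
    the row positions whose values share that prefix."""
    index = defaultdict(list)
    for row, value in enumerate(column):
        index[high_bytes(value, n_bytes)].append(row)
    return index


def range_query(column, index, lo, hi, n_bytes=2):
    """Return rows with lo <= value <= hi.

    Assumes the column, lo and hi are non-negative finite doubles
    (and no negative zero), so prefixes sort in numeric order.
    """
    key_lo, key_hi = high_bytes(lo, n_bytes), high_bytes(hi, n_bytes)
    rows = []
    for key, positions in index.items():
        if key_lo < key < key_hi:
            # Interior bin: every value in it lies inside the range.
            rows.extend(positions)
        elif key == key_lo or key == key_hi:
            # Boundary bin: values must be checked one by one.
            rows.extend(r for r in positions if lo <= column[r] <= hi)
    return sorted(rows)


# Hypothetical column of simulation output values.
column = [0.5, 1.0, 1.5, 2.0, 3.75, 100.0, 1e6]
index = build_index(column)
print(range_query(column, index, 1.0, 4.0))
```

    Only the two boundary bins require a look at the stored values; interior bins are answered from the index alone, which is where a scheme of this kind would be expected to save work.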

  9. Space Object Query Tool

    NASA Technical Reports Server (NTRS)

    Phillips, Veronica J.

    2017-01-01

    STI is for a fact sheet on the Space Object Query Tool being created by the MDC. When planning launches, NASA must first factor in the tens of thousands of objects already in orbit around the Earth. The number of human-made objects, including nonfunctional spacecraft, abandoned launch vehicle stages, mission-related debris and fragmentation debris orbiting Earth has grown steadily since Sputnik 1 was launched in 1957. Currently, the U.S. Department of Defense's Joint Space Operations Center, or JSpOC, tracks over 15,000 distinct objects and provides data for more than 40,000 objects via its Space-Track program, found at space-track.org.

  10. The Database Query Support Processor (QSP)

    NASA Technical Reports Server (NTRS)

    1993-01-01

    The number and diversity of databases available to users continues to increase dramatically. Currently, the trend is towards decentralized, client server architectures that (on the surface) are less expensive to acquire, operate, and maintain than information architectures based on centralized, monolithic mainframes. The database query support processor (QSP) effort evaluates the performance of a network level, heterogeneous database access capability. Air Force Material Command's Rome Laboratory has developed an approach, based on ANSI standard X3.138 - 1988, 'The Information Resource Dictionary System (IRDS)', to seamless access to heterogeneous databases based on extensions to data dictionary technology. To successfully query a decentralized information system, users must know what data are available from which source, or have the knowledge and system privileges necessary to find out this information. Privacy and security considerations prohibit free and open access to every information system in every network. Even in completely open systems, time required to locate relevant data (in systems of any appreciable size) would be better spent analyzing the data, assuming the original question was not forgotten. Extensions to data dictionary technology have the potential to more fully automate the search and retrieval for relevant data in a decentralized environment. Substantial amounts of time and money could be saved by not having to teach users what data resides in which systems and how to access each of those systems. Information describing data and how to get it could be removed from the application and placed in a dedicated repository where it belongs. The result is simplified applications that are less brittle and less expensive to build and maintain. Software technology providing the required functionality is off the shelf. The key difficulty is in defining the metadata required to support the process. The database query support processor effort will provide

  11. Unstructured medical image query using big data - An epilepsy case study.

    PubMed

    Istephan, Sarmad; Siadat, Mohammad-Reza

    2016-02-01

    Big data technologies are critical to the medical field which requires new frameworks to leverage them. Such frameworks would benefit medical experts to test hypotheses by querying huge volumes of unstructured medical data to provide better patient care. The objective of this work is to implement and examine the feasibility of having such a framework to provide efficient querying of unstructured data in unlimited ways. The feasibility study was conducted specifically in the epilepsy field. The proposed framework evaluates a query in two phases. In phase 1, structured data is used to filter the clinical data warehouse. In phase 2, feature extraction modules are executed on the unstructured data in a distributed manner via Hadoop to complete the query. Three modules have been created, volume comparer, surface to volume conversion and average intensity. The framework allows for user-defined modules to be imported to provide unlimited ways to process the unstructured data hence potentially extending the application of this framework beyond epilepsy field. Two types of criteria were used to validate the feasibility of the proposed framework - the ability/accuracy of fulfilling an advanced medical query and the efficiency that Hadoop provides. For the first criterion, the framework executed an advanced medical query that spanned both structured and unstructured data with accurate results. For the second criterion, different architectures were explored to evaluate the performance of various Hadoop configurations and were compared to a traditional Single Server Architecture (SSA). The surface to volume conversion module performed up to 40 times faster than the SSA (using a 20 node Hadoop cluster) and the average intensity module performed up to 85 times faster than the SSA (using a 40 node Hadoop cluster). Furthermore, the 40 node Hadoop cluster executed the average intensity module on 10,000 models in 3h which was not even practical for the SSA. The current study is

  12. A Random Walk Approach to Query Informative Constraints for Clustering.

    PubMed

    Abin, Ahmad Ali

    2017-08-09

    This paper presents a random walk approach to the problem of querying informative constraints for clustering. The proposed method is based on the properties of the commute time, that is, the expected time taken for a random walk to travel between two nodes and return, on the adjacency graph of the data. Commute time has the useful property that the more short paths connect two given nodes in a graph, the more similar those nodes are. Since computing the commute time takes the Laplacian eigenspectrum into account, we use this property in a recursive fashion to query informative constraints for clustering. At each recursion, the proposed method constructs the adjacency graph of the data and utilizes the spectral properties of the commute time matrix to bipartition the adjacency graph. Thereafter, the proposed method uses the commute-time distance on the graph to query informative constraints between partitions. This process iterates for each partition until the stop condition becomes true. Experiments on real-world data show the efficiency of the proposed method for constraint selection.
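
    The commute-time property described above can be illustrated with a minimal sketch. It is not the authors' method (which works with the spectrum of the commute time matrix); it assumes an unweighted, connected, undirected graph and uses the standard identity that commute time equals the graph volume (sum of degrees) times the effective resistance between the two nodes. The function name and the toy graph are hypothetical.

```python
from fractions import Fraction

def commute_time(adj, i, j):
    """Expected round-trip time of a random walk between nodes i and j.

    adj maps each node to the set of its neighbours (assumed: unweighted,
    undirected, connected graph without self-loops).
    Uses commute_time = volume(G) * effective_resistance(i, j).
    """
    if i == j:
        return Fraction(0)
    nodes = [n for n in adj if n != j]           # ground node j
    idx = {n: k for k, n in enumerate(nodes)}
    m = len(nodes)
    # Augmented matrix [reduced Laplacian | unit current injected at i].
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]
    for n in nodes:
        r = idx[n]
        A[r][r] = Fraction(len(adj[n]))
        for nb in adj[n]:
            if nb != j:
                A[r][idx[nb]] -= 1
    A[idx[i]][m] = Fraction(1)
    # Gauss-Jordan elimination with exact rational arithmetic.
    for c in range(m):
        p = next(r for r in range(c, m) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(m):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    r_eff = A[idx[i]][m]                         # potential at i = effective resistance
    volume = sum(len(nbrs) for nbrs in adj.values())
    return volume * r_eff

# Toy graph: a square 0-1-2-3-0 with a tail 2-4-5.
graph = {0: {1, 3}, 1: {0, 2}, 2: {1, 3, 4}, 3: {0, 2}, 4: {2, 5}, 5: {4}}
print(commute_time(graph, 0, 2))   # two 2-hop paths between 0 and 2
print(commute_time(graph, 2, 5))   # one 2-hop path between 2 and 5
```

    In this toy graph both node pairs are two hops apart, but the pair joined by two short paths has the smaller commute time, which is the behaviour the abstract relies on.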

  13. Query by forms: User-oriented relational database retrieving system and its application in analysis of experiment data

    NASA Astrophysics Data System (ADS)

    Skotniczny, Zbigniew

    1989-12-01

    The Query by Forms (QbF) system is a user-oriented interactive tool for querying large relational databases with minimal query definition cost. The system was developed under the assumption that the user's time and effort for defining needed queries is the most severe bottleneck. The system may be applied in any Rdb/VMS database system and is recommended for specific information systems of any project where end-user queries cannot be foreseen. The tool is dedicated to specialists of an application domain who have to analyze data maintained in a database from any needed point of view and who do not need to know commercial database languages. The paper presents the system developed as a compromise between its functionality and usability. User-system communication via a menu-driven "tree-like" structure of screen forms, which produces a query definition and execution, is discussed in detail. Output of query results (printed reports and graphics) is also discussed. Finally, the paper shows one application of QbF to a HERA project.

  14. Managing and Querying Image Annotation and Markup in XML.

    PubMed

    Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

    2010-01-01

    Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid.

  15. Managing and Querying Image Annotation and Markup in XML

    PubMed Central

    Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

    2010-01-01

    Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid. PMID:21218167

  16. TopFed: TCGA tailored federated query processing and linking to LOD.

    PubMed

    Saleem, Muhammad; Padmanabhuni, Shanmukha S; Ngomo, Axel-Cyrille Ngonga; Iqbal, Aftab; Almeida, Jonas S; Decker, Stefan; Deus, Helena F

    2014-01-01

    The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to catalogue genetic mutations responsible for cancer using genome analysis techniques. One of the aims of this project is to create a comprehensive and open repository of cancer related molecular analysis, to be exploited by bioinformaticians towards advancing cancer knowledge. However, devising bioinformatics applications to analyse such a large dataset is still challenging, as it often requires downloading large archives and parsing the relevant text files. This makes it difficult to enable virtual data integration in order to collect the critical co-variates necessary for analysis. We address these issues by transforming the TCGA data into the Semantic Web standard Resource Description Framework (RDF), linking it to relevant datasets in the Linked Open Data (LOD) cloud, and proposing an efficient data distribution strategy to host the resulting 20.4 billion triples via several SPARQL endpoints. Having the TCGA data distributed across multiple SPARQL endpoints, we enable biomedical scientists to query and retrieve information from these SPARQL endpoints by proposing a TCGA tailored federated SPARQL query processing engine named TopFed. We compare TopFed with a well established federation engine, FedX, in terms of source selection and query execution time by using 10 different federated SPARQL queries with varying requirements. Our evaluation results show that TopFed selects on average less than half of the sources (with 100% recall) with a query execution time one third of that of FedX. With TopFed, we aim to offer biomedical scientists a single-point-of-access through which distributed TCGA data can be accessed in unison. We believe the proposed system can greatly help researchers in the biomedical domain to carry out their research effectively with TCGA as the amount and diversity of data exceeds the ability of local resources to handle its retrieval and

  17. FastQuery: A Parallel Indexing System for Scientific Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chou, Jerry; Wu, Kesheng; Prabhat,

    2011-07-29

    Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies such as FastBit can significantly improve accesses to these datasets by augmenting the user data with indexes and other secondary information. However, a challenge is that the indexes assume the relational data model but the scientific data generally follows the array data model. To match the two data models, we design a generic mapping mechanism and implement an efficient input and output interface for reading and writing the data and their corresponding indexes. To take advantage of the emerging many-core architectures, we also develop a parallel strategy for indexing using threading technology. This approach complements our on-going MPI-based parallelization efforts. We demonstrate the flexibility of our software by applying it to two of the most commonly used scientific data formats, HDF5 and NetCDF. We present two case studies using data from a particle accelerator model and a global climate model. We also conducted a detailed performance study using these scientific datasets. The results show that FastQuery speeds up the query time by a factor of 2.5x to 50x, and it reduces the indexing time by a factor of 16 on 24 cores.

  18. Querying archetype-based EHRs by search ontology-based XPath engineering.

    PubMed

    Kropf, Stefan; Uciteli, Alexandr; Schierle, Katrin; Krücken, Peter; Denecke, Kerstin; Herre, Heinrich

    2018-05-11

    Legacy data and new structured data can be stored in a standardized format as XML-based EHRs on XML databases. Querying documents on these databases is crucial for answering research questions. Instead of using free text searches, which lead to false positive results, the precision can be increased by constraining the search to certain parts of documents. A search ontology-based specification of queries on XML documents defines search concepts and relates them to parts in the XML document structure. Such a query specification method is practically introduced and evaluated by applying concrete research questions formulated in natural language on a data collection for information retrieval purposes. The search is performed by search ontology-based XPath engineering that reuses ontologies and XML-related W3C standards. The key result is that the specification of research questions can be supported by the usage of search ontology-based XPath engineering. A deeper recognition of entities and a semantic understanding of the content are necessary for a further improvement of precision and recall. A key limitation is that the application of the introduced process requires skills in ontology and software development. In future, the time-consuming ontology development could be overcome by implementing a new clinical role: the clinical ontologist. The introduced Search Ontology XML extension connects Search Terms to certain parts in XML documents and enables an ontology-based definition of queries. Search ontology-based XPath engineering can support research question answering by the specification of complex XPath expressions without deep syntax knowledge about XPaths.
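
    The idea of constraining a search to certain parts of a document, rather than searching free text, can be sketched with the limited XPath subset in Python's standard library. The element and attribute names below are hypothetical and do not come from the archetype-based EHR format or the search ontology used in the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-EHR; real archetype-based documents are far richer.
DOC = """<ehr>
  <section name="diagnosis"><text>acute appendicitis</text></section>
  <section name="family_history"><text>mother had appendicitis</text></section>
</ehr>"""

def search(xml_text, term, section=None):
    """Return text nodes containing term, optionally limited to one section."""
    root = ET.fromstring(xml_text)
    if section is None:
        path = ".//section/text"                       # free-text style: all sections
    else:
        path = f".//section[@name='{section}']/text"   # constrained XPath
    return [el.text for el in root.findall(path) if term in (el.text or "")]

print(search(DOC, "appendicitis"))               # matches both sections
print(search(DOC, "appendicitis", "diagnosis"))  # matches the diagnosis only
```

    The unconstrained search also returns the family-history entry, a false positive for the question "which patients were diagnosed with appendicitis", while the path restricted to the diagnosis section does not.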

  19. Snaptron: querying splicing patterns across tens of thousands of RNA-seq samples

    PubMed Central

    Wilks, Christopher; Gaddipati, Phani; Nellore, Abhinav

    2018-01-01

    Motivation: As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. These enable researchers to leverage vast datasets that would otherwise be difficult to obtain. Results: Snaptron is a search engine for summarized RNA sequencing data with a query planner that leverages R-tree, B-tree and inverted indexing strategies to rapidly execute queries over 146 million exon-exon splice junctions from over 70 000 human RNA-seq samples. Queries can be tailored by constraining which junctions and samples to consider. Snaptron can score junctions according to tissue specificity or other criteria, and can score samples according to the relative frequency of different splicing patterns. We describe the software and outline biological questions that can be explored with Snaptron queries. Availability and implementation: Documentation is at http://snaptron.cs.jhu.edu. Source code is at https://github.com/ChristopherWilks/snaptron and https://github.com/ChristopherWilks/snaptron-experiments with a CC BY-NC 4.0 license. Contact: chris.wilks@jhu.edu or langmea@cs.jhu.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28968689

  20. Snaptron: querying splicing patterns across tens of thousands of RNA-seq samples.

    PubMed

    Wilks, Christopher; Gaddipati, Phani; Nellore, Abhinav; Langmead, Ben

    2018-01-01

    As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. These enable researchers to leverage vast datasets that would otherwise be difficult to obtain. Snaptron is a search engine for summarized RNA sequencing data with a query planner that leverages R-tree, B-tree and inverted indexing strategies to rapidly execute queries over 146 million exon-exon splice junctions from over 70 000 human RNA-seq samples. Queries can be tailored by constraining which junctions and samples to consider. Snaptron can score junctions according to tissue specificity or other criteria, and can score samples according to the relative frequency of different splicing patterns. We describe the software and outline biological questions that can be explored with Snaptron queries. Documentation is at http://snaptron.cs.jhu.edu. Source code is at https://github.com/ChristopherWilks/snaptron and https://github.com/ChristopherWilks/snaptron-experiments with a CC BY-NC 4.0 license. Contact: chris.wilks@jhu.edu or langmea@cs.jhu.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  1. Properties of concrete with tire derived aggregate and crumb rubber as a lightweight substitute for mineral aggregates in the concrete mix

    NASA Astrophysics Data System (ADS)

    Siringi, Gideon Momanyi

    Scrap tires continue to be a nuisance to the environment and this research proposes one way of recycling them as a lightweight aggregate which can substitute for mineral aggregates in concrete. Aggregates derived from scrap tires are often referred to as Tire Derived Aggregate (TDA). First, the focus is how much mineral aggregate can be replaced by these waste tires and how the properties of concrete are affected by the introduction of rubber. This is being mindful of the fact that for a new material to be acceptable as an engineering material, its properties and behavior have to be well understood, and the material must perform properly and be acceptable to the regulating agencies. The role played by the quantity of TDA and Crumb Rubber replacing coarse aggregate and fine aggregate respectively, as well as by different treatments and additives in concrete, on its properties is examined. Conventional concrete (without TDA) and concrete containing TDA are compared by examining their compressive strength based on ASTM C39, workability based on ASTM C143, Splitting Tensile Strength based on ASTM C496, Modulus of Rupture (flexural strength) based on ASTM C78 and Bond strength of concrete developed with reinforcing steel based on ASTM C234. Through stress-strain plots, the rubberized concrete is compared in terms of change in ductility, toughness and Elastic Modulus. Results indicate that while replacement of mineral aggregates with TDA results in reduction in compressive strength, this may be mitigated by addition of silica fume or using a smaller size of TDA to obtain the desired strength. The greatest benefit of using TDA is in the development of a more ductile product with lower density while utilizing recycled TDA. From the results, it is observed that 7-10% of weight of mineral aggregates can be replaced by an equal volume of TDA to produce concrete with compressive strength of up to 4000 psi (27.5 MPa). Rubberized concrete would have higher ductility and toughness with

  2. Azaphilones inhibit tau aggregation and dissolve tau aggregates in vitro.

    PubMed

    Paranjape, Smita R; Riley, Andrew P; Somoza, Amber D; Oakley, C Elizabeth; Wang, Clay C C; Prisinzano, Thomas E; Oakley, Berl R; Gamblin, T Chris

    2015-05-20

    The aggregation of the microtubule-associated protein tau is a seminal event in many neurodegenerative diseases, including Alzheimer's disease. The inhibition or reversal of tau aggregation is therefore a potential therapeutic strategy for these diseases. Fungal natural products have proven to be a rich source of useful compounds having wide varieties of biological activities. We have previously screened Aspergillus nidulans secondary metabolites for their ability to inhibit tau aggregation in vitro using an arachidonic acid polymerization protocol. One aggregation inhibitor identified was asperbenzaldehyde, an intermediate in azaphilone biosynthesis. We therefore tested 11 azaphilone derivatives to determine their tau assembly inhibition properties in vitro. All compounds tested inhibited tau filament assembly to some extent, and four of the 11 compounds had the advantageous property of disassembling preformed tau aggregates in a dose-dependent fashion. The addition of these compounds to the tau aggregates reduced both the total length and number of tau polymers. The most potent compounds were tested in in vitro reactions to determine whether they interfere with tau's normal function of stabilizing microtubules (MTs). We found that they did not completely inhibit MT assembly in the presence of tau. These derivatives are very promising lead compounds for tau aggregation inhibitors and, more excitingly, for compounds that can disassemble pre-existing tau filaments. They also represent a new class of anti-tau aggregation compounds with a novel structural scaffold.

  3. A "Simple Query Interface" Adapter for the Discovery and Exchange of Learning Resources

    ERIC Educational Resources Information Center

    Massart, David

    2006-01-01

    Developed as part of CEN/ISSS Workshop on Learning Technology efforts to improve interoperability between learning resource repositories, the Simple Query Interface (SQI) is an Application Program Interface (API) for querying heterogeneous repositories of learning resource metadata. In the context of the ProLearn Network of Excellence, SQI is used…

  4. Quantum Private Query Based on Bell State and Single Photons

    NASA Astrophysics Data System (ADS)

    Gao, Xiang; Chang, Yan; Zhang, Shi-Bin; Yang, Fan; Zhang, Yan

    2018-03-01

    Quantum private query (QPQ) can protect both user's and database holder's privacy. In this paper, we propose a novel quantum private query protocol based on Bell state and single photons. As far as we know, no one has ever proposed the QPQ based on Bell state. By using the decoherence-free (DF) states, our protocol can resist the collective noise. Besides that, our protocol is a one-way quantum protocol, which can resist the Trojan horse attack and reduce the communication complexity. Our protocol can not only guarantee the participants' privacy but also stand against an external eavesdropper.

  5. Comparing NetCDF and SciDB on managing and querying 5D hydrologic dataset

    NASA Astrophysics Data System (ADS)

    Liu, Haicheng; Xiao, Xiao

    2016-11-01

    Efficiently extracting information from high dimensional hydro-meteorological modelling datasets requires smart solutions. Traditional methods are mostly based on files, which can be edited and accessed handily, but they suffer from efficiency problems due to their contiguous storage structure. Others propose databases as an alternative for advantages such as native functionalities for manipulating multidimensional (MD) arrays, smart caching strategy and scalability. In this research, NetCDF file based solutions and the multidimensional array database management system (DBMS) SciDB, which applies a chunked storage structure, are benchmarked to determine the best solution for storing and querying a large 5D hydrologic modelling dataset. The effect of data storage configurations including chunk size, dimension order and compression on query performance is explored. Results indicate that the dimension order used to organize storage of 5D data has significant influence on query performance if chunk size is very large. But the effect becomes insignificant when chunk size is properly set. Compression in SciDB mostly has a negative influence on query performance. Caching is an advantage but may be influenced by execution of different query processes. On the whole, the NetCDF solution without compression is in general more efficient than the SciDB DBMS.
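
    Why chunk shape matters for query performance can be shown with a small sketch that counts how many chunks a hyper-rectangular query touches. This is an illustrative model only; it does not use the NetCDF or SciDB APIs, and the array layout, chunk shapes and function name are assumptions.

```python
from math import prod

def chunks_touched(query, chunk_shape):
    """Number of chunks intersected by a hyper-rectangular query.

    query: one half-open (start, stop) index range per dimension.
    chunk_shape: chunk length along each dimension (chunks start at index 0).
    """
    return prod(
        (stop - 1) // size - start // size + 1
        for (start, stop), size in zip(query, chunk_shape)
    )

# Hypothetical 5D array (time, and four further dimensions); the query reads
# a full time series of 1000 steps at a single point in the other dimensions.
time_series = [(0, 1000), (0, 1), (0, 1), (0, 1), (0, 1)]

time_major_chunks = (1000, 1, 1, 1, 1)    # chunk elongated along time
space_major_chunks = (1, 10, 10, 10, 10)  # chunk holds one time step

print(chunks_touched(time_series, time_major_chunks))
print(chunks_touched(time_series, space_major_chunks))
```

    Under these assumptions the same query reads a single chunk with the first layout and one chunk per time step with the second, which is one way a poorly matched chunk configuration can dominate query cost.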

  6. SEQUOIA: significance enhanced network querying through context-sensitive random walk and minimization of network conductance.

    PubMed

    Jeong, Hyundoo; Yoon, Byung-Jun

    2017-03-14

    Network querying algorithms provide computational means to identify conserved network modules in large-scale biological networks that are similar to known functional modules, such as pathways or molecular complexes. Two main challenges for network querying algorithms are the high computational complexity of detecting potential isomorphism between the query and the target graphs and ensuring the biological significance of the query results. In this paper, we propose SEQUOIA, a novel network querying algorithm that effectively addresses these issues by utilizing a context-sensitive random walk (CSRW) model for network comparison and minimizing the network conductance of potential matches in the target network. The CSRW model, inspired by the pair hidden Markov model (pair-HMM) that has been widely used for sequence comparison and alignment, can accurately assess the node-to-node correspondence between different graphs by accounting for node insertions and deletions. The proposed algorithm identifies high-scoring network regions based on the CSRW scores, which are subsequently extended by maximally reducing the network conductance of the identified subnetworks. Performance assessment based on real PPI networks and known molecular complexes show that SEQUOIA outperforms existing methods and clearly enhances the biological significance of the query results. The source code and datasets can be downloaded from http://www.ece.tamu.edu/~bjyoon/SEQUOIA .
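
    The network conductance that the extension step minimizes can be computed directly from an adjacency structure. The sketch below uses the common definition (edges leaving the subnetwork divided by the smaller of the two volumes) on an unweighted graph; it is not taken from the SEQUOIA source code, and the toy network is hypothetical.

```python
def conductance(adj, subset):
    """Conductance of a node subset in an undirected, unweighted graph.

    adj maps each node to the set of its neighbours.
    conductance = cut(S, V-S) / min(vol(S), vol(V-S)), where vol is the
    sum of node degrees; lower values indicate a better-separated module.
    """
    subset = set(subset)
    cut = sum(1 for u in subset for v in adj[u] if v not in subset)
    vol_in = sum(len(adj[u]) for u in subset)
    vol_out = sum(len(adj[u]) for u in adj if u not in subset)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0

# Toy network: two triangles joined by the single edge 2-3.
network = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(conductance(network, {0, 1}))      # partial module
print(conductance(network, {0, 1, 2}))   # full triangle
```

    Extending the partial match {0, 1} by node 2 lowers the conductance, which mirrors the idea of growing a high-scoring region until it forms a well-separated subnetwork.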

  7. Querying Semi-Structured Data

    NASA Technical Reports Server (NTRS)

    Abiteboul, Serge

    1997-01-01

    The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in the systems to highly structured data in relational database systems. Data is accessible through a variety of interfaces including Web browsers, database query languages, application-specific interfaces, or data exchange formats. Some of this data is raw data, e.g., images or sound. Some of it has structure even if the structure is often implicit, and not as rigid or regular as that found in standard database systems. Sometimes the structure exists but has to be extracted from the data. Sometimes also it exists but we prefer to ignore it for certain purposes such as browsing. We call semi-structured data here the data that is (from a particular viewpoint) neither raw data nor strictly typed, i.e., not table-oriented as in a relational model or sorted-graph as in object databases. As will be seen later when the notion of semi-structured data is more precisely defined, the need for semi-structured data arises naturally in the context of data integration, even when the data sources are themselves well-structured. Although data integration is an old topic, the need to integrate a wider variety of data formats (e.g., SGML or ASN.1 data) and data found on the Web has brought the topic of semi-structured data to the forefront of research. The main purpose of the paper is to isolate the essential aspects of semi-structured data. We also survey some proposals of models and query languages for semi-structured data. In particular, we consider recent works at Stanford U. and U. Penn on semi-structured data. In both cases, the motivation is found in the integration of heterogeneous data.

  8. Concept locator: a client-server application for retrieval of UMLS metathesaurus concepts through complex boolean query.

    PubMed

    Nadkarni, P M

    1997-08-01

    Concept Locator (CL) is a client-server application that accesses a Sybase relational database server containing a subset of the UMLS Metathesaurus for the purpose of retrieval of concepts corresponding to one or more query expressions supplied to it. CL's query grammar permits complex Boolean expressions, wildcard patterns, and parenthesized (nested) subexpressions. CL translates the query expressions supplied to it into one or more SQL statements that actually perform the retrieval. The generated SQL is optimized by the client to take advantage of the strengths of the server's query optimizer, and sidesteps its weaknesses, so that execution is reasonably efficient.
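
    The translation of a nested Boolean expression with wildcards into SQL can be sketched as follows. This is an illustration under stated assumptions, not Concept Locator's grammar or its generated SQL: the tuple-based expression format, the table and column names, and the use of SQLite instead of Sybase are all hypothetical.

```python
import sqlite3

def to_sql(expr):
    """Translate a nested Boolean expression into a WHERE fragment and parameters.

    expr is either a term string (where * is a wildcard) or a tuple:
    ("AND", left, right), ("OR", left, right) or ("NOT", operand).
    """
    if isinstance(expr, str):
        return "name LIKE ?", [expr.replace("*", "%")]   # * becomes SQL %
    op = expr[0]
    if op == "NOT":
        sql, params = to_sql(expr[1])
        return f"NOT ({sql})", params
    if op in ("AND", "OR"):
        left_sql, left_params = to_sql(expr[1])
        right_sql, right_params = to_sql(expr[2])
        return f"({left_sql} {op} {right_sql})", left_params + right_params
    raise ValueError(f"unknown operator: {op!r}")

def find_concepts(conn, expr):
    """Run the translated query against the hypothetical concepts table."""
    where, params = to_sql(expr)
    rows = conn.execute(
        f"SELECT name FROM concepts WHERE {where} ORDER BY name", params
    )
    return [name for (name,) in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concepts (name TEXT)")
conn.executemany(
    "INSERT INTO concepts VALUES (?)",
    [("heart failure",), ("heart attack",), ("renal failure",)],
)
query = ("AND", "heart*", ("NOT", "*attack"))
print(to_sql(query))
print(find_concepts(conn, query))
```

    Search terms are passed as bound parameters rather than spliced into the statement, so only the operator structure is generated as SQL text.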

  9. An experimental study on the hazard assessment and mechanical properties of porous concrete utilizing coal bottom ash coarse aggregate in Korea.

    PubMed

    Park, Seung Bum; Jang, Young Il; Lee, Jun; Lee, Byung Jae

    2009-07-15

    This study evaluates the quality properties and toxicity of coal bottom ash coarse aggregate and analyzes the mechanical properties of porous concrete depending on the mixing rate of coal bottom ash. The soundness and abrasion resistance of the coal bottom ash coarse aggregate satisfied the standard for coarse aggregate for concrete. To satisfy the standard pertaining to chloride content, the coarse aggregates have to be washed more than twice. The leaching tests for the coal bottom ash coarse aggregate and for porous concrete produced with these coarse aggregates satisfied the environmental criteria. As the mixing rate of coal bottom ash increased, the void ratio and permeability coefficient were affected very little, but compressive and flexural strength decreased. When coal bottom ash was mixed at over 40%, strength decreased sharply (compressive strength: by 11.7-27.1%, flexural strength: by a maximum of 26.4%). Also, as the mixing rate of coal bottom ash increased, it was confirmed that test specimens failed by aggregate fracture more than by binder fracture and interface fracture. To utilize coal bottom ash in large quantities, methods for improving strength, such as incorporation of reinforcing materials and improvement of aggregate hardness, need to be considered.

  10. Marine Synechococcus Aggregation

    NASA Astrophysics Data System (ADS)

    Neuer, S.; Deng, W.; Cruz, B. N.; Monks, L.

    2016-02-01

    Cyanobacteria are considered to play an important role in the oceanic biological carbon pump, especially in oligotrophic regions. But as single cells are too small to sink, their carbon export has to be mediated by aggregate formation and possible consumption by zooplankton producing sinking fecal pellets. Here we report results on the aggregation of the ubiquitous marine pico-cyanobacterium Synechococcus as a model organism. We first investigated the mechanism behind such aggregation by studying the potential role of transparent exopolymeric particles (TEP) and the effects of nutrient (nitrogen or phosphorus) limitation on the TEP production and aggregate formation of these pico-cyanobacteria. We further studied the aggregation and subsequent settling in roller tanks and investigated the effects of the clays kaolinite and bentonite in a series of concentrations. Our results show that despite the lowered growth rates, Synechococcus in nutrient limited cultures had larger cell-normalized TEP production, formed a greater volume of aggregates, and resulted in higher settling velocities compared to results from replete cultures. In addition, we found that despite their small size and lack of natural ballasting minerals, Synechococcus cells could still form aggregates and sink at measurable velocities in seawater. Clay minerals increased the number and reduced the size of aggregates, and their ballasting effects increased the sinking velocity and carbon export potential of aggregates. In comparison with Synechococcus, we will also present results of the aggregation of the pico-cyanobacterium Prochlorococcus in roller tanks. These results contribute to our understanding of the physiology of marine Synechococcus as well as their role in the ecology and biogeochemistry of oligotrophic oceans.

  11. Experimental volcanic ash aggregation: Internal structuring of accretionary lapilli and the role of liquid bonding

    NASA Astrophysics Data System (ADS)

    Mueller, Sebastian B.; Kueppers, Ulrich; Ayris, Paul M.; Jacob, Michael; Dingwell, Donald B.

    2016-01-01

    Explosive volcanic eruptions can release vast quantities of pyroclastic material into Earth's atmosphere, including volcanic ash, particles with diameters less than two millimeters. Ash particles can cluster together to form aggregates, in some cases reaching up to several centimeters in size. Aggregation alters ash transport and settling behavior compared to un-aggregated particles, influencing ash distribution and deposit stratigraphy. Accretionary lapilli, the most commonly preserved type of aggregates within the geologic record, can exhibit complex internal stratigraphy. The processes involved in the formation and preservation of these aggregates remain poorly constrained quantitatively. In this study, we simulate the variable gas-particle flow conditions which may be encountered within eruption plumes and pyroclastic density currents via laboratory experiments using the ProCell Lab System® of Glatt Ingenieurtechnik GmbH. In this apparatus, solid particles are set into motion in a fluidized bed over a range of well-controlled boundary conditions (particle concentration, air flow rate, gas temperature, humidity, liquid composition). Experiments were conducted with soda-lime glass beads and natural volcanic ash particles under a range of experimental conditions. Both glass beads and volcanic ash exhibited the capacity for aggregation, but stable aggregates could only be produced when materials were coated with high but volcanically-relevant concentrations of NaCl. The growth and structure of aggregates was dependent on the initial granulometry, while the rate of aggregate formation increased exponentially with increasing relative humidity (12-45% RH), before overwetting promoted mud droplet formation. Notably, by use of a broad granulometry, we generated spherical, internally structured aggregates similar to some accretionary pellets found in volcanic deposits. Adaptation of a powder-technology model offers an explanation for the origin of natural accretionary

  12. The design and implementation of image query system based on color feature

    NASA Astrophysics Data System (ADS)

    Yao, Xu-Dong; Jia, Da-Chun; Li, Lin

    2013-07-01

    ASP.NET technology was used to construct a B/S (browser/server) mode image query system. The theory and technology of database design, color feature extraction from images, and indexing and retrieval for the image repository were studied. The system was tested in campus LAN and WAN environments. The test results indicate that the system architecture design meets users' needs for querying related resources.

  13. Extending the Query Language of a Data Warehouse for Patient Recruitment.

    PubMed

    Dietrich, Georg; Ertl, Maximilian; Fette, Georg; Kaspar, Mathias; Krebs, Jonathan; Mackenrodt, Daniel; Störk, Stefan; Puppe, Frank

    2017-01-01

    Patient recruitment for clinical trials is a laborious task, as many texts have to be screened. Usually, this work is done manually and takes a lot of time. We have developed a system that automates the screening process. Besides standard keyword queries, the query language supports extraction of numbers, time-spans and negations. In a feasibility study for patient recruitment from a stroke unit with 40 patients, we achieved encouraging extraction rates above 95% for numbers and negations and ca. 86% for time spans.

  14. Quantity Stickiness versus Stackelberg Leadership

    NASA Astrophysics Data System (ADS)

    Ferreira, F. A.

    2008-10-01

    We study the endogenous Stackelberg relations in a dynamic market. We analyze a twice-repeated duopoly where, in the beginning, each firm chooses either a quantity-sticky production mode or a quantity-flexible production mode. The size of the market becomes observable after the first period. In the second period, a firm can adjust its quantity if, and only if, it has adopted the flexible mode. Hence, if one firm chooses the sticky mode whilst the other chooses the flexible mode, then they respectively play the roles of a Stackelberg leader and a Stackelberg follower in the second marketing period. We compute the supply quantities at equilibrium and the corresponding expected profits of the firms. We also analyze the effect of the slope parameter of the demand curve on the expected supply quantities and on the profits.

  15. NEOview: Near Earth Object Data Discovery and Query

    NASA Astrophysics Data System (ADS)

    Tibbetts, M.; Elvis, M.; Galache, J. L.; Harbo, P.; McDowell, J. C.; Rudenko, M.; Van Stone, D.; Zografou, P.

    2013-10-01

    Missions to Near Earth Objects (NEOs) figure prominently in NASA's Flexible Path approach to human space exploration. NEOs offer insight into both the origins of the Solar System and of life, as well as a source of materials for future missions. With NEOview scientists can locate NEO datasets, explore metadata provided by the archives, and query or combine disparate NEO datasets in the search for NEO candidates for exploration. NEOview is a software system that illustrates how standards-based interfaces facilitate NEO data discovery and research. NEOview software follows a client-server architecture. The server is a configurable implementation of the International Virtual Observatory Alliance (IVOA) Table Access Protocol (TAP), a general interface for tabular data access, that can be deployed as a front end to existing NEO datasets. The TAP client, seleste, is a graphical interface that provides intuitive means of discovering NEO providers, exploring dataset metadata to identify fields of interest, and constructing queries to retrieve or combine data. It features a powerful, graphical query builder capable of easing the user's introduction to table searches. Through science use cases, NEOview demonstrates how potential targets for NEO rendezvous could be identified by combining data from complementary sources. Through deployment and operations, it has been shown that the software components are data independent and configurable to many different data servers. As such, NEOview's TAP server and seleste TAP client can be used to create a seamless environment for data discovery and exploration for tabular data in any astronomical archive.

  16. Bin-Hash Indexing: A Parallel Method for Fast Query Processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bethel, Edward W; Gosink, Luke J.; Wu, Kesheng

    2008-06-27

    This paper presents a new parallel indexing data structure for answering queries. The index, called Bin-Hash, offers extremely high levels of concurrency, and is therefore well-suited for the emerging commodity of parallel processors, such as multi-cores, cell processors, and general purpose graphics processing units (GPU). The Bin-Hash approach first bins the base data, and then partitions and separately stores the values in each bin as a perfect spatial hash table. To answer a query, we first determine whether or not a record satisfies the query conditions based on the bin boundaries. For the bins with records that cannot be resolved, we examine the spatial hash tables. The procedures for examining the bin numbers and the spatial hash tables offer the maximum possible level of concurrency; all records are able to be evaluated by our procedure independently in parallel. Additionally, our Bin-Hash procedures access much smaller amounts of data than similar parallel methods, such as the projection index. This smaller data footprint is critical for certain parallel processors, like GPUs, where memory resources are limited. To demonstrate the effectiveness of Bin-Hash, we implement it on a GPU using the data-parallel programming language CUDA. The concurrency offered by the Bin-Hash index allows us to fully utilize the GPU's massive parallelism in our work; over 12,000 records can be simultaneously evaluated at any one time. We show that our new query processing method is an order of magnitude faster than current state-of-the-art CPU-based indexing technologies. Additionally, we compare our performance to existing GPU-based projection index strategies.
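
    The two-stage lookup described in this abstract (resolve what can be resolved from bin boundaries, then inspect stored values only for the remaining bins) can be illustrated with a small serial sketch. This is not the authors' implementation: the function names `build_index` and `range_query` are hypothetical, a plain Python dict stands in for the perfect spatial hash table, and nothing here runs in parallel or on a GPU.

```python
from bisect import bisect_right


def build_index(values, edges):
    """Bin every record; each bin keeps its own {record_id: value} table.

    Assumption: a plain dict stands in for the per-bin perfect spatial
    hash table named in the abstract. Bin b covers [edges[b], edges[b+1]).
    """
    bins = [dict() for _ in range(len(edges) - 1)]
    for rid, v in enumerate(values):
        b = bisect_right(edges, v) - 1
        if b < 0 or b >= len(bins):
            raise ValueError(f"value {v!r} lies outside the bin edges")
        bins[b][rid] = v
    return bins


def range_query(bins, edges, lo, hi):
    """Return the sorted record ids whose value satisfies lo <= value < hi."""
    hits = []
    for b, table in enumerate(bins):
        b_lo, b_hi = edges[b], edges[b + 1]
        if b_hi <= lo or b_lo >= hi:
            continue  # bin lies entirely outside the range: no record can match
        if lo <= b_lo and b_hi <= hi:
            hits.extend(table)  # bin lies entirely inside: resolved by boundaries alone
        else:
            # Boundary bin: the stored values must be examined record by record.
            hits.extend(rid for rid, v in table.items() if lo <= v < hi)
    return sorted(hits)


values = [0.5, 1.5, 2.5, 3.5, 2.1, 0.9]
edges = [0, 1, 2, 3, 4]
index = build_index(values, edges)
print(range_query(index, edges, 0.8, 2.2))
```

    In this sketch only the two boundary bins touch stored values; the bin fully inside the range is answered from its edges alone, which is the source of the smaller data footprint the abstract mentions. Each record check is independent of the others, so the loops could in principle be evaluated in parallel.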

  17. Query-seeded iterative sequence similarity searching improves selectivity 5–20-fold

    PubMed Central

    Li, Weizhong; Lopez, Rodrigo

    2017-01-01

    Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic of a protein family. But models are subject to contamination; once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of alignment errors during psiblast PSSM contamination suggested a simple strategy for dramatically reducing PSSM contamination. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM, HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequence, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increase target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity. PMID:27923999

  18. Random and Directed Walk-Based Top-k Queries in Wireless Sensor Networks

    PubMed Central

    Fu, Jun-Song; Liu, Yun

    2015-01-01

    In wireless sensor networks, filter-based top-k query approaches are the state-of-the-art solutions and have been extensively researched in the literature; however, they are very sensitive to the network parameters, including the size of the network, dynamics of the sensors’ readings and declines in the overall range of all the readings. In this work, a random walk-based top-k query approach called RWTQ and a directed walk-based top-k query approach called DWTQ are proposed. At the beginning of a top-k query, one or several tokens are sent to the specific node(s) in the network by the base station. Then, each token walks in the network independently to record and process the readings in a random or directed way. A strategy of choosing the “right” way in DWTQ is carefully designed for the token(s) to arrive at the high-value regions as soon as possible. When designing the walking strategy for DWTQ, the spatial correlations of the readings are also considered. Theoretical analysis and simulation results indicate that RWTQ and DWTQ are both very robust against the parameters discussed above. In addition, DWTQ outperforms TAG, FILA and EXTOK in transmission cost, energy consumption and network lifetime. PMID:26016914
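
    The idea of a token that walks the network and records readings as it goes can be sketched as follows. This is a generic random walk with a bounded min-heap, not the RWTQ or DWTQ algorithm from the paper, whose walking strategies are not specified in the abstract; the function name `random_walk_top_k`, the adjacency-list graph, and the fixed step budget are all assumptions made for illustration.

```python
import heapq
import random


def random_walk_top_k(neighbors, readings, start, k, steps, seed=0):
    """Walk a token through the network and keep the k largest readings seen.

    neighbors: dict mapping node -> list of adjacent nodes (hypothetical layout)
    readings:  dict mapping node -> sensed value
    Assumes k >= 1. The answer is approximate: nodes the token never
    visits cannot appear in the result.
    """
    rng = random.Random(seed)
    seen = {start}
    top = [(readings[start], start)]  # min-heap of (reading, node)
    node = start
    for _ in range(steps):
        node = rng.choice(neighbors[node])  # move the token to a random neighbor
        if node in seen:
            continue
        seen.add(node)
        item = (readings[node], node)
        if len(top) < k:
            heapq.heappush(top, item)
        elif item > top[0]:
            heapq.heapreplace(top, item)  # evict the smallest retained reading
    return sorted(top, reverse=True)


ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
temps = {0: 20.5, 1: 23.1, 2: 19.8, 3: 25.4, 4: 22.0, 5: 21.7}
print(random_walk_top_k(ring, temps, start=0, k=3, steps=50))
```

    A directed variant would replace the `rng.choice` step with a rule that prefers neighbors expected to hold high values, which is the role the abstract assigns to DWTQ.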

  19. An SSVEP-Based Brain-Computer Interface for Text Spelling With Adaptive Queries That Maximize Information Gain Rates.

    PubMed

    Akce, Abdullah; Norton, James J S; Bretl, Timothy

    2015-09-01

    This paper presents a brain-computer interface for text entry using steady-state visually evoked potentials (SSVEP). Like other SSVEP-based spellers, ours identifies the desired input character by posing questions (or queries) to users through a visual interface. Each query defines a mapping from possible characters to steady-state stimuli. The user responds by attending to one of these stimuli. Unlike other SSVEP-based spellers, ours chooses from a much larger pool of possible queries, on the order of ten thousand instead of ten. The larger query pool allows our speller to adapt more effectively to the inherent structure of what is being typed and to the input performance of the user, both of which make certain queries provide more information than others. In particular, our speller chooses queries from this pool that maximize the amount of information to be received per unit of time, a measure of mutual information that we call information gain rate. To validate our interface, we compared it with two other state-of-the-art SSVEP-based spellers, which were re-implemented to use the same input mechanism. Results showed that our interface, with the larger query pool, allowed users to spell multiple-word texts nearly twice as fast as they could with the compared spellers.
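
    To make the selection criterion concrete, the sketch below scores each candidate query by information received per unit of time and picks the best one. It rests on a strong simplifying assumption that is not in the abstract: the user's response is noiseless, so the mutual information between the intended character and the response reduces to the entropy of the stimulus distribution induced by the query. The names `information_gain_rate` and `best_query`, and the single shared query duration, are hypothetical.

```python
from collections import defaultdict
from math import log2


def information_gain_rate(prior, query, duration_s):
    """Bits per second gained by one query, assuming a noiseless response.

    prior: dict mapping character -> probability that it is the intended one
    query: dict mapping character -> stimulus index shown for that character
    """
    stimulus_prob = defaultdict(float)
    for char, p in prior.items():
        stimulus_prob[query[char]] += p
    # With no response noise, I(character; response) equals H(stimulus).
    entropy_bits = -sum(p * log2(p) for p in stimulus_prob.values() if p > 0)
    return entropy_bits / duration_s


def best_query(prior, queries, duration_s):
    """Pick the candidate query with the highest information gain rate."""
    return max(queries, key=lambda q: information_gain_rate(prior, q, duration_s))


prior = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
balanced = {"a": 0, "b": 0, "c": 1, "d": 1}
skewed = {"a": 0, "b": 1, "c": 1, "d": 1}
print(information_gain_rate(prior, balanced, duration_s=2.0))
print(best_query(prior, [skewed, balanced], duration_s=2.0) is balanced)
```

    A non-uniform prior, for example from a language model over what is being typed, changes which mapping is most informative; that dependence is why a large query pool can help.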

  20. Federated Space-Time Query for Earth Science Data Using OpenSearch Conventions

    NASA Technical Reports Server (NTRS)

    Lynnes, Chris; Beaumont, Bruce; Duerr, Ruth; Hua, Hook

    2009-01-01

    This slide presentation reviews a Space-Time Query system that has been developed to assist users in finding Earth science data that fulfills researchers' needs. It reviews the reasons why finding Earth science data can be so difficult, and explains the workings of the Space-Time Query with OpenSearch and how this system can assist researchers in finding the required data. It also reviews the developments with client-server systems.

  1. GEOMETRIC CROSS SECTIONS OF DUST AGGREGATES AND A COMPRESSION MODEL FOR AGGREGATE COLLISIONS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Suyama, Toru; Wada, Koji; Tanaka, Hidekazu

    2012-07-10

    Geometric cross sections of dust aggregates determine their coupling with disk gas, which governs their motions in protoplanetary disks. Collisional outcomes also depend on geometric cross sections of initial aggregates. In a previous paper, we performed three-dimensional N-body simulations of sequential collisions of aggregates composed of a number of sub-micron-sized icy particles and examined radii of gyration (and bulk densities) of the obtained aggregates. We showed that collisional compression of aggregates is not efficient and that aggregates remain fluffy. In the present study, we examine geometric cross sections of the aggregates. Their cross sections decrease due to compression, as do their gyration radii. It is found that a relation between the cross section and the gyration radius proposed by Okuzumi et al. is valid for the compressed aggregates. We also refine the compression model proposed in our previous paper. The refined model enables us to calculate the evolution of both gyration radii and cross sections of growing aggregates and reproduces well our numerical results of sequential aggregate collisions. The refined model can describe non-equal-mass collisions as well as equal-mass cases. Although we do not take into account oblique collisions in the present study, oblique collisions would further hinder compression of aggregates.

  2. Defining and systematic analyses of aggregation indices to evaluate degree of calcium oxalate crystal aggregation

    NASA Astrophysics Data System (ADS)

    Chaiyarit, Sakdithep; Thongboonkerd, Visith

    2017-12-01

    Crystal aggregation is one of the most crucial steps in kidney stone pathogenesis. However, crystal aggregation has rarely been studied, and quantitative analysis of the degree of aggregation has been handicapped by the lack of a standard measurement. We thus performed an in vitro assay to generate aggregation of calcium oxalate monohydrate (COM) crystals with various concentrations (25-800 µg/ml) in saturated aggregation buffer. The crystal aggregates were analyzed by microscopic examination, UV-visible spectrophotometry, and GraphPad Prism6 software to define a total of 12 aggregation indices (including number of aggregates, aggregated mass index, optical density, aggregation coefficient, span, number of aggregates at plateau time-point, aggregated area index, aggregated diameter index, aggregated symmetry index, time constant, half-life, and rate constant). The data showed linear correlation between crystal concentration and almost all of these indices, the only exception being rate constant. Among these, number of aggregates provided the greatest regression coefficient (r=0.997; p<0.001), whereas the equally second rank included aggregated mass index and optical density (r=0.993; p<0.001 and r=-0.993; p<0.001, respectively) and the equally fourth were aggregation coefficient and span (r=0.991; p<0.001 for both). These five indices are thus recommended as the most appropriate indices for quantitative analysis of COM crystal aggregation in vitro.

  3. GraQL: A Query Language for High-Performance Attributed Graph Databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chavarría-Miranda, Daniel; Castellana, Vito G.; Morari, Alessandro

    Graph databases have gained increasing interest in the last few years due to the emergence of data sources which are not easily analyzable in traditional relational models or for which a graph data model is the natural representation. In order to understand the design and implementation choices for an attributed graph database backend and query language, we have started to design our infrastructure for attributed graph databases. In this paper, we describe the design considerations of our in-memory attributed graph database system with a particular focus on the data definition and query language components.

  4. Implementing and evaluating a regional strategy to improve testing rates in VA patients at risk for HIV, utilizing the QUERI process as a guiding framework: QUERI Series

    PubMed Central

    Goetz, Matthew B; Bowman, Candice; Hoang, Tuyen; Anaya, Henry; Osborn, Teresa; Gifford, Allen L; Asch, Steven M

    2008-01-01

    Background We describe how we used the framework of the U.S. Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) to develop a program to improve rates of diagnostic testing for the Human Immunodeficiency Virus (HIV). This venture was prompted by the observation by the CDC that 25% of HIV-infected patients do not know their diagnosis – a point of substantial importance to the VA, which is the largest provider of HIV care in the United States. Methods Following the QUERI steps (or process), we evaluated: 1) whether undiagnosed HIV infection is a high-risk, high-volume clinical issue within the VA, 2) whether there are evidence-based recommendations for HIV testing, 3) whether there are gaps in the performance of VA HIV testing, and 4) the barriers and facilitators to improving current practice in the VA. Based on our findings, we developed and initiated a QUERI step 4/phase 1 pilot project using the precepts of the Chronic Care Model. Our improvement strategy relies upon electronic clinical reminders to provide decision support; audit/feedback as a clinical information system, and appropriate changes in delivery system design. These activities are complemented by academic detailing and social marketing interventions to achieve provider activation. Results Our preliminary formative evaluation indicates the need to ensure leadership and team buy-in, address facility-specific barriers, refine the reminder, and address factors that contribute to inter-clinic variances in HIV testing rates. Preliminary unadjusted data from the first seven months of our program show 3–5 fold increases in the proportion of at-risk patients who are offered HIV testing at the VA sites (stations) where the pilot project has been undertaken; no change was seen at control stations. Discussion This project demonstrates the early success of the application of the QUERI process to the development of a program to improve HIV testing rates. Preliminary unadjusted

  5. Implementing and evaluating a regional strategy to improve testing rates in VA patients at risk for HIV, utilizing the QUERI process as a guiding framework: QUERI Series.

    PubMed

    Goetz, Matthew B; Bowman, Candice; Hoang, Tuyen; Anaya, Henry; Osborn, Teresa; Gifford, Allen L; Asch, Steven M

    2008-03-19

    We describe how we used the framework of the U.S. Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) to develop a program to improve rates of diagnostic testing for the Human Immunodeficiency Virus (HIV). This venture was prompted by the observation by the CDC that 25% of HIV-infected patients do not know their diagnosis - a point of substantial importance to the VA, which is the largest provider of HIV care in the United States. Following the QUERI steps (or process), we evaluated: 1) whether undiagnosed HIV infection is a high-risk, high-volume clinical issue within the VA, 2) whether there are evidence-based recommendations for HIV testing, 3) whether there are gaps in the performance of VA HIV testing, and 4) the barriers and facilitators to improving current practice in the VA. Based on our findings, we developed and initiated a QUERI step 4/phase 1 pilot project using the precepts of the Chronic Care Model. Our improvement strategy relies upon electronic clinical reminders to provide decision support; audit/feedback as a clinical information system, and appropriate changes in delivery system design. These activities are complemented by academic detailing and social marketing interventions to achieve provider activation. Our preliminary formative evaluation indicates the need to ensure leadership and team buy-in, address facility-specific barriers, refine the reminder, and address factors that contribute to inter-clinic variances in HIV testing rates. Preliminary unadjusted data from the first seven months of our program show 3-5 fold increases in the proportion of at-risk patients who are offered HIV testing at the VA sites (stations) where the pilot project has been undertaken; no change was seen at control stations. This project demonstrates the early success of the application of the QUERI process to the development of a program to improve HIV testing rates. Preliminary unadjusted results show that the coordinated use of

  6. Query-Adaptive Reciprocal Hash Tables for Nearest Neighbor Search.

    PubMed

    Liu, Xianglong; Deng, Cheng; Lang, Bo; Tao, Dacheng; Li, Xuelong

    2016-02-01

    Recent years have witnessed the success of binary hashing techniques in approximate nearest neighbor search. In practice, multiple hash tables are usually built using hashing to cover more desired results in the hit buckets of each table. However, few works study a unified approach to constructing multiple informative hash tables using any type of hashing algorithm. Meanwhile, for multiple table search, there is also a lack of a generic query-adaptive and fine-grained ranking scheme that can alleviate the binary quantization loss suffered in the standard hashing techniques. To solve the above problems, in this paper, we first regard the table construction as a selection problem over a set of candidate hash functions. With the graph representation of the function set, we propose an efficient solution that sequentially applies normalized dominant set to finding the most informative and independent hash functions for each table. To further reduce the redundancy between tables, we explore the reciprocal hash tables in a boosting manner, where the hash function graph is updated with high weights emphasized on the misclassified neighbor pairs of previous hash tables. To refine the ranking of the retrieved buckets within a certain Hamming radius from the query, we propose a query-adaptive bitwise weighting scheme to enable fine-grained bucket ranking in each hash table, exploiting the discriminative power of its hash functions and their complement for nearest neighbor search. Moreover, we integrate such scheme into the multiple table search using a fast, yet reciprocal table lookup algorithm within the adaptive weighted Hamming radius. In this paper, both the construction method and the query-adaptive search method are general and compatible with different types of hashing algorithms using different feature spaces and/or parameter settings. Our extensive experiments on several large-scale benchmarks demonstrate that the proposed techniques can significantly outperform both

  7. Sustainable aggregates production : green applications for aggregate by-products.

    DOT National Transportation Integrated Search

    2015-06-01

    Increased emphasis in the construction industry on sustainability and recycling requires production of aggregate gradations with lower dust (cleaner aggregates) and smaller maximum sizes; hence, increased amount of quarry by-products (QBs). QBs ...

  8. Interactive Querying Techniques for an Office Filing Facility.

    ERIC Educational Resources Information Center

    Morrissey, J. M.; And Others

    1986-01-01

    Proposes a "Model of Querying" for users of office filing facilities and discusses its motivation, aspects, attributes, and advantages. A review of current information systems and attempts to combine information retrieval, artificial intelligence, and database management techniques leads to the conclusion that no resultant system is adequate…

  9. An Analysis of Web Image Queries for Search.

    ERIC Educational Resources Information Center

    Pu, Hsiao-Tieh

    2003-01-01

    Examines the differences between Web image and textual queries, and attempts to develop an analytic model to investigate their implications for Web image retrieval systems. Provides results that give insight into Web image searching behavior and suggests implications for improvement of current Web image search engines. (AEF)

  10. Varieties of quantity estimation in children.

    PubMed

    Sella, Francesco; Berteletti, Ilaria; Lucangeli, Daniela; Zorzi, Marco

    2015-06-01

    In the number-to-position task, with increasing age and numerical expertise, children's pattern of estimates shifts from a biased (nonlinear) to a formal (linear) mapping. This widely replicated finding concerns symbolic numbers, whereas less is known about other types of quantity estimation. In Experiment 1, Preschool, Grade 1, and Grade 3 children were asked to map continuous quantities, discrete nonsymbolic quantities (numerosities), and symbolic (Arabic) numbers onto a visual line. Numerical quantity was matched for the symbolic and discrete nonsymbolic conditions, whereas cumulative surface area was matched for the continuous and discrete quantity conditions. Crucially, in the discrete condition children's estimation could rely either on the cumulative area or numerosity. All children showed a linear mapping for continuous quantities, whereas a developmental shift from a logarithmic to a linear mapping was observed for both nonsymbolic and symbolic numerical quantities. Analyses on individual estimates suggested the presence of two distinct strategies in estimating discrete nonsymbolic quantities: one based on numerosity and the other based on spatial extent. In Experiment 2, a non-spatial continuous quantity (shades of gray) and new discrete nonsymbolic conditions were added to the set used in Experiment 1. Results confirmed the linear patterns for the continuous tasks, as well as the presence of a subset of children relying on numerosity for the discrete nonsymbolic numerosity conditions despite the availability of continuous visual cues. Overall, our findings demonstrate that estimation of numerical and non-numerical quantities is based on different processing strategies and follows different developmental trajectories.

  11. 48 CFR 52.236-16 - Quantity Surveys.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 48 Federal Acquisition Regulations System 2 2013-10-01 2013-10-01 false Quantity Surveys. 52.236... Quantity Surveys. As prescribed in 36.516, the contracting officer may insert the following clause in... payment based on quantity surveys is contemplated: Quantity Surveys (APR 1984) (a) Quantity surveys shall...

  12. 48 CFR 52.236-16 - Quantity Surveys.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 48 Federal Acquisition Regulations System 2 2011-10-01 2011-10-01 false Quantity Surveys. 52.236... Quantity Surveys. As prescribed in 36.516, the contracting officer may insert the following clause in... payment based on quantity surveys is contemplated: Quantity Surveys (APR 1984) (a) Quantity surveys shall...

  13. 48 CFR 52.236-16 - Quantity Surveys.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 48 Federal Acquisition Regulations System 2 2012-10-01 2012-10-01 false Quantity Surveys. 52.236... Quantity Surveys. As prescribed in 36.516, the contracting officer may insert the following clause in... payment based on quantity surveys is contemplated: Quantity Surveys (APR 1984) (a) Quantity surveys shall...

  14. 48 CFR 52.236-16 - Quantity Surveys.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 48 Federal Acquisition Regulations System 2 2014-10-01 2014-10-01 false Quantity Surveys. 52.236... Quantity Surveys. As prescribed in 36.516, the contracting officer may insert the following clause in... payment based on quantity surveys is contemplated: Quantity Surveys (APR 1984) (a) Quantity surveys shall...

  15. 48 CFR 52.236-16 - Quantity Surveys.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 48 Federal Acquisition Regulations System 2 2010-10-01 2010-10-01 false Quantity Surveys. 52.236... Quantity Surveys. As prescribed in 36.516, the contracting officer may insert the following clause in... payment based on quantity surveys is contemplated: Quantity Surveys (APR 1984) (a) Quantity surveys shall...

  16. Spatial Query for Planetary Data

    NASA Technical Reports Server (NTRS)

    Shams, Khawaja S.; Crockett, Thomas M.; Powell, Mark W.; Joswig, Joseph C.; Fox, Jason M.

    2011-01-01

    Science investigators need to quickly and effectively assess past observations of specific locations on a planetary surface. This innovation involves a location-based search technology that was adapted and applied to planetary science data to support a spatial query capability for mission operations software. High-performance location-based searching requires the use of spatial data structures for database organization. Spatial data structures are designed to organize datasets based on their coordinates in a way that is optimized for location-based retrieval. The particular spatial data structure that was adapted for planetary data search is the R+ tree.

  17. Loss-tolerant measurement-device-independent quantum private queries

    NASA Astrophysics Data System (ADS)

    Zhao, Liang-Yuan; Yin, Zhen-Qiang; Chen, Wei; Qian, Yong-Jun; Zhang, Chun-Mei; Guo, Guang-Can; Han, Zheng-Fu

    2017-01-01

    Quantum private queries (QPQ) is an important cryptography protocol aiming to protect both the user’s and database’s privacy when the database is queried privately. Recently, a variety of practical QPQ protocols based on quantum key distribution (QKD) have been proposed. However, for QKD-based QPQ the user’s imperfect detectors can be subjected to some detector-side-channel attacks launched by the dishonest owner of the database. Here, we present a simple example that shows how the detector-blinding attack can damage the security of QKD-based QPQ completely. To remove all the known and unknown detector side channels, we propose a solution of measurement-device-independent QPQ (MDI-QPQ) with single-photon sources. The security of the proposed protocol has been analyzed under some typical attacks. Moreover, we prove that its security is completely loss independent. The results show that practical QPQ will retain the same degree of privacy as before even with seriously uncharacterized detectors.

  18. Loss-tolerant measurement-device-independent quantum private queries.

    PubMed

    Zhao, Liang-Yuan; Yin, Zhen-Qiang; Chen, Wei; Qian, Yong-Jun; Zhang, Chun-Mei; Guo, Guang-Can; Han, Zheng-Fu

    2017-01-04

    Quantum private queries (QPQ) is an important cryptography protocol aiming to protect both the user's and database's privacy when the database is queried privately. Recently, a variety of practical QPQ protocols based on quantum key distribution (QKD) have been proposed. However, for QKD-based QPQ the user's imperfect detectors can be subjected to some detector-side-channel attacks launched by the dishonest owner of the database. Here, we present a simple example that shows how the detector-blinding attack can damage the security of QKD-based QPQ completely. To remove all the known and unknown detector side channels, we propose a solution of measurement-device-independent QPQ (MDI-QPQ) with single-photon sources. The security of the proposed protocol has been analyzed under some typical attacks. Moreover, we prove that its security is completely loss independent. The results show that practical QPQ will retain the same degree of privacy as before even with seriously uncharacterized detectors.

  19. Loss-tolerant measurement-device-independent quantum private queries

    PubMed Central

    Zhao, Liang-Yuan; Yin, Zhen-Qiang; Chen, Wei; Qian, Yong-Jun; Zhang, Chun-Mei; Guo, Guang-Can; Han, Zheng-Fu

    2017-01-01

    Quantum private queries (QPQ) is an important cryptography protocol aiming to protect both the user’s and database’s privacy when the database is queried privately. Recently, a variety of practical QPQ protocols based on quantum key distribution (QKD) have been proposed. However, for QKD-based QPQ the user’s imperfect detectors can be subjected to some detector-side-channel attacks launched by the dishonest owner of the database. Here, we present a simple example that shows how the detector-blinding attack can damage the security of QKD-based QPQ completely. To remove all the known and unknown detector side channels, we propose a solution of measurement-device-independent QPQ (MDI-QPQ) with single-photon sources. The security of the proposed protocol has been analyzed under some typical attacks. Moreover, we prove that its security is completely loss independent. The results show that practical QPQ will retain the same degree of privacy as before even with seriously uncharacterized detectors. PMID:28051101

  20. Image query and indexing for digital x rays

    NASA Astrophysics Data System (ADS)

    Long, L. Rodney; Thoma, George R.

    1998-12-01

    The web-based medical information retrieval system (WebMIRS) allows Internet access to databases containing 17,000 digitized x-ray spine images and associated text data from National Health and Nutrition Examination Surveys (NHANES). WebMIRS allows SQL query of the text, and viewing of the returned text records and images using a standard browser. We are now working (1) to determine utility of data directly derived from the images in our databases, and (2) to investigate the feasibility of computer-assisted or automated indexing of the images to support image retrieval of images of interest to biomedical researchers in the field of osteoarthritis. To build an initial database based on image data, we are manually segmenting a subset of the vertebrae, using techniques from vertebral morphometry. From this, we will derive and add to the database vertebral features. This image-derived data will enhance the user's data access capability by enabling the creation of combined SQL/image-content queries.

  1. PiCO QL: A software library for runtime interactive queries on program data

    NASA Astrophysics Data System (ADS)

    Fragkoulis, Marios; Spinellis, Diomidis; Louridas, Panos

    PiCO QL is an open source C/C++ software whose scientific scope is real-time interactive analysis of in-memory data through SQL queries. It exposes a relational view of a system's or application's data structures, which is queryable through SQL. While the application or system is executing, users can input queries through a web-based interface or issue web service requests. Queries execute on the live data structures through the respective relational views. PiCO QL makes a good candidate for ad-hoc data analysis in applications and for diagnostics in systems settings. Applications of PiCO QL include the Linux kernel, the Valgrind instrumentation framework, a GIS application, a virtual real-time observatory of stellar objects, and a source code analyser.

  2. Biogrid--a microfluidic device for large-scale enzyme-free dissociation of stem cell aggregates.

    PubMed

    Wallman, Lars; Åkesson, Elisabet; Ceric, Dario; Andersson, Per Henrik; Day, Kelly; Hovatta, Outi; Falci, Scott; Laurell, Thomas; Sundström, Erik

    2011-10-07

    Culturing stem cells as free-floating aggregates in suspension facilitates large-scale production of cells in closed systems, for clinical use. To comply with GMP standards, the use of substances such as proteolytic enzymes should be avoided. Instead of enzymatic dissociation, the growing cell aggregates may be mechanically cut at passage, but available methods are not compatible with large-scale cell production and hence translation into the clinic becomes a severe bottle-neck. We have developed the Biogrid device, which consists of an array of micrometer-scale knife edges, micro-fabricated in silicon, and a manifold in which the microgrid is placed across the central fluid channel. By connecting one side of the Biogrid to a syringe or a pump and the other side to the cell culture, the culture medium with suspended cell aggregates can be aspirated, forcing the aggregates through the microgrid, and ejected back to the cell culture container. Large aggregates are thereby dissociated into smaller fragments while small aggregates pass through the microgrid unaffected. As proof-of-concept, we demonstrate that the Biogrid device can be successfully used for repeated passage of human neural stem/progenitor cells cultured as so-called neurospheres, as well as for passage of suspension cultures of human embryonic stem cells. We also show that human neural stem/progenitor cells tolerate transient pressure changes far exceeding those that will occur in a fluidic system incorporating the Biogrid microgrids. Thus, by using the Biogrid device it is possible to mechanically passage large quantities of cells in suspension cultures in closed fluidic systems, without the use of proteolytic enzymes.

  3. The Utilisation of Shredded PET as Aggregate Replacement for Interlocking Concrete Block

    NASA Astrophysics Data System (ADS)

    Mokhtar, M.; Kaamin, M.; Sahat, S.; Hamid, N. B.

    2018-03-01

    The consumption of plastic has grown substantially all over the world in recent years, and this has created huge quantities of plastic-based waste. Plastic waste is now a serious environmental threat to the modern way of living, although steps have been taken to reduce plastic consumption. Substantial amounts of garbage are created every day, which is unhealthy. Plastic bottles made of polyethylene terephthalate (PET) were used as a partial component in the making of interlocking concrete blocks. This project investigates the strength and workability of interlocking concrete blocks in which coarse aggregate is replaced with % PET. The suitability of recycled plastic (PET) as coarse aggregate in interlocking concrete blocks and its advantages are discussed here. Moreover, interlocking blocks offer more benefits than conventional blocks: they ease construction because they align, are easy to place, and stack at high speed, and they offer more resistance to shear, so buildings would be even stronger. The failure parameters are discussed based on the tests performed. The compressive strength test results show that the strength of the concrete block decreased as the amount of PET used increased. The results also show that higher compressive strength was found with 5% of the natural coarse aggregate replaced with PET than with the other percentages.

  4. Thermodynamics of Protein Aggregation

    NASA Astrophysics Data System (ADS)

    Osborne, Kenneth L.; Barz, Bogdan; Bachmann, Michael; Strodel, Birgit

    Amyloid protein aggregation characterizes many neurodegenerative disorders, including Alzheimer's, Parkinson's, and Creutzfeldt-Jakob disease. Evidence suggests that amyloid aggregates may share similar aggregation pathways, implying simulation of full-length amyloid proteins is not necessary for understanding amyloid formation. In this study we simulate GNNQQNY, the N-terminal prion-determining domain of the yeast protein Sup35 to investigate the thermodynamics of structural transitions during aggregation. We use a coarse-grained model with replica-exchange molecular dynamics to investigate the association of 3-, 6-, and 12-chain GNNQQNY systems and we determine the aggregation pathway by studying aggregation states of GNNQQNY. We find that the aggregation of the hydrophilic GNNQQNY sequence is mainly driven by H-bond formation, leading to the formation of β-sheets from the very beginning of the assembly process. Condensation (aggregation) and ordering take place simultaneously, which is underpinned by the occurrence of a single heat capacity peak only.

  5. DBPQL: A view-oriented query language for the Intel Data Base Processor

    NASA Technical Reports Server (NTRS)

    Fishwick, P. A.

    1983-01-01

    An interactive query language (DBPQL) for the Intel Data Base Processor (DBP) is defined. DBPQL includes a parser generator package which permits the analyst to easily create and manipulate the query statement syntax and semantics. The prototype language, DBPQL, includes trace and performance commands to aid the analyst when implementing new commands and analyzing the execution characteristics of the DBP. The DBPQL grammar file and associated key procedures are included as an appendix to this report.

  6. Structuring Legacy Pathology Reports by openEHR Archetypes to Enable Semantic Querying.

    PubMed

    Kropf, Stefan; Krücken, Peter; Mueller, Wolf; Denecke, Kerstin

    2017-05-18

    Clinical information is often stored as free text, e.g. in discharge summaries or pathology reports. These documents are semi-structured using section headers, numbered lists, items and classification strings. However, it is still challenging to retrieve relevant documents since keyword searches applied on complete unstructured documents result in many false positive retrieval results. We are concentrating on the processing of pathology reports as an example for unstructured clinical documents. The objective is to transform reports semi-automatically into an information structure that enables an improved access and retrieval of relevant data. The data is expected to be stored in a standardized, structured way to make it accessible for queries that are applied to specific sections of a document (section-sensitive queries) and for information reuse. Our processing pipeline comprises information modelling, section boundary detection and section-sensitive queries. For enabling a focused search in unstructured data, documents are automatically structured and transformed into a patient information model specified through openEHR archetypes. The resulting XML-based pathology electronic health records (PEHRs) are queried by XQuery and visualized by XSLT in HTML. Pathology reports (PRs) can be reliably structured into sections by a keyword-based approach. The information modelling using openEHR allows saving time in the modelling process since many archetypes can be reused. The resulting standardized, structured PEHRs allow accessing relevant data by retrieving data matching user queries. Mapping unstructured reports into a standardized information model is a practical solution for a better access to data. Archetype-based XML enables section-sensitive retrieval and visualisation by well-established XML techniques. Focussing the retrieval to particular sections has the potential of saving retrieval time and improving the accuracy of the retrieval.

  7. A Simple Blueprint for Automatic Boolean Query Processing.

    ERIC Educational Resources Information Center

    Salton, G.

    1988-01-01

    Describes a new Boolean retrieval environment in which an extended soft Boolean logic is used to automatically construct queries from original natural language formulations provided by users. Experimental results that compare the retrieval effectiveness of this method to conventional Boolean and vector processing are discussed. (27 references)…

  8. Learning from the U.S. Department of Veterans Affairs Quality Enhancement Research Initiative: QUERI Series

    PubMed Central

    Graham, Ian D; Tetroe, Jacqueline

    2009-01-01

    As the recent collection of papers from the Quality Enhancement Research Initiative (QUERI) Series indicates, knowledge is leading to considerable action in the United States (U.S.) Department of Veterans Affairs (VA). The QUERI Series offers clinical researchers, implementation scientists, health systems, and health research funders from around the globe a unique window into both the practice and science of implementation or knowledge translation (KT) in the VA. By describing successes and challenges as well as setbacks and disappointments, the QUERI Series is all the more useful. From the vantage point of Canadian KT researchers and officials at a national health research funding agency, we offer a number of observations and lessons that can be learned from QUERI. "Knowledge, if it does not determine action, is dead to us." Plotinus (Roman philosopher 205AD-270AD) PMID:19267920

  9. Natural Language Query System Design for Interactive Information Storage and Retrieval Systems. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Liu, I-Hsiung

    1985-01-01

    The currently developed multi-level language interfaces of information systems are generally designed for experienced users. These interfaces commonly ignore the nature and needs of the largest user group, i.e., casual users. This research identifies the importance of natural language query system research within information storage and retrieval system development; addresses the topics of developing such a query system; and finally, proposes a framework for the development of natural language query systems in order to facilitate the communication between casual users and information storage and retrieval systems.

  10. Role of intensive and extensive variables in a soup of firms in economy to address long run prices and aggregate data

    NASA Astrophysics Data System (ADS)

    Hosseiny, Ali; Gallegati, Mauro

    2017-03-01

    We review the production function and the hypothesis of equilibrium in the neoclassical framework. We note that in a soup of sectors in an economy, while capital and labor resemble extensive variables, wage and rate of return on capital act as intensive variables. As a result, Baumol and Bowen's statement of equal wages is inevitable from the thermodynamics point of view. We try to see how aggregation can be performed concerning the extensive variables in a soup of firms. We provide a toy model to perform aggregation for production and the labor income as extensive quantities in a neoclassical framework.

  11. Neuroforecasting Aggregate Choice

    PubMed Central

    Knutson, Brian; Genevsky, Alexander

    2018-01-01

    Advances in brain-imaging design and analysis have allowed investigators to use neural activity to predict individual choice, while emerging Internet markets have opened up new opportunities for forecasting aggregate choice. Here, we review emerging research that bridges these levels of analysis by attempting to use group neural activity to forecast aggregate choice. A survey of initial findings suggests that components of group neural activity might forecast aggregate choice, in some cases even beyond traditional behavioral measures. In addition to demonstrating the plausibility of neuroforecasting, these findings raise the possibility that not all neural processes that predict individual choice forecast aggregate choice to the same degree. We propose that although integrative choice components may confer more consistency within individuals, affective choice components may generalize more broadly across individuals to forecast aggregate choice. PMID:29706726

  12. High-performance web services for querying gene and variant annotation.

    PubMed

    Xin, Jiwen; Mark, Adam; Afrasiabi, Cyrus; Tsueng, Ginger; Juchler, Moritz; Gopal, Nikhil; Stupp, Gregory S; Putman, Timothy E; Ainscough, Benjamin J; Griffith, Obi L; Torkamani, Ali; Whetzel, Patricia L; Mungall, Christopher J; Mooney, Sean D; Su, Andrew I; Wu, Chunlei

    2016-05-06

    Efficient tools for data management and integration are essential for many aspects of high-throughput biology. In particular, annotations of genes and human genetic variants are commonly used but highly fragmented across many resources. Here, we describe MyGene.info and MyVariant.info, high-performance web services for querying gene and variant annotation information. These web services are currently accessed more than three million times per month. They also demonstrate a generalizable cloud-based model for organizing and querying biological annotation information. MyGene.info and MyVariant.info are provided as high-performance web services, accessible at http://mygene.info and http://myvariant.info. Both are offered free of charge to the research community.

  13. The rules of information aggregation and emergence of collective intelligent behavior.

    PubMed

    Bettencourt, Luís M A

    2009-10-01

    Information is a peculiar quantity. Unlike matter and energy, which are conserved by the laws of physics, the aggregation of knowledge from many sources can in fact produce more information (synergy) or less (redundancy) than the sum of its parts. This feature can endow groups with problem-solving strategies that are superior to those possible among noninteracting individuals and, in turn, may provide a selection drive toward collective cooperation and coordination. Here we explore the formal properties of information aggregation as a general principle for explaining features of social organization. We quantify information in terms of the general formalism of information theory, which also prescribes the rules of how different pieces of evidence inform the solution of a given problem. We then show how several canonical examples of collective cognition and coordination can be understood through principles of minimization of uncertainty (maximization of predictability) under information pooling over many individuals. We discuss in some detail how collective coordination in swarms, markets, natural language processing, and collaborative filtering may be guided by the optimal aggregation of information in social collectives. We also identify circumstances when these processes fail, leading, for example, to inefficient markets. The contrast to approaches to understand coordination and collaboration via decision and game theory is also briefly discussed. Copyright © 2009 Cognitive Science Society, Inc.
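    The synergy and redundancy described in this abstract can be illustrated with a small calculation. The sketch below is not taken from the paper; it uses two textbook toy cases chosen here as assumptions (an XOR target for synergy, a duplicated source for redundancy) and a hypothetical helper, mutual_information, that computes Shannon mutual information in bits over equally likely outcomes.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(pairs):
    """Return I(A;B) in bits, treating each (a, b) pair as equally likely."""
    n = len(pairs)
    joint = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    return sum(
        (c / n) * log2((c / n) / ((p_a[a] / n) * (p_b[b] / n)))
        for (a, b), c in joint.items()
    )


def parts_and_whole(outcomes):
    """For (x1, x2, y) outcomes return (I(X1;Y) + I(X2;Y), I(X1,X2;Y))."""
    i_x1 = mutual_information([(x1, y) for x1, _, y in outcomes])
    i_x2 = mutual_information([(x2, y) for _, x2, y in outcomes])
    i_joint = mutual_information([((x1, x2), y) for x1, x2, y in outcomes])
    return i_x1 + i_x2, i_joint


# Synergy (illustrative): Y = X1 XOR X2 with independent fair bits.
# Neither source alone says anything about Y, but together they fix it.
xor_case = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]
xor_parts, xor_whole = parts_and_whole(xor_case)

# Redundancy (illustrative): X2 is a copy of X1 and Y = X1.
# Each source already carries the full bit, so pooling adds nothing.
copy_case = [(x, x, x) for x in (0, 1)]
copy_parts, copy_whole = parts_and_whole(copy_case)

print(f"XOR : sum of parts = {xor_parts:.1f} bits, pooled = {xor_whole:.1f} bits")
print(f"copy: sum of parts = {copy_parts:.1f} bits, pooled = {copy_whole:.1f} bits")
```

    In the XOR case the pooled information exceeds the sum of the individual contributions, and in the copy case it falls short of it, which is the sense in which aggregated information need not equal the sum of its parts.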

  14. Analysis of queries sent to PubMed at the point of care: Observation of search behaviour in a medical teaching hospital

    PubMed Central

    Hoogendam, Arjen; Stalenhoef, Anton FH; Robbé, Pieter F de Vries; Overbeke, A John PM

    2008-01-01

    Background: The use of PubMed to answer daily medical care questions is limited because it is challenging to retrieve a small set of relevant articles and time is restricted. Knowing what aspects of queries are likely to retrieve relevant articles can increase the effectiveness of PubMed searches. The objectives of our study were to identify queries that are likely to retrieve relevant articles by relating PubMed search techniques and tools to the number of articles retrieved and the selection of articles for further reading. Methods: This was a prospective observational study of queries regarding patient-related problems sent to PubMed by residents and internists in internal medicine working in an Academic Medical Centre. We analyzed queries, search results, query tools (Mesh, Limits, wildcards, operators), selection of abstract and full-text for further reading, using a portal that mimics PubMed. Results: PubMed was used to solve 1121 patient-related problems, resulting in 3205 distinct queries. Abstracts were viewed in 999 (31%) of these queries, and in 126 (39%) of 321 queries using query tools. The average term count per query was 2.5. Abstracts were selected in more than 40% of queries using four or five terms, increasing to 63% if the use of four or five terms yielded 2–161 articles. Conclusion: Queries sent to PubMed by physicians at our hospital during daily medical care contain fewer than three terms. Queries using four to five terms, retrieving fewer than 161 article titles, are most likely to result in abstract viewing. PubMed search tools are used infrequently by our population and are less effective than the use of four or five terms. Methods to facilitate the formulation of precise queries, using more relevant terms, should be the focus of education and research. PMID:18816391

  15. The Limitations of Term Co-Occurrence Data for Query Expansion in Document Retrieval Systems.

    ERIC Educational Resources Information Center

    Peat, Helen J.; Willett, Peter

    1991-01-01

    Identifies limitations in the use of term co-occurrence data as a basis for automatic query expansion in natural language document retrieval systems. The use of similarity coefficients to calculate the degree of similarity between pairs of terms is explained, and frequency and discriminatory characteristics for nearest neighbors of query terms are…
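    As a rough illustration of how a similarity coefficient can rank candidate expansion terms from co-occurrence data, the sketch below uses the Dice coefficient over document-level co-occurrence. The choice of Dice, the toy documents, and the helper names dice and nearest_neighbours are assumptions made for this example; the abstract does not state which coefficients the authors used.

```python
# Toy collection (illustrative): each document is the set of its index terms.
docs = [
    {"query", "expansion", "retrieval"},
    {"query", "retrieval", "index"},
    {"boolean", "query"},
    {"index", "retrieval"},
]


def dice(term_a, term_b, docs):
    """Dice coefficient 2*n_ab / (n_a + n_b) from document frequencies."""
    n_a = sum(term_a in d for d in docs)
    n_b = sum(term_b in d for d in docs)
    n_ab = sum(term_a in d and term_b in d for d in docs)
    return 2 * n_ab / (n_a + n_b) if n_a + n_b else 0.0


def nearest_neighbours(term, docs, k=2):
    """Return the k terms most similar to `term`; ties break alphabetically."""
    vocabulary = set().union(*docs) - {term}
    ranked = sorted(vocabulary, key=lambda t: (-dice(term, t, docs), t))
    return ranked[:k]


print(nearest_neighbours("query", docs))
```

    Note that a term appearing in a single document ("boolean") scores nearly as high as a frequent one here, which hints at why the frequency characteristics of nearest neighbours matter for query expansion.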

  16. Zero-gravity quantity gaging system

    NASA Technical Reports Server (NTRS)

    1989-01-01

    The Zero-Gravity Quantity Gaging System program is a technology development effort funded by NASA-LeRC and contracted by NASA-JSC to develop and evaluate zero-gravity quantity gaging system concepts suitable for application to large, on-orbit cryogenic oxygen and hydrogen tankage. The contract effective date was 28 May 1985. During performance of the program, 18 potential quantity gaging approaches were investigated for their merit and suitability for gaging two-phase cryogenic oxygen and hydrogen in zero-gravity conditions. These approaches were subjected to a comprehensive trade study and selection process, which found that the RF modal quantity gaging approach was the most suitable for both liquid oxygen and liquid hydrogen applications. This selection was made with NASA-JSC concurrence.

  17. Private and Efficient Query Processing on Outsourced Genomic Databases.

    PubMed

    Ghasemi, Reza; Al Aziz, Md Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-09-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing genomic sequence is a time consuming and expensive process. Second, it requires large-scale computation and storage systems to process genomic sequences. Third, genomic databases are often owned by different organizations, and thus, not available for public usage. Cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases in a centralized cloud server to ease the access of their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 Single Nucleotide Polymorphisms (SNPs) in a database of 20 000 records takes around 100 and 150 s, respectively.
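    For readers unfamiliar with the query type, the sketch below shows what a count query over SNP genotypes computes on plaintext data. It deliberately omits the paper's privacy layer (permutation and fake records), and the record layout, the SNP identifiers, and the function name count_query are hypothetical, chosen only for illustration.

```python
# Plaintext toy database (illustrative): one dict of SNP id -> genotype per person.
records = [
    {"rs1": "AG", "rs2": "CC", "rs3": "TT"},
    {"rs1": "AG", "rs2": "CT", "rs3": "TT"},
    {"rs1": "AA", "rs2": "CC", "rs3": "TT"},
    {"rs1": "AG", "rs2": "CC", "rs3": "CT"},
]


def count_query(records, predicate):
    """Count records whose genotype matches the predicate at every listed SNP."""
    return sum(
        all(record.get(snp) == genotype for snp, genotype in predicate.items())
        for record in records
    )


# How many individuals carry AG at rs1 and CC at rs2?
print(count_query(records, {"rs1": "AG", "rs2": "CC"}))
```

    A privacy-preserving scheme has to return the same count while the server sees only a protected form of the records.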

  18. Private and Efficient Query Processing on Outsourced Genomic Databases

    PubMed Central

    Ghasemi, Reza; Al Aziz, Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-01-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing genomic sequence is a time-consuming and expensive process. Second, it requires large-scale computation and storage systems to process genomic sequences. Third, genomic databases are often owned by different organizations and thus not available for public usage. Cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases in a centralized cloud server to ease the access of their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 SNPs in a database of 20,000 records takes around 100 and 150 seconds, respectively. PMID:27834660

  19. From headache to tumour: An examination of health anxiety, health-related Internet use and 'query escalation'.

    PubMed

    Singh, Karmpaul; Brown, Richard J

    2016-09-01

    The current study aimed to explore the phenomenon of disease-related 'query escalation' in high/low health anxious Internet users (N = 40). During a 15-minute health-related Internet search, participants rated their anxiety and the perceived seriousness of information on each page. Post-search interviews determined the reasons for, and effects of, escalating queries to consider serious diseases. Both groups were found to be significantly more anxious after escalating queries. The high group was significantly more likely to escalate queries. Evaluating personal relevance of material was the main reason for escalations and moderated anxiety post-escalation. We conclude that searching for online disease information can increase anxiety, particularly for people worried about their health. © The Author(s) 2015.

  20. Influence of Aggregate Gradation on the Longitudinal Wave Velocity Changes in Unloaded Concrete

    NASA Astrophysics Data System (ADS)

    Teodorczyk, Michał

    2017-10-01

    Diagnosis is an important factor in the assessment of structural and operational condition of a concrete structure. Among diagnostic methods, non-destructive testing methods play a special role. Acoustic emission evaluation based on the identification and location of destructive processes is one of such methods. The 3D location of AE events and moment tensor of fracture analysis are calculated by longitudinal wave velocity. Therefore, determining the velocity of longitudinal wave of concrete and the impact of the material and destructive factors are of essential importance. This paper reports the investigation of the effect of aggregate gradation on the change in wave velocity of unloaded concrete. The investigation was carried out on six 150 x 150 x 600 mm elements. Three elements contained aggregate fraction 8/16 mm and the other three were made with aggregate fraction 2/16 mm. Two acoustic emission sensors were used on the surface of the elements, and the wave was generated by the Hsu-Nielsen source. Longitudinal wave velocities for each group of elements were calculated and statistical test of significance was used for the comparison of two means. The results of the test indicated a substantial effect of the aggregate grain size on the change in longitudinal wave velocity. The average wave velocity in the concrete containing 8/16 mm fraction was 4672 m/s. In the concrete with 2/16 mm fraction, the velocity decreased to 4373 m/s. The velocity of the wave decreases at larger quantities of aggregate. The propagating longitudinal wave encounters more aggregate grains on its way and is reflected, also from air voids, multiple times and so its velocity is noticeably lower in the concrete with the 2/16 fraction. Thus, to be able to accurately locate AE events and analyse moment tensor during concrete structure testing, the aggregate grain size used in the concrete should be taken into account.

  1. Solid-state 13C NMR experiments reveal effects of aggregate size on the chemical composition of particulate organic matter in grazed steppe soils

    NASA Astrophysics Data System (ADS)

    Steffens, M.; Kölbl, A.; Kögel-Knabner, I.

    2009-04-01

    Grazing is one of the most important factors that may reduce soil organic matter (SOM) stocks and subsequently deteriorate aggregate stability in grassland topsoils. Land use management and grazing reduction are assumed to increase the input of OM, improve the soil aggregation and change species composition of vegetation (changes depth of OM input). Many studies have evaluated the impact of grazing cessation on SOM quantity. But until today little is known about the impact of grazing cessation on the chemical quality of SOM in density fractions, aggregate size classes and different horizons. The central aim of this study was to analyse the quality of SOM fractions in differently sized aggregates and horizons as affected by increased inputs of organic matter due to grazing exclusion. We applied a combined aggregate size, density and particle size fractionation procedure to sandy steppe topsoils with different organic matter inputs due to different grazing intensities (continuously grazed = Cg, winter grazing = Wg, ungrazed since 1999 = Ug99, ungrazed since 1979 = Ug79). Three different particulate organic matter (POM; free POM, in aggregate occluded POM and small in aggregate occluded POM) and seven mineral-associated organic matter fractions were separated for each of three aggregate size classes (coarse = 2000-6300 µm, medium = 630-2000 µm and fine =

  2. Development and empirical user-centered evaluation of semantically-based query recommendation for an electronic health record search engine.

    PubMed

    Hanauer, David A; Wu, Danny T Y; Yang, Lei; Mei, Qiaozhu; Murkowski-Steffy, Katherine B; Vydiswaran, V G Vinod; Zheng, Kai

    2017-03-01

    The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). The query recommendation algorithm utilizes MetaMap to identify medical concepts from search queries and indexed EHR documents. Synonym variants from UMLS are used to expand the concepts along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. The search engine's performance was rated consistently higher with the query recommendation feature turned on vs. off. The relevance of computer-recommended search terms was also rated high, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. Challenges persist for users to construct effective search queries when retrieving information from biomedical documents including those from EHRs. This study demonstrates that semantically-based query recommendation is a viable solution to addressing this challenge. Published by Elsevier Inc.

  3. Secure count query on encrypted genomic data.

    PubMed

    Hasan, Mohammad Zahidul; Mahdi, Md Safiur Rahman; Sadat, Md Nazmus; Mohammed, Noman

    2018-05-01

    Human genomic information can yield more effective healthcare by guiding medical decisions. Therefore, genomics research is gaining popularity as it can identify potential correlations between a disease and a certain gene, which improves the safety and efficacy of drug treatment and can also develop more effective prevention strategies [1]. To reduce the sampling error and to increase the statistical accuracy of this type of research projects, data from different sources need to be brought together since a single organization does not necessarily possess required amount of data. In this case, data sharing among multiple organizations must satisfy strict policies (for instance, HIPAA and PIPEDA) that have been enforced to regulate privacy-sensitive data sharing. Storage and computation on the shared data can be outsourced to a third party cloud service provider, equipped with enormous storage and computation resources. However, outsourcing data to a third party is associated with a potential risk of privacy violation of the participants, whose genomic sequence or clinical profile is used in these studies. In this article, we propose a method for secure sharing and computation on genomic data in a semi-honest cloud server. In particular, there are two main contributions. Firstly, the proposed method can handle biomedical data containing both genotype and phenotype. Secondly, our proposed index tree scheme reduces the computational overhead significantly for executing secure count query operation. In our proposed method, the confidentiality of shared data is ensured through encryption, while making the entire computation process efficient and scalable for cutting-edge biomedical applications. We evaluated our proposed method in terms of efficiency on a database of Single-Nucleotide Polymorphism (SNP) sequences, and experimental results demonstrate that the execution time for a query of 50 SNPs in a database of 50,000 records is approximately 5 s, where each record

  4. Toward a Cognitive Task Analysis for Biomedical Query Mediation

    PubMed Central

    Hruby, Gregory W.; Cimino, James J.; Patel, Vimla; Weng, Chunhua

    2014-01-01

    In many institutions, data analysts use a Biomedical Query Mediation (BQM) process to facilitate data access for medical researchers. However, understanding of the BQM process is limited in the literature. To bridge this gap, we performed the initial steps of a cognitive task analysis using 31 BQM instances conducted between one analyst and 22 researchers in one academic department. We identified five top-level tasks, i.e., clarify research statement, explain clinical process, identify related data elements, locate EHR data element, and end BQM with either a database query or unmet, infeasible information needs, and 10 sub-tasks. We evaluated the BQM task model with seven data analysts from different clinical research institutions. Evaluators found all the tasks completely or semi-valid. This study contributes initial knowledge towards the development of a generalizable cognitive task representation for BQM. PMID:25954589

  5. Toward a cognitive task analysis for biomedical query mediation.

    PubMed

    Hruby, Gregory W; Cimino, James J; Patel, Vimla; Weng, Chunhua

    2014-01-01

    In many institutions, data analysts use a Biomedical Query Mediation (BQM) process to facilitate data access for medical researchers. However, understanding of the BQM process is limited in the literature. To bridge this gap, we performed the initial steps of a cognitive task analysis using 31 BQM instances conducted between one analyst and 22 researchers in one academic department. We identified five top-level tasks, i.e., clarify research statement, explain clinical process, identify related data elements, locate EHR data element, and end BQM with either a database query or unmet, infeasible information needs, and 10 sub-tasks. We evaluated the BQM task model with seven data analysts from different clinical research institutions. Evaluators found all the tasks completely or semi-valid. This study contributes initial knowledge towards the development of a generalizable cognitive task representation for BQM.

  6. [On the seasonality of dermatoses: a retrospective analysis of search engine query data depending on the season].

    PubMed

    Köhler, M J; Springer, S; Kaatz, M

    2014-09-01

    The volume of search engine queries about disease-relevant items reflects public interest and correlates with disease prevalence, as shown by the example of flu (influenza). Other influences include media attention or holidays. The present work investigates whether the seasonality of prevalence or symptom severity of dermatoses correlates with search engine query data. The relative weekly volume of dermatologically relevant search terms was assessed with the online tool Google Trends for the years 2009-2013. For each item, the degree of seasonality was calculated via frequency analysis and a geometric approach. Many dermatoses show a marked seasonality, reflected by search engine query volumes. Unexpected seasonal variations of these queries suggest a previously unknown variability of the respective disease prevalence. Furthermore, using the example of allergic rhinitis, a close correlation of search engine query data with actual pollen count can be demonstrated. In many cases, search engine query data are appropriate to estimate seasonal variability in prevalence of common dermatoses. This finding may be useful for real-time analysis and formation of hypotheses concerning pathogenetic or symptom-aggravating mechanisms and may thus contribute to improvement of diagnostics and prevention of skin diseases.
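    The abstract does not say how the frequency analysis was carried out, so the following is only a minimal sketch of one plausible reading: the share of a weekly series' variance that sits at the one-cycle-per-year frequency. The function name `annual_power_fraction`, the 52-week year, and the synthetic input series are all assumptions made for illustration, not the authors' method or data.

```python
import cmath
import math


def annual_power_fraction(weekly, period=52):
    """Share of variance at one cycle per `period` samples (0..1).

    Hypothetical helper: exact only when len(weekly) is a whole
    multiple of `period`, so the annual frequency is a DFT bin.
    """
    n = len(weekly)
    mean = sum(weekly) / n
    centred = [v - mean for v in weekly]
    total = sum(c * c for c in centred)
    if total == 0:
        return 0.0  # a constant series has no seasonal component
    # Single-bin discrete Fourier transform at the annual frequency.
    coeff = sum(c * cmath.exp(-2j * math.pi * t / period)
                for t, c in enumerate(centred))
    # Factor 2 accounts for the mirrored negative-frequency bin.
    return 2 * abs(coeff) ** 2 / (n * total)


# Synthetic two-year series (made-up numbers, not Google Trends data).
seasonal = [50 + 30 * math.sin(2 * math.pi * t / 52) for t in range(104)]
alternating = [50 + (10 if t % 2 else -10) for t in range(104)]

print(round(annual_power_fraction(seasonal), 3))
print(round(annual_power_fraction(alternating), 3))
```

    A value near 1 would indicate that almost all week-to-week variation follows a yearly cycle, while a value near 0 indicates none; real query data would fall somewhere in between.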

  7. Patterns of [PSI+] aggregation allow insights into cellular organization of yeast prion aggregates

    PubMed Central

    Tyedmers, Jens

    2012-01-01

    The yeast prion phenomenon is very widespread and mounting evidence suggests that it has an impact on cellular regulatory mechanisms related to phenotypic responses to changing environments. Studying the aggregation patterns of prion amyloids during different stages of the prion life cycle is a first key step to understand major principles of how and where cells generate, organize and turn-over prion aggregates. The induction of the [PSI+] state involves the actin cytoskeleton and quality control compartments such as the Insoluble Protein Deposit (IPOD). An initially unstable transitional induction state can be visualized by overexpression of the prion determinant and displays characteristic large ring- and ribbon-shaped aggregates consisting of poorly fragmented bundles of very long prion fibrils. In the mature prion state, the aggregation pattern is characterized by highly fragmented, shorter prion fibrils that form aggregates, which can be visualized through tagging with fluorescent proteins. The number of aggregates formed varies, ranging from a single large aggregate at the IPOD to multiple smaller ones, depending on several parameters discussed. Aggregate units below the resolution of light microscopy that are detectable by fluorescence correlation spectroscopy are in equilibrium with larger aggregates in this stage and can mediate faithful inheritance of the prion state. Loss of the prion state is often characterized by reduced fragmentation of prion fibrils and fewer, larger aggregates. PMID:22449721

  8. Non-Arrhenius protein aggregation.

    PubMed

    Wang, Wei; Roberts, Christopher J

    2013-07-01

    Protein aggregation presents one of the key challenges in the development of protein biotherapeutics. It affects not only product quality but also potentially impacts safety, as protein aggregates have been shown to be linked with cytotoxicity and patient immunogenicity. Therefore, investigations of protein aggregation remain a major focus in pharmaceutical companies and academic institutions. Due to the complexity of the aggregation process and temperature-dependent conformational stability, temperature-induced protein aggregation is often non-Arrhenius over even relatively small temperature windows relevant for product development, and this makes low-temperature extrapolation difficult based simply on accelerated stability studies at high temperatures. This review discusses the non-Arrhenius nature of the temperature dependence of protein aggregation, explores possible causes, and considers inherent hurdles for accurately extrapolating aggregation rates from conventional industrial approaches for selecting accelerated conditions and from conventional or more advanced methods of analyzing the resulting rate data.

  9. Privacy-Preserving Location-Based Query Using Location Indexes and Parallel Searching in Distributed Networks

    PubMed Central

    Liu, Lei; Zhao, Jing

    2014-01-01

    An efficient location-based query algorithm for protecting user privacy in distributed networks is presented. The algorithm uses the location indexes of the users and multiple parallel threads to quickly search and select all candidate anonymous sets that contain more users and whose location information is more uniformly distributed, which accelerates the execution of the temporal-spatial anonymous operations. It also allows users to configure their own custom privacy-preserving location query requests. The simulated experiment results show that the proposed algorithm can offer location query services to more users simultaneously, improve the performance of the anonymous server, and satisfy the anonymous location requests of the users. PMID:24790579

  10. Privacy-preserving location-based query using location indexes and parallel searching in distributed networks.

    PubMed

    Zhong, Cheng; Liu, Lei; Zhao, Jing

    2014-01-01

    An efficient location-based query algorithm for protecting user privacy in distributed networks is presented. The algorithm uses the location indexes of the users and multiple parallel threads to quickly search and select all candidate anonymous sets that contain more users and whose location information is more uniformly distributed, which accelerates the execution of the temporal-spatial anonymous operations. It also allows users to configure their own custom privacy-preserving location query requests. The simulated experiment results show that the proposed algorithm can offer location query services to more users simultaneously, improve the performance of the anonymous server, and satisfy the anonymous location requests of the users.

  11. A dirty word or a dirty world?: Attribute framing, political affiliation, and query theory.

    PubMed

    Hardisty, David J; Johnson, Eric J; Weber, Elke U

    2010-01-01

    We explored the effect of attribute framing on choice, labeling charges for environmental costs as either an earmarked tax or an offset. Eight hundred ninety-eight Americans chose between otherwise identical products or services, where one option included a surcharge for emitted carbon dioxide. The cost framing changed preferences for self-identified Republicans and Independents, but did not affect Democrats' preferences. We explain this interaction by means of query theory and show that attribute framing can change the order in which internal queries supporting one or another option are posed. The effect of attribute labeling on query order is shown to depend on the representations of either taxes or offsets held by people with different political affiliations.

  12. 48 CFR 36.516 - Quantity surveys.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 48 Federal Acquisition Regulations System 1 2012-10-01 2012-10-01 false Quantity surveys. 36.516... CONTRACTING CONSTRUCTION AND ARCHITECT-ENGINEER CONTRACTS Contract Clauses 36.516 Quantity surveys. The contracting officer may insert the clause at 52.236-16, Quantity Surveys, in solicitations and contracts when...

  13. 48 CFR 36.516 - Quantity surveys.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 48 Federal Acquisition Regulations System 1 2011-10-01 2011-10-01 false Quantity surveys. 36.516... CONTRACTING CONSTRUCTION AND ARCHITECT-ENGINEER CONTRACTS Contract Clauses 36.516 Quantity surveys. The contracting officer may insert the clause at 52.236-16, Quantity Surveys, in solicitations and contracts when...

  14. 48 CFR 36.516 - Quantity surveys.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 48 Federal Acquisition Regulations System 1 2010-10-01 2010-10-01 false Quantity surveys. 36.516... CONTRACTING CONSTRUCTION AND ARCHITECT-ENGINEER CONTRACTS Contract Clauses 36.516 Quantity surveys. The contracting officer may insert the clause at 52.236-16, Quantity Surveys, in solicitations and contracts when...

  15. 48 CFR 36.516 - Quantity surveys.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 48 Federal Acquisition Regulations System 1 2014-10-01 2014-10-01 false Quantity surveys. 36.516... CONTRACTING CONSTRUCTION AND ARCHITECT-ENGINEER CONTRACTS Contract Clauses 36.516 Quantity surveys. The contracting officer may insert the clause at 52.236-16, Quantity Surveys, in solicitations and contracts when...

  16. 48 CFR 36.516 - Quantity surveys.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 48 Federal Acquisition Regulations System 1 2013-10-01 2013-10-01 false Quantity surveys. 36.516... CONTRACTING CONSTRUCTION AND ARCHITECT-ENGINEER CONTRACTS Contract Clauses 36.516 Quantity surveys. The contracting officer may insert the clause at 52.236-16, Quantity Surveys, in solicitations and contracts when...

  17. Aggregating todays data for tomorrows science: a geological use case

    NASA Astrophysics Data System (ADS)

    Glaves, H.; Kingdon, A.; Nayembil, M.; Baker, G.

    2016-12-01

    Geoscience data is made up of diverse and complex smaller datasets that, when aggregated together, build towards what is recognised as `big data'. The British Geological Survey (BGS), which acts as a repository for all subsurface data from the United Kingdom, has been collating these disparate small datasets that have been accumulated from the activities of a large number of geoscientists over many years. Recently this picture has been further complicated by the addition of new data sources such as near real-time sensor data, and industry or community data that is increasingly delivered via automatic donations. Many of these datasets have been aggregated in relational databases to form larger ones that are used to address a variety of issues ranging from development of national infrastructure to disaster response. These complex domain-specific SQL databases deliver effective data management using normalised subject-based database designs in a secure environment. However, the isolated subject-oriented design of these systems inhibits efficient cross-domain querying of the datasets. Additionally, the tools provided often do not enable effective data discovery as they have problems resolving the complex underlying normalised structures. Recent requirements to understand sub-surface geology in three dimensions have led BGS to develop new data systems. One such solution is PropBase, which delivers a generic denormalised data structure within an RDBMS to store geological property data. PropBase facilitates rapid and standardised data discovery and access, incorporating 2D and 3D physical and chemical property data, including associated metadata. It also provides a dedicated web interface to deliver complex multiple data sets from a single database in standardised common output formats (e.g. CSV, GIS shape files) without the need for complex data conditioning. PropBase facilitates new scientific research, previously considered impractical, by enabling property data

  18. Radiation quantities and units

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1980-04-15

    This report supersedes ICRU Report 19. Since ICRU Report 19 was published, a number of discussions have taken place between members of the Report Committee on Fundamental Quantities and Units and other workers in the field. Some of these discussions have resulted in the acceptance of certain modifications in the material set out in Report 19 and these modifications are incorporated in the current report. In addition, there has been some expansion and rearrangement of the material in the earlier report. In line with providing more didactic material and useful source material for other ICRU reports, the general considerations in subsection 1.A of Report 19 have been expanded and placed in a separate subsection. The additional material includes discussions of four terms that are used in this document - quantity, unit, stochastic, and non-stochastic - along with a brief discussion of the mathematical formalism used in ICRU reports. As in ICRU Report 19, the definitions of quantities and units specifically designed for radiation protection (Part B) are separated from those of the general quantities (Part A). The inclusion of the index concept outlined in ICRU Report 25(4) required an extension of Part B.

  19. Effect of aggregate graining compositions on skid resistance of Exposed Aggregate Concrete pavement

    NASA Astrophysics Data System (ADS)

    Wasilewska, Marta; Gardziejczyk, Wladysław; Gierasimiuk, Pawel

    2018-05-01

    The paper presents the evaluation of skid resistance of EAC (Exposed Aggregate Concrete) pavements which differ in aggregate graining compositions. The tests were carried out on concrete mixes with a maximum aggregate size of 8 mm. Three types of coarse aggregates were selected depending on their resistance to polishing, which was determined on the basis of the PSV (Polished Stone Value). Basalt (PSV 48), gabbro (PSV 50) and trachybasalt (PSV 52) aggregates were chosen. For each type of aggregate, three graining compositions were designed, which differed in the content of coarse aggregate > 4 mm. Their content for each series was as follows: A - 38%, B - 50% and C - 68%. Evaluation of the skid resistance was performed using the FAP (Friction After Polishing) test equipment, also known as the Wehner/Schulze machine. This laboratory method makes it possible to compare the skid resistance of different types of wearing course under specified conditions simulating polishing processes. In addition, macrotexture measurements were made on the surface of each specimen using the Elatexure laser profile. Analysis of variance showed that at significance level α = 0.05, aggregate graining compositions as well as the PSV have a significant influence on the obtained values of the friction coefficient μm of the tested EAC pavements. The highest values of μm were obtained for EAC with the lowest amount of coarse aggregates (compositions A). In these cases the resistance to polishing of the aggregate does not significantly affect the friction coefficients. This is related to the large areas of cement mortar between the exposed coarse grains. Based on the analysis of microscope images, it was observed that the coarse aggregates were not sufficiently exposed. PSV was shown to significantly affect the coefficient of friction in the case of compositions B and C. This is caused by large areas of exposed coarse aggregate. The best parameters were achieved for the EAC pavements

  20. 14 CFR 27.1551 - Oil quantity indicator.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 14 Aeronautics and Space 1 2011-01-01 2011-01-01 false Oil quantity indicator. 27.1551 Section 27... § 27.1551 Oil quantity indicator. Each oil quantity indicator must be marked with enough increments to indicate readily and accurately the quantity of oil. ...

  1. 14 CFR 29.1551 - Oil quantity indicator.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 14 Aeronautics and Space 1 2011-01-01 2011-01-01 false Oil quantity indicator. 29.1551 Section 29... Placards § 29.1551 Oil quantity indicator. Each oil quantity indicator must be marked with enough increments to indicate readily and accurately the quantity of oil. ...

  2. 14 CFR 27.1551 - Oil quantity indicator.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 14 Aeronautics and Space 1 2010-01-01 2010-01-01 false Oil quantity indicator. 27.1551 Section 27... § 27.1551 Oil quantity indicator. Each oil quantity indicator must be marked with enough increments to indicate readily and accurately the quantity of oil. ...

  3. 14 CFR 29.1551 - Oil quantity indicator.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 14 Aeronautics and Space 1 2010-01-01 2010-01-01 false Oil quantity indicator. 29.1551 Section 29... Placards § 29.1551 Oil quantity indicator. Each oil quantity indicator must be marked with enough increments to indicate readily and accurately the quantity of oil. ...

  4. 14 CFR 27.1551 - Oil quantity indicator.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 14 Aeronautics and Space 1 2013-01-01 2013-01-01 false Oil quantity indicator. 27.1551 Section 27... § 27.1551 Oil quantity indicator. Each oil quantity indicator must be marked with enough increments to indicate readily and accurately the quantity of oil. ...

  5. 14 CFR 29.1551 - Oil quantity indicator.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 14 Aeronautics and Space 1 2012-01-01 2012-01-01 false Oil quantity indicator. 29.1551 Section 29... Placards § 29.1551 Oil quantity indicator. Each oil quantity indicator must be marked with enough increments to indicate readily and accurately the quantity of oil. ...

  6. 14 CFR 27.1551 - Oil quantity indicator.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 14 Aeronautics and Space 1 2012-01-01 2012-01-01 false Oil quantity indicator. 27.1551 Section 27... § 27.1551 Oil quantity indicator. Each oil quantity indicator must be marked with enough increments to indicate readily and accurately the quantity of oil. ...

  7. Pilot Study on the Prevalence of Imposed Queries in a School Library Media Center.

    ERIC Educational Resources Information Center

    Gross, Melissa

    1997-01-01

    Discussion of information-seeking behavior focuses on a study of the imposed query, as opposed to self-generated queries, in an elementary school library media center in order to quantify its presence, to record characteristics of the users that carry them, and to identify the persons imposing them. The coding sheet is appended. Contains one table…

  8. Immobilized humic substances and immobilized aggregates of humic substances as sorbent for solid phase extraction.

    PubMed

    Erny, Guillaume L; Gonçalves, Bruna M; Esteves, Valdemar I

    2013-09-06

    In this work, humic substances (HS) immobilized, as a thin layer or as aggregates, on silica gel were tested as material for solid phase extraction. Some triazines (simazine, atrazine, terbuthylazine, atrazine-desethyl-desisopropyl-2-hydroxy, ametryn and terbutryn) were selected as test analytes due to their environmental importance and because they span a large range of solubility and octanol/water partition coefficient (logP). The sorbent was obtained by immobilizing a thin layer of HS via physisorption on silica gel pre-coated with a cationic polymer (polybrene). While the sorbent could be used as it is, it was demonstrated that additional HS could be immobilized, via weak interactions, to form stable humic aggregates. However, while a higher quantity of HS could be immobilized, no significant differences were observed in the sorption parameters. This sorbent has been tested for solid phase extraction to concentrate triazines from aqueous matrixes. The sorbent demonstrated performance equivalent to commercial alternatives, as a concentration factor between 50 and 200, depending on the type of triazine, was obtained. Moreover, the low cost and the high flow rate of sample through the column allowed the use of a high quantity of sorbent. The analytical procedure was tested with different matrixes including tap water, river water and estuarine water. Copyright © 2013 Elsevier B.V. All rights reserved.

  9. Privacy-Aware Relevant Data Access with Semantically Enriched Search Queries for Untrusted Cloud Storage Services.

    PubMed

    Pervez, Zeeshan; Ahmad, Mahmood; Khattak, Asad Masood; Lee, Sungyoung; Chung, Tae Choong

    2016-01-01

    Privacy-aware search of outsourced data ensures relevant data access in the untrusted domain of a public cloud service provider. A subscriber of a public cloud storage service can determine the presence or absence of a particular keyword by submitting a search query in the form of a trapdoor. However, these trapdoor-based search queries are limited in functionality and cannot be used to identify secure outsourced data which contains semantically equivalent information. In addition, trapdoor-based methodologies are confined to pre-defined trapdoors and prevent subscribers from searching outsourced data with arbitrarily defined search criteria. To solve the problem of relevant data access, we have proposed an index-based privacy-aware search methodology that ensures semantic retrieval of data from an untrusted domain. This method ensures oblivious execution of a search query and enables authorized subscribers to model conjunctive search queries without relying on predefined trapdoors. A security analysis of our proposed methodology shows that, in a conspired attack, unauthorized subscribers and untrusted cloud service providers cannot deduce any information that can lead to the potential loss of data privacy. A computational time analysis on commodity hardware demonstrates that our proposed methodology requires moderate computational resources to model a privacy-aware search query and for its oblivious evaluation on a cloud service provider.

  10. Privacy-Aware Relevant Data Access with Semantically Enriched Search Queries for Untrusted Cloud Storage Services

    PubMed Central

    Pervez, Zeeshan; Ahmad, Mahmood; Khattak, Asad Masood; Lee, Sungyoung; Chung, Tae Choong

    2016-01-01

    Privacy-aware search of outsourced data ensures relevant data access in the untrusted domain of a public cloud service provider. A subscriber of a public cloud storage service can determine the presence or absence of a particular keyword by submitting a search query in the form of a trapdoor. However, these trapdoor-based search queries are limited in functionality and cannot be used to identify secure outsourced data which contains semantically equivalent information. In addition, trapdoor-based methodologies are confined to pre-defined trapdoors and prevent subscribers from searching outsourced data with arbitrarily defined search criteria. To solve the problem of relevant data access, we have proposed an index-based privacy-aware search methodology that ensures semantic retrieval of data from an untrusted domain. This method ensures oblivious execution of a search query and enables authorized subscribers to model conjunctive search queries without relying on predefined trapdoors. A security analysis of our proposed methodology shows that, in a conspired attack, unauthorized subscribers and untrusted cloud service providers cannot deduce any information that can lead to the potential loss of data privacy. A computational time analysis on commodity hardware demonstrates that our proposed methodology requires moderate computational resources to model a privacy-aware search query and for its oblivious evaluation on a cloud service provider. PMID:27571421

  11. Evaluation of Canadian unconfined aggregate freeze-thaw tests for identifying nondurable aggregates.

    DOT National Transportation Integrated Search

    2012-06-01

    Concrete is the most widely used material in construction. Aggregates contribute 60% to 75% of the total volume of concrete. The aggregates play a key role in concrete durability. The U.S. Midwest has many aggregates that can show distress in the...

  12. High Quantities of Microplastic in Arctic Deep-Sea Sediments from the HAUSGARTEN Observatory.

    PubMed

    Bergmann, Melanie; Wirzberger, Vanessa; Krumpen, Thomas; Lorenz, Claudia; Primpke, Sebastian; Tekman, Mine B; Gerdts, Gunnar

    2017-10-03

    Although mounting evidence suggests the ubiquity of microplastic in aquatic ecosystems worldwide, our knowledge of its distribution in remote environments such as Polar Regions and the deep sea is scarce. Here, we analyzed nine sediment samples taken at the HAUSGARTEN observatory in the Arctic at 2340-5570 m depth. Density separation by MicroPlastic Sediment Separator and treatment with Fenton's reagent enabled analysis via Attenuated Total Reflection FTIR and μFTIR spectroscopy. Our analyses indicate the wide spread of high numbers of microplastics (42-6595 microplastics kg⁻¹). The northernmost stations harbored the highest quantities, indicating sea ice as a possible transport vehicle. A positive correlation between microplastic abundance and chlorophyll a content suggests vertical export via incorporation in sinking (ice-) algal aggregates. Overall, 18 different polymers were detected. Chlorinated polyethylene accounted for the largest proportion (38%), followed by polyamide (22%) and polypropylene (16%). Almost 80% of the microplastics were ≤25 μm. The microplastic quantities are among the highest recorded from benthic sediments. This corroborates the deep sea as a major sink for microplastics and the presence of accumulation areas in this remote part of the world, fed by plastics transported to the North via the Thermohaline Circulation.

  13. The accuracy of less: Natural bounds explain why quantity decreases are estimated more accurately than quantity increases.

    PubMed

    Chandon, Pierre; Ordabayeva, Nailya

    2017-02-01

    Five studies show that people, including experts such as professional chefs, estimate quantity decreases more accurately than quantity increases. We argue that this asymmetry occurs because physical quantities cannot be negative. Consequently, there is a natural lower bound (zero) when estimating decreasing quantities but no upper bound when estimating increasing quantities, which can theoretically grow to infinity. As a result, the "accuracy of less" disappears (a) when a numerical or a natural upper bound is present when estimating quantity increases, or (b) when people are asked to estimate the (unbounded) ratio of change from 1 size to another for both increasing and decreasing quantities. Ruling out explanations related to loss aversion, symbolic number mapping, and the visual arrangement of the stimuli, we show that the "accuracy of less" influences choice and demonstrate its robustness in a meta-analysis that includes previously published results. Finally, we discuss how the "accuracy of less" may explain asymmetric reactions to the supersizing and downsizing of food portions, some instances of the endowment effect, and asymmetries in the perception of increases and decreases in physical and psychological distance. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  14. Assessing the strength of soil aggregates produced by two types of organic matter amendments using the ultrasonic energy

    NASA Astrophysics Data System (ADS)

    Zhu, Zhaolong; Minasny, Budiman; Field, Damien; Angers, Denis

    2017-04-01

    The presence of organic matter (OM) is known to stimulate the formation of soil aggregates, but aggregate strength may vary with the amount and type/quality of OM. Conventionally, the wet sieving method has been used to assess aggregate strength. In this study, we sought insight into the effects of different types of C inputs on aggregate dynamics using quantifiable energy via ultrasonic agitation. A clay soil with an inherently low soil organic carbon (SOC) content was amended with two different sources of organic matter (alfalfa, C:N = 16.7 and barley straw, C:N = 95.6) at different input levels (0, 10, 20, and 30 g C kg⁻¹ soil). The soil's inherent macroaggregates were first destroyed via puddling. The soils were incubated in pots at a moisture content of 70% of field capacity for a period of 3 months. The pots were housed in 1.2 L sealed opaque plastic containers. The CO2 generated during the incubation was captured by a vial of NaOH placed in each sealed container and sampled weekly. At 14, 28, 56, and 84 days, soil samples were collected and the change in aggregation was assessed using a combination of wet sieving and ultrasonic agitation. The relative strength of aggregates exposed to ultrasonic agitation was modelled using the aggregate disruption characteristic curve (ADCC) and soil dispersion characteristic curve (SDCC). Both the quality and the quantity of organic matter input influenced the amount of aggregates formed and their relative strength. The mean weight diameter (MWD) of soils amended with alfalfa residues was greater than that of soils amended with barley straw at lower input rates and early in the incubation. In the longer term, the use of ultrasonic energy revealed that barley straw resulted in stronger aggregates, especially at higher input rates, despite showing an MWD similar to that of alfalfa. The use of ultrasonic agitation, where we quantify the energy required to liberate and disperse aggregates allowed us to differentiate the effects of C inputs on the size of

  15. Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance

    PubMed Central

    Chan, Emily H.; Sahai, Vikram; Conrad, Corrie; Brownstein, John S.

    2011-01-01

    Background: A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Methodology/Principal Findings: Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003–2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Conclusions/Significance: Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to assist with traditional dengue surveillance. PMID:21647308

  16. Using web search query data to monitor dengue epidemics: a new model for neglected tropical disease surveillance.

    PubMed

    Chan, Emily H; Sahai, Vikram; Conrad, Corrie; Brownstein, John S

    2011-05-01

    A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003-2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to assist with traditional dengue surveillance.

  17. 14 CFR 23.1551 - Oil quantity indicator.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 14 Aeronautics and Space 1 2013-01-01 2013-01-01 false Oil quantity indicator. 23.1551 Section 23... Information Markings and Placards § 23.1551 Oil quantity indicator. Each oil quantity indicator must be marked in sufficient increments to indicate readily and accurately the quantity of oil. ...

  18. 14 CFR 23.1551 - Oil quantity indicator.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 14 Aeronautics and Space 1 2010-01-01 2010-01-01 false Oil quantity indicator. 23.1551 Section 23... Information Markings and Placards § 23.1551 Oil quantity indicator. Each oil quantity indicator must be marked in sufficient increments to indicate readily and accurately the quantity of oil. ...

  19. 14 CFR 23.1551 - Oil quantity indicator.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 14 Aeronautics and Space 1 2011-01-01 2011-01-01 false Oil quantity indicator. 23.1551 Section 23... Information Markings and Placards § 23.1551 Oil quantity indicator. Each oil quantity indicator must be marked in sufficient increments to indicate readily and accurately the quantity of oil. ...

  20. 14 CFR 23.1551 - Oil quantity indicator.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 14 Aeronautics and Space 1 2012-01-01 2012-01-01 false Oil quantity indicator. 23.1551 Section 23... Information Markings and Placards § 23.1551 Oil quantity indicator. Each oil quantity indicator must be marked in sufficient increments to indicate readily and accurately the quantity of oil. ...